LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 02-17-2009, 10:25 PM   #16
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811


Quote:
I don't know... I tested the code without putting the null character and apparently the compiler knows where the string ends without problems at all.
Your test was weak. It essentially worked because of the Magic of Uninitialized Variables(tm). It's black magic that you should avoid at all costs. Here, try this shell script as a test. It will show you the danger of not putting the NUL character (all bits off) at the end of your string.
Code:
#!/bin/bash

cat > 1.c <<EOD
#include <stdio.h>
#include <string.h>

void put_new_25(char *arg_string)
{
  arg_string[ 0]='a';
  arg_string[ 1]='b';
  arg_string[ 2]='c';
  arg_string[ 3]='d';
  arg_string[ 4]='e';
  arg_string[ 5]='f';
  arg_string[ 6]='g';
  arg_string[ 7]='h';
  arg_string[ 8]='i';
  arg_string[ 9]='j';
  arg_string[10]='k';
  arg_string[11]='l';
  arg_string[12]='m';
  arg_string[13]='n';
  arg_string[14]='o';
  arg_string[15]='p';
  arg_string[16]='q';
  arg_string[17]='r';
  arg_string[18]='s';
  arg_string[19]='t';
  arg_string[20]='u';
  arg_string[21]='v';
  arg_string[22]='w';
  arg_string[23]='x';
  arg_string[24]='y';

  /* We won't put a NUL character at the end, although we should! */

} /* put_new_25() */

int main(void)
{
  char the_string[100];

  strcpy(the_string,"ABCDEFGHIJKLMNOPQRST");
  /* strlen() returns size_t, so the matching conversion is %zu, not %d */
  printf("length is %zu, data is %s\n",strlen(the_string),the_string);
  put_new_25(the_string);
  printf("length is %zu, data is %s\n",strlen(the_string),the_string);
  strcpy(the_string,"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");
  printf("length is %zu, data is %s\n",strlen(the_string),the_string);
  put_new_25(the_string);
  printf("length is %zu, data is %s\n",strlen(the_string),the_string);

  return 0;
}
EOD
gcc -Wall 1.c -o 1
./1
The trick is to see how well function put_new_25() does its job, since it doesn't place a NUL character at the end of the string. It puts the first 25 letters of the alphabet into that array. When you see them, you should hope that that's all that is in the string.

The first time, it works ok, probably because the character array started out with all bits off (and you should never rely on that).

The second time, not so much. Here is my output, and yours would probably be the same:
Code:
length is 20, data is ABCDEFGHIJKLMNOPQRST
length is 25, data is abcdefghijklmnopqrstuvwxy
length is 50, data is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
length is 50, data is abcdefghijklmnopqrstuvwxyxxxxxxxxxxxxxxxxxxxxxxxxx
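
For comparison, here is a minimal sketch of what the script's function could look like with the terminator added. The names put_new_25_terminated() and length_after_overwrite() are hypothetical (they are not in the script above), and the loop assumes a character set such as ASCII in which the lowercase letters are contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of put_new_25(): stores 'a'..'y' and then the NUL byte.
   The caller must supply at least 26 bytes. */
void put_new_25_terminated(char *arg_string)
{
  size_t i;

  assert(arg_string != NULL);
  for (i = 0; i < 25; i++)
    arg_string[i] = (char)('a' + (int)i);  /* assumes contiguous letters (ASCII) */
  arg_string[25] = '\0';                   /* the byte the original leaves out */
}

/* Hypothetical helper: pre-fills a buffer with 50 'x' characters, overwrites
   it, and returns the length that strlen() sees afterwards. */
size_t length_after_overwrite(void)
{
  char the_string[100];

  memset(the_string, 'x', 50);
  the_string[50] = '\0';
  put_new_25_terminated(the_string);
  return strlen(the_string);
}
```

With the terminator in place, the length seen after the overwrite is 25 whatever the buffer held before, because strlen() stops at the NUL byte stored at index 25.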
 
Old 02-18-2009, 02:18 AM   #17
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Quote:
Originally Posted by Biddle View Post
Ok. So now do you know if it is defined behaviour to write to the memory in string1 and string2?
Some code as an illustration is sometimes more clear than sentences (when talking about code). So, yes.

I do think you have a valid point though.
 
Old 02-18-2009, 08:59 AM   #18
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Quote:
Originally Posted by charlitos View Post
I don't know... I tested the code without putting the null character and apparently the compiler knows where the string ends without problems at all.
When something breaks, it's pretty certain it was wrong.

When something doesn't break, that doesn't mean it wasn't wrong.

When you run a simple program, there are lots of places in your data that will happen to be filled with zeroes.

Depending on how the space was allocated, that zero fill might be a reliable feature of the language or it might be an accident. Even if you allocated so the zero fill is a reliable feature of the language, that just covers the first time you use the space within your program.
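
A small sketch of that last point, under the assumption of a file-scope buffer (the names scratch and store_unterminated are made up for illustration). The language guarantees the zero fill here, but only until the first write:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char scratch[32];  /* static storage: zero filled before main() runs */

/* Copies len bytes of src into scratch WITHOUT storing a terminator (this is
   the mistake being illustrated) and returns what strlen() then reports. */
size_t store_unterminated(const char *src, size_t len)
{
  assert(src != NULL && len < sizeof scratch);  /* last byte stays zero */
  memcpy(scratch, src, len);
  return strlen(scratch);
}
```

Calling store_unterminated("abcdefgh", 8) first returns 8, because the rest of the buffer is still zero. A following call store_unterminated("xyz", 3) also returns 8, since the stale "defgh" from the first use is still sitting behind the new data.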
 
Old 02-18-2009, 11:23 PM   #19
charlitos
Member
 
Registered: Feb 2009
Posts: 51

Original Poster
Quote:
Originally Posted by wje_lq View Post
Your test was weak. It essentially worked because of the Magic of Uninitialized Variables(tm). It's black magic that you should avoid at all costs. Here, try this shell script as a test. It will show you the danger of not putting the NUL character (all bits off) at the end of your string.

The trick is to see how well function put_new_25() does its job, since it doesn't place a NUL character at the end of the string. It puts the first 25 letters of the alphabet into that array. When you see them, you should hope that that's all that is in the string.

The first time, it works ok, probably because the character array started out with all bits off (and you should never rely on that).

The second time, not so much. Here is my output, and yours would probably be the same:
Code:
length is 20, data is ABCDEFGHIJKLMNOPQRST
length is 25, data is abcdefghijklmnopqrstuvwxy
length is 50, data is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
length is 50, data is abcdefghijklmnopqrstuvwxyxxxxxxxxxxxxxxxxxxxxxxxxx
I got it. So when the string was first initialized, I guess all the remaining characters were set to NUL, and not just the last character, as for instance in

Code:
char mystring[MAX_LEN]="whatever";
My program only ever needs the string to get bigger; it'll never shrink. That's the reason why it works, I'm guessing, because all of the characters after the 'r' have been set to NUL? But yes, I see the potential problem. It will never happen with what I'm doing now, but it's important to keep it in mind for sure.
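
A quick way to check that guess is sketched below. MAX_LEN is given a hypothetical value of 16 (the post does not say what it is), and both function names are invented for the illustration. C does zero-fill the elements that a shorter string initializer leaves over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 16  /* hypothetical size; the post does not give one */

/* Returns 1 if every byte after "whatever" is zero, 0 otherwise. */
int tail_is_zero(void)
{
  char mystring[MAX_LEN] = "whatever";
  size_t i;

  for (i = strlen("whatever"); i < MAX_LEN; i++)
    if (mystring[i] != '\0')
      return 0;
  return 1;
}

/* Overwrites the first n bytes with src, stores no terminator, and returns
   the length that strlen() reports afterwards. */
size_t overwrite_prefix(const char *src, size_t n)
{
  char mystring[MAX_LEN] = "whatever";

  assert(src != NULL && n < MAX_LEN);  /* keeps the final zero byte intact */
  memcpy(mystring, src, n);
  return strlen(mystring);
}
```

Growing the string happens to work: overwrite_prefix("whatever!!", 10) gives 10. Shrinking does not: overwrite_prefix("hi", 2) gives 8, because the array then holds "hiatever".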
 
Old 02-19-2009, 02:26 AM   #20
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

The big picture is this:
  1. From a storage standpoint, if you're going to allocate a specific number of bytes, as in either of these:
    Code:
    char mystring[]="abcde";
    char mystring[80];
    you must never write beyond the end of the allocated number of characters. In the case of the first example, the number of characters is six, because there's a NUL byte after the 'e'.

    If the number of bytes you need in the string is variable, use malloc()/calloc()/realloc().
  2. From a data representation standpoint, if you're going to work with strings, and you're storing a new string into an array, always store that NUL byte at the end. It's a good habit, and it keeps your program from breaking if you modify it later.
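
To make both points concrete, here is one possible sketch of a growable string built on realloc(). The helper name append_alloc() is made up for this example; it is only one way to do it, not the only one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends extra to the heap string *dst, growing the
   allocation with realloc(). *dst may be NULL for an empty start.
   Returns 0 on success, -1 on allocation failure (*dst is left unchanged). */
int append_alloc(char **dst, const char *extra)
{
  size_t old_len, add_len;
  char *grown;

  assert(dst != NULL && extra != NULL);
  old_len = (*dst != NULL) ? strlen(*dst) : 0;
  add_len = strlen(extra);

  grown = realloc(*dst, old_len + add_len + 1);  /* +1 for the NUL byte */
  if (grown == NULL)
    return -1;

  memcpy(grown + old_len, extra, add_len);
  grown[old_len + add_len] = '\0';               /* always store the NUL */
  *dst = grown;
  return 0;
}
```

A caller would start with char *s = NULL, call append_alloc(&s, "abcde") and later append_alloc(&s, "fgh"), and free(s) when done. The allocation is always sized for the data plus the terminator, so nothing is written past the end.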
Quote:
it will never happen with what I'm doing now, but it's important to keep it in mind for sure.
That approach is how broken programs are created.
 
Old 02-19-2009, 11:57 PM   #21
charlitos
Member
 
Registered: Feb 2009
Posts: 51

Original Poster
Quote:
Originally Posted by wje_lq View Post

That approach is how broken programs are created.
Sorry, I probably didn't make myself clear. I meant that even though my program was not going to fail because of that, I actually do put a null character at the end of the string, because it is a good programming practice. I was just wondering if the compiler was doing it for me. But thanks for your response.
 
Old 02-20-2009, 08:20 AM   #22
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Quote:
Originally Posted by charlitos View Post
I was just wondering if the compiler was doing it for me.
For much of this thread the method of declaring the char[] wasn't clear. Whether the language standard specifies the char[] be zero filled depends on how/where you declare the char[].
Code:
int main(void)
{
  char the_string[100];
In that declaration, the language standard absolutely does not specify that the array be zero filled.

If you try it in a simple enough program, I think you would find that the array is zero filled (just because RAM pages allocated from the OS are normally zero filled). But significant stack use before the entry to main is possible, so even with a specific OS and compiler where you already found that char[] to be zero filled, it isn't a safe bet that it would be zero filled in your next program.

Also, the phrase "compiler was doing it for me" was never clear in this thread. At times it seemed like it meant the compiler was generating code to re-terminate the string after it is modified in the course of the code. That is certainly not true.

If you define a global or static char[] and don't initialize it, or give it incomplete initialization, the uninitialized bytes will be pre-filled with zeroes. But if you declare a local char[] without an initializer, as in the code above, it may not be pre-filled (it will tend to be pre-filled if it represents the deepest stack use up to that point in the program, but even that is unreliable).
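
If you want a local array to start out zero filled, the reliable way is to give it an initializer. A minimal sketch, with names (static_buf, all_zero, and so on) made up for the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char static_buf[100];  /* static storage: guaranteed to start all zero */

/* Returns 1 if all n bytes of buf are zero, 0 otherwise. */
int all_zero(const char *buf, size_t n)
{
  size_t i;

  assert(buf != NULL);
  for (i = 0; i < n; i++)
    if (buf[i] != '\0')
      return 0;
  return 1;
}

int static_is_zero(void)
{
  return all_zero(static_buf, sizeof static_buf);
}

int initialized_local_is_zero(void)
{
  char the_string[100] = "";  /* explicit initializer: the rest is zero filled */

  return all_zero(the_string, sizeof the_string);
}
```

Both functions return 1 on any conforming compiler. The sketch deliberately does not test an uninitialized local array, because reading one has no guaranteed result.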

Last edited by johnsfine; 02-20-2009 at 08:21 AM.
 
  

