[SOLVED] What is so special about 34 or more spaces when reading text files with C code?

GazL · 10-13-2019, 08:12 AM

I thought the whole point was that you were trying to remove comments and just have the actual line returned, but perhaps I misunderstood your intentions.

If you want to retain access to the comment, just save a pointer to the character after the first '#' you find.

Here's an example of what the for loop should look like:

Code:

        char *comment_start = NULL;
        char *first_non_whitespace = NULL;
        for ( char *p = buf; *p != '\0'; p++ ) {                                  
            if ( (first_non_whitespace == NULL) && (! isspace((unsigned char) *p)) )
                first_non_whitespace = p;
            switch( *p ) {
            case '#':
                if ( comment_start == NULL ) {
                    comment_start = p + 1;
                    *p = '\0';
                }
                break;
            case '\n':
                *p = '\0';
            }
        }

As an added bonus, it also strips leading whitespace. I'll leave you to figure out how to fit it into your own program.

Don't forget to #include <ctype.h> for isspace()!
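One possible way to fit that loop into a program is to wrap it in a function. This is only a sketch: the name strip_comment() and its comment_out parameter are made up for illustration and are not from the original post. It cuts the line at the first '#', drops the trailing newline, and returns a pointer to the first non-whitespace character.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions).
 * Cuts buf at the first '#' and at the newline, returns a pointer to the
 * first non-whitespace character, and stores the start of the comment
 * text (or NULL if there is no '#') in *comment_out. */
char *strip_comment(char *buf, char **comment_out)
{
    char *comment_start = NULL;
    char *first_non_whitespace = NULL;
    char *p;

    for (p = buf; *p != '\0'; p++) {
        if (first_non_whitespace == NULL && !isspace((unsigned char) *p))
            first_non_whitespace = p;
        switch (*p) {
        case '#':
            if (comment_start == NULL) {
                comment_start = p + 1;   /* comment text begins after '#' */
                *p = '\0';               /* terminate the non-comment part */
            }
            break;
        case '\n':
            *p = '\0';                   /* drop the newline */
            break;
        }
    }
    if (comment_out != NULL)
        *comment_out = comment_start;
    /* For a blank line, return the empty string at the end of buf. */
    return first_non_whitespace != NULL ? first_non_whitespace : p;
}
```

Note that this modifies the buffer in place, so it needs a writable array such as the one filled by fgets(), not a string literal.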

jsbjsb001 · 10-14-2019, 08:50 AM

Perhaps I wasn't clear; Let's say I had a line like:

Code:

Not a comment#

or something like:

Code:

not#a comment

None of the above are comments, as while the hash is still there, it is not at the very start of those strings. So my last question was: is there any way I could get it to still display "not#a comment", instead of it stripping the "#a comment" part of that same string, and just displaying "not" in that case?

But let's say I had a line like:

Code:

#is a comment

or

Code:

# is a comment

Then because the hash IS preceding the string, the program SHOULD take those lines as comments - that's what I was trying to say.

I tried the code you posted before, but I cannot figure out how I'm supposed to use it in my program - as it just segfaults every time it gets to that code. And while I've tried getting it to print at least something useful, and even tried debugging it with gdb, I cannot get it to print anything useful, and gdb just tells me the bleeding bloody obvious. So I have no idea why it's segfaulting on me, or if I've even put what I assume is a second for loop in the right place.

NevemTeve · 10-14-2019, 09:33 AM

Code:

    char *p = linebuffer;
    while (isspace((unsigned char) *p)) ++p;   /* skip leading whitespace; needs <ctype.h> */
    if (*p == '\0') /* empty line */;
    else if (*p == '#') /* comment line */;
    else /* not comment line */;

GazL · 10-14-2019, 10:02 AM

If all you want to do is ignore lines that start with a '#' then you don't want any of the code I posted: that was for stripping comments wherever on the line they are (which is what I thought you wanted). Disregard all my posts on this topic, and just use a simple 'if'.

rtmistler · 10-14-2019, 10:29 AM

Quote:

Originally Posted by jsbjsb001

Perhaps I wasn't clear; Let's say I had a line like:

Code:

Not a comment#

or something like:

Code:

not#a comment

None of the above are comments, as while the hash is still there, it is not at the very start of those strings. So my last question was: is there any way I could get it to still display "not#a comment", instead of it stripping the "#a comment" part of that same string, and just displaying "not" in that case?

But let's say I had a line like:

Code:

#is a comment

or

Code:

# is a comment

Then because the hash IS preceding the string, the program SHOULD take those lines as comments - that's what I was trying to say.

Coming into the discussion late.

These requirements are fairly clear.
The whole "some number of spaces before a #" issue, I'm assuming you've debugged that. Meanwhile, this post doesn't address whether or not preceding spaces still qualify a line as a comment.

Overall:
You can have a FLAG, a.k.a. a boolean variable or a 1/0 variable (whatever actual variable type you use is your business), but essentially it is an indicator as to whether or not you've recently seen a NEWLINE. (Say the file has just been opened and you are at the start of the file; you should still have that variable say that you've recently had a newline.)
Next step: if the FIRST character you encounter, or the first character you encounter that is NOT a space, is a #, then it is a comment line.

Done. Probably can code this up in about 5 lines of code, notwithstanding all the fopen() stuff.
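As a rough illustration of the flag idea, here is a minimal sketch that reads one character at a time. The function name count_comment_lines() and the flag name at_line_start are assumptions made for this example, and it treats tabs like spaces, which the post does not specify.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: counts the lines whose first non-blank
 * character is '#'. */
int count_comment_lines(FILE *in)
{
    bool at_line_start = true;   /* true until a non-blank character is seen;
                                    starts true because a freshly opened file
                                    is at the beginning of a line */
    int comments = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            at_line_start = true;        /* the next character starts a new line */
        } else if (at_line_start && c != ' ' && c != '\t') {
            if (c == '#')
                comments++;              /* '#' came first: a comment line */
            at_line_start = false;       /* any later '#' on this line is ignored */
        }
    }
    return comments;
}
```

The flag is what keeps a '#' in the middle of a line, as in "not#a comment", from being counted.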

jsbjsb001 · 10-15-2019, 07:51 AM

Actually GazL, your posts have been very helpful and insightful, particularly your post #13 - it was absolutely perfect to help me figure out what condition I needed for my for loop. That combined with astro's advice (when it finally clicked for me what he was on about in terms of comparing with ASCII) are the very reasons I was able to solve that particular problem. So no, I won't be disregarding your posts.

Better late than never RT, come on down. While I know what bool variables/values are, being 0 and 1, or FALSE/TRUE (obviously FALSE being before TRUE if I'm right in my thinking), I'm not sure what you mean by the "debugging some number of spaces" comment. I tried to do what you suggested with the newline, but nothing I've tried has worked at all, not even just close to what I was trying to do. So I cannot figure out for the life of me how to get it to only ignore the line if the hash is the first character encountered bar the space.

To try and be clear, this is what I want to put into code:

Code:

Scan each line
ignore the spaces, and continue scanning the same line
if a hash is encountered, ignore the line and move to the next line, and repeat

if on the other hand, anything other than a space or a hash is encountered, then display the line - even if there is a hash after the first non-space character on that same line

So again;

#comment string here - should be taken as a comment
# comment string here - should be taken as a comment
                                          # comment string here - should be taken as a comment
not a comment string - should NOT be taken as a comment
 not a comment string - should NOT be taken as a comment
not# a comment string - should NOT be taken as a comment
                                     not a#comment string - should NOT be taken as a comment
                not a comment string# - should NOT be taken as a comment
not a comment string# - should NOT be taken as a comment

In other words; if the hash is anywhere but immediately preceding the FIRST non-space character, the program should NOT see that line as a comment. But on the other hand, if the hash IS preceding the FIRST non-space character (as above), then that line SHOULD be seen as a comment, and ignored. But I cannot figure out how I'm supposed to express that in code. The only thing I've been able to do is have the program search for the hash, and either display the line or not depending on whether it finds one or not. So again, if I had a line like "no#comment", then while the "no" would be displayed, the "#comment" part of that SAME line would not be, because the hash is still there - similar to the problem with using strchr(). That's the problem I'm trying to solve, and cannot once again for the life of me figure out how to. Even just trying to figure out how the bloody hell I'm supposed to use the newline suggestion RT made is just absolutely baffling.
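The rule described above (skip leading spaces, then look at exactly one character) could be sketched with the standard strspn() function, which returns how many leading characters of a string belong to a given set. The helper name is_comment_line() is hypothetical, and treating tabs as spaces is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when the first character after any leading
 * spaces or tabs is '#'. A '#' later in the line does not count. */
bool is_comment_line(const char *line)
{
    const char *first = line + strspn(line, " \t");   /* skip leading blanks */
    return *first == '#';
}
```

In a loop like `while ( fgets(content, CONTENT_LEN, testfile) != NULL )`, each line would then be printed whole only when `!is_comment_line(content)`, so "not#a comment" keeps its hash and everything after it.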

BW-userx · 10-15-2019, 08:06 AM

Quote:

...FALSE OR TRUE is TRUE, because 0 | 1 is 1. ... insert many other examples here. because the concept of zero being equivalent to false is well-understood. As others have said, the math came first. This is why 0 is false and 1 is true ...

backwards I think you got it.

rtmistler · 10-15-2019, 10:28 AM

Quote:

Originally Posted by jsbjsb001

if the hash is anywhere but immediately preceding the FIRST non-space character, the program should NOT see that line as a comment.

I feel you are complicating this by allowing white spaces before # characters, that's not normal for a file which one would parse for configuration. This is a typical example:

Code:

# This is my file to control the settings of my program.
# Conventions are: PROPERTY=VALUE

INIT_SCREEN=START_EVALUATION
#INIT_SCREEN=ENTER_OPTIONS
TIMEOUT_FOR_USER_ENTRY=5

That example has options named INIT_SCREEN and TIMEOUT_FOR_USER_ENTRY. There also is a commented out line where the set-up of INIT_SCREEN was different, but they commented it out, so it's ignored.

By allowing white spaces, it is not an insurmountable problem, but why bother with the complications? We're not here to be one million percent compliant, accepting of, and correcting for everybody's typing mistakes.

Just like the C language has a syntax, your configuration file should have a syntax.

If you allow the typist to do free form typing, you're going to waste your time and effort reading in and parsing a configuration file.

I can think of two possible flows of reading data from a file and processing it.

You read it a line at a time or a character at a time.

If you read a line at a time:
You check each character starting from the beginning of the line.
(A) If you encounter any character besides a SPACE or #, the line is NOT a comment.
(B) If you encounter SPACE, you ignore them until you reach a character, or the end of line.
(C) If you read the end of line, the line is empty.
(D) If you encounter a #, the line is a comment.

If you read a character at a time:
You start at file open, and assume that you are starting at the beginning of a line.
You check each character.
(A), (B), (C), (D) actually all apply, just that you're checking for new line to tell you that you've reached the end of line.
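For the line-at-a-time flow, rules (A) to (D) map fairly directly onto a single loop. The following is a sketch only; the enum and the function name classify_line() are invented for this example, and tabs are treated like spaces as an assumption.

```c
#include <stddef.h>

/* Hypothetical result type for the three outcomes in rules (A) to (D). */
enum line_kind { LINE_EMPTY, LINE_COMMENT, LINE_CONTENT };

/* Hypothetical helper: classifies one line as read by fgets(). */
enum line_kind classify_line(const char *line)
{
    for (size_t i = 0; line[i] != '\0'; i++) {
        if (line[i] == ' ' || line[i] == '\t')
            continue;                 /* (B) ignore spaces and keep scanning */
        if (line[i] == '\n')
            break;                    /* (C) end of line reached, nothing found */
        if (line[i] == '#')
            return LINE_COMMENT;      /* (D) first real character is '#' */
        return LINE_CONTENT;          /* (A) anything else: not a comment */
    }
    return LINE_EMPTY;                /* (C) only spaces, or nothing at all */
}
```

Because the function returns as soon as it sees the first character that is not a space, a '#' further along the line is never examined.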

The main point here is to detect an intentional configuration setting written by the user of your program.

That is all.

phil.d.g · 10-15-2019, 09:25 PM

Can you show us what you currently have? A few people have made suggestions and by the sounds of things you've tried a few of them, I'd like to see what you have now, and see if I can provide useful feedback on that.

jsbjsb001 · 10-16-2019, 01:07 AM

Quote:

Originally Posted by rtmistler

...
(B) If you encounter SPACE, you ignore them until you reach a character, or the end of line.
...

This is exactly one of the things that's absolutely baffling me. In other words; I just cannot figure out how I express this in code, particularly when you throw in everything else to the mix, like looking for a newline.

I do agree with you RT, and if I was writing a "serious" program, I likely would do what you say above. But my point here is that; I'm trying to learn about not just detecting a particular character, but also trying to get the program to take another action if that same character is not before any other characters. In other words; I'm trying to go beyond just simply detecting the hash in this case, and make my program more precise. So like I was saying above, if the hash isn't before any other characters apart from a space, then it not ignoring the line, but displaying it including both the hash, and whatever is after it, but that's on the same line. Again, this is what's absolutely baffling me as to how I express this in code. As just detecting the hash I get, but trying to go beyond that is confusing in terms of putting it into code.

Quote:

Originally Posted by phil.d.g

Can you show us what you currently have? A few people have made suggestions and by the sounds of things you've tried a few of them, I'd like to see what you have now, and see if I can provide useful feedback on that.

Well, I've tried at least a couple of different ways, both with and without using bool variables. So here's the attempt with bool variables;

Code:

// a program to skip to the next line in file if the comment char (#) is encountered

#define  CONTENT_LEN 373
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {    
       
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";
    bool str = false, newline = false;
    int i = 0, comment = 0;    
    
    FILE *testfile; 

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
         fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
         return 1;
    }          
   
    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {        
                for ( i = 0; content[i] != '\0'; i++ ) {                
            //           printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");
                    printf("str = %d\nnewline = %d\ncomment = %i\n", str, newline, comment);                                                  
                    if ( content[i] == '#' ) {
                       ++comment;
                       if ( content[i] == '\0' ) {
                          newline = true;
                          break;
                       }                               
                    }
                    if ( (newline) && (comment > 1) ) {
                       str = false;
                       break;
                    }
                    else {
                         str = true;
                    }                 
                }
                if (str) {
                    str = true;
                    printf("Uncommented lines in file: %s\n", content);
                }                
                newline = true;
                str = false;                
    }
 //   printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    
    fclose(testfile);
    
    return 0;
}

If I set the "newline = true" that's in the while loop to "newline = false", then everything in the file gets displayed even if it has the hash in front of what should be a comment. And everything but the last two lines of it otherwise, regardless of whether there's a hash there or not. I tried to use printf() to print the values of my bool variables, but I'm totally confused as to what exactly is going on here.

I tried it without using bool variables, like just using if statements to check for a newline (using '\n') as well. But depending on exactly how I change the code, it either just prints out everything in the file, or I get nothing. So this among other variations of what's below is another thing I tried (which doesn't work either - and in this case I get nothing);

(I also tried other variations of the code posted previously in this thread that does work checking for the newline character, but again, nothing I try seems to work, either at all, or not properly - like it still displays lines that are comments for example, but not limited to.)

Code:

// a program to skip to the next line in file if the comment char (#) is encountered

#define  CONTENT_LEN 373
#include <stdio.h>
#include <stdlib.h>

int main(void) {    
       
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";    
    int i = 0, comment = 0;    
    
    FILE *testfile; 

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
         fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
         return 1;
    }          
   
    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {        
          for ( i = 0; content[i] != '\0'; i++ ) {                
            //           printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");                                                               
              if ( content[i] == '\n' ) {
                 continue;
                 if ( content[i] == '#' ) {
                 ++comment;
                 }
              }
            
          }
          if ( comment > 1 ) {
             printf("Uncommented lines in file: %s\n", content);                     
          }  
                
    }
 //   printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    
    fclose(testfile);
    
    return 0;
}

So I'm buggered if I know what else to try. Hopefully I've made enough sense in what I was saying above - I tried nonetheless. Thanks for your help.

phil.d.g · 10-16-2019, 02:52 AM

Quote:

Originally Posted by jsbjsb001

But my point here is that; I'm trying to learn ...

Because you said that, I'm going to try and give you pointers to figure out what you're doing wrong, rather than suggest a working solution. I'll just work with the bool version. I think it best to focus on one thing at a time.

Your outer while loop isn't a problem. Nor is your for loop. We need to look inside the for loop.

Let's focus on this for a minute.

Code:

                    if ( content[i] == '#' ) {
                       ++comment;
                       if ( content[i] == '\0' ) {
                          newline = true;
                          break;
                       }                               
                    }

If the condition of the outer if statement is true, what can we say about the result of evaluating the condition of the inner statement?

Furthermore, what can we say about the condition of the inner statement and the for loop termination condition?

jsbjsb001 · 10-16-2019, 04:29 AM

Quote:

Originally Posted by phil.d.g

...
If the condition of the outer if statement is true, what can we say about the result of evaluating the condition of the inner statement?

That it would be false?

Quote:

Furthermore, what can we say about the condition of the inner statement and the for loop termination condition?

I'm not quite sure what the answer to that one is to be honest.

phil.d.g · 10-16-2019, 04:56 AM

Quote:

Originally Posted by jsbjsb001

That it would be false?

Yes. That means the code inside that if statement is dead code. It can never be executed, as such it's not required.

Quote:

Originally Posted by jsbjsb001

I'm not quite sure what the answer to that one is to be honest.

Ignoring the outer if statement momentarily, this condition is the same as the terminating condition, which also means this code would never be executed (the loop would terminate first).

Let's have one assumption for now: A line won't be longer than 372 characters, and thus every fgets() call will read a single, full line, and content[0] is the start of the line.

The requirements say that if the first character that isn't a space is a #, then the line is a comment.

So, the first step is to 'consume' the empty space. If we start with this:

Code:

    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {        
        for ( i = 0; content[i] != '\0'; i++ ) {
            // TODO: Implement this block.
        }               
    }

Can you write an if statement that will print the first character that isn't a space on each line, and only that one character?

Hint: Use the following to print a single character:

Code:

printf("%c\n", content[i]);

jsbjsb001 · 10-16-2019, 05:32 AM

Quote:

Originally Posted by phil.d.g

...
Can you write an if statement that will print the first character that isn't a space on each line, and only that one character?
...

Other than only printing the first character, yes. In other words; I know how to write an if statement that says "if not equal to a space" (being "if ( content[i] != ' ' )"), then the printf() to print the result, but I'm not sure how to get it to stop at the first non-space character though.
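One common way to stop at the first match is a break statement, which leaves the enclosing for loop immediately. A minimal sketch, assuming a hypothetical helper named print_first_non_space() that mirrors the loop skeleton from the exercise:

```c
#include <stdio.h>

/* Hypothetical helper: prints the first character of line that is not a
 * space and returns it; returns '\0' if the line holds only spaces. */
char print_first_non_space(const char *line)
{
    int i;

    for (i = 0; line[i] != '\0'; i++) {
        if (line[i] != ' ') {
            printf("%c\n", line[i]);
            break;    /* leave the loop so later characters are not printed */
        }
    }
    return line[i];   /* either the character found or the terminating '\0' */
}
```

For a line that contains only spaces followed by a newline, the first non-space character is the newline itself, so a real program would probably want to treat '\n' as a special case too.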

BW-userx · 10-16-2019, 07:53 AM

Quote:

Originally Posted by phil.d.g

Because you said that, I'm going to try and give you pointers to figure out what you're doing wrong, rather than suggest a working solution. I'll just work with the bool version. I think it best to focus on one thing at a time.

Your outer while loop isn't a problem. Nor is your for loop. We need to look inside the for loop.

Let's focus on this for a minute.

Code:

                    if ( content[i] == '#' ) {
                       ++comment;
                       if ( content[i] == '\0' ) {
                          newline = true;
                          break;
                       }                               
                    }

If the condition of the outer if statement is true, what can we say about the result of evaluating the condition of the inner statement?

Furthermore, what can we say about the condition of the inner statement and the for loop termination condition?

If you are looking for the # first and then checking what comes after it, you can logically work out whether it has to be a comment. Keeping in mind, I've seen your test cases for #:

Code:

#comment
Not#comment
Not Comment#

Something like this, maybe: check the character in front of the # rather than the one behind it; if that matches, check on both sides of the #.

Code:

if ( content[i] == '#' && (content[i + 1] == '\0' || content[i + 1] == '\n') )
  newline = true;

to check behind the #

Code:

if ( content[i] == '#' )
     if ( i > 0 && content[i - 1] != '\0' && content[i - 1] != '\n' ) {
            // must be a char of some type,
            // so it must be looking at something like
            // Not Comment# or Not#Comment
     }

not tested though.