Lex/YACC Question

Sergei Steshenko · 03-26-2010, 09:59 AM

Quote:

Originally Posted by MTK358

Could you please just tell me the answer? I don't even understand what I should say.

How do you implement in code "followed by" ? Try, for starters, to answer this question.

In English you describe language constructs as "something followed by something else" - among other things. So, how do you check in plain procedural code that something is followed by something else ?

MTK358 · 03-26-2010, 10:10 AM

Is this what you want:

"xcv = ();" is wrong, because a variable assignment is a variable, followed by '=', followed by '(', an expression, and a ')'.

Sergei Steshenko · 03-26-2010, 10:34 AM

Quote:

Originally Posted by MTK358

Is this what you want:

"xcv = ();" is wrong, because a variable assignment is a variable, followed by '=', followed by '(', an expression, and a ')'.

Not quite, though you are closer - I gave just two legal constructs:

identifier_to_identifier assignment
number_to_identifier_assignment

.

I have not mentioned an expression. In the parser studying toy world there is no notion of expression yet.

And I'm still waiting for an implementation of "followed by" check.

MTK358 · 03-26-2010, 11:01 AM

Then parentheses are not a valid character.

Sergei Steshenko · 03-26-2010, 11:25 AM

Quote:

Originally Posted by MTK358

Then parentheses are not a valid character.

Correct.

So, what about the valid cases ? I.e. how would your parser deal with the input example:

Code:

abc = def;
fgh = 123;
xcv = ();

?

I.e. what is the implementation of "followed by" check ?

In the light of presence of just two legal constructs:

identifier_to_identifier assignment
number_to_identifier_assignment

how should the error regarding 'xcv = ();' be reported ? To be more exact, at what place of input stream should the error be reported ?

Don't start with error reporting until you implement the legal constructs:

identifier_to_identifier assignment
number_to_identifier_assignment

- the error reporting directly depends on the way you implement the two constructs.

MTK358 · 03-26-2010, 11:28 AM

variable, followed by '=', followed by variable, and a ';'.

variable, followed by '=', followed by number, and a ';'.

variable, followed by '=', followed by illegal character (ERROR).

Sergei Steshenko · 03-26-2010, 11:36 AM

Quote:

Originally Posted by MTK358

variable, followed by '=', followed by variable, and a ';'.

variable, followed by '=', followed by number, and a ';'.

variable, followed by '=', followed by illegal character (ERROR).

There is a number of issues with you answer:

a language like Perl or "C" does not have "followed by" statement/operator, so I'm still waiting for some procedural code;
your sequence of checks will fail for the following (I swapped the first two lines):

Code:

fgh = 123;
abc = def;
xcv = ();

, i.e. if you wrote the parser in such a manner, the very first check of yours:

Quote:

variable, followed by '=', followed by variable, and a ';'

would fail - which is a wrong failure.

So, again, first write procedurally something minimal, like just one check:

Quote:

variable, followed by '=', followed by variable, and a ';'

, and then the other check:

Quote:

variable, followed by '=', followed by number, and a ';'.

, and then we'll discuss how to make the toy parser insensitive to the correct constructs order.

MTK358 · 03-26-2010, 01:45 PM

Quote:

Originally Posted by smeezekitty

GCC is written with L&Y.

Seems to me that the fact that the de-facto standard C compiler uses it, that most of its basic concepts are quite simple, and that it's a popular, standard Unix tool, is enough reason to use it.

Sergei Steshenko · 03-26-2010, 05:48 PM

Quote:

Originally Posted by MTK358

Seems to me that the fact that the de-facto standard C compiler uses it, that most of its basic concepts are quite simple, and that it's a popular, standard Unix tool, is enough reason to use it.

The fact that tool A is written with tool B does not help your understanding of what is under the hood - remember your difficulties with 'Text::Balanced', 'substr', regular expressions.

As I've already said, the disconnect between English and code in this thread and the above named difficulties have the same root cause.

theNbomr · 03-26-2010, 06:09 PM

Sergei:
I'm not sure MTK358 really wants to learn how things work under the hood. He just wants to get a drivers license.
If he wanted to write a parser from scratch, then understanding the theory would be good. He just wants to use an existing tool. Using the rules of engagement for that tool. Even if it isn't the best tool out there. And even if you or he didn't write it. Thats why the title of the thread is 'Lex/YACC Question'.
Frankly, your assertion that tokenizing the input data is unhelpful doesn't stand up well to my (novice) analysis. Since the essence of a grammar is a way of generalizing the description of how parts of a language (analogous to parts of speech) are organized as a collective, it seems to make perfect sense to categorize the input into its more general component parts (tokens). This is exactly what the lexical analyzer does. A computer language grammar uses names to describe its component parts in consistent ways: variables, constants, operators, functions, statements and expressions are part of the common lexicon. Using these component names to describe the elements that are parsed from the data seems like a good way to create an unambiguous description of the grammar that the parser is to read. What is missing from this logic?

--- rod.

Sergei Steshenko · 03-26-2010, 06:39 PM

Quote:

Originally Posted by theNbomr

Sergei:
I'm not sure MTK358 really wants to learn how things work under the hood. He just wants to get a drivers license.
If he wanted to write a parser from scratch, then understanding the theory would be good. He just wants to use an existing tool. Using the rules of engagement for that tool. Even if it isn't the best tool out there. And even if you or he didn't write it. Thats why the title of the thread is 'Lex/YACC Question'.
Frankly, your assertion that tokenizing the input data is unhelpful doesn't stand up well to my (novice) analysis. Since the essence of a grammar is a way of generalizing the description of how parts of a language (analogous to parts of speech) are organized as a collective, it seems to make perfect sense to categorize the input into its more general component parts (tokens). This is exactly what the lexical analyzer does. A computer language grammar uses names to describe its component parts in consistent ways: variables, constants, operators, functions, statements and expressions are part of the common lexicon. Using these component names to describe the elements that are parsed from the data seems like a good way to create an unambiguous description of the grammar that the parser is to read. What is missing from this logic?

--- rod.

And I see the world differently - in terms of pattern recognition. Parallel thread: http://www.linuxquestions.org/questi...f-text-797670/ with its example:

Code:

always begin
if(V(gnd) < 0.01) #1 gnd_d = 1'b0; else #1 gnd_d = 1'b1;
if(V(vdd) > 0.50) #1 vdd_d = 1'b1; else #1 vdd_d = 1'b0;
end

is an example of a pattern to be recognized and then transformed.

The pattern in "scientific terms" is Verilog 'always' statement, but not just any Verilog 'always' statement (typically in designs one sees

Code:

always @(posedge clk)
  <do_something>

, '@(posedge clk)' being event specification), but specifically event-less 'always' statement (i.e. 'always' keyword), followed by sequential block ('begin' keyword followed by some operators, followed by 'end' keyword), where 'some operators' is a sequences of specifically 'if' statements (opposed to much wider set of generally allowed statements), and the peculiarity of the 'if' statements is call to 'V' function with 'gnd' argument as aprt of difference.

So, to solve the OP's in that thread problem one has to write code which recognizes such patterns, extracts from the patterns vital info (for the first 'if' it's

V(gnd) < 0.01
gnd_d = 1'b0
gnd_d = 1'b1

)

and generates another pattern.

As I look back, the vast majority of work I did with design code was pattern recognition. For example, I once wrote a piece if code which recognized sequences of two back to back connected inverters in a netlist, eliminated them and connected together the two unconnected nodes which became unconnected because of the inverters elimination.

I'm trying to say that pattern (language constructs) recognition is a much more generic approach and IMO it is much closer to natural human thinking.

As I mentioned many times, I learned about this from Prolog - a language meant to imitate human thinking/reasoning.

MTK358 · 03-26-2010, 07:00 PM

Quote:

Originally Posted by Sergei Steshenko

I'm trying to say that pattern (language constructs) recognition is a much more generic approach and IMO it is much closer to natural human thinking.

As I mentioned many times, I learned about this from Prolog - a language meant to imitate human thinking/reasoning.

IMO the way computers and humans think are basically polar opposites.

Can you mentally solve millions of complicated algorithms in a split second, with not even one error?

Can your computer be shown a picture and be able to tell exactly what every object in the picture is?

So why explain something that is intended for a computer in human terms?

Sergei Steshenko · 03-27-2010, 01:07 AM

Quote:

Originally Posted by MTK358

IMO the way computers and humans think are basically polar opposites.

Can you mentally solve millions of complicated algorithms in a split second, with not even one error?

Can your computer be shown a picture and be able to tell exactly what every object in the picture is?

So why explain something that is intended for a computer in human terms?

Prolog was meant as an AI tool. Computer programs have made tremendous pregress in image/pattern recognition.

The challenge is to develop tools/approaches which model human thinking well.

The "thing" I'm trying to get from you for your own sake (implementing "followed by" in a procedural language) is very basic and quite unambiguous.

Another interesting issue is robust "half-hearted" parsers which work very similar to humans, i.e. they know how to understand "what it's all about" omitting unnecessary details. I myself once wrote such a parser "for money", i.e. I needed it to do my job. The idea was original at the moment i.e. I came to it myself; my friends/colleagues really liked it.

But before all that "followed by" in a procedural language needs to be done. "followed by" is fundamental, and from my observation of your lack of understanding of quite relevant reasonable Perl documentation (the main items of the documentation are universal - not Perl-specific) your inability to comprehend/implement "followed by" and relate it to other fundamental things in computers/programming is the main reason you have difficulty understanding how parsing works.

Programmers who implemented various parser generators understood "followed by" perfectly well.

MTK358 · 03-28-2010, 11:11 AM

I have decided to make this simpler and make my compiler to convert code to C instead of compile or interpret (at least for now).

I have almost finished my AST-builder, and I am using a union for tokens' semantic values, like this:

Code:

%union {
	char* string;
}

%token <string> INTEGER
%token <string> IDENTIFIER

(For now I am using only a string type).

But what is this error here:

Code:

$ ./compile
Running YACC:
lang.y:24.31-32: $1 of `stmt' has no declared type
lang.y:24.45-46: $1 of `stmt' has no declared type
lang.y:28.19-20: $$ of `expr' has no declared type
lang.y:29.24-25: $$ of `expr' has no declared type
lang.y:30.36-37: $$ of `expr' has no declared type
lang.y:30.76-77: $3 of `expr' has no declared type
lang.y:31.27-28: $$ of `expr' has no declared type
lang.y:31.49-50: $1 of `expr' has no declared type
lang.y:31.53-54: $3 of `expr' has no declared type
lang.y:32.27-28: $$ of `expr' has no declared type
lang.y:32.49-50: $1 of `expr' has no declared type
lang.y:32.53-54: $3 of `expr' has no declared type
lang.y:33.27-28: $$ of `expr' has no declared type
lang.y:33.49-50: $1 of `expr' has no declared type
lang.y:33.53-54: $3 of `expr' has no declared type
lang.y:34.27-28: $$ of `expr' has no declared type
lang.y:34.49-50: $1 of `expr' has no declared type
lang.y:34.53-54: $3 of `expr' has no declared type
lang.y:35.26-27: $$ of `expr' has no declared type
lang.y:35.31-32: $2 of `expr' has no declared type
YACC returned with exit code 1

./compile is a little script that I wrote that runs Bison, Flex, and GCC and checks if each succeeds before starting the next one.

theNbomr · 03-28-2010, 12:21 PM

Since all of your types are string, add to your yacc file:

Code:

%type <string> expr
%type <string> stmnt

These go in the same part of the file where your %tokens are defined.
--- rod.