LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 03-23-2010, 06:45 AM   #76
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723

Code:
bump();
 
Click here to see the post LQ members have rated as the most helpful post in this thread.
Old 03-23-2010, 02:32 PM   #77
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
Code:
bump();
http://www2.informatik.hu-berlin.de/...P/CSP06_28.pdf
http://www.complang.org/thurston/thu...ON_06_btlr.pdf
http://pdos.csail.mit.edu/~baford/packrat/thesis/
http://pauillac.inria.fr/~ddr/camlp5.../bparsers.html
http://www.cs.rhul.ac.uk/research/la...backtrack.html

Last edited by Sergei Steshenko; 03-23-2010 at 02:36 PM.
 
Old 03-24-2010, 11:57 AM   #78
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
I still don't get why not continue on with Lex and YACC?

It's not that Lex and YACC are too complicated, it's mostly that there are lots of little important things that aren't mensioned in tutorials. Such as, for exapmle the type of $$, $1, $2, etc. in YACC.
 
Old 03-24-2010, 03:48 PM   #79
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
I still don't get why not continue on with Lex and YACC?

It's not that Lex and YACC are too complicated, it's mostly that there are lots of little important things that aren't mensioned in tutorials. Such as, for exapmle the type of $$, $1, $2, etc. in YACC.
The little things make it complicated.

The overall approach of backtracking parser which tries language constructs makes the developer's life easier - the developer implements language constructs and callbacks step by step - of course, feeding currently supported set of language constructs fro testing.

So, by construction there is no possibility to obtain a message saying some piece is not described or what was it in your case. The developer exactly knows (until proven wrong by tests) what is supported and whatis not.

Also, as I wrote, it's easy to visualize the trial-error-backtracking process performed by the parser, i.e. it's easy to debug definitions of language constructs.
 
Old 03-24-2010, 08:25 PM   #80
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908
Lex & Yacc are great for someone who doesn't want to learn everything about grammars and parsing and the theory of it all. I couldn't imagine writing a decent parser for a non-trivial grammar without some formal background in the art. I've seen enough cases of it being done poorly, when a much better equivalent product could be produced with less effort with a documented tool such as lex & yacc (even including the effort to learn lex & yacc, which is non-trivial itself). Moreover, a roll-your-own parser will be understood by you only, whereas anyone acquainted with BNF will be able to quickly understand a lex & yacc based parser. Even if you can't produce a parser yourself, figuring out what an existing grammar and parser based on yacc does is fairly straightforward.
To Sergei and others who prefer to roll-their-own, I take my hat off. But for a novice trying to cobble together a calculator or simple programming language, it doesn't make sense to me. If you are interested in the theory behind the subject, that's a different matter, and probably the only way to go is do it yourself. Write your own lex & yacc, then use it to build a calculator. There's a learning experience.
I agree that it is difficult to find good information/education on the tools. Other than the links already posted in this thread, have you come across any nuggets that you would like to share?

--- rod.

Last edited by theNbomr; 03-24-2010 at 08:26 PM.
 
Old 03-25-2010, 08:04 AM   #81
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
@theNbomr

I agree.

Anyway, the main things that I keep stumpling on are:
  • How to have different tokens return different types?
  • The types of $$, $1, $2, etc.
  • The organization of the syntax tree
    • What kind of data structure to use?
      • structs with unions that hold the data for every type of node?
      • Faked "polymorphic" structs, where structs inherit each other by adding to the super-struct's template macro?
      • C++ classes
    • Should unary and binary operators have different node types, or should there be only one universal operator node type?
    • Should if, if/else, and loop constructs be types of the universal operator node or should they have their own node type?

Last edited by MTK358; 03-25-2010 at 08:06 AM.
 
Old 03-25-2010, 09:23 AM   #82
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,786

Rep: Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083Reputation: 2083
Quote:
Originally Posted by MTK358
How to have different tokens return different types?
Don't: http://dinosaur.compilertools.net/yacc/index.html

Quote:
3: Lexical Analysis

The user must supply a lexical analyzer to read the input stream and communicate tokens (with values, if desired) to the parser. The lexical analyzer is an integer-valued function called yylex. The function returns an integer, the token number, representing the kind of token read.
Quote:
Originally Posted by MTK358
The types of $$, $1, $2, etc.
It is YYSTYPE:
Quote:
Support for Arbitrary Value Types

By default, the values returned by actions and the lexical analyzer are integers. Yacc can also support values of other types, including structures. In addition, Yacc keeps track of the types, and inserts appropriate union member names so that the resulting parser will be strictly type checked. The Yacc value stack (see Section 4) is declared to be a union of the various types of values desired. The user declares the union, and associates union member names to each token and nonterminal symbol having a value. When the value is referenced through a $$ or $n construction, Yacc will automatically insert the appropriate union name, so that no unwanted conversions will take place. In addition, type checking commands such as Lint[5] will be far more silent.

...

Alternatively, the union may be declared in a header file, and a typedef used to define the variable YYSTYPE to represent this union.
 
Old 03-25-2010, 10:05 AM   #83
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Quote:
Originally Posted by ntubski View Post
Quote:
3: Lexical Analysis

The user must supply a lexical analyzer to read the input stream and communicate tokens (with values, if desired) to the parser. The lexical analyzer is an integer-valued function called yylex. The function returns an integer, the token number, representing the kind of token read.
You don't understand. I am not talking about the return type of yylex(), but about the semantic value of the token.

For example, an integer token will have an int value, and a string token will have a char* value.
 
Old 03-25-2010, 10:09 AM   #84
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
You don't understand. I am not talking about the return type of yylex(), but about the semantic value of the token.

For example, an integer token will have an int value, and a string token will have a char* value.
And if you use backtracking parser (at least, it was the case in my implementation), the engine returns start and end pointers in text (i.e. start line/column numbers and end line/column numbers) corresponding to the recognized language construct, so you question about the token value becomes irrelevant.
 
Old 03-25-2010, 10:44 AM   #85
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908
I think ntubski actually did answer your question regarding how to return token types. In lex, the standard idiom is to use the symbols defined in y.tab.h, which derive from the %tokens in the yacc grammar, as the return values. This is different from the value of the token, the type of which will vary depending on the data being parsed. The value of the token is stored in yylval, which is usually a union, so that it can store data of differing types. So, there are two things supplied by the lexical analyzer; the type of the data, and the value of the data. The former is the (integer) return value of the lexical analyzer, and the latter is a side effect of the lexical analyzer.
On your second question, the types of the $$, $1, $2 etc. The type of these will vary, depending on the type of the data for the token. The grammar parser code that uses the data knows/assumes that the appropriate data will be in yylval.
One concept that may be missing in your understanding is that there is a somewhat circular nature to the use of the grammar parser and the lexical analyzer. The grammar parser creates y.tab.h based on the grammar, and in essence says to the lexical analyzer "here is what I expect and here is the data structure(s) I will use to accept the data". At least this was something that I was late to understand; maybe the same for you. When used in combination, it is important to first run yacc to cause any changes in the grammar to be reflected in the y.tab.h, which is then used by lex.
The rest of your questions seem very project specific, and I will defer to those more skilled in the art to make suggestions.
--- rod.
 
Old 03-25-2010, 10:46 AM   #86
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by theNbomr View Post
...
One concept that may be missing in your understanding is that there is a somewhat circular nature to the use of the grammar parser and the lexical analyzer. The grammar parser creates y.tab.h based on the grammar, and in essence says to the lexical analyzer "here is what I expect and here is the data structure(s) I will use to accept the data". At least this was something that I was late to understand; maybe the same for you. When used in combination, it is important to first run yacc to cause any changes in the grammar to be reflected in the y.tab.h, which is then used by lex.
The rest of your questions seem very project specific, and I will defer to those more skilled in the art to make suggestions.
--- rod.
Which just proves my point - tokens is a redundant entity.
 
Old 03-25-2010, 11:17 AM   #87
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
So you mean tokens are not necessary if the parser generator can accept regexes as terminals?

Is there such a parser generator?
 
Old 03-25-2010, 11:26 AM   #88
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
So you mean tokens are not necessary if the parser generator can accept regexes as terminals?

Is there such a parser generator?
I gave enough links and "tips". Start thinking differently, i.e. not in terms of tokens.

Regular expressions are just a tool to formulate what a language construct is. The idea is there is an action per language construct, and the language constructs may be compound.

In my engine regular expressions were just a kind of "built-in" parametrized (the parameter being the regular expression itself) language constructs.

To better understand what is about implement a primitive back tracking parser from scratch. Make it understand, say, assignments in the form of

Code:
abc = def;
fgh = 123;
Test it on a false cases like

Code:
zxc = ();
 
Old 03-25-2010, 03:12 PM   #89
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Still,

what is so bad about Lex and YACC?

OK, so they use tokens, but as long as you can easily pass their semantic value, who really cares? And even so, that's not my main problem, which is the AST (which exists no matter what parser you use).

Last edited by MTK358; 03-25-2010 at 03:18 PM.
 
Old 03-25-2010, 03:16 PM   #90
smeezekitty
Senior Member
 
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,339

Rep: Reputation: 231Reputation: 231Reputation: 231
GCC is written with L&Y.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
lex and yacc in RHEL4 birds Linux - Software 1 06-03-2009 02:56 AM
LEX and YACC with C gr33ndata Programming 4 11-18-2007 05:12 PM
Lex and Yacc on Federo 2.0 vivekian Fedora 6 05-20-2006 09:09 AM
Lex and Yacc on Mandrake 9.2.2 Anuradha Linux - Software 0 07-02-2005 03:32 AM
Lex & YACC coolfrog Programming 3 09-25-2004 07:00 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 10:06 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration