Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux, and any language is fair game.
I still don't get why not continue on with Lex and YACC?
It's not that Lex and YACC are too complicated; it's mostly that there are lots of little but important things that aren't mentioned in tutorials. Such as, for example, the type of $$, $1, $2, etc. in YACC.
Quote:
I still don't get why not continue on with Lex and YACC?
It's not that Lex and YACC are too complicated; it's mostly that there are lots of little but important things that aren't mentioned in tutorials. Such as, for example, the type of $$, $1, $2, etc. in YACC.
The little things make it complicated.
The overall approach of a backtracking parser that tries language constructs makes the developer's life easier: the developer implements language constructs and callbacks step by step, of course feeding in the currently supported set of language constructs for testing.
So, by construction, there is no possibility of getting a message saying some piece is not described, or whatever it was in your case. The developer knows exactly (until proven wrong by tests) what is supported and what is not.
Also, as I wrote, it's easy to visualize the trial-error-backtracking process performed by the parser, i.e. it's easy to debug the definitions of language constructs.
Lex & Yacc are great for someone who doesn't want to learn everything about grammars and parsing and the theory of it all. I couldn't imagine writing a decent parser for a non-trivial grammar without some formal background in the art. I've seen enough cases of it being done poorly, when a much better equivalent product could be produced with less effort with a documented tool such as lex & yacc (even including the effort to learn lex & yacc, which is non-trivial itself). Moreover, a roll-your-own parser will be understood by you only, whereas anyone acquainted with BNF will be able to quickly understand a lex & yacc based parser. Even if you can't produce a parser yourself, figuring out what an existing grammar and parser based on yacc does is fairly straightforward.
To Sergei and others who prefer to roll their own, I take my hat off. But for a novice trying to cobble together a calculator or simple programming language, it doesn't make sense to me. If you are interested in the theory behind the subject, that's a different matter, and probably the only way to go is to do it yourself. Write your own lex & yacc, then use it to build a calculator. There's a learning experience.
I agree that it is difficult to find good information/education on the tools. Other than the links already posted in this thread, have you come across any nuggets that you would like to share?
The user must supply a lexical analyzer to read the input stream and communicate tokens (with values, if desired) to the parser. The lexical analyzer is an integer-valued function called yylex. The function returns an integer, the token number, representing the kind of token read.
Quote:
Originally Posted by MTK358
The types of $$, $1, $2, etc.
It is YYSTYPE:
Quote:
Support for Arbitrary Value Types
By default, the values returned by actions and the lexical analyzer are integers. Yacc can also support values of other types, including structures. In addition, Yacc keeps track of the types, and inserts appropriate union member names so that the resulting parser will be strictly type checked. The Yacc value stack (see Section 4) is declared to be a union of the various types of values desired. The user declares the union, and associates union member names to each token and nonterminal symbol having a value. When the value is referenced through a $$ or $n construction, Yacc will automatically insert the appropriate union name, so that no unwanted conversions will take place. In addition, type checking commands such as Lint[5] will be far more silent.
...
Alternatively, the union may be declared in a header file, and a typedef used to define the variable YYSTYPE to represent this union.
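A minimal sketch of what the quoted passage describes, using a hypothetical grammar (the names NUMBER, IDENT, and expr are invented for this example):

```yacc
/* Hypothetical fragment: %union makes YYSTYPE a union,
   and <member> tags tell yacc which member each symbol uses. */
%union {
    int   ival;    /* value of integer literals */
    char *sval;    /* value of identifiers      */
}
%token <ival> NUMBER
%token <sval> IDENT
%type  <ival> expr
%%
expr : NUMBER          { $$ = $1; }       /* $$ and $1 are ints here        */
     | expr '+' NUMBER { $$ = $1 + $3; }  /* yacc inserts .ival for you     */
     ;
```

So the answer to "what type is $$?" is: whatever union member the symbol was declared with; yacc substitutes the member name automatically.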
Quote:
The user must supply a lexical analyzer to read the input stream and communicate tokens (with values, if desired) to the parser. The lexical analyzer is an integer-valued function called yylex. The function returns an integer, the token number, representing the kind of token read.
You don't understand. I am not talking about the return type of yylex(), but about the semantic value of the token.
For example, an integer token will have an int value, and a string token will have a char* value.
Quote:
You don't understand. I am not talking about the return type of yylex(), but about the semantic value of the token.
For example, an integer token will have an int value, and a string token will have a char* value.
And if you use a backtracking parser (at least, this was the case in my implementation), the engine returns start and end pointers into the text (i.e. start line/column numbers and end line/column numbers) corresponding to the recognized language construct, so your question about the token value becomes irrelevant.
I think ntubski actually did answer your question regarding how to return token types. In lex, the standard idiom is to use the symbols defined in y.tab.h, which derive from the %tokens in the yacc grammar, as the return values. This is different from the value of the token, the type of which will vary depending on the data being parsed. The value of the token is stored in yylval, which is usually a union, so that it can store data of differing types. So, there are two things supplied by the lexical analyzer; the type of the data, and the value of the data. The former is the (integer) return value of the lexical analyzer, and the latter is a side effect of the lexical analyzer.
On your second question, the types of the $$, $1, $2 etc. The type of these will vary, depending on the type of the data for the token. The grammar parser code that uses the data knows/assumes that the appropriate data will be in yylval.
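The standard lex idiom described above, sketched as a hypothetical rules file (it assumes a %union with ival and sval members and tokens NUMBER and IDENT declared on the yacc side):

```lex
%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"   /* token codes generated by yacc -d */
%}
%%
[0-9]+     { yylval.ival = atoi(yytext);   return NUMBER; } /* value + kind */
[A-Za-z]+  { yylval.sval = strdup(yytext); return IDENT;  }
[ \t\n]    ;                         /* skip whitespace        */
.          { return yytext[0]; }     /* single-char tokens     */
%%
```

Each action does both jobs at once: it stores the token's value in yylval (the side effect) and returns the token's kind (the return value).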
One concept that may be missing in your understanding is that there is a somewhat circular nature to the use of the grammar parser and the lexical analyzer. The grammar parser creates y.tab.h based on the grammar, and in essence says to the lexical analyzer "here is what I expect and here is the data structure(s) I will use to accept the data". At least this was something that I was late to understand; maybe the same for you. When used in combination, it is important to first run yacc to cause any changes in the grammar to be reflected in the y.tab.h, which is then used by lex.
The rest of your questions seem very project specific, and I will defer to those more skilled in the art to make suggestions.
--- rod.
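The build order described above, as a sketch of the usual command sequence (the file names calc.y and calc.l are placeholders, and library flags vary between systems):

```shell
# Run yacc first, with -d so it emits y.tab.h;
# lex.yy.c #includes y.tab.h, so it must exist before lex output is compiled.
yacc -d calc.y                        # produces y.tab.c and y.tab.h
lex calc.l                            # produces lex.yy.c
cc -o calc y.tab.c lex.yy.c -ly -ll   # link parser, lexer, and runtime stubs
```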
Quote:
...
One concept that may be missing in your understanding is that there is a somewhat circular nature to the use of the grammar parser and the lexical analyzer. The grammar parser creates y.tab.h based on the grammar, and in essence says to the lexical analyzer "here is what I expect and here is the data structure(s) I will use to accept the data". At least this was something that I was late to understand; maybe the same for you. When used in combination, it is important to first run yacc to cause any changes in the grammar to be reflected in the y.tab.h, which is then used by lex.
The rest of your questions seem very project specific, and I will defer to those more skilled in the art to make suggestions.
--- rod.
Which just proves my point - tokens are a redundant entity.
So you mean tokens are not necessary if the parser generator can accept regexes as terminals?
Is there such a parser generator?
I gave enough links and "tips". Start thinking differently, i.e. not in terms of tokens.
Regular expressions are just a tool to formulate what a language construct is. The idea is there is an action per language construct, and the language constructs may be compound.
In my engine, regular expressions were just a kind of "built-in" parametrized language construct (the parameter being the regular expression itself).
To better understand what this is about, implement a primitive backtracking parser from scratch. Make it understand, say, assignments in the form of
OK, so they use tokens, but as long as you can easily pass their semantic value, who really cares? And even so, that's not my main problem, which is the AST (which exists no matter what parser you use).