ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
I still detect the lack of "separation of concerns" here. You're still muddling code-generation with parsing.
An AST structure for an IF-statement might, for example, be a 2-tree with children for the "then" and "else" parts, with children consisting of, say, a list of statement-blocks for each part. Once the statement was completely recognized, code would be generated from the AST (fragment). It might be littered with semantic clues, as well.
One analogy that has been bantered-about is that, whereas the parser deals with "words," the AST represents "sentences."
Really, an example is worth a thousand words, and on the Internet such examples are legion. One of the obligatory (for some reason) classes that you always find at the graduate level in a computer-science curriculum is the building of a compiler, fortunately almost-always using a toolkit.
I need a little help on a possible algorithm to add the line numbers in. Here is an example of code:
Code:
VERSION TVOID 0V
; Begin if :1
ALOAD TBOOLEAN true
CGOTOL TVOID if_part :1
GOTOL TVOID else_part_or_end :1
LBL TVOID 0V
VOID TVOID 0V
; End if code :1
GOTOL TVOID end
LBL TVOID 0V
; Begin if :2
ALOAD TBOOLEAN false
CGOTOL TVOID if_part :2
GOTOL TVOID else_part_or_end :2
LBL TVOID 0V
VOID TVOID 0V
; End if code :2
GOTOL TVOID end
LBL TVOID 0V
; End if or else code :1
GOTOL TVOID end
LBL TVOID 0V
; End if :1
END TVOID 0V
What I need to do, is use the comments to add the line numbers. Here is what I got so far for an algorithm:
Code:
1. Count the labels until there is an if-else.
2. Count the if else labels.
3. Replace end with the results of (step 1 - 1) + (step 2 - 1).
I need to have a complete algorithm to add the line numbers, and it needs to work with all the possible if-else possibilities.
I "think" the comments and psudocode give enough information for the program to figure it out from there. Am I on the right track for what might work? I'd like to learn more about these ASTs, but for now, whatever will work will be fine, if I'm still mixing the code generation in too much. As long as it works, for now it is a fine algorithm. That way I can simply continue to learn as I go, and not have to wait until I finally figure out the ASTs, before finishing.
What I did, in a script compiler that needed to refer to line-numbers for error locating purposes, was to simply insert the source-line number (counted by the scanner and ignoring $INCLUDEs) directly into every p-code instruction that was generated. A source-line-range reference (not always accurate) was also inserted into AST entries. It enabled you to at least get close to the line-number where the error occurred, for debugging only. (The error-messages say, "near line such-and-so," and provide no useful information at all if the error is found in an included file. They're used to point a source-file editor to an approximate location, and nothing more.
Line-numbers should not be a syntactically-significant part of your language. Even the earliest BASIC interpreters got rid of the notion of "one statement per line," and used "line numbers" only as labels ... later replacing them with, well, labels. No one missed that throwback to the teletype days.
Line-numbers should not be a syntactically-significant part of your language. Even the earliest BASIC interpreters got rid of the notion of "one statement per line," and used "line numbers" only as labels ... later replacing them with, well, labels. No one missed that throwback to the teletype days.
I realize now I said line numbers, and what I meant was that I was using label numbers to jump to a statement in my generated code. A label is the LBL instruction.
My book finally came from the library. It's called "Writing Compilers and Interpreters: An Applied Approach". By the end of reading the book, I will have a working Pascal compiler and interpreter. I will then modify it with my understanding of the concepts, or attempt to anyway, to compile to PNFASM, instead of to Microsoft Quick Assembler, which I cannot use. Then I will modify the language itself, to allow at the very least an ASM keyword, which will output PNFASM code directly, for the times which I need more features of the language. Then I will have my first working compiler for a high level language. I still want to make this language, as it's more powerful yet, but I will have lots of power at my fingertips after that.
I still would like advice on the solution to this problem though, in case the book does not answer my question. The book covers C compilers, and not bison and flex compilers, but the ideas of the code generation and ASTs, will be the same.
The question was and remains to basically get the if-else statement working, in a nutshell. What I've done so far, is gone from one state to another, still not fully working. A summary of the steps were shown in this thread. Right now the only part of the code that's not working in any way is that the label numbers are not generated correctly. So far in my current code, there is no other problems, so if we could fix that from this point, without breaking anything else, then the solution will have been provided.
This language compiles to my PNFASM language, so that is what the final output needs to be.
At this point, I am learning from AST trees and all that, but as long as the if else will work and generate "right" code, it is irrelevant, and something to look at in the future, or to handle later if I have trouble in another spot, in which case if needed, I'd post a new thread to learn how to deal with it. My book I will go through, and I'd been waiting on this book, but if I get this to work without the knowledge of the book, all the better. Not that now that I'm aware of the general idea of generating AST trees, I won't look into it for later, but I don't care right now for this, if it uses it or not, as long as the compiler produces working code. If I'd ended up designing it and implementing the language, and never needed ASTs, further than what I got, that's fine, and in the future when I learn about it, I'll use them. But for now, any way that works, is okay with me. Even if I had to rewrite the program just to get that to work as well as what already works, that would be fine with me, but the point of this thread is to get it to work.
In the compilers that I have done, and as I have already said, forward-jumps are handled by means of a "fixup" data structure. For example, if I'm compiling an if/then/else construction, then the AST that I receive will already contain the one-or-two children ("then" and optionally "else"), and I know that I need to now generate code for each of them. But the design of my compiler in this section is non-recursive, using a "to-do list" of sorts to drive the exploration. There is a push-down stack of the various nested structures that I am "in" at any time, so I push an entry for if/then/else. One of the properties of that stack-object is a fixup list that will correspond to the end of the construct. Anyone that needs to branch there pushes a fixup-entry. Later on, when the end of the construct is reached, the code pops the stack-entry (verifying that it is an if/then/else entry as expected), then "resolves" the fixup-list by stamping the now-known correct address (and, at the same time, confirming that the location being updated is of the expected opcode-type ... one can never be too sure).
"Line numbers in the source code" should have no semantic meaning whatsoever. The source-code is translated into an AST data-structure ... usually, a chunk at a time ... and this is what you work from. If you need to generate forward or backward branches, etc., then the whole process at that point is entirely abstracted away from the source-code. It is done using internal compiler data structures and dictionaries, such as the "fixup lists" and "pushdown stacks" aforementioned.
Puh-l-l-l-leauzestop and study the resources that have already been pointed out to you. You're walking down a pathway that is so familiar and so old that it has been paved, but you've got a blindfold on and seem to be determined to learn everything in the school of hard knocks.
Last edited by sundialsvcs; 08-26-2014 at 10:44 AM.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.