[SOLVED] PNFHA: if-else code generation

sundialsvcs · 08-13-2014, 07:12 AM

I still detect the lack of "separation of concerns" here. You're still muddling code-generation with parsing.

An AST structure for an IF-statement might, for example, be a 2-tree with children for the "then" and "else" parts, with children consisting of, say, a list of statement-blocks for each part. Once the statement was completely recognized, code would be generated from the AST (fragment). It might be littered with semantic clues, as well.

One analogy that has been bantered-about is that, whereas the parser deals with "words," the AST represents "sentences."

Really, an example is worth a thousand words, and on the Internet such examples are legion. One of the obligatory (for some reason) classes that you always find at the graduate level in a computer-science curriculum is the building of a compiler, fortunately almost-always using a toolkit.

des_a · 08-13-2014, 10:44 PM

So I got it to work now. It is a flat tree, but it works. Here is my definitions:

Code:

class ASTNode
{
 protected:
  String name;
  Array<String> value;
  unsigned long valuecount;


 public:
  ASTNode(const String n = "");
  ASTNode(const Array<String> v, const String n = "");
	
  ASTNode(ASTNode & n);

  ~ASTNode();

  
  String getName();
  Array<String> getValue();

  void setName(const String n);
  void setValue(const Array<String> v);


  int conprint(const char * format, ...);


  ASTNode operator=(ASTNode & n);
};

class ASTTree
{
 protected:
  Array<ASTNode *> nodes;


 public:
  ASTTree();
  ASTTree(ASTTree & t);
  ~ASTTree();


  void add_node(ASTNode & node);
  void remove_node();
  void remove_node(String n);

  ASTNode & getNode(String n);
  ASTNode & getNode(unsigned long offset);
  void setNode(String n, ASTNode & node);
  void setNode(unsigned long offset, ASTNode & node);

  unsigned long nodenumber();
};

des_a · 08-16-2014, 12:32 AM

I need a little help on a possible algorithm to add the line numbers in. Here is an example of code:

Code:

VERSION TVOID 0V


; Begin if :1
ALOAD TBOOLEAN true
CGOTOL TVOID if_part :1
GOTOL TVOID else_part_or_end :1


LBL TVOID 0V
VOID TVOID 0V
; End if code :1
GOTOL TVOID end


LBL TVOID 0V
; Begin if :2
ALOAD TBOOLEAN false
CGOTOL TVOID if_part :2
GOTOL TVOID else_part_or_end :2


LBL TVOID 0V
VOID TVOID 0V
; End if code :2
GOTOL TVOID end


LBL TVOID 0V
; End if or else code :1
GOTOL TVOID end


LBL TVOID 0V
; End if :1

END TVOID 0V

What I need to do, is use the comments to add the line numbers. Here is what I got so far for an algorithm:

Code:

  1. Count the labels until there is an if-else.
  2. Count the if else labels.
  3. Replace end with the results of (step 1 - 1) + (step 2 - 1).

I need to have a complete algorithm to add the line numbers, and it needs to work with all the possible if-else possibilities.

I "think" the comments and psudocode give enough information for the program to figure it out from there. Am I on the right track for what might work? I'd like to learn more about these ASTs, but for now, whatever will work will be fine, if I'm still mixing the code generation in too much. As long as it works, for now it is a fine algorithm. That way I can simply continue to learn as I go, and not have to wait until I finally figure out the ASTs, before finishing.

sundialsvcs · 08-16-2014, 06:35 AM

What I did, in a script compiler that needed to refer to line-numbers for error locating purposes, was to simply insert the source-line number (counted by the scanner and ignoring $INCLUDEs) directly into every p-code instruction that was generated. A source-line-range reference (not always accurate) was also inserted into AST entries. It enabled you to at least get close to the line-number where the error occurred, for debugging only. (The error-messages say, "near line such-and-so," and provide no useful information at all if the error is found in an included file. They're used to point a source-file editor to an approximate location, and nothing more.

Line-numbers should not be a syntactically-significant part of your language. Even the earliest BASIC interpreters got rid of the notion of "one statement per line," and used "line numbers" only as labels ... later replacing them with, well, labels. No one missed that throwback to the teletype days.

pwalden · 08-18-2014, 05:22 PM

Quote:

Originally Posted by sundialsvcs

Line-numbers should not be a syntactically-significant part of your language. Even the earliest BASIC interpreters got rid of the notion of "one statement per line," and used "line numbers" only as labels ... later replacing them with, well, labels. No one missed that throwback to the teletype days.

I don't think he's listening.

des_a · 08-19-2014, 08:15 PM

I realize now I said line numbers, and what I meant was that I was using label numbers to jump to a statement in my generated code. A label is the LBL instruction.

des_a · 08-19-2014, 08:28 PM

The first label number in the program starts at 0, and it goes up from there.

des_a · 08-23-2014, 11:12 PM

My book finally came from the library. It's called "Writing Compilers and Interpreters: An Applied Approach". By the end of reading the book, I will have a working Pascal compiler and interpreter. I will then modify it with my understanding of the concepts, or attempt to anyway, to compile to PNFASM, instead of to Microsoft Quick Assembler, which I cannot use. Then I will modify the language itself, to allow at the very least an ASM keyword, which will output PNFASM code directly, for the times which I need more features of the language. Then I will have my first working compiler for a high level language. I still want to make this language, as it's more powerful yet, but I will have lots of power at my fingertips after that.

des_a · 08-23-2014, 11:14 PM

I still would like advice on the solution to this problem though, in case the book does not answer my question. The book covers C compilers, and not bison and flex compilers, but the ideas of the code generation and ASTs, will be the same.

NevemTeve · 08-25-2014, 01:43 PM

It is not clear what your question is. Actually, this topic looks like a blog, not an usual questions/answers topic.

des_a · 08-25-2014, 09:54 PM

The question was and remains to basically get the if-else statement working, in a nutshell. What I've done so far, is gone from one state to another, still not fully working. A summary of the steps were shown in this thread. Right now the only part of the code that's not working in any way is that the label numbers are not generated correctly. So far in my current code, there is no other problems, so if we could fix that from this point, without breaking anything else, then the solution will have been provided.

This language compiles to my PNFASM language, so that is what the final output needs to be.

des_a · 08-25-2014, 10:05 PM

At this point, I am learning from AST trees and all that, but as long as the if else will work and generate "right" code, it is irrelevant, and something to look at in the future, or to handle later if I have trouble in another spot, in which case if needed, I'd post a new thread to learn how to deal with it. My book I will go through, and I'd been waiting on this book, but if I get this to work without the knowledge of the book, all the better. Not that now that I'm aware of the general idea of generating AST trees, I won't look into it for later, but I don't care right now for this, if it uses it or not, as long as the compiler produces working code. If I'd ended up designing it and implementing the language, and never needed ASTs, further than what I got, that's fine, and in the future when I learn about it, I'll use them. But for now, any way that works, is okay with me. Even if I had to rewrite the program just to get that to work as well as what already works, that would be fine with me, but the point of this thread is to get it to work.

des_a · 08-25-2014, 10:16 PM

See: http://forums.techguy.org/software-d...ml#post8958614, for an updated summary of the problem's state.

NevemTeve · 08-26-2014, 03:11 AM

You've got complete examples, did you actually examined any of them?

sundialsvcs · 08-26-2014, 10:43 AM

In the compilers that I have done, and as I have already said, forward-jumps are handled by means of a "fixup" data structure. For example, if I'm compiling an if/then/else construction, then the AST that I receive will already contain the one-or-two children ("then" and optionally "else"), and I know that I need to now generate code for each of them. But the design of my compiler in this section is non-recursive, using a "to-do list" of sorts to drive the exploration. There is a push-down stack of the various nested structures that I am "in" at any time, so I push an entry for if/then/else. One of the properties of that stack-object is a fixup list that will correspond to the end of the construct. Anyone that needs to branch there pushes a fixup-entry. Later on, when the end of the construct is reached, the code pops the stack-entry (verifying that it is an if/then/else entry as expected), then "resolves" the fixup-list by stamping the now-known correct address (and, at the same time, confirming that the location being updated is of the expected opcode-type ... one can never be too sure).

"Line numbers in the source code" should have no semantic meaning whatsoever. The source-code is translated into an AST data-structure ... usually, a chunk at a time ... and this is what you work from. If you need to generate forward or backward branches, etc., then the whole process at that point is entirely abstracted away from the source-code. It is done using internal compiler data structures and dictionaries, such as the "fixup lists" and "pushdown stacks" aforementioned.

Puh-l-l-l-leauze stop and study the resources that have already been pointed out to you. You're walking down a pathway that is so familiar and so old that it has been paved, but you've got a blindfold on and seem to be determined to learn everything in the school of hard knocks.