Objects and assignment in interpreter
I have it so that an Env object contains all the variables in the current scope, and the parent scope. You can also save them to variables like objects, and my idea is to use the "." operator to get values from them.
I have 2 issues about this: First, how do I assign to them? I currently have it that only a variable name can be assigned to, but you should be able to assign to an "obj.value" expression, too. How do I keep track of what variable to set and still be able to say it in an expression and have it evaluate to the variable? Second, these object don't really have a "type", they're just containers that contain any values you want under any name you want. How can I, for example, define a primitive "boolean" object and have things like if statements recognize it? |
Quote:
In practice, your parser needs to proceed one part a time: first "obj", then "value". When it first sees "obj", it much search the local scope to see if there is an object of that name, and if not, if the global scope has one. If not, then it is a new object, and you create it in the local scope. When the parser finds the object the string refers to, it checks if the expression refers to the object, or some subobject. In your example, at "obj", the parser has already found the "obj" object, but there is an additional token "value", after the subobject operator. So, the parser must check all the subobjects of the current object to see if any of them are named "value". Quote:
Have you taken a look at the Python source code? It is very similar in its approach to variable and object storage and reference methods. If you are conversant in C, I'd recommend looking at the Extending and Embedding section, to see how Python developers solved these problems -- even if you yourself decide to do it another way. I'm not sure I'm making any sense, though. Nominal Animal |
One thing I'm confused about is that you're talking about the parser as if it actually interprets the code. I thought it should just build the AST and then the interpretation will be done in a completely different function.
Quote:
Quote:
Again, objects in my interpreter are actually scopes, they're the same thing. And they have no type, they're basically nesting associative arrays. If you think that's a bad idea and there's a better solution, you can tell me. |
Quote:
Quote:
Quote:
Consider an if clause. In your AST, you have an if-statement node, with two or three child AST nodes: expression, true-statement, and optionally false-statement. Correct? Your if statement evaluator function will call the boolean conversion function on the expression AST node, to determine which statement to execute. If the expression node is a leaf, the function will just peek inside the referred machine object, and then return true or false, depending on the contents. If the expression is more complex, your conversion function will notice that the AST node is not a leaf. It must be a logical operator, with one or more child AST nodes. So, it calls the logical operator (say AND) evaluator function, giving it the child AST nodes as parameters. (I assume your parser has already taken care of the operator precedence, so the expression is a correct AST tree, and not just a chain of operations.) The logical operator (AND) evaluator function will call the boolean conversion function on the first parameter. If it evaluates to FALSE, it will return machine FALSE. (These are the standard short circuit rules.) Otherwise it will call the conversion function on the second parameter, and return whatever it returns. In this fashion, the evaluation goes on recursively, and the conversion functions peek at the machine objects. Program control statements are evaluated using machine types. Okay, now back on track. If I've understood you correctly, you are having a problem of associating a small AST tree for "obj.value" ("obj" AST node which has a "." child AST node and a "value" grandchild AST node) to the machine object it represents. In this case your assignment target evaluator function must first resolve the "obj" AST node to the proper object -- I think you already have this for standard variables. Then it must notice that it is not a leaf node, but has a "." child; therefore it must resolve the child node as an object, from within the "obj" object, and use that as the target. Note that this assumes your lexer and parser recognize and understand the "." as an operator. In practice, each of your variable and object properties or fields or methods, must have a corresponding machine objects, with a recognizable label the evaluator functions can find. Python uses a single mechanism for all variables, fields and methods. It has two main objects, globals, and local. A new search will start at the local object. If not found, it'll check the global object. All further searches ("." operators) are limited to the current object. In C, your object reference resolver could look something like this: Code:
typedef struct object object_t; Quote:
One thing I'd do, for embedding reasons, is to make sure it is possible to instantiate multiple interpreters in the same process and thread. In other words, that the entire interpreter state is encapsulated in a single structure supplied to the main interpreter function. This is extremely useful when an application fires up scripts to handle certain events: the event handler scripts can then each be a separate instance. It would be especially if used as an Apache module: for each request, the worker process could clone the interpreter instance from a preconfigured, prepared state (perhaps even having run an initialization script for that subtree at configuration time). After processing the request, the instance is just thrown away. That would make for an extremely robust and effective scripting language for web pages.. Waiting to hear more, Nominal Animal |
Quote:
Quote:
Quote:
Operators, when implemented, would probably just call predefined functions on the object behind the scenes. That way, you can even use the operators for your own objects, not just the built-in ones. Quote:
|
Quote:
Code:
typedef enum objtype objtype_t; For example, a generic object matches exactly your scope concept. It doesn't need to have any data, just child objects, and they can have any format. Since each has their own name, you already have a basic associative list construct. If you keep the child array as a binary tree, you can locate any child in logarithmic time. If you also use the child_hash array with e.g. crc32's of the name to be looked up, you can do extremely large lists/scopes efficiently. A hash table is constant-time, and thus probably even faster, but I find rehashing tedious. A tight binary tree as an array is very, very fast. There are other, more interesting features you can add. For example, instead of embedding the data like above, you can use pointers and reference counting to save memory. Nested object structure is already very useful for garbage collection; whenever you exit the current scope, just free the current scope object and all objects it contains. As to the conversion functions, consider this as an example: Code:
/* 1: True, 0: False. Quote:
So, exactly what use case you have in mind for your interpreter? Nominal Animal |
Quote:
And I don't understand some of object's members: What's the idea of the object storing its name? It could have many different names at once since it can be stored in variables and copied. Why is there a limit on the number of children? Why is the child member an object_t*, not and object_t**? How can you ensure that the children will be created one after another in an array, and not allocated in random parts of the computer's memory? Seems like you need an array of pointers to the children. And finally, what's the hash for? Quote:
I'm considering using this (link) to make things easier (just write GC_malloc instead of malloc, GC_realloc instead of realloc, and forget about freeing things). Or will it cause serious performance issues? Quote:
And since my idea is that all functions are anonymous, how would you implement it in C, where functions can't be nested and only have predefined global names? I guess you can define a struct that has the parameter names and a pointer to an AST node, but what about functions predefined in C? Quote:
Code:
expr: lvalue |
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Here's some example code for reference counting: Code:
typedef struct storage storage_t; In this manner you destroy the objects whenever you like, as long as you use storage_new for allocating new storage, storage_attach when adding a reference to the storage, and content_detach when removing a reference to the storage (including when destroying an object instance). The data is only freed when there are no more references to it. The garbage chain is needed to avoid referencing already free()d data; the actual freeing is only done when it is surely safe. Quote:
Quote:
Quote:
Quote:
If "." is a normal operator, the AST for a.b is (.(a,b)), right? And for a.b.c it is (.(a,(.(b,c))), right? Okay, then this should work: Code:
struct object { Quote:
Code:
reference: ID | reference DOT ID Nominal Animal |
Quote:
Quote:
Quote:
That wouldn't work because there is no object during parsing, just an expression node that will evaluate to the object diring evaluation, and then you might as well just have a '.' node with the object node and member name node. Quote:
Quote:
Quote:
Quote:
In case that's what confused you, when I wrote "expr: lvalue" I didn't mean that an expr can only be an lvalue, I assumed that it would be clear that expr should also match operators, function calls, etc. |
Quote:
The object_t type matches a named thing; each named thing corresponds to exactly one object_t. The scope the name is valid in determined by the tree (global, local) and the position in the tree. The actual data stored in the object_t is in a separate structure. This separate structure counts any references to it by the variable instances, and is only destroyed when the count reaches zero. When you copy an object, say by evaluating a = b; you find the object_t named b in the current scope. Then you create a new object_t, set its name to a, and copy the storage reference from b, increasing the storage reference count. Then you check if there is an existing object_t named a, and if yes, you replace it (destroying it, decreasing the storage reference count to whatever storage it uses) with the new one. Quote:
The example code resolve_object(AST node, local scope object, global scope object) uses the names stored in the AST nodes and the variable objects (names of variable objects) to resolve both simple object names (like a) and dot operator references (like a.b.c). It should work if the AST for a.b.c is (.(a,.(b,c))). Quote:
Quote:
Quote:
Quote:
Quote:
Nominal Animal |
Quote:
Anyway, I also wanted to mention that saved scopes aren't the only data type that will be stored in variables: functions will be, too. I guess I need a data type struct, that would say whether it's an object or a function, and the object and function types will "inherit" the data type struct by placing the data type struct's members before the type-specific members. That way an object or function can be cast to a generic data type struct. Quote:
Quote:
For example, if the node is an if/else node, the eval function would assume that it has 3 children: first the test, then the true-body, then the false-body. Quote:
Quote:
|
I started writing code to evaluate the syntax tree:
Code:
typedef enum { I started naming things that can be stored in variables "object"s, functions "function"s, and environments (that will serve as scopes and can be saved in variables) "env"s. If you have a better idea, tell me. |
Okay. Let's assume you have this code:
Code:
print("Command: ", argv.0, "\n") Code:
CALL:"print" ( STRING:"Command: ", Code:
CALL ─── FORLOOP ─── CALL ─── IF ─── CALL Code:
typedef enum { Code:
typedef struct node node_t; In case of NODE_STRING, it should not contain the double quotes. Each of your objects, functions and envs require a name. You can treat them all as generic objects. Assuming again a simple binary tree form, and not child arrays: Code:
typedef struct object object_t; When used internally by your parser, you cast it to the proper subtype first. Code:
/* This bit is common to all data, found in the 'common' member Nominal Animal |
Quote:
There is no such thing here as functions with permanent, predefined names. Quote:
Quote:
Also, what's so bad about using a separate list type? |
Quote:
If you want to allow an expression to evaluate to a function name, similar to PHP variable functions, the function name needs to be specified in the first child node instead of the AST node itself. For example, a print call AST would be (CALL("print", ...)). The difference is not significant, really. Quote:
If you mean that you don't want name strings in the AST nodes, then I'm stumped. I don't know of any method you can refer to the objects then. What do you do with the strings the script programmer uses to label the objects and functions and variables? Quote:
Quote:
Quote:
In my opinion, the memory layout of the object hierarchy should follow the logical hierarchy: members of an object should be children to the object. The data or code that defines the object, is in a separate, reference-counted structure, which has no name. The distinction is similar to directory entries and inodes in POSIX file systems. The interpreter has only pointers to two objects: the global object, and the current local scope object. When exiting a local scope, the interpreter changes to the parent of the current local scope object, and the current local scope object is destroyed. Since all data is reference-counted, return values and so on are perfectly safe, and garbage collection is easily implemented. Only the objects have names. The strings the script programmer uses to refer to anything is resolved to objects, based on the object names. I have zero idea how you intend to refer to anything without using names. Quote:
Nominal Animal |
All times are GMT -5. The time now is 03:48 PM. |