Parser in Perl

MTK358 · 01-28-2010, 07:31 AM

I was reading through the thread and saw this, which I don't understand:

Quote:

Originally Posted by Sergei Steshenko

... I have already described here in part my PerlPreProcessor - have a look at it. It can help you a lot with text substitution, it can cope with metadata, unlike C++ template it is stateful, i.e. one can pass data between templates - because it's all in Perl.

And no new language is invented - in your case it will be pure "C" + pure Perl. The only new entity is simple reserved comments - like

Code:

// PERL_BEGIN
// PERL_END
// PERL_ONE_LINER

.

Also, I just can't find a good explanation of extract_bracketed()'s return value.

Sergei Steshenko · 01-28-2010, 12:16 PM

Quote:

Originally Posted by MTK358

Anyway, here is my current code. It first slurps the file, splices escaped newlines, writes the result to a new file with the extension changed to ".c", runs cpp to strip the comments into a temp file, and then replaces the new file with the temp file.

The remarkable thing is that it worked perfectly the first try!!!

Code:

#!/usr/bin/env perl

foreach $filename (@ARGV) {
	open(INFILE, "<$filename");
	undef $/;
	$file = <INFILE>;
	close(INFILE);
	
	$file =~ s/\\\n//g;
	
	$outfilename = $filename;
	$outfilename =~ s/(.*)\..*/\1.c/;
	open(OUTFILE, ">$outfilename");
	print OUTFILE $file;
	close(OUTFILE);
	
	system("cpp -fpreprocessed $outfilename -o $outfilename.temp");
	system("rm $outfilename");
	system("mv $outfilename.temp $outfilename");
}

I have a very strict rule - not to bother with Perl code not having

Code:

use strict;
use warnings; # one can use -w switch on Perl invocation line instead/in addition to this pragma

- even if the code works perfectly.

...

Code:

undef $/;

is a bad idea, though it works. It's a bad idea because it is a global setting change.

I prefer not to touch it, and instead I write:

Code:

my $file_buffer = join('', <$fh>);

.

If one still feels the need to change a global setting, it is better to use the dynamic scoping approach:

Code:

local($/); undef $/;

.

Of course, reading

perldoc -f local

is recommended.
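
A minimal sketch of the dynamically scoped variant, for illustration only. The helper name slurp_file and the temporary file are assumptions made so the example runs on its own; neither appears in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    # local saves the current value of $/ and restores it when this sub
    # returns; while localized it is undef, so <$fh> reads the whole file.
    local $/;
    my $content = <$fh>;
    close $fh or die "Can't close $path: $!";
    return $content;
}

# Create a small throwaway file so the sketch is self-contained.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "Can't close $tmp_path: $!";

my $content = slurp_file($tmp_path);
print length($content), " characters read\n";
```

Because the change to $/ is confined to the sub, code that runs afterwards still reads line by line as usual.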

...

Regarding

Code:

	system("rm $outfilename");
	system("mv $outfilename.temp $outfilename");

- nah, not elegant, because it is too shell-ish.

perldoc -f unlink
perldoc -f rename

are your friends.
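
As a rough sketch of what that could look like, assuming a scratch directory and made-up file names in place of the script's real ones:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file names in a scratch directory, standing in for
# $outfilename and "$outfilename.temp" from the script.
my $dir         = tempdir(CLEANUP => 1);
my $outfilename = "$dir/example.c";
my $tempname    = "$outfilename.temp";

# Create both files so there is something to delete and rename.
for my $pair ([$outfilename, "old\n"], [$tempname, "new\n"]) {
    my ($path, $text) = @$pair;
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} $text;
    close $fh or die "Can't close $path: $!";
}

# unlink returns the number of files deleted; rename returns true on success.
unlink($outfilename) == 1       or die "Can't remove $outfilename: $!";
rename($tempname, $outfilename) or die "Can't rename $tempname: $!";
```

Both calls set $! on failure, so the error message can say why the operation failed, which the system("rm ...") form cannot do as directly.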

And why are the file handles global?

Sergei Steshenko · 01-28-2010, 12:24 PM

Quote:

Originally Posted by MTK358

Using the browser's find function, I discovered "-fpreprocessed" may do the job. But the problem is that it doesn't splice escaped newlines.

EDIT: that might not be an issue, I can probably splice escaped newlines in Perl using s/\\\n//g.

EDIT2: I've tested the splicing trick, and it seems to work just as described in the CPP manual.

Now, how do you find all instances of "keyword { ... }" and process them with Text::Balanced?

Item in bold is both smart and not. It is smart because you've quickly found what you need - using my prompt.

It is not because I am capable of giving such a prompt and you are apparently not at the moment. And it is because I still like reading full documentation - despite the browser "Find" function - the old fashioned pre-Internet way.

...

Regarding the item in bold - doesn't this post of mine: http://www.linuxquestions.org/questi...02#post3842502 have enough of a suggestion ?

Sergei Steshenko · 01-28-2010, 12:34 PM

Quote:

Originally Posted by MTK358

...
Also, I just can't find a good explanation of extract_bracketed()'s return value.

What about looking for

General behaviour in list contexts

(copy-pasted by me) in the output of 'perldoc -t Text::Balanced' or on the web page describing the module? The answer is there.

MTK358 · 01-28-2010, 01:07 PM

Quote:

Originally Posted by Sergei Steshenko

I have a very strict rule - not to bother with Perl code not having

Code:

use strict;
use warnings; # one can use -w switch on Perl invocation line instead/in addition to this pragma

I did that, along with a few changes from yesterday:

Code:

#!/usr/bin/env perl

use strict;
use warnings;

# Iterate over all arguments
foreach $filename (@ARGV) {
	# Check for things that would cause errors later on
	if(!(-e $filename)) die "error: $filename does not exist\n";
	if(!(-r $filename)) die "error: you do not have permission to read $filename\n";
	if(-d $filename) die "error: $filename is a directory\n";
	
	# Read file into string
	open(FILE, "<$filename");
	undef $/;
	my $file = <FILE>;
	close(FILE);
	
	# Splice newlines escaped with a backslash
	$file =~ s/\\\n//g;
	
	# output filename variable
	my $outfilename = $filename;
	# Replace extension with ".c"
	$outfilename =~ s/(.*)\..*/\1.c/; # TODO: be able to handle files with no dot in the name
	# Write spliced string to new filename
	open(FILE, ">$outfilename");
	print FILE $file;
	close(FILE);
	
	# Remove comments from spliced file using the C preprocessor
	system("cpp -fpreprocessed $outfilename -o $outfilename.temp") == 0 or die "error: cpp failed to remove comments\n";
	# Take care of the temp file
	system("rm $outfilename") == 0 or die "error: rm failed\n";
	system("mv $outfilename.temp $outfilename") == 0 or die "error: mv failed\n";
	
	# Read in processed file for actual OO conversion
	open(FILE, "<$outfilename");
	undef $/;
	$file = <FILE>;
	close(FILE);
	
	# Find all class definitions and process them
	my $offset = 0;
	my $result = index($file, 'class', $offset);
	while($result != -1) {
		my @classdef = extract_bracketed(substr($file, $result, -1), '{}', '[^{]*');
		# TODO: process class and turn it into C code
		$offset = $result + 1;
		$result = index($file, 'class', $offset);
	}
}

It has some errors I don't understand:

Code:

\1 better written as $1 at ./test line 25.
Global symbol "$filename" requires explicit package name at ./test line 7.
Global symbol "$filename" requires explicit package name at ./test line 9.
syntax error at ./test line 9, near ") die"
Global symbol "$filename" requires explicit package name at ./test line 10.
syntax error at ./test line 10, near ") die"
Global symbol "$filename" requires explicit package name at ./test line 11.
syntax error at ./test line 11, near ") die"
Global symbol "$filename" requires explicit package name at ./test line 14.
Global symbol "$filename" requires explicit package name at ./test line 23.
Execution of ./test aborted due to compilation errors.

Quote:

Originally Posted by Sergei Steshenko

Code:

undef $/;

is a bad idea, though it works. It's a bad idea because it is a global setting change.

I don't understand how it works. I just copy-and-pasted it out of a google search. As you can tell, I am still very much a Perl newbie.

Quote:

Originally Posted by Sergei Steshenko

Regarding

Code:

	system("rm $outfilename");
	system("mv $outfilename.temp $outfilename");

perldoc -f unlink
perldoc -f rename

are your friends.

I'll read up on that.

Quote:

Originally Posted by Sergei Steshenko

And why are the file handles global?

I don't understand.

MTK358 · 01-28-2010, 01:10 PM

Quote:

Originally Posted by Sergei Steshenko

What about looking for

General behaviour in list contexts

(copy-pasted by me) in the output of 'perldoc -t Text::Balanced' or on the web page describing the module? The answer is there.

So element #0 of extract_bracketed()'s returned array is the matched text, #1 is all the text after the match, and #2 is the prefix, right?

Telemachos · 01-28-2010, 02:00 PM

Quote:

Originally Posted by MTK358

Quote:

Originally Posted by Sergei Steshenko

And why are the file handles global?

I don't understand.

You are using global, bareword filehandles (e.g., FILE), but modern Perl supports (and encourages) lexical filehandles:

Code:

open my $file, '<', '/path/to/file.txt'
    or die: "Can't open file.txt: $!";

while (<$file>) {
    # whatever
}

close $file or die "Problem closing file.txt: $!";

Check out perldoc -f open and perldoc perlopentut for some discussion.

MTK358 · 01-28-2010, 02:47 PM

Quote:

Originally Posted by Telemachos

You are using global, bareword filehandles (e.g., FILE), but modern Perl supports (and encourages) lexical filehandles:

Code:

open my $file, '<', '/path/to/file.txt'
    or die: "Can't open file.txt: $!";

while (<$file>) {
    # whatever
}

close $file or die "Problem closing file.txt: $!";

Check out perldoc -f open and perldoc perlopentut for some discussion.

I fixed it now, but what about these errors?

Code:

\1 better written as $1 at ./test line 25.
syntax error at ./test line 9, near ") die"
syntax error at ./test line 10, near ") die"
syntax error at ./test line 11, near ") die"
Execution of ./test aborted due to compilation errors.

MTK358 · 01-31-2010, 10:03 AM

I was considering the design of this "new programming language", and thought it would be easier if there were a polymorphic struct-like type that does not have the complications of methods and a dynamic dispatch table.

So I got the idea to make a "pstruct" (Polymorphic Structure) construct:

Code:

pstruct Shape2D {
	int x;
	int y;
}

// Shape3D inherits all of Shape2D's fields, and adds to it.

pstruct Shape3D Shape2D {
	int z;
}

Translated to C code:

Code:

#define Shape2D_PSTRUCT_TEMPLATE \
	int x; \
	int y; \

struct Shape2D {
	Shape2D_PSTRUCT_TEMPLATE
};
typedef struct Shape2D Shape2D;

#define Shape3D_PSTRUCT_TEMPLATE Shape2D_PSTRUCT_TEMPLATE \
	int z; \

struct Shape3D {
	Shape3D_PSTRUCT_TEMPLATE
};
typedef struct Shape3D Shape3D;

And the class construct will probably be much simpler if it could use this extra layer of abstraction.
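
One way such a translation step might be generated from Perl is sketched below. The helper pstruct_to_c and its argument layout (name, array of parents, array of members) are assumptions for illustration, not the code used in the thread. This sketch also omits the trailing backslash after the last member.

```perl
use strict;
use warnings;

# Hypothetical helper: build the C text for one pstruct.
#   $name    - the pstruct's name
#   $parents - array ref of parent pstruct names
#   $members - array ref of member declarations without the ";"
sub pstruct_to_c {
    my ($name, $parents, $members) = @_;
    # Parent templates come first, then this pstruct's own members.
    my @lines = map { "${_}_PSTRUCT_TEMPLATE" } @$parents;
    push @lines, map { "$_;" } @$members;
    # Join with backslash-newline so the macro continues across lines.
    my $template = join(" \\\n\t", @lines);
    return "#define ${name}_PSTRUCT_TEMPLATE \\\n\t$template\n\n"
         . "struct $name {\n\t${name}_PSTRUCT_TEMPLATE\n};\n"
         . "typedef struct $name $name;\n";
}

my $c_code = pstruct_to_c('Shape3D', ['Shape2D'], ['int z']);
print $c_code;
```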

Sergei Steshenko · 01-31-2010, 11:33 AM

Quote:

Originally Posted by MTK358

So element #0 of extract_bracketed()'s returned array is the matched text, #1 is all the text after the match, and #2 is the prefix, right?

That's how it works for me.
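
A small sketch that prints the three elements, assuming a one-line sample input invented for this example:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Sample input made up for this example.
my $text = 'pstruct Shape2D { int x; int y; } int after;';

# Arguments: the text, the bracket characters to balance, and a pattern
# for the prefix that may be skipped before the opening bracket.
my ($extracted, $remainder, $prefix) =
    extract_bracketed($text, '{}', '[^{]*');

print "extracted: [$extracted]\n";   # the balanced {...} block
print "remainder: [$remainder]\n";   # everything after the block
print "prefix:    [$prefix]\n";      # what was skipped before the block
```

If no balanced block is found, the first element is undef, so real code should check it with defined before using it.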

Sergei Steshenko · 01-31-2010, 11:34 AM

Quote:

Originally Posted by MTK358

I fixed it now, but what about these errors?

Code:

\1 better written as $1 at ./test line 25.
syntax error at ./test line 9, near ") die"
syntax error at ./test line 10, near ") die"
syntax error at ./test line 11, near ") die"
Execution of ./test aborted due to compilation errors.

Why do you have ':' after 'die' ?
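
For reference, the script as posted earlier in the thread has no ':' after die (and none is needed; die takes its message directly). The messages it produced are consistent with three other things: Perl requires braces around the body of an if, so "if (...) die ..." is a syntax error; under "use strict" the loop variable has to be declared with my; and "\1 better written as $1" is a warning about the replacement side of s///. A minimal sketch of those three fixes follows. The helper names check_input_file and c_filename are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the three checks from the script.
sub check_input_file {
    my ($filename) = @_;
    # Statement modifiers avoid the block that "if (...)" would require.
    die "error: $filename does not exist\n" unless -e $filename;
    die "error: you do not have permission to read $filename\n" unless -r $filename;
    die "error: $filename is a directory\n" if -d $filename;
    return 1;
}

# Hypothetical helper: replace the extension with ".c".
sub c_filename {
    my ($filename) = @_;
    (my $outfilename = $filename) =~ s/(.*)\..*/$1.c/;   # $1, not \1
    return $outfilename;
}

# "my" declares the loop variable, which "use strict" requires.
foreach my $filename (@ARGV) {
    check_input_file($filename);
    print "$filename -> ", c_filename($filename), "\n";
}
```

The block form, if (!-e $filename) { die "..." }, works equally well; the statement-modifier form is just shorter.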

Sergei Steshenko · 01-31-2010, 11:37 AM

Quote:

Originally Posted by MTK358

I was considering the design of this "new programming language", and thought it would be easier if there were a polymorphic struct-like type that does not have the complications of methods and a dynamic dispatch table.

So I got the idea to make a "pstruct" (Polymorphic Structure) construct:

Code:

pstruct Shape2D {
	int x;
	int y;
}

// Shape3D inherits all of Shape2D's fields, and adds to it.

pstruct Shape3D Shape2D {
	int z;
}

Translated to C code:

Code:

#define Shape2D_PSTRUCT_TEMPLATE \
	int x; \
	int y; \

struct Shape2D {
	Shape2D_PSTRUCT_TEMPLATE
};
typedef struct Shape2D Shape2D;

#define Shape3D_PSTRUCT_TEMPLATE Shape2D_PSTRUCT_TEMPLATE \
	int z; \

struct Shape3D {
	Shape3D_PSTRUCT_TEMPLATE
};
typedef struct Shape3D Shape3D;

And the class construct will probably be much simpler if it could use this extra layer of abstraction.

Maybe you don't have to invent anything:

http://freshmeat.net/projects/ctpp
http://freshmeat.net/projects/ctalk-lang

MTK358 · 01-31-2010, 12:51 PM

It's interesting that someone else is trying to do such a thing, and I might check them out for fun later.

Anyway, I am trying to figure out how to turn the above example into:

A string containing the pstruct's name

An array of the names of the pstruct's parents (they can have multiple inheritance)

An array of the pstruct's members.

I already have the code to turn those into valid C code, but I can't seem to figure out how to get them from the original.

I am using a subroutine called "process_pstruct()", that takes a pstruct definition taken out of the input file, and returns it translated to C.

Here's how I see it would work:

Code:

pstruct Shape2D { // All code before the first "{" is the header
	int x;
	int y;
} // All code between the first "{" and the last "}" is the members

Then I split the header across matches of /\s+/, discard the first element (which is always the string "pstruct"), use the second one as the name, and the rest (if any) are the parents.

Next I split the members across /\s*;\s*/, and get the member array.
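
The two split steps described above could look roughly like this, assuming the header and member text have already been separated. The sample strings, including the second member, are invented for the example.

```perl
use strict;
use warnings;

# Assumed inputs: header and member text, already separated.
my $header      = "pstruct Shape3D Shape2D ";
my $member_text = "\n\tint z;\n\tint w;\n";

# split ' ' splits on runs of whitespace like /\s+/, but also skips
# leading whitespace, so no empty first field appears.
my ($keyword, $name, @parents) = split ' ', $header;

# Trim both ends first; otherwise the first member keeps its leading newline.
$member_text =~ s/^\s+|\s+$//g;
my @members = split /\s*;\s*/, $member_text;

print "name:    $name\n";
print "parents: @parents\n";
print "member:  $_\n" for @members;
```

Splitting on ';' is only safe while member declarations contain no semicolons of their own, for example inside nested braces.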

MTK358 · 01-31-2010, 05:10 PM

I got everything working, but no matter what I try I just can't separate these two parts and put them into separate string scalars:

Code:

pstruct Shape2D {
    int x;
    int y;
}

What I need to do is to extract the green and red highlighted portions and put them into two string vars. I already have the code to do everything else.

I've tried everything from every possible regex I could think of to index() and substr(), but nothing works!

Sergei Steshenko · 01-31-2010, 05:25 PM

Quote:

Originally Posted by MTK358

I got everything working, but no matter what I try I just can't separate these two parts and put them into separate string scalars:

Code:

pstruct Shape2D {
    int x;
    int y;
}

What I need to do is to extract the green and red highlighted portions and put them into two string vars. I already have the code to do everything else.

I've tried everything from every possible regex I could think of to index() and substr(), but nothing works!

Are you telling us that Text::Balanced can't extract the {<whatever_text>} part ?

Or are you telling us that, having the <whatever_text>, which in your case is

Code:

int x;
int y;

you can't separate, say, 'x' from 'y' when the above text is written as

Code:

int x; int y;

?
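
For the first reading of the question, a minimal sketch of getting the part before the braces and the part inside them into two scalars with Text::Balanced. It assumes the definition is already in a string and that the header contains no '{'.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $definition = "pstruct Shape2D {\n    int x;\n    int y;\n}\n";

# The prefix pattern '[^{]*' skips everything up to the first '{';
# in list context the skipped text comes back as the third element.
my ($block, $remainder, $header) =
    extract_bracketed($definition, '{}', '[^{]*');
die "no balanced {...} block found\n" unless defined $block;

# Drop the outer braces to keep only the member text.
my $body = substr($block, 1, -1);

# Remove the whitespace between the header and the opening brace.
$header =~ s/\s+$//;

print "header: [$header]\n";
print "body:   [$body]\n";
```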