LinuxQuestions.org
Old 01-25-2010, 12:08 PM   #1
MTK358
Parser in Perl


Suppose you have a large string in Perl, for example this:

Code:
stuff
keyword morestuff { blah{
blah{}
}blah
} test
end
Is it possible to extract everything from the start of the sequence "keyword" through the right brace that matches the first left brace after "keyword"?

For example, it should extract this from the above example:

Code:
keyword morestuff { blah{
blah{}
}blah
}
And is it possible to pass this section to a function that processes it, and then replace the section with the function's returned string?
 
Old 01-25-2010, 03:19 PM   #2
Sergei Steshenko
Start from here:

http://perldoc.perl.org/Text/Balanced.html
http://perldoc.perl.org/perlfaq6.html (in this one, look for "Can I use Perl regular expressions to match balanced text?")

...

I once wrote a simple, elegant parser specifically for nested pairs of {}; I hope the above Text::Balanced is a better solution. If not, I'll describe my idea.
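For reference, one way to match balanced text with a plain regular expression in Perl 5.10 or newer is a recursive pattern. The following is only a minimal sketch under stated assumptions: `process_block` is a hypothetical stand-in for whatever function should transform the section, and the pattern counts every brace, including any that sit inside strings or comments.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical handler: receives the extracted section, returns its replacement.
sub process_block {
    my ($block) = @_;
    return '[processed ' . length($block) . ' chars]';
}

# "keyword", then anything up to the first left brace, then one balanced
# brace group. (?&braces) recurses into the named group to handle nesting.
my $section_re = qr/
    keyword [^{}]*
    (?<braces>
        \{
        (?: [^{}]++ | (?&braces) )*
        \}
    )
/x;

my $text = "stuff\nkeyword morestuff { blah{\nblah{}\n}blah\n} test\nend\n";

# Replace each matched section with the handler's return value.
(my $result = $text) =~ s/($section_re)/process_block($1)/ge;
print $result;
```

The `/e` modifier evaluates the replacement as code, which is what lets the matched section be handed to a function and swapped for its return value in a single pass.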
 
Old 01-25-2010, 03:26 PM   #3
Sergei Steshenko
... Just curious: are you perhaps parsing a Synopsys library format file? I ask because I wrote my nested {} parser for exactly that purpose.
 
Old 01-25-2010, 03:32 PM   #4
MTK358
No, it's for a silly little project I am doing that adds classes and polymorphism to C using a simple Perl search-and-replace script. It actually kind of works now, but the syntax is really terrible, and I thought that using curly braces as the block delimiter and semicolons instead of newlines as separators would make it fit in much more nicely.

Basically I want the syntax to be:

Code:
class ClassName SuperClass1 SuperClass2 ... {
    int var;

    char str;

    void method();

    int anotherMethod(int a) {
        return a + this.var;
    }
}

Last edited by MTK358; 01-25-2010 at 03:35 PM.
 
Old 01-25-2010, 04:11 PM   #5
Sergei Steshenko
Oh, it looks like you're in trouble: think of the following:

Code:
char *s = "a nasty string with ; inside";
What will your parser do with the ';' inside? Or won't it delve into strings, i.e. are you writing a partial parser?

But still:

Code:
char *s = "a nasty string with { ... } inside";
I.e. a robust parser should:
  1. get rid of comments (with a way to restore them later);
  2. temporarily get rid of strings.

My point is that to make a half-hearted parser robust, one has to make it more than just half-hearted.
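The two steps listed above can be sketched as a masking pass: literals and comments are swapped for numbered placeholders before any brace matching, then restored afterwards. This is a rough sketch, not a full C lexer. The names `mask_literals` and `unmask_literals` and the `\x01` marker are made up for illustration, and it assumes that marker byte never occurs in the source and ignores preprocessor quirks such as line continuations.

```perl
use strict;
use warnings;

my $MARK = "\x01";   # assumed never to occur in the source being parsed

# Swap string literals, character literals and comments for numbered
# placeholders, so their braces and semicolons are invisible to later passes.
sub mask_literals {
    my ($src) = @_;
    my @saved;
    $src =~ s{
        (
            " (?: [^"\\] | \\. )* "     # double-quoted string
          | ' (?: [^'\\] | \\. )* '     # character literal
          | /\* .*? \*/                 # block comment
          | // [^\n]*                   # line comment
        )
    }{
        my $id = scalar @saved;         # index the saved text will get
        push @saved, $1;
        $MARK . $id . $MARK;
    }gsex;
    return ($src, \@saved);
}

# Put the saved literals and comments back in place of their placeholders.
sub unmask_literals {
    my ($src, $saved) = @_;
    $src =~ s/\Q$MARK\E(\d+)\Q$MARK\E/$saved->[$1]/g;
    return $src;
}

my $src = <<'END';
char *s = "a nasty string with { ... } inside"; /* stray } here */
int f(void) { return ';'; } // and a { here
END

my ($masked, $saved) = mask_literals($src);
my $open = ($masked =~ tr/{//);   # counts left braces without changing $masked
print "left braces visible to the parser: $open\n";
print "round trip ok\n" if unmask_literals($masked, $saved) eq $src;
```

Because the alternatives are tried at each position from left to right, a quote inside a comment or a `//` inside a string is swallowed by whichever construct started first.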
 
Old 01-25-2010, 04:15 PM   #6
Sergei Steshenko
... I have already partly described my PerlPreProcessor here; have a look at it. It can help you a lot with text substitution and can cope with metadata. Unlike C++ templates it is stateful, i.e. one can pass data between templates, because it's all in Perl.

And no new language is invented: in your case it will be pure "C" + pure Perl. The only new entity is a few simple reserved comments, like:

Code:
// PERL_BEGIN
// PERL_END
// PERL_ONE_LINER
 
Old 01-25-2010, 04:45 PM   #7
MTK358
I haven't thought about strings containing semicolons or braces. The parser will somehow have to ignore characters in double or single quotes.
 
Old 01-26-2010, 07:27 AM   #8
Sergei Steshenko
Quote:
Originally Posted by MTK358 View Post
I haven't thought about strings containing semicolons or braces. The parser will somehow have to ignore characters in double or single quotes.
And it's more than that - C99 allows anonymous structs to be passed as function parameters, and the structs too contain { ... }.
 
Old 01-26-2010, 09:21 AM   #9
MTK358
Quote:
Originally Posted by Sergei Steshenko View Post
And it's more than that - C99 allows anonymous structs to be passed as function parameters, and the structs too contain { ... }.
That won't be a problem if the parser looks for the matching brace, not the first right brace.

I made a quick little program that does this, but it doesn't seem to work:

Code:
$text = "{ testext } to test.";

print get_matching_brace($text, 0);

# get_matching_brace(string, index of left brace)
# returns text between matching braces, including braces
sub get_matching_brace {
	$start = $index;
	$index = $_[1];
	$depth = 1;
	while(depth > 0) {
		$index++;
		if(substr($_[0], $index, 1) eq '{') {
			$depth++;
		} elsif(substr($_[0], $index, 1) eq '}') {
			$depth--;
		}
	}
	return substr($_[0], $start, $index);
}
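For what it's worth, the attempt above seems to fail for three reasons. `$start` is copied from `$index` before `$index` has been set. The loop condition says `depth` without the `$` sigil, so without `use strict` it is a bareword that compares as 0 and the loop never runs. And the third argument of `substr` is a length, not an end index. A corrected sketch that keeps the same interface follows; the bounds check and the empty return for unbalanced input are additions, not part of the original.

```perl
use strict;
use warnings;

# get_matching_brace(string, index of left brace)
# Returns the text from that brace through its matching right brace,
# or nothing (undef in scalar context) if there is no balanced group.
sub get_matching_brace {
    my ($text, $start) = @_;
    return unless $start < length($text) && substr($text, $start, 1) eq '{';

    my $depth = 1;
    my $index = $start;
    while ($depth > 0) {
        $index++;
        return if $index >= length $text;   # ran off the end: unbalanced
        my $ch = substr($text, $index, 1);
        if    ($ch eq '{') { $depth++ }
        elsif ($ch eq '}') { $depth-- }
    }
    # substr takes a LENGTH as its third argument, not an end index.
    return substr($text, $start, $index - $start + 1);
}

my $text = "{ testext } to test.";
print get_matching_brace($text, 0), "\n";   # prints "{ testext }"
```

Running with `use strict` and `use warnings` would have flagged the bareword `depth` at compile time.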
 
Old 01-26-2010, 01:53 PM   #10
Sergei Steshenko
Nah. It's "not" Perl. Use regular expressions - the engine can cope with multi-line strings.

And won't you try Text::Balanced?
 
Old 01-26-2010, 02:12 PM   #11
MTK358
Quote:
Originally Posted by Sergei Steshenko View Post
And won't you try Text::Balanced
I would like to try but I can't find a good, simple explanation.
 
Old 01-26-2010, 03:34 PM   #12
Sergei Steshenko
Quote:
Originally Posted by MTK358 View Post
I would like to try but I can't find a good, simple explanation.
You know my standard question in such cases: what is the first thing you do not understand?
 
Old 01-26-2010, 03:53 PM   #13
MTK358
This, from the Text::Balanced SYNOPSIS section:

Code:
use Text::Balanced qw (
    extract_delimited
    extract_bracketed
    extract_quotelike
    extract_codeblock
    extract_variable
    extract_tagged
    extract_multiple
    gen_delimited_pat
    gen_extract_tagged
);
 
Old 01-26-2010, 04:05 PM   #14
Sergei Steshenko
This piece of code tells which functions to import from Text::Balanced. You most likely will need 'extract_bracketed', and maybe 'extract_codeblock' - the latter one may help dealing with "nasty" strings.

Just copy the above piece - it will import all the functions, you may need some of them later.

But start from 'extract_bracketed'.
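As a starting point, here is a small sketch of 'extract_bracketed' on a string that begins with a brace. It also shows one way the "nasty" strings can be handled without 'extract_codeblock': as far as the Text::Balanced documentation describes it, adding a quote character to the delimiter specification makes the function skip over quoted text. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $code = '{ char *s = "a nasty } string"; } trailing text';

# Delimiter spec '{}': every curly brace counts, even the one in the string,
# so the extraction stops too early.
my $naive_input = $code;
my ($naive) = extract_bracketed($naive_input, '{}');

# Delimiter spec '{}"': double-quoted text is skipped as a unit, so the
# brace inside the string is ignored.
my $aware_input = $code;
my ($aware, $rest) = extract_bracketed($aware_input, '{}"');

print "naive: $naive\n";
print "aware: $aware\n";
print "rest:  $rest\n";
```

Separate copies of the input are used because a successful call can move the string's match position, which would affect a second call on the same variable.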
 
Old 01-26-2010, 07:25 PM   #15
MTK358
Next, I don't understand extract_bracketed()'s third parameter.
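On the third parameter: as far as the Text::Balanced documentation describes it, this is a prefix pattern, i.e. a regular expression for text to skip before the opening bracket. If it is omitted, only optional whitespace is skipped, so the call fails when the string starts with anything else. A minimal sketch using the sample text from the first post follows. The `[^{]*` prefix and the use of `index` and `substr` to start at "keyword" are illustrative choices, not the only way to do it.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "stuff\nkeyword morestuff { blah{\nblah{}\n}blah\n} test\nend\n";

# Start at "keyword" so nothing before it is considered.
my $from = index($text, 'keyword');
die "keyword not found\n" if $from < 0;
my $tail = substr($text, $from);

# Default prefix: $tail starts with "keyword", not with a left brace,
# so nothing is extracted.
my $default_input = $tail;
my ($none) = extract_bracketed($default_input, '{}');

# Third argument '[^{]*': skip everything up to the first left brace.
my $prefix_input = $tail;
my ($block, $rest, $skipped) = extract_bracketed($prefix_input, '{}', '[^{]*');
die "no balanced block found\n" unless defined $block;

# The skipped prefix plus the brace group is the section asked for in post #1.
my $section = $skipped . $block;
print $section, "\n";
```

In list context the function returns the extracted text, the remainder, and the skipped prefix, so the section from "keyword" through the matching brace can be rebuilt from the third and first return values.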
 
  

