LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 04-29-2009, 04:49 AM   #16
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454

Quote:
Originally Posted by raimizou View Post
Thanks all for your help. I now have a functional script (it works at least on the simple "foo bar" example that I posted before), but the performance of this script is quite bad (while loops...). Do you have any idea on how to improve the code pasted below ? Thanks again for your valuable help.

Gilles



Code:
#!/usr/bin/perl -w
use strict;


#-----------------#
#     PREAMBLE    #
#-----------------#


# Configuration variables
my $correctLaTeX = $ARGV[2];
my $verbose = 0; # 0 = false

# Print the value of the command line arguments
if ($verbose){
  my $numArgs = $#ARGV + 1;
  print "You provided $numArgs arguments\n";
  print "Input file is $ARGV[0]\n";
  print "Output file is $ARGV[1]\n";
  print "Modification file is $ARGV[2]\n\n";
}

#-----------------#
# MAIN OPERATIONS #
#-----------------#

# Open input file in read mode
open INPUTFILE, "<", $ARGV[0] or die $!;
# Open output file in write mode
open OUTPUTFILE, ">", $ARGV[1] or die $!;

#$modif = "s/ foo / bar /g";

# Read the input file line by line :
while (my $input_line = <INPUTFILE>) {
  # remove the end of line character
  # Open the list of corrections in read mode
  open CORRECTIONFILE, "<", $correctLaTeX or die $!;	
  # Read the modification file line by line :
  while (my $modif = <CORRECTIONFILE>){
    # Remove the comments	  
    $modif =~ s/#.*$// ;
    if ($modif =~ /^[ 	]*$/) {
      # Nothing to do (empty modification)
    } else {
      if ($verbose){
        print("$input_line") ;
      }
      my $counter = 0;
      # Apply the modification (up to twenty times)
      while ( eval("\$input_line =~ $modif") and $counter < 20 ){
	$counter += 1;
      };
      if ($verbose){
        print("$input_line") ;
      }
    }
  }
  # Write the modified line to the output file
  print OUTPUTFILE $input_line;   
  close CORRECTIONFILE;
}

# Close the input and output files
close INPUTFILE;
close OUTPUTFILE;
Why do you use 'eval'? A string 'eval' means compilation + execution every time it is called, and that is why your code is slow.

Reread the regular expressions tutorial - you do not need 'eval'.
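
A minimal sketch of what dropping 'eval' can look like, assuming the match part and the replacement part of a correction are kept in two separate variables. The names $match_regex, $replacement_string and apply_fix are made up for illustration; they are not taken from the script above.

```perl
use strict;
use warnings;

# Hypothetical correction: pattern and replacement stored separately
my $match_regex        = ' foo ';
my $replacement_string = ' bar ';

# Apply the correction to one line. No string eval is involved:
# both variables are simply interpolated into s///
sub apply_fix {
    my ($line) = @_;
    $line =~ s/$match_regex/$replacement_string/g;
    return $line;
}

print apply_fix("a foo b foo c\n");
```

This only covers plain substitutions; a correction file that holds complete Perl statements needs a different approach.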
 
Old 04-30-2009, 01:33 AM   #17
raimizou
LQ Newbie
 
Registered: Apr 2009
Posts: 8

Rep: Reputation: 0
use of eval

Thanks, Sergei, for your answer. I will reread the perldoc perlretut.

I have tried not to use eval, but did not manage to avoid it yet. I use eval in
Code:
eval("\$input_line =~ $modif")
because the regular expression is inside a variable
Code:
$modif
and is applied on another variable
Code:
$input_line
.
 
Old 04-30-2009, 04:00 AM   #18
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by raimizou View Post
Thanks, Sergei, for your answer. I will reread the perldoc perlretut.

I have tried not to use eval, but did not manage to avoid it yet. I use eval in
Code:
eval("\$input_line =~ $modif")
because the regular expression is inside a variable
Code:
$modif
and is applied on another variable
Code:
$input_line
.
But a regular expression can be written as

Code:
s/$match_regex/$replacement_string/
; pay attention to the 'e' switch in regular expressions.

Whenever possible, try to have regular expressions which are known at compile time, and use the 'o' switch for efficiency.
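
To illustrate the 'e' switch: with /e the replacement part of s/// is evaluated as Perl code, so a replacement that needs the captured groups can be kept in a subroutine reference instead of a string. This is only a sketch with a made-up correction (turning "key=value" into "value=key"); the names $replacement and swap_pairs are assumptions.

```perl
use strict;
use warnings;

my $match_regex = qr/(\w+)=(\w+)/;      # compiled once by qr//

# The replacement is code; the captures are passed in as arguments
my $replacement = sub {
    my ($key, $value) = @_;
    return "$value=$key";
};

sub swap_pairs {
    my ($line) = @_;
    # /e: the right-hand side is an expression, here a subroutine call
    $line =~ s/$match_regex/$replacement->($1, $2)/ge;
    return $line;
}

print swap_pairs("a=1 b=2\n");
```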
 
Old 05-04-2009, 03:52 AM   #19
raimizou
LQ Newbie
 
Registered: Apr 2009
Posts: 8

Rep: Reputation: 0
This time I'm stuck ;)

Thanks for your post Sergei.

I appreciate your solution with
Code:
s/$match_regex/$replacement_string/
Unfortunately, one of the requirements for my software is that the list of corrections that I apply must be reusable, and may include more complex modifications than just a single substitution. Thus, I would like to keep a formulation that does not parse the regexp.

I have reread the regexp tutorial, as well as some web resources about the pre-compilation of the regexp (http://alumnus.caltech.edu/~svhwan/p...gExpLoops.html, http://modperlbook.org/html/6-5-3-Co...pressions.html ...). As I mentioned before, I am a real newbie in Perl, so it was difficult for me to understand the subtleties of pre-compilation.

With my limited understanding of what I read, I guess that I could save a lot of time by compiling every regexp of my modification file a single time. In the current version, a loop reads every line of the text file that must be modified. Then, a second loop compiles and applies every regexp of the modification file to that line. So, for a text file of 1000 lines I could divide the regexp compilation time by a factor of 1000 with the /o switch. Is that correct?

What can I do for the substitution commands that reuse a captured pattern ($1, $2...); will they still work with the /o switch?

I really appreciate your help. Thanks a lot for sharing your knowledge.
 
Old 05-04-2009, 05:07 AM   #20
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by raimizou View Post
Thanks for your post Sergei.

I appreciate your solution with
Code:
s/$match_regex/$replacement_string/
Unfortunately, one of the requirements for my software is that the list of corrections that I apply must be reusable, and may include more complex modifications than just a single substitution. Thus, I would like to keep a formulation that does not parse the regexp.

I have reread the regexp tutorial, as well as some web resources about the pre-compilation of the regexp (http://alumnus.caltech.edu/~svhwan/p...gExpLoops.html, http://modperlbook.org/html/6-5-3-Co...pressions.html ...). As I mentioned before, I am a real newbie in Perl, so it was difficult for me to understand the subtleties of pre-compilation.

With my limited understanding of what I read, I guess that I could save a lot of time by compiling every regexp of my modification file a single time. In the current version, a loop reads every line of the text file that must be modified. Then, a second loop compiles and applies every regexp of the modification file to that line. So, for a text file of 1000 lines I could divide the regexp compilation time by a factor of 1000 with the /o switch. Is that correct?

What can I do for the substitution commands that reuse a captured pattern ($1, $2...); will they still work with the /o switch?

I really appreciate your help. Thanks a lot for sharing your knowledge.
I have read the http://modperlbook.org/html/6-5-3-Co...pressions.html page you've mentioned, and yes, the idea of generating a piece of code in which the regular expression is literal, i.e.

Code:
my $pattern = '^\d+$';
eval q{
    foreach (@list) {
        print if /$pattern/o;
    }
}
is a good one.

AFAIK the $1, $2 ...$N mechanism is orthogonal to 'o' switch, i.e. it should work IMO regardless of /o.
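
A small sketch of captures working with a pattern that is compiled only once. It uses qr// rather than /o, and the pattern text, the sample lines and the LaTeX-style replacement are all made up for illustration.

```perl
use strict;
use warnings;

# Pattern text as it might come from a file; compiled a single time
my $pattern  = '(\d+)\s*cm';
my $compiled = qr/$pattern/;

my @lines = ("width 10 cm", "height 3cm", "no units");

# The same compiled regex is reused for every line; $1 is set
# afresh by each successful match
my @fixed;
for my $line (@lines) {
    (my $copy = $line) =~ s/$compiled/$1~cm/g;
    push @fixed, $copy;
}

print "$_\n" for @fixed;
```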

...

Anyway, aren't you optimizing too early? On the one hand, I'm well aware of the 'o' switch; on the other, I rarely use it - even without it the scripts seem to be fast enough.

...

You probably can change your approach to something like

Code:
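my @regexes_and_replacements; # need to fill it: (regex, replacement, regex, replacement, ...)
...
for(;;)
  {
  last unless @regexes_and_replacements;

  my $regex = shift @regexes_and_replacements;
  my $replacement = shift @regexes_and_replacements;

  # compile the pattern once per pair; qr// is used here instead of /o,
  # because /o compiles the pattern only once for the whole run, so later
  # pairs could silently reuse the first pattern
  my $compiled_regex = qr/$regex/;

  my $match_and_replace_sub =
  sub
    {
    my ($line_scalar_ref) = @_;
    ${$line_scalar_ref} =~ s/$compiled_regex/$replacement/;
    };


  foreach my $line(@lines)
    {
    $match_and_replace_sub->(\$line);
    # do something with $line after replacement if any
    }
  }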
- the idea is that each line is processed a number of times, and each $regex, $replacement pair is compiled only once.

By the way, this is an example of closures - the $match_and_replace_sub subroutine inherits lexical variables from the outer scope.
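
If the correction file has to keep holding complete substitution statements such as "s/ foo / bar /g", one possible compromise is to run the string 'eval' once per correction instead of once per line per correction, and to keep the resulting subroutine. The sketch below assumes that format; the array @modifications stands in for the lines of the file, and correct_line is a made-up name. Since it still evaluates text from a file as Perl code, the file has to be trusted.

```perl
use strict;
use warnings;

# Stand-in for the lines of the corrections file (assumed content)
my @modifications = (
    's/ foo / bar /g   # a comment',
    '',
    's/\b(\d+)x(\d+)\b/$1 by $2/g',
);

# Compile each correction exactly once into a subroutine
my @compiled;
for my $modif (@modifications) {
    $modif =~ s/#.*$//;              # remove the comments, as in the script
    next if $modif =~ /^\s*$/;       # skip empty modifications
    my $sub = eval "sub { \$_[0] =~ $modif }";
    die "Cannot compile '$modif': $@" unless $sub;
    push @compiled, $sub;
}

# Apply every compiled correction to one line; no eval in this loop
sub correct_line {
    my ($line) = @_;
    $_->($line) for @compiled;
    return $line;
}

print correct_line("x foo y, 3x4\n");
```

For a 1000-line input this runs the string 'eval' once per correction rather than 1000 times, and $1, $2 keep working because each correction is still ordinary Perl code.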
 
Old 03-14-2010, 03:32 PM   #21
justaddwater71
LQ Newbie
 
Registered: Mar 2010
Distribution: Ubuntu 9.10
Posts: 1

Rep: Reputation: 0
macemoneta one-liner

macemoneta's one-liner solution just saved me a day of pain and suffering doing some machine learning. You rock.
 
Old 08-10-2010, 02:28 PM   #22
vinay.baranwal
LQ Newbie
 
Registered: Aug 2010
Posts: 3

Rep: Reputation: 0
Help Needed

Dear all,
I am pretty new to Perl, but love this language very much.

Perl one-liners are like a perfect piece of coding, but I am facing a problem and seek your guidance.

I have a file containing several entries like this: LOC_Os04g58220|13104.t05295

I want to replace the |13104.t05295 fragment with nothing. When I place this in the above-mentioned one-liner, it completes the job but somehow the | character remains. It is also worth mentioning that these numbers vary, so is there any way, with a one-liner, to remove this particular string by using wildcard characters?
I tried, but alas, it didn't work for me.

perl -i.tiny -pe "s/(|************)//ge" Main.txt_yes

Since I am using the Windows platform, " is needed instead of '.

A prompt response will be much appreciated.

Thanks in advance

Last edited by vinay.baranwal; 08-10-2010 at 02:30 PM.
 
Old 08-10-2010, 04:00 PM   #23
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723
It would be much better to start your own thread.

Quote:
Originally Posted by vinay.baranwal View Post
perl -i.tiny -pe "s/(|************)//ge" Main.txt_yes
Asterisk (*) does NOT mean "one of any character", it means "zero or more of the preceding character". Period (.) means "one of any character". Thus ".*" means "zero or more of any character".

So your regex should be "s/|.*//ge".

Quote:
Originally Posted by vinay.baranwal View Post
Since I am using the Windows platform
Since you are using the Windows platform you don't belong in this forum. But there is nothing wrong with you trying Linux, you might like it! It's free, very easy, and you can try it out without risking your Windows installation using a "Live CD", which basically boots a Linux desktop off your CD/DVD drive!

Last edited by MTK358; 08-10-2010 at 04:04 PM.
 
Old 08-10-2010, 06:25 PM   #24
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by MTK358 View Post
...
Since you are using the Windows platform you don't belong in this forum.
...
Wrong. This particular forum ("Programming") specifically allows programming questions regardless of OS.
 
Old 08-11-2010, 09:43 AM   #25
vinay.baranwal
LQ Newbie
 
Registered: Aug 2010
Posts: 3

Rep: Reputation: 0
Problem persists

Thanks, all...

But I am sorry to say that the problem still persists.

| is an operator symbol ("OR"), so when I tried using (|.*) it replaced the content of the whole file.

And I need to delete the | symbol from the file along with the numbers following it.

Would somebody help me out?

Thanks
 
Old 08-11-2010, 10:13 AM   #26
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723
Use "\|" for a literal "|" character.

You should really read a good regular expression tutorial.
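
A short sketch of the escaped version, using the sample entry from the question. The subroutine name strip_suffix is made up for illustration.

```perl
use strict;
use warnings;

# Remove everything from the first literal "|" to the end of the line
sub strip_suffix {
    my ($entry) = @_;
    $entry =~ s/\|.*//;    # "\|" is a literal pipe; a bare "|" means "or"
    return $entry;
}

print strip_suffix("LOC_Os04g58220|13104.t05295"), "\n";
```

Neither the 'g' nor the 'e' switch is needed here: ".*" already runs to the end of the line, and the replacement is just an empty string.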
 
Old 08-11-2010, 11:53 AM   #27
vinay.baranwal
LQ Newbie
 
Registered: Aug 2010
Posts: 3

Rep: Reputation: 0
Thanks

Yeah !!!!!! It did work.

Thanks buddy. Yeah, sure, I am going to read the tutorials...

Thanks once again
 
Old 08-11-2010, 10:50 PM   #28
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by vinay.baranwal View Post
Perl one-liners are like a perfect piece of coding, but I am facing a problem and seek your guidance.
One-liners are good when they are short and simple (to understand), but they become messy and hard to read if you make them extremely long.

Quote:
Code:
perl -i.tiny -pe "s/(|************)//ge" Main.txt_yes
Regular expressions are not always the way to go. With your requirement, you can just do string splitting. See perldoc -f split, and then get the first element. Or, with newer Perl, you can use the -F option.
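
A sketch of the splitting approach on the sample entry from the question. The subroutine name first_field is made up for illustration.

```perl
use strict;
use warnings;

# Keep only the part before the first "|"
sub first_field {
    my ($entry) = @_;
    # The separator is still a pattern, so the pipe is escaped here too;
    # the limit of 2 stops splitting after the first "|"
    my ($id) = split /\|/, $entry, 2;
    return $id;
}

print first_field("LOC_Os04g58220|13104.t05295"), "\n";
```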
 
  

