Thanks all for your help. I now have a functional script (it works at least on the simple "foo bar" example that I posted before), but the performance of this script is quite bad (while loops...). Do you have any ideas on how to improve the code pasted below? Thanks again for your valuable help.
Gilles
Code:
#!/usr/bin/perl -w
use strict;
#-----------------#
# PREAMBLE #
#-----------------#
# Configuration variables
my $correctLaTeX = $ARGV[2];
my $verbose = 0; # 0 = false
# Print the value of the command line arguments
if ($verbose){
    my $numArgs = $#ARGV + 1;
    print "You provided $numArgs arguments\n";
    print "Input file is $ARGV[0]\n";
    print "Output file is $ARGV[1]\n";
    print "Modification file is $ARGV[2]\n\n";
}
#-----------------#
# MAIN OPERATIONS #
#-----------------#
# Open input file in read mode
open INPUTFILE, "<", $ARGV[0] or die $!;
# Open output file in write mode
open OUTPUTFILE, ">", $ARGV[1] or die $!;
#$modif = "s/ foo / bar /g";
# Read the input file line by line :
while (my $input_line = <INPUTFILE>) {
    # Open the list of corrections in read mode
    open CORRECTIONFILE, "<", $correctLaTeX or die $!;
    # Read the modification file line by line :
    while (my $modif = <CORRECTIONFILE>){
        # Remove the comments
        $modif =~ s/#.*$// ;
        if ($modif =~ /^[ ]*$/) {
            # Nothing to do (empty modification)
        } else {
            if ($verbose){
                print("$input_line") ;
            }
            my $counter = 0;
            # Apply the modification (up to twenty times)
            while ( eval("\$input_line =~ $modif") and $counter < 20 ){
                $counter += 1;
            }
            if ($verbose){
                print("$input_line") ;
            }
        }
    }
    # Write the modified line to the output file
    print OUTPUTFILE $input_line;
    close CORRECTIONFILE;
}
# Close the input and output files
close INPUTFILE;
close OUTPUTFILE;
Why do you use 'eval'? 'eval' is compilation + linking + execution, and that is why your code is slow.
Reread a regular expressions tutorial - you do not need 'eval'.
Unfortunately, one of the requirements for my software is that the list of corrections that I apply must be reusable, and may include more complex modifications than just a single substitution. Thus, I would like to keep a formulation that does not parse the regexp.
With my limited understanding of what I read, I guess that I could save a lot of time by compiling every regexp of my modification file only once. In the current version, a loop reads every line of the text file that must be modified. Then, a second loop compiles and applies every regexp of the modification file to that line. So, for a text file of 1000 lines, I could divide the regexp compilation time by a factor of 1000 with the /o switch. Is that correct?
What can I do for the substitution commands that reuse a captured pattern ($1, $2...)? Will they still work with the /o switch?
I really appreciate your help. Thanks a lot for sharing your knowledge.
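The "compile each modification only once" idea can be sketched as follows. This is a minimal sketch, not code from the thread: it assumes every non-empty line of the modification file is a complete Perl substitution such as s/ foo / bar /g, and the names compile_modifications and apply_modifications are hypothetical. Each modification goes through a single string eval that builds an anonymous subroutine, so the eval cost is paid once per modification rather than once per input line, and $1, $2... keep working because the substitution is still ordinary Perl code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn each modification string into a compiled sub, once.
sub compile_modifications {
    my @modif_lines = @_;
    my @subs;
    for my $modif (@modif_lines) {
        $modif =~ s/#.*$//;           # remove comments, as in the script above
        next if $modif =~ /^\s*$/;    # skip empty modifications
        # One string eval per modification: the sub edits $_[0] in place
        # and returns the number of substitutions it made.
        my $sub = eval "sub { \$_[0] =~ $modif }";
        die "Cannot compile modification: $modif ($@)" if $@;
        push @subs, $sub;
    }
    return @subs;
}

# Hypothetical helper: apply every compiled modification to one line,
# repeating each one while it still matches (at most 20 passes).
sub apply_modifications {
    my ($line, @subs) = @_;
    for my $sub (@subs) {
        my $counter = 0;
        $counter++ while $counter < 20 and $sub->($line);
    }
    return $line;
}

my @subs = compile_modifications(
    's/ foo / bar /g   # simple replacement',
    's/<(\w+)>/[$1]/g  # replacement that reuses a capture',
);
print apply_modifications("a foo b <tag>\n", @subs);   # prints "a bar b [tag]"
```

In the real script, compile_modifications would be fed the lines of the modification file before the loop over the input file starts, so that file is read only once.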
my $pattern = '^\d+$';
eval q{
    foreach (@list) {
        print if /$pattern/o;
    }
}
is a good one.
AFAIK the $1, $2 ... $N mechanism is orthogonal to the 'o' switch, i.e. it should work IMO regardless of /o.
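A quick way to check this is a small experiment. The sample data below is made up for illustration; the pattern is held in a variable and used with /o, and the captures are still filled in for each line.

```perl
use strict;
use warnings;

my $pattern = '(\w+)=(\d+)';
my @out;
for my $line ('width=10', 'height=20') {
    # Copy the line, then substitute in the copy; $1 and $2 are set per match.
    (my $copy = $line) =~ s/$pattern/$2 is the value of $1/o;
    push @out, $copy;
}
print "$_\n" for @out;
# prints:
# 10 is the value of width
# 20 is the value of height
```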
...
Anyway, aren't you optimizing too early? On the one hand, I'm well aware of the 'o' switch; on the other, I rarely use it. Even without it, scripts seem to be fast enough.
...
You probably can change your approach to something like
Code:
my @regexes_and_replacements; # need to fill it: (regex, replacement, regex, replacement, ...)
...
for(;;)
{
    last unless @regexes_and_replacements;
    my $regex = shift @regexes_and_replacements;
    my $replacement = shift @regexes_and_replacements;
    # compiled once for each new $regex; qr// is used rather than /o,
    # because /o compiles once per op, not once per closure
    my $compiled_regex = qr/$regex/;
    my $match_and_replace_sub =
    sub
    {
        my ($line_scalar_ref) = @_;
        ${$line_scalar_ref} =~ s/$compiled_regex/$replacement/;
    };
    foreach my $line (@lines)
    {
        $match_and_replace_sub->(\$line);
        # do something with $line after replacement if any
    }
}
The idea is that each line is processed once for every ($regex, $replacement) pair, and each pair is compiled only once for the whole pass over the lines.
By the way, this is an example of a closure: the $match_and_replace_sub subroutine captures lexical variables from the outer scope.
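The closure idea can also be isolated in a small factory function. This is only a sketch: make_replacer and the sample lines are hypothetical, and it uses qr// to precompile the pattern, which is a swapped-in technique rather than the /o switch discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that remembers its own
# compiled pattern and replacement string.
sub make_replacer {
    my ($pattern, $replacement) = @_;
    my $compiled = qr/$pattern/;    # compiled once, when the closure is built
    return sub {
        my ($line_ref) = @_;
        ${$line_ref} =~ s/$compiled/$replacement/g;
    };
}

my @replacers = (make_replacer('foo', 'bar'), make_replacer('\s+$', ''));
my @lines = ("foo and foo  ", "nothing here");
for my $replacer (@replacers) {
    $replacer->(\$_) for @lines;    # each closure edits the lines in place
}
print "[$_]\n" for @lines;
# prints:
# [bar and bar]
# [nothing here]
```

Note that a replacement held in a plain string variable is inserted literally, so a $1 inside it is not expanded as a capture.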
Dear all,
I am pretty new to Perl, but love this language very much.
Perl one-liners are like a perfect piece of coding, but I am facing a problem and seek your guidance.
I have a file containing several entries like this: LOC_Os04g58220|13104.t05295
I want to replace the |13104.t05295 fragment with nothing. When I place this in the above-mentioned one-liner, it completes the job but somehow the | character remains. One more thing worth mentioning here is that these numbers vary, so is there any way with one-liners to remove this particular string by using wildcard characters?
I had tried, but alas! it didn't work for me.
Asterisk (*) does NOT mean "one of any character", it means "zero or more of the preceding character". Period (.) means "one of any character". Thus ".*" means "zero or more of any character".
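So your regex should be "s/\|.*//". The pipe has to be escaped as \| because a bare | means alternation in a regex, and the "e" flag is not needed since the replacement is empty.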
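A pattern that removes everything from the pipe onward can be checked with a few lines of Perl. This is a minimal sketch: the second entry is made up for illustration, and the pipe is written as \| so that it is treated as a literal character rather than as alternation.

```perl
use strict;
use warnings;

my @entries = ('LOC_Os04g58220|13104.t05295', 'LOC_Os01g01010|12001.t00001');
my @ids;
for my $entry (@entries) {
    # \| is a literal pipe; .* is everything after it on the line
    (my $id = $entry) =~ s/\|.*//;
    push @ids, $id;
}
print "$_\n" for @ids;
# prints:
# LOC_Os04g58220
# LOC_Os01g01010
```

As a one-liner this would look something like perl -pe "s/\|.*//" input.txt > output.txt, where input.txt and output.txt are placeholder file names.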
Quote:
Originally Posted by vinay.baranwal
Since i am using windows platform
Since you are using the Windows platform you don't belong in this forum. But there is nothing wrong with you trying Linux, you might like it! It's free, very easy, and you can try it out without risking your Windows installation using a "Live CD", which basically boots a Linux desktop off your CD/DVD drive!
Regular expressions are not always the way to go. With your requirement, you can just split the string: see perldoc -f split, then take the first element. Or, with newer Perl, you can use the -F option.
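The split approach might look like the sketch below. It uses the entry from the question; the separator passed to split is still a pattern, so the pipe is escaped there as well.

```perl
use strict;
use warnings;

my $entry = 'LOC_Os04g58220|13104.t05295';
# Split on the literal pipe and keep only the first field.
my ($id) = split /\|/, $entry;
print "$id\n";   # prints LOC_Os04g58220
```

On the command line, the same idea can be tried with autosplit, for example perl -F"\|" -lane "print $F[0]" input.txt, where input.txt is a placeholder file name and the quoting may need adjusting for your shell.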