Need a more efficient script for PDF417 parsing

Sergei Steshenko · 04-21-2010, 10:36 AM

Quote:

Originally Posted by suse_nerd

Yeah, but I was hoping to be able to take input directly from the barcode scanner, not a file.

What is "directly" ?

And, anyway, you need to debug the script, so repeatedly entering data manually doesn't help.

suse_nerd · 04-21-2010, 12:35 PM

Quote:

Originally Posted by Sergei Steshenko

What is "directly" ?

And, anyway, you need to debug the script, so repeatedly entering data manually doesn't help.

What do you mean by debug the script?

I mean I press the scan button on the barcode scanner and the input comes to STDIN. It works, but only parses the first line of the file:

Code:

$ ./out2.pl < 19_1.txt
Scan barcode and press enter
Got carriage return
Parsing the following input:
 $CENTAUR30298309                            000130287018
     000130318905                            000130295355
     000130295344                            000130295333
     000130209138                            000130210705
     000130217293                            000130273352
     000130292823                            000130292834
     000130293065                            000130293076
     000130293087                            000130293000
     000130293010                            000130293021
     000130292415                            000130292426
     000130292947                            000130292958
     0001$
Wrote a line to data.txt: 30298309,                    ,        ,0001

My script:

Code:

#!/usr/bin/perl
use strict;
use warnings;

print "Scan barcode and press enter\n";
while(defined(my $line = <STDIN>))
{
print "Got carriage return\n";

print "Parsing the following input:\n $line";
#Prompt for $myfile
#While not_end_of_line loop
if($line =~ m/^.{8}(.{8})(.{20})(.{8})(.{4})/)

  {
  open (MYFILE, '>>data.txt') or die "Cannot open data.txt: $!";
  print MYFILE "$1,$2,$3,$4\n";
  
  print "\nWrote a line to data.txt: ";
  print "$1,$2,$3,$4\n";
  }
#End not_end_of_line loop
  }

  close (MYFILE);

Obviously I need another loop in there that keeps repeating the operation until we get to the end of the line.
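One way to add that loop is sketched below. It is a minimal sketch, not a definitive fix, and it assumes the layout given later in this thread: an 8-character "$CENTAUR" header, records of 8+20+8+4 characters, and a "$" footer. The sub name `parse_records` and the sample line are hypothetical. Instead of one `if` match, a `while` loop with `\G` and the `/g` flag resumes each match where the previous one ended:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every record in one scanned line as
# [code, batch, expiry, quantity], assuming the layout from this thread
# (8-character "$CENTAUR" header, 8/20/8/4 records, "$" footer).
sub parse_records {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;    # drop the line ending

    my @records;
    pos($line) = 8;            # start matching just after the header
    # \G anchors each match where the previous one ended, so the loop
    # walks the line one 40-character record at a time and stops when
    # fewer than 40 characters (e.g. only the "$" footer) remain.
    while ($line =~ /\G(.{8})(.{20})(.{8})(.{4})/gc) {
        push @records, [ $1, $2, $3, $4 ];
    }
    return @records;
}

# Hypothetical two-record scan built from the spec, not real data.
my $sample = '$CENTAUR' . '30292426' . (' ' x 28) . '0001'
           . '48347498' . (' ' x 28) . '0001' . '$';

for my $rec (parse_records($sample)) {
    print join(',', @$rec), "\n";
}
```

In the script above, the `if` inside the `while (<STDIN>)` loop could become a `for` loop over `parse_records($line)` that prints each record to data.txt.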

Sergei Steshenko · 04-21-2010, 12:51 PM

Quote:

Originally Posted by suse_nerd

What do you mean by debug the script?

I mean I press the scan button on the barcode scanner and the input comes to STDIN. It works, but only parses the first line of the file:

...

Is the input data you are showing a single line or a multi-line file ?

suse_nerd · 04-21-2010, 01:16 PM

Quote:

Originally Posted by Sergei Steshenko

Is the input data you are showing a single line or a multi-line file ?

The data is on a single line, and so is the header. I have updated the original post to make it clearer.

Sergei Steshenko · 04-21-2010, 01:26 PM

Quote:

Originally Posted by suse_nerd

The data is on a single line, and so is the header. I have updated the original post to make it clearer.

So, is the 'if' statement executed ?

What happens if you put

Code:

warn "CHECKPOINT in 'if'";

as the first statement of the 'if' body ?

suse_nerd · 04-21-2010, 01:36 PM

Quote:

Originally Posted by Sergei Steshenko

So, is the 'if' statement executed ?

What happens if you put

Code:

warn "CHECKPOINT in 'if'";

as the first statement of the 'if' body ?

I believe it is: it outputs one "line" of data successfully in CSV format but does not continue any further. It does not parse the next part of the line.

Code:

$ ./out2.pl < 19_1.txt
Scan barcode and press enter
Got carriage return
Parsing the following input:
CHECKPOINT in 'if' at ./out2.pl line 16, <STDIN> line 1.
 $CENTAUR30298309                            000130287018
     000130318905                            000130295355
     000130295344                            000130295333
     000130209138                            000130210705
     000130217293                            000130273352
     000130292823                            000130292834
     000130293065                            000130293076
     000130293087                            000130293000
     000130293010                            000130293021
     000130292415                            000130292426
     000130292947                            000130292958
     0001$
Wrote a line to data.txt: 30298309,                    ,        ,0001

Sergei Steshenko · 04-21-2010, 01:59 PM

Quote:

Originally Posted by suse_nerd

I believe it is: it outputs one "line" of data successfully in CSV format but does not continue any further. It does not parse the next part of the line.

...

So, why not increase the number of groups, like this:

Code:

(.{8})(.{20})(.{8})(.{4})

in the 'if' statement, and correspondingly print the newly added $N variables to the output file?

suse_nerd · 04-21-2010, 03:06 PM

Quote:

Originally Posted by Sergei Steshenko

So, why not increase the number of groups, like this:

Code:

(.{8})(.{20})(.{8})(.{4})

in the 'if' statement, and correspondingly print the newly added $N variables to the output file?

Sorry I'm not really sure what you mean by this.

Can you give an example?

The spec is like this:
#<code_length_8><batch_length_20><expiry_length_8><quantity_length_4>

So do you mean I should split at the 56th, 76th, 84th character, etc., incrementing by a pattern of 8, 20, 8, then 4 for each iteration?

Is there not a cleaner way of doing this?
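For fixed-width data, Perl's `unpack` is a common alternative to a long regex. The sketch below is one possible approach, not the thread's answer; it assumes the header and footer have already been stripped, and the sub name and sample payload are hypothetical. The template group `(a8 a20 a8 a4)` describes one record and `*` repeats it until the input runs out:

```perl
use strict;
use warnings;

# Hypothetical helper: splits the record area (header and footer already
# removed) with unpack(). In the template, the group "(a8 a20 a8 a4)" is
# one record and "*" repeats it until the input runs out, so no character
# positions need to be computed by hand.
sub unpack_records {
    my ($payload) = @_;
    my @fields = unpack '(a8 a20 a8 a4)*', $payload;
    my @records;
    # Take the flat field list four fields (one record) at a time.
    push @records, [ splice @fields, 0, 4 ] while @fields;
    return @records;
}

# Hypothetical two-record payload built from the spec.
my $payload = '30292426' . (' ' x 28) . '0001'
            . '48347498' . (' ' x 28) . '0001';

print join(',', @$_), "\n" for unpack_records($payload);
```

The `a` format keeps the padding spaces of empty fields exactly as scanned; `A` would strip trailing spaces instead, which may suit the CSV output better.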

Sergei Steshenko · 04-21-2010, 03:12 PM

Quote:

Originally Posted by suse_nerd

Sorry I'm not really sure what you mean by this.

Can you give an example?

The spec is like this:
#<code_length_8><batch_length_20><expiry_length_8><quantity_length_4>

So do you mean I should split at the 56th, 76th, 84th character, etc., incrementing by a pattern of 8, 20, 8, then 4 for each iteration?

Is there not a cleaner way of doing this?

Who else but you knows the character positions? Is there any information in the input stream that allows fields to be recognized other than by their positions?

Sergei Steshenko · 04-21-2010, 03:15 PM

By the way, the thread title doesn't appear to be relevant: I think you are dealing with a particular file-format parsing issue, not specifically with PDF417.

suse_nerd · 04-21-2010, 03:29 PM

Quote:

Originally Posted by Sergei Steshenko

By the way, the thread title doesn't appear to be relevant: I think you are dealing with a particular file-format parsing issue, not specifically with PDF417.

Well, it is a PDF417 barcode that needs to be parsed; that is, the data you would get from a barcode which conforms to the PDF417 standard.

No, there is nothing other than the positions in the data. Nothing separates the individual fields except their lengths.
So the header is
$CENTAUR = 8 characters
The first item of data starts at the 9th character
The first record ends at the 8+8+20+8+4 = 48th character

The second record would look like this with regard to character numbers:
# 49 50 51 52 53 54 55 56
# 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
# 77 78 79 80 81 82 83 84
# 85 86 87 88
So split after characters 56, 76, 84 and 88.
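The position arithmetic above generalizes to any record number, so the split points never have to be listed by hand. This is a minimal sketch assuming the 8-character header and 8/20/8/4 widths from this post; the sub names are hypothetical. `field_ends` reproduces the 1-based positions worked out above, and `record_fields` cuts the same fields with `substr` (0-based offsets):

```perl
use strict;
use warnings;

# The position arithmetic above, generalised. Assumes an 8-character
# header and the 8/20/8/4 field widths from the spec in this thread;
# the sub names are hypothetical.
my @widths     = (8, 20, 8, 4);
my $header_len = 8;
my $record_len = 0;
$record_len += $_ for @widths;    # 40

# 1-based character positions at which each field of record $n ends.
sub field_ends {
    my ($n) = @_;
    my $pos = $header_len + $record_len * ($n - 1);
    my @ends;
    for my $w (@widths) {
        $pos += $w;
        push @ends, $pos;
    }
    return @ends;
}

# The fields of record $n, cut out with substr() (0-based offsets).
sub record_fields {
    my ($line, $n) = @_;
    my $pos = $header_len + $record_len * ($n - 1);
    my @fields;
    for my $w (@widths) {
        push @fields, substr($line, $pos, $w);
        $pos += $w;
    }
    return @fields;
}

print join(', ', field_ends(1)), "\n";    # 16, 36, 44, 48
print join(', ', field_ends(2)), "\n";    # 56, 76, 84, 88
```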

#Input spec PDF417 label
#Header: $CENTAUR
#Record format:
#<code_length_8><batch_length_20><expiry_length_8><quantity_length_4>
#Footer: $
#Fields contain white space if empty
#Max 22 records
#No spaces between fields if values present
#Example complete barcode input

Code:

 #$CENTAUR30292426                            0001$
 #$CENTAUR48347498                            0001483934739                            0001$
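The spec above can also be checked up front, before any fields are extracted. The sketch below is hypothetical and takes its rules only from that spec (header "$CENTAUR", 1 to 22 records of 40 characters, footer "$"); it rejects anything else instead of silently writing partial data:

```perl
use strict;
use warnings;

# Hypothetical validating parser for the spec above: "$CENTAUR" header,
# 1 to 22 records of 40 characters (8+20+8+4), then the "$" footer.
# Dies if the scan does not fit that layout.
sub parse_barcode {
    my ($scan) = @_;
    $scan =~ s/[\r\n]+\z//;

    # \$ is a literal dollar sign; \A and \z anchor the whole string.
    $scan =~ /\A\$CENTAUR((?:.{40}){1,22})\$\z/s
        or die "Scan does not match the \$CENTAUR...\$ layout\n";
    my $payload = $1;

    my @records;
    while ($payload =~ /(.{8})(.{20})(.{8})(.{4})/gs) {
        push @records,
            { code => $1, batch => $2, expiry => $3, quantity => $4 };
    }
    return @records;
}

my $scan = '$CENTAUR30292426' . (' ' x 28) . '0001$';    # hypothetical
my @recs = parse_barcode($scan);
printf "%d record(s), first code %s\n", scalar @recs, $recs[0]{code};
```

A regex quantifier such as `{1,22}` on a capturing group only keeps the last repetition, which is why the sketch captures the whole record area once and then loops over it.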

Sergei Steshenko · 04-21-2010, 03:35 PM

Quote:

Originally Posted by suse_nerd

...
No there is nothing else other than the positions in the data.
...

Then why do you ask? The only (lame) optimization available is not having to type '.', '{', '(', ')', '}' by hand. Yes, this can be done trivially: the regular expression can be taken from a variable, and the variable's contents can be generated in a loop, with each number wrapped in '(', '.', '{', '}', ')'.

But you will have to fill the array of numbers (the field widths) manually.
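The idea of generating the pattern in a loop could look roughly like this. It is a sketch under assumptions: `@widths` holds the 8/20/8/4 field widths filled in by hand, and `$records` is a hypothetical fixed record count chosen for the example:

```perl
use strict;
use warnings;

# Keep the field widths in an array and build the pattern string in a
# loop, wrapping each width as "(.{N})", instead of typing the groups by
# hand. @widths still has to be filled in manually.
my @widths  = (8, 20, 8, 4);    # one record: code, batch, expiry, quantity
my $records = 2;                # hypothetical count for this example

my $pattern = '^.{8}';          # skip the 8-character "$CENTAUR" header
for (1 .. $records) {
    for my $w (@widths) {
        $pattern .= "(.{$w})";
    }
}
my $re = qr/$pattern/;

my $line = '$CENTAUR' . '30292426' . (' ' x 28) . '0001'
         . '48347498' . (' ' x 28) . '0001' . '$';
if (my @captures = $line =~ $re) {
    # @captures holds what $1, $2, ... would contain, in order.
    while (my @rec = splice @captures, 0, 4) {
        print join(',', @rec), "\n";
    }
}
```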

suse_nerd · 04-21-2010, 03:43 PM

Something like this works, but it is messy as hell:

Code:

#!/usr/bin/perl
use strict;
use warnings;

print "Scan barcode and press enter\n";
while(defined(my $line = <STDIN>))
{
print "Got carriage return\n";

print "Parsing the following input:\n $line";
#Prompt for $myfile


if($line =~ m/^.{8}(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})(.{8})(.{20})(.{8})(.{4})/)

  {
  warn "CHECKPOINT in 'if'";
  open (MYFILE, '>>data.txt') or die "Cannot open data.txt: $!";
  print MYFILE "$1,$2,$3,$4\n";
  print MYFILE "$5,$6,$7,$8\n";
  print MYFILE "$9,$10,$11,$12\n";
  print MYFILE "$13,$14,$15,$16\n";
  print MYFILE "$17,$18,$19,$20\n";
  print MYFILE "$21,$22,$23,$24\n";
  print MYFILE "$25,$26,$27,$28\n";
  print MYFILE "$29,$30,$31,$32\n";
  print MYFILE "$33,$34,$35,$36\n";
  print MYFILE "$37,$38,$39,$40\n";
  print MYFILE "$41,$42,$43,$44\n";
  print MYFILE "$45,$46,$47,$48\n";
  print MYFILE "$49,$50,$51,$52\n";
  print MYFILE "$53,$54,$55,$56\n";
  print MYFILE "$57,$58,$59,$60\n";
  print MYFILE "$61,$62,$63,$64\n";
  print MYFILE "$65,$66,$67,$68\n";
  print MYFILE "$69,$70,$71,$72\n";
  print MYFILE "$73,$74,$75,$76\n";
  print MYFILE "$77,$78,$79,$80\n";
  }
  }

  close (MYFILE);

But this doesn't look for the "$" footer at the end of the data, and because every group in the pattern must be present, it won't match data with fewer records than the pattern spells out (the spec allows a maximum of 22).

Sergei Steshenko · 04-21-2010, 03:49 PM

Quote:

Originally Posted by suse_nerd

Something like this is working, but is messy as hell

...

But this doesnt look for the end of the file "$" so it could throw up errors for data with fewer records than the spec max of 22

Rather than $N you can have normal names. Start from

perldoc perlretut

and look there for "named capture".

And extend your regular expression to match '$' at the end. It's trivial, and the document I'm suggesting you read contains the needed info.
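Both suggestions together might look like the sketch below. Named captures require Perl 5.10 or later; the field names, the sub `parse_named` and the sample line are hypothetical choices for illustration. Fields are read from the `%+` hash by name instead of `$1..$4`, and `\$\z` requires the literal "$" footer at the very end:

```perl
use strict;
use warnings;

# One record with named fields (Perl 5.10+); names are hypothetical.
my $record = qr/(?<code>.{8})(?<batch>.{20})(?<expiry>.{8})(?<quantity>.{4})/;

sub parse_named {
    my ($scan) = @_;
    $scan =~ s/[\r\n]+\z//;
    # Header, one or more whole records, then a literal "$" at the very end.
    $scan =~ /\A\$CENTAUR((?:.{40})+)\$\z/ or return;
    my $payload = $1;
    my @rows;
    while ($payload =~ /$record/g) {
        # %+ maps each capture name to what it matched.
        push @rows, join(',', map { $+{$_} } qw(code batch expiry quantity));
    }
    return @rows;
}

my $sample = '$CENTAUR' . '30292426' . (' ' x 28) . '0001'
           . '48347498' . (' ' x 28) . '0001' . '$';
print "$_\n" for parse_named($sample);
```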

suse_nerd · 04-21-2010, 04:48 PM

Quote:

Originally Posted by Sergei Steshenko

Rather than $N you can have normal names. Start from

perldoc perlretut

and look there for "named capture".

And extend your regular expression to match '$' at the end. It's trivial, and the document I'm suggesting you read contains the needed info.

Care to give me a little more guidance? I am not that proficient at regex or Perl.
I gather that I need to do something like this:

Code:

my $header = '^.{8}(.{8})(.{20})(.{8})(.{4})';
my $body   = '(.{8})(.{20})(.{8})(.{4})';
my $foot   = '{2}';

But I am not sure how to put this into a loop. I also still don't really know what you mean by $N. Do you mean the Nth ($N) character I split at?
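In Perl, `$1`, `$2`, `$3`, ... are the numbered capture variables: `$N` holds whatever the Nth pair of parentheses in the pattern matched, counted from the left, not a character position. The sketch below (the record count `$n` and the sample line are hypothetical) joins header, repeated body and footer with the string repetition operator `x`, then reads the captures either as `$1`, `$5` or as one list:

```perl
use strict;
use warnings;

my $header = '^\$CENTAUR';                   # literal "$CENTAUR" at the start
my $body   = '(.{8})(.{20})(.{8})(.{4})';    # one record = 4 captures
my $footer = '\$$';                          # literal "$", then end of line

my $n = 2;    # hypothetical: number of records expected in this scan
my $pattern = $header . ($body x $n) . $footer;
my $re = qr/$pattern/;

my $line = '$CENTAUR' . '30292426' . (' ' x 28) . '0001'
         . '48347498' . (' ' x 28) . '0001' . '$';
if ($line =~ $re) {
    # Record 1's code is the 1st capture, record 2's code the 5th.
    print "code of record 1 is \$1: $1\n";
    print "code of record 2 is \$5: $5\n";
}

# In list context the match returns ($1, $2, ..., $8) directly.
my @all = $line =~ $re;
print scalar(@all), " captures\n";
```

Because `$n` is fixed, a scan with a different number of records will not match this pattern; looping over the record area one 40-character record at a time avoids that.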

How do you detect the footer at the end? Look ahead by one character and see if it is a $?
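No look-ahead is needed to find a footer at the end of the string. A minimal sketch, with hypothetical sub names: strip the line ending, then check that the last character is "$", either with `substr` or with the regex `\$\z`. Assuming the 8-character header and 40-character records from the spec, the record count then follows from the length:

```perl
use strict;
use warnings;

sub strip_eol { my ($s) = @_; $s =~ s/[\r\n]+\z//; return $s; }

# True if the last character (after removing the line ending) is "$".
sub has_footer {
    my $scan = strip_eol(shift);
    return length($scan) > 0 && substr($scan, -1) eq '$';
}

# Same test as a regex: an escaped "$" immediately before end-of-string.
sub has_footer_re {
    my $scan = strip_eol(shift);
    return $scan =~ /\$\z/;
}

# With an 8-character header and a 1-character footer, the record count
# follows from the length; returns undef if the length does not fit.
sub record_count {
    my $scan = strip_eol(shift);
    return unless has_footer($scan);
    my $payload_len = length($scan) - 8 - 1;
    return if $payload_len < 40 || $payload_len % 40;
    return $payload_len / 40;
}

my $scan = '$CENTAUR' . '30292426' . (' ' x 28) . '0001' . "\$\n";
printf "footer: %s, records: %d\n",
    has_footer($scan) ? 'yes' : 'no', record_count($scan);
```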