[SOLVED] Get strings distributed along up to 3 lines
I see some issues in the current process, namely that the data is not laid out consistently at all points. What I mean is, if we look at just the first line in the first 2 sets of output data:
1. The '9' at the end of that string means that for some reason we are now capturing 11 digits instead of 10 as in the previous sets ... so my question is, how do we know when it should be 10 or 11? (Could it be more?)
2. The second shows that in all previous sets we are ignoring 'f', but now we are using it to return values. This one is less of a concern: my thought here would be to process everything prior to 940e and then process this part separately, as it appears to follow a different set of rules.
Currently my idea looks something like the code below, but I need more information to get a better picture:
Code:
#!/usr/bin/env ruby
BEGIN{ $/="ff77" }
File.open(ARGV[0])
while gets
  $_.gsub!(/\n/,"")
  $_.split($/).each{
    |x|
    next unless x =~ /^(.{6,18})(532064[^f]*).(814[^f]*)/
    printf("%d %s %s\n",$1,$2,$3)
    if x =~ /05(9.{32,34}.*?)(940e.{28})/
      $1.scan(/9.*?(?=9\d|$)/).each{
        |y|
        puts "|" + y + "|"
        printf("PROD_%c ",y[1])
        s = y[4..-1].gsub(/f/,"")
        s =~ /(.{2})(.{2})(.{2})(.{6})(.{10})(.*)/
        puts "\t" + $1 + "|" + $2 + "|" + $3 + "|" + $4 + "|" + $5 + "|" + $6
      }
    else
      puts
    end
  }
end
The above has errors and is not complete, but it may give you an idea of where my questions are heading.
I haven't answered you before because, after seeing your code in post #32, I was racking my brain trying
to use the "scan" function you used to print the separated values in decimal format. I came up with this test code,
but it is obviously missing something in the print. After that you sent the other code in post #33.
This is what I was trying before you sent your last code.
Code:
pat="05910f01020000000d8147451807ffffff009310010c0000000d8147451805ffffff0101960f010c0000000d81474518559fffff00940e01020102010001ffffff02010201"
pat.scan(/(9\d)(..)(..)(..)(..)(.{6})(\d{1,16})(f{1,16})([0-1]{2})([0-1]{0,2})/).each{
  |y|
  for i in y
    print $1,"|",$2.hex,"|",$3.hex,"|",$4.hex,"|",$5.hex,"|",$6.hex,"|",$7,"|",$9.hex,"|",$10.hex
  end
}
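A likely reason the print misbehaves: `scan` without a block returns an array of capture arrays, and by the time `.each` runs, `$1`..`$10` refer only to the last match, so every iteration prints the same values; the `for i in y` loop also repeats the print once per capture. A minimal sketch that reads the fields from the block parameters instead (the `FIELDS` constant and `decode` method are illustrative names, not from the thread):

```ruby
# Field layout taken from the regex in the post above.
FIELDS = /(9\d)(..)(..)(..)(..)(.{6})(\d{1,16})(f{1,16})([0-1]{2})([0-1]{0,2})/

# Returns one "|"-joined line per matched record, with the hex fields
# converted to decimal. The padding group (f's) is skipped, as in the post.
def decode(pat)
  pat.scan(FIELDS).map do |tag, len, a, b, c, d, digits, _padding, e, f|
    [tag, len.hex, a.hex, b.hex, c.hex, d.hex, digits, e.hex, f.hex].join("|")
  end
end

pat = "05910f01020000000d8147451807ffffff009310010c0000000d8147451805ffffff0101960f010c0000000d81474518559fffff00940e01020102010001ffffff02010201"
puts decode(pat)
```

With this test string the regex matches the 91, 93 and 96 records; the 940e record does not fit the `([0-1]{2})` group after its f's, so it is not matched.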
Quote:
Originally Posted by grail
1. The '9' at the end of that string means that for some reason we are now capturing 11 digits instead of 10 as per the previous sets ... so my question is, how do we know when it should be 10 or 11? (could it be more?)
Yes, the fields that begin with 532064.. and 814.. are 16 characters long: a variable number of digits followed by padding f's.
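If every field really is a fixed 16 characters, the 10-versus-11-digit question goes away: read all 16 characters and strip the trailing f's. A minimal sketch under that assumption; the padded inputs below are hypothetical, built from digits that appear in the output later in the thread:

```ruby
FIELD_WIDTH = 16

# Drops the trailing 'f' padding from one fixed-width field.
def unpad(field)
  unless field.length == FIELD_WIDTH
    raise ArgumentError, "expected #{FIELD_WIDTH} characters, got #{field.length}"
  end
  field.sub(/f+\z/, "")
end

puts unpad("532064022272619f") # hypothetical padded field
puts unpad("81422060001fffff") # hypothetical padded field
```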
So,
1- Each substring begins with 91, 93, 94 or 96 (there could be more, like 92, 95, 97, etc.)
2- The 2 characters after 91, 93, 94, etc. give the length, in bytes, of the data that follows. So,
for 91 the next byte is "0f" = 15, and after the "0f" there are 15 bytes (30 characters)
for 93 the next byte is "10" = 16, and after the "10" there are 16 bytes (32 characters)
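Because each record carries its own length, the string can be walked record by record without guessing where one ends. A minimal sketch of that tag/length/value walk, using the test string posted earlier in the thread; `parse_records` is a hypothetical helper name, and treating the leading "05" as a non-record prefix is an assumption:

```ruby
# Walks tag/length/value records in a hex string:
# 2 hex chars of tag, 2 hex chars giving the value length in bytes,
# then 2 * length hex chars of value.
def parse_records(hex, offset = 0)
  records = []
  pos = offset
  while pos + 4 <= hex.length
    tag = hex[pos, 2]
    len = hex[pos + 2, 2].to_i(16)
    value = hex[pos + 4, len * 2]
    break if value.length < len * 2 # truncated record: stop rather than guess
    records << [tag, value]
    pos += 4 + len * 2
  end
  records
end

data = "05910f01020000000d8147451807ffffff009310010c0000000d8147451805ffffff0101960f010c0000000d81474518559fffff00940e01020102010001ffffff02010201"

# Assumption: the leading "05" is not a record, so parsing starts at offset 2.
parse_records(data, 2).each do |tag, value|
  puts "#{tag} (#{value.length / 2} bytes): #{value}"
end
```

With this data the four records (91, 93, 96, 94) consume the string exactly, which is consistent with the length rule described above.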
OK ... see what you think of this (I have converted it to a script instead of a command line, as it is now way too big):
It seems to work just fine, but I don't understand the mysterious magic inside some code lines, for example:
The block variable y holds each substring matched by the regex, but I don't understand what "y[4..-1]" means.
Another issue is that the "PROD_X" strings are not in sequential order; PROD_X was just an example. They are
related like this:
Code:
for 91 --> APPLE
for 93 --> GRAPES
for 96 --> PEAR
for 94 --> ORANGE
.
.
There could be more values.
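One way to handle this mapping, and leave room for more tags, is a lookup table keyed by the first two characters of each record, which could take the place of the `printf("PROD_%c ",y[1])` line in the script. A minimal sketch; `PRODUCT_NAMES`, `product_name` and the "UNKNOWN_" fallback are illustrative choices, and "970300" is a made-up record for an unmapped tag:

```ruby
# Tag-to-name table from the post; extend it as new tags appear.
PRODUCT_NAMES = {
  "91" => "APPLE",
  "93" => "GRAPES",
  "96" => "PEAR",
  "94" => "ORANGE"
}.freeze

# Looks up the record's tag; unmapped tags get a visible placeholder
# instead of raising, so new tags show up in the output.
def product_name(record)
  tag = record[0, 2]
  PRODUCT_NAMES.fetch(tag) { "UNKNOWN_#{tag}" }
end

puts product_name("960f010c0000000d81474518559fffff00") # PEAR
puts product_name("970300")                             # UNKNOWN_97
```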
Then, with those mapped values, the desired output changes a little:
This is a slice of the string stored in 'y', starting at the fifth character (index 4, since Ruby indexes from zero) and ending at index -1. Negative indices count back from the end of the string, so -1 is the last character.
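A short illustration of these slices on the first record from the test data (the variable name `y` matches the script):

```ruby
y = "910f01020000000d8147451807ffffff00"

p y[4..-1]  # index 4 through the last character: "01020000000d8147451807ffffff00"
p y[0, 2]   # start 0, length 2 (the tag): "91"
p y[2, 2]   # start 2, length 2 (the length byte): "0f"
p y[-2..-1] # the last two characters: "00"
```

So `y[4..-1]` is simply the record with its 4-character tag and length prefix removed.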
As for not being sequential, this is not an issue, as the regex always matches 940e last, hence ORANGE always comes last (although I guess it may appear earlier as well).
It is not working. I think your regex is matching the value after "9", while my regex is supposed to match
each string that begins with "9" within pattern 2, but I receive an error using this regex.
Could you please explain how the regex you use, 9.*?(?=9[1-9]|$), works, so that I understand
how to modify it if I need to?
.*? - non-greedy match of any characters after the 9
(?=9[1-9]|$) - this is a positive lookahead. It means that the data we want must be followed by a number between 91 and 99 or by the end of the string ($). The lookahead itself is not saved as part of the data we want, but it must be present after the data we are looking for.
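To see the lookahead at work, here it is applied to the $1 content that is printed later in the thread (the part between "05" and "940e"). Note that in Ruby `$` means end of line; for this single-line string that is the same as the end of the string:

```ruby
body = "910f01020000000d8147451807ffffff009310010c0000000d8147451805ffffff0101960f010c0000000d81474518559fffff00"

# 9            - each record starts at a "9"
# .*?          - then take as few characters as possible...
# (?=9[1-9]|$) - ...until the next "9" followed by 1-9, or the end of the line.
# The lookahead is checked but not consumed, so the next match starts there.
records = body.scan(/9.*?(?=9[1-9]|$)/)
records.each { |r| puts "|#{r}|" }
```

The "9f" inside the third record does not stop the match, because "f" is not in [1-9].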
If you prefer to lay out the entire regex and capture all the different portions (as in the regex you have shown), tell me what error you are getting and I will see if I can help correct it.
Also remember that the line below does the necessary breakup you are trying to accomplish here (as far as I can tell):
$ ruby script.rb file
1 532064022272619 81422060001extract1.rb:22:in `block (2 levels) in <main>': undefined method `scan' for nil:NilClass (NoMethodError)
from extract1.rb:16:in `each'
from extract1.rb:16:in `block in <main>'
from extract1.rb:7:in `each'
from extract1.rb:7:in `<main>'
This error points to a change you must have made earlier in the code: it says that $1 is nil, and NilClass has no method called scan.
That means the following regex has been changed:
Code:
if x =~ /05(9.{32,34}.*?)940e(.{28})/
$1 refers to (9.{32,34}.*?)
If this line has not been changed, the other possibility is that you added another regex operation between these 2 lines, such as a call to sub or gsub. If that call does not use parentheses to capture a group,
or the item being searched for does not exist, then again $1 will be nil.
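A minimal sketch of that pitfall and one way around it. Here a second `=~` with no capture group stands in for the intervening sub/gsub call; `fragile`, `robust` and `BODY_RE` are hypothetical names. Keeping the MatchData object in a local variable means later regex calls cannot clobber the groups:

```ruby
BODY_RE = /05(9.{32,34}.*?)940e(.{28})/

# Fragile: $1 always refers to the most recent match in this scope.
def fragile(data)
  return :no_match unless data =~ BODY_RE
  data =~ /ffffff/ # stand-in for an extra match with no capture group
  $1               # now nil
end

# Safer: keep the MatchData and read the groups from it.
def robust(data)
  m = BODY_RE.match(data)
  return [] unless m
  data =~ /ffffff/ # does not affect m
  m[1].scan(/9.*?(?=9[1-9]|$)/)
end

data = "05910f01020000000d8147451807ffffff009310010c0000000d8147451805ffffff0101960f010c0000000d81474518559fffff00940e01020102010001ffffff02010201"
p fragile(data)
p robust(data)
```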
I understand your lookahead regex now, thank you for explaining it. Your regex and the long regex I tried match the same
strings within $1, but with my regex it fails. I'm not sure if it is because mine contains "(" and ")".
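Parentheses are perfectly valid in a regex, but one thing they do change is what `scan` returns. This may or may not explain the failure here, but it is worth checking: without groups you get an array of strings, with groups you get one array of captures per match, so code written for strings (such as `y[4..-1].gsub(...)`) would no longer work unchanged. A short comparison on the same $1 content:

```ruby
body = "910f01020000000d8147451807ffffff009310010c0000000d8147451805ffffff0101960f010c0000000d81474518559fffff00"

# No groups: scan returns an array of matched strings.
plain = body.scan(/9.*?(?=9[1-9]|$)/)

# With groups: scan returns an array of capture arrays instead.
grouped = body.scan(/(9[1-9])(.*?)(?=9[1-9]|$)/)

p plain.first   # "910f01020000000d8147451807ffffff00"
p grouped.first # ["91", "0f01020000000d8147451807ffffff00"]
```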
Well, I replaced my line with yours and got an error because printf is not receiving what it expects, but not the error you are getting.
I will say again that your error points to $1 being nil. May I suggest placing the following on the line immediately preceding this entry:
Code:
puts $1
This will show you the string that scan is about to process.
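A small refinement to that debugging line: `puts nil` prints an empty line, which looks the same as an empty string, while `p` or `inspect` prints `nil` explicitly. A sketch with a hypothetical helper name, `debug_line`:

```ruby
# Builds a debug string; inspect shows nil explicitly, whereas puts
# prints an empty line for both nil and "".
def debug_line(name, value)
  "#{name} = #{value.inspect}"
end

matched = ("no digits here" =~ /(\d+)/) # fails, so $1 becomes nil
warn debug_line('$1', $1)               # writes "$1 = nil" to stderr
```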
I get the content of $1 only for the first iteration, and then the following error:
Code:
$ ruby extract1.rb file
910f01020000000d8147451807ffffff009310010c0000000d8147451805ffffff0101960f010c0000000d81474518559fffff00
extract1.rb:24:in `block (2 levels) in <main>': undefined method `scan' for nil:NilClass (NoMethodError)
from extract1.rb:17:in `each'
from extract1.rb:17:in `block in <main>'
from extract1.rb:7:in `each'
from extract1.rb:7:in `<main>'
I don't understand why, if both are valid regexes that match the same strings.
I am trying to use the long regex because if 9X occurs inside one of the substrings I really want to match, the
shorter regex will match a smaller substring. That is why I'm trying to force the length of the substring with "{26,28}".
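The concern is real, and it can be shown with made-up data. Assuming the length byte described earlier in the thread is always present, an alternative to forcing the length with "{26,28}" is to read that byte, which tells you exactly where each record ends. In this sketch the first record's digits are hypothetically altered to contain "93":

```ruby
# Hypothetical data: the first record's digits were altered to contain "93".
body = "910f01020000000d8193451807ffffff00" + "9310010c0000000d8147451805ffffff0101"

# The lookahead stops at the first "9" followed by 1-9, even inside a value.
by_lookahead = body.scan(/9.*?(?=9[1-9]|$)/)

# Reading the length byte (2 hex chars after the tag) finds the real boundary.
by_length = []
pos = 0
while pos + 4 <= body.length
  len = body[pos + 2, 2].to_i(16)
  by_length << body[pos, 4 + len * 2]
  pos += 4 + len * 2
end

p by_lookahead # 3 pieces: the first record is split at the embedded "93"
p by_length    # 2 pieces: one per real record
```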
I am not getting the same output or error as you, so I suspect you are either using different data or have changed another part of the code in addition to the line you mentioned.
Please provide your current code and test data.
Also, what version of Ruby are you running?