LinuxQuestions.org

bishop2001 01-09-2024 04:53 PM

Grep multiple fields large file
 
Greetings
I have a huge file from which I'm trying to extract a few fields, each followed by a value. The real file is much bigger, but the pattern is the same throughout. For example:

$_h7e 6ijrn3ij exceed: 8686738, string ABC/#123/in4j([99, fieldA

I want to extract all matches and display them like this:
exceed: 8686738, string ABC/#123/in4j([99,
exceed: 683738, string #Pheu/GP/i972j(3i,
etc.

I'm trying a regex like this:
egrep -o "exceed: [0-9]*|string *,"

That matches exceed: and the numeric value, but not the string field up to the comma. Suggestions, please?
Thanks again

Turbocapitalist 01-09-2024 05:33 PM

Perhaps the following?

Code:

grep -o -E 'exceed: [0-9]*|string [^,]*'

# or

grep -o -E 'exceed: [0-9]*, string [^,]*'

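The asterisk is not a wildcard on its own in regular expressions, basic or extended: it means "zero or more of the preceding item". In "string *," it applies only to the space, so that alternative matches "string" followed by optional spaces and then a comma, which doesn't occur in your sample line. You're perhaps thinking of globbing, where * matches any run of characters.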

Compare "man 7 regex" versus "man 7 glob"
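To see the difference on real input, the sketch below runs the pattern from the question and the [^,]* version side by side on the sample line. It assumes a POSIX shell and a grep with -o and -E (GNU and BSD grep both have them; -o is not part of POSIX).

```shell
#!/bin/sh
# Sample line copied from the question.
line='$_h7e 6ijrn3ij exceed: 8686738, string ABC/#123/in4j([99, fieldA'

# Pattern from the question: in "string *," the * repeats only the space,
# so that alternative needs "string", optional spaces, then a comma right away.
before=$(printf '%s\n' "$line" | grep -o -E 'exceed: [0-9]*|string *,')

# [^,]* means "any run of non-comma characters", so the match stops before the first comma.
after=$(printf '%s\n' "$line" | grep -o -E 'exceed: [0-9]*|string [^,]*')

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

Only "exceed: 8686738" comes out of the first command; the second also prints "string ABC/#123/in4j([99" on its own line.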

syg00 01-09-2024 06:31 PM

While I'm the self-confessed #1 fan of regex solutions, maybe a simple cut?
Code:

cut -d ' ' -f 3-6
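Run against two lines in the question's layout, the cut command gives exactly the requested output. This is a sketch: the first line is the question's sample, while the second line's leading "xx yy" and trailing "fieldB" are hypothetical filler, since the question only shows its exceed/string part.

```shell
#!/bin/sh
# First line is from the question; "xx yy" and "fieldB" on the second are hypothetical filler.
# With -d ' ' each single space separates fields:
#   3 = "exceed:", 4 = number plus comma, 5 = "string", 6 = value plus comma.
out=$(printf '%s\n' \
  '$_h7e 6ijrn3ij exceed: 8686738, string ABC/#123/in4j([99, fieldA' \
  'xx yy exceed: 683738, string #Pheu/GP/i972j(3i, fieldB' |
  cut -d ' ' -f 3-6)
printf '%s\n' "$out"
```

The trade-off is that cut counts every single space, so two spaces in a row create an empty field and shift the numbering; the regex answers don't depend on field positions.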

sundialsvcs 01-09-2024 09:49 PM

My first-blush guess here is: "greedy" vs. "non-greedy."

If you use a quantifier like "*" in a regular expression, the default behavior is greedy. In other words, it matches as much text as it can instead of stopping at the first place where the rest of the pattern could match, which is probably not what you want in this case.

For instance, the pattern .*, (any characters, then a comma) will run to the last comma on the line, not the first.
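The greedy behavior is easy to see with grep -o on the question's sample line, which contains two commas. POSIX extended regular expressions (what grep -E uses) have no lazy quantifier such as .*?, so the usual way to stop at the first comma is a negated bracket expression like [^,]*. A sketch, assuming a grep that supports -o:

```shell
#!/bin/sh
# Sample line copied from the question; it contains two commas.
line='$_h7e 6ijrn3ij exceed: 8686738, string ABC/#123/in4j([99, fieldA'

# Greedy: .* takes as much as it can, so the match ends at the LAST comma.
greedy=$(printf '%s\n' "$line" | grep -o -E 'exceed: .*,')

# [^,]* can never cross a comma, so this match ends at the FIRST comma.
bounded=$(printf '%s\n' "$line" | grep -o -E 'exceed: [^,]*,')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```

On this line the greedy match happens to end in the right place only because there are exactly two commas; a comma inside the trailing field would drag it further.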

grail 01-10-2024 12:40 AM

Assuming the data has fixed fields, you could also use awk:
Code:

awk -F'[ ,]' '/exceed/{print $3, $4",", $6, $7","}' file
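awk joins the comma-separated arguments of print with OFS, a single space by default, so passing each piece as its own argument reproduces the spacing from the question. A sketch using the same -F'[ ,]' separator, assuming single spaces between fields; the second input line's leading "xx yy" and trailing "fieldB" are hypothetical filler.

```shell
#!/bin/sh
# -F'[ ,]' makes every space and every comma a separator, so ", " leaves an empty
# field: $3 = "exceed:", $4 = number, $5 = "", $6 = "string", $7 = value.
# print joins its arguments with OFS (a space); the commas are added back by hand.
# "xx yy" and "fieldB" on the second line are hypothetical filler.
out=$(printf '%s\n' \
  '$_h7e 6ijrn3ij exceed: 8686738, string ABC/#123/in4j([99, fieldA' \
  'xx yy exceed: 683738, string #Pheu/GP/i972j(3i, fieldB' |
  awk -F'[ ,]' '/exceed/ { print $3, $4 ",", $6, $7 "," }')
printf '%s\n' "$out"
```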

MadeInGermany 01-10-2024 11:28 AM

Code:

grep -Eo "(exceed:|string) [^,]*"
If string always follows exceed:, you can match both with one pattern and print them as a single string:
Code:

grep -Eo "exceed: [0-9]+, string [^,]*"
To keep the trailing comma too, as in your desired output, append an optional comma (,?):
Code:

grep -Eo "exceed: [0-9]+, string [^,]*,?"
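To check how the optional ,? behaves, this sketch runs the last command on the question's sample line plus one hypothetical line whose string value ends the line, so there is no comma after it. It assumes a grep with -E and -o.

```shell
#!/bin/sh
# First line is from the question; the second is a hypothetical line
# that ends right after the string value (no trailing comma).
input='$_h7e 6ijrn3ij exceed: 8686738, string ABC/#123/in4j([99, fieldA
xx yy exceed: 683738, string #Pheu/GP/i972j(3i'

# [^,]* stops just before the first comma after "string"; ",?" then takes
# that comma when it exists and matches nothing when the line ends first.
out=$(printf '%s\n' "$input" | grep -Eo 'exceed: [0-9]+, string [^,]*,?')
printf '%s\n' "$out"
```

The first line keeps its trailing comma and the second is still matched without one.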


All times are GMT -5.