AWK/SED Multiple pattern matching over multiple lines issue
I have to construct a maintenance program, part of this program is the interrogation of log files.
Ordinarily a grep or sed would sort me right out; however, this problem has a few other restrictions. I have to initially get the current date from the system and then match this to entries in a log file. Not a problem, already done. However, once I have located a matching line I then have to step over the next lines looking for another pattern and, if found, write these entries to a file. I can ONLY use grep, sed or awk to do this. I believe awk will do it no problem, however I am not familiar with all its aspects. An example of the data may help:

test.log:
Code:
2006 Nov 06 18:01:25:538 GMT +1 userQueue - Job-18494 s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:03:25:538 GMT +1 userQueue - Job-18494 s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:04:25:538 GMT +1 userQueue - Job-18494 s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:06:25:538 GMT +1 userQueue - Job-18494 s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:07:25:538 GMT +1 userQueue - Job-18494 s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:08:26:179 GMT +1 userQueue - [Unknown ] - Severity: 2; Category: ; ExceptionCode: ; Message: unable to create new native thread; Parameters: <n/a>; Stack Trace: Job-18507 Error in userQueue
java.lang.OutOfMemoryError: unable to create new native thread

I need to extract the corresponding line(s) relating to the OutOfMemoryError and the date. The output should look like: (date) (filename) (error)

2006 Nov 06 userQueue java.lang.OutOfMemoryError: unable to create new native thread

Currently I'm using something like this:
Code:
#!/bin/bash
date=`date | awk '{print $6 " " $2 " " $3}'`
filename=`sed -n "/$date/p" *.log* | awk '{print $7}'`
echo "Date is: " $date
echo "Filename is: " $filename
search=`sed "/$date/p" *.log* | grep OutOfMemory`
echo "Search Results: " $search
totalString=$date" "$filename" "$search
echo "Final Result: "$totalString > errorFiles

This of course doesn't work and gets every instance of either 2006 Nov 06 OR OutOfMemory. I have also played around with simple one-liners like:
Code:
sed -e '/2006 Nov 06/b' -e '/OutOfMemoryError/b' -e d test.log > output
awk '{ if($1 == "2006" && $2 == "Nov" && $3 == "21") print}' test.log

I believe awk is the way to go. From the above example I should only have to search for the next pattern and output, but I'm unsure. I hope some Linux crack could help with this. I'm sure someone with more in-depth knowledge of awk or sed could solve this very simply. Any help would be great. Thanks. |
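[Editor's note] One awk-only way to pair a date line with a later error line, sketched on made-up sample data (the field positions assume the log layout shown above; filenames and data here are illustrative, not from the thread):

```shell
#!/bin/sh
# Sketch: remember the most recent date-stamped line; when an
# OutOfMemoryError line turns up later, print the remembered date,
# the program name (field 7 in the sample layout) and the error line.
log=$(mktemp)
cat > "$log" <<'EOF'
2006 Nov 06 18:08:26:179 GMT +1 userQueue - [Unknown ] - Severity: 2
some continuation text
java.lang.OutOfMemoryError: unable to create new native thread
EOF
d="2006 Nov 06"    # a live script would use: d=$(date +"%Y %b %d")
result=$(awk -v d="$d" '
  $0 ~ "^" d { stamp = $1 " " $2 " " $3; prog = $7; next }
  /OutOfMemoryError/ && stamp != "" { print stamp, prog, $0 }
' "$log")
echo "$result"
rm -f "$log"
```

The key idea is that awk carries state across lines: the date rule stores what it saw, and the error rule prints only if a date has already been seen.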
It's not entirely clear to me which lines exactly you're trying to filter out of the file, so here are a few commands that I think/hope may help...
Code:
# Get all lines that start with $date

and also:

Code:
date=`date +"%Y %b %d"` |
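[Editor's note] A quick sanity check (my own illustration, not from the thread) that this `date` format really lines up with the log's "2006 Nov 06"-style prefix; LC_ALL=C pins the month abbreviation to English:

```shell
#!/bin/sh
# Build one matching and one non-matching line, then confirm that
# grep anchored on today's date finds exactly the matching line.
d=$(LC_ALL=C date +"%Y %b %d")
f=$(mktemp)
printf '%s 10:00:00:000 GMT +1 userQueue - ok\nunrelated line\n' "$d" > "$f"
matches=$(grep -c "^$d" "$f")
echo "$matches"
rm -f "$f"
```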
Hko,
Thanks for your time, response and advice regarding getting the date. I am trying to filter the entire file. All I need is the line that is identified as having a date that matches the current date and is followed by the 'OutOfMemory' error string, which in most cases will be 4 lines below the matched date line. My primary problem is that when I make a search I get a list of all instances that have the date value. The date field is not a unique identifier; the relationship between the date and the error is the unique part! Thanks |
OK. If I understand correctly what you're trying to do, this would do the trick:
Code:
#!/bin/bash |
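[Editor's note] One way the pairing can be done in sed alone, sketched under the same assumptions as the sample log above (this is an illustration, not the script posted here, which did not survive in this archive copy):

```shell
#!/bin/sh
# Keep the last date-stamped line in sed's hold space (h); on the
# error line, swap it back into the pattern space (x), append the
# held error line (G), and print both lines together (p).
log=$(mktemp)
cat > "$log" <<'EOF'
2006 Nov 06 18:08:26:179 GMT +1 userQueue - [Unknown ] - Severity: 2
some continuation text
java.lang.OutOfMemoryError: unable to create new native thread
EOF
d="2006 Nov 06"    # a live script would use: d=$(date +"%Y %b %d")
pair=$(sed -n "/^$d/h; /OutOfMemoryError/{x;G;p;}" "$log")
echo "$pair"
rm -f "$log"
```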
Couldn't you make all the stuff that belongs to one log entry reside on one line?

Cheers, Tink |
Hi!
One more tip: all your lines begin with 2006..., so you can use that as a delimiter and drop the newlines altogether. Code:
...|tr -d '\n'|awk -F '200[0-9]' '/OutOfMemoryError/ {print}'|... |
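[Editor's note] A related trick, sketched here as my own illustration (it needs an awk that accepts a multi-character record separator, such as gawk or mawk): let awk itself split records at the date stamp, so each multi-line entry becomes one record and no tr pipeline is needed:

```shell
#!/bin/sh
# Split records at "2006 " so each log entry, however many physical
# lines it spans, becomes one awk record; count records that contain
# the error string.
log=$(mktemp)
cat > "$log" <<'EOF'
2006 Nov 06 18:08 header
continuation
java.lang.OutOfMemoryError: boom
2006 Nov 06 18:09 clean entry
EOF
hits=$(awk 'BEGIN { RS = "2006 " } /OutOfMemoryError/ { n++ } END { print n + 0 }' "$log")
echo "$hits"
rm -f "$log"
```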
Take a look at the getline command, which is part of the awk/gawk program.
|
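[Editor's note] A minimal illustration of that getline suggestion (my own sketch, on made-up data): after matching the date line, read ahead inside the same rule until the error shows up.

```shell
#!/bin/sh
# After a date match, getline consumes the following input lines
# without re-entering the main pattern loop.
log=$(mktemp)
cat > "$log" <<'EOF'
2006 Nov 06 18:08:26:179 GMT +1 userQueue - header
filler line
java.lang.OutOfMemoryError: unable to create new native thread
EOF
hit=$(awk '
  /^2006 Nov 06/ {
      stamp = $1 " " $2 " " $3
      while ((getline nxt) > 0) {
          if (nxt ~ /OutOfMemoryError/) { print stamp, nxt; exit }
      }
  }' "$log")
echo "$hit"
rm -f "$log"
```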
Since you are looking for a pattern on a single line containing both the date and "OutOfMemory", these could both be contained in a single regular expression: just put a ".*" between the two patterns.

Or you could use grep twice: "grep 'pattern1' logfile | grep 'pattern2'" to produce the intersection of the two patterns.

There are three other things you can use with sed. The -n option suppresses output unless you use the print command. The -e option allows you to enter more than a single command (as demonstrated by poster Hko above). You can use bracket expressions as subpatterns inside the // slashes to further fine-tune the search. This may allow you to first select lines with the current date, and then create different files which filter different patterns.

If you have a gawk-doc package, you might want to install it. It includes the book "Gawk: Effective AWK Programming." |
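[Editor's note] The two intersection tricks above, demonstrated on throwaway data (my own example): a single ".*" pattern and two chained greps give the same result here.

```shell
#!/bin/sh
# One line carries both patterns, one carries only the date; both
# grep forms should select only the first line.
f=$(mktemp)
printf '2006 Nov 06 OutOfMemoryError here\n2006 Nov 06 all fine\n' > "$f"
a=$(grep '2006 Nov 06.*OutOfMemoryError' "$f")
b=$(grep '2006 Nov 06' "$f" | grep 'OutOfMemoryError')
echo "$a"
rm -f "$f"
```

Note the ".*" form only works when both patterns are on the same physical line, which is exactly the limitation being discussed in this thread.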
Thanks for all the feedback guys.
Hko, your solution worked great on a single-entry log file I tested; however, sed died with a "sed: Memory allocation failed." error when tested on a real 8 MB file. Any suggestions? |
Just in case anyone was interested, an ugly solution I came up with is this:
Code:
#!/bin/bash
date=`date +"%Y %b %d"`
errorCode=$1
sed -n '/'"$date"'/,$p' ./data/5.log > tempfile
grep -n "$errorCode" tempfile | cut -d: -f 1 > lineValues
count=`wc -w < lineValues`
grep -n "$date" tempfile | cut -d: -f 1 > dateValues
for ((j=1; j<="$count"; j++)); do
    nOe=`sed "$j"'q;d' lineValues`
    nOd=`sed "$j"'q;d' dateValues`
    max=$nOe
    min=$nOd
    for ((i="$nOe"; i>=0; i--)); do
        if [ "$i" == "$max" ]; then
            error=`sed "$max"'q;d' tempfile`
        fi
        if [ "$i" == "$min" ]; then
            info=`sed "$min"'q;d' tempfile`
        fi
    done
    output="$info"" ""$error"
done

Thanks for the help guys. |
Code:
#!/bin/bash |
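[Editor's note] A minimal sketch of a reversed-file approach (my own illustration, not the posted script, whose body is missing here; it relies on GNU tac): reversing the file makes the error line arrive before its date line, so a single awk pass can pair them.

```shell
#!/bin/sh
# In reversed order, hold the error line until its date line appears,
# then print date fields plus the held error.
log=$(mktemp)
cat > "$log" <<'EOF'
2006 Nov 06 18:08:26:179 GMT +1 userQueue - header
filler line
java.lang.OutOfMemoryError: unable to create new native thread
EOF
out=$(tac "$log" | awk '
  /OutOfMemoryError/ { err = $0; next }
  /^2006 Nov 06/ && err != "" { print $1, $2, $3, err; err = "" }
')
echo "$out"
rm -f "$log"
```

Because each line is processed once and at most one error line is held at a time, this stays cheap even on large files.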
Hko,
Once again, thanks for your response. Just to let you know, 'tac' doesn't come as standard with the SunOS version I am using, so the elegant solution you proposed can't be used. I am working with limited resources. |
Hi GigerMalmensteen,
As you have several steps to accomplish your task, I guess the best tool for your needs is awk: first, identify the messages of the day; second, concatenate all the physical lines that make up each logical one; then decide whether it is to be reported; and finally cut out the slices you want to display. Below is a script which performs the above steps: Code:
#!/bin/sh |
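[Editor's note] A compact sketch of the steps just described (my own illustration on made-up data, not the posted script): join each entry's physical lines into one logical line, then filter and slice out the fields.

```shell
#!/bin/sh
# Pass 1: lines starting with a date begin a new logical record;
# anything else is appended to the current one.
# Pass 2: keep only records with the error, printing date + program.
log=$(mktemp)
cat > "$log" <<'EOF'
2006 Nov 06 18:08:26:179 GMT +1 userQueue - [Unknown ] - Severity: 2
java.lang.OutOfMemoryError: unable to
create new native thread
2006 Nov 06 18:09:00:000 GMT +1 userQueue - Job-1 all fine
EOF
report=$(awk '
  /^2006 / { if (buf != "") print buf; buf = $0; next }
           { buf = buf " " $0 }
  END      { if (buf != "") print buf }
' "$log" | awk '/OutOfMemoryError/ { print $1, $2, $3, $7 }')
echo "$report"
rm -f "$log"
```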
It would all be very easy and elegant (not to mention pretty fast) to use Perl:
Code:
#!/usr/bin/perl -w

Save it as mylogscan and make it executable:
Code:
chmod 755 mylogscan

Then run it over your logs:
Code:
./mylogscan logfile1 logfile2 logfile3

use strict; just means "complain a lot about potentially risky code". It's generally a good idea to use this.
Code:
while(<>) { ... }

reads every line of each file named on the command line, one at a time.
Code:
/^(\d\d\d\d \w\w\w \d\d \d\d:\d\d:\d\d:\d\d\d \w\w\w ([+\-]\d)?)/

matches (and captures) the timestamp at the start of a log line.

The rest is pretty self-explanatory, I think. Perl's syntax is highly abbreviated for this sort of task because it's exactly the sort of thing that needs to be done a lot. It saves a lot of typing at the expense of scaring off newbies. Perl eats gigabytes of log files for breakfast, and still has room left for more! Long live Perl! |
Matthew42g, you ought to be able to shorten the regex with these operators, I believe:

{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

See http://perldoc.perl.org/perlre.html |
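[Editor's note] The same brace quantifiers also work in grep -E, so the timestamp match from this thread can be shortened there too (a quick check, my own example):

```shell
#!/bin/sh
# Match "YYYY Mon DD" with brace quantifiers instead of repeating
# character classes by hand.
stamp="2006 Nov 06 18:01:25:538"
m=$(printf '%s\n' "$stamp" | grep -Ec '^[0-9]{4} [A-Za-z]{3} [0-9]{2}')
echo "$m"
```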