[SOLVED] extracting a string from log output.

sysmicuser · 04-03-2012, 02:21 AM

Hi Guys,

I am looking for a quick hack to extract "Success" from the following log file.

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
^M100 493 100 493 0 0 12832 0 --:--:-- --:--:-- --:--:-- 12832^M100 493 100 493 0 0 12611 0 --:--:-- --:--:-- --:--:-- 0
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns0="urn:myorganization:glide:services:FlowControlService:1:0"><env:Body><ns0:getActivationAgentPropertyValueResponseElement><ns0:status>Success</ns0:status><ns0:value>$1</ns0:value><ns0:message xsi:nil="1"/></ns0:getActivationAgentPropertyValueResponseElement></env:Body></env:Envelope>

Essentially, I need to capture the string between the tags in <ns0:status>Success</ns0:status>

How should we go about it?

business_kid · 04-03-2012, 02:38 AM

grep will get the line.

If you want to hack and trim, use egrep with a POSIX or Perl regex and the appropriate switches. YMMV
man regex - POSIX REs
man pcre - Perl REs (more powerful imho).
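
For reference, here is a minimal sketch of the grep route described above. It assumes a grep that supports the -o option (GNU and BSD grep do, but POSIX does not require it). The sample line is a shortened, hypothetical stand-in for the response in the log, and the variable names are only for illustration.

```shell
# Shortened, hypothetical stand-in for the SOAP response line in the log.
line='<env:Body><ns0:status>Success</ns0:status><ns0:value>$1</ns0:value></env:Body>'

# -o prints only the part of the line that matched, not the whole line.
tag=$(printf '%s\n' "$line" | grep -o '<ns0:status>[^<]*</ns0:status>')

# Trim the opening and closing tags with two substitutions.
status=$(printf '%s\n' "$tag" | sed -e 's/<ns0:status>//' -e 's|</ns0:status>||')

echo "$status"
```

grep alone gets as far as the whole tag; the trimming still needs a second step, which is why the sed answers further down do everything in one command.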

pan64 · 04-03-2012, 05:50 AM

How do you like this?

Code:

sed ' /status/ { s/^.*\(<ns0:status>.*<.ns0:status>\).*$/\1/; p }; d ' logfile

sysmicuser · 04-03-2012, 06:18 AM

@pan64

output is very close.

<ns0:status>Success</ns0:status>

All we need is "Success" instead of the whole tag. Should that be easy?

pan64 · 04-03-2012, 06:22 AM

just move the parentheses

Code:

sed ' /status/ { s/^.*<ns0:status>\(.*\)<.ns0:status>.*$/\1/; p }; d ' logfile

sysmicuser · 04-03-2012, 06:25 AM

@pan64 again very close, how to have a line feed \n after success?

[user01@tmelbld19 ~]$ sed ' /status/ { s/^.*<ns0:status>\(.*\)<.ns0:status>.*$/\1/; p }; d ' logfile
Success[user01@tmelbld19 ~]$

So after "Success" can we have a line break?

pan64 · 04-03-2012, 06:31 AM

just put a \n after the \1 part.

sed ' /status/ { s/^.*<ns0:status>\(.*\)<.ns0:status>.*$/\1\n/; p }; d ' logfile

sysmicuser · 04-03-2012, 06:41 AM

@pan64

Dear Sir, Thank you for your help.

Yes, it works now, but I am interested not only in the solution but also in the learning behind it.

May I please ask the magic behind your regex?

Many Thanks

pan64 · 04-03-2012, 06:57 AM

OK, let's try to explain. But first, here is the man page of the sed command: http://linux.die.net/man/1/sed
You can find the sed script between the ' chars.
/status/ means I want to search for lines containing the text status.
In {} there are two commands to execute on the current line (which should now contain the text status).
The first command is s, which means substitute; the syntax is s/search text/replace text/. ^ is the beginning of the line, .* means anything, \( and \) mean grouping, . means any char and finally $ means end of line. The replacement string is \1, which means the first group found, i.e. the text between \( and \). So the full line will be replaced with the grouped text, and a \n is added.
The second command is p, which means print the text.
The last command is d, which means I want to delete the line and go to the next one. It will be executed for every line; the /status/ search expression applies only to the commands inside {}.
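
The explanation above can be sketched as a commented script, one command per line. This is only an illustration: the two-line file is a hypothetical, shortened stand-in for the real log, and the names logfile and result are made up for the example.

```shell
# Hypothetical two-line stand-in for the log file.
logfile=$(mktemp)
printf '%s\n' \
  '<?xml version="1.0" encoding="UTF-8"?>' \
  '<env:Body><ns0:status>Success</ns0:status><ns0:value>$1</ns0:value></env:Body>' \
  > "$logfile"

result=$(sed '
# Only lines containing "status" run the commands inside the braces.
/status/ {
    # Replace the whole line with the text captured between \( and \).
    s/^.*<ns0:status>\(.*\)<.ns0:status>.*$/\1/
    # Print the edited line.
    p
}
# Delete every line, which suppresses the default printing.
d
' "$logfile")

rm -f "$logfile"
echo "$result"
```

The XML declaration line never matches /status/, so it only reaches the d command and produces no output.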

sysmicuser · 04-03-2012, 09:10 AM

@pan64

Thank you very much !

David the H. · 04-03-2012, 12:40 PM

There's no need to make it that complicated. Just use the "-n" option to silence output by default, then add the "p" modifier directly to the substitute command. That way only lines that match will be printed.

You can also use the "-r" option to explicitly enable extended regex, so that there's no need to escape the parentheses.

Since the "." represents any character, we really don't want to use it when we actually want to match "/". Unfortunately, this is the default delimiter for the "s" command. However, sed allows you to use any other single character (except backslash or newline) as the delimiter, so just choose one that's not found in the expression itself. I prefer using "|" myself.

Next, the regex only really needs to contain enough of the string to ensure a unique match, and the starting and ending anchors are also superfluous here, since the greedy ".*" at each end already stretches the match to the start and end of the line. Of course, it doesn't hurt to leave them in either.

Finally, it's probably safer to replace the ".*" (a string of characters of any length) with "[^<]*" (a string of "not <" of any length), to avoid any possible issues with regex greediness.

Code:

sed -rn '/status/ s|.*status>([^<]*)</ns0:stat.*|\1|p' logfile

BTW, I don't see where the line-ending issue could be coming from. sed always appends a newline to each line of output anyway.

Here are a few useful sed references.
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt
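
To illustrate the greediness point above, here is a small sketch using a hypothetical line that holds two status elements (the real log only has one). It uses -E, which GNU sed treats the same as -r, together with the "|" delimiter and the -n plus "p" combination described in this post.

```shell
# Hypothetical line with two status elements, to show where greediness bites.
line='<ns0:status>Success</ns0:status><x/><ns0:status>Failure</ns0:status>'

# A greedy ".*" inside the group runs on to the last closing tag on the line.
greedy=$(printf '%s\n' "$line" | sed -En 's|^<ns0:status>(.*)</ns0:status>.*|\1|p')

# "[^<]*" cannot cross a "<", so the group stops at the first closing tag.
safe=$(printf '%s\n' "$line" | sed -En 's|^<ns0:status>([^<]*)</ns0:status>.*|\1|p')

echo "greedy: $greedy"
echo "safe:   $safe"
```

The greedy version captures everything from "Success" up to and including "Failure", while the "[^<]*" version captures only "Success".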

sysmicuser · 04-03-2012, 05:30 PM

@David the H

Code:

[user01@tmelbld19 ~]$ ./getEvaluateReadyToRate.sh 2>&1|tee -a panduta.log|sed -rn '/status/ s|.*status>([^<]*)</ns0:stat.*|\1|p'
Success[user01@tmelbld19 ~]$

There is no line break after "Success"... regex and sed seriously confuse me.

Tinkster · 04-03-2012, 07:33 PM

Quote:

Originally Posted by sysmicuser

@David the H

Code:

[user01@tmelbld19 ~]$ ./getEvaluateReadyToRate.sh 2>&1|tee -a panduta.log|sed -rn '/status/ s|.*status>([^<]*)</ns0:stat.*|\1|p'
Success[user01@tmelbld19 ~]$

There is no line break after "Success"... regex and sed seriously confuse me.

As in the earlier post: just slap a \n behind \1

sysmicuser · 04-03-2012, 10:18 PM

@Tinkster it works mate !

David the H. · 04-05-2012, 10:03 AM

Yes, the line-break is an easy fix. But what I didn't understand in my last post was why it wasn't including one to start with. sed works by placing the line into the pattern buffer minus the newline that delimited it, performing its edits on the buffer contents, then adding a newline back to the output. So I was thinking it always added a newline to the output.

Besides, I got a newline in all of my test runs.

I've figured it out now though.

The reason you didn't get one is that the line operated on is the last one in the file, and there's no final newline after it (pretty much the only place that could happen). So I guess sed only inserts a newline in the output if there was one in the input. Something new learned!
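
A quick way to check this is to count the bytes sed writes. The sketch below assumes GNU sed; other sed implementations may always add the final newline, and "\n" in the replacement text is a GNU extension. The variable names are made up for the example.

```shell
# Same text with and without a final newline, passed through sed unchanged.
with_newline=$(printf 'Success\n' | sed -n 'p' | wc -c | tr -d ' ')
without_newline=$(printf 'Success' | sed -n 'p' | wc -c | tr -d ' ')

# The fix from earlier in the thread: append \n after \1 in the replacement.
fixed=$(printf '%s' '<ns0:status>Success</ns0:status>' \
  | sed -En 's|.*status>([^<]*)</ns0:stat.*|\1\n|p' | wc -c | tr -d ' ')

echo "with newline:    $with_newline bytes"
echo "without newline: $without_newline bytes"
echo "with \\n added:   $fixed bytes"
```

With GNU sed the counts should come out as 8, 7 and 8: the missing newline in the input carries through to the output, and the explicit \n puts it back.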