SED, or GREP Command

edwardcode · 04-16-2013, 10:45 AM

I need to search for a string (string1) in a file, but that string has a few lines that need to go with it. The problem is that the number of lines varies, so I need to search for a word in a file and find all of the other lines connected to it. See the example below.
(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)
/tmp
good line
good line
good line
String1
good line
good line
good line
(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)

Now I want to search for String1 and get all of the good lines that go with it. Every block starts with "/tmp" and always ends with an empty line. So I need ONLY everything in between, and including, the "/tmp" and the (empty line) associated with "String1".

I know sed can do this very easily; I am just not sure of the syntax.

Any help would be great.

Thanks

cortman · 04-16-2013, 10:57 AM

If I understand correctly, you want to get all the text between /tmp and [empty line] if a certain string is contained between the two?
I'm a rank beginner at sed too, but perhaps this would work:

Code:

sed -n '/\/tmp/,/^$/p' file_name

edwardcode · 04-16-2013, 11:03 AM

I tried that, but it does not search for string1. Also, all of the sections start with /tmp, so that would bring up the entire file.

P.S. The proper syntax for the command you want to run is:

sed -n '/\/tmp/,/^$/p' file_name

You need a comma, not a period.

danielbmartin · 04-16-2013, 11:03 AM

With this InFile ...

Code:

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 1
good line 2
good line 3
String1
good line 4
good line 5
good line 6

bad line 5
bad line 6
bad line 7

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

bad line 15
bad line 16
bad line 17

... this code ...

Code:

sed -n '/\/tmp/,/^$/p' "$InFile" >"$OutFile"

... produced this OutFile ...

Code:

/tmp
good line 1
good line 2
good line 3
String1
good line 4
good line 5
good line 6

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

Daniel B. Martin

danielbmartin · 04-16-2013, 11:06 AM

Quote:

Originally Posted by edwardcode

I tried that, but it does not search for string1.

You provided a sample input file. It is also helpful if you provide a corresponding sample output file. This helps us understand your description of the desired result.

Daniel B. Martin

edwardcode · 04-16-2013, 11:07 AM

I did make a small change to the original post. All "sections" start with /tmp. So I need to search for the string and get all lines that go up to /tmp and down to the empty line.

edwardcode · 04-16-2013, 11:08 AM

See below for the sample input and the sample output for the search I need to do. Remember the number of lines before and after the string may vary. The only thing that seems to be consistent is the /tmp on top and the empty line on the bottom of string1.

Sample input:

(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)
/tmp
good line
good line
good line
String1
good line
good line
good line
(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)

Sample output:

/tmp
good line
good line
good line
String1
good line
good line
good line
(empty line)

danielbmartin · 04-16-2013, 11:22 AM

Proposed solution withdrawn. Haste makes waste!

edwardcode · 04-16-2013, 11:29 AM

That outputs bad lines; I need nothing but the block of good lines. Maybe I am using the wrong command for this. I am fairly certain that there is a way to do it with regex. I also thought you could do it with sed, but the more I research it, the more trouble I run into.

danielbmartin · 04-16-2013, 11:36 AM

You are looking for a "grep with context" where the context is defined by something other than a fixed number of lines before and/or after a string match.

With this InFile ...

Code:

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 1
good line 2
good line 3
String1
good line 4
good line 5
good line 6

bad line 5
bad line 6
bad line 7

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

bad line 15
bad line 16
bad line 17

... this code ...

Code:

paste -d"~" -s "$InFile"     \
|sed 's/~\/tmp/\n\/tmp/g'    \
|grep '^/tmp'                \
|grep String2                \
|sed -r 's/(.*)(~~.*)/\1~/g' \
|sed 's/~/\n/g'              \
> "$OutFile"

... produced this OutFile ...

Code:

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

Daniel B. Martin

edwardcode · 04-16-2013, 11:53 AM

The only problem I have with that setup is that this will be applied to gigabytes and gigabytes of text files, and I would rather have just one command (even if it is a regex) that can parse it. I also think that awk might work, but I am still not sure.

millgates · 04-16-2013, 11:57 AM

Hi, what about

Code:

sed -n '/\/tmp/,/^$/H; /^$/{x; /String1/p}' <infile
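
For anyone reading along, here is a commented breakdown of that one-liner. It is only a sketch, assuming GNU sed; the sample data and the `infile` and `out` names are made up for illustration.

```shell
# Hypothetical sample file: one unwanted block, then one wanted block.
infile=$(mktemp)
printf '%s\n' '/tmp' 'bad line' '' \
              '/tmp' 'good line' 'String1' 'good line' '' > "$infile"

out=$(sed -n '
  # For every line from a /tmp line through the next empty line,
  # append the line to the hold space (H adds a newline first).
  /\/tmp/,/^$/H
  # An empty line means the current block is complete.
  /^$/{
    # Swap pattern space and hold space: the collected block becomes the
    # pattern space, and the hold space is reset to the empty line.
    x
    # Print the collected block only if it contains String1.
    /String1/p
  }
' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

Note that H always puts a newline in front of the line it appends, so the printed block starts with one extra empty line.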

danielbmartin · 04-16-2013, 12:00 PM

Quote:

Originally Posted by edwardcode

The only problem I have with that setup is that this will be applied to gigabytes and gigabytes of text files, and I would rather have just one command (even if it is a regex) that can parse it. I also think that awk might work, but I am still not sure.

Don't discard a solution before trying it.

A single complicated awk might not be any better than a pipe of several simpler transformations.
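
As a point of comparison, the awk idea mentioned above could look something like this. It is a minimal sketch that assumes the blocks are separated by truly empty lines (no stray spaces); the sample data and the `infile` and `awk_out` names are invented for the example.

```shell
# Hypothetical sample file in the same layout as the sample input.
infile=$(mktemp)
printf '%s\n' '' '/tmp' 'bad line' '' \
              '/tmp' 'good line' 'String1' 'good line' '' \
              '/tmp' 'bad line' '' > "$infile"

# RS = ""     : paragraph mode, each run of non-empty lines is one record.
# ORS = "\n\n": put the closing empty line back after each printed block.
# The pattern keeps only records that start with /tmp and contain String1.
awk_out=$(awk 'BEGIN { RS = ""; ORS = "\n\n" } /^\/tmp/ && /String1/' "$infile")

printf '%s\n' "$awk_out"
rm -f "$infile"
```

awk reads one block at a time in this mode, so memory use should depend on the largest block rather than on the size of the file.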

Daniel B. Martin

edwardcode · 04-16-2013, 12:05 PM

millgates,
That was what I was looking for. Will you please explain this portion of your code:

/^$/{x; /String1/p}'

Thanks

konsolebox · 04-16-2013, 12:16 PM

millgates beat me to it. Mine is something more complicated.

Code:

sed -n -e '/\/tmp/,/^$/{ /tmp/{ h; b; }; H; /String1/,/^$/{ /^$/{ x; p; }; }; }'
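
Spread over several lines with comments, the command above reads roughly as follows. This is again a sketch assuming GNU sed, with invented sample data; the original reads from standard input, while here a temporary file (the made-up `infile`) is passed as an argument.

```shell
# Hypothetical sample file in the same layout as the sample input.
infile=$(mktemp)
printf '%s\n' '' '/tmp' 'bad line' '' \
              '/tmp' 'good line' 'String1' 'good line' '' \
              '/tmp' 'bad line' '' > "$infile"

kb_out=$(sed -n -e '
  # Only act on lines from a /tmp line through the next empty line.
  /\/tmp/,/^$/{
    # A /tmp line starts a fresh block: overwrite the hold space with it
    # and skip the remaining commands for this line.
    /tmp/{ h; b; }
    # Every other line of the block is appended to the hold space.
    H
    # Once String1 has been seen, wait for the closing empty line,
    # then swap the collected block in and print it.
    /String1/,/^$/{
      /^$/{ x; p; }
    }
  }
' "$infile")

printf '%s\n' "$kb_out"
rm -f "$infile"
```

Unlike the shorter version, h overwrites the hold space at each /tmp line, so the output has no leading empty line.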