LinuxQuestions.org


edwardcode 04-16-2013 10:45 AM

SED, or GREP Command
 
I need to search for a string (string1) in a file, but that string has a few lines that need to go with it. The problem is that the number of lines varies, so I need to search for the word in a file and find all of the other lines connected to it. See below for an example.
(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)
/tmp
good line
good line
good line
String1
good line
good line
good line
(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)

Now I want to search for string1 and get all of the good lines that go with it. The block of good lines always starts with "/tmp" and always ends in an empty line. So I need ONLY everything between, and including, the "/tmp" and the (empty line) associated with "string1".

I know sed can do this very easily; I am just not sure of the syntax.

Any help would be great.

Thanks

cortman 04-16-2013 10:57 AM

If I understand correctly, you want to get all the text between /tmp and [empty line] if a certain string is contained between the two?
I'm a base beginner at sed too, but perhaps this would work-

Code:

sed -n '/\/tmp/,/^$/p' file_name

edwardcode 04-16-2013 11:03 AM

I tried that, but that does not search for string1. Also, all of the sections start with /tmp, so that would bring up the entire file.


P.S. The proper syntax for the command you want to run is:

sed -n '/\/tmp/,/^$/p' file_name

You need a comma, not a period.

danielbmartin 04-16-2013 11:03 AM

With this InFile ...
Code:

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 1
good line 2
good line 3
String1
good line 4
good line 5
good line 6

bad line 5
bad line 6
bad line 7

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

bad line 15
bad line 16
bad line 17

... this code ...
Code:

sed -n '/\/tmp/,/^$/p' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:

/tmp
good line 1
good line 2
good line 3
String1
good line 4
good line 5
good line 6

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

Daniel B. Martin

danielbmartin 04-16-2013 11:06 AM

Quote:

Originally Posted by edwardcode (Post 4932543)
I tried that, but that does not search for string1.

You provided a sample input file. It is also helpful if you provide a corresponding sample output file. This helps us understand your description of the desired result.

Daniel B. Martin

edwardcode 04-16-2013 11:07 AM

I did make a small change to the original post. All "sections" start with /tmp. So I need to search for the string and get all lines that go up to /tmp and down to the empty line.

edwardcode 04-16-2013 11:08 AM

See below for the sample input and the sample output for the search I need to do. Remember the number of lines before and after the string may vary. The only thing that seems to be consistent is the /tmp on top and the empty line on the bottom of string1.


Sample input:


(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)
/tmp
good line
good line
good line
String1
good line
good line
good line
(empty line)
/tmp
bad line
bad line
bad line
bad line
(empty line)



Sample output:

/tmp
good line
good line
good line
String1
good line
good line
good line
(empty line)

danielbmartin 04-16-2013 11:22 AM

Proposed solution withdrawn. Haste makes waste!

edwardcode 04-16-2013 11:29 AM

That outputs bad lines; I need nothing but the block of good lines. Maybe I am using the wrong command for this. I am fairly certain that there is a way to do it with regex. I also thought you could do it with sed, but the more I research it, the more trouble I am having.

danielbmartin 04-16-2013 11:36 AM

You are looking for a "grep with context" where the context is defined by something other than a fixed number of lines before and/or after a string match.

With this InFile ...
Code:

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 1
good line 2
good line 3
String1
good line 4
good line 5
good line 6

bad line 5
bad line 6
bad line 7

bad line 1
bad line 2
bad line 3
bad line 4

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

bad line 15
bad line 16
bad line 17

... this code ...
Code:

paste -d"~" -s "$InFile"     \
|sed 's/~\/tmp/\n\/tmp/g'    \
|grep '^/tmp'                \
|grep String2                \
|sed 's/~~.*/~/'             \
|sed 's/~/\n/g'              \
> "$OutFile"

... produced this OutFile ...
Code:

/tmp
good line 11
good line 12
good line 13
String2
good line 14
good line 15
good line 16

Daniel B. Martin

edwardcode 04-16-2013 11:53 AM

The only problem I have with that setup is that this will be applied to gigs and gigs of text files, and I would rather just have one command (even if it is regex) that would be able to parse it. I also think that awk might be able to work, but I am still not sure.
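
For reference, the awk idea could look roughly like the sketch below. It uses awk's paragraph mode (RS set to an empty string), which is not something proposed elsewhere in this thread, and it assumes that sections are separated by truly empty lines and never contain one. The sample data and the temporary file are made up for illustration.

```shell
# Hypothetical sample input: three /tmp sections, each ended by an empty line.
tmpfile=$(mktemp)
printf '%s\n' '/tmp' 'bad line' '' \
              '/tmp' 'good line' 'String1' 'good line' '' \
              '/tmp' 'bad line' '' > "$tmpfile"

# RS="" switches awk to paragraph mode: each run of non-empty lines is one record.
# ORS="\n\n" puts the terminating empty line back after each printed record.
# A record is printed only if it starts with /tmp and contains String1.
block=$(awk 'BEGIN { RS = ""; ORS = "\n\n" } /^\/tmp/ && /String1/' "$tmpfile")

printf '%s\n' "$block"
rm -f "$tmpfile"
```

awk reads one record at a time here, so memory use should depend on the largest section rather than on the size of the file.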

millgates 04-16-2013 11:57 AM

Hi, what about
Code:

sed -n '/\/tmp/,/^$/H; /^$/{x; /String1/p}' <infile
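
The one-liner above can be written out with one command per line and comments, as a sketch of how it appears to work. GNU sed is assumed; the sample data and the temporary file are made up for illustration.

```shell
# Hypothetical sample input: three /tmp sections, each ended by an empty line.
tmpfile=$(mktemp)
printf '%s\n' '/tmp' 'bad line' '' \
              '/tmp' 'good line' 'String1' 'good line' '' \
              '/tmp' 'bad line' '' > "$tmpfile"

result=$(sed -n '
  # Inside a /tmp ... empty-line range, append each line to the hold space.
  /\/tmp/,/^$/H
  # An empty line means the current section is complete.
  /^$/{
    # Swap the two buffers: the collected section moves to the pattern
    # space, and the hold space is left with just the empty line (a reset).
    x
    # Print the collected section only if it contains String1.
    /String1/p
  }
' "$tmpfile")

printf '%s\n' "$result"
rm -f "$tmpfile"
```

The printed section starts with an extra empty line, because H always adds a newline before the line it appends.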

danielbmartin 04-16-2013 12:00 PM

Quote:

Originally Posted by edwardcode (Post 4932576)
The only problem I have with that setup is that this will be applied to gigs and gigs of text files, and I would rather just have one command (even if it is regex) that would be able to parse it. I also think that awk might be able to work, but I am still not sure.

Don't discard a solution before trying it.

A single complicated awk might not be any better than a pipe of several simpler transformations.

Daniel B. Martin

edwardcode 04-16-2013 12:05 PM

millgates,
That was what I was looking for. Will you please explain this portion of your code:

/^$/{x; /String1/p}'

Thanks

konsolebox 04-16-2013 12:16 PM

millgates beat me to it. Mine is something more complicated.
Code:

sed -n -e '/\/tmp/,/^$/{ /tmp/{ h; b; }; H; /String1/,/^$/{ /^$/{ x; p; }; }; }'
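
Spread over several lines with comments, the command above reads roughly as follows. This is a sketch of how it appears to work, with the made-up sample data and temporary file being assumptions for illustration; the original one-liner names no input file and reads standard input.

```shell
# Hypothetical sample input: three /tmp sections, each ended by an empty line.
tmpfile=$(mktemp)
printf '%s\n' '/tmp' 'bad line' '' \
              '/tmp' 'good line' 'String1' 'good line' '' \
              '/tmp' 'bad line' '' > "$tmpfile"

section=$(sed -n -e '
  # Work only on lines from a /tmp line through the next empty line.
  /\/tmp/,/^$/{
    # A /tmp line starts a new section: overwrite the hold space with it
    # and skip the remaining commands for this line.
    /tmp/{
      h
      b
    }
    # Every other line of the section is appended to the hold space.
    H
    # Once String1 has been seen, wait for the closing empty line,
    # then swap the collected section in and print it.
    /String1/,/^$/{
      /^$/{
        x
        p
      }
    }
  }
' "$tmpfile")

printf '%s\n' "$section"
rm -f "$tmpfile"
```

Because h overwrites the hold space at each /tmp line, this variant has no empty line before the printed /tmp.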


All times are GMT -5.