Removing duplicate lines with sed

tireseas · 01-10-2005, 01:41 PM

Hi

I'm trying to come to grips with sed. I've read the info page and done some scratching about in a few books, but haven't been able to figure out how to remove duplicate lines from a text file, and the example program in the info pages just didn't make sense to me.

This is the source document*:
======================
River by the deer
The crisp flower by the rain
Happy white river
River by the deer
The liquid flower
The cloud drifts across the storm
Gentle golden deer
Gentle golden deer
======================
* generously spawned by nonsense-0.6, a random text generator

What I want to do with sed is delete the two duplicated lines when I pass the source file to it, and then write the cleaned text to another file, e.g. cleaned.txt.

1. How can I do this using sed? I was thinking of grepping, but then I would still have to delete the duplicates, although I suppose grep would at least give me patterns to work with. Is it possible to do it without grep?

2. Has anyone come across a comprehensive sed resource that covers all these kinds of angles? I want to become more familiar with this evidently powerful tool.

Many thanks

itsme86 · 01-10-2005, 01:49 PM

Look at the commands uniq and sort.
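
Here is a rough sketch of what those two commands do with the sample text. It assumes the text is saved in a scratch file called sample.txt, which is a made-up name. uniq on its own only drops adjacent repeats, so the nonconsecutive "River by the deer" survives. Sorting first catches every duplicate, but the original line order is lost. As a side note on the grep idea, uniq -d lists the lines that occur more than once.

```shell
#!/bin/sh
# Recreate the sample text in a scratch file (sample.txt is a hypothetical name).
printf '%s\n' \
  'River by the deer' \
  'The crisp flower by the rain' \
  'Happy white river' \
  'River by the deer' \
  'The liquid flower' \
  'The cloud drifts across the storm' \
  'Gentle golden deer' \
  'Gentle golden deer' > sample.txt

# uniq alone: removes only consecutive duplicates, keeps the original order.
uniq sample.txt > uniq_only.txt

# sort, then uniq: removes every duplicate, but the output is sorted.
sort sample.txt | uniq > sorted_unique.txt

# uniq -d (after sorting): print each line that appears more than once.
sort sample.txt | uniq -d
```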

slakmagik · 01-10-2005, 01:56 PM

Well, yeah, but...

From 'sed1line5.2.txt', which probably isn't the original title:

Code:

 # delete duplicate, consecutive lines from a file (emulates "uniq").
 # First line in a set of duplicate lines is kept, rest are deleted.
 sed '$!N; /^\(.*\)\n\1$/!P; D'

 # delete duplicate, nonconsecutive lines from a file. Beware not to
 # overflow the buffer size of the hold space, or else use GNU sed.
 sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
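
Here is a rough sketch of both one-liners applied to the sample text, with the result written to cleaned.txt as the original question asked. The input file name sample.txt and the intermediate file name are made up for illustration. The first one-liner only removes the second "Gentle golden deer". The second one-liner also removes the later "River by the deer" and keeps the first occurrence of each line in its original position.

```shell
#!/bin/sh
# Recreate the sample text (sample.txt is a hypothetical name).
printf '%s\n' \
  'River by the deer' \
  'The crisp flower by the rain' \
  'Happy white river' \
  'River by the deer' \
  'The liquid flower' \
  'The cloud drifts across the storm' \
  'Gentle golden deer' \
  'Gentle golden deer' > sample.txt

# Consecutive duplicates only (like uniq): N appends the next line,
# P prints the first line unless it equals the second, D drops it and loops.
sed '$!N; /^\(.*\)\n\1$/!P; D' sample.txt > cleaned_consecutive.txt

# Any duplicates: the hold space accumulates every line kept so far, and a
# line is deleted if it already appears there. [ -~] stands for printable
# ASCII (space through tilde), so lines containing other characters, such
# as tabs, may not be caught.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' sample.txt > cleaned.txt
```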

http://sed.sourceforge.net/
http://www.student.northpark.edu/pem...ed/sedfaq.html
http://www.opengroup.org/onlinepubs/...9/xcu/sed.html
http://www-106.ibm.com/developerwork...ry/l-sed1.html
http://www-106.ibm.com/developerwork...ry/l-sed2.html
http://www-106.ibm.com/developerwork...ry/l-sed3.html

It is an awesome tool. And don't forget that it's ed's cousin! ed, sed, grep, more/less, even awk and vi/m - the core.

homey · 01-10-2005, 02:06 PM

Here's one way to do it with sort (note that the output comes out sorted, so the original line order is not kept):
sort -u file.txt > filename.txt
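
For comparison, here is a sketch that reuses the file.txt name from the post above. The output names sorted.txt and cleaned.txt are made up. sort -u removes the duplicates but reorders the lines. If the original order matters and awk is acceptable, the common awk idiom below keeps only the first occurrence of each line. This is a swapped-in technique, not something suggested earlier in this thread.

```shell
#!/bin/sh
# Recreate the sample text as file.txt.
printf '%s\n' \
  'River by the deer' \
  'The crisp flower by the rain' \
  'Happy white river' \
  'River by the deer' \
  'The liquid flower' \
  'The cloud drifts across the storm' \
  'Gentle golden deer' \
  'Gentle golden deer' > file.txt

# sort -u: duplicates removed, but the lines come out in sorted order.
sort -u file.txt > sorted.txt

# awk: seen[$0]++ evaluates to 0 the first time a line is seen, so
# !seen[$0]++ is true only for first occurrences; the input order is kept.
awk '!seen[$0]++' file.txt > cleaned.txt
```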

tireseas · 01-11-2005, 01:31 AM

Thanks digiot, that was very helpful and worked.

Any recommendations for good reference books/papers on sed?

Thanks homey - I'll try your recommendation and get back to you.

bigearsbilly · 01-11-2005, 04:16 AM

Personally, I think the poem is fine as it is.

sed is groovy.
Try O'Reilly's sed & awk.

http://www.amazon.co.uk/exec/obidos/ASIN/1565922255/qid=1105438475/sr=1-1/ref=sr_1_10_1/026-8352874-6384407

bigearsbilly · 01-11-2005, 04:17 AM

http://www.amazon.co.uk/exec/obidos/...352874-6384407

tireseas · 01-11-2005, 06:43 AM

Quote:

Originally posted by bigearsbilly
Personally, I think the poem is fine as it is.

Glad you liked the poem. I had nothing to do with it beyond deliberately duplicating two or so lines churned out by "nonsense" in order to illustrate the question. That either suggests that your poetic appreciation needs brushing up or that nonsense randomly generates some great poetry!!!

Also, thanks for the link to the Sed & Awk book. For now I'll go for the pocket reference guide, but the bigger book does seem worth keeping in mind.

Cheers

tireseas · 01-11-2005, 01:50 PM

Sorry digiot - I didn't twig the links posted under the code block!! D'Oh!!!

Cheers - now that I've cottoned on, that is!!

- Andy

slakmagik · 01-11-2005, 02:27 PM

Quote:

Originally posted by tireseas
Sorry digiot - I didn't twig the links posted under the code block!! D'Oh!!!

Cheers - now that I've cottoned on, that is!!

- Andy

Oh. I thought you saw those and wanted a book as well. I was going to suggest the same thing bigearsbilly did. Glad you found the links.

BTW, bigearsbilly - you could edit that first link and the thread'll stop being so wiiiiide.

bigearsbilly · 01-12-2005, 03:27 AM