LinuxQuestions.org
Old 01-10-2005, 01:41 PM   #1
tireseas
Member
 
Registered: Jun 2003
Location: London, UK
Distribution: Slackware 10 & 10.1
Posts: 149

Rep: Reputation: 15
Removing duplicate lines with sed


Hi

I'm trying to get to grips with sed. I have read the info page and looked through a few books, but I have not been able to figure out how to remove duplicate lines from a text file, and the example program in the info pages didn't make sense to me.

This is the source document*:
======================
River by the deer
The crisp flower by the rain
Happy white river
River by the deer
The liquid flower
The cloud drifts across the storm
Gentle golden deer
Gentle golden deer
======================
* generously spawned by nonsense-0.6 a random text generator

What I want to do is pass the source file to sed, have it delete the two duplicate lines, and write the cleaned text to another file, e.g. cleaned.txt

1. How can I do this using sed? I was thinking of using grep, but then I would still have to delete the duplicates, although grep would at least give me patterns to work with. Is it possible to do it without grep?

2. Has anyone come across a comprehensive resource on sed that covers this kind of task? I want to become more familiar with this evidently powerful tool.

Many thanks
 
Old 01-10-2005, 01:49 PM   #2
itsme86
Senior Member
 
Registered: Jan 2004
Location: Oregon, USA
Distribution: Slackware
Posts: 1,246

Rep: Reputation: 59
Look at the commands uniq and sort.
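
To illustrate that suggestion, here is a minimal sketch using the sample text from the first post. The file names poem.txt, uniq_only.txt and sorted_uniq.txt are assumptions made up for this example. It shows why uniq usually needs sort in front of it: uniq on its own only collapses duplicates that sit next to each other.

```shell
# Recreate the sample text from the first post (poem.txt is an assumed name).
cat > poem.txt <<'EOF'
River by the deer
The crisp flower by the rain
Happy white river
River by the deer
The liquid flower
The cloud drifts across the storm
Gentle golden deer
Gentle golden deer
EOF

# uniq only removes ADJACENT duplicates: the doubled "Gentle golden deer"
# is collapsed, but the two "River by the deer" lines are three lines
# apart, so both survive.
uniq poem.txt > uniq_only.txt

# Sorting first makes every duplicate adjacent, so uniq catches them all.
# The price is that the original line order is lost.
sort poem.txt | uniq > sorted_uniq.txt
```

Whether losing the original order matters depends on the file; for a poem it probably does.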
 
Old 01-10-2005, 01:56 PM   #3
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Well, yeah, but...

From 'sed1line5.2.txt', which probably isn't the original title:

Code:
 # delete duplicate, consecutive lines from a file (emulates "uniq").
 # First line in a set of duplicate lines is kept, rest are deleted.
 sed '$!N; /^\(.*\)\n\1$/!P; D'

 # delete duplicate, nonconsecutive lines from a file. Beware not to
 # overflow the buffer size of the hold space, or else use GNU sed.
 sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
http://sed.sourceforge.net/
http://www.student.northpark.edu/pem...ed/sedfaq.html
http://www.opengroup.org/onlinepubs/...9/xcu/sed.html
http://www-106.ibm.com/developerwork...ry/l-sed1.html
http://www-106.ibm.com/developerwork...ry/l-sed2.html
http://www-106.ibm.com/developerwork...ry/l-sed3.html

It is an awesome tool. And don't forget that it's ed's cousin! ed, sed, grep, more/less, even awk and vi/m - the core.
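
As a rough sketch of how the two one-liners above behave on the sample text from the first post: the file names poem.txt, consecutive.txt and cleaned.txt are assumptions for this example, and it assumes GNU sed. LC_ALL=C is added here (it is not in the original one-liners) so that the [ -~] range means printable ASCII.

```shell
# Recreate the sample text from the first post (poem.txt is an assumed name).
cat > poem.txt <<'EOF'
River by the deer
The crisp flower by the rain
Happy white river
River by the deer
The liquid flower
The cloud drifts across the storm
Gentle golden deer
Gentle golden deer
EOF

# First one-liner: behaves like uniq, so only the adjacent pair
# ("Gentle golden deer") is collapsed.
LC_ALL=C sed '$!N; /^\(.*\)\n\1$/!P; D' poem.txt > consecutive.txt

# Second one-liner: every line printed so far is kept in the hold space.
# G appends that history to the current line; if the current line already
# occurs in it, d drops the line, otherwise h saves the new history and
# P prints the line. Original order is preserved.
# Note: [ -~] only matches printable ASCII, so a line containing tabs or
# non-ASCII characters is never recognised as a duplicate.
LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' poem.txt > cleaned.txt
```

With this input, cleaned.txt should keep the first occurrence of each line in its original position, which is what the original question asked for.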
 
Old 01-10-2005, 02:06 PM   #4
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
Here's one way to do it with sort (it drops the duplicates, but it also sorts the lines):
sort -u file.txt > filename.txt
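
A small sketch of what the sort approach does to the sample text from the first post, next to a common awk idiom that keeps the original order. The awk line is a swapped-in technique, not something proposed in this thread, and the file names poem.txt, sorted.txt and ordered.txt are assumptions for this example. LC_ALL=C is set only to make the sort order predictable.

```shell
# Recreate the sample text from the first post (poem.txt is an assumed name).
cat > poem.txt <<'EOF'
River by the deer
The crisp flower by the rain
Happy white river
River by the deer
The liquid flower
The cloud drifts across the storm
Gentle golden deer
Gentle golden deer
EOF

# sort -u removes every duplicate, adjacent or not, but the output is
# in sorted order rather than the order of the source file.
LC_ALL=C sort -u poem.txt > sorted.txt

# Order-preserving alternative (awk, not sed): seen[$0]++ is 0 the first
# time a line appears, so !seen[$0]++ is true only for first occurrences,
# and awk prints just those.
awk '!seen[$0]++' poem.txt > ordered.txt
```

Both files end up with the same six lines; only the order differs.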
 
Old 01-11-2005, 01:31 AM   #5
tireseas
Member
 
Registered: Jun 2003
Location: London, UK
Distribution: Slackware 10 & 10.1
Posts: 149

Original Poster
Rep: Reputation: 15
Thanks digiot that was very helpful and worked.

Any recommendations for good reference books/papers on sed?

Thanks homey - I'll try your recommendation and get back to you.
 
Old 01-11-2005, 04:16 AM   #6
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
Personally, I think the poem is fine as it is.

sed is groovy.
O'Reilly's "sed & awk" is the book to get.

http://www.amazon.co.uk/exec/obidos/ASIN/1565922255/qid=1105438475/sr=1-1/ref=sr_1_10_1/026-8352874-6384407
 
Old 01-11-2005, 04:17 AM   #7
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
http://www.amazon.co.uk/exec/obidos/...352874-6384407
 
Old 01-11-2005, 06:43 AM   #8
tireseas
Member
 
Registered: Jun 2003
Location: London, UK
Distribution: Slackware 10 & 10.1
Posts: 149

Original Poster
Rep: Reputation: 15
Quote:
Originally posted by bigearsbilly
Personally, I think the poem is fine as it is.
Glad you liked the poem. I had nothing to do with it beyond deliberately duplicating two or so lines churned out by "nonsense" in order to illustrate the question. That either suggests that your poetic appreciation needs brushing up or that nonsense randomly generates some great poetry!!!

Also, thanks for the link to the Sed & Awk book. For now I'll go for the pocket reference guide, but the bigger book does seem worth keeping in mind.

Cheers
 
Old 01-11-2005, 01:50 PM   #9
tireseas
Member
 
Registered: Jun 2003
Location: London, UK
Distribution: Slackware 10 & 10.1
Posts: 149

Original Poster
Rep: Reputation: 15
Sorry digiot - I didn't twig the links posted under the code block!! D'Oh!!!

Cheers - now that I've cottoned on, that is!!

- Andy
 
Old 01-11-2005, 02:27 PM   #10
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Quote:
Originally posted by tireseas
Sorry digiot - I didn't twig the links posted under the code block!! D'Oh!!!

Cheers - now that I've cottoned on, that is!!

- Andy
Oh. I thought you saw those and wanted a book as well. I was going to suggest the same thing bigearsbilly did. Glad you found the links.

BTW, bigearsbilly - you could edit that first link and the thread'll stop being so wiiiiide.
 
Old 01-12-2005, 03:27 AM   #11
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
doh!
 
  

