ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Am trying to come to grips with sed. Have read the info page and done some scratching about in a few books, but have not been able to figure out how to remove duplicate lines in a text file, and the example program in the info pages just really didn't make sense to me.
This is the source document*:
======================
River by the deer
The crisp flower by the rain
Happy white river
River by the deer
The liquid flower
The cloud drifts across the storm
Gentle golden deer
Gentle golden deer
======================
* generously spawned by nonsense-0.6 a random text generator
What I am wishing to do using sed is to delete the two duplicate lines when I pass the source file to it and then output the cleaned text to another file, e.g. cleaned.txt
1. How can I do this using sed? I was thinking of grepping, but then I still have to delete the duplicates although grep at least would give me patterns to work with I suppose. Is it possible to do it without grep?
2. Has anyone come across a comprehensive resource for using sed that covers all these kinds of angles, because I am wanting to become more familiar with this evidently powerful tool.
From 'sed1line5.2.txt', which probably isn't the original title:
Code:
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
Originally posted by bigearsbilly
[B]Personally, I think the poem is fine as it is.
Glad you liked the poem. I had nothing to do with it beyond deliberately duplicating two or so lines churned out by "nonsense" in order to illustrate the question. That either suggests that your poetic appreciation needs brushing up or that nonsense randomly generates some great poetry!!!
Also, thanks for the link to the Sed & Awk book. For now I'll go for the pocket reference guide, but the bigger book does seem worth keeping in mind.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.