Bash - Deleting duplicate records
I have a text file full of user-submitted email addresses. I want to remove the duplicate records, but it isn't as simple as using "uniq." When I find a dupe I want to remove both of them, not just one. If it's possible I'd also like to create a text file containing all of the email addresses that had duplicates.
Is this possible? Thanks
I've changed things slightly. Instead of removing them completely I'd like to leave one, and only take the dupes out. I know I can do that with uniq, but how would I know which ones were taken out so I can write them to a file?
Try this:
Code:
vi x
Thanks for the reply.
I don't know if this was the best way, but I was able to do it like this:
Code:
sort participants | uniq > temp1
sort participants > temp2
comm -1 -3 temp1 temp2 > temp3
sort temp3 | uniq > outputfile
Try "sort participants|uniq -d"
I suspect you'll probably get the same result (but I confess - I don't know for sure!) Anyway, glad you got it working! Your .. PSM
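For anyone who wants to try the "uniq -d" suggestion, here is a minimal sketch. The file names and the sample addresses are made up for illustration; substitute your own "participants" file for the sample input.

```shell
# Hypothetical sample data standing in for the real "participants" file.
printf '%s\n' \
    bob@example.com alice@example.com bob@example.com \
    carol@example.com alice@example.com bob@example.com \
    > sample_participants.txt

# uniq only compares adjacent lines, so the input is sorted first.
# -d prints one copy of each line that occurs more than once.
sort sample_participants.txt | uniq -d > sample_dupes.txt

# sort -u keeps exactly one copy of every address.
sort -u sample_participants.txt > sample_cleaned.txt

cat sample_dupes.txt
```

With this sample input, sample_dupes.txt should list alice@example.com and bob@example.com, and sample_cleaned.txt should hold the three distinct addresses. For the original request (dropping every copy of a duplicated address), "uniq -u" on the sorted file prints only the lines that occur exactly once.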
You can use "uniq -c" on the sorted file, which will prefix each line with a count of the number of times the line occurred; any line with a count greater than 1 had duplicates.
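A rough sketch of the "uniq -c" approach follows. The file names and sample addresses are hypothetical, and the awk filter is one possible way to pick out the counts greater than 1; it assumes the addresses contain no whitespace.

```shell
# Hypothetical sample data; replace with the real list of addresses.
printf '%s\n' \
    bob@example.com alice@example.com bob@example.com \
    carol@example.com alice@example.com bob@example.com \
    > sample_counts_input.txt

# Sort first so identical addresses are adjacent, then count each run.
sort sample_counts_input.txt | uniq -c > sample_counts.txt

# Each line of uniq -c output is "<count> <address>".
# Keep only the addresses whose count is greater than 1.
awk '$1 > 1 { print $2 }' sample_counts.txt > sample_dupes_from_counts.txt

cat sample_dupes_from_counts.txt
```

The counts file is handy if you also want to know how many times each address was submitted, not just which ones were repeated.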