Bash

Wire323 · 12-03-2005, 07:18 PM

I have a text file full of user-submitted email addresses. I want to remove the duplicate records, but it isn't as simple as using "uniq." When I find a dupe I want to remove both of them, not just one. If it's possible I'd also like to create a text file containing all of the email addresses that had duplicates.

Is this possible?

Thanks

Wire323 · 12-03-2005, 08:50 PM

I've changed things slightly. Instead of removing them completely, I'd like to leave one copy of each and only take the extra dupes out. I know I can do that with uniq, but how would I know which ones were taken out so I can write them to a file?

paulsm4 · 12-03-2005, 10:14 PM

Try this:

Code:

vi x
aaa
bbb
aaa
ccc
aaa

sort x|uniq -d
aaa
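
For reference, here is a minimal sketch of how the same idea could be used to write the results to files. The file names x, unique.txt, dupes.txt and singles.txt are only placeholders for illustration, and the sample data is the same five lines as above.

```shell
# Sample input (placeholder file name and data)
printf '%s\n' aaa bbb aaa ccc aaa > x

# One copy of every address
sort -u x > unique.txt

# Every address that occurred more than once, listed once
sort x | uniq -d > dupes.txt

# Only the addresses that were never duplicated (all copies of a dupe removed)
sort x | uniq -u > singles.txt
```

The input is sorted first because uniq only compares adjacent lines. The last command matches the original request (drop every copy of a duplicated address), while the first two match the revised one.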

Wire323 · 12-03-2005, 10:57 PM

Thanks for the reply.

I don't know if this was the best way, but I was able to do it like this:

sort participants | uniq > temp1
sort participants > temp2
comm -1 -3 temp1 temp2 > temp3
sort temp3 | uniq > outputfile

paulsm4 · 12-03-2005, 11:39 PM

Try "sort participants|uniq -d"

I suspect you'll probably get the same result (but I confess - I don't know for sure!)

Anyway, glad you got it working!

Yours .. PSM
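
One way to check whether the two approaches agree is to run both on the same input and compare the output files. This is only a sketch: the sample data and the file name dupes.txt are made up for illustration, and a single test file does not prove the two are equivalent for every input.

```shell
# Made-up sample data with two duplicated entries
printf '%s\n' aaa bbb aaa ccc aaa bbb > participants

# The four-step approach from earlier in the thread
sort participants | uniq > temp1
sort participants > temp2
comm -1 -3 temp1 temp2 > temp3
sort temp3 | uniq > outputfile

# The one-liner
sort participants | uniq -d > dupes.txt

# cmp -s is silent and only sets the exit status
if cmp -s outputfile dupes.txt; then
    echo "same result"
else
    echo "results differ"
fi
```

The comm step keeps the lines that appear only in temp2, which are the extra copies of each duplicated entry, so collapsing them with uniq should list each duplicated address once, just as uniq -d does.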

eddiebaby1023 · 12-04-2005, 08:51 AM

You can use "uniq -c" on the sorted file, which will prefix each line with a count of the number of times the line occurred; any line with a count greater than 1 was a duplicate.
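
A rough sketch of how the counts could be filtered, assuming the addresses contain no spaces (so awk sees the whole address as the second field). The file names x and dupes.txt are placeholders.

```shell
# Placeholder sample data
printf '%s\n' aaa bbb aaa ccc aaa > x

# Show each distinct line with its count (uniq needs sorted input)
sort x | uniq -c

# Keep only entries whose count is greater than 1 and drop the count column
sort x | uniq -c | awk '$1 > 1 { print $2 }' > dupes.txt
```

This gives the same list as "uniq -d", but the counts are available if you also want to know how many times each address was submitted.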