LinuxQuestions.org: Linux - General

boxb29 08-15-2009 02:22 AM

uniq -u : does not seem to remove duplicate lines
 
I am trying to comb through my local.cf file and remove all the duplicate blacklist_from entries. I ran

uniq -u local.cf output.cf

It did trim about 45 lines out of the file, but there are still many duplicate lines. I thought maybe they were somehow different in a way not visible to the eye, but then I ran:

sort output.cf | uniq -dc

This gave me a count for each duplicated line, and as you can see below, there are still plenty of them.

HELP :)


root@LINUX03:/home/backups# sort output.cf | uniq -dc
3
11 #
12 blacklist_from 1800FLOWERS@e.1800flowers.com
2 blacklist_from acrane@amgacademy.com
2 blacklist_from alejmagna@hotmail.com
10 blacklist_from alerts@personals.yahoo.com
3 blacklist_from Allen_Brothers@mail.vresp.com
8 blacklist_from Borders@e.borders.com
2 blacklist_from buy.com_offers@enews.buy.com
5 blacklist_from capitalone@email.capitalone.com
2 blacklist_from customerservice@duebrightlive.info
2 blacklist_from customerservice@ehealthinsurance.com
2 blacklist_from customerservice@mymorepayhomeonline.info
2 blacklist_from customerservice@youreraseduelive.info
2 blacklist_from directv@customerinfo.directv.com
3 blacklist_from email@email.creditreport.com
8 blacklist_from email@email.hotels.com
4 blacklist_from etrade@email.etradefinancial.com
30 blacklist_from group-digests@linkedin.com
4 blacklist_from HHonors@h3.hilton.com
4 blacklist_from info@aiueducationonline.com
4 blacklist_from info@birdiebug.com
2 blacklist_from info@promo-em.jetblue.com
2 blacklist_from info@samstailor.com
2 blacklist_from invite@naymz.com
6 blacklist_from iprint@specials.iprint.com
12 blacklist_from JobAlerts@CyberCoders.com
2 blacklist_from lilly@sportsub.com
16 blacklist_from listmaster@thegolfchannel.com
7 blacklist_from mail@netapp.com
2 blacklist_from mail@news.beachcamera.com
2 blacklist_from microsoft@reply.digitalriver.com
2 blacklist_from mike.moreno_at_mbofpleasanton.com@mmserver.com
2 blacklist_from Mimosa_Systems@mail.vresp.com
6 blacklist_from movies@news.fandango.com
2 blacklist_from mwilkinson@serrahs.com
2 blacklist_from nancyp@saintmatthew.org
2 blacklist_from newsletter@reply.ticketmaster.com
2 blacklist_from notifications@email.etradefinancial.com
6 blacklist_from NutriSystem@news.nutrisystem.com
4 blacklist_from paypal@email.paypal.com
2 blacklist_from PGATOUR@pgatouremail.com
2 blacklist_from PGATOUR@weic11.com
4 blacklist_from radioshack@em.radioshack.com
2 blacklist_from Rebecca_Salie@mail.vresp.com
4 blacklist_from replies@oracle-mail.com
10 blacklist_from reply@igmemail.com
2 blacklist_from rexspelling@resumespider.com
28 blacklist_from rushinahurry@rushlimbaugh.com
2 blacklist_from sanjoseexecutives@gmail.com
2 blacklist_from store-news@amazon.com
4 blacklist_from Store-News@ShopAETV.p0.com
2 blacklist_from support@myremoveliability.info
2 blacklist_from TheHartford@weic11.com
4 blacklist_from updates@linkedin.com
4 blacklist_from update@stubhub-mail.com
2 blacklist_from ups@upsemail.com
2 blacklist_from vmwareteam@connect.vmware.com
2 blacklist_from voyages@viator.messages1.com
2 blacklist_from WebEx@weic11.com

JulianTosh 08-15-2009 02:38 AM

uniq only compares adjacent lines, so you must supply it with sorted data. Try 'sort local.cf | uniq -c'.
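
A quick illustration with some made-up data (demo.txt is just an example file name); uniq only ever compares a line with the one directly above it:

Code:

$ printf 'a\nb\na\n' > demo.txt        # three-line file: a, b, a
$ uniq -u demo.txt                     # the two a's are not adjacent, so both survive
a
b
a
$ sort demo.txt | uniq -u              # sorted first, so the duplicate a's meet and get dropped
b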

jschiwal 08-15-2009 02:42 AM

The same goes for the "comm" command.
comm -3 <(sort list1) <(sort list2)
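
For instance, with two small made-up lists (comm also needs sorted input; -3 suppresses the lines common to both files, so only the lines unique to each file remain, with the second file's lines indented by a tab):

Code:

$ printf 'apple\nbanana\ncherry\n' > list1
$ printf 'banana\ndate\n' > list2
$ comm -3 <(sort list1) <(sort list2)
apple
cherry
        date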

Nevahre 08-15-2009 03:47 AM

Why use two programs? sort has a unique (-u/--unique) option:
sort -u local.cf

w1k0 08-15-2009 02:05 PM

Quote:

Originally Posted by Nevahre (Post 3644150)
Why use two programs? sort has a unique (-u/--unique) option:
sort -u local.cf

Which command to use depends on what result you expect.

Look at this file:

$ cat file
Code:

one
two
three
two
three
three
four
four
four
four

That command prints only the lines that are not consecutively duplicated (every copy of a consecutive duplicate is removed):

$ uniq -u file
Code:

one
two
three
two

That command prints only the consecutively duplicated lines, with a count for each:

$ uniq -dc file
Code:

      2 three
      4 four

That command counts the occurrences of all lines:

$ sort file | uniq -c
Code:

      4 four
      1 one
      3 three
      2 two

That command does the same but sorts the result by the number of occurrences, highest first:

$ sort file | uniq -c | sort -nr
Code:

      4 four
      3 three
      2 two
      1 one

That command displays each distinct line exactly once:

$ sort -u file
Code:

four
one
three
two
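
One more variant, not using sort at all, just a sketch: awk can drop the duplicates while keeping the original order of the lines (it prints a line only the first time it appears), which may matter for a configuration file:

$ awk '!seen[$0]++' file
Code:

one
two
three
four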


Nevahre 08-15-2009 02:56 PM

I know that.

I was just trying to say that 'sort file | uniq' is longer than it needs to be; use 'sort -u file' instead.

w1k0 08-15-2009 03:15 PM

I quoted your comment in my post, but the post as a whole was directed to boxb29 rather than to you. I'm not sure what he'd like to achieve, but I suppose he'd like to end up with a "clean" file containing each unique line only once. If so, your advice is the best solution to that problem.

boxb29 08-15-2009 06:34 PM

perfect...

Code:

sort -u local.cf > new.file

That did the trick... thanks, all!

