LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the HOWTOs, this is the place!

11-09-2005, 11:11 AM   #1
Mr_H
LQ Newbie
Registered: Sep 2003
Posts: 2
Reputation: 0

Comparing 2 Files for Duplicates


I've searched the forums for this and haven't found it. If someone has seen it out there, please point me in the right direction.

Now, onto the problem:
I have two files:
List 1 is a list of users who are going to be deleted. (1,600 people)
List 2 is a list of users who have logged on in the past 30 days. (11,000 people)

The files are not necessarily in order (so a line-by-line comparison doesn't work).

I need a way to compare these two files, look for duplicates, and print those out, i.e.: is there someone on List 1 who is also on List 2?

I've looked at diff, but it just shows the differences (of which there are obviously a lot).
comm does a line-by-line comparison, and so does cmp.

Any suggestions from folks on what I can do here? It's probably something obvious that's shooting right over my head, and I do apologize if it's a stupid question, but it's driving me nuts.

Thanks!
H
 
11-09-2005, 11:31 AM   #2
Ynot Irucrem
Member
Registered: Apr 2005
Location: Perth, Western Australia
Distribution: Debian
Posts: 233
Reputation: 30
Look into sort. Then you can do a line-by-line comparison.
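For illustration, here is a minimal sketch of that suggestion, assuming a POSIX shell with sort, comm and mktemp available. The helper name common_lines is hypothetical and not from this thread. Once both files are sorted, comm -12 prints only the lines the two files have in common, which answers "who is on both lists":

```shell
#!/bin/sh
# Sketch: sort both lists, then let comm do the line-by-line comparison.
# common_lines is a hypothetical helper name used only for this example.
set -eu

# comm needs both inputs sorted in the same collation order that sort
# used; forcing the C locale keeps the two tools consistent.
LC_ALL=C
export LC_ALL

common_lines() {
    work=$(mktemp -d)
    # sort -u also drops repeats inside a single list.
    sort -u "$1" > "$work/a"
    sort -u "$2" > "$work/b"
    # -1 hides lines only in the first file, -2 hides lines only in the
    # second file, so -12 leaves the lines present in both.
    comm -12 "$work/a" "$work/b"
    rm -rf "$work"
}

# Usage with the file names from this thread:
#   common_lines list1.txt list2.txt
```

Swapping -12 for -23 on the same sorted files would instead print the lines found only in the first file, i.e. the users on List 1 who have not logged on.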
 
11-09-2005, 11:40 AM   #3
theYinYeti
Senior Member
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897
Reputation: 66
This should do the trick:
Code:
sort list1.txt list2.txt | uniq -d
Yves.
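One caveat: sort list1.txt list2.txt | uniq -d reports any line that occurs more than once in the combined input, so a name listed twice in the same file shows up even if it is missing from the other file. A minimal sketch that guards against this, assuming a POSIX shell (the function name on_both_lists is made up for the example), deduplicates each file before merging:

```shell
#!/bin/sh
# Sketch: the sort | uniq -d pipeline, with each list deduplicated first.
# on_both_lists is a hypothetical helper name used only for this example.
set -eu

on_both_lists() {
    # After sort -u, each name occurs at most once per file, so in the
    # merged and re-sorted stream a name repeats only if it is in both.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Usage with the file names from this thread:
#   on_both_lists list1.txt list2.txt
```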
 
11-09-2005, 11:55 AM   #4
Ynot Irucrem
Member
Registered: Apr 2005
Location: Perth, Western Australia
Distribution: Debian
Posts: 233
Reputation: 30
Isn't sorting redundant there? Wouldn't
Code:
cat file1.txt `echo` file2.txt | uniq -d
be faster? Yes, I know I suggested sorting, but I didn't know about the uniq command until I just looked it up. (Damn Linux... there are too many useful utilities!)
 
11-09-2005, 12:17 PM   #5
Mr_H
LQ Newbie
Registered: Sep 2003
Posts: 2
Original Poster
Reputation: 0
Thanks, folks, much appreciated! The sort one seems to have worked (the cat one gave me only four results, and three of them were the same).

Anyhow, it helped out a lot.
 
11-09-2005, 12:43 PM   #6
Ynot Irucrem
Member
Registered: Apr 2005
Location: Perth, Western Australia
Distribution: Debian
Posts: 233
Reputation: 30
Hmm... I didn't read that man page properly, and I was thinking backwards. The way I was thinking, it should have been
Code:
echo -e "`cat file1.txt`\n`cat file2.txt`" | uniq -d
But I didn't see this part:
Quote:
Repeated lines in the input will not be detected if they are not adjacent, so it may be necessary to sort the files first.
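The man-page note can be checked with a tiny sketch (POSIX shell; the four names are made-up sample data standing in for two concatenated lists). uniq -d only compares each line with the one directly before it, so a name that appears in both lists but not on consecutive lines is missed until the input is sorted:

```shell
#!/bin/sh
# Sketch: why "cat file1 file2 | uniq -d" misses duplicates.
# The sample names are hypothetical; "bob" occurs twice, but not on
# adjacent lines.
set -eu

input='alice
bob
carol
bob'

# uniq -d only compares each line with the previous one: nothing printed.
unsorted=$(printf '%s\n' "$input" | uniq -d)

# After sorting, the two "bob" lines are adjacent and uniq -d reports it.
sorted=$(printf '%s\n' "$input" | sort | uniq -d)

printf 'without sort: [%s]\n' "$unsorted"   # without sort: []
printf 'with sort:    [%s]\n' "$sorted"     # with sort:    [bob]
```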

Last edited by Ynot Irucrem; 11-09-2005 at 12:46 PM.
 
  



All times are GMT -5.
