LinuxQuestions.org


gregmcc 04-22-2009 11:35 AM

Comparing files and copying differences
 
I've got 2 files - File1 and File2

File1:
abc:123
def:456
tex:765

File2:
abc:567

What I would like to do is compare the first column (up to the :) in the two files and write the changes to File2.

In other words, check abc from File1 against File2; if it exists in File2 then skip it, otherwise append the line. Then check def, then tex, and so on.

So once the script has run File2 would contain:

abc:567
def:456
tex:765

Any ideas - I've played around with awk and for loops but don't seem to be getting anywhere :(

kapilsingh 04-22-2009 01:04 PM

I don't have a complete solution for you, but using the uniq command on your example I found that

uniq file1 file2

leaves file2 with this content:

abc:123
def:456
tex:765
I think you want 567 in place of 123.
"comm" command may be helpful for you.

Thanks
Kapil Singh

gregmcc 04-22-2009 01:23 PM

You might be onto something, but the problem with uniq and comm is that they compare the whole line.

I only want to compare the first field - colon delimited.

Update: After many hours of searching I came across this post which does the job!!! :)

http://www.unix.com/shell-programmin...n-2-files.html

jf.argentino 04-22-2009 01:57 PM

Maybe by using temporary files filled by a gawk command?
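
One reading of that suggestion, sketched out (these are not the poster's actual commands; file names taken from the example):
Code:

# dump the keys already present in File2 into a temporary file
gawk -F: '{ print $1 }' File2 > /tmp/file2.keys
# append File1 lines whose key is not in that list
gawk -F: 'NR==FNR { seen[$1] = 1; next } !($1 in seen)' /tmp/file2.keys File1 >> File2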

Quigi 04-22-2009 02:23 PM

Quote:

Originally Posted by gregmcc (Post 3517431)
In other words, check abc from File1 against File2; if it exists in File2 then skip it, otherwise append the line. Then check def, then tex, and so on.

You don't specify, so I'll infer from your example that the keys are in ascending order in both input files. Then this will write the desired result to stdout:
Code:

sort -t: -msuk1,1 File2 File1
See man sort.

To update File2, don't directly redirect output, because that would truncate File2 before sort reads it. Rather,
Code:

sort -t: -msuk1,1 File2 File1 > tmp
mv tmp File2

More generally, I think your objective is to merge two "associative arrays" (AKA "dictionaries" in PostScript or "hashes" in Perl). Do you care to tell us why you want to do this?

More robustly and flexibly than the above "sort" call, you could use Perl to build up the hash %t, then write it all out at the end. Note the opposite order of arguments:
Code:

perl -we 'while(<>) {($k,$v)=split /:/, $_, 2; $t{$k}=$v;} while (@e = each %t) {print join ":", @e}' File1 File2
Or, if you want the output rows ordered (and with nicer names):
Code:

perl -we 'while(<>) {($key,$value)=split /:/, $_, 2; $hash{$key}=$value;} for (sort keys %hash) {print $_, ":", $hash{$_}};' File1 File2
Quote:

I've played around with awk and for loops but don't seem to be getting anywhere
I learned csh scripting, and awk, and sed, and (ba)sh. When I came across Perl, I realized that was the one tool I should have learned in the beginning. Give it a try!

Quote:

Originally Posted by kapilsingh
uniq file1 file2

That keeps the unique lines from file1 (only!) and overwrites file2. Not the solution. Also, as gregmcc points out, this looks at the whole line.

You can tell uniq to heed only the first 3 characters of each line (the -w option). That would do for this example, but it isn't really colon-delimited.
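
For the 3-character keys of the example, that might look like this (a sketch; -w is GNU uniq, and it breaks as soon as a key isn't exactly 3 characters long):
Code:

# stable sort on the key only, File2 listed first so its values win,
# then drop lines whose first 3 characters repeat
sort -t: -s -k1,1 File2 File1 | uniq -w3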

/Christian

gregmcc 04-22-2009 03:07 PM

Thanks for the reply.

I should have specified - The keys are in a random order

Quote:

Originally Posted by Quigi (Post 3517554)
You don't specify, so I'll infer from your example that the keys are in ascending order in both input files. Then this will write the desired result to stdout:
Code:

sort -t: -msuk1,1 File2 File1
See man sort.

I tried this and it works great if the files are already sorted.

Quote:

More generally, I think your objective is to merge two "associative arrays" (AKA "dictionaries" in PostScript or "hashes" in Perl). Do you care to tell us why you want to do this?
File1 is on one server and File2 is on another server. I want to keep File2 up to date with new info that is added to File1. But I couldn't do a copy or rsync as the file content is not exactly the same.

I ended up using this:

Code:

awk -F ":" 'BEGIN{while(getline<"/tmp/file1") a [$1]=1 } ; a [$1] !=1 {print $0 } ' /tmp/file2 > /tmp/file.diff
Still not 100% sure what it does but it works :)
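
Spelled out with comments, the same logic looks roughly like this (a sketch, not the exact command above): it loads every key of /tmp/file1 into an array, then prints only the /tmp/file2 lines whose key is not in that array.
Code:

awk -F ":" '
    # read /tmp/file1 up front; getline splits each line on ":",
    # so a[$1]=1 remembers every key found in that file
    BEGIN { while ((getline < "/tmp/file1") > 0) a[$1] = 1 }
    # main rule, run per line of /tmp/file2: print the line only if
    # its key was not seen in /tmp/file1
    a[$1] != 1 { print $0 }
' /tmp/file2 > /tmp/file.diff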

Libu 04-22-2009 03:20 PM

How about
Quote:

grep -v `cut -d":" -f1 File2` File1 >> File2
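
Note that, as written, only the first key produced by cut becomes the grep pattern, and any further keys are treated as extra file names. A variant that feeds every key to grep as an anchored pattern might be (a sketch, assuming GNU grep's -f -):
Code:

cut -d: -f1 File2 | sed 's/.*/^&:/' | grep -v -f - File1 >> File2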

Quigi 04-23-2009 12:54 PM

Quote:

Originally Posted by gregmcc (Post 3517590)
Thanks for the reply.

I should have specified - The keys are in a random order
I tried this and it works great if the files are already sorted.

As you probably saw in the man page, "-m" tells sort that the files are already sorted. If they aren't, simply drop the "m", and sort will order them. The keys will be in order in the output. I can't tell from your example if that's a problem.
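
Concretely, that would be something like (a sketch, same flags as before minus the m):
Code:

sort -t: -suk1,1 File2 File1 > tmp
mv tmp File2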

Or use one of the Perl one-liners.

Quote:

File1 is on one server and File2 is on another server. I want to keep File2 up to date with new info that is added to File1. But I couldn't do a copy or rsync as the file content is not exactly the same.
OK, makes sense.

