[SOLVED] awk comparing first columns of two files

anapaula · 11-17-2011, 03:34 PM

Hi, folks...

I have two files:

file1:

xcvfdertyu
asdcvfgtre
asdfghnbvc
werdfcvbgf

file2:
xcvfdertyu dfdsdgfdsgsdgsdfdsfdfsfsdfdsfgsd
asdcvfgtre sdfgsddfsdfdsfdsfsdfdsfgdsfgdsfsd
asdfghnbvc sdfsdffgsdfsdfdsfdsfsdsafdsgrhbfh
werdfcvbgf xxcvcvdssdfeafefaefrertthythmjnhm

I need to check whether each entry in the first column of file1 exists in the first column of file2 and, if it does, print the first column of file1 followed by the first and second columns of file2, like this:

xcvfdertyu xcvfdertyu dfdsdgfdsgsdgsdfdsfdfsfsdfdsfgsd
asdcvfgtre asdcvfgtre sdfgsddfsdfdsfdsfsdfdsfgdsfgdsfsd
asdfghnbvc asdfghnbvc sdfsdffgsdfsdfdsfdsfsdsafdsgrhbfh
werdfcvbgf werdfcvbgf xxcvcvdssdfeafefaefrertthythmjnhm

I tried this:
{
    while (getline < ARGV[1]) {
        field1 = $1;
        while (getline < ARGV[2]) {
            field2 = $1;
            field3 = $2;
            if (field1 == field2) {
                print field1, field2, field3;
            }
        }
    }
}
but it returns only the first line.

Thanks a lot

Ana Paula

colucix · 11-17-2011, 03:50 PM

Indeed, two nested while getline loops are not the solution: after the outer loop reads the first line of file1, the inner loop immediately reads file2 all the way to the end. Since file2 is never closed and reopened, every later pass through the inner loop hits end-of-file right away, so only the first line of file1 is ever compared.
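For illustration only, here is a minimal sketch of how the nested-loop idea could be made to work by closing file2 after each inner pass, so that the next getline starts again from the top. The sample data (with shortened second columns), the temporary directory and the variable names line1, line2, a, b and nested_out are assumptions made for this example, not part of the original script. It rereads file2 once per line of file1, so it is slow on large files.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory (second column shortened).
dir=$(mktemp -d)
printf '%s\n' xcvfdertyu asdcvfgtre > "$dir/file1"
printf '%s\n' 'xcvfdertyu aaa' 'asdcvfgtre bbb' > "$dir/file2"

nested_out=$(awk 'BEGIN {
    # Outer loop: read file1 one line at a time.
    while ((getline line1 < ARGV[1]) > 0) {
        split(line1, a)
        # Inner loop: scan file2 from the top.
        while ((getline line2 < ARGV[2]) > 0) {
            split(line2, b)
            if (a[1] == b[1]) print a[1], b[1], b[2]
        }
        # Without this close(), the next inner loop would hit end-of-file at once.
        close(ARGV[2])
    }
    exit
}' "$dir/file1" "$dir/file2")

printf '%s\n' "$nested_out"
rm -rf "$dir"
```

Comparing the getline return value against 0 also keeps the loop from spinning forever if a file cannot be opened, since getline returns -1 in that case.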

I would try something like this: first let awk read all of file1 and store the first field of each line as an index of an array; then let awk read the second file and check whether $1 is an index of that array:

Code:

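# First file: remember every key found in column 1
FNR == NR {
  _[$1]++
}

# Later files: print the key followed by the whole line when the key is known
FNR < NR {
  if ( $1 in _ ) print $1, $0
}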

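This uses the difference between the built-in variables FNR (the record number within the current file) and NR (the total number of records read so far) to distinguish between the two files: the two are equal only while the first file is being read. Run as: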

Code:

awk 'FNR == NR {_[$1]++} FNR < NR {if ( $1 in _ ) print $1, $0}' file1 file2

that is, by passing file1 as the first argument. Hope this helps.
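To see how FNR and NR behave, a small sketch along these lines may help. It first prints both counters for every line and then runs the one-liner. The sample data (shortened second columns, plus one non-matching key in each file), the temporary directory and the shell variable names counters and joined are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' xcvfdertyu asdcvfgtre nomatchkey > "$dir/file1"
printf '%s\n' 'xcvfdertyu aaa' 'asdcvfgtre bbb' 'otherkey ccc' > "$dir/file2"
cd "$dir" || exit 1

# Print the file name, NR and FNR for each line:
# NR keeps counting across files, FNR restarts at 1 for file2.
counters=$(awk '{ print FILENAME, NR, FNR }' file1 file2)
printf '%s\n' "$counters"

# The one-liner itself: only keys present in file1 are printed.
joined=$(awk 'FNR == NR { _[$1]++ } FNR < NR { if ($1 in _) print $1, $0 }' file1 file2)
printf '%s\n' "$joined"

cd / && rm -rf "$dir"
```

One caveat: if file1 is empty, FNR == NR also holds while file2 is being read, so file2 would be treated as the lookup file and nothing would be printed.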