[SOLVED] Comparing two fields in two files using Awk.
But the problem I am facing is that when the code hits a complete mismatch, i.e. when no entry in $3 of file2 matches $7 of file1, I should ideally get a blank outfile.
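For reference, a minimal sketch of one way to always get an outfile, even a blank one. It assumes the comparison is $3 of file2 against $7 of file1 as described above; the temporary directory and the sample rows (copied from later in this thread) are only there to make it runnable, and this is not necessarily the code that was posted.

```shell
# Sample rows taken from this thread; the paths are placeholders.
tmp=$(mktemp -d)
printf '%s\n' '#=GS D3DN80_HUMAN 40-478 AC D3DN80.1' > "$tmp/file2"
printf '%s\n' 'Tmp39 PF10271.3 423 ENSP00000326063 488 1.2e-201 41-478' > "$tmp/file1"

# First pass (file2): remember every $3.
# Second pass (file1): print only the rows whose $7 was seen.
# The shell creates outfile before awk starts, so a complete
# mismatch simply leaves it empty.
awk 'NR == FNR { seen[$3]; next } $7 in seen' "$tmp/file2" "$tmp/file1" > "$tmp/outfile"

# Byte count of the result; 40-478 and 41-478 differ, so nothing is written.
wc -c < "$tmp/outfile"
```

Because the redirection is done by the shell and not inside awk, the outfile exists whether or not any row matches.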
Thanks a lot, Tink.
That was great. It was just a simple getline, and I had been trying to improve the code for the past 8 hours. Jeez!
@grail
Say I have a list(a) of files to be matched against the files in list(b), with 4 files in each list.
Quote:
List(a) List(b)
A 1
B 2
C 3
D 4
1 is matched with A --> if there is a match, print the result.
2 is matched with B --> if there is a match, print. But the output is blank, because there isn't any match between the two files.
Then 2 is matched again, this time with C!
3 is matched with D!
Line specificity is lost.
Anyway, I got the result. Thanks again, Tink.
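A rough sketch of keeping the pairing fixed when one pair produces no output. It is not the getline fix from this thread. It assumes two hypothetical list files, lista and listb, each naming one file per line in matching order, and it assumes the list(a) files carry the range in $7 and the list(b) files in $3. The miniature data is made up.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Made-up miniature data: A and 1 share a range, B and 2 do not.
printf 'n acc 1 id 2 ev 40-478\n' > A
printf '#=GS P1 40-478 AC P1.1\n' > 1
printf 'n acc 1 id 2 ev 10-20\n'  > B
printf '#=GS P2 99-100 AC P2.1\n' > 2
printf 'A\nB\n' > lista
printf '1\n2\n' > listb

# paste puts the Nth name from each list on the same line, so the
# pairing is decided up front and cannot shift when a result is blank.
paste lista listb | while read -r a b; do
    awk 'NR == FNR { seen[$3]; next } $7 in seen' "$b" "$a" > "out_${a}_${b}"
done

ls out_*
```

Each pair gets its own output file, so a pair with no match leaves an empty file behind and the next pair still lines up.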
You see this line in file2: #=GS D3DN80_HUMAN 40-478 AC D3DN80.1
and this line in file1: Tmp39 PF10271.3 423 ENSP00000326063 488 1.2e-201 41-478
40-478...and 41-478 is the same.
However, I tried removing the field separators and matching the lines that contain similar fields, and got some weird, erroneous results.
$ cat Unipfam6
#=GF ID 2-oxoacid_dh
#=GF AC PF00198.17
#=GS Q86SW4_HUMAN 203-279 AC Q86SW4.1
#=GS Q86TW7_HUMAN 136-251 AC Q86TW7.1
#=GS Q86TQ8_HUMAN 132-307 AC Q86TQ8.1
#=GS Q16187_HUMAN 218-449 AC Q16187.1
#=GS Q6IBS5_HUMAN 220-451 AC Q6IBS5.1
#=GS B7Z5W8_HUMAN 134-365 AC B7Z5W8.1
#=GS Q86YI5_HUMAN 417-647 AC Q86YI5.1
#=GS B4DLQ2_HUMAN 198-428 AC B4DLQ2.1
#=GS Q01991_HUMAN 81-220 AC Q01991.1
#=GS B4DS43_HUMAN 188-418 AC B4DS43.1
#=GS B4DJX1_HUMAN 361-591 AC B4DJX1.1
#=GS B4DW62_HUMAN 123-274 AC B4DW62.1
#=GS D3DR11_HUMAN 272-501 AC D3DR11.1
#=GS B4E1Q7_HUMAN 67-298 AC B4E1Q7.1
#=GS Q5VVL7_HUMAN 248-317 AC Q5VVL7.1
I get a single result with this code.
When the field separator "-" is replaced with " ", the number of fields increases, and it differs from file to file. How?
Here it is:
Quote:
2-oxoacid_dh PF00198.17 231 ENSP00000445698 301 3.7e-85 69-29
For this line, the number of fields after the replacement is 10, because the hyphens in 2-oxoacid_dh and 3.7e-85 get replaced too.
So I really can't use $10 or $9 to match the lines.
I think I will make a field separator unique to $7 so that the hyphen thing won't be a problem.
I'll be back with the final code.
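This is not the final code, only a sketch of one alternative to a special separator: leave the default whitespace splitting alone and cut just $7 with awk's split(). The line is the quoted one from above; the variable names are placeholders.

```shell
line='2-oxoacid_dh PF00198.17 231 ENSP00000445698 301 3.7e-85 69-29'

# Treating "-" as a separator everywhere also cuts "2-oxoacid_dh"
# and "3.7e-85", which is where the extra fields come from.
all_fields=$(printf '%s\n' "$line" | awk -F'[- ]' '{ print NF }')

# split() cuts only $7, so the other six fields keep their positions.
range=$(printf '%s\n' "$line" | awk '{ n = split($7, r, "-"); print NF, n, r[1], r[2] }')

echo "fields with -F'[- ]': $all_fields"
echo "NF, parts, start, end: $range"
```

With this approach $7 is still $7 in every file, and the start and end of the range are available as r[1] and r[2] for the comparison.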