LinuxQuestions.org

bash matching two files lines by lines (https://www.linuxquestions.org/questions/programming-9/bash-matching-two-files-lines-by-lines-923057/)

rperezalejo 01-10-2012 08:42 AM

bash matching two files lines by lines
 
Hello people, I have two files.

One with:
a,b,c
d,e,f

The other:
1
2

I need to get a single file with something like:
a,1
b,1
c,1
d,2
e,2
f,2

I tried a nested for loop, but it doesn't work, and I have no idea how to solve this problem. I also tried reading by line number, but didn't succeed.
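The read-by-line-number idea can be made to work. A minimal sketch, assuming both files have the same number of lines and that every line ends with a newline; the helper name `pair_by_line_number` and the temporary files are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pair line N of a comma-separated file with line N
# of a second file by fetching each line by its number with sed.
# Assumes both files have the same number of newline-terminated lines.
pair_by_line_number() {
    local letters_file=$1 nums_file=$2
    local total n num line item
    local -a items
    total=$(wc -l < "$nums_file")
    for ((n = 1; n <= total; n++)); do
        num=$(sed -n "${n}p" "$nums_file")      # line n of the numbers file
        line=$(sed -n "${n}p" "$letters_file")  # line n of the letters file
        IFS=, read -r -a items <<< "$line"      # "a,b,c" -> (a b c)
        for item in "${items[@]}"; do
            printf '%s,%s\n' "$item" "$num"
        done
    done
}

# Usage with the sample data from the question
demo_letters=$(mktemp) demo_nums=$(mktemp)
printf 'a,b,c\nd,e,f\n' > "$demo_letters"
printf '1\n2\n' > "$demo_nums"
pair_by_line_number "$demo_letters" "$demo_nums"
rm -f "$demo_letters" "$demo_nums"
```

Because sed rescans a file for every line number, this does quadratic work; for large files, reading both files in a single pass is much faster.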

Thank you very much.

hugs

MensaWater 01-10-2012 08:56 AM

Assuming both files have the same number of lines, you could use sdiff to do a side-by-side comparison, then massage the output to get what you want:

Code:

sdiff file1 file2 | awk '{print $1","$3}'
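With the sample data, that one-liner prints each whole left-hand line with the number appended (a,b,c,1). To get one item per line as asked, the first field can be split on commas. The sketch below feeds awk lines that imitate sdiff output, so it runs without sdiff installed; the exact padding is an assumption (awk's default field splitting ignores it), and the helper name is hypothetical. It relies on every pair of lines differing, because sdiff prints the `|` marker only between lines that differ.

```shell
#!/usr/bin/env bash
# split_sdiff_fields is a hypothetical helper. It expects lines shaped like
# sdiff output for differing lines, "left<blanks>|<blanks>right", which awk
# splits into $1 = left, $2 = "|", $3 = right.
split_sdiff_fields() {
    awk '{
        n = split($1, items, ",")     # "a,b,c" -> items[1], items[2], items[3]
        for (i = 1; i <= n; i++)
            print items[i] "," $3     # pair each item with the right column
    }'
}

# Simulated sdiff output for the sample files (padding is illustrative);
# with sdiff installed this would be: sdiff file1 file2 | split_sdiff_fields
printf 'a,b,c\t\t\t\t|\t1\nd,e,f\t\t\t\t|\t2\n' | split_sdiff_fields
```

Like the original one-liner, this still breaks if a line in file1 contains spaces, since awk's default splitting would spread that line across several fields.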

rperezalejo 01-10-2012 10:39 AM

The problem with that is that the rows in the first file have different lengths, so $3 is not the same column in every row.
I did something like this, but it's wrong because it reads all the letters for each number, when what I want is to read each row together with the matching row of the other file.

Code:

cat $temp_num | while read num; do
        cat $temp_letters | while read lett; do
                IFS=,\
                array=( $lett )
                echo ${#array[@]}
                for ((i=0; i<${#array[@]}; i++))
                do
                        echo "( '${array[i]}', '$num')," >> $query
                done
        done       
done

hugs

ntubski 01-10-2012 02:05 PM

awk:
Code:

awk -F, -vnum_file="$temp_num" '{getline num < num_file;
  for (i=1; i <= NF; i++) printf("%s,%d\n", $i, num) }' "$temp_letters"
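The awk version prints the second file's value with %d, which is fine for the numbers in the question; with a non-numeric value such as a package name, %d would print 0. A sketch that wraps the same getline idea in a function, uses %s instead, and trims blanks after commas (the helper and file names are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the getline idea: for every line of the
# comma-separated file, awk reads the next line of the second file.
# Uses %s instead of %d so non-numeric values are printed as-is.
pair_with_awk() {
    local letters_file=$1 nums_file=$2
    awk -F, -v num_file="$nums_file" '
    {
        getline num < num_file          # advance the second file by one line
        for (i = 1; i <= NF; i++) {
            item = $i
            sub(/^[ \t]+/, "", item)    # drop blanks after a comma: " b" -> "b"
            printf("%s,%s\n", item, num)
        }
    }' "$letters_file"
}

# Usage with package-style names in the second file
demo_letters=$(mktemp) demo_pkgs=$(mktemp)
printf 'a, b,c\nd,e,f\n' > "$demo_letters"
printf 'pkg1\npkg2\n' > "$demo_pkgs"
pair_with_awk "$demo_letters" "$demo_pkgs"
rm -f "$demo_letters" "$demo_pkgs"
```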

just bash:
Code:

# open files for reading
exec 3< "$temp_num" 4< "$temp_letters"

while read -u3 num && IFS=, read -u4 -a letters ; do
    printf "%s,$num\n" "${letters[@]}"
done

# close files
exec 3<&- 4<&-
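A sketch of a variation (not from the thread): the redirections are attached to the while loop instead of using exec, so descriptors 3 and 4 are closed automatically when the loop finishes. It also uses read -r to keep backslashes literal, and passes the number to printf as an argument rather than embedding it in the format string, where a % in the data would be misread. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same two-descriptor loop, but fds 3 and 4 are
# opened by redirections on the loop and closed when the loop ends.
pair_with_fds() {
    local letters_file=$1 nums_file=$2
    local num letter
    local -a letters
    # -r keeps backslashes literal; IFS= keeps the number line intact
    while IFS= read -r -u3 num && IFS=, read -r -u4 -a letters; do
        for letter in "${letters[@]}"; do
            # $num is an argument, not part of the format string,
            # so a '%' in the data cannot act as a format directive
            printf '%s,%s\n' "$letter" "$num"
        done
    done 3< "$nums_file" 4< "$letters_file"
}

demo_letters=$(mktemp) demo_nums=$(mktemp)
printf 'a,b,c\nd,e,f\n' > "$demo_letters"
printf '1\n2\n' > "$demo_nums"
pair_with_fds "$demo_letters" "$demo_nums"
rm -f "$demo_letters" "$demo_nums"
```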


MensaWater 01-10-2012 03:28 PM

Quote:

Originally Posted by rperezalejo (Post 4571134)
The problem with that is that the rows in the first file have different lengths, so $3 is not the same column in every row.

What I wrote works for the data you provided initially. Note that $3 is NOT a position in either file but rather a field in the sdiff output.

sdiff will output data like:

file1line1 | file2line1
file1line2 | file2line2

In the above, file1line# is $1, the pipe sign (|) is $2, and file2line# is $3.

So for your data, a,b,c and d,e,f are $1, and 1 and 2 are $3, on their respective lines.

Therefore, what I suggested would work IF the data were formatted as you originally indicated (i.e. comma-delimited with no spaces). If you are saying you have variable-length lines with embedded whitespace in file1 or file2, then you're correct that it wouldn't work.
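One way around the embedded-whitespace limitation (not from the thread) is to pair the lines with paste(1) instead of sdiff: paste joins line N of each file with a tab, so awk can split on the tab and the commas only. A minimal sketch, assuming neither file contains tab characters and both have the same number of lines; the helper and file names are hypothetical:

```shell
#!/usr/bin/env bash
# pair_with_paste is a hypothetical helper. paste(1) joins line N of each
# file with a tab, so spaces inside a line are not field separators; only
# the tab and the commas are. Assumes neither file contains tabs.
pair_with_paste() {
    local letters_file=$1 nums_file=$2
    paste "$letters_file" "$nums_file" |
    awk -F'\t' '{
        n = split($1, items, ",")     # comma list from the first file
        for (i = 1; i <= n; i++) {
            sub(/^ +/, "", items[i])  # trim spaces after each comma
            print items[i] "," $2     # $2 is the line from the second file
        }
    }'
}

demo_letters=$(mktemp) demo_nums=$(mktemp)
printf 'red apple, pear\nplum\n' > "$demo_letters"
printf '1\n2\n' > "$demo_nums"
pair_with_paste "$demo_letters" "$demo_nums"
rm -f "$demo_letters" "$demo_nums"
```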

rperezalejo 01-11-2012 01:46 PM

Yes, my friend, it works with the example I wrote, but I forgot to include lines of different lengths, sorry about that. I found the solution with your help and the help of the other poster.

Code:

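# open the package list on fd 3 and the class list on fd 4
exec 3< "$temp_pkg" 4< "$temp_class"
while read -r -u3 pkg && IFS=, read -r -u4 -a class; do
        for ((i=0; i<${#class[@]}; i++))
        do
                temp=${class[i]#\ }     # drop one leading space
                echo "('$temp', '$pkg')," >> "$query"
        done
done
# close both descriptors
exec 3<&- 4<&-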

More or less the same. I combined both answers, using a "for" loop instead of "printf" for educational reasons; printf is sometimes hard to understand.

Thank you very much.
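The ${class[i]#\ } expansion in the code above removes exactly one leading space, which is enough when the list is written as "a, b, c". If the input might contain several spaces or tabs, here is a sketch of a full trim that stays in pure bash (the helper name `trim_spaces` is hypothetical):

```shell
#!/usr/bin/env bash
# ${var#\ } removes at most ONE leading space, so "  b" becomes " b".
# trim_spaces is a hypothetical helper that strips every leading and
# trailing whitespace character using only parameter expansion.
trim_spaces() {
    local s=$1
    s=${s#"${s%%[![:space:]]*}"}   # cut the run of leading whitespace
    s=${s%"${s##*[![:space:]]}"}   # cut the run of trailing whitespace
    printf '%s\n' "$s"
}

item='  two spaces'
one_space_removed=${item#\ }
printf '[%s]\n' "$one_space_removed"        # [ two spaces]
printf '[%s]\n' "$(trim_spaces "$item")"    # [two spaces]
```

Inside the loop, temp=$(trim_spaces "${class[i]}") would replace the #\ expansion, at the cost of one subshell per item.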

David the H. 01-12-2012 06:42 AM

That's a neat way to use file descriptors. Thanks for posting it!

By the way, bash has the parameter expansion "${!array[@]}", which expands to the list of all existing indexes of the array. You can use it instead of the C-style loop.

Code:

for i in "${!class[@]}"; do
        temp=${class[i]#\ }
        echo "('$temp', '$pkg')," >> "$query"
done
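For arrays filled by read -a, as in the loop above, the indexes always run from 0 to n-1, so both loops behave the same. The difference shows up once an element has been unset. A small demonstration (the array contents are made up):

```shell
#!/usr/bin/env bash
# After an element is unset, the indexes are no longer contiguous.
class=(a b c d)
unset 'class[1]'                 # indexes are now 0 2 3

echo "count: ${#class[@]}"       # count: 3
echo "indexes: ${!class[@]}"     # indexes: 0 2 3

# C-style loop visits 0 1 2: index 1 is empty and index 3 is never reached
for ((i = 0; i < ${#class[@]}; i++)); do
    printf 'c-style %d: [%s]\n' "$i" "${class[i]}"
done

# Index loop visits exactly the indexes that exist: 0 2 3
for i in "${!class[@]}"; do
    printf 'indexed %d: [%s]\n' "$i" "${class[i]}"
done
```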


All times are GMT -5.