[SOLVED] Transpose multiple rows into a single column

wonjusup · 04-07-2011, 12:58 PM

I need to transpose a file with over 1000 rows of 5 columns of numbers into a file with a single column of numbers. The numbers are separated by a single space and range from one to five digits each. I tried using awk, but can only get it to grab one column of numbers. Thanks!

Input:
1 2 3 4 50
600 7 8 9000 10
11 12000 13 14 15

Desired output:
1
2
3
4
50
600
7
8
9000
10
11
12000
13
14
15

Tried using: awk '{split($0,a,""); print $NF}' <filename>

and got:
50
10
15

It only grabbed the last number in each row.

arochester · 04-07-2011, 01:17 PM

Homework?

David the H. · 04-07-2011, 01:34 PM

Doesn't sound too much like homework to me. Could be wrong though, of course.

In any case this is a simple job. All you really need to do is convert all spaces to newlines.

Code:

tr -s "[:space:]" "\n" <file

The -s option "squeezes" multiple consecutive instances into one, which in this case removes the blank lines that extra trailing spaces or empty lines would otherwise produce. Possibly not necessary here.
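
To see what -s changes, here is a minimal sketch run against a made-up sample with a doubled space and a trailing space. The sample text and the variable names plain and squeezed are only for illustration.

```shell
# Hypothetical sample: a doubled space in row 1 and a trailing space in row 2.
make_sample() { printf '1  2 3\n4 5 \n6\n'; }

# Without -s every whitespace character becomes its own newline,
# so the doubled and trailing spaces leave blank lines behind.
plain=$(make_sample | tr '[:space:]' '\n')

# With -s, runs of newlines in the output are squeezed into one.
squeezed=$(make_sample | tr -s '[:space:]' '\n')

printf '%s\n' "$squeezed"
```

Comparing "$plain" with "$squeezed" shows the two blank lines that only the version without -s produces.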

PS: Please use [code][/code] tags around your code, to preserve formatting and to improve readability.

mayursingru · 04-07-2011, 01:38 PM

Hi wonjusup,
I am a rookie with awk, but I came up with this:
awk '{FS=" ";OFS="\n"}{print $1, $2, $3, $4, $5}' filename
This is not homework, I assume.
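
If the number of columns is not always five, a loop over NF avoids listing the fields by hand. This is only a sketch of that variation, and the sample rows are made up.

```shell
# Hypothetical sample with rows of different widths.
out=$(printf '1 2 3 4 50\n600 7\n8 9000 10\n' |
  awk '{ for (i = 1; i <= NF; i++) print $i }')   # one field per output line

printf '%s\n' "$out"
```

Because awk splits fields on runs of blanks by default, extra spaces between numbers should not produce empty lines here.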

Regards,
Mayur Singru

wonjusup · 04-07-2011, 02:20 PM

David and Mayur..thanks for your help. Both of your methods seemed to work. It's definitely not homework. I'm new to Linux and appreciate the help!

chrism01 · 04-07-2011, 06:25 PM

Pure shell solution:

Code:

for num in $(cat t.t)
do
    echo $num
done

David the H. · 04-07-2011, 06:44 PM

cat isn't a built-in, at least for bash, so that's not quite a pure shell solution. But this is...

Code:

for num in $(<file)
do
    echo $num
done

Note that it also depends on your IFS being set to split words on whitespace (which it is by default).
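
A small sketch of that IFS dependence, using a made-up string in place of the file contents. The variable names are hypothetical, and the expansions are left unquoted on purpose so that word splitting happens.

```shell
# Hypothetical sample data standing in for the file contents.
data='1 2 3
4 50'

# Default IFS (space, tab, newline): the unquoted expansion splits into 5 words.
default_count=$(set -- $data; echo "$#")

# With IFS set to a colon inside the subshell, nothing is split: 1 word.
colon_count=$(IFS=:; set -- $data; echo "$#")

echo "default IFS: $default_count words, IFS=':' gives $colon_count word"
```

So if a script has changed IFS earlier on, the for loop may see the whole file as a single item.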

chrism01 · 04-07-2011, 08:38 PM

That'll teach me to pay attention to what I'm saying; you're correct about cat. I just get to use it a lot at work, so it becomes a "built-in" for me.

David the H. · 04-08-2011, 11:14 AM

Just for the fun of it, here's one more solution using built-ins, as long as the file is small enough to fit into available memory.

First the simple version.

Code:

content=$(<file)
echo "${content//[[:space:]]/$'\n'}"

This simply puts the contents of the file into a variable, then replaces spaces with newlines using parameter substitution as you echo it back out.

It does require the extquote shell option to be enabled first, but this is enabled by default. It's also limited in that it won't compress runs of spaces (it behaves like tr without the -s option).

You can work around these limitations with a couple of additions.

Code:

(
shopt -s extquote extglob
content=$(<file)
echo "${content//+([[:space:]])/$'\n'}"
)

First, you can run the code in a (..) subshell, which allows you to modify its local environment without affecting the parent shell. Not that there are problems with enabling either of the options below in your main shell, of course.

Then we enable extquote as mentioned above, and extglob, which activates extended globbing, so we can match multiple spaces/lines at once.

Other than that, it's basically the same as the above.
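
To make the difference between the two versions visible, here is a bash-only sketch on a made-up string with a doubled space. Storing the newline in a variable (nl, my own choice) sidesteps the need for extquote; the other names are hypothetical too.

```shell
# Bash-only sketch; the sample text is made up for illustration.
shopt -s extglob            # needed for the +( ... ) pattern below
content=$'1  2 3\n4 5'      # note the doubled space after the 1
nl=$'\n'                    # newline kept in a variable for the replacement

# One newline per whitespace character: the doubled space leaves a blank line.
simple=${content//[[:space:]]/$nl}

# One newline per run of whitespace: no blank lines.
squeezed=${content//+([[:space:]])/$nl}

printf '%s\n' "$squeezed"
```

Printing "$simple" instead shows the empty line between 1 and 2 that the extglob version avoids.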

grail · 04-08-2011, 11:28 AM

How about we just make it easy

Code:

awk 'RT' RS=" " file

David the H. · 04-08-2011, 12:06 PM

But where's the fun in that?

Seriously though, could you explain that awk command? I've never seen one like that before. What's 'RT'? And why do you set RS after it?

grail · 04-09-2011, 01:18 AM

Hi David

No probs

You can set any of the internal variables after the main code, or use the '-v' switch. In this instance it was just cleaner to do it afterwards.

RT is another internal variable (a gawk extension), and it gets set to the text that matched RS for the current record. In this case that is a simple space, but should you choose to make RS a regex then RT will be whatever is matched, i.e. with RS="a*", RT could be a, aa, aaa, etc. The reason for its use here is twofold:

1. Any true expression yields a print to happen
2. There is no space after the last digit hence there are no more records so no extra print done. To test this, replace RT with 1 and see the difference

Let me know if you need any more.
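
A rough sketch of how RT behaves with a regex RS, assuming gawk is installed (RT stays empty in other awks). The input string and the rt_demo name are made up, and RS='a+' is used here rather than a* so that every match is at least one character.

```shell
# Requires gawk: RT is a gawk extension.
if command -v gawk >/dev/null 2>&1; then
  # Print each record followed by the text that terminated it, in brackets.
  rt_demo=$(printf 'xaayaz' | gawk -v RS='a+' '{ printf "%s[%s]\n", $0, RT }')
  printf '%s\n' "$rt_demo"
fi
```

The last record is followed by empty brackets because nothing matching RS comes after it, which is why a bare RT pattern skips it.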

David the H. · 04-09-2011, 06:09 AM

I appreciate the explanation. It took a minute, but now I get it.

I'd checked the grymoire tutorial for RT, but I see now that it's a gawk extension not listed there. So RT holds the text that matched RS at the end of the current record. And setting internal variables after the main expression is new to me. I think I like that.

Which all means that you just told it to print each record in turn, as delimited by spaces. However...

Quote:

2. There is no space after the last digit hence there are no more records so no extra print done. To test this, replace RT with 1 and see the difference

I understand this, but I'm also getting the opposite problem. When I use RT, the final record gets dropped, apparently because there's no space after it, which leaves RT empty (false) for that record. For example:

Code:

$ cat file.txt
1 2 3 4 5
6 7 8 9 10
$ awk 'RT' RS=" " file.txt
1
2
3
4
5
6
7
8
9
$ awk '1' RS=" " file.txt
1
2
3
4
5
6
7
8
9
10
		# extra blank line at the end
$

If I tack a space onto the end of the final line, then the RT version outputs as expected, and you get two blank lines at the end with the regular print.

This then, should be more robust, as it covers any number of space/tab/newline delimiters.

Code:

awk 'RT' RS="[ \t\n]+" file

(or more traditionally...)

awk -v RS="[ \t\n]+" 'RT' file

And when newline is included in RS, there seems to be no difference between using RT and the default print.

grail · 04-09-2011, 06:53 AM

Quote:

And when newline is included in RS, there seems to be no difference between using RT and the default print.

Good catch

I had not used it in such a simple context before, as I usually reserve it for when RS is a regex, where whatever follows the last match is often not required.