[SOLVED] Transpose multiple rows into a single column
I need to transpose a file with over 1000 rows of 5 columns of numbers into a file with a single column of numbers. The numbers are separated by a single space and range from one to five digits each. I tried using awk, but could only get it to grab one column of numbers. Thanks!
Doesn't sound too much like homework to me. Could be wrong though, of course.
In any case this is a simple job. All you really need to do is convert all spaces to newlines.
Code:
tr -s "[:space:]" "\n" <file
The -s option "squeezes" multiple consecutive newlines in the output into one, which in this case removes the blank lines that extra or trailing spaces would otherwise produce. Possibly not necessary here.
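To see what -s changes, here is a small sketch on made-up sample data. The temporary file and the variable names are assumptions for illustration only, and the first row has a trailing space on purpose.

```shell
# Hypothetical sample data: two rows; the first row ends with a trailing space.
sample=$(mktemp)
printf '1 22 333 \n4444 55555 6\n' > "$sample"

# Without -s: the trailing space and the newline after it both become
# newlines, which leaves a blank line in the output.
without_squeeze=$(tr '[:space:]' '\n' < "$sample")

# With -s: runs of newlines are squeezed to one, so no blank line appears.
with_squeeze=$(tr -s '[:space:]' '\n' < "$sample")

printf '%s\n' "$with_squeeze"
rm -f "$sample"
```

Comparing the two variables shows the only difference is the blank line after 333.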
PS: Please use [code][/code] tags around your code, to preserve formatting and to improve readability.
Last edited by David the H.; 04-07-2011 at 01:37 PM.
Reason: rewording for clarity
This simply puts the contents of the file into a variable, then replaces spaces with newlines using parameter substitution as you echo it back out.
It does require the extquote shell option to be enabled, but this is the default. It's also limited in that it won't compress runs of spaces (like tr without the -s option).
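A minimal sketch of the approach described above, assuming bash and a made-up sample file (the names sample, contents and single_column are hypothetical, chosen only for this illustration):

```shell
#!/bin/bash
# Hypothetical sample data: two rows of space-separated numbers.
sample=$(mktemp)
printf '1 22 333\n4444 55555 6\n' > "$sample"

# Read the whole file into a variable (bash-specific shorthand for cat).
contents=$(<"$sample")

# Replace every single space with a newline. The $'\n' inside the
# double-quoted expansion relies on extquote, which is on by default.
single_column="${contents// /$'\n'}"

echo "$single_column"
rm -f "$sample"
```

The newlines already in the file are left alone, so each number ends up on its own line. Two spaces in a row would produce an empty line, which is the limitation mentioned above.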
You can work around these limitations with a couple of additions.
First, you can run the code in a (..) subshell, which allows you to modify its local environment without affecting the parent shell. Not that there are problems with enabling either of the options below in your main shell, of course.
Then we enable extquote as mentioned above, and extglob, which activates extended globbing, so we can match multiple spaces/lines at once.
Other than that, it's basically the same as the above.
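As a rough sketch of that variant, assuming bash: the function name to_column is hypothetical, and writing its body in ( ) instead of { } is just one way to get the subshell, so the shopt changes never reach the calling shell.

```shell
#!/bin/bash
# Hypothetical helper: the ( ) body runs in a subshell on every call,
# so the options set inside do not leak into the parent shell.
to_column() (
  shopt -s extquote extglob
  contents=$(<"$1")
  # +([[:space:]]) is an extended glob matching one or more whitespace
  # characters, so runs of spaces and newlines collapse into one newline.
  echo "${contents//+([[:space:]])/$'\n'}"
)

# Made-up sample data with a double space and a trailing space.
sample=$(mktemp)
printf '1  22 333 \n4444 55555 6\n' > "$sample"

squeezed=$(to_column "$sample")
echo "$squeezed"
rm -f "$sample"
```

Note that trailing whitespace on the very last line would still leave one empty line at the end, since only the final newline is stripped when the file is read into the variable.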
You can set any of the internal variables after the main code or use the '-v' switch. In this instance it was just cleaner to do it after.
RT - again an internal variable, and it gets set to the text that matched RS at the end of the current record. In this case that is a simple space, but should you choose to make RS a regex then RT will be whatever was matched, e.g. with RS="a*", RT could be a, aa, aaa, etc. The reason for its use here is twofold:
1. Any true expression causes the current record to be printed
2. There is no space after the last digit hence there are no more records so no extra print done. To test this, replace RT with 1 and see the difference
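The comparison suggested in point 2 can be tried with a sketch like the following. It assumes GNU awk is installed as gawk (RT is a gawk extension), and the sample file and variable names are made up for illustration. Line counts are used so that trailing blank lines are not hidden.

```shell
# Hypothetical sample data: two rows of space-separated numbers.
sample=$(mktemp)
printf '1 22 333\n4444 55555 6\n' > "$sample"

# RS set after the program text: a record is printed only when RT is
# non-empty, i.e. when the record was actually terminated by a space.
rt_lines=$(gawk 'RT' RS=' ' "$sample" | wc -l)

# Same thing with RS set through the -v switch instead.
rt_lines_v=$(gawk -v RS=' ' 'RT' "$sample" | wc -l)

# With 1 as the pattern every record is printed, including the last one,
# which still carries the file's final newline.
all_lines=$(gawk '1' RS=' ' "$sample" | wc -l)

echo "RT: $rt_lines lines, -v form: $rt_lines_v lines, 1: $all_lines lines"
rm -f "$sample"
```

On this sample the two RT forms produce the same count, and the count for 1 is higher because the last record is printed together with its own newline.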
I appreciate the explanation. It took a minute, but now I get it.
I'd checked the grymoire tutorial for RT, but I see now that it's a gawk extension not listed there. So RT holds the text that terminated the current record, as matched by RS. And setting internal variables after the main expression is new to me. I think I like that.
Which all means that you just told it to print each record in turn, as delimited by spaces. However...
Quote:
2. There is no space after the last digit hence there are no more records so no extra print done. To test this, replace RT with 1 and see the difference
I understand this, but I'm also getting the opposite problem. When I use RT, the final record gets dropped, apparently because there's no separator at the end to define the record. For example:
And when newline is included in RS, there seems to be no difference between using RT and the default print.
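A short sketch of the newline case, again assuming GNU awk installed as gawk and a made-up sample file. The bracket expression [ \n] used for RS is one possible way to include the newline and is an assumption, not necessarily the exact command that was run.

```shell
# Hypothetical sample data: two rows of space-separated numbers.
sample=$(mktemp)
printf '1 22 333\n4444 55555 6\n' > "$sample"

# RS is now a regex matching either a space or a newline, so every
# number, including the last one, is followed by a terminator.
with_rt=$(gawk 'RT' RS='[ \n]' "$sample" | wc -l)
with_1=$(gawk '1' RS='[ \n]' "$sample" | wc -l)

echo "RT: $with_rt lines, 1: $with_1 lines"
rm -f "$sample"
```

Both counts match the six numbers in the sample, so nothing is dropped and nothing extra is printed. Runs of several spaces would still give empty records; RS='[ \n]+' is one way around that.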
Good catch. I had not used it in such a simple context before, as I usually reserve it for when RS is a regex, where whatever follows the last match is often not required.