Old 08-17-2022, 04:59 AM   #1
TempleStone4510!
LQ Newbie
 
Registered: Mar 2019
Posts: 6

Rep: Disabled
Transpose selected groups of rows into multiple columns (arrays), with the other columns retained, to get a single row per group


Hello,

I'd like to know how to do the following. I have been trying awk and some program functions (datamash in Ubuntu), plus what has been posted elsewhere about converting columns to rows and vice versa:

I have a CSV file with N columns, sorted numerically into groups (e.g. column 1 lists the group keys from small to big), with other columns that change value within a group (an "array"). I am trying to output one line per group, transposing the columns whose values change within a group from rows into columns. I have attached a screen image and CSV (text) files of the input and desired output.

Please note: I have shortened the examples. Each group can consist of tens of rows, with a number of columns to be transposed into one row with multiple columns, while all the other columns, which don't change within a group, reduce to a single column on that row.

In these files I have given two columns which need to be expanded (Level and RM), with a maximum of four lines in a group. The files I'm actually trying to change are a few hundred lines long with several columns, and up to 10 or so rows per group (it varies group by group) to be transposed to columns.


I am not yet proficient in Linux: I used it very basically several years ago (simple grep, awk, sed).
If anyone can show me how to get the examples working, that would be great - I can then try expanding it from there to the larger files.
Attached Thumbnails: Transpose_Query.jpg (screenshot of the input and desired output; 246.0 KB)
Attached Files: _Input_csv.txt (649 Bytes), _Output_csv.txt (426 Bytes)
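For reference, the attached input can be reconstructed from the collapsed values shown in post #8 below; it presumably looks like this:
Code:
HPlt,Level,Stat,RM,X,Y
1001.1,108.339,1762.5,8.54,700050.21,8000022.24
1001.1,72.177,1762.5,8.98,700050.21,8000022.24
1001.1,-288.725,1762.5,8.16,700050.21,8000022.24
1023,107.919,1790,6.99,7000340.79,8000041.51
1040,108.129,1812.5,9.61,7000504.18,8000246.15
1040,72.01,1812.5,8.46,7000504.18,8000246.15
1040,30.649,1812.5,8.37,7000504.18,8000246.15
1121,108.968,1837.5,9.87,7002001.33,8000489.73
1121,72.251,1837.5,8.42,7002001.33,8000489.73
1121,30.57,1837.5,7.92,7002001.33,8000489.73
1121,-16.819,1837.5,8.25,7002001.33,8000489.73
2102.525,109.807,1862.5,7.7,7004300.61,8001002.77
2102.525,72.489,1862.5,7.06,7004300.61,8001002.77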
 
Old 08-17-2022, 05:42 AM   #2
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,972

Rep: 271
I'd use tr with , as the delimiter. I also might just open it in a fancier spreadsheet program (I'd use Open Office) to do it.
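Presumably that means something like the following (a minimal sketch; note that it joins every line, not just the lines within a group, so on its own it is not group-aware):
Code:
# replace every newline with a comma - this merges ALL rows into one
# long line, with no awareness of the group boundaries in column 1
tr '\n' ',' < _Input_csv.txt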
 
Old 08-18-2022, 12:32 AM   #3
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,377

Rep: 2757
Your example _Input_csv.txt file has DOS (CRLF) line endings. If you want to process such files using Linux tools, convert them first, e.g. 'fromdos < _Input_csv.txt > input.csv'.
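If fromdos is not installed, dos2unix or sed can do the same conversion (a hedged equivalent; the \r escape is a GNU sed extension):
Code:
# strip the trailing carriage return from each DOS-style line ending
sed 's/\r$//' _Input_csv.txt > input.csv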

It is easy to combine the groups into a single row using awk.
Code:
awk -F ',' 'NR==1 {printf "%s", $0; last=$1; next} {if ($1==last) printf ",%s", $0; else printf "\n%s", $0; last=$1} END {printf "\n"}' input.csv > output.csv
As @RandomTroll suggests, using a spreadsheet program is probably the easiest way to reformat to your desired final output.
 
Old 08-18-2022, 05:35 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,145

Rep: 4124
If only it were that simple - the OP seems to want to merge fields (not whole records) based on which values vary.
Ugh.

Looks like one would have to provide the maximum number of fields for each column that varies (4 for original column 2, 3 for original column 4 ...) - either provide it all as input, or parse the entire file first to calculate them all, then (re-)process the file.
Bigger Ugh.
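A minimal sketch of that first pass, assuming the group key is column 1 and the file has a one-line header:
Code:
# pass 1: count the rows per group key, then report the largest group size
awk -F',' 'NR>1 {n[$1]++}
           END {m=0; for (k in n) if (n[k]>m) m=n[k]; print m}' input.csv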
 
Old 08-18-2022, 08:21 AM   #5
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,377

Rep: 2757
It is not hard to delete or move entire columns in a spreadsheet. I am suggesting awk to relieve the tedium of concatenating the group data into one row, which I think is what the OP most wants to avoid.
 
Old 08-18-2022, 08:58 AM   #6
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

Rep: 2555

Where does the source data come from?

If it's a database query which is exported to CSV, the transposing can be done in SQL, and will likely be more efficient.

Otherwise, if you're stuck with CSV data, Miller will be able to slice/transpose/join/etc to get the data in the format desired.

 
Old 08-18-2022, 10:24 AM   #7
TempleStone4510!
LQ Newbie
 
Registered: Mar 2019
Posts: 6

Original Poster
Rep: Disabled
Will try suggestions.

Many thanks for the posts - will try this weekend with the suggestions and reply back.

FYI - boughtonp, the files are generated as XYZ (CSV) files (inversion output from a Windows-based program), so the data does not start out in a database.

Trying to avoid spreadsheets as much as possible.

Last edited by TempleStone4510!; 08-18-2022 at 10:26 AM.
 
Old 08-18-2022, 01:03 PM   #8
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

Rep: 2555

I was curious and tried to get Miller to do this (because I've not actually used it before), but got nowhere with it. Just trying to follow the docs was painful.

So I had a look at GNU Datamash (which I've not previously encountered), and it took very little time to come up with this:

Code:
$ datamash -t, -H --output-delimiter=$'\t' --full -g 1 count 2,4 collapse 2,4 < _Input_csv.txt
HPlt    Level   Stat    RM      X       Y       count(Level)    count(RM)       collapse(Level) collapse(RM)
1001.1  108.339 1762.5  8.54    700050.21       8000022.24      3       3       108.339,72.177,-288.725 8.54,8.98,8.16
1023    107.919 1790    6.99    7000340.79      8000041.51      1       1       107.919 6.99
1040    108.129 1812.5  9.61    7000504.18      8000246.15      3       3       108.129,72.01,30.649    9.61,8.46,8.37
1121    108.968 1837.5  9.87    7002001.33      8000489.73      4       4       108.968,72.251,30.57,-16.819    9.87,8.42,7.92,8.25
2102.525        109.807 1862.5  7.7     7004300.61      8001002.77      2       2       109.807,72.489  7.7,7.06
That's not the desired output format yet, but it does get all the necessary data onto individual group rows.

The next step would be to identify the max values of the "count" columns, rename the "collapse" headers (e.g. to "RM1,RM2,RM3,RM4"), add any necessary blanks to the corresponding rows (i.e. suffix max-count commas), then output just the desired fields and switch the output delimiter back to commas - all of which is something Awk can do.

Looking at the other stuff Datamash can do, it looks like a pretty cool tool.


Last edited by boughtonp; 08-18-2022 at 01:05 PM.
 
Old 08-18-2022, 01:21 PM   #9
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,710

Rep: 7972
Quote:
Originally Posted by TempleStone4510! View Post
Many thanks for the posts - will try this weekend with the suggestions and reply back. FYI - boughtonp, the files are generated as XYZ (CSV) files (inversion output from a Windows-based program), so the data does not start out in a database.
It would help if you said what this 'windows-based' program is... there may be some options there.
Quote:
Trying to avoid spreadsheets as much as possible.
If the data is consistent, I'd personally create a MySQL database table, use the built-in CSV importer, then I could output the data in a LOT of different ways. Along with keeping archival data, it gives a ton of flexibility later on, if this is something you have to do often.
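A hedged sketch of that route (the database, table, and column names here are invented, and LOAD DATA LOCAL INFILE needs local_infile enabled on both client and server):
Code:
# create a table matching the CSV columns and bulk-load the file,
# skipping the header row
mysql --local-infile=1 mydb <<'EOF'
CREATE TABLE IF NOT EXISTS readings (
  HPlt DOUBLE, Level DOUBLE, Stat DOUBLE,
  RM DOUBLE, X DOUBLE, Y DOUBLE
);
LOAD DATA LOCAL INFILE 'input.csv' INTO TABLE readings
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES;
EOF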
 
1 member found this post helpful.
Old 08-18-2022, 05:06 PM   #10
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

Rep: 2555
Quote:
Originally Posted by TB0ne View Post
If the data is consistent, I'd personally create a MySQL database table, use the built-in CSV importer, then I could output the data in a LOT of different ways. Along with keeping archival data, it gives a ton of flexibility later on, if this is something you have to do often.
That's definitely something to consider - even if they don't have/want a MySQL server running, it wouldn't take much to write a simple SQLite import script.
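For instance, a minimal sketch (the ';' separator and the GROUP BY columns are assumptions based on the sample data, and standard SQL makes no ordering guarantee inside group_concat):
Code:
# import the CSV into an in-memory SQLite database, then collapse each
# group with group_concat (';' keeps the comma-separated output unambiguous)
sqlite3 :memory: <<'EOF'
.mode csv
.import input.csv readings
SELECT HPlt,
       group_concat(Level, ';') AS Levels,
       Stat,
       group_concat(RM, ';')    AS RMs,
       X, Y
FROM readings
GROUP BY HPlt, Stat, X, Y;
EOF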

 
Old 08-18-2022, 05:16 PM   #11
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,710

Rep: 7972
Quote:
Originally Posted by boughtonp View Post
That's definitely something to consider - even if they don't have/want a MySQL server running, it wouldn't take much to write a simple SQLite import script.
Agreed; I'd use MySQL, but only because I'm more familiar with it, and there is a ton of material out there on how to import things into a DB. A script to read and output the data wouldn't be difficult, even in bash, and you could use Python, Perl, or any other language as well. A database also gives you the ability to pull the data into a spreadsheet via ODBC, or to use any number of web-based tools to visualize it.

Just a ton more flexibility. However...it depends on the consistency of the input data, how much, and how often this needs to be done. If it's a few times for a class project...bite the bullet, spend an hour or so in a spreadsheet and call it done.
 
Old 08-18-2022, 11:18 PM   #12
elgrandeperro
Member
 
Registered: Apr 2021
Posts: 415
Blog Entries: 2

Rep: Disabled
I think this works. It uses two arrays (level and rm) to store the variable-count fields; I set the max to 4 with numlevel and numrm, but these could be passed in as arguments. I just called the data file 'data'; I removed the header by hand, but of course you could easily skip it, or, if the app produces it, use it to set the number of fields.

Code:
#!/bin/bash
# numlevel and numrm are the max number of each field, could be passed in as an arg.
awk -F, \
 'BEGIN {last="";numlevel=4;numrm=4;i=1;j=1;}
function printline() {
 printf ("%s",hpit);
 # dump the num array
 for (i=1;i<=numlevel;i++) {
	if (level[i]=="") {
		printf(",*");
	} else {
		printf(",%s", level[i]);  
	}
 }
 printf (",%s",stat);
 # dump the rm array
 for (i=1;i<=numrm;i++) {
	if (rm[i]=="") {
		printf(",*");
	} else {
		printf(",%s", rm[i]);  
	}
 }
 printf (",%s,%s",xval,yval);
 printf ("\n");
 # clear our 2 arrays
 delete (level);
 delete (rm);
}
 # when field one changes, print the accumulated line and reset the counters
 # (the NR!=1 test avoids printing before the first record has been read)
 { if (($1!=last) && (NR!=1)) {printline(); i=1; j=1;}
 hpit=$1;
 level[i++]=$2;
 stat=$3;
 rm[j++]=$4;
 xval=$5;
 yval=$6;
 last=$1;
 }
 # output the last line
 END {printline(); printf ("\n");}' data

Last edited by elgrandeperro; 08-19-2022 at 01:58 PM.
 
3 members found this post helpful.
Old 08-19-2022, 12:01 AM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,145

Rep: 4124
Cheers - I was going to look at this after walking the mutt, but Kelly Slater tearing up Taehupo'o got in the way ...

Maybe over the weekend ...

Last edited by syg00; 08-19-2022 at 12:03 AM.
 
Old 08-20-2022, 02:46 AM   #14
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,377

Rep: 2757
If you are going to script a solution, then you might as well add code to accept the data file name, do an initial pass through the data file to find the maximum number of levels required, and use that to print a header as well as to format the lines.
Code:
#!/bin/bash

if [[ -z $1 ]]; then echo "Usage: $0 <data file>"; exit; fi
if [[ -f $1 ]]; then  data=$1; else echo  "No data file found. Exiting."; exit; fi

# Determine maximum number of levels
numlevel=$(awk -F ',' ' \
  {if (NR!=1) level[$1]++} \
  END {for (i in level) \
    if (level[i] > numlevel) {numlevel=level[i]}; \
      print numlevel}' $data)

# numlevel and numrm are the max number of each field, could be passed in as an arg.
awk -F ',' -v numlevel=$numlevel '
BEGIN {last="";numrm=numlevel;i=1;j=1
  printf("%s", "HPlt")
  for (i=1;i<=numlevel;i++) {printf(",%s%s", "Level", i)}
  printf(",%s", "Stat")
  for (i=1;i<=numlevel;i++) {printf(",%s%s", "RM", i)}
  printf(",%s,%s%s", "X", "Y", "\n")
}
function printline() {
 printf ("%s",hpit);
 # dump the num array
 for (i=1;i<=numlevel;i++) {
        if (level[i]=="") {
                printf(",*");
        } else {
                printf(",%s", level[i]);
        }
 }
 printf (",%s",stat);
 # dump the rm array
 for (i=1;i<=numrm;i++) {
        if (rm[i]=="") {
                printf(",*");
        } else {
                printf(",%s", rm[i]);
        }
 }
 printf (",%s,%s",xval,yval);
 printf ("\n");
 # clear our 2 arrays
 delete (level);
 delete (rm);
}
 # if we see a change on field one, dump the line and reset, but also the first record we should not print
 { if (($1!=last) && (NR>2)) {printline(); delete (level); delete(rm); i=1; j=1; }}
 { if (NR>1) { hpit=$1;
 level[i++]=$2;
 stat=$3;
 rm[j++]=$4;
 xval=$5;
 yval=$6;
 last=$1; }
 }
 # output the last line
 END {printline(); printf ("\n");}' $data
PS - I have taken the liberty of changing the logic slightly to avoid printing the first record.
 
2 members found this post helpful.
Old 08-21-2022, 01:56 PM   #15
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

Rep: 2555

Here's the Datamash+Awk version. I haven't bothered making the level/rm field numbers dynamic; it's easy to plug that in if necessary.

Code:
filename="_Input_csv.txt"

datamash -t, -H --output-delimiter=$'\t' -g 1,3,5,6 count 2,4 collapse 2,4 < "$filename" \
| awk -vOFS=',' 'BEGIN {num_level=4;num_rm=4}
NR==1 {
  for(r=1;r<=num_level;r++) h_level = h_level",Level"r
  for(r=1;r<=num_rm;r++) h_rm = h_rm",RM"r
  print "HPlt" h_level ",Stat" h_rm ",X,Y"
}
NR>1 {
  pad_level="";pad_rm="";
  for(r=$5;r<num_level;r++) pad_level = pad_level",*";
  for(r=$6;r<num_rm;r++) pad_rm = pad_rm",*";
  print $1,$7 pad_level,$2,$8 pad_rm,$3,$4
}'
Result matches the _Output_csv.txt:
Code:
HPlt,Level1,Level2,Level3,Level4,Stat,RM1,RM2,RM3,RM4,X,Y
1001.1,108.339,72.177,-288.725,*,1762.5,8.54,8.98,8.16,*,700050.21,8000022.24
1023,107.919,*,*,*,1790,6.99,*,*,*,7000340.79,8000041.51
1040,108.129,72.01,30.649,*,1812.5,9.61,8.46,8.37,*,7000504.18,8000246.15
1121,108.968,72.251,30.57,-16.819,1837.5,9.87,8.42,7.92,8.25,7002001.33,8000489.73
2102.525,109.807,72.489,*,*,1862.5,7.7,7.06,*,*,7004300.61,8001002.77
 
1 member found this post helpful.
  

