Convert row to column

micyew · 06-27-2012, 09:21 PM

Hi,

I'm not sure whether I can do this in awk.

I have the following text file, where each field is on its own row:

Device ID: SEP001121111111
IP address: 1.1.1.1
Platform: Cisco IP Phone 7970, Capabilities: Host Phone Two-port Mac Relay
Device ID: SEP00115C222222
IP address: 2.2.2.2
Platform: Cisco IP Phone 7940, Capabilities: Host Phone Two-port Mac Relay
Device ID: SEP000D29033333
IP address: 3.3.3.3
Platform: Cisco IP Phone 7960, Capabilities: Host Phone Two-port Mac Relay

I want it to appear as a CSV-style file with 3 semicolon-separated fields in each record:

Device ID: SEP001121111111;IP address: 1.1.1.1;Platform: Cisco IP Phone 7970, Capabilities: Host Phone Two-port Mac Relay
Device ID: SEP00115C222222;IP address: 2.2.2.2;Platform: Cisco IP Phone 7940, Capabilities: Host Phone Two-port Mac Relay
Device ID: SEP000D29033333;IP address: 3.3.3.3;Platform: Cisco IP Phone 7960, Capabilities: Host Phone Two-port Mac Relay

How do I do this in awk? If it can't be done in awk, is there any other script that can do this?

Thanks a million.

danielbmartin · 06-27-2012, 10:00 PM

Code:

#!/bin/bash
#   Daniel B. Martin   Jun12
#
#   To execute this program, launch a terminal session and enter:
#   bash /home/daniel/Desktop/LQfiles/dbm412.bin
#
# This program inspired by ...
# http://www.linuxquestions.org/questions/programming-9/
#  convert-row-to-column-4175413752/
 
# File identification  
InFile='/home/daniel/Desktop/LQfiles/dbm412inp.txt'
OutFile='/home/daniel/Desktop/LQfiles/dbm412out.txt'

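# Join all lines with ';', start a new line before each "Device ID:",
# drop the empty first line, and strip the trailing ';' from each record.
tr '\n' ';' < "$InFile"  \
|sed 's/Device ID:/\nDevice ID:/g' \
|sed '1d' \
|sed 's/;$//' \
> "$OutFile"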

echo
echo "Our work is done.  Punch out.  Go home."
echo 'Normal end of job.'
echo 'Execution ended.'
echo

exit

Daniel B. Martin

Nominal Animal · 06-27-2012, 10:11 PM

If the fields do not contain semicolons, then

Code:

awk '(tolower($1) == "device") { if (FNR > 1) printf("\n%s", $0) ; else printf("%s", $0); next } { printf(";%s", $0) } END { printf("\n") }'

should do the trick.

To explain why and how it works, here it is in broken-down form:

Code:

awk '(tolower($1) == "device") {
          if (FNR > 1)
               printf("\n%s", $0)
          else
               printf("%s", $0)
          next
     }

     {
          printf(";%s", $0)
     }

     END {
          printf("\n")
     }'

The idea is that whenever the input line starts with Device (ignoring case by converting it to lower case), it also starts a new output line. If it is the very first line of input, we know we are at the very start of the output, so we skip the newline. The next statement tells awk to skip the remaining rules and move on to the next input line.

The middle rule prints a semicolon followed by the input line, without any newline.

The final rule runs only after all input lines have been processed. It adds the newline at the end.
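Because the rules key on the Device line rather than on a line count, records with a missing or extra line should still come out one per device. A minimal sketch to check that, using a made-up sample in which the second device has no Platform line (the sample data and the variable names are only for illustration):

```shell
# Hypothetical input: the second device has no Platform line.
sample=$(printf '%s\n' \
  'Device ID: SEP001121111111' \
  'IP address: 1.1.1.1' \
  'Platform: Cisco IP Phone 7970' \
  'Device ID: SEP00115C222222' \
  'IP address: 2.2.2.2' \
  'Device ID: SEP000D29033333' \
  'IP address: 3.3.3.3' \
  'Platform: Cisco IP Phone 7960')

# Same awk program as above, fed from standard input.
result=$(printf '%s\n' "$sample" | awk '(tolower($1) == "device") { if (FNR > 1) printf("\n%s", $0) ; else printf("%s", $0); next } { printf(";%s", $0) } END { printf("\n") }')

# One output line per device, however many detail lines each one had.
printf '%s\n' "$result"
```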

You could also use tr and sed to get the same effect:

Code:

( tr '\n' ';'
  echo
) | sed -e 's|;\([Dd][Ee][Vv][Ii][Cc][Ee][\t\v\f\r ]\)|\n\1|g' -e 's|;$||'

where the echo adds the newline at the end so that all sed variants actually process the line, and the second sed expression strips the trailing semicolon that tr leaves after the last record. The \n in the replacement and the escapes such as \t inside the brackets rely on GNU sed. Note that if you have the input in a file, you need to use tr '\n' ';' < file .

A third approach would simply replace two out of three newlines with semicolons:

Code:

awk '{ if (FNR % 3 == 0) printf("%s\n", $0) ; else printf("%s;", $0) }'
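If every record really is exactly three lines, the same counting idea can also be sketched with paste alone, without awk. This is an alternative rather than anything posted in the thread; the sample data and variable names are made up for the example.

```shell
# Hypothetical sample input: three lines per record, as in the question.
input=$(mktemp)
printf '%s\n' \
  'Device ID: SEP001121111111' \
  'IP address: 1.1.1.1' \
  'Platform: Cisco IP Phone 7970' \
  'Device ID: SEP00115C222222' \
  'IP address: 2.2.2.2' \
  'Platform: Cisco IP Phone 7940' > "$input"

# Each "-" reads the next line from standard input, so three of them
# consume three input lines per output line, joined with ";".
result=$(paste -d ';' - - - < "$input")
printf '%s\n' "$result"

rm -f "$input"
```

Like the FNR % 3 version, this assumes no record is missing a line.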

Reuti · 06-28-2012, 05:09 AM

Code:

$ split -l 3 file
$ paste -d";" -s x*
Device ID: SEP001121111111;IP address: 1.1.1.1;Platform: Cisco IP Phone 7970, Capabilities: Host Phone Two-port Mac Relay
Device ID: SEP00115C222222;IP address: 2.2.2.2;Platform: Cisco IP Phone 7940, Capabilities: Host Phone Two-port Mac Relay
Device ID: SEP000D29033333;IP address: 3.3.3.3;Platform: Cisco IP Phone 7960, Capabilities: Host Phone Two-port Mac Relay

Well, yes – the temporary files need to be removed afterwards. I miss an option in split to output the names of the created files, which could be used to remove them easily.
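One way to make the cleanup easy without knowing the file names is to let split write into a scratch directory and remove the whole directory afterwards. A minimal sketch, assuming mktemp -d is available; the sample data, the directory layout and the variable names are made up for illustration:

```shell
# Scratch directory that holds both the sample input and the pieces.
workdir=$(mktemp -d)
printf '%s\n' a b c d e f > "$workdir/input.txt"

# Write the pieces under a known prefix inside their own directory.
mkdir "$workdir/parts"
split -l 3 "$workdir/input.txt" "$workdir/parts/x"

# paste -s joins the lines of each file into one output line.
result=$(paste -d ';' -s "$workdir"/parts/x*)
printf '%s\n' "$result"

# Removing the directory removes every temporary piece in one go.
rm -rf "$workdir"
```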

Nominal Animal · 06-28-2012, 06:07 AM

Quote:

Originally Posted by Reuti

I miss an option in split to output the names of the created files, which could be used to remove them easily.

How about

Code:

Patch on top of coreutils-8.17 to add --print option
to output the file names to standard output.
Use --print= or --print="" to use ASCII NUL separator.

diff -Naur old/src/split.c new/src/split.c
--- old/src/split.c	2012-05-10 11:24:16.000000000 +0300
+++ new/src/split.c	2012-06-28 13:51:35.900787821 +0300
@@ -55,6 +55,10 @@
 /* Process ID of the filter.  */
 static int filter_pid;
 
+/* File handle for printing file names. */
+static FILE *print_handle = NULL;
+static int print_delimiter = '\n';
+
 /* Array of open pipes.  */
 static int *open_pipes;
 static size_t open_pipes_alloc;
@@ -119,6 +123,7 @@
 {
   VERBOSE_OPTION = CHAR_MAX + 1,
   FILTER_OPTION,
+  PRINT_OPTION,
   IO_BLKSIZE_OPTION,
   ADDITIONAL_SUFFIX_OPTION
 };
@@ -136,6 +141,7 @@
    ADDITIONAL_SUFFIX_OPTION},
   {"numeric-suffixes", optional_argument, NULL, 'd'},
   {"filter", required_argument, NULL, FILTER_OPTION},
+  {"print", optional_argument, NULL, PRINT_OPTION},
   {"verbose", no_argument, NULL, VERBOSE_OPTION},
   {"-io-blksize", required_argument, NULL,
    IO_BLKSIZE_OPTION}, /* do not document */
@@ -219,6 +225,7 @@
   -d, --numeric-suffixes[=FROM]  use numeric suffixes instead of alphabetic.\n\
                                    FROM changes the start value (default 0).\n\
   -e, --elide-empty-files  do not generate empty output files with '-n'\n\
+      --print[=DELIMITER] print file names to standard output\n\
       --filter=COMMAND    write to shell COMMAND; file name is $FILE\n\
   -l, --lines=NUMBER      put NUMBER lines per output file\n\
   -n, --number=CHUNKS     generate CHUNKS output files.  See below\n\
@@ -358,6 +365,11 @@
 static int
 create (const char *name)
 {
+  if (print_handle)
+    {
+      fputs (name, print_handle);
+      fputc (print_delimiter, print_handle);
+    }
   if (!filter_command)
     {
       if (verbose)
@@ -1250,6 +1262,12 @@
           elide_empty_files = true;
           break;
 
+        case PRINT_OPTION:
+          print_handle = stdout;
+          if (optarg)
+             print_delimiter = optarg[0];
+          break;
+
         case FILTER_OPTION:
           filter_command = optarg;
           break;

You could always ask for something like the above to be included in coreutils. If you use split often, and miss that option, others probably miss it too. I don't use it that often, so I'm not sure whether the interface is what you'd prefer, or whether --print, --print0, --fprint=FILE, --fprint0=FILE would make more sense. It is trivial to modify; I chose a very versatile way to implement it. I think print_delimiter should probably be a string, though, with a NULL string indicating a single NUL byte. Just let me know, and I'd be happy to provide a patch you can push upstream.

If your comment was just idle banter, don't worry; I just hadn't looked at Coreutils for quite a while so I thought it was a perfect reason to download the latest sources and look at it.

Edited to add: Ignore the patch above. It works, but the existing --filter option can trivially achieve the same functionality:

Code:

split --filter='printf "%s\n" "$FILE"; exec cat > "$FILE"' ...
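As a rough usage sketch, the printed names can be captured and used for cleanup once the pieces have been processed. This assumes a GNU split that has --filter; the sample data, the part. prefix and the variable names are made up for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' a b c d e f > "$workdir/input.txt"

# The filter prints each output file name, then cat writes the chunk to it.
names=$(split -l 3 --filter='printf "%s\n" "$FILE"; exec cat > "$FILE"' \
          "$workdir/input.txt" "$workdir/part.")

# Join each piece into one line, then delete it using the captured name.
result=$(printf '%s\n' "$names" | while IFS= read -r f; do
  paste -d ';' -s "$f"
  rm -f "$f"
done)
printf '%s\n' "$result"

# Count the pieces left behind (expected to be none), then clean up.
leftover=$(find "$workdir" -name 'part.*' | wc -l | tr -d ' ')
rm -rf "$workdir"
```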

Reuti · 06-28-2012, 01:29 PM

Quote:

Originally Posted by Nominal Animal

If your comment was just idle banter, don't worry; I just hadn't looked at Coreutils for quite a while so I thought it was a perfect reason to download the latest sources and look at it.

Edited to add: Ignore the patch above. It works, but the existing --filter option can trivially achieve the same functionality:

Code:

split --filter='printf "%s\n" "$FILE"; exec cat > "$FILE"' ...

It seems to be a new option; I wasn’t aware of --filter. Thanks for pointing it out. It reminds me to reread certain man pages from time to time.

danielbmartin · 06-28-2012, 02:20 PM

Quote:

Originally Posted by Nominal Animal

... existing --filter option can trivially achieve the same functionality:

Code:

split --filter='printf "%s\n" "$FILE"; exec cat > "$FILE"' ...

I can't make this work. Please give more information and/or examples. Is it possible that my Ubuntu system lacks the --filter option? I can execute info coreutils 'split invocation' and don't see it.

Daniel B. Martin

Reuti · 06-28-2012, 02:32 PM

Which version are you using? Check with split --version. I compiled 8.17, and there it is present and working nicely.

danielbmartin · 06-28-2012, 03:13 PM

Quote:

Originally Posted by Reuti

Which version are you using? Check with split --version. I compiled 8.17, and there it is present and working nicely.

Compiled? Compiled what? I write a program in REXX or BASH and run it. These are not compiled, they are (to the best of my knowledge) interpreted.

Daniel B. Martin

Reuti · 06-28-2012, 03:20 PM

8.17 of coreutils.

Nominal Animal · 06-28-2012, 03:25 PM

danielbmartin, run split --version to see the version number. Coreutils-8.17 is the latest version (released in May). Reuti compiled it from sources, probably to test the patch I wrote above.

I can confirm the --filter option is available in Coreutils-8.13, split (GNU coreutils) 8.13. It is also listed in both the info and man pages for this version.

danielbmartin · 06-28-2012, 05:23 PM

Quote:

Originally Posted by Nominal Animal

... run split --version to see the version number. Coreutils-8.17 is the latest version (released in May).

Code:

daniel@daniel-desktop:~$ split --version
split (GNU coreutils) 7.4
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjörn Granlund and Richard M. Stallman.

I apply Ubuntu updates as soon as they become available. May I expect that my system will receive a new Coreutils in the near future?

Daniel B. Martin

Reuti · 06-28-2012, 05:55 PM

Quote:

Originally Posted by Nominal Animal

Reuti compiled it from sources, probably to test the patch I wrote above.

Well, no. I compiled it because you mentioned the --filter option. It has various other applications, such as inserting an additional line after every third row:

Code:

$ split -l 3 --filter="(cat; echo '--------') >> OUTFILE" INFILE

Even after using these tools for years, new options sometimes surprise me.

Reuti · 06-28-2012, 06:09 PM

And going back to the original question, no temporary files are necessary any more:

Code:

$ split -l 3 --filter="paste -s -d ';' >> OUTFILE" INFILE
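Because >> appends, running that command twice leaves two copies of the records in OUTFILE. A minimal sketch of a variant, assuming a GNU split with --filter and that the filter inherits split's standard output (which the printf-based filter earlier in the thread relies on as well), lets the shell redirect once instead; the sample data and variable names are made up for illustration:

```shell
workdir=$(mktemp -d)
printf '%s\n' \
  'Device ID: SEP001121111111' \
  'IP address: 1.1.1.1' \
  'Platform: Cisco IP Phone 7970' \
  'Device ID: SEP00115C222222' \
  'IP address: 2.2.2.2' \
  'Platform: Cisco IP Phone 7940' > "$workdir/in.txt"

# Each 3-line chunk is piped to paste, which writes to split's own stdout;
# a single > truncates the output file, so reruns do not accumulate.
split -l 3 --filter='paste -s -d ";"' "$workdir/in.txt" > "$workdir/out.txt"

result=$(cat "$workdir/out.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```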

Nominal Animal · 06-29-2012, 04:10 AM

Quote:

Originally Posted by danielbmartin

May I expect that my system will receive a new Coreutils in the near future?

Not to Lucid Lynx.

Distributions tend to keep the features at the time of the release, and just backport bug fixes to the packages. It is a good strategy, and keeps the dependencies between packages stable.

Note that Lucid Lynx support ends in April 2013, so you'll have to upgrade anyway in the next year or so.