LinuxQuestions.org - Manipulate data

arn2025

08-28-2012 02:03 AM

Manipulate data

I have a long text file with:

Quote:

a 1 200 10-22-2012
a 2 350 11-20-2012
a 3 222 12-16-2012
b 1 123 01-17-2014
b 2 345
b 3 432 02-02-2012
c 1 675
c 2 0 07-09-2012
c 3 12778 03-08-2012

How do I change the file to:

Quote:

n 1 2 3 D1 D2 D3
a 200 350 222 10-22-2012 10-20-2012 12-16-2012
b 123 345 432 01-17-2014 02-02-2012
c 675 0 12778 07-09-2012 03-08-2012

Snark1994

08-28-2012 05:00 AM

I'm afraid what you've given us is nothing but a long list of numbers, letters and dates. How are you going from one set of data to the other? None of my guesses seem to be consistent with the data you've given.

In any case, it looks like it's going to be a more complicated parsing job than just a simple sed/awk script (though I'm sure it could be done that way) so you'd be better off with python or perl or something like that.

arn2025

08-28-2012 05:11 AM

Please note the pattern: where the content of the first column is the same, I just want all the fields of those lines in the same row.

Snark1994

08-28-2012 05:15 AM

Ah, is the 2nd date on line 2 now meant to be 11-20-2012? If so, I'll have a look at it.

Are all the 'a' lines going to be consecutive (same for b, c, etc.)? Are spaces always the delimiter?

EDIT: If I was correct about all the above guesses, then I think this programme does what you want:

Code:

#!/usr/bin/env python3

import sys

if len(sys.argv) != 2:
    print("Usage:", sys.argv[0], "<infile>")
    sys.exit(1)
try:
    infile = open(sys.argv[1], "r")
except IOError:
    print("Error:", sys.argv[1], "could not be opened.")
    sys.exit(2)

print("n 1 2 3 D1 D2 D3")
token = None
numbers = []
dates = []
for line in infile:
    line = line.split()
    if not line:
        continue  # skip blank lines
    if line[0] != token:
        # First column changed: print the finished group, start a new one
        if token is not None:
            print(token, ' '.join(numbers), ' '.join(dates))
        token = line[0]
        numbers = []
        dates = []
    try:
        numbers.append(line[2])
    except IndexError:
        pass
    try:
        dates.append(line[3])
    except IndexError:
        pass
# Print the last group, unless the file was empty
if token is not None:
    print(token, ' '.join(numbers), ' '.join(dates))
infile.close()

arn2025

08-28-2012 05:25 AM

Sorry, I have corrected that. Yes, they are all going to be consecutive and spaces are the delimiters; it's just a long list with d's, e's and so on.
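
Given those two points (rows with the same label are consecutive, fields are separated by spaces), the grouping step could be sketched with Python's itertools.groupby. This is only an illustration under those assumptions: the name regroup is made up, and it takes the number to be the third field and the date, when present, the fourth.

```python
from itertools import groupby


def regroup(lines):
    """Yield one output row per run of consecutive lines sharing a first field."""
    rows = (line.split() for line in lines)
    rows = (r for r in rows if r)  # drop blank lines before grouping
    for label, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        numbers = [r[2] for r in group if len(r) > 2]
        dates = [r[3] for r in group if len(r) > 3]  # some lines have no date
        yield " ".join([label] + numbers + dates)


sample = """\
a 1 200 10-22-2012
a 2 350 11-20-2012
a 3 222 12-16-2012
b 1 123 01-17-2014
b 2 345
b 3 432 02-02-2012
"""
for row in regroup(sample.splitlines()):
    print(row)
```

groupby only merges adjacent rows, so it depends on the lines for each label being consecutive; it does not care whether a label has 3 rows or 5.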

Snark1994

08-28-2012 05:31 AM

Ah, sorry, didn't see your latest post - I have edited my last post to include some code that I believe does what you want.

Hope this helps,

arn2025

08-28-2012 06:49 AM

Thanks, though when I look at the code it seems to assume the first column changes every 3 lines, whereas at some points it changes after four or five, i.e. there could be 5 a's.

cristalp

08-28-2012 08:29 AM

This can be done with a short piece of AWK code, which I believe is much better than the long and heavy Python code.

It is also more general than the previous solution from Snark1994, since here the code generates the first line from the file itself rather than specifying it manually as "n 1 2 3 D1 D2 D3".

It does not matter if you change your file to include more than 3 lines for a, b or c. The code below adjusts to the input file.

Code:

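awk 'NF {
 if (!($1 in n)) key[++nk] = $1       # remember labels in first-seen order
 n[$1] = n[$1] " " $3                 # collect the numbers for this label
 if (NF > 3) d[$1] = d[$1] " " $4     # collect the dates, when present
 if (++cnt[$1] > max) max = cnt[$1]   # size of the largest group
}
END {
 printf "n"
 for (i = 1; i <= max; i++) printf " %d", i
 for (i = 1; i <= max; i++) printf " D%d", i
 printf "\n"
 for (j = 1; j <= nk; j++) print key[j] n[key[j]] d[key[j]]
}
' YOURINPUTFILE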

grail

08-28-2012 09:53 AM

Well I am glad everyone else picked up on the fact that the data included a header ... had me baffled :(
Anyhoo:

Code:

ruby -ane 'BEGIN{l=[]};if ! l.empty? && l[0] != $F[0]; puts l.join(" ");l.clear;else l<<$F[0] if l.empty?;l<<$F.last;l.insert((l.count/2).ceil,$F[2]);end' file

Snark1994

08-29-2012 04:05 AM

It always makes me cry on the inside a little when you do that, grail.

Sorry, arn2025, I didn't quite understand where you got your header line from - my code would work fine (it would print 4 or 5 numbers on the line) but it would only put "1 2 3" in the header. It would be easy to change it to do this properly too, but seeing as you've got two other solutions, I'll leave you with those.
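
For what it's worth, one way the header could be derived from the data rather than hard-coded is to collect every group first and size the header from the largest one. This is a rough sketch, not the script posted earlier in the thread: the names build_table and groups are invented here, and it assumes Python 3.7+ so that the dict keeps labels in first-seen order.

```python
def build_table(lines):
    """Return the header plus one row per label; header sized to the largest group."""
    groups = {}  # label -> (numbers, dates); insertion-ordered on Python 3.7+
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        numbers, dates = groups.setdefault(fields[0], ([], []))
        if len(fields) > 2:
            numbers.append(fields[2])
        if len(fields) > 3:
            dates.append(fields[3])
    width = max((len(nums) for nums, _ in groups.values()), default=0)
    header = (["n"]
              + [str(i) for i in range(1, width + 1)]
              + ["D%d" % i for i in range(1, width + 1)])
    table = [" ".join(header)]
    for label, (numbers, dates) in groups.items():
        table.append(" ".join([label] + numbers + dates))
    return table


for row in build_table(["a 1 200 10-22-2012", "a 2 350", "b 1 123 01-17-2014"]):
    print(row)
```

The cost of this approach is that the whole file is held in memory before anything is printed, which the line-by-line version avoids.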

(Also, if you look, neither grail nor I found it easy to work out what exactly you needed doing - even when the gist of it was clear, the fact that you might get 4 or 5 similar rows, or the presence of a header, was not obvious. Next time you post a thread, perhaps give a bit of thought to explaining what you want clearly and precisely at the start. Just a heads up, I hope I'm not lecturing :) )

If you consider this problem to be solved, can you mark the thread as 'SOLVED' please? Thank you.

grail

08-29-2012 04:12 AM

Quote:

It always makes me cry on the inside a little when you do that, grail.

Get baffled? Happens to me all the time with how some of the questions are phrased :)

Snark1994

08-30-2012 03:59 AM

Quote:

Originally Posted by grail (Post 4767181)

Get baffled? Happens to me all the time with how some of the questions are phrased :)

Goodness no, I meant post a completely cryptic one-liner which baffles me, after I write a 20-line script to do it :L

grail

08-30-2012 09:51 AM

Quote:

Goodness no, I meant post a completely cryptic one-liner which baffles me, after I write a 20-line script to do it :L

Well, I did start to teach myself python3 a while back, and do still quite enjoy its benefits, but after being shown a little bit of ruby by another LQ member I got stuck in and really enjoy it :)

You are right though, I need to remember to explain them a bit better :(

Code:

-a - Split each line read on the default delimiter into the global array $F
-n - Loop over the input, reading it one line at a time
-e - The following is a script to be interpreted

BEGIN{l=[]} - initialize the 'l' array (BEGIN here is the same as in awk, i.e. only run once)

if ! l.empty? && l[0] != $F[0] # l array not empty and first element in l and $F arrays not equal
  puts l.join(" ")             # Display the contents of l array separated by a space
  l.clear                      # Reset l array
else
  l<<$F[0] if l.empty?         # like perl, simple tests may come after the action. << means append to array
  l<<$F.last                   # append the last field of the line
  l.insert((l.count/2).ceil,$F[2]) # insert an item at given position. ceil rounds up to a whole number (count/2 is already whole here)
end

Hope that helps explain a bit better :)
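
The insert-in-the-middle step can be sketched in Python as below. It is a loose translation for illustration only, not the Ruby above: add_row is a made-up name, and the trick relies on every line carrying a date, so that numbers and dates stay equal in count and the midpoint lands exactly between them.

```python
def add_row(row, fields):
    """Fold one input line into row = [label, numbers..., dates...]."""
    if not row:
        row.append(fields[0])             # start the row with the label
    row.append(fields[-1])                # the date goes on the end
    row.insert(len(row) // 2, fields[2])  # the number goes just before the dates
    return row


row = []
for line in ["a 1 200 10-22-2012", "a 2 350 11-20-2012", "a 3 222 12-16-2012"]:
    add_row(row, line.split())
print(" ".join(row))  # a 200 350 222 10-22-2012 11-20-2012 12-16-2012
```

On a line with no date the last field is the number itself, so it would be added twice; that case needs separate handling.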

Snark1994

08-30-2012 10:05 AM

Mm, I do like ruby, mostly because of its support of functional programming - it bridged the gap between haskell and python, because I got to do a lot of the neat haskell tricks without having the hassle of very strict type checking and pure functions.

Yeah, definitely very nifty code :)

@cristalp, have we solved your problem? If so, please remember to mark the thread as solved.

All times are GMT -5.