extracting data from a csv file

robertkwild · 02-16-2017, 11:01 AM

hi all,

i attach the csv that im trying to extract some data out of it

i have done this command -

awk -F',' '{print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv

and it works awesome, only problem is i want to miss out rows 1, 2 and 3, but im struggling abit

https://s24.postimg.org/ywixdljb9/data.png

cheers,

rob

Turbocapitalist · 02-16-2017, 11:08 AM

The hint you are probably looking for is the built-in variable FNR or NR.

Keep checking the manual page for awk from time to time. More and more will stand out and make sense, eventually.

rknichols · 02-16-2017, 12:47 PM

Note that your simple parsing will work for this file, but CSV files can also contain quoted strings with embedded field separator characters, so generalized parsing is quite a bit more complex.

jerryq · 02-16-2017, 01:38 PM

and for a quick'n dirty one-off, there's always

Code:

tail -n +4 16-02-2017_XSAreport.csv | awk -F',' '{print $1,$4,$5,$6}'

robertkwild · 02-17-2017, 03:20 AM

awk -F',' 'NR>3 {print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv

robertkwild · 02-17-2017, 03:35 AM

now i want to work out the difference between 2 csv reports i have run the awk command on

using the awk command on the csv i gave you i get these results -

Reception 0 0 0
Runners 0 0 0
molinare 136030 46301 8084
Carnaby 3 44 41 58
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0

so lets say i have this days report and i produce another report (using the csv and running an awk command)

how do i work out the differences for the customer "carnaby 3" so report 1 is 44 41 58 and lets say report 2 is 48 45 60

how do i show the difference ie 4 4 2

many thanks,

rob

robertkwild · 02-17-2017, 05:26 AM

mmm...

got abit of a problem and i dont know why this is happening

im doing this command and this is the result i get -

#awk -F',' 'NR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv
Reception 0 0 0
Runners 0 0 0
molinare 135986 46262 8026
Carnaby 3 0 2 0
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0
Account Name Counter: Total Black + Level 1 Copy and Print Impressions Counter: Total Color Level 2 Copy and Print Impressions Counter: Total Color Level 3 Copy and Print Impressions
Customer Service Engineer Account 0 0 0
System User 316 129 23
Reception 0 0 0
Runners 0 0 0
molinare 136030 46301 8084
Carnaby 3 44 41 58
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0

but the problem is why on the second file is it not doing the NR>3 ie its not missing the first 3 lines?

Turbocapitalist · 02-17-2017, 05:28 AM

Quote:

Originally Posted by robertkwild

but the problem is why on the second file is it not doing the NR>3 ie its not missing the first 3 lines?

Are you feeding both in at once? Check the difference between FNR and NR in the manual page. You may want the former and not the latter.

robertkwild · 02-17-2017, 06:04 AM

yes im feeding both files in the same command

robertkwild · 02-17-2017, 06:09 AM

resolved it -

awk -F',' 'FNR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv

syg00 · 02-17-2017, 06:36 AM

Good -now you (hopefully) understand NR and FNR you can use them to determine if you are currently reading the first file or not.
If so, you will need to save the data you are interested in (or whole record) for comparison when you get to the second file. Associative arrays are made for things like this.

robertkwild · 02-17-2017, 06:47 AM

no sorry i dont really understand the difference, sorry

now i need a way of finding out the differences between the 3 values of the customer "carnaby 3"

robertkwild · 02-17-2017, 08:22 AM

now i want to calculate the difference between the values in columns 4, 5, 6 in both files and put the output in a new file, so like this

carnaby 3 0 2 0

carnaby 3 44 41 58

so the output file would look like this ie how many prints the client has done between the dates 26-01 - 16-02

carnaby 3 44 39 58

is this going to be easy to do please

many thanks,

rob

chrism01 · 02-21-2017, 03:25 AM

As you will be doing this regularly (apparently), you should write a proper program.
syg00 can prob help if you want to stick with awk.
I would use Perl with the Text::CSV module to avoid the potential issue noted in post #3.

robertkwild · 02-21-2017, 08:38 AM

i feel so proud!

done it via paste and awk command

paste 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv | awk -F, 'FNR>3 {print $1,$4,$5,$6,$24,$25,$26}'

the reason why i have made loads of print commands is because the csv files consist of 21 columns each so thats why so 24,25,26 is the other columns in the other csv file

and the layout looks like this -

Reception 0 0 0 0 0 0
Runners 0 0 0 0 0 0
molinare 135986 46262 8026 136030 46301 8084
Carnaby 3 0 2 0 44 41 58
Sales 4 1 1 4 1 1
Xerox Administrative Group 0 0 0 0 0 0
Bookings 151 36 71 151 36 71
Accounts 135831 46223 7954 135831 46223 7954
IT Admin 0 0 0 0 0 0
XRX_DEF 0 0 0 0 0 0
Admin 0 0 0 0 0 0
Diagnostics 0 0 0 0 0 0
Print Exceptions Group 0 0 0 0 0 0
Guest 0 0 0 0 0 0