LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   CentOS (https://www.linuxquestions.org/questions/centos-111/)
-   -   extracting data from a csv file (https://www.linuxquestions.org/questions/centos-111/extracting-data-from-a-csv-file-4175599907/)

robertkwild 02-16-2017 11:01 AM

extracting data from a csv file
 
hi all,

i attach the csv that im trying to extract some data out of it

i have done this command -

awk -F',' '{print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv

and it works awesome, only problem is i want to miss out rows 1, 2 and 3, but im struggling abit

https://s24.postimg.org/ywixdljb9/data.png

cheers,

rob

Turbocapitalist 02-16-2017 11:08 AM

The hint you are probably looking for is the built-in variable FNR or NR.

Keep checking the manual page for awk from time to time. More and more will stand out and make sense, eventually.

rknichols 02-16-2017 12:47 PM

Note that your simple parsing will work for this file, but CSV files can also contain quoted strings with embedded field separator characters, so generalized parsing is quite a bit more complex.

jerryq 02-16-2017 01:38 PM

and for a quick'n dirty one-off, there's always
Code:

tail -n +4 16-02-2017_XSAreport.csv | awk -F',' '{print $1,$4,$5,$6}'

robertkwild 02-17-2017 03:20 AM

awk -F',' 'NR>3 {print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv

robertkwild 02-17-2017 03:35 AM

now i want to work out the difference between 2 csv reports i have run the awk command on

using the awk command on the csv i gave you i get these results -

Reception 0 0 0
Runners 0 0 0
molinare 136030 46301 8084
Carnaby 3 44 41 58
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0

so lets say i have this days report and i produce another report (using the csv and running an awk command)

how do i work out the differences for the customer "carnaby 3" so report 1 is 44 41 58 and lets say report 2 is 48 45 60

how do i show the difference ie 4 4 2

many thanks,

rob

robertkwild 02-17-2017 05:26 AM

mmm...

got abit of a problem and i dont know why this is happening

im doing this command and this is the result i get -

#awk -F',' 'NR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv
Reception 0 0 0
Runners 0 0 0
molinare 135986 46262 8026
Carnaby 3 0 2 0
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0
Account Name Counter: Total Black + Level 1 Copy and Print Impressions Counter: Total Color Level 2 Copy and Print Impressions Counter: Total Color Level 3 Copy and Print Impressions
Customer Service Engineer Account 0 0 0
System User 316 129 23
Reception 0 0 0
Runners 0 0 0
molinare 136030 46301 8084
Carnaby 3 44 41 58
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0

but the problem is why on the second file is it not doing the NR>3 ie its not missing the first 3 lines?

Turbocapitalist 02-17-2017 05:28 AM

Quote:

Originally Posted by robertkwild (Post 5671957)
but the problem is why on the second file is it not doing the NR>3 ie its not missing the first 3 lines?

Are you feeding both in at once? Check the difference between FNR and NR in the manual page. You may want the former and not the latter.

robertkwild 02-17-2017 06:04 AM

yes im feeding both files in the same command

robertkwild 02-17-2017 06:09 AM

resolved it -

awk -F',' 'FNR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv

syg00 02-17-2017 06:36 AM

Good -now you (hopefully) understand NR and FNR you can use them to determine if you are currently reading the first file or not.
If so, you will need to save the data you are interested in (or whole record) for comparison when you get to the second file. Associative arrays are made for things like this.

robertkwild 02-17-2017 06:47 AM

no sorry i dont really understand the difference, sorry

now i need a way of finding out the differences between the 3 values of the customer "carnaby 3"

robertkwild 02-17-2017 08:22 AM

now i want to calculate the difference between the values in columns 4, 5, 6 in both files and put the output in a new file, so like this

carnaby 3 0 2 0

carnaby 3 44 41 58

so the output file would look like this ie how many prints the client has done between the dates 26-01 - 16-02

carnaby 3 44 39 58

is this going to be easy to do please

many thanks,

rob

chrism01 02-21-2017 03:25 AM

As you will be doing this regularly (apparently), you should write a proper program.
syg00 can prob help if you want to stick with awk.
I would use Perl with the Text::CSV module to avoid the potential issue noted in post #3.

robertkwild 02-21-2017 08:38 AM

i feel so proud!

done it via paste and awk command

paste 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv | awk -F, 'FNR>3 {print $1,$4,$5,$6,$24,$25,$26}'

the reason why i have made loads of print commands is because the csv files consist of 21 columns each so thats why so 24,25,26 is the other columns in the other csv file

and the layout looks like this -

Reception 0 0 0 0 0 0
Runners 0 0 0 0 0 0
molinare 135986 46262 8026 136030 46301 8084
Carnaby 3 0 2 0 44 41 58
Sales 4 1 1 4 1 1
Xerox Administrative Group 0 0 0 0 0 0
Bookings 151 36 71 151 36 71
Accounts 135831 46223 7954 135831 46223 7954
IT Admin 0 0 0 0 0 0
XRX_DEF 0 0 0 0 0 0
Admin 0 0 0 0 0 0
Diagnostics 0 0 0 0 0 0
Print Exceptions Group 0 0 0 0 0 0
Guest 0 0 0 0 0 0


All times are GMT -5. The time now is 06:04 AM.