extracting data from a csv file
hi all,
i attach the csv that im trying to extract some data out of it i have done this command - awk -F',' '{print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv and it works awesome, only problem is i want to miss out rows 1, 2 and 3, but im struggling abit https://s24.postimg.org/ywixdljb9/data.png cheers, rob |
The hint you are probably looking for is the built-in variable FNR or NR.
Keep checking the manual page for awk from time to time. More and more will stand out and make sense, eventually. |
Note that your simple parsing will work for this file, but CSV files can also contain quoted strings with embedded field separator characters, so generalized parsing is quite a bit more complex.
|
and for a quick'n dirty one-off, there's always
Code:
tail -n +4 16-02-2017_XSAreport.csv | awk -F',' '{print $1,$4,$5,$6}' |
awk -F',' 'NR>3 {print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv
|
now i want to work out the difference between 2 csv reports i have run the awk command on
using the awk command on the csv i gave you i get these results - Reception 0 0 0 Runners 0 0 0 molinare 136030 46301 8084 Carnaby 3 44 41 58 Sales 4 1 1 Xerox Administrative Group 0 0 0 Bookings 151 36 71 Accounts 135831 46223 7954 IT Admin 0 0 0 XRX_DEF 0 0 0 Admin 0 0 0 Diagnostics 0 0 0 Print Exceptions Group 0 0 0 Guest 0 0 0 so lets say i have this days report and i produce another report (using the csv and running an awk command) how do i work out the differences for the customer "carnaby 3" so report 1 is 44 41 58 and lets say report 2 is 48 45 60 how do i show the difference ie 4 4 2 many thanks, rob |
mmm...
got abit of a problem and i dont know why this is happening im doing this command and this is the result i get - #awk -F',' 'NR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv Reception 0 0 0 Runners 0 0 0 molinare 135986 46262 8026 Carnaby 3 0 2 0 Sales 4 1 1 Xerox Administrative Group 0 0 0 Bookings 151 36 71 Accounts 135831 46223 7954 IT Admin 0 0 0 XRX_DEF 0 0 0 Admin 0 0 0 Diagnostics 0 0 0 Print Exceptions Group 0 0 0 Guest 0 0 0 Account Name Counter: Total Black + Level 1 Copy and Print Impressions Counter: Total Color Level 2 Copy and Print Impressions Counter: Total Color Level 3 Copy and Print Impressions Customer Service Engineer Account 0 0 0 System User 316 129 23 Reception 0 0 0 Runners 0 0 0 molinare 136030 46301 8084 Carnaby 3 44 41 58 Sales 4 1 1 Xerox Administrative Group 0 0 0 Bookings 151 36 71 Accounts 135831 46223 7954 IT Admin 0 0 0 XRX_DEF 0 0 0 Admin 0 0 0 Diagnostics 0 0 0 Print Exceptions Group 0 0 0 Guest 0 0 0 but the problem is why on the second file is it not doing the NR>3 ie its not missing the first 3 lines? |
Quote:
|
yes im feeding both files in the same command
|
resolved it -
awk -F',' 'FNR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv |
Good -now you (hopefully) understand NR and FNR you can use them to determine if you are currently reading the first file or not.
If so, you will need to save the data you are interested in (or whole record) for comparison when you get to the second file. Associative arrays are made for things like this. |
no sorry i dont really understand the difference, sorry
now i need a way of finding out the differences between the 3 values of the customer "carnaby 3" |
now i want to calculate the difference between the values in columns 4, 5, 6 in both files and put the output in a new file, so like this
carnaby 3 0 2 0 carnaby 3 44 41 58 so the output file would look like this ie how many prints the client has done between the dates 26-01 - 16-02 carnaby 3 44 39 58 is this going to be easy to do please many thanks, rob |
As you will be doing this regularly (apparently), you should write a proper program.
syg00 can prob help if you want to stick with awk. I would use Perl with the Text::CSV module to avoid the potential issue noted in post #3. |
i feel so proud!
done it via paste and awk command paste 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv | awk -F, 'FNR>3 {print $1,$4,$5,$6,$24,$25,$26}' the reason why i have made loads of print commands is because the csv files consist of 21 columns each so thats why so 24,25,26 is the other columns in the other csv file and the layout looks like this - Reception 0 0 0 0 0 0 Runners 0 0 0 0 0 0 molinare 135986 46262 8026 136030 46301 8084 Carnaby 3 0 2 0 44 41 58 Sales 4 1 1 4 1 1 Xerox Administrative Group 0 0 0 0 0 0 Bookings 151 36 71 151 36 71 Accounts 135831 46223 7954 135831 46223 7954 IT Admin 0 0 0 0 0 0 XRX_DEF 0 0 0 0 0 0 Admin 0 0 0 0 0 0 Diagnostics 0 0 0 0 0 0 Print Exceptions Group 0 0 0 0 0 0 Guest 0 0 0 0 0 0 |
All times are GMT -5. The time now is 06:04 AM. |