LinuxQuestions.org
Review your favorite Linux distribution.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Distributions > CentOS
User Name
Password
CentOS This forum is for the discussion of CentOS Linux. Note: This forum does not have any official participation.

Notices


Reply
  Search this Thread
Old 02-16-2017, 11:01 AM   #1
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Rep: Reputation: Disabled
extracting data from a csv file


hi all,

i attach the csv that im trying to extract some data out of it

i have done this command -

awk -F',' '{print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv

and it works awesome, only problem is i want to miss out rows 1, 2 and 3, but im struggling abit

https://s24.postimg.org/ywixdljb9/data.png

cheers,

rob
 
Old 02-16-2017, 11:08 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,333
Blog Entries: 3

Rep: Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730
The hint you are probably looking for is the built-in variable FNR or NR.

Keep checking the manual page for awk from time to time. More and more will stand out and make sense, eventually.

Last edited by Turbocapitalist; 02-16-2017 at 11:12 AM. Reason: grammar
 
Old 02-16-2017, 12:47 PM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,783

Rep: Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214Reputation: 2214
Note that your simple parsing will work for this file, but CSV files can also contain quoted strings with embedded field separator characters, so generalized parsing is quite a bit more complex.
 
Old 02-16-2017, 01:38 PM   #4
jerryq
LQ Newbie
 
Registered: Mar 2007
Posts: 4

Rep: Reputation: 0
and for a quick'n dirty one-off, there's always
Code:
tail -n +4 16-02-2017_XSAreport.csv | awk -F',' '{print $1,$4,$5,$6}'
 
Old 02-17-2017, 03:20 AM   #5
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
awk -F',' 'NR>3 {print $1,$4,$5,$6}' 16-02-2017_XSAreport.csv
 
Old 02-17-2017, 03:35 AM   #6
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
now i want to work out the difference between 2 csv reports i have run the awk command on

using the awk command on the csv i gave you i get these results -

Reception 0 0 0
Runners 0 0 0
molinare 136030 46301 8084
Carnaby 3 44 41 58
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0

so lets say i have this days report and i produce another report (using the csv and running an awk command)

how do i work out the differences for the customer "carnaby 3" so report 1 is 44 41 58 and lets say report 2 is 48 45 60

how do i show the difference ie 4 4 2

many thanks,

rob

Last edited by robertkwild; 02-17-2017 at 03:39 AM.
 
Old 02-17-2017, 05:26 AM   #7
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
mmm...

got abit of a problem and i dont know why this is happening

im doing this command and this is the result i get -

#awk -F',' 'NR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv
Reception 0 0 0
Runners 0 0 0
molinare 135986 46262 8026
Carnaby 3 0 2 0
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0
Account Name Counter: Total Black + Level 1 Copy and Print Impressions Counter: Total Color Level 2 Copy and Print Impressions Counter: Total Color Level 3 Copy and Print Impressions
Customer Service Engineer Account 0 0 0
System User 316 129 23
Reception 0 0 0
Runners 0 0 0
molinare 136030 46301 8084
Carnaby 3 44 41 58
Sales 4 1 1
Xerox Administrative Group 0 0 0
Bookings 151 36 71
Accounts 135831 46223 7954
IT Admin 0 0 0
XRX_DEF 0 0 0
Admin 0 0 0
Diagnostics 0 0 0
Print Exceptions Group 0 0 0
Guest 0 0 0

but the problem is why on the second file is it not doing the NR>3 ie its not missing the first 3 lines?
 
Old 02-17-2017, 05:28 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,333
Blog Entries: 3

Rep: Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730Reputation: 3730
Quote:
Originally Posted by robertkwild View Post
but the problem is why on the second file is it not doing the NR>3 ie its not missing the first 3 lines?
Are you feeding both in at once? Check the difference between FNR and NR in the manual page. You may want the former and not the latter.
 
Old 02-17-2017, 06:04 AM   #9
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
yes im feeding both files in the same command
 
Old 02-17-2017, 06:09 AM   #10
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
resolved it -

awk -F',' 'FNR>3 {print $1,$4,$5,$6}' 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv
 
Old 02-17-2017, 06:36 AM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,144

Rep: Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123
Good -now you (hopefully) understand NR and FNR you can use them to determine if you are currently reading the first file or not.
If so, you will need to save the data you are interested in (or whole record) for comparison when you get to the second file. Associative arrays are made for things like this.
 
Old 02-17-2017, 06:47 AM   #12
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
no sorry i dont really understand the difference, sorry

now i need a way of finding out the differences between the 3 values of the customer "carnaby 3"
 
Old 02-17-2017, 08:22 AM   #13
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
now i want to calculate the difference between the values in columns 4, 5, 6 in both files and put the output in a new file, so like this

carnaby 3 0 2 0

carnaby 3 44 41 58

so the output file would look like this ie how many prints the client has done between the dates 26-01 - 16-02

carnaby 3 44 39 58

is this going to be easy to do please

many thanks,

rob
 
Old 02-21-2017, 03:25 AM   #14
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,364

Rep: Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751
As you will be doing this regularly (apparently), you should write a proper program.
syg00 can prob help if you want to stick with awk.
I would use Perl with the Text::CSV module to avoid the potential issue noted in post #3.
 
Old 02-21-2017, 08:38 AM   #15
robertkwild
Member
 
Registered: Feb 2015
Posts: 382

Original Poster
Rep: Reputation: Disabled
i feel so proud!

done it via paste and awk command

paste 26-01-2017_XSAreport.csv 16-02-2017_XSAreport.csv | awk -F, 'FNR>3 {print $1,$4,$5,$6,$24,$25,$26}'

the reason why i have made loads of print commands is because the csv files consist of 21 columns each so thats why so 24,25,26 is the other columns in the other csv file

and the layout looks like this -

Reception 0 0 0 0 0 0
Runners 0 0 0 0 0 0
molinare 135986 46262 8026 136030 46301 8084
Carnaby 3 0 2 0 44 41 58
Sales 4 1 1 4 1 1
Xerox Administrative Group 0 0 0 0 0 0
Bookings 151 36 71 151 36 71
Accounts 135831 46223 7954 135831 46223 7954
IT Admin 0 0 0 0 0 0
XRX_DEF 0 0 0 0 0 0
Admin 0 0 0 0 0 0
Diagnostics 0 0 0 0 0 0
Print Exceptions Group 0 0 0 0 0 0
Guest 0 0 0 0 0 0
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Language C: Extracting fields into a CSV file Xeratul Programming 3 01-22-2013 10:13 AM
How to format .csv file with new data rhbegin Programming 6 11-18-2011 01:43 PM
Parsing data and generating a CSV file Striketh Programming 4 11-04-2011 07:15 AM
Need help on writing data to csv file tucs_123 Linux - Newbie 12 12-29-2008 03:42 AM
help extracting data from csv file willinusf Linux - General 10 10-27-2006 09:10 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Distributions > CentOS

All times are GMT -5. The time now is 03:13 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration