LinuxQuestions.org
Old 06-06-2010, 12:09 PM   #1
gandhigaurav1986
LQ Newbie
 
Registered: Jun 2010
Posts: 2

Rep: Reputation: 0
deleting lines from a file with specific pattern using AWK


Hi,

I have a file which contains millions of records. It contains 12 columns separated by "||" (the delimiter).

The first two fields contain the first name and last name of a person. My requirement is to delete all records from this file for which:

The first two fields do not contain any letters.

For example, I have the following records in the file:

gaurav||gandhi||123||456||789
#a%bcd||123abc||89|90||91
12345||@@@||89||123||234
***||!!!!||98||76||90



Now, the last two lines should be removed from this file, since the first two fields do not contain any letters in these two records.
Please help me out with this.
 
Old 06-06-2010, 12:25 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Hi and welcome to LinuxQuestions! If the other fields do not contain alphabetic characters, as in your example, you can simply do:
Code:
awk '/[a-zA-Z]/' file
or using sed:
Code:
sed '/[a-zA-Z]/!d' file
otherwise you should match the two fields specifically, for example by means of something like:
Code:
awk -F"|" '$1 ~ /[a-zA-Z]/ && $3 ~ /[a-zA-Z]/' file
Hope this helps.
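
To see why the field-specific form matters, here is a minimal sketch. The sample record is hypothetical (it is not from the original post): it has letters only in its third data field, so the whole-line match keeps it while the field-specific match drops it.

```shell
# Hypothetical record: no letters in the first two data fields,
# but letters in the third one.
line='123||456||abc||78||90'

# Whole-line match: any letter anywhere keeps the record.
whole=$(printf '%s\n' "$line" | awk '/[a-zA-Z]/')

# Field-specific match with a single pipe as separator:
# $1 is the first data field, $3 is the second one.
fields=$(printf '%s\n' "$line" | awk -F'|' '$1 ~ /[a-zA-Z]/ && $3 ~ /[a-zA-Z]/')

printf 'whole-line match kept: [%s]\n' "$whole"
printf 'field match kept:      [%s]\n' "$fields"
```

The first command prints the record back, the second prints nothing, which is the behaviour the original question asks for.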
 
Old 06-06-2010, 08:53 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Slight adjustment to colucix's last entry, as the delimiter is 2 pipes (and in case you weren't aware, you will need to redirect to a new file):
Code:
awk -F"||" '$1 ~ /[a-zA-Z]/ && $3 ~ /[a-zA-Z]/' file > new_file
 
Old 06-06-2010, 10:28 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
Does that work? And if it does, wouldn't that be $2?
 
Old 06-07-2010, 12:51 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Quote:
Does that work? And if it does, wouldn't that be $2?
It seems that in my haste I should have done a little testing:
Code:
awk -F"[|][|]" '$1 ~ /[a-zA-Z]/ && $2 ~ /[a-zA-Z]/' file > new_file
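
As a rough end-to-end sketch of the command above, run against the four sample records from the first post. The scratch directory and the file names records.txt and records.tmp are assumptions made for illustration only.

```shell
# Work in a scratch directory so nothing real gets overwritten.
tmp=$(mktemp -d)

# Recreate the sample records from the first post.
cat > "$tmp/records.txt" <<'EOF'
gaurav||gandhi||123||456||789
#a%bcd||123abc||89|90||91
12345||@@@||89||123||234
***||!!!!||98||76||90
EOF

# Keep only records whose first two fields each contain a letter,
# and replace the original file only if awk succeeded.
awk -F'[|][|]' '$1 ~ /[a-zA-Z]/ && $2 ~ /[a-zA-Z]/' "$tmp/records.txt" > "$tmp/records.tmp" &&
    mv "$tmp/records.tmp" "$tmp/records.txt"

# Only the first two records remain.
cat "$tmp/records.txt"
```

Writing to a temporary file and renaming it afterwards avoids truncating the input while awk is still reading it. If a record should survive when either of the two fields has a letter, rather than both, the && in the condition becomes ||.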

Last edited by grail; 06-07-2010 at 12:53 AM.
 
Old 06-07-2010, 01:06 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Actually I used a single pipe as delimiter and $3 to match the second field ($2 was the null string between the first two pipes).
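
A small sketch that makes the empty fields visible. The input is a shortened version of the first sample record, and the loop simply prints every field awk sees when the separator is a single pipe.

```shell
# Split on a single pipe and print each field with its number.
fields=$(printf 'gaurav||gandhi||123\n' |
    awk -F'|' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
# $1=[gaurav]
# $2=[]
# $3=[gandhi]
# $4=[]
# $5=[123]
```

Every second field is the empty string between two adjacent pipes, which is why the second data field ends up in $3.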
 
Old 06-07-2010, 01:23 AM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
My comment was directed at @grail's post, not yours, @colucix.
I'll be more specific in future ...
 
Old 06-07-2010, 02:28 AM   #8
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Mine too.

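For the sake of the OP, if he ever pops up again: the field separator in awk can be either a single character or a regular expression. Any FS longer than one character is treated as an extended regular expression, and there a bare | is the alternation operator rather than a literal pipe. That is why -F"||" does not split on two pipes: two adjacent alternation operators give undefined results under POSIX, so what you get depends on the awk implementation.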

In the second example posted by grail, each pipe sits inside its own character list [...], where it is taken literally, so the regular expression matches two consecutive pipes and you can actually use them as the field separator.
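
A minimal sketch of that behaviour, using the second sample record from the first post and setting FS in a BEGIN block instead of with -F (the two are equivalent here). The lone pipe inside 89|90 is not a separator, because the expression needs two pipes in a row.

```shell
# Split on exactly two consecutive pipes; print the field count and field 3.
out=$(printf '%s\n' '#a%bcd||123abc||89|90||91' |
    awk 'BEGIN { FS = "[|][|]" } { print NF, $3 }')

printf '%s\n' "$out"   # prints: 4 89|90
```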

Cheers!
 
Old 06-07-2010, 03:11 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
yes ... yes ... shoot me down .. lol

@colucix - thanks for the explanation
 
Old 06-07-2010, 04:34 AM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
OK, let's continue the education (mine).
Why is "[|][|]" considered a regex (in this context) but [||] isn't? [||]+ works. (Remember, I'm still coming to terms with awk.)
 
Old 06-07-2010, 09:58 AM   #11
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by syg00
OK, let's continue the education (mine).
Why is "[|][|]" considered a regex (in this context) but [||] isn't? [||]+ works. (Remember, I'm still coming to terms with awk.)
Actually both are treated as regexps, but [||] is a character list that means "match a single character, be it either | or |" (a needless redundancy). By contrast, [||]+ (which is the same as [|]+) matches one or more occurrences of the character, as in extended regular expressions. grail's solution
Code:
[|][|]
matches exactly two consecutive characters, each one taken from a character list.

The same applies if you use something like
Code:
[|&;][|&;]
that matches any of these combinations:
Code:
||   |&   |;   &&   &|   &;   ;;   ;|   ;&
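
The three separators discussed above can be compared side by side. This is only a sketch, and the input strings are made up for the purpose.

```shell
# [|]+ : any run of one or more pipes counts as one separator.
plus=$(printf 'a|b||c|||d\n' | awk -F'[|]+' '{ print NF }')

# [|][|] : only two pipes in a row separate fields; a lone pipe stays in the data.
two=$(printf 'a|b||c\n' | awk -F'[|][|]' '{ print NF ": " $1 }')

# [|&;][|&;] : any two-character combination of |, & and ; is a separator.
mixed=$(printf 'a|&b;;c&|d\n' | awk -F'[|&;][|&;]' '{ print $1, $2, $3, $4 }')

printf '%s\n' "$plus"    # prints: 4
printf '%s\n' "$two"     # prints: 2: a|b
printf '%s\n' "$mixed"   # prints: a b c d
```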
 
Old 06-07-2010, 10:30 PM   #12
gandhigaurav1986
LQ Newbie
 
Registered: Jun 2010
Posts: 2

Original Poster
Rep: Reputation: 0
Thanks a lot guys.... my problem is solved now
 
Old 06-08-2010, 02:08 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Quote:
my problem is solved now
Don't forget to mark as SOLVED then
 
  

