LinuxQuestions.org
Old 04-22-2010, 10:46 PM   #1
cgcamal
Migrate Regexp from SED to AWK


Hi guys,

I have a sed script that searches for and replaces a pattern in text like the following:


Code:
C/username/Mydocuments/games & music
C/username/Mydocuments/New files 09-17-2007
C/username/settings
The script is:
Code:
sed 's/\([^/]*\/[^/]*\/\).*$/\1New String/g' inputfile
The script matches strings with 2 or 3 subfolder levels and replaces them with "New String" as follows:
If the path has 3 subfolders, the script replaces the last 2 subfolders,
from:
Code:
C/username/Mydocuments/games & music
C/username/Mydocuments/New files 09-17-2007

to
Code:
C/username/New String
C/username/New String

If the path has 2 subfolders, the script replaces the last subfolder,
from
Code:
C/username/settings
to
Code:
C/username/New String
These directory strings are in column 3 of inputfile, and I would like to
do the same job (search and replace on the 3rd column) with awk, but the same regex doesn't work there.


I've tried two ways, and both give errors:
Code:
1-) awk -F"|" '{gsub(\([^/]*\/[^/]*\/\).*$,"New string",$3};print}' infile
awk: {gsub(\(.*\/\)[^/]*$,"New string",$3};print}
awk:       ^ backslash not last character on line
and
2-) awk -F"|" '{gsub(([^/]*\/[^/]*\/).*$,"New  string",$3};print}' infile
awk: {gsub(([^/]*\/[^/]*\/).*$,"New string",$3};print}
awk:        ^ syntax error
awk: fatal: Invalid regular expression: /]*\/[^/

Could somebody help me find the correct regexp to use in the awk script to do the same task?


*I'm using Cygwin.

Thanks in advance,

Regards,
 
Old 04-23-2010, 12:39 AM   #2
grail
If more than 3 levels exist we may have to rethink, but this works on the example provided:
Code:
awk 'BEGIN{OFS=FS="/"}{print $1,$2,"New String"}' input_file
 
Old 04-23-2010, 01:30 AM   #3
cgcamal
Hi grail, thanks for your reply.

Actually, I'm now trying to do it using awk and its gsub function, because the file now has 40 columns separated by "|", and I would like to operate only on the 3rd column.

A sample would be as follows:

Code:
data1 column1|data1 column2|C/username/Mydocuments/games & music|data1 column4
data2 column1|data2 column2|C/username/Mydocuments/New files 09-17-2007|data2 column4
data3 column1|data3 column2|C/username/settings|data3 column4
I'm trying the following awk script:
Code:
awk -F"|" -v OFS="|" '{gsub(/\([^/]*\/[^/]*\/\).*$/,"New string",$3); print $0}' inputfile
But it doesn't seem to work.

The desired output is:

Code:
data1 column1|data1 column2|C/username/New string|data1 column4
data2 column1|data2 column2|C/username/New string|data2 column4
data3 column1|data3 column2|C/username/New string|data3 column4
Maybe somebody can give me some help with this.

Many thanks in advance.
 
Old 04-23-2010, 01:59 AM   #4
grail
Well, it's not pretty yet, but I came up with two alternatives:
Code:
awk 'BEGIN{OFS=FS="|";str="New String"}{gsub(/^[^/]*\/[^/]*\//,"&"str,$3);gsub(str".*",str,$3)}1' inputfile

or

awk 'BEGIN{OFS=FS="|";str="New String"}{split($3,arr,"/");$3=arr[1]"/"arr[2]"/"str}1' inputfile
 
Old 04-23-2010, 02:49 AM   #5
cgcamal

grail,

Simply perfect!

The second option works even better than what I was looking for.

Many thanks, really.
 
Old 04-23-2010, 03:03 AM   #6
colucix
Another option, more similar to the sed command, is by means of gensub:
Code:
awk 'BEGIN{OFS=FS="|"}{$3 = gensub(/([^/]*\/[^/]*\/).*$/,"\\1New String",1,$3)}1' inputfile
The only difference is that you must not escape the parentheses, and the captured group is referenced as \\1 in the replacement. Note that this function is available only in GNU awk: if compatibility with other awk implementations is an issue, you cannot use gensub.
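Where gensub is not available, one possible portable route is match() together with substr(), both of which are part of POSIX awk. The following is only a minimal sketch under a few assumptions: the file name sample.txt is hypothetical, the input uses the same "|"-separated layout as the sample earlier in the thread, and the regexp is passed as a string and anchored with ^, which differs slightly from the unanchored sed original.

```shell
# Build a small sample file (hypothetical name) with the layout used in the thread.
cat > sample.txt <<'EOF'
data1 column1|data1 column2|C/username/Mydocuments/games & music|data1 column4
data3 column1|data3 column2|C/username/settings|data3 column4
EOF

# match() sets RLENGTH to the length of the matched prefix (here "C/username/").
# substr() keeps that prefix, and the replacement text is appended to it.
out=$(awk 'BEGIN { OFS = FS = "|" }
{
    if (match($3, "^[^/]*/[^/]*/"))
        $3 = substr($3, 1, RLENGTH) "New String"
    print
}' sample.txt)

printf '%s\n' "$out"
```

Lines whose third column has fewer than two slashes are printed unchanged, which is the same behaviour as the sed command when its pattern does not match.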
 
Old 04-23-2010, 04:20 AM   #7
cgcamal
Hi colucix,

Thanks for sharing that and for your explanation. I didn't know about gensub, and now I understand better why the same regexp wasn't working in awk for me. It works nicely in my Cygwin.

One question:

When I run your script but replace the pattern with nothing (""), a "/" remains at the end of the 3rd column. I know how to delete it with sed using "sed 's/\/$//'", but that only works if it is applied to the last column; otherwise the rest of the columns are deleted. To avoid affecting the other columns, I think it is better to do it with awk.

What code do I need to add to your script to delete the remaining "/" when I replace the pattern with nothing ("")?

Thanks in advance.
 
Old 04-23-2010, 04:32 AM   #8
colucix
Well... you can try a simple sub call whose regexp matches a trailing / (if any):
Code:
 awk 'BEGIN{OFS=FS="|"}{$3 = gensub(/([^/]*\/[^/]*\/).*$/,"\\1",1,$3); sub(/\/$/,"",$3)}1' inputfile
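For awk implementations without gensub, the prefix can probably be cut and the trailing slash dropped in a single step with match() and substr(). This is a sketch rather than a drop-in replacement: the variable names line and out are made up for the example, and the regexp is anchored with ^, unlike the command above.

```shell
line='data1 column1|data1 column2|C/username/Mydocuments/games & music|data1 column4'

out=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = FS = "|" }
{
    # RLENGTH covers "C/username/"; subtracting 1 leaves out the final slash.
    if (match($3, "^[^/]*/[^/]*/"))
        $3 = substr($3, 1, RLENGTH - 1)
    print
}')

printf '%s\n' "$out"
```

Because only $3 is assigned, the other columns are rebuilt with OFS and otherwise left alone.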
 
Old 04-23-2010, 02:51 PM   #9
cgcamal
Thanks, thanks colucix. Works great. Something more I've learned today!

One more question, maybe somebody knows.

What kind of regexp is used by command-line tools such as awk or sed? Is it Perl style or another variant?

I ask because the regexp used in this problem certainly works with sed, and with a little modification in awk too, but when I test it in a freeware regexp tester like RegExr, it doesn't work.

Thanks in advance for your help.

Regards,
 
Old 04-23-2010, 10:32 PM   #10
sundialsvcs
Regular-expression libraries do (unfortunately...) vary somewhat, although most of them have followed Perl's lead.
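For sed and awk specifically, the flavors are the POSIX ones rather than Perl's: sed defaults to basic regular expressions (BRE), where groups are written \( \), while awk uses extended regular expressions (ERE), where plain ( ) group and \( matches a literal parenthesis. That is a likely reason the pattern from this thread is rejected by a tester built on a Perl-style engine. A small sketch of the difference follows; it assumes a sed that accepts -E (GNU and BSD sed both do), and the variable names are only for illustration.

```shell
path='C/username/Mydocuments/games & music'

# BRE (sed default): grouping needs backslashed parentheses.
bre=$(printf '%s\n' "$path" | sed 's/\([^/]*\/[^/]*\/\).*$/\1New String/')

# ERE (sed -E): the same pattern with plain parentheses, as awk expects.
ere=$(printf '%s\n' "$path" | sed -E 's/([^/]*\/[^/]*\/).*$/\1New String/')

# In awk (ERE), \( is a literal parenthesis, so only the line containing "(" is counted.
lit=$(printf '%s\n' 'a(b' "$path" | awk '/\(/ { n++ } END { print n + 0 }')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$lit"
```

Both sed commands should produce the same replacement, while the awk count shows that the escaped parenthesis is treated as ordinary text there.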

Speaking of Perl... It certainly seems to me that Perl might be "the cat's meow" in this situation. It might well be "the right tool for the job."

In the Unix|Linux environments, you have "an embarrassment of riches" with regard to the choice of tools that you have at your disposal (all of them "for free"). It is therefore particularly important to choose what is the appropriate one. For you. For this task. (And as the Perl folks love to say: "TMTOWTDI = There's More Than One Way To Do It.")

I personally think, for example, that lots of folks do amazing things with "shell scripts," even though "shell scripting" was never really designed to do the things that they are managing to do with it. So... are they right or are they not? (You decide.)

Anyhow... always be mindful that you might inadvertently be using a wrench to drive a nail. Also, always be very mindful that, no matter what you are setting out to do, you are not the first person to have done so. "The time that you unnecessarily wasted with a computer could have been time that you spent drinking a nice celebratory beer."
 