[SOLVED] Migrate Regexp from SED to AWK

cgcamal · 04-22-2010, 10:46 PM

Hi guys,

I have a sed script to search and replace a pattern in the following kind of text:

Code:

C/username/Mydocuments/games & music
C/username/Mydocuments/New files 09-17-2007
C/username/settings

The script is:

Code:

sed 's/\([^/]*\/[^/]*\/\).*$/\1New String/g' inputfile

The script matches paths with 2 or 3 subfolder levels and replaces the tail with "New String" as follows:
If the path has 3 subfolders, the script replaces the last 2 subfolders,
from:

Code:

C/username/Mydocuments/games & music
C/username/Mydocuments/New files 09-17-2007

to

Code:

C/username/New String
C/username/New String

If the path has 2 subfolders, the script replaces the last subfolder,
from

Code:

C/username/settings

to

Code:

C/username/New String

These directory strings belong to column 3 of inputfile, and I would like to
do the same job (search and replace on the 3rd column) with awk, but the same regex doesn't work there.

I've tried two ways, and both give errors:

Code:

1-) awk -F"|" '{gsub(\([^/]*\/[^/]*\/\).*$,"New string",$3};print}' infile
awk: {gsub(\(.*\/\)[^/]*$,"New string",$3};print}
awk:       ^ backslash not last character on line
and
2-) awk -F"|" '{gsub(([^/]*\/[^/]*\/).*$,"New  string",$3};print}' infile
awk: {gsub(([^/]*\/[^/]*\/).*$,"New string",$3};print}
awk:        ^ syntax error
awk: fatal: Invalid regular expression: /]*\/[^/

Maybe somebody can help me with the correct regexp to use in the awk script to do the same task.

*I'm using Cygwin.

Thanks in advance,

Regards,

grail · 04-23-2010, 12:39 AM

If more than 3 exists we may have to rethink, but this works on example provided:

Code:

awk 'BEGIN{OFS=FS="/"}{print $1,$2,"New String"}' input_file

cgcamal · 04-23-2010, 01:30 AM

Hi grail, thanks for your reply.

Actually, I'm now trying to do it using awk and its gsub function, because the file now has 40 columns separated by "|", and I would like to operate only on the 3rd column.

A sample would be as follows:

Code:

data1 column1|data1 column2|C/username/Mydocuments/games & music|data1 column4
data2 column1|data2 column2|C/username/Mydocuments/New files 09-17-2007|data2 column4
data3 column1|data3 column2|C/username/settings|data3 column4

I'm trying the following awk script:

Code:

awk -F"|" -v OFS="|" '{gsub(/\([^/]*\/[^/]*\/\).*$/,"New string",$3); print $0}' inputfile

But it doesn't seem to work.

The desired output is:

Code:

data1 column1|data1 column2|C/username/New string|data1 column4
data2 column1|data2 column2|C/username/New string|data2 column4
data3 column1|data3 column2|C/username/New string|data3 column4

Maybe somebody can give some help with this.

Many thanks in advance.
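
A minimal sketch of why the gsub attempt above changes nothing, assuming a POSIX-style awk. awk patterns are extended regular expressions, so `\(` and `\)` match literal parentheses instead of forming a group. Removing the backslashes makes the pattern match, but gsub() has no `\1`, so the prefix is overwritten as well. The variable names (`line`, `escaped`, `grouped`) are hypothetical, and the regex is passed as a string to avoid relying on how a given awk parses `/` inside brackets.

```shell
line='data3 column1|data3 column2|C/username/settings|data3 column4'

# Escaped parentheses are literal in awk; the field has none, so gsub()
# reports 0 replacements.
escaped=$(printf '%s\n' "$line" |
  awk 'BEGIN{OFS=FS="|"} {n = gsub("\\([^/]*/[^/]*/\\).*$", "New string", $3); print n}')

# Unescaped parentheses group, but gsub() cannot refer back to the group,
# so the whole match (including "C/username/") is replaced.
grouped=$(printf '%s\n' "$line" |
  awk 'BEGIN{OFS=FS="|"} {gsub("([^/]*/[^/]*/).*$", "New string", $3); print $3}')

printf '%s\n%s\n' "$escaped" "$grouped"
```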

grail · 04-23-2010, 01:59 AM

Well, it's not pretty yet, but I came up with two alternatives:

Code:

awk 'BEGIN{OFS=FS="|";str="New String"}{gsub(/.*\//,"&"str,$3);gsub(str".*",str,$3)}1' inputfile

or

awk 'BEGIN{OFS=FS="|";str="New String"}{split($3,arr,"/");$3=arr[1]"/"arr[2]"/"str}1' inputfile
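
As a rough check, the two alternatives can be run side by side on one of the sample lines. This sketch assumes any POSIX-style awk; the shell variable names (`line`, `first`, `second`) are hypothetical. On a path with three levels the two commands appear to differ: the first keeps everything up to the last "/", while the second always keeps exactly the first two components.

```shell
line='data1 column1|data1 column2|C/username/Mydocuments/games & music|data1 column4'

# Alternative 1: insert str after the last "/", then trim what follows it.
first=$(printf '%s\n' "$line" |
  awk 'BEGIN{OFS=FS="|";str="New String"}{gsub(/.*\//,"&"str,$3);gsub(str".*",str,$3)}1')

# Alternative 2: rebuild the field from its first two "/"-separated parts.
second=$(printf '%s\n' "$line" |
  awk 'BEGIN{OFS=FS="|";str="New String"}{split($3,arr,"/");$3=arr[1]"/"arr[2]"/"str}1')

printf '%s\n%s\n' "$first" "$second"
```

Only the second result matches the desired output posted earlier for three-level paths; for a two-level path such as C/username/settings both give the same line.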

cgcamal · 04-23-2010, 02:49 AM

grail,

Simply perfect!

The second option works even better than what I was looking for.

Many thanks, really.

colucix · 04-23-2010, 03:03 AM

Another option, more similar to the sed command, is by means of gensub:

Code:

awk 'BEGIN{OFS=FS="|"}{$3 = gensub(/([^/]*\/[^/]*\/).*$/,"\\1New String",1,$3)}1' inputfile

The only difference is that you must not escape the parentheses, and the captured group is referenced as \\1 in the replacement. Note that this function is available only in GNU awk: if compatibility with other awk implementations is an issue, you cannot use gensub.
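
If gensub is not available, one possible workaround is match() with substr(), both of which are part of POSIX awk. The sketch below is an illustration under that assumption, not a drop-in equivalent of gensub; the variable names (`line`, `portable`) are hypothetical, and the regex is given as a string so that the "/" characters need no escaping.

```shell
line='data2 column1|data2 column2|C/username/Mydocuments/New files 09-17-2007|data2 column4'

portable=$(printf '%s\n' "$line" |
  awk 'BEGIN{OFS=FS="|"}
       # match() sets RLENGTH to the length of the leading two-level prefix.
       match($3, "^[^/]*/[^/]*/") {$3 = substr($3, 1, RLENGTH) "New String"}
       1')

printf '%s\n' "$portable"
```

Fields that do not contain two "/" characters are left unchanged, since the rule body only runs when match() succeeds.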

cgcamal · 04-23-2010, 04:20 AM

Hi colucix,

Thanks for sharing that and for your explanation. I didn't know about gensub, and now I understand better why the same regexp wasn't working in awk for me. It works nicely in my Cygwin.

One question:

When I run your script but replace the pattern with nothing (""), a "/" remains at the end of the 3rd column. I know how to delete it with sed using "sed 's/\/$//'", but that only works if it is applied to the last column; otherwise the rest of the columns are deleted. To avoid affecting the information in the other columns, I think it is better to do it with awk.

What code needs to be added to your script to delete the remaining "/" when I replace the pattern with nothing ("")?

Thanks in advance.

colucix · 04-23-2010, 04:32 AM

Well.. you can try a simple sub function whose regexp matches a trailing / (if any):

Code:

 awk 'BEGIN{OFS=FS="|"}{$3 = gensub(/([^/]*\/[^/]*\/).*$/,"\\1",1,$3); sub(/\/$/,"",$3)}1' inputfile
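
For an awk without gensub, the same trimming can be sketched with match() and substr(), cutting the prefix one character short so the trailing "/" is never produced. This assumes a POSIX-style awk; the variable names (`line`, `trimmed`) are hypothetical.

```shell
line='data3 column1|data3 column2|C/username/settings|data3 column4'

trimmed=$(printf '%s\n' "$line" |
  awk 'BEGIN{OFS=FS="|"}
       # RLENGTH covers the prefix including its final "/"; drop that one character.
       match($3, "^[^/]*/[^/]*/") {$3 = substr($3, 1, RLENGTH - 1)}
       1')

printf '%s\n' "$trimmed"
```

Because only $3 is assigned, the other columns are rebuilt unchanged with the "|" separator.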

cgcamal · 04-23-2010, 02:51 PM

Thanks, thanks colucix. Works great. Something more I've learned today!

One more question, maybe somebody knows.

What kind of regexp is used by command-line tools such as awk or sed? Is it Perl style or another variant?

I ask because the regexp used in this problem certainly works with sed, and with a little modification in awk too, but when I test it in a freeware regexp tester like RegExr, it doesn't work.

Thanks in advance for your help.

Regards,

sundialsvcs · 04-23-2010, 10:32 PM

Regular-expression libraries do (unfortunately...) vary somewhat, although most of them have followed Perl's lead.
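
For the two tools in this thread, the variation is easy to see. sed defaults to POSIX basic regular expressions (BRE), where groups are written `\( \)`; awk uses POSIX extended regular expressions (ERE), where groups are plain `( )`. The sketch below assumes a sed that supports the `-E` option (GNU and BSD sed do) and uses `#` as the delimiter so that "/" needs no escaping; the variable names are hypothetical. Perl-style testers generally treat `\(` as a literal parenthesis, which is probably why the sed pattern fails there.

```shell
path='C/username/Mydocuments/games & music'

# BRE (sed default): grouping needs backslashes.
bre=$(printf '%s\n' "$path" | sed 's#\([^/]*/[^/]*/\).*$#\1New String#')

# ERE (sed -E, and the flavor awk uses): grouping uses plain parentheses.
ere=$(printf '%s\n' "$path" | sed -E 's#([^/]*/[^/]*/).*$#\1New String#')

printf '%s\n%s\n' "$bre" "$ere"
```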

Speaking of Perl... It certainly seems to me that Perl might be "the cat's meow" in this situation. It might well be "the right tool for the job."

In the Unix|Linux environments, you have "an embarrassment of riches" with regard to the choice of tools that you have at your disposal (all of them "for free"). It is therefore particularly important to choose what is the appropriate one. For you. For this task. (And as the Perl folks love to say: "TMTOWTDI = There's More Than One Way To Do It.")

I personally think, for example, that lots of folks do amazing things with "shell scripts," even though "shell scripting" was never really designed to do the things that they are managing to do with it.

So... are they right or are they not? (You decide.)

Anyhow... always be mindful that you might inadvertently be using a wrench to drive a nail. Also, always be very mindful that, no matter what you are setting out to do, you are not the first person to have done so. "The time that you unnecessarily wasted with a computer could have been time that you spent drinking a nice celebratory beer."