[SOLVED] Delete range word to word with sed when all you have is one line

subby80 · 09-01-2010, 07:34 AM

Hi, I have a file which contains 2 lines. Line 1 is static data. Line 2 is a very long string (3000 characters or more). That string contains tags which I want to delete.

e.g.

Code:

<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>.

All on one line. Now I want to delete each string that starts with <tag1> and ends with </tag1>, so that this will be the result:

Code:

<order1>123</order1><new>1</new><order2>124</order2>.

What I have is:

Code:

sed -i 's#<tag1>.*</tag1>##g' filename

But what happens is that it deletes everything from the first <tag1> to the last </tag1>.

So that the result is:

Code:

<order1>123</order1>

You see that everything is deleted between the first <tag1> and the last </tag1>.

subby80 · 09-01-2010, 07:46 AM

Can someone change the subject from sep to sed?

estabroo · 09-01-2010, 07:51 AM

You should be able to make it not greedy by adding a ? after the .*

sed -i 's#/<tag1>.*?</tag1>##g' filename

You also might need to use -r to get extended regex for the ? to work

subby80 · 09-01-2010, 07:54 AM

Hi Estabroo,

When I try your command it doesn't delete anything. Everything stays the same.

Thanks

ghostdog74 · 09-01-2010, 08:35 AM

use awk

Code:

awk -vRS="</tag1>"  '{gsub(/<tag1>.*/,"") }1' ORS="" file
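
A rough sketch of why this works, using the sample line from the first post: with RS set to </tag1>, awk ends each record just before a closing tag, gsub then strips everything from <tag1> to the end of that record, and an empty ORS joins the records back together without the separator. A multi-character RS is an extension (gawk and mawk accept it), so that is an assumption here. The variable names line and result are only for illustration.

```shell
# Sample line from the first post (without the trailing full stop).
line='<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>'

# RS='</tag1>' : every record ends right before a closing tag
# gsub(...)    : drop everything from <tag1> to the end of the record
# ORS=''       : print the records back to back, separator not restored
result=$(printf '%s\n' "$line" |
    awk -v RS='</tag1>' -v ORS='' '{ gsub(/<tag1>.*/, "") } 1')

printf '%s\n' "$result"
```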

syg00 · 09-01-2010, 08:40 AM

Or perl - it honours the non-greedy regex.

subby80 · 09-01-2010, 08:53 AM

Hi Ghostdog74,

That seems to work, but now I have another problem: I cannot use the " sign, because I'm going to automate the script in a Progress program, where the " sign means something different. Can I change that "?

Thanks
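
One possible way around the " sign, as a sketch that has not been tried from inside a Progress program: the shell accepts single quotes around the RS value, and the empty replacement string inside the awk program can come from a variable instead of a "" literal. The variable name empty is made up for this example, and a multi-character RS again assumes gawk or mawk.

```shell
# No double quotes anywhere in this snippet.
# -v empty='' supplies the empty replacement string, so the awk program
# itself does not need a string literal.
result=$(printf '%s\n' '<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>' |
    awk -v RS='</tag1>' -v ORS='' -v empty='' '{ gsub(/<tag1>.*/, empty) } 1')

# Left unquoted on purpose: the sample output has no spaces or wildcards.
echo $result
```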

Kenhelm · 09-01-2010, 09:12 AM

Try

Code:

sed -i 's#<tag1>[^>]*>##g'

subby80 · 09-01-2010, 09:18 AM

@Kenhelm

With your code it just deletes the first tag <tag> and not the complete string <tag>123</tag>.

syg00 · 09-01-2010, 09:18 AM

Presumes no imbedded (nested) tags - not very general. Might suffice, might not - let the OP decide from the various offerings.

Edit: ??? - looks o.k. to me (subject to my comment above)

Kenhelm · 09-01-2010, 09:31 AM

It's working for me using GNU sed.

Code:

echo '<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>' |
sed  's#<tag1>[^>]*>##g'

<order1>123</order1><new>1</new><order2>124</order2>

subby80 · 09-01-2010, 09:36 AM

I think it is not working because in the real file which I have to edit, tag1 is set up like this:

Code:

<tag1><cd>value</cd></tag1>

See, the tag1 isn't followed directly by the </tag1> as it is in my example.

Kenhelm · 09-01-2010, 09:50 AM

With the real data try

Code:

sed -i 's#<tag1><cd>[^<]*</cd></tag1>##g'
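
If the content inside <tag1> varies, a more general sed-only variant might look like the sketch below. It assumes GNU sed, where \n means a newline both in the replacement and inside a bracket expression. The idea is to turn every </tag1> into a newline, a character that cannot otherwise occur inside a single line, and then delete from <tag1> up to that newline with a negated class. It also assumes every </tag1> has a matching <tag1> in front of it.

```shell
# Sample with a nested tag, modelled on the real data described above.
line='<order1>123</order1><tag1><cd>value</cd></tag1><new>1</new><order2>124</order2><tag1><cd>value</cd></tag1>'

# Step 1: replace each closing tag with a newline (GNU sed).
# Step 2: delete <tag1>, everything that is not a newline, and the newline.
result=$(printf '%s\n' "$line" |
    sed 's#</tag1>#\n#g; s#<tag1>[^\n]*\n##g')

printf '%s\n' "$result"
```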

estabroo · 09-01-2010, 08:38 PM

subby80, guess sed doesn't support non-greedy regex (plus that first / was a typo)

well here is the equivalent in perl

perl -pe 's#<tag1>.*?</tag1>##g' < filename

subby80 · 09-02-2010, 01:15 AM

Hi All,

I got it to work with this command

Code:

awk -vRS="</tag1>"  '{gsub(/<tag1>.*/,"") }1' ORS="" file

Only I have to replace <tag1> with a variable, so that numerous tags will be deleted. The only problem that I'm now facing is that with this awk command you have to redirect it to an output file.

Code:

awk -vRS="</tag1>"  '{gsub(/<tag1>.*/,"") }1' ORS="" file >newfile

Can this be solved in the same way as with sed, with an in-file replacement? Redirecting it to the same file gives me a 0-byte file.
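
The 0-byte file comes from the shell: it truncates the output file before awk gets to read it. A common workaround, sketched here under a few assumptions, is to write to a temporary file and move it over the original once awk has succeeded. The tag name is passed in with -v so the same command can be looped over several tags. The names file, tmp and tag, and the tag list tag1 new, are only examples, and the multi-character RS assumes gawk or mawk.

```shell
# Scratch file standing in for the real one.
file=$(mktemp)
printf '%s\n' '<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>' > "$file"

for tag in tag1 new; do
    tmp=$(mktemp)
    # Build RS and the opening-tag pattern from the tag variable, write
    # the result to a temp file, and replace the original only on success.
    awk -v tag="$tag" '
        BEGIN { RS = "</" tag ">"; ORS = "" }
        { sub("<" tag ">.*", "") }
        1
    ' "$file" > "$tmp" && mv "$tmp" "$file"
done

result=$(cat "$file")
rm -f "$file"
printf '%s\n' "$result"
```

Note that mv replaces the file, so the result carries the permissions of the temp file rather than those of the original.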