LinuxQuestions.org
Old 09-01-2010, 07:34 AM   #1
subby80
LQ Newbie
 
Registered: Sep 2010
Posts: 11

Rep: Reputation: Disabled
Question Delete range word to word with sed when all you have is one line


Hi, I have a file that contains 2 lines. Line 1 is static data; line 2 is a very long string (over 3000 characters, sometimes much more). That string contains tags which I want to delete.

e.g.
Code:
<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>.
All on one line. Now I want to delete every string that starts with <tag1> and ends with </tag1>, so that this is the result:

Code:
<order1>123</order1><new>1</new><order2>124</order2>.
What I have is:

Code:
sed -i 's#<tag1>.*</tag1>##g' filename
But what happens is that it deletes everything between the first <tag1> and the last </tag1>.

So that the result is:

Code:
<order1>123</order1>
You can see that everything between the first <tag1> and the last </tag1> is deleted.

Last edited by subby80; 09-01-2010 at 09:28 AM.
 
Old 09-01-2010, 07:46 AM   #2
subby80
LQ Newbie
 
Registered: Sep 2010
Posts: 11

Original Poster
Rep: Reputation: Disabled
can someone change the subject sep to sed???
 
Old 09-01-2010, 07:51 AM   #3
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,126
Blog Entries: 2

Rep: Reputation: 124
You should be able to make it not greedy by adding a ? after the .*

sed -i 's#/<tag1>.*?</tag1>##g' filename

You also might need to use -r to get extended regex for the ? to work

Last edited by estabroo; 09-01-2010 at 07:52 AM. Reason: -r option
 
Old 09-01-2010, 07:54 AM   #4
subby80
LQ Newbie
 
Registered: Sep 2010
Posts: 11

Original Poster
Rep: Reputation: Disabled
Hi Estabroo,

When I try your command it doesn't delete anything. Everything stays the same.

Thanks
 
Old 09-01-2010, 08:35 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
use awk
Code:
awk -vRS="</tag1>"  '{gsub(/<tag1>.*/,"") }1' ORS="" file
 
Old 09-01-2010, 08:40 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,143

Rep: Reputation: 4123
Or perl - it honours the non-greedy regex.
 
Old 09-01-2010, 08:53 AM   #7
subby80
LQ Newbie
 
Registered: Sep 2010
Posts: 11

Original Poster
Rep: Reputation: Disabled
Hi Ghostdog74,

That seems to work, but now I have another problem: I cannot use the " sign, because I'm going to automate the script in a Progress program, where the " sign means something different. Can I change that "?

Thanks
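
One possible way to avoid the " sign entirely, offered as a sketch rather than a tested Progress solution: set RS and ORS with single quotes, and pass an awk variable that is never assigned (here called empty, a name chosen only for illustration) where the original command used "". This assumes gawk or mawk, since a multi-character RS is not guaranteed in every awk.

```shell
# Sample line taken from the first post (trailing period omitted).
line='<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>'

# RS and ORS are set with -v and single quotes only.
# "empty" is never assigned, so awk treats it as the empty string,
# which replaces the "" used in the original gsub call.
result=$(printf '%s\n' "$line" |
  awk -v RS='</tag1>' -v ORS= '{ gsub(/<tag1>.*/, empty) } 1')

printf '%s\n' "$result"
```

If single quotes are also a problem for the calling program, the awk program could instead be saved in a file and run with awk -f.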
 
Old 09-01-2010, 09:12 AM   #8
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
Try
Code:
sed -i 's#<tag1>[^>]*>##g' filename
 
Old 09-01-2010, 09:18 AM   #9
subby80
LQ Newbie
 
Registered: Sep 2010
Posts: 11

Original Poster
Rep: Reputation: Disabled
@Kenhelm

With your code it just deletes the opening tag <tag> and not the complete string <tag>123</tag>.
 
Old 09-01-2010, 09:18 AM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,143

Rep: Reputation: 4123
Presumes no embedded (nested) tags - not very general. Might suffice, might not - let the OP decide from the various offerings.

Edit: ??? - looks o.k. to me (subject to my comment above)

Last edited by syg00; 09-01-2010 at 09:22 AM.
 
Old 09-01-2010, 09:31 AM   #11
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
It's working for me using GNU sed.
Code:
echo '<order1>123</order1><tag1>data</tag1><new>1</new><order2>124</order2><tag1>data</tag1>' |
sed  's#<tag1>[^>]*>##g'

<order1>123</order1><new>1</new><order2>124</order2>
 
Old 09-01-2010, 09:36 AM   #12
subby80
LQ Newbie
 
Registered: Sep 2010
Posts: 11

Original Poster
Rep: Reputation: Disabled
I think it is not working because in the real file which I have to edit, tag1 is set up like this:

Code:
<tag1><cd>value</cd></tag1>
As you can see, <tag1> contains another tag, so it isn't followed directly by </tag1> as in my example.
 
Old 09-01-2010, 09:50 AM   #13
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
With the real data try
Code:
sed -i 's#<tag1><cd>[^<]*</cd></tag1>##g' filename
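
If the content between <tag1> and </tag1> is not always a single <cd> element, a more general sed-only approach can be sketched as follows. It assumes GNU sed (for \n in the replacement and inside a bracket expression) and relies on the data being a single line, so a newline can serve as a temporary marker: each </tag1> is first turned into a newline, then everything from <tag1> up to the nearest newline is deleted. It does not handle a <tag1> nested inside another <tag1>.

```shell
# Sample line using the nested layout from post #12.
line='<order1>123</order1><tag1><cd>value</cd></tag1><new>1</new><order2>124</order2><tag1><cd>value</cd></tag1>'

# Step 1: replace every </tag1> with a newline marker.
# Step 2: delete from each <tag1> up to and including the nearest marker.
result=$(printf '%s\n' "$line" |
  sed 's#</tag1>#\n#g; s#<tag1>[^\n]*\n##g')

printf '%s\n' "$result"
```

The same expression should work with sed -i on the file, since sed processes each line separately.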
 
Old 09-01-2010, 08:38 PM   #14
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,126
Blog Entries: 2

Rep: Reputation: 124
subby80, guess sed doesn't support non-greedy regex (plus that first / was a typo)

well here is the equivalent in perl

perl -pe 's#<tag1>.*?</tag1>##g' < filename

Last edited by estabroo; 09-01-2010 at 08:39 PM. Reason: typo'd again
 
Old 09-02-2010, 01:15 AM   #15
subby80
LQ Newbie
 
Registered: Sep 2010
Posts: 11

Original Poster
Rep: Reputation: Disabled
Hi All,

I got it to work with this command

Code:
awk -vRS="</tag1>"  '{gsub(/<tag1>.*/,"") }1' ORS="" file
I only have to replace <tag1> with a variable, so that numerous tags can be deleted. The only problem I'm facing now is that with this awk command you have to redirect the output to a new file.

Code:
awk -vRS="</tag1>"  '{gsub(/<tag1>.*/,"") }1' ORS="" file >newfile
Can this be solved the same way as with sed, with an in-place replacement? Redirecting to the same file gives me a 0-byte file.
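
The 0-byte file appears because the shell truncates the output file before awk starts reading it. One common workaround, sketched here under the assumption of gawk or mawk (multi-character RS), is to write to a temporary file and copy it back only if awk succeeded. The function name strip_tag, the awk variable opentag, and the sample file are hypothetical and used only for illustration; the tag name is passed in as a variable.

```shell
# Hypothetical helper: remove every <tag>...</tag> block from a file.
strip_tag() {
  tag=$1
  file=$2
  tmp=$(mktemp) || return 1
  # The closing tag is the record separator; the opening tag is passed in
  # as an awk variable and joined with ".*" to form the pattern.
  if awk -v RS="</$tag>" -v opentag="<$tag>" -v ORS= \
       '{ sub(opentag ".*", "") } 1' "$file" > "$tmp"; then
    # Copy back with cat so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage on a made-up two-line file, deleting two different tags.
file=$(mktemp)
printf '%s\n' 'static header line' \
  '<order1>123</order1><tag1><cd>value</cd></tag1><new>1</new><order2>124</order2><tag1><cd>value</cd></tag1>' > "$file"

for t in tag1 new; do
  strip_tag "$t" "$file"
done

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```

Newer versions of gawk also offer an in-place extension, but the temporary-file approach does not depend on a particular gawk version.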
 
  

