LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 11-18-2015, 04:42 AM   #1
Thirumala!
LQ Newbie
 
Registered: Nov 2015
Posts: 4

Rep: Reputation: Disabled
Deleting n number of consecutive occurrences of a pattern


Hello All,

I want to delete a particular number of consecutive occurrences of a pattern from the file using awk. Please help me with the same.

Example of the file contents

0000
0010
0011
0000
0000
0000
0000
0000
1111
1111
0010
0000

Now I want to delete only the block where 0000 is repeated 5 times consecutively and keep the other 0000 lines unchanged. How can I do this using awk?

Thanks in advance,
Thirumala
 
Old 11-18-2015, 05:07 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,149

Rep: Reputation: 4124
So that's what you want; what have you attempted?
You make the effort, we'll help when you run into trouble.
 
Old 11-18-2015, 05:13 AM   #3
Thirumala!
LQ Newbie
 
Registered: Nov 2015
Posts: 4

Original Poster
Rep: Reputation: Disabled
Hey syg00,

I have tried the below command

cat temp | awk 'N&&sub(PAT,REPL){N--};1' N=291 PAT="0000" REPL="" > temp1
cat temp1 | sed '/^$/d' > temp2

This command deletes the first 291 occurrences, but I want to delete only 291 consecutive occurrences.

Thanks,
Thirumala

Last edited by Thirumala!; 11-18-2015 at 05:14 AM.
 
Old 11-18-2015, 05:55 AM   #4
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Try this:

When the input line matches the pattern, remember the line in an array and count down. If counter is 0, throw the array away and set the counter back to N.
When it doesn't match the pattern:
- if the array isn't empty, less than N patterns were in a row, so write the array out. Clear the array. Set the counter back to N.
- print the current line

I wonder if it can be done with other commands.
 
Old 11-18-2015, 06:05 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Another thing to consider would be, what if there are more than 5 in a row? Do you delete if it is 6? Or only if another 5, ie. 10?
 
Old 11-18-2015, 06:09 AM   #6
Thirumala!
LQ Newbie
 
Registered: Nov 2015
Posts: 4

Original Poster
Rep: Reputation: Disabled
It should not replace if occurrences are more than n. And it should replace only if next set of occurrences are n again.
 
Old 11-18-2015, 06:32 AM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,149

Rep: Reputation: 4124
Nope, we are not going to write it for you.
You have been given some hints - incorporate them in your code. The countdown is a good idea, use it to also test if the current record is equal to the previous.
 
1 members found this post helpful.
Old 11-18-2015, 07:05 PM   #8
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Quote:
Originally Posted by berndbausch
Try this:

When the input line matches the pattern, remember the line in an array and count down. If counter is 0, throw the array away and set the counter back to N.
When it doesn't match the pattern:
- if the array isn't empty, less than N patterns were in a row, so write the array out. Clear the array. Set the counter back to N.
- print the current line
Sorry I couldn't resist the itch and ended up writing it. Why not share it then:
Code:
#!/usr/bin/awk -f

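BEGIN   { N=5; PAT="0000"; ix=0 }            # N counts down the matches still needed
$0==PAT { saved[ix] = $0; ix++               # buffer the matching line
          N--
          if (N==0) { delete saved; N=5 }    # 5 in a row: drop the buffered run
          next                               }

        { for (i in saved) print saved[i]    # shorter run: write the buffer back out
          delete saved
          N=5
          print                           }  # then print the non-matching line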
Adding this condition is left as an exercise:
Quote:
Originally Posted by Thirumala!
It should not replace if occurrences are more than n. And it should replace only if next set of occurrences are n again.
By the way, I now notice that I forgot to reset the index variable ix. Thanks to the associative nature of awk arrays, this doesn't seem to be a problem.
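
One possible take on that exercise, as a sketch only. It assumes the requirement means "drop a run only when it is exactly n lines long", which is just one reading of post #6. Since every line in a run is identical, it counts the run instead of buffering it. The function name drop_exact_runs and the variables n, pat and run are made up for illustration.

```shell
# Hypothetical helper: drop runs of exactly n consecutive lines equal to pat.
drop_exact_runs() {
    awk -v n="$1" -v pat="$2" '
        # Print the counted run again unless it is exactly n lines long.
        function flush(   i) {
            if (run != n)
                for (i = 0; i < run; i++) print pat
            run = 0
        }
        $0 == pat { run++; next }       # inside a run: only count
                  { flush(); print }    # run ended: decide, then print this line
        END       { flush() }           # a run may also end at end of file
    '
}

# Sample from post #1: only the block of five 0000 lines should disappear.
printf '%s\n' 0000 0010 0011 0000 0000 0000 0000 0000 1111 1111 0010 0000 |
    drop_exact_runs 5 0000
```

With this reading a run of six 0000 lines is printed unchanged, because the decision is made only once the run has ended.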
 
1 members found this post helpful.
Old 11-19-2015, 12:05 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
@berndbausch - just remember that now this user may expect to be told answers without doing any work in the future too

But, as you have let the cat out of the bag, here are 2 points of interest:

1. What happens if the last 3 entries in the file are the pattern?

2. If you rethink your use of N, you could reduce it to only being needed once outside the definition (hint: consider ix values)
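
For point 1, a quick experiment shows the effect. The sketch below inlines the script from post #8 in a shell function (the name post8 is made up here) and feeds it input that ends in three pattern lines.

```shell
# The awk program from post #8, inlined without changes to its logic.
post8() {
    awk '
        BEGIN   { N=5; PAT="0000"; ix=0 }
        $0==PAT { saved[ix] = $0; ix++
                  N--
                  if (N==0) { delete saved; N=5 }
                  next }
                { for (i in saved) print saved[i]
                  delete saved
                  N=5
                  print }
    '
}

# Input that ends with three pattern lines.
printf '%s\n' 1111 0000 0000 0000 | post8
# prints only 1111: the three buffered 0000 lines are never written out
```

The buffer is only written out when a non-matching line arrives, so a short run at the very end of the file is lost.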
 
Old 11-19-2015, 02:07 AM   #10
Thirumala!
LQ Newbie
 
Registered: Nov 2015
Posts: 4

Original Poster
Rep: Reputation: Disabled
Hello All,

Thanks for the help. This is the first time I am using awk, so I needed more help.
And rest assured that I will not expect any ready-made answers from you guys.

Thanks,
Thirumala

Last edited by Thirumala!; 11-19-2015 at 06:00 AM.
 
Old 11-19-2015, 03:20 AM   #11
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Quote:
Originally Posted by grail
@berndbausch - just remember that now this user may expect to be told answers without doing any work in the future too

But, as you have let the cat out of the bag, here are 2 points of interest:

1. What happens if the last 3 entries in the file are the pattern?

2. If you rethink your use of N, you could reduce it to only being needed once outside the definition (hint: consider ix values)
Polishing is an exercise for the reader, and if somebody has wrong expectations, they can be reset quickly.
Well, when I have a little more time I may do the polishing just to prove my value.
 
Old 11-19-2015, 03:23 AM   #12
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Quote:
Originally Posted by grail
@berndbausch - just remember that now this user may expect to be told answers without doing any work in the future too

But, as you have let the cat out of the bag, here are 2 points of interest:

1. What happens if the last 3 entries in the file are the pattern?

2. If you rethink your use of N, you could reduce it to only being needed once outside the definition (hint: consider ix values)
Well, an END clause can take care of #1, and my brain is full, so no rethinking #2 for now.
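
A rough sketch of how both points might look together, not a polished answer. An END rule writes out whatever is still buffered, and ix doubles as the counter, so N is used only once outside BEGIN. The names drop_runs and flush are made up for illustration.

```shell
# Hypothetical rework of the post #8 script: END rule plus ix as the counter.
drop_runs() {
    awk '
        BEGIN   { N=5; PAT="0000"; ix=0 }
        # Write the buffered lines out in index order and empty the buffer.
        function flush(   i) {
            for (i = 0; i < ix; i++) print saved[i]
            ix = 0
        }
        $0==PAT { saved[ix++] = $0       # buffer the matching line
                  if (ix == N) ix = 0    # N in a row: forget the buffer
                  next }
                { flush(); print }       # shorter run ended: restore it
        END     { flush() }              # shorter run at end of file
    '
}

# The three trailing pattern lines now survive.
printf '%s\n' 1111 0000 0000 0000 | drop_runs
```

Resetting ix to 0 is enough to discard the buffer, because later matches simply overwrite the old entries.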
 
Old 11-19-2015, 03:34 AM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,149

Rep: Reputation: 4124
Quote:
Originally Posted by berndbausch
Sorry I couldn't resist the itch and ended up writing it. Why not share it then:

Quote:
Thanks to the associative nature of awk arrays, this doesn't seem to be a problem.
They have lots of unexpected behaviours - one of the most notable being that they don't guarantee order.
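
To make that concrete: POSIX leaves the traversal order of for (i in array) unspecified, so when order matters a counted loop over numeric indices is the safer choice. A minimal sketch, with the made-up function name in_order:

```shell
# Print stored lines in insertion order with a counted loop instead of for-in.
in_order() {
    awk '
        { line[n++] = $0 }                             # store lines under 0, 1, 2, ...
        END { for (i = 0; i < n; i++) print line[i] }  # walk the indices in order
    '
}

printf '%s\n' c a b | in_order
```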
 
Old 11-19-2015, 07:00 AM   #14
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,823

Rep: Reputation: 1213
Not all awk versions print a
Code:
for (i in array)
in the correct order.
Because the order must be kept, we can store the lines in a string instead:
Code:
awk '
{ buf=buf sep $0; sep=RS }  # add sep and $0 to buf; undefined variables are "" in string context; RS is newline
$0!="0000" { print buf; f=0; buf=sep=""; next }  # print and clear buffer; "next" skips the following code
++f==5 { f=0; buf=sep="" }  # if 5 found then clear buffer; an undefined variable is 0 in number context
END {if (f>0) print buf}  # print a remaining buffer
' temp
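
The same string-buffer idea with the count and the pattern passed in through -v instead of being hard-coded, offered as a sketch. The function name drop_n and the variable names n and pat are assumptions made for this example.

```shell
# Hypothetical wrapper: drop each group of n consecutive lines equal to pat.
drop_n() {
    awk -v n="$1" -v pat="$2" '
        { buf = buf sep $0; sep = RS }                        # append the line to the buffer
        $0 != pat { print buf; f = 0; buf = sep = ""; next }  # run broken: print the buffer
        ++f == n  { f = 0; buf = sep = "" }                   # n in a row: discard the buffer
        END { if (f > 0) print buf }                          # leftover short run
    '
}

# Sample from post #1 with n=5 and pat=0000.
printf '%s\n' 0000 0010 0011 0000 0000 0000 0000 0000 1111 1111 0010 0000 |
    drop_n 5 0000
```

Like the original, this deletes every full group of n, so six matching lines in a row leave one behind.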

Last edited by MadeInGermany; 11-19-2015 at 07:02 AM.
 
Old 11-19-2015, 07:41 AM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
I think some of you might be getting a little too carried away with the order stuff. Remember what is being stored in the array: it is only ever the same pattern (0000), so order here is pretty irrelevant.
 
  

