LinuxQuestions.org
Old 01-18-2018, 06:27 AM   #1
L4Z3R
Member
 
Registered: Jan 2018
Posts: 34

Rep: Reputation: Disabled
how can I ignore or remove lines with 2 or more identical numbers in the same line?


Hi!

I need help again. As always, I googled before asking here, but the results I got were about something else.

My question is: how can I ignore or remove lines with 2 or more identical numbers in the same line?

For example, here is a sample list of numbers:

03 02 01
01 02 01
01 05 07

How can I ignore or remove line 2, 01 02 01, which has two 01's in it?

In other words, I want each line to have unique numbers. I tried this so far:

Code:
cat nums | tr -s '[0-9][0-9]'
03 02 01
01 02 01
01 05 07

Code:
cat nums | tr -s '01'
03 02 01
01 02 01
01 05 07
Neither one works.

I appreciate any help, suggestions and ideas. Thanks

Last edited by L4Z3R; 01-18-2018 at 06:28 AM.
 
Old 01-18-2018, 06:34 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,976

Rep: Reputation: 7337
This is called grouping and backreference. You need to create a group (this is what you are looking for) and use a backreference to specify repetition of the same string.
Code:
([0-9][0-9]).*\1
or similar, syntax depends on the tool you use.
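As a minimal sketch of the same idea outside the shell, here is the pattern applied with Python's `re` module. The helper name `has_repeat` and the hard-coded sample list are assumptions made for illustration; they are not part of the original post.

```python
import re

# Group one two-digit number, then require the same text again later in the line.
REPEAT = re.compile(r"([0-9][0-9]).*\1")

def has_repeat(line):
    """Return True if some two-digit string occurs twice in the line."""
    return REPEAT.search(line) is not None

sample = ["03 02 01", "01 02 01", "01 05 07"]
kept = [line for line in sample if not has_repeat(line)]
print(kept)
```

Only the lines without a repeated two-digit string end up in `kept`. This pattern is deliberately limited to two-digit numbers, like the original sample.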
 
3 members found this post helpful.
Old 01-18-2018, 06:47 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,145

Rep: Reputation: 4124
sed is probably the easiest way to delete lines based on content. It accepts the regex constructs you have already been directed to in prior threads.
Read the doco.
 
2 members found this post helpful.
Old 01-18-2018, 07:21 AM   #4
L4Z3R
Member
 
Registered: Jan 2018
Posts: 34

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64
This is called grouping and backreference. You need to create a group (this is what you are looking for) and use a backreference to specify repetition of the same string.
Code:
([0-9][0-9]).*\1
or similar, syntax depends on the tool you use.
Code:
egrep -v "([0-9][0-9]).*\1" nums 
03 02 01
01 05 07

It works like a charm!!! Kudos to you, pan64!!!


+1


Quote:
Originally Posted by syg00
sed is probably the easiest way to delete lines based on content. It accepts the regex constructs you have already been directed to in prior threads.
Read the doco.
The man pages for some commands are easy to decipher, but the man pages for sed, awk and grep can be confusing, especially when dealing with regex. I know only the very basics of these commands. Sometimes it's hard to know when to use grouping and how to group properly. I need to study regex as much as possible.

+1

Last edited by L4Z3R; 01-18-2018 at 07:29 AM.
 
1 members found this post helpful.
Old 01-18-2018, 08:26 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,976

Rep: Reputation: 7337
You can probably test it here: https://www.regextester.com/?fam=100025 (if the link works)
 
1 members found this post helpful.
Old 01-18-2018, 10:40 AM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by pan64
Code:
([0-9][0-9]).*\1
OP asked:
Code:
...how can I ignore or remove lines with 2 or more identical numbers
in the same line.
His example InFile contained two-digit numbers and your solution produced a correct OutFile for this limited case. I tried to extend your solution to numbers of various lengths and was not successful. Please teach us how this is done. You might like to use this sample InFile...
Code:
03 02 01             (keep)
01 02 01             (toss)
01 05 07             (keep)
04 06 06             (toss)
1234 22 56789 33     (keep)
1234 22 1234 33      (toss)
1234 22 123 33       (keep)
Daniel B. Martin

.
 
Old 01-18-2018, 12:11 PM   #7
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 634

Rep: Reputation: 316
Is there a clear delimitation between each number?

You can trial this with python3.
  1. Split up string into elements of list -> 'a a b c' into ['a', 'a', 'b', 'c'] and remove extra characters like newlines
  2. Put copy of list into set. ['a', 'a', 'b', 'c'] into {'b', 'a', 'c'} (Sets are unordered and can only contain unique values)
  3. Check the number of elements in the list (['a', 'a', 'b', 'c'] = 4) and set ({'b', 'a', 'c'} = 3). If they are equal, print the original string since no duplicates were detected.

Code:
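#!/usr/bin/env python3
import fileinput

dlm = ' '  # delimiter between the numbers
for line in fileinput.input():  # reads the named files, or stdin if none are given
    dlm_line = line.strip().split(dlm)
    # A set drops duplicates, so equal lengths mean every element is unique.
    if len(set(dlm_line)) == len(dlm_line):
        print(line, end='')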
Code:
$ cat numbers
03 02 01
01 02 01
01 05 07
04 06 06
1234 22 56789 33
1234 22 1234 33
1234 22 123 33
$ ./duplicates < numbers # Or ./duplicates numbers
03 02 01
01 05 07
1234 22 56789 33
1234 22 123 33
 
1 members found this post helpful.
Old 01-18-2018, 12:12 PM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,334
Blog Entries: 3

Rep: Reputation: 3730
For variable-sized numbers, you'd need to add word boundaries before and after the group as part of the pattern. The notation differs between the different styles of regular expression:

Code:
grep -v -E '\<([0-9]+)\>.*\1' numbers.txt
grep -v -E '\<([0-9]+)\>.*\<\1\>' numbers.txt

grep -v -P '\b([0-9]+)\b.*\1' numbers.txt
grep -v -P '\b([0-9]+)\b.*\b\1\b' numbers.txt
In some implementations (BSD-style regex libraries, for instance) the word boundaries might even be written [[:<:]] and [[:>:]]

Edit: wrapped \1 in word boundaries as per reminder by pan64 below.
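To see why the boundaries around \1 matter, here is a small sketch in Python, whose `re` module spells the word boundary `\b` (comparable to the -P variants above). The variable names and the test line "12 5 123" are illustrative assumptions, not taken from the thread.

```python
import re

# Boundaries around the group only: \1 may still match inside a longer number.
half_bounded = re.compile(r"\b([0-9]+)\b.*\1")
# Boundaries around the group and the backreference: \1 must be a whole number too.
fully_bounded = re.compile(r"\b([0-9]+)\b.*\b\1\b")

line = "12 5 123"  # no number is repeated, so this line should be kept

print(half_bounded.search(line) is not None)
print(fully_bounded.search(line) is not None)
```

The first pattern reports a repeat because the captured "12" is found again at the start of "123"; the second does not, since "12" inside "123" is not a whole word.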

Last edited by Turbocapitalist; 01-19-2018 at 02:21 AM. Reason: wrapped \1 in word boundaries as per reminder by pan64 below.
 
3 members found this post helpful.
Old 01-18-2018, 01:22 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Using the method of Sefyir (post #7) and using this InFile ...
Code:
03 02 01               (keep)
01 02 01               (toss)
01 05 07               (keep)
04 06 06               (toss)
1234 22 56789 33       (keep)
1234 22 1234 33        (toss)
1234 22 123 33         (keep)
77 1234 22 1234 22 99  (toss)
... this awk ...
Code:
awk '{delete a; for (j=1;j<=NF;j++) a[$j]++;
   if (length(a)==NF) print}' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:
03 02 01               (keep)
01 05 07               (keep)
1234 22 56789 33       (keep)
1234 22 123 33         (keep)
Daniel B. Martin

.
 
2 members found this post helpful.
Old 01-18-2018, 01:46 PM   #10
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Making a fancier result ...

With this InFile ...
Code:
03 02 01               (keep)
01 02 01               (toss)
01 05 07               (keep)
04 06 06               (toss)
1234 22 56789 33       (keep)
1234 22 1234 33        (toss)
1234 22 123 33         (keep)
77 1234 22 1234 22 99  (toss)
... this awk ...
Code:
awk '{delete a; dupes="";
      for (j=1;j<=NF;j++) if (++a[$j]>1) dupes=dupes $j" "
       if (dupes) print $0"  FAILED; repeats were "dupes
       else print}' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:
03 02 01               (keep)
01 02 01               (toss)  FAILED; repeats were 01 
01 05 07               (keep)
04 06 06               (toss)  FAILED; repeats were 06 
1234 22 56789 33       (keep)
1234 22 1234 33        (toss)  FAILED; repeats were 1234 
1234 22 123 33         (keep)
77 1234 22 1234 22 99  (toss)  FAILED; repeats were 1234 22
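For comparison with Sefyir's Python approach in post #7, the same kind of report can be sketched with `collections.Counter`. The function name `repeated_fields` is a hypothetical helper, and unlike the awk above it lists each repeated value once, however many times it recurs.

```python
from collections import Counter

def repeated_fields(line):
    """Return the whitespace-separated fields that occur more than once, in first-seen order."""
    counts = Counter(line.split())
    return [field for field, n in counts.items() if n > 1]

for line in ["1234 22 123 33", "77 1234 22 1234 22 99"]:
    dupes = repeated_fields(line)
    if dupes:
        print(line, " FAILED; repeats were", " ".join(dupes))
    else:
        print(line)
```

Splitting on any whitespace with `line.split()` also tolerates runs of spaces, which matters for input padded with annotations like the InFile above.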
Daniel B. Martin

.
 
1 members found this post helpful.
Old 01-19-2018, 02:11 AM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,976

Rep: Reputation: 7337
Quote:
Originally Posted by danielbmartin
I tried to extend your solution to numbers of various lengths and was not successful.
Code:
03 02 01
01 02 01
01 05 07
04 06 06
1234 22 56789 33
1234 22 1234 33
1234 22 123 33
123 456 213 123 678
123 456 786 12345 67
1 5 765346 3
At first, you can simply use +:
Code:
([0-9]+).*\1
But we also need to specify a delimiter (to avoid matching 234 inside 123456), so you need zero-length boundaries: http://perldoc.perl.org/perlrebacksl...%7b%7d%2c-%5cB
It is not trivial (it looks like a zero-length assertion is not carried over by the backreference, so the boundaries have to be repeated around \1), so:
Code:
\b([0-9]+)\b.*\b\1\b
works.
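A rough way to check both patterns against part of the list above is the following Python sketch. The names `naive`, `bounded`, `kept_naive` and `kept_bounded` are made up for the example, and Python's `re` is assumed to behave like the Perl-style syntax here.

```python
import re

naive = re.compile(r"([0-9]+).*\1")            # no boundaries at all
bounded = re.compile(r"\b([0-9]+)\b.*\b\1\b")  # whole numbers only

lines = [
    "1234 22 1234 33",
    "1234 22 123 33",
    "123 456 213 123 678",
    "123 456 786 12345 67",
    "1 5 765346 3",
]

# Keep only the lines in which the pattern finds no repeat.
kept_naive = [s for s in lines if not naive.search(s)]
kept_bounded = [s for s in lines if not bounded.search(s)]
print(kept_naive)
print(kept_bounded)
```

Without boundaries the group can shrink to a single digit, so every one of these lines is reported as containing a repeat and `kept_naive` comes out empty. With boundaries, only the two lines that really repeat a whole number (1234 and 123) are dropped.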
 
1 members found this post helpful.
Old 01-20-2018, 01:15 AM   #12
L4Z3R
Member
 
Registered: Jan 2018
Posts: 34

Original Poster
Rep: Reputation: Disabled
Thanks to everyone here for the code you provided. I am slowly learning this complex regex stuff.

BTW, which is easier to learn: Perl or Python?

+1 rep to all
 
Old 01-20-2018, 03:21 AM   #13
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,334
Blog Entries: 3

Rep: Reputation: 3730
I think the answer to that question depends on you. But I'll ramble since you ask. I myself find perl much, much easier and quite fun, but part of that is that there are some key characteristics of python that I do not like at all and I'm not able to get past that distaste. That said, there was also a big push for a long time to disparage perl. I think it was backed by M$ in an attempt to push one of their failures, but instead most people just pivoted to python and (ugh) PHP. Perl has a much more flexible syntax, a proven, mature catalog of modules, and more powerful regular expressions. However, most regex needs can still be met by python. In favor of python is that it has been adopted by a great many successful training programmes and initiatives as a training language. The flip side of that is that it strikes me as a training language and may end up haunting us 20 years from now in bad ways, like BASIC once did. Python enjoys a certain trendiness at the moment. I also suspect, but don't fully have the skill to assess, that perl has been put together better from a CS standpoint.
 
Old 01-20-2018, 03:29 AM   #14
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,145

Rep: Reputation: 4124
One assumes all those comments pertain to perl 5. Only.
The schism in perl is no more attractive than that in python. The user has been the victim of the developers once again.

I keep trying to get into python, but it just hasn't happened.
 
Old 01-20-2018, 03:33 AM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,334
Blog Entries: 3

Rep: Reputation: 3730
Quote:
Originally Posted by syg00
One assumes all those comments pertain to perl 5. Only.
Yes. Perl 6 is a totally different language despite the name and the development team. I have not gotten around to looking carefully at Perl 6; it might be good, it might not be. However, it is not ubiquitous like Perl 5 is, and has been for decades.
 
All times are GMT -5.