LinuxQuestions.org
06-15-2012, 04:36 AM   #1
nikmit
Member
 
Registered: May 2011
Location: Nottingham, UK
Distribution: Debian
Posts: 178

Rep: Reputation: 34
Diff regex matching


I am diffing two files that contain IP addresses.
I want to suppress output about differences in those IP addresses, so I am using:

Code:
diff -I '1[.2.3.4|.2.3.5]' file1 file2   # works
That works, in the sense that it does ignore 1.2.3.4 and 1.2.3.5.
It is not how one would normally construct a regex, however, and what I would think is a better way of doing it fails:
Code:
diff -I '1\.2\.3\.[4|5]' file1 file2   # fails
diff -I '1.2.3.[4|5]' file1 file2   # fails
The square brackets seem to have functionality specific to diff, which is not explained in the man pages.
Matching two expressions also has its quirks: I needed to escape the pipe, but the pipe doesn't need to be escaped within square brackets:

Code:
diff -I 'checksum|hostname' file1 file2    # fails
diff -I 'checksum\|hostname' file1 file2   # matches either checksum or hostname
diff -I '[checksum|hostname]' file1 file2  # works as well, as above
The question: is this expected and normal behaviour, or am I basing my script on hacks which might not be available after the next update?
Is there any way to escape a dot other than by devising some solution using square brackets?
 
06-15-2012, 05:40 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,976

Rep: Reputation: 7337
Have you tried:
diff -I '1.2.3.[45]' file1 file2
or
diff -I '1.2.3.[4\|5]' file1 file2
?
 
06-15-2012, 09:02 AM   #3
nikmit
Member
 
Registered: May 2011
Location: Nottingham, UK
Distribution: Debian
Posts: 178

Original Poster
Rep: Reputation: 34
Quote:
Originally Posted by pan64
Have you tried:
diff -I '1.2.3.[45]' file1 file2
or
diff -I '1.2.3.[4\|5]' file1 file2
?
The problem is not getting the 'or' bit to work; it is that I don't know how to escape the dots in the IP address.
That is why, if I move all the dots into the 'or' brackets, as in
Code:
diff -I '1[.2.3.4|.2.3.5]' file1 file2   # works
everything works. Why it works is another thing I don't know.

I expected that escaping like '1\.2\.3\.4' should work, but it doesn't. I didn't expect that '1[.2.3.4|.2.3.5]' would work, but it does.

Last edited by nikmit; 06-15-2012 at 09:07 AM.
 
06-15-2012, 11:52 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
I'm not all that familiar with the intricacies of diff, but the info page says that "-I" accepts "grep-style regular expressions". I'm assuming that means basic regex, and not extended, but that's not perfectly clear.

In any case, for characters with special meanings, just give each one you want to be literal its own bracket expression. (You can also backslash-escape them, but I think it's generally cleaner and safer to use brackets.)

Code:
diff -I '1[.]2[.]3[.][45]' file1 file2
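A quick way to see the difference is to try the patterns with grep, which uses the same basic regex style. This is only a sketch: the sample lines are made up for illustration and do not come from the files discussed in this thread.

```shell
# Hypothetical sample lines, invented for this illustration.
sample='1.2.3.4
1x2y3z4
1.2.3.5
1.2.3.6'

# An unescaped dot matches any character, so "1x2y3z4" is matched too.
loose=$(printf '%s\n' "$sample" | grep -c '1.2.3.[45]')

# A dot inside its own bracket expression matches only a literal dot.
strict=$(printf '%s\n' "$sample" | grep -c '1[.]2[.]3[.][45]')

echo "loose=$loose strict=$strict"   # prints: loose=3 strict=2
```

Neither pattern is anchored, so both would also match a longer address that merely contains one of these strings.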
BTW, this really shouldn't work, or at least doesn't do what you expect it to:

Code:
'1[.2.3.4|.2.3.5]'
It's equivalent to this:

Code:
'1[2345.|]'
That is, a 1, followed by a single character that's either 2-5, period, or pipe. Bracket expressions don't accept pipes for "or" patterns, as they already are "or" patterns, for single characters only.
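To check that equivalence, both patterns can be run through grep over the same input and the match counts compared. The sample lines below are invented for the test, and the last command looks at the '[checksum|hostname]' pattern from the first post in the same way.

```shell
# Hypothetical sample lines, invented for this illustration.
sample='12
1|
1.
16
1.2.3.4
10.0.0.1'

# Count the lines matched by the original pattern and by the simplified one.
orig=$(printf '%s\n' "$sample" | grep -c '1[.2.3.4|.2.3.5]')
equiv=$(printf '%s\n' "$sample" | grep -c '1[2345.|]')

# '[checksum|hostname]' is also just a set of single characters,
# so it matches any line containing one of them, such as "date".
any=$(printf 'date\n' | grep -c '[checksum|hostname]')

echo "orig=$orig equiv=$equiv any=$any"   # prints: orig=4 equiv=4 any=1
```

So the bracketed versions "work" mostly because they match far more lines than intended.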

Most versions of regex do accept '(string1|string2)' for "or" patterns of longer strings. In GNU programs with basic regex, you have to backslash-escape the parentheses and pipe to make them special, so this might work for you too:

Code:
'\(1[.]2[.]3[.]4\|1[.]2[.]3[.]5\)'
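Here is a rough sketch of how that pattern could be tried with diff itself. It assumes GNU diff (for the -I option) and uses throwaway files with invented contents, so treat it as an experiment rather than a confirmed recipe.

```shell
# Throwaway test files with made-up contents (not the poster's real files).
tmp=$(mktemp -d)
printf 'host alpha\naddr 1.2.3.4\nport 22\n' > "$tmp/a"
printf 'host alpha\naddr 1.2.3.5\nport 22\n' > "$tmp/b"
printf 'host alpha\naddr 1.2.3.6\nport 22\n' > "$tmp/c"

re='\(1[.]2[.]3[.]4\|1[.]2[.]3[.]5\)'

# Every changed line matches the regex, so the change should be ignored.
ignored=$(diff -I "$re" "$tmp/a" "$tmp/b" || true)

# "addr 1.2.3.6" does not match, so this change should still be reported.
reported=$(diff -I "$re" "$tmp/a" "$tmp/c" || true)

rm -rf "$tmp"

[ -z "$ignored" ] && echo "a vs b: difference ignored"
[ -n "$reported" ] && echo "a vs c: difference reported"
```

Note that -I only suppresses a change when all of its inserted and deleted lines match the regex, which is why the second comparison still produces output.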

Last edited by David the H.; 06-15-2012 at 11:54 AM.
 