LinuxQuestions.org
08-31-2012, 04:47 AM   #1
b.lundblad@fabula.se
LQ Newbie
 
Registered: Dec 2008
Posts: 3

Rep: Reputation: 0
Gawk - regexp [A-Z] matches [a-z]. How is this possible?


Hi.

I'm having a problem with a regexp in gawk giving an unexpected result.

I'm using the following simple code:
--------------------------------
BEGIN {
    s = "version"
    r = "[A-Z]"

    if ( match(s, r) ) {
        printf "%s %s %s\n", s, r, substr(s, RSTART, RLENGTH)
    } else {
        printf "NO MATCH\n"
    }
}
--------------------------------
When I run this I get a match for the first letter "v" in "version"!!!! How is this possible???

I'm running it under the following circumstances:
Operating system: CentOS 5.2 (Linux)
Shell: GNU bash ver. 3.2.25
Env. setting: LANG=en_US.UTF-8
File encoding: Code written in the KWrite editor, saved with both "utf-8" and "Central European cp 1250" encodings, with the same result

When I run the same code under Windows with gawk 3.1.5 I get the anticipated result of "NO MATCH".

I suspect it has to do with encoding of the file but I cannot figure out how. I did "man gawk" but failed to locate an answer.
I'm grateful for any lead on this "mystery".

I'm a newbie to this forum - hope I'm at the right place.
Thanks
/Bertil
 
08-31-2012, 05:36 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,009

Rep: Reputation: 3193
What version of gawk are you using in your example? A direct copy n paste of your code gave me the 'NO MATCH' desired result.
 
08-31-2012, 07:46 AM   #3
b.lundblad@fabula.se
LQ Newbie
 
Registered: Dec 2008
Posts: 3

Original Poster
Rep: Reputation: 0
Hi Grail!

Thanks for your time! My Linux version of gawk is the same as my Windows version, 3.1.5.

What do you get if you do the following?

set | grep "LANG"

Is it utf-8?

/Bertil
 
08-31-2012, 08:05 AM   #4
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

Quote from info gawk character lists:
Quote:
2.4 Using Character Lists
=========================

Within a character list, a "range expression" consists of two
characters separated by a hyphen. It matches any single character that
sorts between the two characters, using the locale's collating sequence
and character set. For example, in the default C locale, `[a-dx-z]' is
equivalent to `[abcdxyz]'. Many locales sort characters in dictionary
order, and in these locales, `[a-dx-z]' is typically not equivalent to
`[abcdxyz]'; instead it might be equivalent to `[aBbCcDdxXyYz]', for
example.
To obtain the traditional interpretation of bracket
expressions, you can use the C locale by setting the `LC_ALL'
environment variable to the value `C'.
Code:
$ echo $LANG
en_US.UTF-8
$ gawk -f test.awk
version [A-Z] v
$ LC_ALL=C gawk -f test.awk
NO MATCH
$ mawk -f test.awk
NO MATCH
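
As a complement to the commands above, here is a minimal sketch of another way to sidestep locale-dependent ranges: the POSIX character class `[[:upper:]]`, which matches by character type rather than by position in the collating sequence. The helper name `has_upper` is made up for this illustration, and the sketch calls plain `awk` so that it runs wherever an awk with POSIX class support is installed; some older awk implementations may not support these classes.

```shell
#!/bin/sh
# has_upper is a hypothetical helper, not something from the thread.
# It reports the first uppercase letter in its argument, or NO MATCH.
has_upper() {
    awk -v s="$1" 'BEGIN {
        # [[:upper:]] tests the character type, so it does not depend
        # on how the locale orders upper- and lowercase letters.
        if (match(s, /[[:upper:]]/))
            printf "%s %s\n", s, substr(s, RSTART, RLENGTH)
        else
            print "NO MATCH"
    }'
}

has_upper "version"
has_upper "Version"
```

Unlike the `LC_ALL=C` approach, this leaves the locale in place, which may matter if the same script also processes non-ASCII text.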

Last edited by firstfire; 08-31-2012 at 08:07 AM.
 
2 members found this post helpful.
08-31-2012, 08:58 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,009

Rep: Reputation: 3193
Code:
$ set | grep "LANG"
LANG=en_US.UTF-8
$ gawk --version
GNU Awk 4.0.1
 
08-31-2012, 11:15 AM   #6
b.lundblad@fabula.se
LQ Newbie
 
Registered: Dec 2008
Posts: 3

Original Poster
Rep: Reputation: 0
I think it is solved!

Thanks to you both Grail and Firstfire for your interest!

I tried it and it worked exactly as firstfire pointed out by using

LC_ALL=C gawk -f test.awk
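
If typing the prefix on every invocation gets tedious, the same idea can be wrapped in a small shell function. The following is only a sketch under assumptions: the function name `check_upper` is hypothetical, the awk program is inlined instead of read from test.awk, and plain `awk` stands in for gawk. The `LC_ALL=C` assignment applies to that single awk process, so the rest of the shell session keeps its en_US.UTF-8 settings.

```shell
#!/bin/sh
# check_upper is a hypothetical wrapper around the test from post #1.
# LC_ALL=C is set for this one awk process only, so [A-Z] is
# interpreted as the 26 ASCII uppercase letters.
check_upper() {
    LC_ALL=C awk -v s="$1" 'BEGIN {
        if (match(s, /[A-Z]/))
            printf "%s %s\n", s, substr(s, RSTART, RLENGTH)
        else
            print "NO MATCH"
    }'
}

check_upper "version"   # prints: NO MATCH
check_upper "Version"   # prints: Version V
```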

I think I have now learned something about regexps and the info system, especially "info gawk"! My only excuse for this ignorance is that things have worked so smoothly until now that I never really had a reason to look (this is almost true :-) ).

Also, it might be a good thing to upgrade my gawk.

Thanks again guys for helping me out!
/Bertil
 
  

