LinuxQuestions.org
03-17-2015, 01:59 PM   #1
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Rep: Reputation: 46
Search string on pattern of special characters Bash


Ladies & Gents

Thanks for all the great help and guidance you offer.

I need to test 550 unique strings for the presence of special characters and perform actions based on which pattern is detected.

Those five patterns are:
#- as in the sample string et0106.htm#9-22 (400 of these?)
#-# as in the sample string et0101.htm#1-et0102.htm#3 (100 of these?)
#-#-# as in the sample string et0119.htm#21-et0120.htm#-et0121.htm#4
; as in the sample string et1027.htm#6-et1028.htm#13;et1029.htm#22-23
and a fall-through, et0534.htm# (there is only this one)

This does not work
Code:
for ALIYAH in "${arr[@]:1:11}";do
      pat=[#-]
      if [[ "$ALIYAH" == "$pat" ]]; then
	SHIR1="$(echo "$ALIYAH" |awk -F \# '{print $1}')"
	START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"
	END1="$(echo "$ALIYAH" |awk -F - '{print $2}')"
      pat2=[#-#]
      elif [[ "$ALIYAH" == "$pat2" ]]; then
	# some actions
      pat3=[#-#-#]
      elif [[ "$ALIYAH" == "$pat3" ]]; then
	# some action
      pat4=[;]
      elif [[ "$ALIYAH" == "$pat4" ]]; then
	# some actions
      else
	SHIR10="$(echo "$ALIYAH" |awk -F \# '{print $1}')"
      fi
It falls straight through to the final "else". Some of the debug output:
Code:
+ for ALIYAH in '"${arr[@]:2:11}"'
+ pat='[#-]'
+ [[ some.htm#11-38 == \[\#\-\] ]]
++ echo some.htm#11-38
++ awk -F '#' '{print $1}'
+ SHIR10=some.htm
The pattern of special characters is the only constant in the strings; the number of letters and digits between them varies, as the samples above show. So clearly something more complex than what I have is needed to make the test ignore the ordinary characters between the special characters.
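For reference: inside [[ ... == ... ]] a quoted right-hand side is compared as a literal string, so "$pat" only matches a string that is exactly [#-]. Unquoted, [#-] is a bracket expression that matches one single '#' or '-' character, and == must match the whole string, so it still fails. A minimal sketch, using * in a glob to absorb the ordinary characters (the sample string is from the post; the pattern is an illustration):

```shell
#!/usr/bin/env bash
ALIYAH='et0106.htm#9-22'

pat='*#*-*'   # glob: anything, '#', anything, '-', anything

# Quoted right-hand side: literal string comparison, so this is false.
if [[ $ALIYAH == "$pat" ]]; then echo "quoted: match"; else echo "quoted: no match"; fi

# Unquoted right-hand side: glob pattern matching, so this is true.
if [[ $ALIYAH == $pat ]]; then echo "unquoted: match"; else echo "unquoted: no match"; fi
```

Note that *#*-* also matches the longer #-# and #-#-# strings, so the order of the tests still matters.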

To break the most complex one down:

et1027.htm # 6 - et1028.htm # 13 ; et1029.htm # 22 - 23
filename # regex in file - filename # regex in file ; filename # regex in file - regex in file

The file names can be one character longer, but never shorter.
The regex can be up to three digits long (e.g. 123).
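Given that breakdown, one anchored regex can check the simplest form and extract its three fields in a single step, instead of three echo | awk calls. A minimal sketch, assuming file names are lowercase letters, then digits, then .htm (as in the samples) and that the numbers are one to three digits:

```shell
#!/usr/bin/env bash
# Parse the "#-" form (e.g. et0106.htm#9-22) with one anchored regex.
# Assumption: file name = letters + digits + ".htm"; numbers are 1 to 3 digits.
re='^([a-z]+[0-9]+\.htm)#([0-9]{1,3})-([0-9]{1,3})$'

ALIYAH='et0106.htm#9-22'
if [[ $ALIYAH =~ $re ]]; then
  SHIR1=${BASH_REMATCH[1]}    # file name
  START1=${BASH_REMATCH[2]}   # number after '#'
  END1=${BASH_REMATCH[3]}     # number after '-'
  echo "$SHIR1 $START1 $END1"
fi
```

Because the regex is anchored with ^ and $, the longer forms such as et0101.htm#1-et0102.htm#3 do not match it.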

Changing the special characters to something else is somewhat problematic because the strings differ. I already had to generate them by hand, and I don't want to go through and change them by hand too.

Thanks Again
 
03-17-2015, 04:14 PM   #2
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
After some more searching I found a possible solution:

Code:
if [[ "$ALIYAH" =~ [#-] ]]; then
But, as I feared, it also tests positive for the #-#-# pattern, and I really don't want to change the order of the tests to check for the least likely matches first.
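With =~ the regex [#-] matches any single '#' or '-' anywhere in the string, which is why it also succeeds on the longer forms. One alternative that leans less on the order of pattern tests is to count the special characters. A sketch, assuming every string has one of the five forms listed in the first post (classify and its labels are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a string by counting its special characters.
classify() {
  local s=$1
  local hashes=${s//[^#]/}   # keep only the '#' characters
  local dashes=${s//[^-]/}   # keep only the '-' characters
  if [[ $s == *';'* ]]; then
    echo 'semicolon'         # checked first: these strings also contain three '#'
  elif (( ${#hashes} == 3 )); then
    echo '#-#-#'
  elif (( ${#hashes} == 2 )); then
    echo '#-#'
  elif (( ${#hashes} == 1 && ${#dashes} == 1 )); then
    echo '#-'
  else
    echo 'fallthrough'
  fi
}
```

Each count branch tests an exact number, so those branches can be listed in any order; only the ';' check has to come first.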
 
03-17-2015, 04:39 PM   #3
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
I also have a different problem that is not directly related to the conditional tests. In the processing that follows the conditional test I have:
Code:
	w3m -dump -T text/html $STORDIR/JPS/$SHIR1 | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | sed -n "/$START1/,/$END1/p" >  $TMPDIR/$PARSHA$arr[]
But for some reason bash is not processing it in the correct order, as can be seen in the debug output and the resulting file.
Code:
+ w3m -dump -T text/html /home/kingbee/bin/shabbat/data/JPS/et0306.htm
+ sed 1,5d
+ sed -n /1/,/12/p
+ sed -e :a -e '$d;N;2,6ba' -e 'P;D'
Can I combine some of the seds somehow?
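Two notes on that pipeline. Bash starts all stages of a pipeline at the same time, so the order of the + lines in the set -x trace is not the order the data flows in; the data still passes through the commands left to right. Also, line-number addresses such as 1,5 and 2,6 in a later sed count the lines of the earlier sed's output, so the scripts cannot simply be concatenated without adjusting them. As an alternative (awk swapped in here, not a sed technique), a single awk pass can drop the first 5 and last 6 lines and print the START1..END1 range. A sketch; trim_and_range is a hypothetical name, and it assumes the start and end values are simple patterns such as plain digits:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one awk pass standing in for
#   sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | sed -n "/$START1/,/$END1/p"
# Assumptions: drop the first 5 and last 6 lines (what the two seds do); the
# start/end arguments are awk (ERE) regexes, same as sed's BRE for plain digits.
trim_and_range() {
  awk -v head=5 -v tail=6 -v s="$1" -v e="$2" '
    { line[NR] = $0 }                  # buffer the input so the last lines can be dropped
    END {
      inrange = 0
      for (i = head + 1; i <= NR - tail; i++) {
        if (!inrange) {
          # like sed, the end regex is not checked on the line that opens the range
          if (line[i] ~ s) { print line[i]; inrange = 1 }
        } else {
          print line[i]
          if (line[i] ~ e) inrange = 0
        }
      }
    }'
}

# Possible use (outfile.txt is a hypothetical name):
# w3m -dump -T text/html "$STORDIR/JPS/$SHIR1" | trim_and_range "$START1" "$END1" > "$TMPDIR/outfile.txt"
```

For example, seq 1 30 | trim_and_range 1 12 prints 10 through 24: lines 6 to 24 survive the trimming, 10 opens the range, 12 closes it, and 13 (which also contains a 1) opens it again until the input runs out. The original sed pipeline behaves the same way, so unanchored patterns like /1/ may match more than intended.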

Thanks again
 
03-17-2015, 07:32 PM   #4
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Yes, you can combine the seds; try looking here

Quote:
I really don't want to change the order of the testing to test for the least likely matches first.
In that case you may find it impossible to solve. You cannot force the system just because you do not wish to change something.
You either test for the most common case first, so that the others are only run now and then, or you need to work through a process of elimination.
Your patterns #-, #-# and #-#-# each contain the one before, so if you test for #- first you can expect never to reach the other branches.
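Following that advice, a case statement with glob patterns listed from most to least specific is one way to write the tests. A sketch (kind_of and its labels are hypothetical; the samples are from the first post):

```shell
#!/usr/bin/env bash
# Hypothetical helper: most specific glob first, so the shorter patterns
# (which the longer strings also match) only see what is left over.
kind_of() {
  case $1 in
    *';'*)        echo 'semicolon' ;;   # these strings also match *#*-*#*, so test them first
    *#*-*#*-*#*)  echo '#-#-#' ;;
    *#*-*#*)      echo '#-#' ;;
    *#*-*)        echo '#-' ;;
    *)            echo 'fallthrough' ;;
  esac
}
```

case stops at the first matching pattern, which is exactly the "most specific first" ordering.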
 
03-17-2015, 07:55 PM   #5
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks grail

I thought that might be the case. While searching for an answer I came across a simple way to test on the length of the strings instead, so I changed the tests to:
Code:
if [[ ${#ALIYAH} -gt 13 ]] && [[ ${#ALIYAH} -lt 18 ]]

if [[ ${#ALIYAH} -gt 23 ]] && [[ ${#ALIYAH} -lt 30 ]]

if [[ ${#ALIYAH} -gt 36 ]] && [[ "$ALIYAH" -ne "$pat4" ]]
and it seems to work fairly well.

I will look into the sed link you gave me and see what I can do about the other thing.

Thanks again
 
03-17-2015, 11:33 PM   #6
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Personally I would use (()) over [[]] so you can use arithmetic tests. You can also place the && inside the brackets so only one set is needed.

I would expect the last test never to work: pat4 is ';', and ALIYAH contains ';' but is never equal to a single ';'.
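A sketch of both points, reusing the length thresholds from the earlier post (is_single and is_triple are hypothetical names): one (( )) with && inside replaces a pair of [[ ]] tests, and since -ne compares integers, a glob is used to check for ';' instead.

```shell
#!/usr/bin/env bash
# One (( )) with && inside replaces [[ ... -gt ... ]] && [[ ... -lt ... ]].
is_single() {
  (( ${#1} > 13 && ${#1} < 18 ))
}

# -ne compares integers, so it cannot test for ';'. A glob can:
# true when the string is longer than 36 characters and contains no ';'.
is_triple() {
  (( ${#1} > 36 )) && [[ $1 != *';'* ]]
}
```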
 
03-18-2015, 05:37 AM   #7
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks grail

I am having trouble getting this line to even parse. ShellCheck reports:
Code:
pat4="[;]"
elif (( ${#ALIYAH} > 36 )) && [[ "$ALIYAH" ~= "$pat4" ]]; then
            ^SC1009 The mentioned parser error was in this elif clause.
                                          ^SC1073 Couldn't parse this test expression.
                                                         ^SC1072  Fix any mentioned problems and try again.
I have tried both (( )) and [[ ]], with && both inside and outside, still no joy.

I can't just use a simple string-length comparison, because the strings that contain ; and those that don't overlap in length. Bummer.

Ah, found it:
Code:
[[ ${#ALIYAH} -gt 36 ]] -a [["$ALIYAH" ~= [\;] ]]
Thanks again
 
03-18-2015, 05:54 AM   #8
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Guess I spoke too soon.

ShellCheck reports it as fine, but bash disagrees:
Code:
./ReadingTest: line 81: syntax error near unexpected token `-a'
./ReadingTest: line 81: `      elif [[ ${#ALIYAH} -gt 36 ]] -a [["$ALIYAH" ~= [\;] ]]; then'
and if I change the -a to && bash says
Code:
$ ./ReadingTest
./ReadingTest: line 94: conditional binary operator expected
./ReadingTest: line 94: syntax error near `~='
./ReadingTest: line 94: `      elif [[ "$ALIYAH" ~= [\;] ]]; then'
This is the same error that made me use pat4="[;]", thinking that would be parsed correctly, but then you told me it won't work. So now I'm even more confused than before.

I can change the ; to something else in the file names, but it would have to be a letter other than [abehmt], since those letters and all digits appear in the file names. I will have to give some thought to what would be best.
 
03-18-2015, 06:11 AM   #9
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
It needed a space, and changing the ; to a Z did the trick.

Now back to getting it to process the seds correctly, and naming the output file something that makes sense.

Last edited by rbees; 03-18-2015 at 06:13 AM.
 
03-18-2015, 06:49 AM   #10
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
I am guessing that every place you have '~=' is a typo, and that you have it correctly as '=~' in the working version?
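To pull the fixes from the last few posts together: the regex operator is =~ (not ~=), [[ needs a space after it (otherwise [["$ALIYAH" is read as one word), and two [[ ]] tests are joined with && because -a is an AND operator of the [ (test) command, not a way to join [[ ]] commands. Keeping the regex in a variable also avoids quoting trouble with ';'. A sketch (semi is a hypothetical variable name):

```shell
#!/usr/bin/env bash
semi=';'   # regex stored in a variable so ';' needs no escaping inside [[ ]]
ALIYAH='et1027.htm#6-et1028.htm#13;et1029.htm#22-23'

if (( ${#ALIYAH} > 36 )) && [[ $ALIYAH =~ $semi ]]; then
  echo 'long string containing ;'
fi
```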
 
03-23-2015, 04:59 PM   #11
rbees
Member

Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Wheezy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks again for all the guidance & help
 
  


All times are GMT -5.