LinuxQuestions.org
Old 03-22-2015, 07:35 PM   #1
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Rep: Reputation: 46
use array index as file name


Ladies & Gents,

Thanks again for all the help and guidance given on this site.

I have looked and looked for a way to use the index of the array in the file name, but everything I have tried has come up short. The iteration below does not even generate file names, and it is only a small sample.

All I want is the index number, a simple 1-9, to be the file name. I know that something like ${!arr[key]} is supposed to reference the key value and not the element value. I can get the element value easily enough, but it does not make a very good file name, e.g. et1107.htm#21-et1108.htm#3Zet1109.htm#22-23, whereas $PARSHA/$arrayINDEX would make complete sense.

This code also returns in one place
Code:
./ReadingTest: line 190: $TMPDIR/${!arr[@]}: ambiguous redirect
Code:
READING=$(grep ^"$PARSHA" "$STORDIR"/Readings.csv)
    IFS=, read -a arr <<<"$READING"
    for ALIYAH in "${arr[@]:1:9}";do

# pattern 1 ####################################
      if (( ${#ALIYAH} > 13 && ${#ALIYAH} < 18 )); then
	echo "TYPE 1"
	SHIR1="$(echo "$ALIYAH" |awk -F \# '{print $1}')"
	START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"
	END1="$(echo "$ALIYAH" |awk -F - '{print $2}')"
	END1=$((END1 + 1))
	w3m -dump -T text/html $STORDIR/JPS/$SHIR1 | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | awk '/$START1/,/$END1/' >  $TMPDIR/tempfile1a
	  if (( $START1 > 1 )); then
	    # strip leading words first line
	    FIRSTLINE=$(sed -n -e "1 s/^.*$START1/$START1/p" $TMPDIR/tempfile1a)
	    sed -i "1s/.*/$FIRSTLINE/" $TMPDIR/tempfile1a # replace first line
	  fi
	# strip trailing words
	sed -i "/$END1/q" $TMPDIR/tempfile1a
	sed -e "s/$END1.*$/$END1/g" -e "s/$END1//" $TMPDIR/tempfile1a
	# generate reading file
	mv $TMPDIR/tempfile1a $TMPDIR/${!arr[@]}
Code:
mv $TMPDIR/tempfile1a $TMPDIR/${!arr[ALIYAH]}
results in
Code:
./ReadingTest: line 77: et0306.htm#1-11: syntax error: invalid arithmetic operator (error token is ".htm#1-11")
I have looked at this http://stackoverflow.com/questions/3...-array-in-bash and countless other examples but have not been able to figure out how to incorporate such into my script.

Thanks for any guidance you are willing to provide.
 
Old 03-22-2015, 09:10 PM   #2
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
I have tried this from http://www.unix.com/shell-programmin...ll-script.html but am getting no joy. But it is also late and my brain is about done. /read "smoke coming out my ears"

Now that I look at this, I don't see how it can work, as it appears to use the character count of the element to check against the input value, and those counts are all the same in the original because the elements are three-letter months.

Code:
get_array_index "ARRAY=${arr[*]}" "VALUE=$ALIYAH"

# my understanding of usage from
# ARRAY=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
# VALUE='May'
# Usage: get_array_index "$ARRAY=your-array" "VALUE=$ALIYAH"
get_array_index() {
	for ((index=0; index<${#ARRAY[@]}; index++)); do 
		if [ "${ARRAY[$index]}" = "$VALUE" ]; then
			echo $index
			return
		fi
	done
	echo 'Not Found'
}
Code:
+ get_array_index 'ARRAY=Tzav et0306.htm#1-11 et0306.htm#12-et0307.htm#10 et0307.htm#11-38 et0308.htm#1-13 et0308.htm#14-21 et0308.htm#22-29 et0308.htm#30-36 et0308.htm#33-36 et1107.htm#21-et1108.htm#3Zet1109.htm#22-23 et1107.htm#21-et1108.htm#3Zet1109.htm#22-23' VALUE=et0308.htm#33-36
+ (( index=0 ))
+ (( index<7 ))
+ '[' 95 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 100 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 105 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 110 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 115 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 120 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 125 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ echo 'Not Found'
Not Found
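For what it is worth, the trace above shows every comparison being made against an empty string. Passing "ARRAY=..." and "VALUE=..." as arguments only hands the function two strings; it does not assign either variable. Below is a minimal sketch of one way to look up an index, assuming the value is passed first and the array elements after it. The argument order and the sample data are assumptions for illustration, not the code from the linked thread.

```bash
#!/usr/bin/env bash
# Hypothetical rewrite: value first, then the array elements as arguments.
get_array_index() {
  local value=$1
  shift
  local i=0 item
  for item in "$@"; do
    if [[ $item == "$value" ]]; then
      echo "$i"
      return 0
    fi
    (( i += 1 ))
  done
  return 1   # not found: nothing printed, non-zero status
}

# Sample data shortened from the trace above.
arr=(Tzav 'et0306.htm#1-11' 'et0306.htm#12-et0307.htm#10' 'et0307.htm#11-38')
idx=$(get_array_index 'et0306.htm#12-et0307.htm#10' "${arr[@]}")
echo "$idx"
```

Note that a value lookup returns the first match only, so it cannot tell apart two elements with identical text, such as the last two fields of the Tzav row.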
 
Old 03-23-2015, 12:44 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Code:
mv $TMPDIR/tempfile1a $TMPDIR/${!arr[@]}

mv $TMPDIR/tempfile1a $TMPDIR/${!arr[ALIYAH]}
The above 2 lines are not the same. Also, when '!' is used at the start of the array expansion together with '@', you are returned ALL of the subscripts for the array, not just one.
When using an existing subscript, such as 1, the following will return nothing:
Code:
echo $TMPDIR/${!arr[1]}
If you need to know the subscripts then you will need to do it when calling the array for the for loop.
Code:
for ALIYAH in "${!arr[@]}";do
Then from here you would need an if inside the loop to check it is between 1 and 9.

Another alternative would be to create a counter which is incremented at the start of the loop and used as your subscript info.
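A minimal sketch of the first suggestion, looping over the subscripts and skipping anything outside 1 to 9. The sample row and the `names` array are made up for illustration; the real row would come from Readings.csv.

```bash
#!/usr/bin/env bash
# Hypothetical sample row; the real one comes from grep on Readings.csv.
READING='Tzav,et0306.htm#1-11,et0306.htm#12-et0307.htm#10,et0307.htm#11-38'
IFS=, read -r -a arr <<<"$READING"

PARSHA=${arr[0]}
names=()
for i in "${!arr[@]}"; do            # i is the subscript, not the element
  (( i >= 1 && i <= 9 )) || continue # skip field 0 and anything past 9
  ALIYAH=${arr[i]}                   # the element is still available by subscript
  names+=( "$PARSHA/$i" )
  printf '%s -> %s\n' "$PARSHA/$i" "$ALIYAH"
done
```

With the subscript in hand, a target such as "$TMPDIR/$PARSHA/$i" can be built directly, with no need to search the array for the element's position.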
 
1 members found this post helpful.
Old 03-23-2015, 02:00 PM   #4
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks grail,

I went with the counter and have that issue fixed. But it seems that I have a variable that is not being passed correctly into an if/then block. The block is entered OK, but the value of the tested variable does not seem to arrive inside it, so the decrement step ends up setting the value to -1 and the block fails.

This is the code section:
Code:
START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"	
w3m -dump -T text/html $STORDIR/JPS/$SHIR1 | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | awk '/$START1/,/$END1/' >  $TMPDIR/tempfile1a
  if (( $START1 > 1 )); then
    BLOCKHEAD=$(awk "/$START1/ {print FNR}" $TMPDIR/tempfile1a)
    echo "start $BLOCKHEAD"
    BLOCKHEAD=$((BLOCKHEAD - 1 ))
    echo "end $BLOCKHEAD"
    sed -i "1,$BLOCKHEAD d" $TMPDIR/tempfile1a 
  fi
If I understand what I have read correctly I need to use
Code:
export START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"
instead, to make $START1 available inside the if/then block.

What I don't understand is why this has never been a problem for me before now. Or is there something else I am missing?

Thanks again.
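On the export question: an if/then block runs in the same shell, so a variable set before it is already visible inside and export should not be needed. One thing that does stand out in the posted pipeline is the single-quoted awk program, because the shell does not expand $START1 or $END1 inside single quotes. A small sketch of the difference, using made-up sample text in place of the w3m output:

```bash
#!/usr/bin/env bash
START1=3
END1=5
# Sample text standing in for the w3m dump.
text=$'1 one\n2 two\n3 three\n4 four\n5 five\n6 six'

# Single quotes: awk receives the literal characters $START1 and $END1,
# so the range pattern never matches this text.
literal=$(printf '%s\n' "$text" | awk '/$START1/,/$END1/')

# Passing the values with -v turns them into real awk variables.
ranged=$(printf '%s\n' "$text" | awk -v s="$START1" -v e="$END1" '$0 ~ s, $0 ~ e')

printf 'literal: [%s]\n' "$literal"
printf 'ranged:\n%s\n' "$ranged"
```

The -v form also avoids quoting trouble when the value contains characters that are special to the shell.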
 
Old 03-23-2015, 03:02 PM   #5
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
well no joy with that
 
Old 03-23-2015, 03:31 PM   #6
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
well maybe. If I can just get the text processing right.
 
Old 03-23-2015, 04:57 PM   #7
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
got it

thanks again
 
Old 03-24-2015, 02:37 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Don't forget to show your solutions as others may be interested to know how you solved it

As you are learning, please try to condense your awk scripts into a single one as there are very few to no reasons to need more than one.

Also, please try to be consistent with your coding:
Code:
if (( $START1 > 1 )); then

BLOCKHEAD=$((BLOCKHEAD - 1 ))
Either use the dollar symbol both times for the variables or neither time.
The last one can be improved too:
Code:
((BLOCKHEAD--))
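A short sketch of the consistent style, with made-up values. Inside (( )) bash resolves variable names on its own, so the dollar sign can be dropped everywhere.

```bash
#!/usr/bin/env bash
START1=4
BLOCKHEAD=7

if (( START1 > 1 )); then   # no $ needed inside (( ))
  (( BLOCKHEAD-- ))         # same effect as BLOCKHEAD=$((BLOCKHEAD - 1))
fi
echo "$BLOCKHEAD"
```

One caveat: (( BLOCKHEAD-- )) returns a non-zero status when the value before the decrement was 0, which matters if the script runs under set -e.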
 
1 members found this post helpful.
Old 03-24-2015, 04:51 PM   #9
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
So the counter works like

Code:
TEMPFILE=$TMPDIR/$$.tmp # First we set up a counter file outside the loop.
echo 0 > $TEMPFILE	# Then make it zero.

index=$(($(cat $TEMPFILE) + 1))	# Inside the loop we grab the value, increment it, and assign it to a variable.

cat $TMPDIR/tempfile1a > $TMPDIR/$PARSHA$index # Use the variable in the file name.

echo $index > $TEMPFILE  # And at the end of the loop write it back out to the counter file.

My variable processing issue may have been related to the text processing not being performed at the correct time. So exporting the variable may not be needed, but I have not tested it, as it works the way it is.

I have not had a chance to see if I could figure out how to shorten an awk line such as this.

Code:
END8="$(echo "$ALIYAH" |awk -F Z '{print $1}'|awk -F - '{print $2}'|awk -F \# '{print $2}' )"
Which has to take something like et1107.htm#21-et1108.htm#3Zet1109.htm#22-23 and get just the "3" before the Z and put it into $END8. And there is no way to know if that will be a one or two digit number. Some of the others need to get the et1108.htm for the variable.

I do know that I did not have a lot of luck trying to shorten the sed lines with -e and keep the processing in the correct order. Such as these two lines.

Code:
sed -i "/$END9/q" $TMPDIR/tempfile3d
sed -i -e "s/$END9.*$/$END9/g" -e "s/$END9//" $TMPDIR/tempfile3d
The text needs 5 header and 6 footer lines stripped off first, before anything else can be done to it. If I tried to add in something like the above it would not process in the correct order and the text would come out wrong. :bummer:
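If the first two sed stages exist only to strip the 5 header lines and 6 footer lines, one possible alternative is tail and head. This is a sketch that assumes GNU coreutils (as shipped with Debian), since the negative line count for head is a GNU extension; the 20 numbered lines stand in for the w3m dump.

```bash
#!/usr/bin/env bash
# Stand-in for the w3m dump: 20 numbered lines.
dump=$(seq 1 20)

# Start output at line 6 (drops the 5 header lines),
# then print all but the last 6 lines (drops the footer).
body=$(printf '%s\n' "$dump" | tail -n +6 | head -n -6)
printf '%s\n' "$body"
```

Doing the trimming in its own stage keeps the ordering explicit: whatever sed edits follow only ever see the body text.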
 
Old 03-25-2015, 12:21 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
How well formed is the data? Can you guarantee a single Z in the line, and will the number(s) prior to it always have a # before them?
If the above is correct, no awk is needed:
Code:
END8=${ALIYAH%%Z*}
END8=${END8##*#}
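A worked run of the two-step parameter expansion on the sample string from earlier in the thread, assuming the first step strips everything from the Z onward (%%Z*) and the second strips everything up to the last # (##*#). The `before_z` and `file` variables are extra names added here to show how the same technique could pull out the et1108.htm part.

```bash
#!/usr/bin/env bash
ALIYAH='et1107.htm#21-et1108.htm#3Zet1109.htm#22-23'

END8=${ALIYAH%%Z*}   # drop the Z and everything after it: et1107.htm#21-et1108.htm#3
END8=${END8##*#}     # drop everything up to the last #: 3
echo "$END8"

# Same idea for the file name just before the Z.
before_z=${ALIYAH%%Z*}
file=${before_z##*-}  # keep what follows the last -: et1108.htm#3
file=${file%%#*}      # drop the # and what follows: et1108.htm
echo "$file"
```

Because the expansions only look for the delimiters, it does not matter whether the number is one, two or three digits long.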
Now your counter ... are we storing this in a file because we need to run the script at a later time and will need to know where we left off? This seems unlikely as the loop is always going to complete.

As for your sed's, I would need to know what sort of data is in END9 and what it is you want from the file? Again I would see no reason to use multiple.
I would add that -i in the sed where you quit serves no purpose as the file is not being changed in any way.
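On the counter question, a plain shell variable is usually enough when the loop runs in the current shell. A minimal sketch with made-up array contents; the `made` array exists only to show the names that would be generated.

```bash
#!/usr/bin/env bash
# Hypothetical data: field 0 is the parsha name, the rest are readings.
arr=(Tzav 'et0306.htm#1-11' 'et0306.htm#12-et0307.htm#10' 'et0307.htm#11-38')
PARSHA=${arr[0]}

index=0
made=()
for ALIYAH in "${arr[@]:1:9}"; do
  (( index += 1 ))             # counter lives in memory, no temp file
  made+=( "$PARSHA$index" )    # e.g. the target would be $TMPDIR/$PARSHA$index
done
printf '%s\n' "${made[@]}"
```

A counter file tends to be needed only when the loop body runs in a subshell, for example when the loop sits on the right-hand side of a pipe, because variable changes made there are lost when the subshell exits.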
 
Old 03-25-2015, 05:23 AM   #11
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks grail,

The file reference data is both consistent and not. What I mean is that there are 5 different types.

Type 1, by far the largest group, is a file reference like et0112.htm#1-13 and is always formatted like that. Actually they are all basically like that. Let me break it down.

In this sample et0112.htm is the file name. The et will always be first, followed normally by a 4 but sometimes 5 character alphanumeric string and then the .htm. This becomes the variable $SHIR1=et0112.htm

In this sample the 1 following the # is a reference location in the file and the 13 following the - is a second reference in the file. Both are normally one or two digit numbers but may be as many as 3. This becomes $START1=1 and $END1=13

The second type looks like this et0112.htm#14-et0113.htm#4 where everything on one side of the - is one file/reference and the other side is another. Making $SHIR2 $START2 $SHIR3 $END3

The third type looks like et0119.htm#21-et0120.htm#-et0121.htm#4 which needs to end up as $SHIR4 $START4 $SHIR5 $SHIR6 $END6 In this case there are three different files referenced with the start reference in the first file, all of the second file, and the end reference in the third file.

Then there is a fourth type, which combines type two followed by type one and looks like et1027.htm#6-et1028.htm#13Zet1029.htm#22-23, as previously posted.

The fifth type has only one case and the whole file is used.

The $SHIR variables always have a file name
The $START and $END variables always have a number (which is NOT a line number but may be in some cases)

The $ALIYAH variable is the counter for the loop. I could not figure out how to use it as part of the file name, which is generated outside the loop, so the counter stored in a file is something I found online that seems to work.

Thanks again
 
Old 03-25-2015, 10:02 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Maybe see if something like this helps:
Code:
in="$1"

cnt=${in//[^#]/}

case ${#cnt} in
  1) regex='(.+)#(.+)-(.+)';;
  2) regex='([^#]+)#(.+)-(.+)#(.+)';;
  3) if [[ "$in" =~ Z ]]
     then
       regex='([^#]+)#([0-9]+)-([^#]+)#(.+)Z([^#]+)#(.+)-(.+)'
     else
       regex='([^#]+)#([0-9]+)-([^#]+)#-(.+)#(.+)'
     fi;;
esac

[[ "$in" =~ $regex ]] && data=( "${BASH_REMATCH[@]:1}" )

echo "${data[@]}"
In this scenario, $1 is equal to one of your strings :- et0119.htm#21-et0120.htm#-et0121.htm#4
And the array, data, contains the parts you require.
 
Old 03-25-2015, 08:25 PM   #13
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks grail,

I have tried to comment it as I understand it.
Quote:
# grail @ linuxquestions.org
#
# Maybe see if something like this helps:
# In this scenario,
#
# $1 is equal to one of your strings :- et0119.htm#21-et0120.htm#-et0121.htm#4
#
# And the array, data, contains the parts you require.

#in="$1"
in="et1027.htm#6-et1028.htm#13Zet1029.htm#22-23"
cnt=${in//[^#]/}

case ${#cnt} in
1) regex='(.+)#(.+)-(.+)';;
# et01045.htm#19-27
# ${cnt[1]}=et01045.htm
# ${cnt[2]}=19
# ${cnt[3]}=27
2) regex='([^#]+)#(.+)-(.+)#(.+)';;
# et09a03.htm#15-et09a04.htm#1
# ${cnt[1]}=et09a03.htm
# ${cnt[2]}=15
# ${cnt[3]}=et09a04.htm
# ${cnt[4]}=1
3) if [[ "$in" =~ Z ]]
then
regex='([^#]+)#([0-9]+)-([^#]+)#(.+)Z([^#]+)#(.+)-(.+)'
# et1027.htm#6-et1028.htm#13Zet1029.htm#22-23
# ${cnt[1]}=et1027.htm
# ${cnt[2]}=6
# ${cnt[3]}=et1028.htm
# ${cnt[4]}=13
# ${cnt[5]}=et1029.htm
# ${cnt[6]}=22
# ${cnt[7]}=23
else
regex='([^#]+)#([0-9]+)-([^#]+)#-(.+)#(.+)'
# et1312.htm#13-et1313.htm#-et1314.htm#10
# ${cnt[1]}=et1312.htm
# ${cnt[2]}=13
# ${cnt[3]}=et1313.htm
# ${cnt[4]}=et1314.htm
# ${cnt[5]}=10
fi;;
esac

[[ "$in" =~ $regex ]] && data=( "${BASH_REMATCH[@]:1}" )
# This runs the case? && this creates the array.

# Question; Is there a good reason that we couldn't just use ${cnt[@]}

echo "${data[@]}"
 
Old 03-25-2015, 09:54 PM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Quote:
[[ "$in" =~ $regex ]] && data=( "${BASH_REMATCH[@]:1}" )
# This runs the case? && this creates the array.
First part is wrong. The 'regex' set in the case is used here to check the string. Assuming it finds a successful match, it then assigns the data from the builtin array BASH_REMATCH to 'data' array.
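A small sketch of that last step on its own, using the type 1 sample and its regex. BASH_REMATCH[0] holds the whole match and the capture groups start at index 1, which is why the slice begins at 1.

```bash
#!/usr/bin/env bash
in='et0112.htm#1-13'
regex='(.+)#(.+)-(.+)'

if [[ $in =~ $regex ]]; then
  # Copy only the capture groups, skipping the whole-match entry at index 0.
  data=( "${BASH_REMATCH[@]:1}" )
fi

# data is indexed from 0: file, start, end.
printf '%s\n' "${data[0]}" "${data[1]}" "${data[2]}"
```

If the string does not match, the assignment is skipped and 'data' keeps whatever it held before, so a real script may want to handle that case explicitly.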

Quote:
# Question; Is there a good reason that we couldn't just use ${cnt[@]}
'cnt' has only the # symbols in it, so not sure what you would want to do with that.


However, it is always funny how sometimes your head gets stuck on one thing as a solution when there is a much easier one you missed.

You can replace all the above with the one line below:
Code:
data=( ${1//[-#Z]/ } )

echo "${data[@]}"
Remember you can change $1 for whatever variable you have that line stored in.
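A hedged variant of the same split wrapped in a function. The name `split_reading` is made up here, and read -a is used in place of the unquoted expansion so that glob characters in the input cannot expand to file names. Like the one-liner, it assumes the file names themselves never contain -, # or Z.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fills the global array "data" from one reference string.
split_reading() {
  local s=${1//[-#Z]/ }      # replace every -, # and Z with a space
  read -r -a data <<<"$s"    # split on whitespace, no glob expansion
}

split_reading 'et1027.htm#6-et1028.htm#13Zet1029.htm#22-23'
printf '%s\n' "${#data[@]}" "${data[0]}" "${data[6]}"

split_reading 'et0119.htm#21-et0120.htm#-et0121.htm#4'
printf '%s\n' "${data[@]}"
```

Bash arrays start at 0, so the first file name is ${data[0]}. For the third type the middle file has no number after its #, so that string yields 5 fields and the positions differ from the other types.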
 
1 members found this post helpful.
Old 03-26-2015, 05:55 AM   #15
rbees
Member
 
Registered: Mar 2004
Location: northern michigan usa
Distribution: Debian Squeeze, Whezzy, Jessie
Posts: 921

Original Poster
Rep: Reputation: 46
Thanks grail

I don't understand how cnt=${in//[^#]/} puts only the # in. I thought ^# meant "not #", as it appears to in the case/esac. But looking at it now I guess it must mean "delete everything that is NOT a #".
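That reading can be checked directly. In ${in//pattern/replacement} every match of the pattern is replaced, and here the pattern [^#] matches any single character that is not a # while the replacement is empty, so only the # characters survive. A quick sketch with the sample string:

```bash
#!/usr/bin/env bash
in='et1027.htm#6-et1028.htm#13Zet1029.htm#22-23'

cnt=${in//[^#]/}   # delete every character that is not a #
echo "$cnt"        # ###
echo "${#cnt}"     # 3
```

The length of 'cnt' is therefore the number of # characters, which is what the case statement branches on.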

So turning it into a function I came up with this.

Code:
# Break the listing for a reading into parts
# Usage: Reading_Sections "$in"
# I.E. in="et1027.htm#6-et1028.htm#13Zet1029.htm#22-23"
# Output (bash arrays are indexed from 0):
#	  ${data[0]}=et1027.htm
#	  ${data[1]}=6
#	  ${data[2]}=et1028.htm
#	  ${data[3]}=13
#	  ${data[4]}=et1029.htm
#	  ${data[5]}=22
#	  ${data[6]}=23
function Reading_Sections () {
in=$1
cnt=${in//[^#]/}

# NOTE: regex is still set here but is not used; the split below fills data.
case ${#cnt} in
  1) regex='(.+)#(.+)-(.+)';;
  2) regex='([^#]+)#(.+)-(.+)#(.+)';;
  3) if [[ "$in" =~ Z ]]
     then
       regex='([^#]+)#([0-9]+)-([^#]+)#(.+)Z([^#]+)#(.+)-(.+)'
     else
       regex='([^#]+)#([0-9]+)-([^#]+)#-(.+)#(.+)'
     fi;;
esac
data=( ${1//[-#Z]/ } )
echo "${data[@]}"
}

# pattern 1 ####################################
      if (( ${#ALIYAH} > 13 && ${#ALIYAH} < 18 )); then
	index=$(($(cat $TEMPFILE) + 1))
	echo "TYPE 1"
	Reading_Sections "$ALIYAH"
	# data is indexed from 0: data[0]=file, data[1]=start, data[2]=end
	w3m -dump -T text/html $STORDIR/JPS/${data[0]} | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' >  $TMPDIR/tempfile1a
	  if (( ${data[1]} > 1 )); then
	    BLOCKHEAD=$(awk "/${data[1]}/ {print FNR}" $TMPDIR/tempfile1a)
	    echo "start $BLOCKHEAD"
	    BLOCKHEAD=$((BLOCKHEAD - 1 ))
	    echo "end $BLOCKHEAD"
	    sed -i "1,$BLOCKHEAD d" $TMPDIR/tempfile1a
	    # strip leading words first line
	    FIRSTLINE=$(sed -n -e "1 s/^.*${data[1]}/${data[1]}/p" $TMPDIR/tempfile1a)
	    sed -i "1s/.*/$FIRSTLINE/" $TMPDIR/tempfile1a # replace first line
	  fi

	# strip trailing words
	sed -i "/${data[2]}/q" $TMPDIR/tempfile1a
	sed -i -e "s/${data[2]}.*$/${data[2]}/g" -e "s/${data[2]}//" $TMPDIR/tempfile1a
	
	# generate reading file
	cat $TMPDIR/tempfile1a > $TMPDIR/$PARSHA$index
	# strip number and special characters
	sed  -i -e 's/{S}*//g' -e 's/{P}*//g' -e 's/[0-9]*//g' $TMPDIR/$PARSHA$index
	rm "$TMPDIR"/tempfile1a
      fi
Thanks again
 
  

