[SOLVED] use array index as file name

rbees · 03-22-2015, 07:35 PM

Ladies & Gents,

Thanks again for all the help and guidance given on this site.

I have looked and looked for a way to use the index of the array in the file name, but everything I have tried has come up short.

The iteration below does not even generate file names and is only a small sample.

All I want is the index number, a simple 1-9, to be the file name. I know that something like ${!arr[key]} is supposed to reference the key value and not the element value. I can get the element value easily enough, but it does not make a very good file name, e.g. et1107.htm#21-et1108.htm#3Zet1109.htm#22-23, whereas $PARSHA/$arrayINDEX would make complete sense.

In one place this code also returns

Code:

./ReadingTest: line 190: $TMPDIR/${!arr[@]}: ambiguous redirect

Code:

READING=$(grep ^"$PARSHA" "$STORDIR"/Readings.csv)
    IFS=, read -a arr <<<"$READING"
    for ALIYAH in "${arr[@]:1:9}";do

# pattern 1 ####################################
      if (( ${#ALIYAH} > 13 && ${#ALIYAH} < 18 )); then
	echo "TYPE 1"
	SHIR1="$(echo "$ALIYAH" |awk -F \# '{print $1}')"
	START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"
	END1="$(echo "$ALIYAH" |awk -F - '{print $2}')"
	END1=$((END1 + 1))
	w3m -dump -T text/html $STORDIR/JPS/$SHIR1 | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | awk '/$START1/,/$END1/' >  $TMPDIR/tempfile1a
	  if (( $START1 > 1 )); then
	    # strip leading words first line
	    FIRSTLINE=$(sed -n -e "1 s/^.*$START1/$START1/p" $TMPDIR/tempfile1a)
	    sed -i "1s/.*/$FIRSTLINE/" $TMPDIR/tempfile1a # replace first line
	  fi
	# strip trailing words
	sed -i "/$END1/q" $TMPDIR/tempfile1a
	sed -e "s/$END1.*$/$END1/g" -e "s/$END1//" $TMPDIR/tempfile1a
	# generate reading file
	mv $TMPDIR/tempfile1a $TMPDIR/${!arr[@]}

Code:

mv $TMPDIR/tempfile1a $TMPDIR/${!arr[ALIYAH]}

results in

Code:

./ReadingTest: line 77: et0306.htm#1-11: syntax error: invalid arithmetic operator (error token is ".htm#1-11")

I have looked at this http://stackoverflow.com/questions/3...-array-in-bash and countless other examples but have not been able to figure out how to incorporate such into my script.

Thanks for any guidance you are willing to provide.

rbees · 03-22-2015, 09:10 PM

I have tried this from http://www.unix.com/shell-programmin...ll-script.html but am getting no joy. But it is also late and my brain is about done. /read "smoke coming out my ears"

Now that I look at this, I don't see how it can work, as it is using the character count of the element to check against the input value, and those are all the same in the original, being three-letter months.

Code:

get_array_index "ARRAY=${arr[*]}" "VALUE=$ALIYAH"

# my understanding of usage from
# ARRAY=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
# VALUE='May'
# Usage: get_array_index "$ARRAY=your-array" "VALUE=$ALIYAH"
get_array_index() {
	for ((index=0; index<${#ARRAY[@]}; index++)); do 
		if [ "${ARRAY[$index]}" = "$VALUE" ]; then
			echo $index
			return
		fi
	done
	echo 'Not Found'
}

Code:

+ get_array_index 'ARRAY=Tzav et0306.htm#1-11 et0306.htm#12-et0307.htm#10 et0307.htm#11-38 et0308.htm#1-13 et0308.htm#14-21 et0308.htm#22-29 et0308.htm#30-36 et0308.htm#33-36 et1107.htm#21-et1108.htm#3Zet1109.htm#22-23 et1107.htm#21-et1108.htm#3Zet1109.htm#22-23' VALUE=et0308.htm#33-36
+ (( index=0 ))
+ (( index<7 ))
+ '[' 95 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 100 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 105 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 110 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 115 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 120 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ '[' 125 = '' ']'
+ (( index++ ))
+ (( index<7 ))
+ echo 'Not Found'
Not Found
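A note on the trace above: arguments such as "ARRAY=..." arrive in the function as plain strings ($1 and $2) and do not assign anything, so the loop appears to be reading some other, unrelated ARRAY and an empty VALUE. Below is a minimal sketch of one way to make the lookup work, assuming the value is passed first and the array elements after it. This calling convention is an assumption for illustration and is not the one used in the linked unix.com post.

```bash
#!/usr/bin/env bash
# Print the subscript of the first element equal to the wanted value.
# Usage (hypothetical signature): get_array_index VALUE "${array[@]}"
get_array_index() {
    local value=$1
    shift
    local -a items=("$@")
    local i
    for i in "${!items[@]}"; do
        if [[ ${items[i]} == "$value" ]]; then
            echo "$i"
            return 0
        fi
    done
    return 1    # not found: print nothing, signal failure via exit status
}

# Made-up sample data in the same shape as the posted array.
arr=(Tzav 'et0306.htm#1-11' 'et0308.htm#33-36')
get_array_index 'et0308.htm#33-36' "${arr[@]}"    # prints 2
```

Returning a non-zero status instead of printing 'Not Found' lets the caller test the result with a plain if.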

grail · 03-23-2015, 12:44 AM

Code:

mv $TMPDIR/tempfile1a $TMPDIR/${!arr[@]}

mv $TMPDIR/tempfile1a $TMPDIR/${!arr[ALIYAH]}

The above 2 lines are not the same. Also, when '!' is used at the start of the array and you are looking at all elements using '@', then you are returned ALL subscripts for the array.
When using an existing subscript, such as 1, the following will return nothing:

Code:

echo $TMPDIR/${!arr[1]}

If you need to know the subscripts then you will need to do it when calling the array for the for loop.

Code:

for ALIYAH in "${!arr[@]}";do

Then from here you would need an if inside the loop to check it is between 1 and 9.

Another alternative would be to create a counter which is incremented at the start of the loop and used as your subscript info.
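Both suggestions can be sketched as follows. This is only an illustration: the sample row stands in for the real line from Readings.csv, and the names idx, n, names_by_index and names_by_counter are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sample row; the real one comes from Readings.csv.
READING='Tzav,et0306.htm#1-11,et0306.htm#12-et0307.htm#10,et0307.htm#11-38'
IFS=, read -r -a arr <<<"$READING"

# Option 1: loop over the subscripts and look each element up by subscript.
names_by_index=()
for idx in "${!arr[@]}"; do
    (( idx >= 1 && idx <= 9 )) || continue   # skip element 0 (the parsha name)
    ALIYAH=${arr[idx]}
    names_by_index+=("$idx")                 # $idx is what would go in the file name
    echo "index $idx -> $ALIYAH"
done

# Option 2: loop over the elements and keep a counter in a plain variable.
n=0
names_by_counter=()
for ALIYAH in "${arr[@]:1:9}"; do
    (( ++n ))
    names_by_counter+=("$n")
    echo "counter $n -> $ALIYAH"
done
```

Either way the number is available inside the loop, so something like $TMPDIR/$PARSHA$idx can be built right where the file is written.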

rbees · 03-23-2015, 02:00 PM

Thanks grail,

I went with the counter and have that issue fixed. But it seems that I have a variable that is not being passed correctly into an if/then block. The block is entered OK, but the value of the tested variable is not being passed in, so when the block tries to decrement it the value is actually set to -1 and the block fails.

This is the code section:

Code:

START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"	
w3m -dump -T text/html $STORDIR/JPS/$SHIR1 | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | awk '/$START1/,/$END1/' >  $TMPDIR/tempfile1a
  if (( $START1 > 1 )); then
    BLOCKHEAD=$(awk "/$START1/ {print FNR}" $TMPDIR/tempfile1a)
    echo "start $BLOCKHEAD"
    BLOCKHEAD=$((BLOCKHEAD - 1 ))
    echo "end $BLOCKHEAD"
    sed -i "1,$BLOCKHEAD d" $TMPDIR/tempfile1a 
  fi

If I understand what I have read correctly I need to use

Code:

export START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"

instead, to make $START1 available inside the if/then block.

What I don't understand is why this has never been a problem for me before now. Or is there something else I am missing?

Thanks again.
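For what it is worth, an if/then block runs in the same shell as the rest of the script, so export is not needed for it; export only matters for child processes. A likelier culprit in the posted pipeline (this is an assumption, since the thread never confirms it) is the single-quoted awk program, where the shell does not expand $START1 and $END1. A minimal sketch with made-up input, passing the values to awk with -v instead:

```bash
#!/usr/bin/env bash
START1=3
END1=5

# A variable set in the script is visible inside if/then without export.
if (( START1 > 1 )); then
    seen_inside_if=$START1
fi

# Hand shell values to awk with -v, then use them as dynamic regexes.
# (Made-up input: one numbered verse per line.)
selected=$(printf '%s\n' '1 one' '2 two' '3 three' '4 four' '5 five' '6 six' |
    awk -v s="$START1" -v e="$END1" '$0 ~ s, $0 ~ e')
echo "$selected"
```

The range pattern prints from the first line matching s through the next line matching e, which is what the original '/$START1/,/$END1/' was presumably meant to do.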

rbees · 03-23-2015, 03:02 PM

well no joy with that

rbees · 03-23-2015, 03:31 PM

well maybe. If I can just get the text processing right.

rbees · 03-23-2015, 04:57 PM

got it

thanks again

grail · 03-24-2015, 02:37 AM

Don't forget to show your solutions as others may be interested to know how you solved it

As you are learning, please try to condense your awk scripts into a single one as there are very few to no reasons to need more than one.

Also, please try to be consistent with your coding:

Code:

if (( $START1 > 1 )); then

BLOCKHEAD=$((BLOCKHEAD - 1 ))

Either use the dollar symbol both times for the variables or neither time.
The last one can be improved too:

Code:

((BLOCKHEAD--))
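As an illustration of condensing the awk calls, here is a sketch that pulls all three parts of a type 1 reference out in a single awk invocation. It assumes the string always has the form file#start-end; the sample value is taken from earlier in the thread.

```bash
#!/usr/bin/env bash
ALIYAH='et0306.htm#1-11'

# One awk call: split on either # or - and pick the fields by position.
read -r SHIR1 START1 END1 < <(awk -F'[#-]' '{print $1, $2, $3}' <<<"$ALIYAH")

(( START1 > 1 )) && echo "start is past verse 1"
((END1++))    # same style as ((BLOCKHEAD--)): no $ needed inside (( ))
echo "$SHIR1 $START1 $END1"
```

The bracket expression in -F makes both characters field separators, so no second or third awk is needed.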

rbees · 03-24-2015, 04:51 PM

So the counter works like

Code:

TEMPFILE=$TMPDIR/$$.tmp # First we set up a counter file outside the loop.
echo 0 > $TEMPFILE	# Then make it zero.

index=$(($(cat $TEMPFILE) + 1))	# Inside the loop we grab the value, increment it, and assign it to a variable.

cat $TMPDIR/tempfile1a > $TMPDIR/$PARSHA$index # Use the variable in the file name.

echo $index > $TEMPFILE  # And at the end of the loop write it back out to the counter file.

My variable processing issue may have been related to the text processing not being performed at the correct time. So exporting the variable may not be needed, but I have not tested it as it works the way it is.

I have not had a chance to see if I could figure out how to shorten an awk line such as this.

Code:

END8="$(echo "$ALIYAH" |awk -F Z '{print $1}'|awk -F - '{print $2}'|awk -F \# '{print $2}' )"

Which has to take something like et1107.htm#21-et1108.htm#3Zet1109.htm#22-23 and get just the "3" before the Z and put it into $END8. And there is no way to know if that will be a one or two digit number. Some of the others need to get the et1108.htm for the variable.

I do know that I did not have a lot of luck trying to shorten the sed lines with -e and keep the processing in the correct order. Such as these two lines.

Code:

sed -i "/$END9/q" $TMPDIR/tempfile3d
sed -i -e "s/$END9.*$/$END9/g" -e "s/$END9//" $TMPDIR/tempfile3d

The text needs 5 header and 6 footer lines stripped off first, before anything else can be done to it. If I tried to add in something like the above it would not process in the correct order and the text would come out wrong. :bummer:

grail · 03-25-2015, 12:21 AM

How well is the data formulated? Can you guarantee on a single Z in the line and will the number(s) prior to that always have a # before them?
If above is correct, no awk is needed:

Code:

END8=${ALIYAH%%Z*}
END8=${END8##*#}

Now your counter ... are we storing this in a file because we need to run the script at a later time and will need to know where we left off? This seems unlikely as the loop is always going to complete.

As for your sed's, I would need to know what sort of data is in END9 and what it is you want from the file? Again I would see no reason to use multiple.
I would add that -i in the sed where you quit serves no purpose, as the file is not being changed in any way.
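The two parameter expansions can be traced step by step. This sketch assumes a single Z in the string and that file names never contain an uppercase Z; the intermediate names before_z and SHIR_END are made up for the example.

```bash
#!/usr/bin/env bash
ALIYAH='et1107.htm#21-et1108.htm#3Zet1109.htm#22-23'

before_z=${ALIYAH%%Z*}     # drop the Z and everything after it
END8=${before_z##*#}       # drop everything up to and including the last #

# The same idea recovers the file name that precedes that number.
SHIR_END=${before_z##*-}   # et1108.htm#3
SHIR_END=${SHIR_END%%#*}   # et1108.htm
echo "$END8 $SHIR_END"
```

Because ##*# removes up to the last #, it does not matter whether the number has one, two or three digits.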

rbees · 03-25-2015, 05:23 AM

Thanks grail,

The file reference data is both consistent and not. What I mean is that there are 5 different types.

Type 1, by far the largest group, is a file reference like et0112.htm#1-13 and is always formatted like that. Actually they are all like that basically. Let me break it down.

In this sample et0112.htm is the file name. The et will always be first, followed normally by a 4 but sometimes 5 digit alphanumeric and then the htm. This becomes the variable $SHIR1=et0112.htm

In this sample the 1 following the # is a reference location in the file and the 13 following the - is a second reference in the file. Both are normally one or two digit numbers but may be as many as 3. This becomes $START1=1 and $END1=13

The second type looks like this et0112.htm#14-et0113.htm#4 where everything on one side of the - is one file/reference and the other side is another. Making $SHIR2 $START2 $SHIR3 $END3

The third type looks like et0119.htm#21-et0120.htm#-et0121.htm#4 which needs to end up as $SHIR4 $START4 $SHIR5 $SHIR6 $END6 In this case there are three different files referenced with the start reference in the first file, all of the second file, and the end reference in the third file.

Then there is a fourth type, which combines type two followed by type one and looks like et1027.htm#6-et1028.htm#13Zet1029.htm#22-23, as previously posted.

The fifth type has only one case and the whole file is used.

The $SHIR variables always have a file name
The $START and $END variables always have a number (which is NOT a line number but may be in some cases)

The $ALIYAH variable is the counter for the loop. I could not figure out how to use it as part of the file name, which is generated outside the loop, so the counter stored in a file is something I found online that seems to work.

Thanks again

grail · 03-25-2015, 10:02 AM

Maybe see if something like this helps:

Code:

in="$1"

cnt=${in//[^#]/}

case ${#cnt} in
  1) regex='(.+)#(.+)-(.+)';;
  2) regex='([^#]+)#(.+)-(.+)#(.+)';;
  3) if [[ "$in" =~ Z ]]
     then
       regex='([^#]+)#([0-9]+)-([^#]+)#(.+)Z([^#]+)#(.+)-(.+)'
     else
       regex='([^#]+)#([0-9]+)-([^#]+)#-(.+)#(.+)'
     fi;;
esac

[[ "$in" =~ $regex ]] && data=( "${BASH_REMATCH[@]:1}" )

echo "${data[@]}"

In this scenario, $1 is equal to one of your strings :- et0119.htm#21-et0120.htm#-et0121.htm#4
And the array, data, contains the parts you require.

rbees · 03-25-2015, 08:25 PM

Thanks grail,

I have tried to comment it as I understand it.

Quote:

# grail @ linuxquestions.com
#
# Maybe see if something like this helps:
# In this scenario,
#
# $1 is equal to one of your strings :- et0119.htm#21-et0120.htm#-et0121.htm#4
#
# And the array, data, contains the parts you require.

#in="$1"
in="et1027.htm#6-et1028.htm#13Zet1029.htm#22-23"
cnt=${in//[^#]/}

case ${#cnt} in
1) regex='(.+)#(.+)-(.+)';;
# et01045.htm#19-27
# ${cnt[1]}=et01045.htm
# ${cnt[2]}=19
# ${cnt[3]}=27
2) regex='([^#]+)#(.+)-(.+)#(.+)';;
# et09a03.htm#15-et09a04.htm#1
# ${cnt[1]}=et09a03.htm
# ${cnt[2]}=15
# ${cnt[3]}=et09a04.htm
# ${cnt[4]}=1
3) if [[ "$in" =~ Z ]]
then
regex='([^#]+)#([0-9]+)-([^#]+)#(.+)Z([^#]+)#(.+)-(.+)'
# et1027.htm#6-et1028.htm#13Zet1029.htm#22-23
# ${cnt[1]}=et1027.htm
# ${cnt[2]}=6
# ${cnt[3]}=et1028.htm
# ${cnt[4]}=13
# ${cnt[5]}=et1029.htm
# ${cnt[6]}=22
# ${cnt[7]}=23
else
regex='([^#]+)#([0-9]+)-([^#]+)#-(.+)#(.+)'
# et1312.htm#13-et1313.htm#-et1314.htm#10
# ${cnt[1]}=et1312.htm
# ${cnt[2]}=13
# ${cnt[3]}=et1313.htm
# ${cnt[4]}=et1314.htm
# ${cnt[5]}=10
fi;;
esac

[[ "$in" =~ $regex ]] && data=( "${BASH_REMATCH[@]:1}" )
# This runs the case? && this creates the array.

# Question; Is there a good reason that we couldn't just use ${cnt[@]}

echo "${data[@]}"

grail · 03-25-2015, 09:54 PM

Quote:

[[ "$in" =~ $regex ]] && data=( "${BASH_REMATCH[@]:1}" )
# This runs the case? && this creates the array.

First part is wrong. The 'regex' set in the case is used here to check the string. Assuming it finds a successful match, it then assigns the data from the builtin array BASH_REMATCH to 'data' array.
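A small sketch of that step on its own, using the type 1 regex and a sample string from this thread; the variable whole is made up for the illustration.

```bash
#!/usr/bin/env bash
in='et0112.htm#1-13'
regex='(.+)#(.+)-(.+)'

if [[ $in =~ $regex ]]; then
    whole=${BASH_REMATCH[0]}            # element 0 is the entire match
    data=( "${BASH_REMATCH[@]:1}" )     # elements 1.. are the capture groups
fi
echo "${data[0]} ${data[1]} ${data[2]}"
```

Note that after the copy, data starts at subscript 0, so the file name is ${data[0]} and not ${data[1]}.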

Quote:

# Question; Is there a good reason that we couldn't just use ${cnt[@]}

'cnt' has only the # symbols in it, so not sure what you would want to do with that.

However, it is always funny how sometimes your head gets stuck on one thing as a solution when there is a far easier one you miss.

You can replace all the above with the one line below:

Code:

data=( ${1//[-#Z]/ } )

echo "${data[@]}"

Remember you can change $1 for whatever variable you have that line stored in.
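To see what the one-liner produces, here is a sketch that wraps it in a function (the name split_reading is made up) and prints each element with its subscript. It assumes the strings contain no whitespace or glob characters, which holds for the samples in this thread.

```bash
#!/usr/bin/env bash
split_reading() {
    # Unquoted on purpose: after -, # and Z become spaces, word splitting
    # turns the string into separate array elements.
    data=( ${1//[-#Z]/ } )
}

split_reading 'et1027.htm#6-et1028.htm#13Zet1029.htm#22-23'
for i in "${!data[@]}"; do
    echo "data[$i]=${data[i]}"
done
```

The subscripts run from 0 to 6 for this string, so the first file name is ${data[0]}. The adjacent "#-" in the three-file form simply becomes two spaces, which word splitting collapses, leaving five elements.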

rbees · 03-26-2015, 05:55 AM

Thanks grail

I don't understand how cnt=${in//[^#]/} puts only the # in. I thought ^# meant "not #" as it appears to in the case/esac. But looking at it now I guess it must mean "void everything that is NOT a #".
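That reading is right, and it can be checked directly. A minimal sketch with one of the sample strings:

```bash
#!/usr/bin/env bash
in='et1027.htm#6-et1028.htm#13Zet1029.htm#22-23'

cnt=${in//[^#]/}    # replace every character that is not a # with nothing
echo "$cnt"         # only the # characters from the string are left
echo "${#cnt}"      # their count, which is what the case statement switches on
```

So [^#] still means "not #" here; the // form just deletes every character that matches it.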

So turning it into a function I came up with this.

Code:

# Break the listing for a reading into parts
# Usage: Reading_Sections "$in"
# I.E. in="et1027.htm#6-et1028.htm#13Zet1029.htm#22-23"
# Output (bash arrays start at subscript 0):
#	  ${data[0]}=et1027.htm
#	  ${data[1]}=6
#	  ${data[2]}=et1028.htm
#	  ${data[3]}=13
#	  ${data[4]}=et1029.htm
#	  ${data[5]}=22
#	  ${data[6]}=23
function Reading_Sections () {
# Splitting on -, # and Z replaces the case/regex block, which is no longer needed.
data=( ${1//[-#Z]/ } )
echo "${data[@]}"
}

# pattern 1 ####################################
      if (( ${#ALIYAH} > 13 && ${#ALIYAH} < 18 )); then
	index=$(($(cat $TEMPFILE) + 1))
	echo "TYPE 1"
	Reading_Sections "$ALIYAH"
	# type 1: data[0] = file name, data[1] = start reference, data[2] = end reference
	w3m -dump -T text/html $STORDIR/JPS/${data[0]} | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' >  $TMPDIR/tempfile1a
	  if (( ${data[1]} > 1 )); then
	    BLOCKHEAD=$(awk "/${data[1]}/ {print FNR}" $TMPDIR/tempfile1a)
	    echo "start $BLOCKHEAD"
	    BLOCKHEAD=$((BLOCKHEAD - 1 ))
	    echo "end $BLOCKHEAD"
	    sed -i "1,$BLOCKHEAD d" $TMPDIR/tempfile1a
	    # strip leading words first line
	    FIRSTLINE=$(sed -n -e "1 s/^.*${data[1]}/${data[1]}/p" $TMPDIR/tempfile1a)
	    sed -i "1s/.*/$FIRSTLINE/" $TMPDIR/tempfile1a # replace first line
	  fi

	# strip trailing words
	sed -i "/${data[2]}/q" $TMPDIR/tempfile1a
	sed -i -e "s/${data[2]}.*$/${data[2]}/g" -e "s/${data[2]}//" $TMPDIR/tempfile1a
	
	# generate reading file
	cat $TMPDIR/tempfile1a > $TMPDIR/$PARSHA$index
	# strip number and special characters
	sed  -i -e 's/{S}*//g' -e 's/{P}*//g' -e 's/[0-9]*//g' $TMPDIR/$PARSHA$index
	rm "$TMPDIR"/tempfile1a
      fi

Thanks again