Thanks again for all the help and guidance given on this site.
I have looked and looked for a way to use the index of the array in the file name, but everything I have tried has come up empty. The iteration below does not even generate file names, and it is only a small sample.
All I want is the index number, a simple 1-9, to be the file name. I know that something like ${!arr[key]} is supposed to reference the key and not the element value. I can get the element value easily enough, but it does not make a very good file name, e.g. et1107.htm#21-et1108.htm#3Zet1109.htm#22-23, whereas $PARSHA/$arrayINDEX would make complete sense.
This code also returns, in one place:
Code:
./ReadingTest: line 190: $TMPDIR/${!arr[@]}: ambiguous redirect
Code:
READING=$(grep ^"$PARSHA" "$STORDIR"/Readings.csv)
IFS=, read -a arr <<<"$READING"
for ALIYAH in "${arr[@]:1:9}";do
# pattern 1 ####################################
if (( ${#ALIYAH} > 13 && ${#ALIYAH} < 18 )); then
echo "TYPE 1"
SHIR1="$(echo "$ALIYAH" |awk -F \# '{print $1}')"
START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"
END1="$(echo "$ALIYAH" |awk -F - '{print $2}')"
END1=$((END1 + 1))
w3m -dump -T text/html $STORDIR/JPS/$SHIR1 | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | awk '/$START1/,/$END1/' > $TMPDIR/tempfile1a
if (( $START1 > 1 )); then
# strip leading words first line
FIRSTLINE=$(sed -n -e "1 s/^.*$START1/$START1/p" $TMPDIR/tempfile1a)
sed -i "1s/.*/$FIRSTLINE/" $TMPDIR/tempfile1a # replace first line
fi
# strip trailing words
sed -i "/$END1/q" $TMPDIR/tempfile1a
sed -e "s/$END1.*$/$END1/g" -e "s/$END1//" $TMPDIR/tempfile1a
# generate reading file
mv $TMPDIR/tempfile1a $TMPDIR/${!arr[@]}
Code:
mv $TMPDIR/tempfile1a $TMPDIR/${!arr[ALIYAH]}
results in
Code:
./ReadingTest: line 77: et0306.htm#1-11: syntax error: invalid arithmetic operator (error token is ".htm#1-11")
Now that I look at this, I don't see how it can work, as it uses the character count of the element to check against the input values, which are all the same length (three-letter months) in the original.
Code:
get_array_index "ARRAY=${arr[*]}" "VALUE=$ALIYAH"
# my understanding of usage from
# ARRAY=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
# VALUE='May'
# Usage: get_array_index "$ARRAY=your-array" "VALUE=$ALIYAH"
get_array_index() {
for ((index=0; index<${#ARRAY[@]}; index++)); do
if [ "${ARRAY[$index]}" = "$VALUE" ]; then
echo $index
return
fi
done
echo 'Not Found'
}
The two mv lines above are not the same. When '!' is used at the start of the array name and you ask for all elements using '@', you get back ALL subscripts of the array. Used unquoted as a redirection target, that list of several words is what produces the "ambiguous redirect" error.
With a single existing subscript, such as 1, the following will return nothing:
Code:
echo $TMPDIR/${!arr[1]}
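A minimal sketch of the two meanings of '!' described above, in case it helps. The array contents and the variable named beta are made up for illustration and are not from the original script.

```bash
#!/usr/bin/env bash
arr=(alpha beta gamma)      # hypothetical sample array

# '!' together with '@' or '*' lists the subscripts, not the values.
keys="${!arr[*]}"
echo "subscripts: $keys"

# '!' with a single subscript is indirect expansion: the VALUE stored in
# arr[1] is treated as the NAME of another variable.
beta='value of the variable named beta'
indirect="${!arr[1]}"
echo "indirect: $indirect"
```

When the element is something like et0306.htm#1-11, it is not a usable variable name, so the indirect form cannot give you the subscript.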
If you need to know the subscripts then you will need to do it when calling the array for the for loop.
Code:
for ALIYAH in "${!arr[@]}";do
Then from here you would need an if inside the loop to check it is between 1 and 9.
Another alternative would be to create a counter which is incremented at the start of the loop and used as your subscript info.
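Both suggestions can be sketched roughly as follows. The sample row and its file references are hypothetical stand-ins for a line of Readings.csv, and the echo only shows where the file name would be built.

```bash
#!/usr/bin/env bash
# Hypothetical sample row; the real one would come from Readings.csv.
READING='Sample,et0101.htm#1-13,et0101.htm#14-23,et0102.htm#1-9'
IFS=, read -r -a arr <<<"$READING"

# Option 1: loop over the subscripts and look each element up.
names=()
for i in "${!arr[@]}"; do
    (( i >= 1 && i <= 9 )) || continue   # skip element 0 (the name column)
    ALIYAH=${arr[i]}
    names+=("$i:$ALIYAH")
done
printf '%s\n' "${names[@]}"

# Option 2: loop over the elements and keep a separate counter.
count=0
for ALIYAH in "${arr[@]:1:9}"; do
    count=$((count + 1))
    echo "would write $ALIYAH to \$TMPDIR/$count"
done
```

Either way the subscript (or counter) is an ordinary variable that can be used in the output file name.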
I went with the counter and have that issue fixed. But it seems that I have a variable that is not being passed correctly into an if/then block. The block runs OK, but the value of the tested variable is not passed in, so when the block tries to decrement it, it actually sets the value to -1 and the block fails.
This is the code section:
Code:
START1="$(echo "$ALIYAH" |awk -F \# '{print $2}'|awk -F - '{print $1}')"
w3m -dump -T text/html $STORDIR/JPS/$SHIR1 | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' | awk '/$START1/,/$END1/' > $TMPDIR/tempfile1a
if (( $START1 > 1 )); then
BLOCKHEAD=$(awk "/$START1/ {print FNR}" $TMPDIR/tempfile1a)
echo "start $BLOCKHEAD"
BLOCKHEAD=$((BLOCKHEAD - 1 ))
echo "end $BLOCKHEAD"
sed -i "1,$BLOCKHEAD d" $TMPDIR/tempfile1a
fi
If I understand what I have read correctly, I need to use:
Code:
TEMPFILE=$TMPDIR/$$.tmp # First we set up a counter file outside the loop.
echo 0 > $TEMPFILE # Then make it zero.
index=$(($(cat $TEMPFILE) + 1)) # Inside the loop we grab the value, increment it, and assign it to a variable.
cat $TMPDIR/tempfile1a > $TMPDIR/$PARSHA$index # Use the variable in the file name.
echo $index > $TEMPFILE # And at the end of the loop write it back out to the counter file.
My variable processing issue may have been related to the text processing not being performed at the correct time. So exporting the variable may not be needed, but I have not tested it, as it works the way it is.
I have not had a chance to see if I could figure out how to shorten an awk line such as this.
Which has to take something like et1107.htm#21-et1108.htm#3Zet1109.htm#22-23 and get just the "3" before the Z and put it into $END8. And there is no way to know if that will be a one or two digit number. Some of the others need to get the et1108.htm for the variable.
I do know that I did not have a lot of luck trying to shorten the sed lines with -e and keep the processing in the correct order. Such as these two lines.
Code:
sed -i "/$END9/q" $TMPDIR/tempfile3d
sed -i -e "s/$END9.*$/$END9/g" -e "s/$END9//" $TMPDIR/tempfile3d
The text needs 5 header and 6 footer lines stripped off first, before anything else can be done to it. If I tried to add in something like the above it would not process in the correct order and the text would come out wrong. :bummer:
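For the header/footer step on its own, one possible alternative is tail and head. This is only a sketch: seq stands in for the w3m output, and the negative count for head is a GNU coreutils feature, so it is an assumption that GNU tools are in use.

```bash
#!/usr/bin/env bash
# Stand-in for the w3m dump: 20 numbered lines.
trimmed=$(seq 1 20 | tail -n +6 | head -n -6)   # drop first 5 and last 6 lines

first=$(printf '%s\n' "$trimmed" | head -n 1)
last=$(printf '%s\n' "$trimmed" | tail -n 1)
echo "kept lines $first to $last"
```

Because this runs as the first stage of the pipeline, any later sed or awk step only ever sees the trimmed text.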
How well is the data formulated? Can you guarantee a single Z in the line, and will the number(s) prior to that always have a # before them?
If above is correct, no awk is needed:
Code:
END8=${ALIYAH%%Z*}
END8=${END8##*#}
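Here is the same idea worked through on the sample string from the question, as a sketch. The intermediate names before_z and SHIR8 are only illustrative.

```bash
#!/usr/bin/env bash
ALIYAH='et1107.htm#21-et1108.htm#3Zet1109.htm#22-23'

before_z=${ALIYAH%%Z*}     # cut from the first Z onward
END8=${before_z##*#}       # keep what follows the last '#'
echo "END8=$END8"

# The file name just before that number can be pulled out the same way.
SHIR8=${before_z#*-}       # drop everything up to the first '-'
SHIR8=${SHIR8%%#*}         # drop the '#' and the number after it
echo "SHIR8=$SHIR8"
```

It does not matter whether the number is one or two digits, since nothing here counts characters.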
Now your counter ... are we storing this in a file because we need to run the script at a later time and will need to know where we left off? This seems unlikely, as the loop is always going to complete.
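As a rough illustration of why the file is probably unnecessary: a for loop runs in the current shell, so a plain variable keeps its value between iterations. The file trick is usually only needed when the loop sits on the right-hand side of a pipe. The names and the mktemp directory below are assumptions for the sketch.

```bash
#!/usr/bin/env bash
PARSHA=Sample                         # hypothetical portion name
workdir=$(mktemp -d)                  # stand-in for $TMPDIR
trap 'rm -rf "$workdir"' EXIT
refs=('et0101.htm#1-13' 'et0101.htm#14-23' 'et0102.htm#1-9')

index=0
for ALIYAH in "${refs[@]}"; do
    index=$((index + 1))              # survives to the next iteration
    printf '%s\n' "$ALIYAH" > "$workdir/$PARSHA$index"
done
echo "wrote $index files"

# Contrast: a loop fed by a pipe normally runs in a subshell in bash,
# so changes to the counter are lost once the loop ends.
piped=0
printf '%s\n' "${refs[@]}" | while read -r line; do piped=$((piped + 1)); done
echo "counter after piped loop: $piped"
```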
As for your sed commands, I would need to know what sort of data is in END9 and what it is you want from the file. Again, I would see no reason to use more than one.
I would add that -i in the sed where you quit serves no purpose as the file is not being changed in any way.
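Without knowing the real data, one guess at a single sed that does what the two lines appear to be aiming for (stop at the first line containing the end marker and drop the marker plus anything after it) might look like this. The sample text and the value of END9 are invented for the example.

```bash
#!/usr/bin/env bash
END9=13      # hypothetical end marker
sample=$'12 first verse\nsome words 13 trailing words\n14 next verse'

# On the first line containing the marker: delete from the marker to the
# end of the line, then quit so no later lines are printed.
result=$(printf '%s\n' "$sample" | sed "/$END9/{s/$END9.*//;q;}")
printf '%s\n' "$result"
```

Note that a bare number such as 13 would also match inside 113 or 130, so the real pattern may need anchoring depending on how the markers appear in the text.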
The file reference data is both consistent and not. What I mean is that there are 5 different types.
Type 1, by far the most common, is a file reference like et0112.htm#1-13 and is always formatted like that. Actually, they are all basically like that. Let me break it down.
In this sample et0112.htm is the file name. The et always comes first, normally followed by a 4-character (sometimes 5-character) alphanumeric string and then .htm. This becomes the variable $SHIR1=et0112.htm
In this sample the 1 following the # is a reference location in the file and the 13 following the - is a second reference in the file. Both are normally one or two digit numbers but may be as many as 3. This becomes $START1=1 and $END1=13
The second type looks like this et0112.htm#14-et0113.htm#4 where everything on one side of the - is one file/reference and the other side is another. Making $SHIR2 $START2 $SHIR3 $END3
The third type looks like et0119.htm#21-et0120.htm#-et0121.htm#4 which needs to end up as $SHIR4 $START4 $SHIR5 $SHIR6 $END6 In this case there are three different files referenced with the start reference in the first file, all of the second file, and the end reference in the third file.
Then there is a fourth type, which combines type two followed by type one and looks like et1027.htm#6-et1028.htm#13Zet1029.htm#22-23, as previously posted.
The fifth type has only one case and the whole file is used.
The $SHIR variables always have a file name
The $START and $END variables always have a number (which is NOT a line number but may be in some cases)
The $ALIYAH variable is the loop variable. I could not figure out how to use it as part of the file name, which is generated outside the loop, so the counter stored in a file is something I found online that seems to work.
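To make the type 1 breakdown concrete, here is a small sketch using only parameter expansion. The function name parse_type1 is made up for illustration; it sets the SHIR1, START1 and END1 variables described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a type 1 reference (file#start-end).
parse_type1() {
    local ref=$1
    SHIR1=${ref%%#*}          # everything before the '#'  -> file name
    local range=${ref#*#}     # everything after the '#'   -> "1-13"
    START1=${range%%-*}       # before the '-'
    END1=${range#*-}          # after the '-'
}

parse_type1 'et0112.htm#1-13'
echo "SHIR1=$SHIR1 START1=$START1 END1=$END1"
```

This works for one, two or three digit references, because it splits on the # and - characters rather than on string length.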
# grail @ linuxquestions.com
#
# Maybe see if something like this helps:
# In this scenario,
#
# $1 is equal to one of your strings :- et0119.htm#21-et0120.htm#-et0121.htm#4
#
# And the array, data, contains the parts you require.
[[ "$in" =~ $regex ]] && data=( "${BASH_REMATCH[@]:1}" )
# This runs the case? && this creates the array.
First part is wrong. The 'regex' set in the case is used here to check the string. Assuming it finds a successful match, it then assigns the data from the builtin array BASH_REMATCH to 'data' array.
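A stripped-down sketch of that one step, using a type 2 string and the two-# regex: the case statement is left out here, and the input is hard-coded for the example.

```bash
#!/usr/bin/env bash
in='et0112.htm#14-et0113.htm#4'
regex='([^#]+)#(.+)-(.+)#(.+)'

data=()
# [[ ... =~ ... ]] tests the string; on success BASH_REMATCH[0] holds the
# whole match and BASH_REMATCH[1..n] hold the parenthesised groups.
if [[ $in =~ $regex ]]; then
    data=( "${BASH_REMATCH[@]:1}" )   # copy only the groups
fi
printf '%s\n' "${data[@]}"
```

The :1 slice skips element 0 of BASH_REMATCH, so data itself starts at subscript 0 with the first file name.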
Quote:
# Question; Is there a good reason that we couldn't just use ${cnt[@]}
'cnt' has only the # symbols in it, so not sure what you would want to do with that.
However, it is always funny how your head sometimes gets stuck on one solution when there is a much easier one you missed.
You can replace all the above with the one line below:
Code:
data=( ${1//[-#Z]/ } )
echo "${data[@]}"
Remember you can change $1 for whatever variable you have that line stored in.
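A quick sketch of the one-liner on a type 3 string, to show what ends up where. The wrapper function split_ref is just for the example. Note that the array subscripts start at 0.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the one-liner.
split_ref() {
    # Replace every '-', '#' and 'Z' with a space, then let word splitting
    # build the array (left unquoted on purpose).
    data=( ${1//[-#Z]/ } )
}

split_ref 'et0119.htm#21-et0120.htm#-et0121.htm#4'
for i in "${!data[@]}"; do
    echo "data[$i]=${data[i]}"
done
```

This relies on the references never containing spaces, glob characters, or a capital Z inside a file name, which is an assumption about the data.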
I don't understand how cnt=${in//[^#]/} leaves only the # characters. I thought ^# meant "not #", as it appears to in the case/esac. But looking at it now, I guess it must mean "delete everything that is NOT a #".
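That reading can be checked with a short sketch: the pattern [^#] matches any single character that is not a #, and replacing every match with nothing leaves just the # characters, whose count then drives the case statement.

```bash
#!/usr/bin/env bash
in='et1027.htm#6-et1028.htm#13Zet1029.htm#22-23'

cnt=${in//[^#]/}     # delete every character that is not '#'
echo "cnt=$cnt"
echo "number of # characters: ${#cnt}"
```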
So turning it into a function I came up with this.
Code:
# Break the listing for a reading into parts
# Usage: Reading_Sections "$in"
# e.g. in=et1027.htm#6-et1028.htm#13Zet1029.htm#22-23
# Output (bash arrays start at subscript 0):
# ${data[0]}=et1027.htm
# ${data[1]}=6
# ${data[2]}=et1028.htm
# ${data[3]}=13
# ${data[4]}=et1029.htm
# ${data[5]}=22
# ${data[6]}=23
function Reading_Sections () {
in=$1
cnt=${in//[^#]/}
case ${#cnt} in
1) regex='(.+)#(.+)-(.+)';;
2) regex='([^#]+)#(.+)-(.+)#(.+)';;
3) if [[ "$in" =~ Z ]]
then
regex='([^#]+)#([0-9]+)-([^#]+)#(.+)Z([^#]+)#(.+)-(.+)'
else
regex='([^#]+)#([0-9]+)-([^#]+)#-(.+)#(.+)'
fi;;
esac
data=( ${1//[-#Z]/ } )
echo "${data[@]}"
}
# pattern 1 ####################################
if (( ${#ALIYAH} > 13 && ${#ALIYAH} < 18 )); then
index=$(($(cat $TEMPFILE) + 1))
echo "TYPE 1"
Reading_Sections "$ALIYAH"
# data is zero-indexed: data[0]=file, data[1]=start, data[2]=end
w3m -dump -T text/html $STORDIR/JPS/${data[0]} | sed '1,5d' | sed -e :a -e '$d;N;2,6ba' -e 'P;D' > $TMPDIR/tempfile1a
if (( ${data[1]} > 1 )); then
BLOCKHEAD=$(awk "/${data[1]}/ {print FNR}" $TMPDIR/tempfile1a)
echo "start $BLOCKHEAD"
BLOCKHEAD=$((BLOCKHEAD - 1 ))
echo "end $BLOCKHEAD"
sed -i "1,$BLOCKHEAD d" $TMPDIR/tempfile1a
# strip leading words first line
FIRSTLINE=$(sed -n -e "1 s/^.*${data[1]}/${data[1]}/p" $TMPDIR/tempfile1a)
sed -i "1s/.*/$FIRSTLINE/" $TMPDIR/tempfile1a # replace first line
fi
# strip trailing words
sed -i "/${data[2]}/q" $TMPDIR/tempfile1a
sed -i -e "s/${data[2]}.*$/${data[2]}/g" -e "s/${data[2]}//" $TMPDIR/tempfile1a
# generate reading file
cat $TMPDIR/tempfile1a > $TMPDIR/$PARSHA$index
# strip number and special characters
sed -i -e 's/{S}*//g' -e 's/{P}*//g' -e 's/[0-9]*//g' $TMPDIR/$PARSHA$index
rm "$TMPDIR"/tempfile1a
fi