Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
New to bash scripting.. like very new. I've really only written the most basic of things, echo "hello world" basic.
I'm looking for some pointers / guidance (happy to research once I know where I'm looking).
Anyway, what I would like to do is write a script that I can copy and paste an email into, and then have it parse the email for download links based on file type.
Then
Confirm those links are correct (yes / no)
Then
wget the files, extract them and push them to location X.
It sounds fairly simple, but in terms of getting it to parse the email for the links, I've not got the first idea and Google is turning into a rabbit hole.
For instance, are you looking at RFC 5322 Internet Message Format (what mail clients tend to show if you select "view source"), or the rendered message body only? If the latter, are you dealing with plain text, HTML, or both?
Also, what is your definition of "correct" in this context?
Generally, Greg Wooledge's BashFAQ is a good place to see the recommended way to perform specific common tasks, but it probably won't get you past the first step in this instance; as you're very new, you might want to try the BashGuide first.
The email will be just plain text; literally copy and paste the whole email.
I would normally have 10-15 image links included in the email, which at the moment I've been extracting by hand and feeding into a script to unpack. I'd rather just dump the email into the script, have it scan it for links with, let's say, .iso endings, and then come back with a list and ask: are these correct (i.e. have I pulled all the links and excluded anything not needed)?
text="
Email will be just plain text, Literally copy and paste the whole email.
https://link1.com
I would normally have 10-15 image links included in the email, which at the
https://link2.com
moment I've been extracting by hand and feeding into a script to unpack..I'd rather just
dump the email into the script, https://link3.com have it scan it for links including
lets .iso endings and then come back
with a list and say.. are these correct https://link4.com (i.e have a pulled all links
and excluded anything not needed).
https://link5.com
https://link6.com
I then say Y and it would start the WGET process. http://link8.com
http://link7.com
http://link9.com
"
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" <<< "$text"
https://link1.com
https://link2.com
https://link3.com
https://link4.com
https://link5.com
https://link6.com
http://link8.com
http://link7.com
http://link9.com
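If you only want links of a particular file type (the .iso case mentioned above), the extension can be anchored to the end of the pattern; a sketch:
Code:
grep -Eo "https?://[a-zA-Z0-9./?=_-]*\.iso" <<< "$text"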
Quote:
Confirm those links are correct (yes / no)
Example 2:
Code:
urls=(
https://link1.com
https://link2.com
https://link3.com
https://link4.com
https://link5.com
https://link6.com
http://link8.com
http://link7.com
http://link9.com
)
agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0"
for u in "${urls[@]}"; do
    if curl -LIA "$agent" --retry 1 --max-time 1 --silent --fail "$u" -o /dev/null; then
        echo "$u is good"
    else
        echo "$u is bad"
    fi
done
https://link1.com is good
https://link2.com is bad
https://link3.com is good
https://link4.com is bad
https://link5.com is good
https://link6.com is bad
http://link8.com is good
http://link7.com is good
http://link9.com is good
I didn't know there was a link1.com.
Also:
Code:
for u in "${urls[@]}"; do
    if wget -U "$agent" --spider --tries=1 --timeout=1 "$u" 2>/dev/null; then
        echo "$u is good"
    else
        echo "$u is bad"
    fi
done
Not sure how much help you are looking for here, but maybe the following will help:
- Loop over the text broken up by whitespace.
- For each piece of text, see if it starts with "http://" or "https://".
- If it does, wget it.
Try to do the above and let us know where you get stuck; a rough sketch of these steps follows.
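A minimal sketch of those steps, assuming the pasted email has been saved to a file first (email.txt is just a placeholder name):
Code:
#!/bin/bash
# Read the whole email; email.txt is a stand-in for wherever you paste it.
text=$(< email.txt)

# $text is deliberately unquoted so that IFS splits it on whitespace.
for word in $text; do
    case $word in
        http://*|https://*)
            echo "Found URL: $word"
            wget "$word"
            ;;
    esac
done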
Yep, since there's no specific HTML or RFC 5322 parsing to do, this is a reasonable approach that can be done using Bash IFS word splitting, looping and conditionals.
Adding a user confirmation might involve "storing urls in an array" in order to print them and prompt for Y/N before wget is called on them.
Of course, since wget accepts multiple URLs, one could instead simply grep for "https?://\S+" inside a command substitution, though that approach is less helpful with regards to learning Bash. Putting those pieces together, something like the sketch below:
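A sketch only, combining the array idea with a single confirmation prompt (email.txt again stands in for wherever the pasted email ends up):
Code:
#!/bin/bash
# Collect every URL into an array via grep inside a command substitution.
mapfile -t urls < <(grep -Eo 'https?://[^[:space:]]+' email.txt)

# Print the list once, then ask a single Y/N question before fetching.
printf '%s\n' "${urls[@]}"
read -rp "Fetch these ${#urls[@]} links? [y/N] " answer
[[ $answer == [Yy]* ]] && wget "${urls[@]}"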
Here are some more ideas. zenity is a way to create dialogs for command line programs. It would be easier to parse if you just copy/pasted the URLs rather than the entire email.
Code:
#!/bin/bash
results=$(zenity --text-info --editable \
    --title="URLs")
case $? in
    1)
        echo "Script canceled."
        exit
        ;;
    -1)
        echo "An unexpected error has occurred."
        exit
        ;;
esac
# if one URL per line
while IFS= read -r url; do
    echo "$url"
done <<< "$results"
zenity should be available, or maybe yad, which works much the same way.
ShellCheck is a useful tool; it's a good idea to run your script through that and fix the issues highlighted before trying to debug further.
I got it going.
I just need to get this to work now:
Code:
val "$data"
wget "$(basename ${link})" "${link}" | unzip -P 12345 \*.zip | s3cmd put -P *.qcow2 s3://image-bucket
rm *
Once the download completes, it *should* then pass the images to unzip and then push them to S3. I thought I could pipe them to the next command, but nope.. it just downloads and then stops.
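For what it's worth, pipes won't work here: wget writes the download to disk rather than to stdout, and unzip and s3cmd both operate on files, so nothing useful travels through the pipe. One way to sequence it instead, per link, is to chain with && so each step only runs if the previous one succeeded (a sketch; the password, .qcow2 pattern and bucket name are carried over from the snippet above):
Code:
# Download under the link's basename, then unzip, upload and clean up.
file=$(basename "$link")
wget -O "$file" "$link" &&
    unzip -o -P 12345 "$file" &&
    s3cmd put -P ./*.qcow2 s3://image-bucket &&
    rm -- "$file" ./*.qcow2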