LinuxQuestions.org
Review your favorite Linux distribution.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 05-12-2022, 07:42 PM   #1
Tsuga
Member
 
Registered: Jul 2020
Distribution: Slackware64 15.0
Posts: 63

Rep: Reputation: Disabled
Bash script to read list from xml file for requests with wget


I have a list of Sentinel-2 files in XML format. I don't really know how to parse XML in code. I want to download the files. I have a wget statement that can do list item at time. I want to code a wget statement that can get each item in the list.

This is a small part of the list:
Code:
<?xml version="1.0" encoding="UTF-8"?><metalink xmlns="urn:ietf:params:xml:ns:metalink">
<file name="S2A_MSIL1C_20151116T161042_N0204_R097_T18TTM_20151116T161042"><hash type="MD5">7c3d80be6658e02f408329cf9f194bce</hash><size>814488077</size><url>https://scihub.copernicus.eu/dhus/odata/v1/Products('80ba113d-b5c5-4f4f-8e4e-3231bd2c2859')/$value</url></file><file name="S2A_MSIL1C_20151116T161042_N0204_R097_T17TPG_20151116T161042"><hash type="MD5">b4bbfe460677864d7e31fa6833981df1</hash><size>653453581</size><url>https://scihub.copernicus.eu/dhus/odata/v1/Products('d0d43133-8089-4ec9-ab35-37d26323af63')/$value</url></file><file name="S2A_MSIL1C_20151116T161042_N0204_R097_T17TQJ_20151116T161042"><hash type="MD5">461F32396E40CD40D431A3BCB0AEB587</hash><size>680393012</size><url>https://scihub.copernicus.eu/dhus/odata/v1/Products('cefbc29e-f567-4410-b0b3-860617cf6a19')/$value</url></file><file name="S2A_MSIL1C_20151116T161042_N0204_R097_T18TUP_20151116T161042"><hash type="MD5">C46D1D72A3D2C45B2D5C86C28AA1B2FE</hash><size>668335329</size><url>https://scihub.copernicus.eu/dhus/odata/v1/Products('63e2415f-249c-4d68-a908-d26efc34bf82')/$value</url></file><file name="S2A_MSIL1C_20151116T161042_N0204_R097_T17TPJ_20151116T161042"><hash type="MD5">64540FA552D95E3C61B9AC8040469DB2</hash><size>273103410</size><url>https://scihub.copernicus.eu/dhus/odata/v1/Products('c7de7226-5769-451d-9348-b8bf85e7b93f')/$value</url></file><file name="S2A_MSIL1C_20151116T161042_N0204_R097_T18TUN_20151116T161042"><hash type="MD5">17C2E9C89EC1DEDBE3D04BB528184028</hash><size>822199792</size><url>https://scihub.copernicus.eu/dhus/odata/v1/Products('13168064-4914-4320-aecc-2482b7f9d005')/$value</url></file>
This is the wget statement. It worked for one file and I want a loop or some other way to feed the whole list to it and make it run over each item.
Code:
wget --content-disposition --continue --user=... --password=... "https://scihub.copernicus.eu/dhus/odata/v1/Products('80ba113d-b5c5-4f4f-8e4e-3231bd2c2859')/\$value"
I'm using Slackware64 14.2, but I don't think that matter so much, as long as the code will work in linux.
 
Old 05-12-2022, 11:39 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,364

Rep: Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752Reputation: 2752
You could use a specialist tool like xmlstarlet, although personally I'd use Perl which has module(s) specifically for doing that and you could incorporate the equiv of wget as well.
 
1 members found this post helpful.
Old 05-13-2022, 12:07 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194Reputation: 3194
Yeah, ruby for me and python for others. You can bash it (pardon the pun), but the sed/awk/grep option might be encumbersome.
 
1 members found this post helpful.
Old 05-13-2022, 07:58 AM   #4
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

Rep: Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555Reputation: 2555

You need an XML parser to handle XML, but getting XmlStarlet to extract the URLs is not quite as straight forward as it could be...

Code:
$ xmlstarlet select -t -v '/_:metalink/_:file/_:url' -n input.xml
https://scihub.copernicus.eu/dhus/odata/v1/Products('80ba113d-b5c5-4f4f-8e4e-3231bd2c2859')/$value
https://scihub.copernicus.eu/dhus/odata/v1/Products('d0d43133-8089-4ec9-ab35-37d26323af63')/$value
https://scihub.copernicus.eu/dhus/odata/v1/Products('cefbc29e-f567-4410-b0b3-860617cf6a19')/$value
https://scihub.copernicus.eu/dhus/odata/v1/Products('63e2415f-249c-4d68-a908-d26efc34bf82')/$value
https://scihub.copernicus.eu/dhus/odata/v1/Products('c7de7226-5769-451d-9348-b8bf85e7b93f')/$value
https://scihub.copernicus.eu/dhus/odata/v1/Products('13168064-4914-4320-aecc-2482b7f9d005')/$value
Where input.xml is a file containing your code, but with the missing "</metalink>" appended. (You could instead from stdin.)

See "xmlstarlet select --help" (or online user guide) for explanation of the syntax (which does not follow typical command-line conventions).

The "_:" bit in the XPath expression is required due to the xmlns attribute on the root element.

Without a xmlns it would just be "/metalink/file/url" (and trying to use _: there results in an "Undefined namespace prefix" error.)

If there might be other "url" tags outside of "file" tags, you could use:
Code:
$ xmlstarlet select -t -v '//_:url' -n input.xml
The output of both is a newline-delimited list, which can be looped in the usual way and passed to wget/curl/whatever.

The "-n" is needed to output a trailing newline at the end, which some command-line tools are picky about.

 
2 members found this post helpful.
  


Reply

Tags
bash, wget



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Apache 2.4 requests to non-SSL site with "Upgrade-Insecure-Requests: 1" and no trailing / get redirected to default site owendelong Linux - Server 2 06-22-2021 02:08 PM
libvirt: post virt-clone, unchanged values in resulting XML file from original XML file CptSupermrkt Linux - Virtualization and Cloud 1 04-14-2016 08:20 AM
Any Way to Read a SINGLE XML VALUE from a big XML File in Linux? Or...? gmark Programming 3 01-17-2016 10:51 AM
[SOLVED] Loop through list of URLs in txt file, parse out parameters, pass to wget in bash. dchol Linux - Newbie 16 07-27-2011 02:19 PM
Looking for a Bash / Perl script to read the New RSS feeds (XML url) frenchn00b Programming 1 02-13-2011 08:06 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 08:42 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration