Bash script to read list from xml file for requests with wget
I have a list of Sentinel-2 files in XML format. I don't really know how to parse XML in code. I want to download the files. I have a wget statement that can download one list item at a time, and I want to write something that runs it for each item in the list.
This is a small part of the list:
Code:
<?xml version="1.0" encoding="UTF-8"?><metalink xmlns="urn:ietf:params:xml:ns:metalink">
This is the wget statement that works for a single item:
Code:
wget --content-disposition --continue --user=... --password=... "https://scihub.copernicus.eu/dhus/odata/v1/Products('80ba113d-b5c5-4f4f-8e4e-3231bd2c2859')/\$value"
|
You could use a specialist tool like xmlstarlet, although personally I'd use Perl, which has modules specifically for XML parsing, and you could handle the download (the equivalent of wget) in the same script as well.
|
Yeah, Ruby for me and Python for others. You can bash it (pardon the pun), but the sed/awk/grep route can get cumbersome.
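Since Python came up: a minimal sketch of the parsing half using the standard library's ElementTree. The namespace URI comes from the OP's snippet; the `<file>`/`<url>` nesting and the sample document here are assumptions about what the full metalink file looks like.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns attribute in the OP's snippet.
NS = {"m": "urn:ietf:params:xml:ns:metalink"}

def extract_urls(xml_text):
    """Return the text of every <url> nested under a <file> element."""
    root = ET.fromstring(xml_text)
    return [u.text for u in root.findall(".//m:file/m:url", NS)]

# Hypothetical two-entry sample in the assumed shape of the metalink file.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="a.zip"><url>https://example.invalid/a.zip</url></file>
  <file name="b.zip"><url>https://example.invalid/b.zip</url></file>
</metalink>"""

print(extract_urls(sample))
# → ['https://example.invalid/a.zip', 'https://example.invalid/b.zip']
```

Note that ElementTree requires the namespace prefix in the XPath for the same reason xmlstarlet does: the root element declares an xmlns, so the elements live in that namespace.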
|
You need an XML parser to handle XML, but getting XmlStarlet to extract the URLs is not quite as straightforward as it could be...
Code:
$ xmlstarlet select -t -v '/_:metalink/_:file/_:url' -n input.xml
See "xmlstarlet select --help" (or the online user guide) for an explanation of the syntax (which does not follow typical command-line conventions). The "_:" bit in the XPath expression is required because of the xmlns attribute on the root element. Without an xmlns it would just be "/metalink/file/url" (and trying to use "_:" there results in an "Undefined namespace prefix" error). If there might be other "url" tags outside of "file" tags, you could instead use:
Code:
$ xmlstarlet select -t -v '//_:url' -n input.xml
The "-n" is needed to output a trailing newline at the end, which some command-line tools are picky about.
|
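Whichever tool extracts the URLs, the remaining step is looping the OP's wget command over them. A hedged Python sketch of that loop follows; the credentials are placeholders, and shelling out to wget (rather than downloading in-process) is just one way to do it.

```python
import subprocess

def wget_cmd(url, user, password):
    """Build the OP's single-item wget invocation for one URL."""
    return ["wget", "--content-disposition", "--continue",
            f"--user={user}", f"--password={password}", url]

def download_all(urls, user, password, dry_run=False):
    # dry_run prints each command instead of running it, which is
    # handy for checking the quoting before hitting the server.
    for url in urls:
        cmd = wget_cmd(url, user, password)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Hypothetical usage with a one-item list and placeholder credentials:
download_all(["https://example.invalid/a.zip"], "me", "secret", dry_run=True)
```

Passing each URL as its own argv element (rather than building one shell string) sidesteps quoting problems with characters like the `$` in the OData `$value` URLs.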