LinuxQuestions.org
Old 02-20-2022, 12:26 PM   #1
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Rep: Reputation: Disabled
Simple REGEX to return price and date from web page for kmymoney online price updater


Hello,

I use kmymoney to manage my finances, and it uses regexes to retrieve investment prices/dates for its online price updater. Unfortunately, most (if not all) of the online sources have proven extremely unreliable and I am fed up with the constant errors, so I have decided to retrieve the prices directly from my investment firm's website using regexes.

Unfortunately, I have never worked with regexes before. Kmymoney seems to "download" a copy of the web page's source code, then use a regex to return the price/date. In my case, the relevant section of the page source looks like this (heavily truncated):

Code:
                        </div>
                                              </div>

                    </td>

                    <td class="unwrappable text-right">2022-02-18</td>
                    <td class="text-right">21.8080</td>
                    <td class="text-right">-0.2535</td>
                    <td class="text-right">-0.5045</td>
                                        <td class="text-right">
                        
                            <a  target="_blank"
                               href="/en/ajax/fund-fact.html?series_id=261">
                                (PDF <span >127K)</span>
                            </a>
                                            </td>
                                        <td class="text-right">

I need a regex to return the date (2022-02-18) between the
Code:
<td class="unwrappable text-right">
and
Code:
</td>
tags.

The best I could come up with is

Code:
<td class="unwrappable text-right">(\d{4}([.\-/])\d{2}([.\-/])\d{2})</td>
which unfortunately returns the entire line where the date is located. That won't work; I need only YYYY-MM-DD.
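For what it's worth, when a tool supports PCRE, `\K` can be used to discard the `<td>` context so that only the date itself is returned. A minimal sketch, assuming GNU grep with PCRE support (`-P`) run against the snippet above; kmymoney's own regex engine may behave differently:

```shell
# Sketch, assuming GNU grep with -P (PCRE).
# \K discards everything matched so far, so only the date is printed.
printf '%s\n' '<td class="unwrappable text-right">2022-02-18</td>' |
  grep -oP '<td class="unwrappable text-right">\K\d{4}-\d{2}-\d{2}'
# → 2022-02-18
```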

The same issue happens with the price. I must be close, but I couldn't get it to work.

Any regex gurus that can help?

Thanks!

Last edited by lpallard; 02-20-2022 at 12:28 PM.
 
Old 02-20-2022, 01:15 PM   #2
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,678

Rep: Reputation: Disabled
Quote:
Originally Posted by lpallard View Post
Kmymoney seems to "download" a copy of the web page source code, then using regex, return price/date.
I doubt this is what KMyMoney actually does. It fetches stock quotes from finance.yahoo.com, and the latter provides data as CSV.

Check whether your investment firm provides the data as CSV as well, or whether the site has a JSON API, or whatever. Failing that, use an HTML parser, e.g. HTML-XML-utils
Code:
hxselect -cs\\n td.unwrappable.text-right
or pup
Code:
pup 'td.unwrappable.text-right text{}'
Many XPath tools will do as well. You may start with something like
Code:
xidel -se //td
or
Code:
xmlstarlet sel -t -v //td
and work your way from there.

Last edited by shruggy; 02-20-2022 at 02:12 PM.
 
1 members found this post helpful.
Old 02-20-2022, 02:18 PM   #3
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Original Poster
Rep: Reputation: Disabled
@shruggy: what you're proposing is very interesting.

However:

1. Most of these investment firms (I am doing business with 6 of them) have NO API of any sort, nor do they offer what I call "raw data" (like a simple web page with raw data). They do provide a page with detailed ticker data (the investment characteristics, price, fluctuation, all the other stuff) and some pages with tables, but a single page is several MBs because it has LOTS of scripting and useless stuff...

2. Some offer downloadable data (like an Excel file, not CSV), but their Excel spreadsheets are organized by the name of the investment, not the ticker. Moreover the investments are mixed US/CAD, etc... A real mess.

If I was a conspiracy theorist, I'd say they are making it cumbersome and difficult to automate data retrieval from their site...

I also looked at the web pages of some investments from the SAME firm, and the URLs are different! If the URLs were identical except for a unique number or ticker, that would make things easier...

I am thinking of writing a script that would be called once per day via cron, make a local copy of the pages for all the investments, use regexes or other utilities (such as those you suggested) to extract the data, and dump it into a small HTML file that kmymoney could connect to and grab the numbers from.

However, if that works, how long will it keep working? Until these firms change something on their sites, which happens constantly.

I am open to any suggestions at this point.
 
Old 02-20-2022, 02:59 PM   #4
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,678

Rep: Reputation: Disabled
Quote:
Originally Posted by lpallard View Post
2. Some offer downloadable data (like an excel file, no CSV)
Of course, there are ways to convert from Excel to CSV. Gnumeric comes with ssconvert and ssgrep. catdoc has xls2csv for the old Excel format. For the new format (.xlsx), there are XLSX I/O and xlsx2csv. LibreOffice is also an option (it can be started with --headless on the command line, or use a wrapper script like unoconv).

But given all the uncertainties, I would just download quotes from finance.yahoo.com as everybody else does.
 
Old 02-20-2022, 03:49 PM   #5
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Original Poster
Rep: Reputation: Disabled
Hey

I tried to work with the Excel files but, believe it or not, 2 of my investments are NOT listed in them. I've sent an email to the investment firm to ask, but I agree with you: it's already complicated enough, and I'd rather work with web pages the way KMM was intended to.

I was using Yahoo a while ago but they made some changes and some investments disappeared. The Globe and Mail, on the other hand, seems to carry all of them. I used to work with that one, but they changed the page format (source code) and the regex that came with KMM stopped working, which is why I switched to Yahoo...

Now I just need to find a way to get the prices/dates from Globe and Mail and all will work like before.

The page source shows this:

Code:
aria-describedby="tradeTime-legend-caption"> <barchart-field binding="false" symbol="INV6565.CF" type="time" name="tradeTime" value="02/18/22"></barchart-field> </span> </span> </div> </div> </div> </div> <div class="col-xs-12 col-sm-12 col-md-3 col-lg-5"> <div class="bc-quick-action-tools"> <a is="barchart-alert" class="" quote='{"symbol":"INV6565.CF","symbolName":"Investment ABC Global Stocks","exchange":"CADFUNDS","lastPrice":"26.6252","priceChange":"-0.1220","percentChange":"-0.64%",
I need to extract 26.6252 and 02/18/22 from that blob...

Do you have an idea how? I must have tried 100 ways to get the data, each time with incorrect results (returning everything after the number, or everything before it, only the integer part (19 instead of 19.0026), or nothing at all)... If you are good with regexes, could you point me in the right direction?

I need to study these regexes...
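Since both values sit in fixed attribute/key contexts in the blob, a PCRE `\K` pattern can pull each one out on its own. A sketch assuming GNU grep (`-P`), run against a shortened copy of the blob above:

```shell
# Shortened copy of the page blob above (assumed saved page content)
blob='<barchart-field binding="false" symbol="INV6565.CF" type="time" name="tradeTime" value="02/18/22"></barchart-field> "lastPrice":"26.6252","priceChange":"-0.1220"'

# \K drops the surrounding context, leaving only the wanted value
printf '%s\n' "$blob" | grep -oP 'name="tradeTime" value="\K[^"]+'   # → 02/18/22
printf '%s\n' "$blob" | grep -oP '"lastPrice":"\K[0-9.]+'            # → 26.6252
```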
 
Old 02-20-2022, 05:47 PM   #6
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,678

Rep: Reputation: Disabled
From what I can see, the Globe and Mail uses Barchart and Polygon.io. The latter seems to offer a free basic API for personal use.

I would scrape the page in question with xidel and jq like this:
Code:
#!/bin/sh
url=https://www.theglobeandmail.com/investing/markets/stocks/GOOG
xidel "$url" -se '//a[@is="barchart-alert"]/@quote'|
  jq -r .lastPrice,.tradeTime
pup cannot read directly from a URL, so you'll have to pipe it from curl:
Code:
#!/bin/sh
url=https://www.theglobeandmail.com/investing/markets/stocks/GOOG
curl -s "$url"|
  pup -p 'a[is="barchart-alert"] attr{quote}'|
  jq -r .lastPrice,.tradeTime

Last edited by shruggy; 02-20-2022 at 06:08 PM.
 
Old 02-21-2022, 03:56 PM   #7
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,249

Rep: Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323Reputation: 5323
Don't try to write your own regex for this!

For plain HTML (as in posts #1 and #5), use this, as Shruggy says:

https://github.com/ericchiang/pup

If you need to execute scripts on the page in order to render it and get the data you need, use this:

https://phantomjs.org/

Last edited by dugan; 02-21-2022 at 04:03 PM.
 
Old 02-22-2022, 12:01 PM   #8
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053
Quote:
Originally Posted by dugan View Post
Nice read, esp. the screenshot of a coder losing their mind
 
Old 03-08-2022, 10:33 AM   #9
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Original Poster
Rep: Reputation: Disabled
Thanks guys for saving my mental health... That coder who lost his mind had me laughing!

For the resolution of this topic: what I ended up doing is writing a very simple bash script using pup and jq as the main tools to extract the prices and dates, then passing the output to kmymoney, with the invaluable help of the project's main dev.

Shruggy, your proposed script helped a LOT!

Code:
#!/bin/sh
url=https://www.theglobeandmail.com/investing/markets/stocks/GOOG
curl -s "$url"|
  pup -p 'a[is="barchart-alert"] attr{quote}'|
  jq -r .lastPrice,.tradeTime
Thanks!
 
1 members found this post helpful.
Old 03-13-2022, 01:27 PM   #10
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Original Poster
Rep: Reputation: Disabled
Hello,

I'm trying to do something simple (?) with pup to automatically retrieve a specific value from Discogs. The median price is what I'm after. They have an API, which would be the best way to do this, but unfortunately (AFAIK) the stats (lowest, median and highest prices, last sold, etc.) are not available through the API...

Examining the webpage with Chrome's html inspector, I see 2 sections where the median price is provided. Example here: https://www.discogs.com/release/2193...ustice-For-All

Section 1:

Code:
<section id="release-stats" class="section_9nUx6 open_BZ6Zt mobile_SYavk">
   <header class="header_W2hzl" role="button">
      <h3>Statistics</h3>
   </header>
   <div class="content_1TFzi">
      <div class="items_3gMeU">
         <ul>
            <li>
               <h4>
                  Have<!-- -->:
               </h4>
               <a href="/release/stats/2193760" hreflang="en" class="link_1ctor">815</a>
            </li>
            <li>
               <h4>
                  Want<!-- -->:
               </h4>
               <a href="/release/stats/2193760" hreflang="en" class="link_1ctor">608</a>
            </li>
            <li>
               <h4>
                  Avg Rating<!-- -->:
               </h4>
               <span>
                  4.68<!-- --> / 5
               </span>
            </li>
            <li>
               <h4>
                  Ratings<!-- -->:
               </h4>
               <a href="/release/stats/2193760" hreflang="en" class="link_1ctor">98</a>
            </li>
         </ul>
         <ul>
            <li>
               <h4>
                  Last Sold<!-- -->:
               </h4>
               <a href="/sell/history/2193760" hreflang="en" class="link_1ctor"><time datetime="2022-03-04">4 Mar 2022</time></a>
            </li>
            <li>
               <h4>
                  Lowest<!-- -->:
               </h4>
               <span>CA$114.20</span>
            </li>
            <li>
               <h4>
                  Median<!-- -->:
               </h4>
               <span>CA$156.69</span>
            </li>
            <li>
               <h4>
                  Highest<!-- -->:
               </h4>
               <span>CA$223.42</span>
            </li>
         </ul>
      </div>
   </div>
</section>
Section 2:

Code:
<script id="dsdata" type="application/json">{"config":{"BUILD_ID":"c26355d1-02616cba","NODE_ENV":"production","SEARCH_URL":"","SENTRY_CLIENT_DSN":"https://31684db80f89494bbbc2a5e385f387b0@o8337.ingest.sentry.io/5173537","DATADOG_APP_ID":"49b2300a-317f-4807-839c-30a5c950a730","DATADOG_CLIENT_TOKEN":"pub36c1cd06ed7ff2432b450daaa9a5896c","DATADOG_SAMPLE_RATE":0,"GTM_CLIENT_ID":"GTM-KMW8HRV","REDIRECT_DEFAULT_LOCALE":true,"AD_SCRIPT":"https://lngtd.com/discogs_a.js","DISABLE_ADS":false,"ASSET_HOST":"https://catalog-assets.discogs.com/","YOUTUBE_API_KEY":"AIzaSyBapJWXaq7LdQ-kJ6IYCo8VgMSSf4zb2r8","DISCOGS_HOST":"https://www.discogs.com"},"data":{"ROOT_QUERY":{"__typename":"Query","viewer":null,"unreadMessagesCount":{"__typename":"ProfileUnreadMessagesCountConnection","totalCount":0},"cartCount":{"__typename":"MarketplaceCartCountConnection","totalCount":0},"release({\"discogsId\":2193760})":{"__ref":"Release:{\"discogsId\":2193760}"}},"Release:{\"discogsId\":2193760}":{"discogsId":2193760,"__typename":"Release","title":"...And Justice For All","blockedFromSale":false,"formats":[{"__typename":"Format","name":"Vinyl","quantity":"4","description":["12\"","45 RPM","Album","Reissue","Remastered"],"text":null},{"__typename":"Format","name":"Box 
Set","quantity":"1","description":[],"text":null}],"listings({\"first\":0})":{"__typename":"InventoryItemConnection","totalCount":14},"lowestPrice":{"__typename":"Price","converted({\"toCurrency\":\"CAD\"})":{"__typename":"Price","amount":139.27153416997535,"currency":"CAD"}},"ratings({\"first\":1})":{"__typename":"RatingConnection","averageRating":4.68,"totalCount":98},"inCollectionCount":815,"inWantlistCount":608,"statistics":{"__typename":"ReleaseStatisticsConnection","lastSaleDate":"2022-03-04T14:02:54-08:00","max":{"__typename":"Price","converted({\"toCurrency\":\"CAD\"})":{"__typename":"Price","amount":223.41661453612326,"currency":"CAD"}},"median":{"__typename":"Price","converted({\"toCurrency\":\"CAD\"})":{"__typename":"Price","amount":156.6852634401052,"currency":"CAD"}},"min":{"__typename":"Price","converted({\"toCurrency\":\"USD\"})":{"__typename":"Price","currency":"USD","amount":89.45},"converted({\"toCurrency\":\"CAD\"})":{"__typename":"Price","amount":114.19780668717844,"currency":"CAD"},"amount":89.45,"currency":"USD"}},"siteUrl":"/release/2193760-Metallica-And-Justice-For-All","labels":[{"__typename":"LabelRelationship","catalogNumber":"00600753135440","labelRole":"LABEL","label":{"__typename":"Label","discogsId":22532,"siteUrl":"/label/22532-Universal","name":"Universal"},"displayName":"Universal"},{"__typename":"LabelRelationship","catalogNumber":null,"labelRole":"SERIES","label":{"__typename":"Label","discogsId":322430,"siteUrl":"/label/322430-Metallica-45-RPM-Series","name":"Metallica 45 RPM Series"},"displayName":"Metallica 45 RPM Series"}],"isOffensive":false,"dataQuality":"CORRECT","visibility":"PUBLIC","masterRelease":{"__ref":"MasterRelease:{\"discogsId\":6571}"},"barcodes":[{"__typename":"Barcode","type":"BARCODE","description":null,"value":"6 00753 13544 0"},{"__typename":"Barcode","type":"LABEL_CODE","description":null,"value":"LC 
01633"},{"__typename":"Barcode","type":"RIGHTS_SOCIETY","description":null,"value":"BEIM/SABAM"}],"contributors":[{"__typename":"ReleaseContributor","isOriginalSubmitter":true,"user":{"__typename":"User","username":"doomtrax"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"jh59095"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"GoVinylGo"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"LLUZNIAK"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"japc"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"ashmataz"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"syke"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"kbell75"}},{"__typename":"ReleaseContributor","isOriginalSubmitter":false,"user":{"__typename":"User","username":"vpaluzga"}}],"releaseCredits":[{"__typename":"ReleaseCredit","displayName":"James Hetfield","artist":{"__typename":"Artist","discogsId":251874,"siteUrl":"/artist/251874-James-Hetfield","name":"James Hetfield"},"nameVariation":"Hetfield","creditRole":"Arranged By","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Lars Ulrich","artist":{"__typename":"Artist","discogsId":251550,"siteUrl":"/artist/251550-Lars-Ulrich","name":"Lars Ulrich"},"nameVariation":"Ulrich","creditRole":"Arranged By","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Jason Newsted","artist":{"__typename":"Artist","discogsId":390503,"siteUrl":"/artist/390503-Jason-Newsted","name":"Jason Newsted"},"nameVariation":null,"creditRole":"Bass","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"James 
Hetfield","artist":{"__typename":"Artist","discogsId":251874,"siteUrl":"/artist/251874-James-Hetfield","name":"James Hetfield"},"nameVariation":"Hetfield","creditRole":"Design Concept [Cover Concept]","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Lars Ulrich","artist":{"__typename":"Artist","discogsId":251550,"siteUrl":"/artist/251550-Lars-Ulrich","name":"Lars Ulrich"},"nameVariation":"Ulrich","creditRole":"Design Concept [Cover Concept]","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Reiner Design Consultants, Inc.","artist":{"__typename":"Artist","discogsId":1829130,"siteUrl":"/artist/1829130-Reiner-Design-Consultants-Inc","name":"Reiner Design Consultants, Inc."},"nameVariation":null,"creditRole":"Design, Layout","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Lars Ulrich","artist":{"__typename":"Artist","discogsId":251550,"siteUrl":"/artist/251550-Lars-Ulrich","name":"Lars Ulrich"},"nameVariation":null,"creditRole":"Drums","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Flemming Rasmussen","artist":{"__typename":"Artist","discogsId":202516,"siteUrl":"/artist/202516-Flemming-Rasmussen","name":"Flemming Rasmussen"},"nameVariation":null,"creditRole":"Engineer","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Toby Wright","artist":{"__typename":"Artist","discogsId":248142,"siteUrl":"/artist/248142-Toby-Wright","name":"Toby Wright"},"nameVariation":"Toby \"Rage\" Wright","creditRole":"Engineer [Assistant And Additional Engineering]","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"George Cowan","artist":{"__typename":"Artist","discogsId":469857,"siteUrl":"/artist/469857-George-Cowan","name":"George Cowan"},"nameVariation":null,"creditRole":"Engineer [Assistant Mixing Engineer]","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"James 
Hetfield","artist":{"__typename":"Artist","discogsId":251874,"siteUrl":"/artist/251874-James-Hetfield","name":"James Hetfield"},"nameVariation":null,"creditRole":"Guitar [Harmony, Melody], Vocals, Acoustic Guitar, Rhythm Guitar","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Pushead","artist":{"__typename":"Artist","discogsId":270207,"siteUrl":"/artist/270207-Pushead","name":"Pushead"},"nameVariation":null,"creditRole":"Illustration [Hammer Illustration]","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Stephen Gorman","artist":{"__typename":"Artist","discogsId":2015686,"siteUrl":"/artist/2015686-Stephen-Gorman","name":"Stephen Gorman"},"nameVariation":null,"creditRole":"Illustration, Cover","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Kirk Hammett","artist":{"__typename":"Artist","discogsId":18836,"siteUrl":"/artist/18836-Kirk-Hammett","name":"Kirk Hammett"},"nameVariation":null,"creditRole":"Lead Guitar","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"James Hetfield","artist":{"__typename":"Artist","discogsId":251874,"siteUrl":"/artist/251874-James-Hetfield","name":"James Hetfield"},"nameVariation":"Hetfield","creditRole":"Lyrics By","applicableTracks":"A to F, H"},{"__typename":"ReleaseCredit","displayName":"Bob Ludwig","artist":{"__typename":"Artist","discogsId":271098,"siteUrl":"/artist/271098-Bob-Ludwig","name":"Bob Ludwig"},"nameVariation":null,"creditRole":"Mastered By","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Steve Thompson & Michael Barbiero","artist":{"__typename":"Artist","discogsId":142166,"siteUrl":"/artist/142166-Steve-Thompson-Michael-Barbiero","name":"Steve Thompson & Michael Barbiero"},"nameVariation":"Steve Thompson And Michael Barbiero","creditRole":"Mixed By","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Ross 
Halfin","artist":{"__typename":"Artist","discogsId":1456861,"siteUrl":"/artist/1456861-Ross-Halfin","name":"Ross Halfin"},"nameVariation":"Ross \"Tobacco Road\" Halfin","creditRole":"Photography By","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Metallica","artist":{"__typename":"Artist","discogsId":18839,"siteUrl":"/artist/18839-Metallica","name":"Metallica"},"nameVariation":null,"creditRole":"Producer","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Flemming Rasmussen","artist":{"__typename":"Artist","discogsId":202516,"siteUrl":"/artist/202516-Flemming-Rasmussen","name":"Flemming Rasmussen"},"nameVariation":null,"creditRole":"Producer [Produced With]","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"James Hetfield","artist":{"__typename":"Artist","discogsId":251874,"siteUrl":"/artist/251874-James-Hetfield","name":"James Hetfield"},"nameVariation":"Hetfield","creditRole":"Written-By","applicableTracks":null},{"__typename":"ReleaseCredit","displayName":"Kirk Hammett","artist":{"__typename":"Artist","discogsId":18836,"siteUrl":"/artist/18836-…</script>
As far as I can tell, section 1 provides basic data, while section 2 seems to provide raw data from their database...

I tried section 1, with the following command:

Code:
curl -s https://www.discogs.com/release/2193760-Metallica-And-Justice-For-All | pup -p 'section#release-stats#li#h4 span text{}'
which returns

Code:
4.68
 / 5
CA$114.20
CA$156.69
CA$223.42
Question 1: Would you use section 1 or 2, and does it matter?
Question 2: The output of the above command gives all the values inside "span" tags. I tried to return only the value preceded by an "h4" tag containing the exact word "Median", but it didn't work.
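As an aside, question 2 can be brute-forced with plain grep instead of a pup selector: print a couple of lines after the `<h4>` containing "Median", then strip the `<span>` tags. A sketch assuming GNU grep, run against a trimmed copy of the Section 1 markup saved to a file:

```shell
# Trimmed copy of the Section 1 markup above
cat > stats.html <<'EOF'
<li>
   <h4>
      Median<!-- -->:
   </h4>
   <span>CA$156.69</span>
</li>
EOF

# -A2 carries the match through </h4> to the <span> line;
# \K keeps only the value inside the <span> tags
grep -A2 'Median<!-- -->' stats.html | grep -oP '<span>\K[^<]+'
# → CA$156.69
```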

I also tried section 2, with
Code:
curl -s https://www.discogs.com/release/2193760-Metallica-And-Justice-For-All | pup -p 'script#dsdata'
which returns the entire block of the script section with ID "dsdata", but retrieving the median price from that blob is a challenge... Is it possible?
 
Old 03-14-2022, 02:18 AM   #11
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053Reputation: 6053
It says it's JSON! So pipe it into jq again, after removing the script tags.
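That jq step can be sketched against a trimmed (and rounded) sample of the dsdata payload. The tricky part is the key names, which themselves contain quotes and braces and therefore have to be quoted and escaped inside the jq filter:

```shell
# Trimmed, rounded sample of the dsdata JSON (the real payload is much larger)
json='{"data":{"Release:{\"discogsId\":2193760}":{"statistics":{"median":{"converted({\"toCurrency\":\"CAD\"})":{"amount":156.69}}}}}}'

# Keys containing quotes/braces must be written as quoted strings in the filter
printf '%s\n' "$json" |
  jq '.data."Release:{\"discogsId\":2193760}".statistics.median."converted({\"toCurrency\":\"CAD\"})".amount'
# → 156.69
```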

That said, discogs apparently has an API you can use: https://www.discogs.com/developers
 
Old 03-14-2022, 07:49 PM   #12
lpallard
Senior Member
 
Registered: Nov 2008
Posts: 1,045

Original Poster
Rep: Reputation: Disabled
It's working!

Thanks @ondoho for pointing out that jq could be used, given the JSON nature of the scraped page...

Code:
curl -s "$url/$1" | pup -p 'script#dsdata text{}' | jq '.data."Release:{\"discogsId\":2193760}".statistics.median."converted({\"toCurrency\":\"CAD\"})".amount'
Now, if I need to extract multiple values from a single page, I'd have to curl that page multiple times, because the piped commands "drill down" into the JSON contents and therefore don't allow climbing back up to perform another query... Is there a better way to perform multiple queries on a single page without "bombing" the remote server with multiple downloads, or is there no issue doing so?

Last edited by lpallard; 03-14-2022 at 07:50 PM.
 
1 members found this post helpful.
Old 03-15-2022, 08:34 AM   #13
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,628

Rep: Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557Reputation: 2557
Quote:
Originally Posted by lpallard View Post
Is there a better way to perform multiple queries on a single page without "bombing" the remote server with multiple downloads or is there no issue doing so?
Store the results in a file, then perform the queries on that file; roughly:
Code:
curl ... | pup ... > filename.json
jq ... filename.json
jq ... filename.json
The > writes the command's stdout to the filename specified. In some instances you would need to use "command ... < filename.json" to subsequently read from stdin, but in this instance jq accepts the filename as an argument, so that's not necessary.

When working on the command line, it's very useful to understand piping and redirecting.
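Also worth noting: a single jq invocation can pull several values in one pass, so even one parse of the cached file covers multiple queries. A sketch with a hypothetical trimmed quote file, using the two fields from post #6:

```shell
# Hypothetical cached file holding the fields extracted earlier
printf '%s\n' '{"lastPrice":"26.6252","tradeTime":"02/18/22"}' > quote.json

# One jq call, several outputs, one per line
jq -r '.lastPrice, .tradeTime' quote.json
# → 26.6252
# → 02/18/22
```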

 
1 members found this post helpful.
  

