[SOLVED] Simple REGEX to return price and date from web page for kmymoney online price updater
Hello,
I use KMyMoney to manage my finances, and its online price updater relies on regexes to retrieve investment prices and dates. Unfortunately, most (if not all) of the online sources have proven extremely unreliable, and I am fed up with the constant errors, so I have decided to retrieve prices directly from my investment firms' websites with my own regexes.
Unfortunately, I have never worked with regexes before. KMyMoney seems to "download" a copy of the web page's source code, then use a regex to return the price and date. In my case, the relevant section of the page source looks like this (heavily truncated):
Quote:
Kmymoney seems to "download" a copy of the web page source code, then using regex, return price/date.
I doubt this is what KMyMoney actually does. It fetches stock quotes from finance.yahoo.com, and the latter provides the data as CSV.
Check whether your investment firm provides the data as CSV as well, or whether the site has a JSON API, or similar. Otherwise, use an HTML parser, e.g. HTML-XML-utils.
@shruggy: what you're proposing is very interesting.
However:
1. Most of these investment firms (I am doing business with 6 of them) have NO API of any sort, nor do they offer what I call "raw data" (like a simple web page with just the numbers). They do provide a page with detailed ticker data (the investment's characteristics, price, fluctuation, and so on) and some pages with tables, but each page alone is several MBs because it is full of scripting and useless stuff...
2. Some offer downloadable data (like an Excel file, no CSV), but their spreadsheets are organized by the name of the investment, not the ticker. Moreover, the investments are mixed US/CAD, etc. A real mess.
If I were a conspiracy theorist, I'd say they are making it cumbersome and difficult to automate data retrieval from their sites...
I also looked at the web pages of some investments from the SAME firm, and the URLs are structured differently! If the URLs were identical except for a unique number or ticker, that would make things easier...
I am thinking of writing a script that would be called once per day via cron, make a local copy of all the pages for all the investments, extract the data with regexes or other utilities (such as those you suggested), then dump it into a small HTML file that KMyMoney could connect to and grab the numbers from.
However, even if that works, how long will it keep working? Only until these firms change something on their sites, which happens constantly.
Quote:
2. Some offer downloadable data (like an excel file, no CSV)
Of course, there are ways to convert from Excel to CSV. Gnumeric comes with ssconvert and ssgrep. catdoc has xls2csv for the old Excel format. For the new format (.xlsx), there are XLSX I/O and xlsx2csv. LibreOffice is also an option (it can be started with --headless on the command line, or use a wrapper script like unoconv).
But given all the uncertainties, I would just download quotes from finance.yahoo.com as everybody else does.
I tried to work with the Excel files, but believe it or not, 2 of my investments are NOT listed in them. I've sent an email to the investment firm to ask, but I agree with you: it's already complicated enough, so I'd rather work with web pages the way KMM was intended to work.
I was using Yahoo a while ago, but they made some changes and some investments disappeared. The Globe and Mail, on the other hand, seems to carry all of them. I used to work with that source, but they changed the page format (source code) and the regex that came with KMM stopped working, which is why I switched to Yahoo...
Now I just need a way to get the prices and dates from the Globe and Mail, and everything will work as before.
I need to extract 26.6252 and 02/18/22 from that blob...
Do you have an idea how? I must have tried 100 ways to get the data, each time getting it wrong: returning everything after the number, or everything before it, or only the integer part (19 instead of 19.0026), or nothing at all... If you are good with regexes, could you point me in the right direction?
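Since the actual page excerpt was truncated above, here is a hedged sketch of the kind of extraction being asked about. The sample `page` string is an invented stand-in (the real Globe and Mail markup differs); the two patterns simply grab the first decimal number and the first MM/DD/YY date from the saved source:

```shell
# Hypothetical page snippet; the real markup around the price/date differs.
page='<span class="price">26.6252</span> ... <span class="date">02/18/22</span>'

# -o prints only the matching part of each line; -E enables extended regexes.
price=$(printf '%s' "$page" | grep -oE '[0-9]+\.[0-9]+' | head -n1)
quote_date=$(printf '%s' "$page" | grep -oE '[0-9]{2}/[0-9]{2}/[0-9]{2}' | head -n1)

echo "$price $quote_date"   # 26.6252 02/18/22
```

Inside KMyMoney's own quote-source settings, the equivalent patterns would need a capture group (something like `([0-9]+\.[0-9]+)` for the price), since KMM extracts the first captured group rather than the whole match.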
Thanks, guys, for saving my mental health... That coder who lost his mind had me laughing!!!
For the resolution of this topic: what I ended up doing is writing a very simple bash script, using pup and jq as the main tools, to extract the prices and dates, then passing the output to KMyMoney with the precious help of this project's main developer.
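For readers wondering what the pup-plus-jq combination looks like: pup's `json{}` display function turns the matched elements into a JSON array, and jq then pulls out the `text` field. The JSON below is inlined by hand (an assumed shape, so the jq step can run without pup or a network fetch); on the live page it would come from something like `curl -s "$url" | pup 'span.price json{}'`:

```shell
# Assumed output of: pup 'span.price json{}' — inlined so this runs stand-alone.
json='[{"class":"price","tag":"span","text":"26.6252"}]'

# -r prints the raw string instead of a JSON-quoted value.
price=$(printf '%s' "$json" | jq -r '.[0].text')
echo "$price"   # 26.6252
```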
I'm trying to do something simple (?) with pup: automatically retrieve a specific value from Discogs. The median price is what I'm after. They have an API, which would be the best way to do this, but unfortunately (AFAIK) the stats (lowest, median and highest prices, last sold, etc.) are not available through the API...
Question 1: Would you use section 1 or 2, and does it matter?
Question 2: The output of the above command returns all of the values inside "span" tags. I tried to return only the value preceded by an "h4" tag containing the exact word "Median", but it didn't work.
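One workaround that sidesteps sibling selectors entirely: flatten the stats block to plain text first (e.g. with pup's `text{}`), then let awk pick the value on the line after the exact "Median:" label. The sample text and the selector in the comment are assumptions about the Discogs markup, not its actual structure:

```shell
# Stand-in for the output of something like: curl -s "$url" | pup '.statistics text{}'
stats='Lowest:
$10.00
Median:
$14.50
Highest:
$22.00'

# Match the exact label line, read the next line, print it, and stop.
median=$(printf '%s\n' "$stats" | awk '$0 == "Median:" { getline; print; exit }')
echo "$median"
```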
Now, if I need to extract multiple values from a single page, I'd have to curl that page multiple times, because the piped commands "drill down" into the JSON contents and therefore don't allow me to "climb" back up and perform another query... Is there a better way to perform multiple queries on a single page without "bombing" the remote server with multiple downloads, or is there no issue doing so?
Quote:
Is there a better way to perform multiple queries on a single page without "bombing" the remote server with multiple downloads or is there no issue doing so?
Store the result in a file, then perform the queries on that file; roughly:
The > writes the command's stdout to the filename specified. In some instances you would need `command ... < filename.json` to read from stdin instead, but jq accepts a filename as an argument, so that's not necessary here.
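The pattern above can be sketched end to end. The JSON body and its field names are stand-ins for the real page; the one `printf` simulates the single download:

```shell
# One download (here simulated with printf instead of: curl -s "$url" > "$tmp"),
# then as many jq queries against the saved file as needed.
tmp=$(mktemp)
printf '%s' '{"stats":{"median":"14.50","lowest":"10.00"}}' > "$tmp"

median=$(jq -r '.stats.median' "$tmp")
lowest=$(jq -r '.stats.lowest' "$tmp")
rm -f "$tmp"

echo "$median $lowest"   # 14.50 10.00
```

The remote server sees exactly one request, no matter how many values you pull out afterwards.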
When working on the command line, it's very useful to understand piping and redirecting.