As a user of SoftMaker® Office (closed-source, commercial), I am quite fond of its support for some of the XML file formats in use nowadays.
I want to automate modifications to these files with shell or Ruby scripts, but so far I have not advanced beyond removing unwanted content from the unzipped files, e.g. removing a SoftMaker-specific tag from the definition of “embedded” graphic files, or removing all links to additional workbooks in an XLSX file.
In the future, I want to limit all embedded images in a document (docx or tmdx) to a size that corresponds to what is displayed, rather than scale the images to fit in the document. On the XML-level, these images are just linked in and the graphic-files are provided in a sub-folder. I hope to render some documents much smaller, this way.
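For the DOCX case, the "size that corresponds to what is displayed" comes down to plain arithmetic. The sketch below assumes the displayed extent is stored in English Metric Units (914400 per inch), as DrawingML does. The 150 dpi target and the names `target_pixels` and `needs_downscaling` are illustrative assumptions, not something SoftMaker or the standard prescribes, and nothing here is known to carry over to tmdx.

```python
EMU_PER_INCH = 914400  # English Metric Units per inch (DrawingML unit)


def target_pixels(cx_emu, cy_emu, dpi=150):
    """Pixel size needed to show an image of the given displayed extent at `dpi`.

    Rounds up, so the image is never smaller than what is displayed.
    """
    width = (cx_emu * dpi + EMU_PER_INCH - 1) // EMU_PER_INCH
    height = (cy_emu * dpi + EMU_PER_INCH - 1) // EMU_PER_INCH
    return width, height


def needs_downscaling(actual_pixels, cx_emu, cy_emu, dpi=150):
    """True if the stored image has more pixels than the displayed size calls for."""
    wanted_w, wanted_h = target_pixels(cx_emu, cy_emu, dpi)
    return actual_pixels[0] > wanted_w or actual_pixels[1] > wanted_h
```

An image displayed one inch wide and half an inch high would then need at most 150 x 75 pixels at the assumed resolution. The resampling itself needs an image library or an external tool and is not shown.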
Do you have experience with this kind of manipulation, and do you have examples and best practices to share?
One reason for my endeavor is that SoftMaker does not provide a programming interface in the Linux version of its office package. Under Windows, there is a Basic dialect and the usual OLE/COM interface.
In the end, I deem hacking the XML code much better anyway, also because there are so many ways to achieve the same thing.
Last edited by Michael Uplawski; 03-03-2018 at 02:15 PM.
Reason: Dumb wording. Embedded against linked graphics.
XML manipulating software exists. xmlstarlet might be a good fit.
i use xmllint, but i think it's read-only, and fairly low-level.
still sounds like a lot of work to script it.
Thanks ondoho.
My XML software is:
*) nokogiri, either as a “stand-alone” command-line tool or as a module in my Ruby programs
*) Apache FOP, as an XSL-FO processor, stand-alone or from Java
*) Firefox, to open and view Office XML files.
What I find difficult is the documentation of the OOXML standard and how to apply it to my own task at hand. I guess I would need simple examples to get on with my coding work... For the time being, I do not need to create new office documents, and a downright conversion between file formats is not needed either.
It is quite clear that everybody has different needs, and if one “must” manipulate ODS, DOCX or XLSX files, it will be for a specific task. However, as my coding skills are diminishing, I hope to get inspiration from the work and words of others...
What I seek, in the end, is a way to automate these manipulations, as you would with the OLE/COM interface and e.g. Visual Basic or any other language that provides OLE functionality. I am not talking about occasional interventions to modify one specific file. That would be too simple, and I understand your remark in this way, too.
It's actually an easy thing to do. The technology that you want is called XSLT.
Like all such XML formats, the OpenOffice document formats are standardized and described by a so-called "schema." They can be validated, using appropriate tools, to prove that any document conforms to its schema.
Then, XSLT allows you to write transformation rules – built in XML – to convert one XML structure to another. You can write rules, without writing a single computer program, to "transform away" whatever you don't want, or to turn it into something else. You then re-validate the resulting file to be sure that it still conforms to the schema: this is an automated way to be sure that OO will probably still accept it, even if it doesn't do the right thing with it. (Yes, XSLT transforms can have "bugs" in them.)
You don't have to write a single line of "custom programming" to do any of these things.
(DocBook, which is "where all those technical books with animals on the cover" actually came from, is an astonishing demonstration of what XSLT can do.)
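For readers who script rather than write stylesheets, the "transform away" rule has a direct counterpart. The following is not XSLT but a minimal Python sketch of the same idea: copy everything, drop one kind of element. The function name `strip_elements` and the sample tag names are hypothetical. For namespaced elements the tag would be written as `{namespace-uri}localname`, and ElementTree may rename namespace prefixes on output unless they are registered with `ET.register_namespace`.

```python
import xml.etree.ElementTree as ET


def strip_elements(xml_text, tag):
    """Return xml_text with every element named `tag` removed.

    Text that follows a removed element (its "tail") is kept by
    attaching it to the previous sibling or to the parent.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree is not modified while iterating.
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag != tag:
                previous = child
                continue
            if child.tail:
                if previous is None:
                    parent.text = (parent.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Applied to an unzipped part such as a document's main XML file, this removes the unwanted elements and leaves the rest of the tree in place. Whether the office application still accepts the result has to be checked separately, exactly as with an XSLT transform.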
Last edited by sundialsvcs; 03-05-2018 at 07:34 AM.
It's actually an easy thing to do. The technology that you want is called XSLT.
XSLT is one technology to do it. I do it. I did it.
But I express myself badly, miserably, it appears. You dwell on the technology. I know the technology.
My problem is with the application and the so-called “standards”. Since I am asking for examples, I may just as well describe one myself: someone wants to modify all inline images in a word-processor file format, e.g. to replace the current images with new ones.
I find terrible tag hierarchies where I had expected just one or two tags. Reading up on the meaning of these tags, I get lost and shut down my computer.
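As a small illustration of that hierarchy for the DOCX case: an inline picture usually sits in a `wp:inline` element, whose `wp:extent` child holds the displayed size (attributes `cx` and `cy`, in English Metric Units) and whose nested `a:blip` element points, through its `r:embed` attribute, to an entry in `word/_rels/document.xml.rels` that names the image file. The sketch below follows that chain with Python's standard library only. The function name `inline_images` is hypothetical, and the code assumes relationship targets relative to `word/` and skips pictures that are linked rather than embedded.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "rel": "http://schemas.openxmlformats.org/package/2006/relationships",
}


def inline_images(docx):
    """List (media_path, cx_emu, cy_emu) for each embedded inline picture.

    `docx` is a file name or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        document = ET.fromstring(archive.read("word/document.xml"))
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))

    # Relationship Id -> Target, e.g. "rId5" -> "media/image1.png"
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.findall("rel:Relationship", NS)
    }

    embed_attr = "{%s}embed" % NS["r"]
    found = []
    for inline in document.iter("{%s}inline" % NS["wp"]):
        extent = inline.find("wp:extent", NS)
        blip = inline.find(".//a:blip", NS)
        if extent is None or blip is None:
            continue
        target = targets.get(blip.get(embed_attr))
        if target is None:  # linked image or missing relationship
            continue
        found.append(("word/" + target, int(extent.get("cx")), int(extent.get("cy"))))
    return found
```

Called as `inline_images("report.docx")` (the file name is made up), it returns one tuple per picture, which is enough to decide which files under `word/media/` are candidates for replacement or resizing. Replacing an image with one of the same name and type then means rewriting only that archive member.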
In the past, I have created many different reports from different data sources using XSLT, either from a Java program or by employing the Apache FOP XSL-FO processor. Some of my most recent tools use nokogiri to convert XML to PDF or to analyze (X)HTML for useful content. All of this is an easy game compared with finding your way through the labyrinthine OOXML code.
In my own opinion, of course.
But folks, I tend to let it rest for now. I will be playing around with what I have and maybe publish some of my findings (as usual) in my blog, if I deem them worth it.