LinuxQuestions.org
03-03-2018, 03:29 AM   #1
Michael Uplawski
Senior Member
 
Registered: Dec 2015
Posts: 1,624
Blog Entries: 40

Rep: Reputation: Disabled
Office XML-file hacking anybody?


Good morning.

As a user of SoftMaker® office (closed-source, commercial) I am quite fond of their support for some of the XML-file formats in use nowadays.

I want to automate modifications to these files with shell or Ruby scripts, but so far I have not got beyond removing unwanted content from the unzipped files, e.g. removing a SoftMaker-specific tag from the definition of “embedded” graphic files, or removing all links to additional workbooks in an XLSX file.
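
The unzip, edit, rezip cycle described here can be sketched in a few lines. This is a minimal illustration in Python (standard library only, not the shell or Ruby scripts mentioned in the post). The function name `strip_elements` is a hypothetical helper, and the caller has to supply the real tag name in `{namespace}localname` form; the SoftMaker-specific markup itself is not known from this thread.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def strip_elements(package: bytes, part_name: str, tag: str) -> bytes:
    """Return a copy of a zipped package with every `tag` element
    removed from the XML part called `part_name`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part_name:
                root = ET.fromstring(data)
                # ElementTree keeps no parent pointers, so walk every
                # element and drop its matching direct children.
                for parent in list(root.iter()):
                    for child in list(parent):
                        if child.tag == tag:
                            parent.remove(child)
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            # All other parts are copied unchanged.
            dst.writestr(info.filename, data)
    return out.getvalue()
```

A call might look like `strip_elements(open("in.docx", "rb").read(), "word/document.xml", "{urn:example:vendor}extra")`, where the namespace and tag are placeholders. One caveat: ElementTree writes its own namespace prefixes (ns0, ns1, ...) unless they are registered beforehand, which can matter for files that refer to prefixes by name, so a parser that preserves prefixes (nokogiri, for instance) may be the safer choice for real documents.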

In the future, I want to reduce every embedded image in a document (docx or tmdx) to the size at which it is actually displayed, rather than keep a larger image that is merely scaled to fit the document. On the XML level, these images are just linked in, and the graphic files are provided in a sub-folder. I hope to make some documents much smaller this way.
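
To find out which picture is displayed at which size, one can follow the relationship IDs from the document body to the media files. What follows is a hedged sketch in Python (standard library only), assuming the usual WordprocessingML layout of a docx: `word/document.xml`, `word/_rels/document.xml.rels`, and a `wp:extent` element whose `cx`/`cy` are given in EMU (914400 per inch). `displayed_image_sizes` is an illustrative name, and nothing here assumes that a tmdx file is laid out the same way.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "rel": "http://schemas.openxmlformats.org/package/2006/relationships",
}
EMU_PER_INCH = 914400


def displayed_image_sizes(docx: bytes):
    """Return [(media part name, width in inches, height in inches)]
    for the inline and anchored pictures of a docx package."""
    with zipfile.ZipFile(io.BytesIO(docx)) as z:
        rels = ET.fromstring(z.read("word/_rels/document.xml.rels"))
        doc = ET.fromstring(z.read("word/document.xml"))
    # Map relationship IDs (rId1, ...) to their targets.
    targets = {rel.get("Id"): rel.get("Target")
               for rel in rels.findall("rel:Relationship", NS)}
    results = []
    for kind in ("inline", "anchor"):
        for drawing in doc.iter(f"{{{NS['wp']}}}{kind}"):
            extent = drawing.find("wp:extent", NS)
            blip = drawing.find(".//a:blip", NS)
            if extent is None or blip is None:
                continue
            # r:embed points at a file inside the package; pictures
            # without it (e.g. externally linked ones) are skipped.
            target = targets.get(blip.get(f"{{{NS['r']}}}embed"))
            if target is None:
                continue
            # Targets are relative to the folder of document.xml.
            part = posixpath.normpath(posixpath.join("word", target))
            results.append((part.lstrip("/"),
                            int(extent.get("cx")) / EMU_PER_INCH,
                            int(extent.get("cy")) / EMU_PER_INCH))
    return results
```

Multiplying the reported inches by a chosen target resolution gives the largest pixel size worth keeping; a picture shown 3 inches wide needs about 450 pixels at 150 dpi. The actual resampling would need an image tool and is not shown here.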

Do you have experience with this kind of manipulation, and do you have examples and best practices to share?

One reason for my endeavor is the fact that SoftMaker do not provide a programming interface in the Linux-version of their office-package. Under Windows, there is a Basic-dialect and the usual OLE/COM-interface.

In the end, I deem hacking the XML code much better, also because there are so many ways to achieve the same result.

Last edited by Michael Uplawski; 03-03-2018 at 02:15 PM. Reason: Dumb wording. Embedded against linked graphics.
 
03-04-2018, 02:23 AM   #2
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
XML manipulating software exists.
xmlstarlet might be a good fit.
i use xmllint, but i think it's read-only, and fairly low-level.
still sounds like a lot of work to script it.
 
03-04-2018, 12:00 PM   #3
Michael Uplawski
Senior Member
 
Registered: Dec 2015
Posts: 1,624

Original Poster
Blog Entries: 40

Rep: Reputation: Disabled
Quote:
Originally Posted by ondoho
XML manipulating software exists.
xmlstarlet might be a good fit.
i use xmllint, but i think it's read-only, and fairly low-level.
still sounds like a lot of work to script it.
Thanks ondoho.

My XML software is
*) nokogiri, either as a “stand-alone” command-line tool or as a module in my Ruby programs
*) Apache FOP, as an XSL-FO processor, stand-alone or in Java
*) Firefox, to open and view Office XML files.

What I find difficult is the documentation of the OOXML standard, and how to apply that information to my own task at hand. I guess I would need simple examples to get on with my coding work... For the time being, I do not need to create new office documents, and a downright conversion between file formats is not needed either.

It is quite clear that everybody has different needs and if one “must” manipulate ODS-, Docx- or XLSX-files, it will be for a specific task. However, as my coding skills are diminishing I hope to get inspiration from the work and words of others...

What I seek in the end is a way to automate these manipulations, as you would with the OLE/COM interface and e.g. Visual Basic, or any other language that provides OLE functionality. I am not talking about occasional interventions to modify one specific file. That would be too simple, and I understand your remark in this way, too.

Cheerio.
 
03-05-2018, 07:30 AM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,706
Blog Entries: 4

Rep: Reputation: 3949
It's actually an easy thing to do. The technology that you want is called XSLT.

Like all such XML formats, the OpenOffice document formats are standardized and described by a so-called "schema." They can be validated, using appropriate tools, to prove that any document conforms to its schema.

Then, XSLT allows you to write transformation rules – built in XML – to convert one XML structure to another. You can write rules, without writing a single computer program, to "transform away" whatever you don't want, or to turn it into something else. You then re-validate the resulting file to be sure that it still conforms to the schema: this is an automated way to be sure that OO will probably still accept it, even if it doesn't do the right thing with it. (Yes, XSLT transforms can have "bugs" in them.)

You don't have to write a single line of "custom programming" to do any of these things.

(DocBook, which is "where all those technical books with animals on the cover" actually came from, is an astonishing demonstration of what XSLT can do.)

Last edited by sundialsvcs; 03-05-2018 at 07:34 AM.
 
03-05-2018, 12:05 PM   #5
Michael Uplawski
Senior Member
 
Registered: Dec 2015
Posts: 1,624

Original Poster
Blog Entries: 40

Rep: Reputation: Disabled
Quote:
Originally Posted by sundialsvcs
It's actually an easy thing to do. The technology that you want is called XSLT.
XSLT is one technology to do it. I do it. I did it.
But I express myself badly, miserably, it appears. You dwell on the technology. I know the technology.

My problem is with the application and the so-called “standards”. As I seek examples, I may just as well describe one myself: someone wants to modify all inline images in a word-processor file format, e.g. to replace the current images with new ones.
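
For this image-replacement example, one simple approach (a sketch, not necessarily best practice) is to leave the part names and relationships untouched and swap only the bytes of the media files inside the package. The Python below uses the standard library only; `replace_media` is a hypothetical helper name, and the part names in the usage note are assumptions about a typical docx.

```python
import io
import zipfile


def replace_media(package: bytes, replacements: dict) -> bytes:
    """Return a copy of a zipped package in which each part named in
    `replacements` (part name -> new bytes) carries the new content."""
    missing = set(replacements)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename in replacements:
                data = replacements[info.filename]
                missing.discard(info.filename)
            else:
                # Everything else, including the XML, is copied as is.
                data = src.read(info.filename)
            dst.writestr(info.filename, data)
    if missing:
        # Refuse to silently ignore a part name that does not exist.
        raise KeyError(f"parts not found in package: {sorted(missing)}")
    return out.getvalue()
```

Usage would be along the lines of `replace_media(old_docx, {"word/media/image1.png": new_png_bytes})`. Because the XML is not touched, the new picture should keep the file format of the old one, and it is shown at the size the document already declares for that picture.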

I find terrible tag hierarchies where I had expected just one or two tags. Reading up on the meaning of these tags, I get lost and shut down my computer.

In the past, I have created many different reports from different data sources using XSLT, either from a Java program or by employing the Apache FOP XSL-FO processor. Some of my most recent tools use nokogiri to convert XML to PDF or to analyze (X)HTML for useful content. All of this is child's play compared with finding your way through the labyrinthine OOXML code.

In my own opinion, of course.

But folks, I tend to let it rest for now. I will be playing around with what I have and maybe publish some of my findings (as usual) in my blog, if I deem them worth it.

Thanks anyway for your responses.
 
03-06-2018, 12:13 AM   #6
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
if libreoffice uses it, it must be possible to get at the specs.
maybe: https://duckduckgo.com/html?q=ooxml%20specifications

another search query: https://duckduckgo.com/html?q=linux%...20office%20xml
it seems you will have to read a few microsoft documents even if you want to do this on linux :-(
 
  


Tags
docx, macro, ooxml, softmaker office, xml


