I ran a test. I took an XML dump of a Drupal website database, converted it to PYX format, then back to XML, and repeated this 499 times. It did NOT converge. In fact, the output collapsed to about 1/3 the size of the original. The conversion appears to be lossy. Program bug?
Actually, I don't really know all that much about it. I just recently discovered the option and saw some potential usefulness in it. It doesn't appear to be particularly designed for round-tripping, though; it's there more to avoid having to struggle with the structure of XML when parsing content.
Following the link at the bottom of the xmlstarlet documentation, the more detailed description here points out that it certainly isn't completely lossless:
Quote:
You should notice that the transformation loses the DOCTYPE declaration and the comment in the original XML document. For many purposes, this is not important (parsers often discard this information as well). The PYX format, in contrast to the XML format, allows one to easily pose a variety of ad hoc questions about a document. For example: What are all the attribute values in the sample document?
It all comes down to what your ultimate purpose is, I guess. If you need to round-trip it with fidelity, it's probably not the format for you.
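For what it's worth, the attribute question from that quote comes down to filtering lines by their first character. Here is a minimal Python sketch, assuming the usual PYX convention that attribute lines start with `A` followed by the name, a space, and the value. The function name and the sample data are made up for illustration.

```python
def attribute_values(pyx_text):
    """Return (name, value) pairs for every attribute line in a PYX stream."""
    pairs = []
    for line in pyx_text.split("\n"):
        # Assumption: attribute lines look like "Aname value".
        if line.startswith("A"):
            name, _, value = line[1:].partition(" ")
            pairs.append((name, value))
    return pairs


# Hypothetical PYX input, not taken from any real dump.
SAMPLE_PYX = (
    "(site\n"
    "Aname demo\n"
    "(node\n"
    "Aid 1\n"
    "Atitle first post\n"
    "-hello\n"
    ")node\n"
    ")site\n"
)

if __name__ == "__main__":
    for name, value in attribute_values(SAMPLE_PYX):
        print(name, "=", value)
```

Doing the same against raw XML would need a real parser, which is the convenience the quoted description is getting at.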
Quote:
Originally Posted by David the H.
Actually, I don't really know all that much about it. I just recently discovered the option and saw some potential usefulness in it. It doesn't appear to be particularly designed for round-tripping, though; it's there more to avoid having to struggle with the structure of XML when parsing content.
Agreed. It looks very simple.
IMHO, that it can even be done shows that XML is a design that should never have been refactored for raw data. It was, and is, a format suitable for documents. Calling a database table a document, however, is just wrong. It's as wrong as calling a document a table.
Quote:
Originally Posted by David the H.
Following the link at the bottom of the xmlstarlet documentation, the more detailed description here points out that it certainly isn't completely lossless:
Quote:
You should notice that the transformation loses the DOCTYPE declaration and the comment in the original XML document. For many purposes, this is not important (parsers often discard this information as well). The PYX format, in contrast to the XML format, allows one to easily pose a variety of ad hoc questions about a document. For example: What are all the attribute values in the sample document?
That would lead me to believe that going back from PYX to XML would create a lesser XML, because the lost data is simply not there. But this should be a specific, one-time loss: XML->PYX->XML->PYX should yield no less than XML->PYX alone. Instead, MORE is lost the 2nd time. Still MORE is lost the 3rd time. More was lost the 499th time. Also, the slope was not even: at one point it lost about 50% in a single pass. That hints at something quite defective. The concept looks fine. The specs might have an issue. But I suspect the implementation might have a bug.
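To make that expectation concrete, here is a minimal Python sketch of the fixed-point check. It is not xmlstarlet's code: `xml_to_pyx`, `pyx_to_xml`, and `roundtrip` are names made up for illustration, the converter handles only elements, attributes, and text, and the backslash escaping is my assumption about PYX. The point is only that a sane converter drops the comment and the XML declaration once, and a second pass then changes nothing.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


def _pyx_escape(text):
    # Assumption: backslash, newline and tab are backslash-escaped.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def _pyx_unescape(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append({"n": "\n", "t": "\t", "\\": "\\"}.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


class _PyxHandler(xml.sax.ContentHandler):
    """Collects SAX events as PYX lines; comments never reach this handler."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def startElement(self, name, attrs):
        self.lines.append("(" + name)
        for key in attrs.getNames():
            self.lines.append("A" + key + " " + _pyx_escape(attrs.getValue(key)))

    def endElement(self, name):
        self.lines.append(")" + name)

    def characters(self, content):
        self.lines.append("-" + _pyx_escape(content))


def xml_to_pyx(xml_text):
    handler = _PyxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "\n".join(handler.lines) + "\n"


def pyx_to_xml(pyx_text):
    out = []
    pending = None  # start tag not yet written: (name, [(key, value), ...])

    def flush():
        nonlocal pending
        if pending is not None:
            name, attrs = pending
            attr_text = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in attrs)
            out.append("<" + name + attr_text + ">")
            pending = None

    for line in pyx_text.split("\n"):
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "(":
            flush()
            pending = (rest, [])
        elif kind == "A":
            key, _, value = rest.partition(" ")
            pending[1].append((key, _pyx_unescape(value)))
        elif kind == ")":
            flush()
            out.append("</" + rest + ">")
        elif kind == "-":
            flush()
            out.append(escape(_pyx_unescape(rest)))
    return "".join(out)


def roundtrip(xml_text):
    return pyx_to_xml(xml_to_pyx(xml_text))


# Hypothetical input, not the Drupal dump from the test above.
SAMPLE = """<?xml version="1.0"?>
<!-- exported by a hypothetical dump tool -->
<site name="demo">
  <node id="1" title="a &lt; b">first &amp; second
line</node>
  <node id="2"/>
</site>"""

if __name__ == "__main__":
    once = roundtrip(SAMPLE)
    twice = roundtrip(once)
    print("first pass changed the input:", once != SAMPLE)
    print("second pass is a no-op:", twice == once)
```

If a tool keeps shrinking its output on the 2nd, 3rd, and later passes, then it fails this check, and the shrinkage is not explained by the documented loss of the DOCTYPE and comments.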
Quote:
Originally Posted by David the H.
It all comes down to what your ultimate purpose is, I guess. If you need to round-trip it with fidelity, it's probably not the format for you.
Agreed.
Maybe I need to just design my own format, somewhat like PYX, but focusing on database/table/row/column/value encoding rather than trying to convert XML. If it weren't for the fact that mysqldump is itself very complex, I might try to add an output format to it, or extract the code pieces that "recurse" through all the databases, tables, rows, and columns, and make a tool out of that. The issue I see is figuring out the right way to encode the various database column types. Numbers and strings are obvious. I'd have to consult how they do that in SQL and hope there is some commonality I can use for all database types.
Such a format might look like:
Code:
Bdatabasename
Ttablename
R
scolumn1string
scolumn2string
scolumn3string
ncolumn4number
ncolumn5number
ncolumn6number
R
scolumn1string
scolumn2string
scolumn3string
ncolumn4number
ncolumn5number
ncolumn6number
E
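As a rough idea of how little code such a format would need, here is a minimal Python sketch of a writer and reader for the sample above. Several things are my assumptions, not settled parts of the format: `E` is taken to end the whole dump, strings escape backslash and newline the way PYX does, and only strings and plain numbers are handled (no NULL, dates, or blobs). The names `encode_dump` and `decode_dump` are made up for illustration.

```python
def _escape(text):
    # Assumption: backslash and newline are backslash-escaped, as in PYX.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def _unescape(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append("\n" if text[i + 1] == "n" else text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def encode_dump(database, tables):
    """tables maps a table name to a list of rows; a row is a list of str/int/float."""
    lines = ["B" + database]
    for table, rows in tables.items():
        lines.append("T" + table)
        for row in rows:
            lines.append("R")
            for value in row:
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    lines.append("n" + repr(value))
                else:
                    lines.append("s" + _escape(str(value)))
    lines.append("E")  # assumption: E closes the whole dump
    return "\n".join(lines) + "\n"


def decode_dump(text):
    database, tables, rows, row = None, {}, None, None
    for line in text.split("\n"):
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "B":
            database = rest
        elif kind == "T":
            rows = tables.setdefault(rest, [])
        elif kind == "R":
            row = []
            rows.append(row)
        elif kind == "s":
            row.append(_unescape(rest))
        elif kind == "n":
            try:
                row.append(int(rest))
            except ValueError:
                row.append(float(rest))
        elif kind == "E":
            break
    return database, tables


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    data = {"users": [["alice", "line1\nline2", 42, 2.5]]}
    text = encode_dump("shop", data)
    print(text, end="")
    print(decode_dump(text) == ("shop", data))
```

Because every value sits on its own line behind a one-character type tag, a round trip either reproduces the data exactly or fails on a specific line, which is the property the XML/PYX loop above lacked.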
Last edited by Skaperen; 06-16-2012 at 06:35 PM.
Reason: Added the "E" line to the sample