LinuxQuestions.org
Linux - Software This forum is for Software issues.
06-14-2012, 02:58 PM   #16
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,688

Original Poster
Blog Entries: 31

Rep: Reputation: 176

Quote:
Originally Posted by schneidz
can you use something like tr "\|" "" on mysqlshow's output ?
It would need to be more than that. It would need to remove the tops and bottoms, and all the space added, without removing it from content.
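A minimal sketch of what that stripping could look like, assuming the usual mysqlshow layout (an optional title line, a border, a header row, a second border, the data rows, and a closing border). The function name `strip_box` is made up for illustration, and it only handles single-column listings such as database or table names. Spaces inside a value are kept, but trailing spaces in a value cannot be told apart from padding, so those are lost.

```shell
# strip_box (hypothetical helper): read boxed mysqlshow-style output on stdin
# and print one bare value per line.
strip_box() {
  awk '
    /^[+]/ { borders++; next }   # border line: count it and skip it
    borders < 2 { next }         # title line and header row precede the 2nd border
    /^[|]/ {
      sub(/^[|] /, "")           # drop the left edge and its one padding space
      sub(/ *[|]$/, "")          # drop the right padding and the right edge
      print
    }
  '
}

# Sample input typed by hand here, not captured from a real server.
printf '%s\n' \
  '+--------------------+' \
  '|     Databases      |' \
  '+--------------------+' \
  '| information_schema |' \
  '| my db              |' \
  '+--------------------+' | strip_box
```

For query results rather than mysqlshow listings, the mysql client's batch mode (-B, with -N to drop the header) avoids the box entirely, so no stripping is needed there.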
 
06-14-2012, 07:55 PM   #17
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Quote:
Originally Posted by Skaperen
I ran a test. I took an XML dump of a Drupal website database, and converted it to pyx format, then back to xml ... and repeated this 499 times. It did NOT converge. In fact, it collapsed to about 1/3 the size of the original. It appears to be lossy. Program bug?

Actually, I don't really know all that much about it. I just recently discovered the option and saw some potential usefulness to it. It doesn't appear to be particularly designed for round-tripping though, and is more there for avoiding having to struggle with the structure of xml when doing content parsing.


Following the link at the bottom of the xmlstarlet documentation, the more detailed description here points out that it certainly isn't completely lossless:

Quote:
You should notice that the transformation loses the DOCTYPE declaration and the comment in the original XML document. For many purposes, this is not important (parsers often discard this information as well). The PYX format, in contrast to the XML format, allows one to easily pose a variety of ad hoc questions about a document. For example: What are all the attribute values in the sample document?
It all comes down to what your ultimate purpose is, I guess. If you need to round-trip it with fidelity, it's probably not the format for you.
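To illustrate the kind of ad hoc question the quoted description means: in PYX every attribute sits on its own line starting with "A", followed by the attribute name, a space, and the value. So "what are all the attribute values?" becomes a one-line filter. This is only a sketch; `attr_values` is a made-up name and the sample PYX below is hand-written, not produced from a real document.

```shell
# attr_values (hypothetical helper): print the value of every attribute
# found in PYX input on stdin.
attr_values() {
  awk '/^A/ {
    sub(/^A[^ ]+ /, "")   # remove the leading "A", the attribute name, and one space
    print
  }'
}

# Hand-written PYX sample: "(" opens an element, "A" is an attribute,
# "-" is character data, ")" closes an element.
printf '%s\n' \
  '(doc' \
  'Aid 42' \
  'Atitle Hello World' \
  '-some text' \
  ')doc' | attr_values
```

Against a real file the same filter would sit after the XML-to-PYX conversion step of whatever tool is in use.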
 
06-14-2012, 09:00 PM   #18
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
Quote:
Originally Posted by Skaperen
It would need to be more than that. It would need to remove the tops and bottoms, and all the space added, without removing it from content.
i don't have mysqld running so this is untested (i'm sure this can be done way more efficiently in a single awk or sed):
Code:
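# mysqlshow always prints a box: cut drops the leading "| ", awk skips the
# title/header rows and keeps the first field, grep drops the border lines
for db in `mysqlshow | cut -b 3- | awk 'NR>3 {print $1}' | grep -v ^---`
do
 for tab in `mysqlshow $db | cut -b 3- | awk 'NR>3 {print $1}' | grep -v ^---`
 do
  for col in `mysqlshow $db $tab | cut -b 3- | awk 'NR>3 {print $1}' | grep -v ^---`
  do
   #echo $db-$tab-$col >> columns.lst
   # -N -B: no header row and plain batch output, so there is no box to strip here;
   # the sed expression is quoted and closed so the prefix is actually applied
   mysql -N -B $db -e "select \`$col\` from \`$tab\`" | sed "s/^/$db-$tab-$col-/" >> crazy-dump-format.lst
  done
 done
done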

Last edited by schneidz; 06-14-2012 at 09:20 PM.
 
06-16-2012, 06:15 PM   #19
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,688

Original Poster
Blog Entries: 31

Rep: Reputation: 176
Quote:
Originally Posted by David the H.
Actually, I don't really know all that much about it. I just recently discovered the option and saw some potential usefulness to it. It doesn't appear to be particularly designed for round-tripping though, and is more there for avoiding having to struggle with the structure of xml when doing content parsing.
Agreed. It looks very simple.

IMHO, that it can even be done shows that XML is a design that should never have been repurposed for raw data. It was, and is, a format suitable for documents. Calling a database table a document, however, is just wrong. It's as wrong as calling a document a table.

Quote:
Originally Posted by David the H.
Following the link at the bottom of the xmlstarlet documentation, the more detailed description here points out that it certainly isn't completely lossless:
Quote:
You should notice that the transformation loses the DOCTYPE declaration and the comment in the original XML document. For many purposes, this is not important (parsers often discard this information as well). The PYX format, in contrast to the XML format, allows one to easily pose a variety of ad hoc questions about a document. For example: What are all the attribute values in the sample document?
That would lead me to believe that going back from PYX to XML would produce a lesser XML, because the lost data is not there. But this should be a specific, one-time loss: XML->PYX->XML->PYX should lose no more than XML->PYX alone. Instead, MORE is lost the 2nd time. Still MORE is lost the 3rd time. More was lost the 499th time. Also, the slope was not even: at one point it lost about 50% in one pass. That hints at something very defective. The concept looks fine. The specs might have an issue. But I suspect the implementation has a bug.
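The expectation above (one specific loss, then a fixed point) can be checked mechanically. Here is a rough sketch of such a check, not the test actually run for this thread: `roundtrip_check` and `_bytes` are made-up names, the two converter commands are passed in as strings, and each pass is compared against the previous one with cmp. It is plain POSIX sh, so the variables are not local.

```shell
# _bytes (hypothetical helper): size of a file in bytes, without padding.
_bytes() { wc -c < "$1" | tr -d ' '; }

# roundtrip_check INPUT TO_CMD FROM_CMD [MAX_PASSES]   (hypothetical helper)
# Pipes the data through TO_CMD | FROM_CMD repeatedly and reports each change.
# Returns 0 once a pass leaves the data unchanged, 1 if it never settles.
roundtrip_check() {
  input=$1
  to_cmd=$2
  from_cmd=$3
  max=${4:-10}
  prev=$(mktemp)
  next=$(mktemp)
  cp "$input" "$prev"
  pass=1
  while [ "$pass" -le "$max" ]; do
    sh -c "$to_cmd" < "$prev" | sh -c "$from_cmd" > "$next"
    if cmp -s "$prev" "$next"; then
      echo "stable after pass $pass ($(_bytes "$next") bytes)"
      rm -f "$prev" "$next"
      return 0
    fi
    echo "pass $pass: $(_bytes "$prev") -> $(_bytes "$next") bytes"
    mv "$next" "$prev"
    pass=$((pass + 1))
  done
  rm -f "$prev" "$next"
  return 1
}
```

With xmlstarlet the call would be something like `roundtrip_check dump.xml 'xmlstarlet pyx' 'xmlstarlet p2x' 500`, assuming those are the subcommand names in the installed version. A converter that only drops the DOCTYPE and comments should report a change on pass 1 and be stable on pass 2; anything that keeps shrinking matches the behavior described above.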

Quote:
Originally Posted by David the H.
It all comes down to what your ultimate purpose is, I guess. If you need to round-trip it with fidelity, it's probably not the format for you.
Agreed.

Maybe I need to just design my own format somewhat like PYX, but focusing on database/table/row/column/value encoding rather than trying to convert XML. If it weren't for the fact that mysqldump is itself very complex, I might try to add an output format to it, or extract the code pieces that "recurse" through all the databases, tables, rows, and columns, and make a tool for that. The issue I see is figuring out the right way to encode various database column types. Numbers and strings are obvious. I'd have to consult how they do that in SQL and hope there is some commonality I can use for all database types.

Such a format might look like:

Code:
Bdatabasename
Ttablename
R
scolumn1string
scolumn2string
scolumn3string
ncolumn4number
ncolumn5number
ncolumn6number
R
scolumn1string
scolumn2string
scolumn3string
ncolumn4number
ncolumn5number
ncolumn6number
E
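A rough sketch of how such records could be emitted from tab-separated rows, such as those produced by the mysql client in batch mode with the header suppressed (-N -B). Everything here is an assumption for illustration: `rows_to_records` is a made-up name, and the s/n tag is guessed from what the text looks like, whereas a real tool would ask the server for the column types. That is exactly the open question about encoding column types.

```shell
# rows_to_records DB TABLE   (hypothetical helper)
# Reads tab-separated rows on stdin and writes the B/T/R/s/n/E line format.
rows_to_records() {
  printf 'B%s\nT%s\n' "$1" "$2"
  awk -F '\t' '
    {
      print "R"                        # one R line opens each row
      for (i = 1; i <= NF; i++) {
        # assumption: anything shaped like a plain decimal number is tagged "n"
        tag = ($i ~ /^-?[0-9]+(\.[0-9]+)?$/) ? "n" : "s"
        print tag $i
      }
    }
    END { print "E" }                  # E closes the table
  '
}

# Made-up sample rows, not real data.
printf 'alice\t30\nbob\t4.5\n' | rows_to_records shop users
```

A type-guessing tag like this would mislabel a numeric-looking string column (a zip code, say), which is one reason to take the type from the table definition instead.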

Last edited by Skaperen; 06-16-2012 at 06:35 PM. Reason: Added the "E" line to the sample
 
  


All times are GMT -5.