Database dump in easy to parse/grep format?

Skaperen · 06-12-2012, 02:26 PM

I'm looking for a way to dump an entire database to a stream that can be easily parsed or grepped. The database engine (for now) is MySQL. And I do have full dump backup files, so something that can convert one of those (in massive SQL format) to a suitable output stream should work, too.

Basically the output should be one text line per data cell, with the database name, table name, column name, and data value. Alternatively I could probably work with a JSON format output. XML is less desired, but might still be usable.

I have full rights to access the database, but the MYSQL tools don't give me a program-friendly output of everything.

schneidz · 06-12-2012, 02:37 PM

i think a custom script based on mysqldump mite generate the desired output.

Tinkster · 06-12-2012, 04:23 PM

Quote:

Originally Posted by schneidz

i think a custom script based on mysqldump mite generate the desired output.

I want one of those output generating mites, too! ;}

But yeah, you're right. And mysqldump w/

Code:

       ·   --fields-terminated-by=..., --fields-enclosed-by=..., --fields-optionally-enclosed-by=..., --fields-escaped-by=...

can make it as "machine-friendly" as the OP desires.

Cheers,
Tink

Skaperen · 06-13-2012, 01:05 PM

Quote:

Originally Posted by Tinkster

I want one of those output generating mites, too! ;}

But yeah, you're right. And mysqldump w/

Code:

       ·   --fields-terminated-by=..., --fields-enclosed-by=..., --fields-optionally-enclosed-by=..., --fields-escaped-by=...

can make it as "machine-friendly" as the OP desires.

Cheers,
Tink

These seem to require the --tab=path (or -T path) option, and that option will not write to stdout as a stream. Instead it writes a bunch of files named per table. This also requires running on the same machine as the database engine. It's also unclear how the files will be named for multiple databases, or maybe it can only do one database.

The format I want is:

databasename<delimiter>tablename<delimiter>columnname<delimiter>value<newline>

with the value encoded in some way so delimiter, newline, and binary bytes are encoded as some notation like backslash-octal or percent-hexadecimal or such. Documentation seems to suggest this is not possible.

Any other ideas?

Skaperen · 06-13-2012, 01:13 PM

Quote:

Originally Posted by schneidz

i think a custom script based on mysqldump mite generate the desired output.

Yeah. But how to get mysqldump to work in some suitable way appears to be the standing issue.

I think it is unlikely mysqldump is usable for this. I suspect this will require some independent program that can access MySQL. I also think it would be better if it can do that access at a layer below SQL processing ... e.g. a raw layer, since it's going to get everything.

Skaperen · 06-13-2012, 01:39 PM

It looks like the --xml option is the only thing viable with mysqldump. But now I need to find something to convert the XML over to something usable.

David the H. · 06-13-2012, 01:59 PM

Quote:

Originally Posted by Skaperen

It looks like the --xml option is the only thing viable with mysqldump. But now I need to find something to convert the XML over to something usable.

xmlstarlet can convert xml to pyx, which is, if not easy to parse, at least uniform in structure and better suited to tools like sed and awk.

http://xmlstar.sourceforge.net/doc/UG/ch04s07.html

There are probably other tools as well that can do the same.

schneidz · 06-13-2012, 02:01 PM

can you do something custom like pipe the output of a sql script thru cut, awk, sed, tr, ... ?

can you share a dump of a sample table with a few rows and columns so we can get an idea if this is bash-able ?

edit: you can play around with the mysqlshow command to get the db, table and column names then loop thru the list with mysql.

Tinkster · 06-13-2012, 04:02 PM

Quote:

Originally Posted by Skaperen

databasename<delimiter>tablename<delimiter>columnname<delimiter>value<newline>

with the value encoded in some way so delimiter, newline, and binary bytes are encoded as some notation like backslash-octal or percent-hexadecimal or such. Documentation seems to suggest this is not possible.

Any other ideas?

That data format seems kind of strange, very garrulous, but you could
conceivably get that result w/ perl or awk, assuming that your mysqldump
looks similar to the sample I just created (I don't use MySQL - bleagh).

You get

Code:

CREATE TABLE 'blah'
  'a' type constraint
  'b' type constraint
  'c' type constraint

You can capture these ('blah','a','b', ... ) in a numbered array

When you encounter an

Code:

INSERT INTO `blah` VALUES ( ...

and can get the values only (per row, if there's not ),( in any column).

Cheers,
Tink

chrism01 · 06-13-2012, 11:31 PM

I reckon you need to write some Perl to parse the mysqldump. That's basically an ASCII plain text file, so should be no problem.
In fact this kind of text/data munging is one of Perl's strong suits.

If you want to simplify the Perl a bit, ensure the dumps are set to create a new INSERT line for each DB rec (default mode iirc) instead of the (optional) multi-rec INSERT statements.

You can even just access the DB directly using DBI/DBD modules.
I've done it loads of times, even joining 2 DBs together to txfr data, ie ETL but no intervening temp files reqd

What exactly is the purpose here?

chrism01 · 06-13-2012, 11:37 PM

PS

http://perldoc.perl.org/
http://www.perlmonks.org/?node=Tutorials
http://www.unixmexico.org/files/html.../dbi/index.htm
http://www.perlmonks.org/?node_id=284436

Skaperen · 06-14-2012, 12:58 AM

Quote:

Originally Posted by schneidz

can you do something custom like pipe the output of a sql script thru cut, awk, sed, tr, ... ?

can you share a dump of a sample table with a few rows and columns so we can get an idea if this is bash-able ?

edit: you can play around with the mysqlshow command to get the db, table and column names then loop thru the list with mysql.

I do not see myself writing any kind of SQL parser in any language (or XML). That's why I'm looking for it "already done" in some tool. I'd even prefer to avoid an SQL database where I can. But other things like Drupal and Mediawiki want it.

The mysqlshow command might have some uses in scripts if there is a way to get it to stop drawing boxes and just give a raw list of data.

Skaperen · 06-14-2012, 01:14 AM

Quote:

Originally Posted by chrism01

What exactly is the purpose here?

To avoid writing a parser.

Skaperen · 06-14-2012, 01:17 AM

Quote:

Originally Posted by David the H.

xmlstarlet can convert xml to pyx, which is, if not easy to parse, at least uniform in structure and better suited to tools like sed and awk.

http://xmlstar.sourceforge.net/doc/UG/ch04s07.html

There are probably other tools as well that can do the same.

Interesting format (pyx). It looks like lexified XML. That may simplify the parsing enough to make it doable for a "parsing challenged programmer" like me (I can handle tab-delimited, though).

UPDATE:

I ran a test. I took an XML dump of a Drupal website database, and converted it to pyx format, then back to xml ... and repeated this 499 times. It did NOT converge. In fact, it collapsed to about 1/3 the size of the original. It appears to be lossy. Program bug?

schneidz · 06-14-2012, 08:00 AM

Quote:

Originally Posted by Skaperen

The mysqlshow command might have some uses in scripts if there is a way to get it to stop drawing boxes and just give a raw list of data.

can you use somthing like tr "\|" "" on mysqlshow's output ?