XML - beautifying
You want to make your XML more readable
You have just extracted an XML document from some source and it is completely unindented, which makes it hard to read. This happens at my workplace, where a single XML file can be many megabytes in size.
Using xmllint
xmllint --format your_existing_xml_file_name > new_xml_file_name
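One thing to watch out for: if you redirect the output straight back onto the input file, the shell truncates the file before xmllint gets to read it. The sketch below writes to a temporary file first and only overwrites the original when xmllint succeeds. The function name format_xml_in_place is my own invention and not part of xmllint. XMLLINT_INDENT is the environment variable xmllint reads for the indentation string used by --format.

```shell
#!/bin/sh
# format_xml_in_place FILE [INDENT]
# Hypothetical helper (not part of xmllint): reformat FILE and replace its
# contents only if xmllint succeeded, so a failure never destroys the input.
format_xml_in_place() {
    file=$1
    indent=${2:-"  "}              # two spaces is xmllint's default indent
    tmp=$(mktemp) || return 1
    # XMLLINT_INDENT sets the indentation string used by --format.
    if XMLLINT_INDENT="$indent" xmllint --format "$file" > "$tmp"; then
        cat "$tmp" > "$file"       # overwrite contents, keep the file's own permissions
        status=$?
    else
        status=1                   # xmllint missing or input not well-formed
    fi
    rm -f "$tmp"
    return $status
}
```

Call it as format_xml_in_place big.xml, or pass a second argument such as four spaces if you want a wider indent. If the input is not well-formed, the file is left untouched.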
If you do not have xmllint installed, you can use the Perl script below. It is slower, but gives a similarly indented result. This is not my code; see the author information in the header comment. It works well.
#!/usr/bin/perl
#
# Purpose: Read an XML file and indent it for ease of reading
# Author: RedGrittyBrick 2011.
# Licence: Creative Commons Attribution-ShareAlike 3.0 Unported License
#
use strict;
use warnings;
my $filename = $ARGV[0];
die "Usage: $0 filename\n" unless $filename;
open my $fh, '<', $filename
    or die "Can't read '$filename' because $!\n";
my $xml = '';
while (<$fh>) { $xml .= $_; }
close $fh;
$xml =~ s|>[\n\s]+<|><|gs; # remove superfluous whitespace
$xml =~ s|><|>\n<|gs; # split line at consecutive tags
my $indent = 0;
for my $line (split /\n/, $xml) {
    if ($line =~ m|^</|) { $indent--; }                  # closing tag: step back out first
    print ' ' x $indent, $line, "\n";
    if ($line =~ m|^<[^/?!]|) { $indent++; }             # indent after <foo, but not after <?xml, <!-- or <!DOCTYPE
    if ($line =~ m|^<[^/][^>]*>[^<]*</|) { $indent--; }  # but not <foo>..</foo>
    if ($line =~ m|^<[^/][^>]*/>|) { $indent--; }        # and not <foo/>
}
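To see what the two substitutions in the script do before any indentation is applied, you can run just that step as a one-liner. This is only a sketch for experimenting: split_tags is a name I made up, and it assumes perl is on your PATH. It drops the whitespace between tags and then puts each tag on its own line, which is what the indent counter in the loop relies on.

```shell
#!/bin/sh
# split_tags: read XML on stdin, print it with one tag per line (no indentation).
# Hypothetical helper that mirrors the two s||| lines of the script above.
split_tags() {
    # -0777 slurps the whole input so the regexes can match across newlines.
    perl -0777 -pe 's|>\s+<|><|g; s|><|>\n<|g'
}

printf '<a>  <b>1</b>\n <c/></a>' | split_tags
# prints:
# <a>
# <b>1</b>
# <c/>
# </a>
```

Text inside an element, like the 1 in <b>1</b>, stays on the same line as its tags because there is no "><" pair to split on.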