LinuxQuestions.org
Old 06-19-2007, 05:21 PM   #1
veritas
Member
 
Registered: Aug 2003
Location: Dallas,TX
Distribution: Ubuntu Server, Slackware, Red Hat 6.1
Posts: 241

Rep: Reputation: 30
php+XML - escaped ampersands not escaping correctly


hello

I'm writing a script that parses an XML file. Right now I have the PHP code echoing whatever is inside a segment tag. Sometimes the tag content will contain ampersands (which are escaped in the XML file as &amp;), but when they are encountered by the parser, something strange happens.

what the XML in the file looks like:

Code:
<segment>part1of36.54oUmWqvRiP6jU&amp;0JoVS@powerpost2000AA.local</segment>
What the output looks like (I just echo whatever is in <segment>):

Quote:
part1of36.54oUmWqvRiP6jU
&
0JoVS@powerpost2000AA.local
The parser seems to split the segment data into 3 parts and treats each part as if it had just found a new segment. I know this because it's adding a <br> to the end of each line, which only happens when it finds a new segment tag. Why would it be doing this? BTW - every other piece of data between <segment> tags displays correctly if it doesn't contain an ampersand.

PHP 4.4.4 slack11
 
Old 06-20-2007, 06:06 PM   #2
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148
It might help if you showed the appropriate bits of code that you are using.
 
Old 06-20-2007, 07:27 PM   #3
veritas
Member
 
Registered: Aug 2003
Location: Dallas,TX
Distribution: Ubuntu Server, Slackware, Red Hat 6.1
Posts: 241

Original Poster
Rep: Reputation: 30
Code:
if (! ($xmlparser = xml_parser_create()) )
{ 
   die ("Error: cannot create XML parser");
}

$current = "";

// called when an open tag is found
function open_tag($parser, $tagname, $attribs)
{
  global $current;
  $current = $tagname;
}

// the function called when a closing tag is found
function close_tag($parser, $tagname)
{
// I dont need it to do anything right now
}

// setting up the tag handler
xml_set_element_handler($xmlparser, "open_tag", "close_tag");

// function called when data between tags is found
function tag_contents($parser, $tagdata)
{ 
  global $current;
  if($current == "SEGMENT") //if it detects a segment tag, echo whats inside
  {
    echo $tagdata."<br>"."\n";
  }
}

// setting up the handler for tag contents
xml_set_character_data_handler($xmlparser, "tag_contents");

// set the filename var
$filename = $_FILES['nzbfile']['tmp_name'];

//open the file for reading
if(!($fp=fopen($filename, "r")))
{
  die("Cannot open the nzb entitled ".$filename);
}

// read the file, then send the data to the parser
while($data = fread($fp, $_FILES['nzbfile']['size']))
{
  if (!xml_parse($xmlparser, $data, feof($fp)))
  {
    $reason = xml_error_string(xml_get_error_code($xmlparser));
    $reason .= xml_get_current_line_number($xmlparser);
    die($reason);
  }
}
// destroy parser
xml_parser_free($xmlparser);
fclose($fp);
 
Old 06-21-2007, 12:05 AM   #4
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148
The XML parser that you are using is calling the tag_contents() function multiple times for those lines. You might want to consider using the SimpleXML functions. Insert the following code; it should get you going.

PHP Code:
$xmlobj = simplexml_load_file($filename);
var_dump($xmlobj);
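
A slightly fuller sketch of the SimpleXML route, for anyone who wants to get from var_dump() to the actual segment text. It assumes PHP 5 or later (SimpleXML is not available in PHP 4), and the inline sample document, including its second segment and the <segments> wrapper, is made up for illustration rather than taken from a real file. With a real upload you would call simplexml_load_file($filename) instead of simplexml_load_string().

```php
<?php
// Made-up sample document; only the first <segment> comes from the thread.
$xml = '<segments>'
     . '<segment>part1of36.54oUmWqvRiP6jU&amp;0JoVS@powerpost2000AA.local</segment>'
     . '<segment>part2of36.example@powerpost2000AA.local</segment>'
     . '</segments>';

$xmlobj = simplexml_load_string($xml);
if ($xmlobj === false) {
    die("Error: cannot parse XML");
}

// Casting an element to string gives its text with entities already decoded.
$segments = array();
foreach ($xmlobj->segment as $segment) {
    $segments[] = (string) $segment;
}

// Re-escape for HTML output, one segment per line.
foreach ($segments as $text) {
    echo htmlspecialchars($text) . "<br>\n";
}
```

Since SimpleXML hands back each element's text as a whole, the split around the ampersand never shows up here. The trade-off is that the whole document is loaded into memory.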
 
Old 06-21-2007, 01:53 AM   #5
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,453

Rep: Reputation: 448
Hi
"tag_contents" can be called many times. The way the parser is designed, you don't need to read the whole XML into memory at once, even if you do in your example. The parser works on a stream and buffers can run empty any time.

Change it to:
PHP Code:
function close_tag($parser, $tagname)
{
  global $current;
  if($current == "SEGMENT")
    echo "<br>\n";
}

function tag_contents($parser, $tagdata)
{
  global $current;
  if($current == "SEGMENT") //if it detects a segment tag, echo whats inside
  {
    echo $tagdata;
  }
}

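A variation on the same idea, in case you want the segment text as a value instead of echoing it piece by piece: collect the pieces in a buffer and only use the result when the closing tag arrives. This is just a sketch. The helper name collect_segments() and the <segments> wrapper element are made up for the example, and the closures assume PHP 5.3 or later (on PHP 4 you would use named functions and globals as in the code above).

```php
<?php
// Hypothetical helper: returns one complete string per <segment> element.
function collect_segments($xml)
{
    $segments = array(); // finished segment strings
    $buffer = null;      // null means "not inside a <segment>"

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Opening tag: start a fresh buffer (names are upper-cased by default).
        function ($p, $name, $attribs) use (&$buffer) {
            if ($name == "SEGMENT") {
                $buffer = "";
            }
        },
        // Closing tag: the buffer now holds the whole text of the element.
        function ($p, $name) use (&$buffer, &$segments) {
            if ($name == "SEGMENT") {
                $segments[] = $buffer;
                $buffer = null;
            }
        }
    );

    // May fire several times per element, so append instead of printing.
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer) {
            if ($buffer !== null) {
                $buffer .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
    return $segments;
}

$xml = '<segments><segment>part1of36.54oUmWqvRiP6jU&amp;0JoVS@powerpost2000AA.local</segment></segments>';
foreach (collect_segments($xml) as $segment) {
    echo htmlspecialchars($segment) . "<br>\n";
}
```

Checking $name in the closing handler (rather than the last opened tag) also avoids emitting anything when an enclosing element closes.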
Last edited by Guttorm; 06-21-2007 at 01:58 AM.
 
Old 06-21-2007, 08:50 PM   #6
veritas
Member
 
Registered: Aug 2003
Location: Dallas,TX
Distribution: Ubuntu Server, Slackware, Red Hat 6.1
Posts: 241

Original Poster
Rep: Reputation: 30
Guttorm -- thank you, that fixed the glitch. Could you explain why placing the breaks in close_tag() changed it? To me it seems either way would do the same thing, but obviously that isn't so.
 
Old 06-22-2007, 04:31 AM   #7
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,453

Rep: Reputation: 448
Hi again.

Well, as I tried to say, the SAX XML parser is made to operate on streams - not just memory.

PHP Code:
// read the file, then send the data to the parser
while($data = fread($fp, $_FILES['nzbfile']['size']))
{
  if (!xml_parse($xmlparser, $data, feof($fp)))
  {
    $reason = xml_error_string(xml_get_error_code($xmlparser));
    $reason .= xml_get_current_line_number($xmlparser);
    die($reason);
  }
}

In your while loop, you read the entire file at once - passing $_FILES['nzbfile']['size'] as how much you want to read. You could have used a smaller buffer, like 100 bytes, and it would still work - even if the segment tag contained more than 100 bytes.

The disadvantage is that tag_contents can be called many times, but it's not a problem since open_tag and close_tag are called, so you know when the tag starts and finishes.

The advantage is that the XML file can be gigabytes long, and you can parse it without gigabytes of memory.
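
To make the chunking point concrete, here is a small sketch that feeds the same document to the parser in pieces of different sizes and records every piece of text the character data handler receives inside <segment>. The helper name segment_pieces() and the <segments> wrapper are made up, str_split() on a string stands in for the fread() loop, and the closures assume PHP 5.3 or later. How many pieces you get can vary with the chunk size and the parser library underneath, which is why the code only compares the joined text.

```php
<?php
// Hypothetical helper: parse $xml $chunkSize bytes at a time and return
// each piece of text handed to the character data handler inside <segment>.
function segment_pieces($xml, $chunkSize)
{
    $pieces = array();
    $inSegment = false;

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attribs) use (&$inSegment) {
            $inSegment = ($name == "SEGMENT");
        },
        function ($p, $name) use (&$inSegment) {
            $inSegment = false;
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$inSegment, &$pieces) {
            if ($inSegment) {
                $pieces[] = $data;
            }
        }
    );

    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // The third argument tells the parser whether this is the final chunk.
        if (!xml_parse($parser, $chunk, $i == $last)) {
            throw new Exception(xml_error_string(xml_get_error_code($parser)));
        }
    }
    return $pieces;
}

$xml = '<segments><segment>part1of36.54oUmWqvRiP6jU&amp;0JoVS@powerpost2000AA.local</segment></segments>';
$small = segment_pieces($xml, 10);           // many small chunks
$whole = segment_pieces($xml, strlen($xml)); // everything in one call

echo count($small) . " pieces with 10-byte chunks, " . count($whole) . " pieces in one go\n";
var_dump(implode("", $small) === implode("", $whole));
```

Whatever the counts turn out to be, gluing the pieces back together gives the same segment text, so the handlers should never assume one call per element.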
 
Old 06-22-2007, 05:51 PM   #8
veritas
Member
 
Registered: Aug 2003
Location: Dallas,TX
Distribution: Ubuntu Server, Slackware, Red Hat 6.1
Posts: 241

Original Poster
Rep: Reputation: 30
Thanks, I think I'm all cleared up now.
 
  


All times are GMT -5. The time now is 03:08 PM.
