Thread

Posted on Wed Mar 19 20:25:46 2008 by bq
Error Messages

Be gentle, this is my first foray into XML::Simple.

I'm getting the following error message and can't understand why I'm getting it: Invalid name in entity [Ln: 290, Col: 689]

I'm also unable to use a string as the argument to XMLin(). When I do, the script just burns CPU and never returns.

The XML file I'm trying to parse loads just fine into Firefox (http://<url>/test.xml). The XML is simple, but there's a lot of it.

How do I figure out if there's really bad data or make XMLin() not blow up when it doesn't like something? I'd really like to pass the string - how do I do that? Where am I going wrong?

The code is fairly simple as well:

#!/usr/bin/perl
#
# xmlgb.pl
#
use strict;
use XML::Simple;

my @logtopdirs=qw( /var/logs/3300 );
my @logdirs=qw( 109 110 111 112 );
my $gunzip='/usr/bin/gunzip -c';
my @files;
my $xmlref;
my $xmlstr;
my $xmlfil;
my $tstfile="/tmp/test.xml";

foreach my $td ( @logtopdirs ) {
    if ( ! chdir($td) ) {
        print "ERROR: Can't chdir to top dir " . $td . "\n";
        next;
    }
    print "Working on top dir ->>" . $td . "\n";
    foreach my $dir ( @logdirs ) {
        if ( ! chdir($dir) ) {
            print "ERROR: Can't chdir to log dir " . $dir . "\n";
            next;
        }
        @files=glob("*.gz");
        foreach my $f ( @files ) {
            print $dir . "/" . $f . "\n";
            if ( ! -f $f ) {
                print "No file " . $f . "\n";
            } else {
                if ( ! open(XML,"$gunzip $f | ") ) {
                    print "Failed to open stream for " . $f . "\n";
                    next;
                }
                my $xs = XML::Simple->new(KeyAttr=>[],ForceArray=>1,KeepRoot=>1);
                $xmlstr="";
                while ( <XML> ) {
                    $xmlstr .= $_;
                }
                open(TST,">$tstfile");
                print TST $xmlstr;
                close(TST);
                print $xs . "\n";
                $xmlref = $xs->XMLin($tstfile);
                %$xmlref={};
            }
        }
        chdir ( $td );
    }
}
exit();
Direct Responses: 7418
Posted on Wed Mar 19 21:31:40 2008 by grantm in response to 7417
Re: Error Messages

There's an entry in the Perl XML FAQ about this.

Loading documents into Firefox is not a particularly useful test.
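
For the two questions in the original post (passing a string, and keeping XMLin() from killing the script), here is a minimal sketch. It assumes XMLin() treats an argument containing '<' and '>' as XML text rather than a filename, as the XML::Simple documentation describes. The helper name parse_xml_string and the sample strings are made up for illustration, and the commented-out parser setting is only a guess at the cause of the slowness, since the thread does not say which parser module is in use.

```perl
use strict;
use warnings;
use XML::Simple;

# Optional, and an assumption about the cause of the slow string parsing:
# if XML::Simple is falling back to a pure-Perl SAX parser, asking for
# XML::Parser (which must be installed) may help.
# $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

# Hypothetical helper: parse a string of XML and return (data, error).
# XMLin() dies on a parse error, so the call is wrapped in eval.
sub parse_xml_string {
    my ($xml) = @_;
    my $xs = XML::Simple->new(KeyAttr => [], ForceArray => 1, KeepRoot => 1);
    my $data = eval { $xs->XMLin($xml) };
    return (undef, "$@" || 'unknown parse error') unless defined $data;
    return ($data, undef);
}

# Made-up inputs: the second one has a character reference with no ';'.
my ($ok_data,  $ok_err)  = parse_xml_string('<log><entry>a b</entry></log>');
my ($bad_data, $bad_err) = parse_xml_string('<log><entry>a&#32b</entry></log>');

print defined $ok_err  ? "first: failed: $ok_err\n"  : "first: parsed\n";
print defined $bad_err ? "second: failed: $bad_err\n" : "second: parsed\n";
```

With this shape the loop over the .gz files can log the error for one file and move on to the next, and the temporary /tmp/test.xml copy is no longer needed.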

Direct Responses: 7446
Posted on Mon Mar 24 19:03:38 2008 by bq in response to 7418
Re: Error Messages

Your response helped. I am now able to pinpoint the specific strings in the XML files that are causing the errors. However, I can't yet understand why they cause the parser to fail.

The XML files have &#32 entities embedded in the data elements: every space in every data element has been rewritten as &#32. I can't yet tie the failure to any specific context, and not every instance of &#32 fails (only 1 or 2 out of several hundred in the file). Strangely, in the specific instances where I do get a failure, I can substitute a literal space and prevent the failure for just that instance. If I try to replace every &#32 with a literal space, the parser has even more problems.

Any ideas?
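
One way to check whether the failing instances differ from the others is to scan the raw text for '&' characters that do not start a well-formed reference. XML requires the form &#32; (with the terminating semicolon) for a character reference. The sketch below is only a diagnostic aid under that assumption, not a confirmed explanation of this failure. The helper name find_bad_references and the sample text are hypothetical, the name pattern is simplified to ASCII, and a bare '&' inside a CDATA section or comment would be reported even though it is legal there.

```perl
use strict;
use warnings;

# Hypothetical helper: report every '&' that does not begin a well-formed
# reference (&name;  &#123;  &#x1F;), with 1-based line and column numbers.
sub find_bad_references {
    my ($xml) = @_;
    my @bad;
    my $line_no = 0;
    for my $line (split /\n/, $xml, -1) {
        $line_no++;
        # Match a '&' only when no complete reference follows it.
        while ($line =~ /&(?!(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][\w.:-]*);)/g) {
            my $col = pos($line);    # offset just past '&' == its 1-based column
            push @bad, {
                line => $line_no,
                col  => $col,
                text => substr($line, $col - 1, 12),   # a little context
            };
        }
    }
    return @bad;
}

# Made-up sample: line 1 is well-formed, lines 2 and 3 each have one problem.
my $sample = "<e>a&#32;b</e>\n<e>c&#32d</e>\n<e>AT&T &amp; co</e>\n";
my @hits = find_bad_references($sample);
printf "line %d, col %d: %s\n", @{$_}{qw(line col text)} for @hits;
```

On the sample this reports line 2, column 5 and line 3, column 6. The columns count characters, so they may not line up exactly with the [Ln: ..., Col: ...] position in the parser's message.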
