Thread

Posted on Wed Aug 29 09:00:27 2007 by libsys
'Invalid entity' errors when parsing XML
I have a large (35 MB) XML file which I need to get some information out of. At the moment, I'm just doing a simple parse and dump to get to grips with XML::Simple; however, I've run into a frustrating problem.

The code I'm using is very simple:

    #!/usr/bin/perl
    use XML::Simple;
    use Data::Dumper;

    $dumpfile = "QUT_EPrints.xml";
    $xml = new XML::Simple;
    $data = $xml->XMLin($dumpfile);
    print Dumper($data);


When I run it, I get a single error of the form 'Invalid name in entity [Ln: 30370, Col: 95]' and the script stops. In every case the problem appears to be a character code, most often a newline character ( ). If I remove the character and re-run the script, it continues happily for a while but then throws the same error in the same manner, only for another character on another line.

The odd thing is that when I've gone to remove the offending character, I can't see any difference between the offending character and the newline characters on lines surrounding it. It just doesn't appear to be unusual or different.

Can anyone tell me how to avoid this error, or how to at least tell XML::Simple to ignore these errors?

Thanks, Guy
Posted on Wed Aug 29 09:27:12 2007 by grantm in response to 5980
Re: 'Invalid entity' errors when parsing XML

I'm guessing that you may be running into a buffering bug in XML::SAX::PurePerl. Try installing the XML::SAX::Expat module, which will then be used instead of the PurePerl parser.
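If installing the module alone does not change which parser gets picked up, XML::Simple also lets you name the SAX parser explicitly. The following is only a sketch, assuming XML::SAX::Expat is already installed from CPAN; it reuses the script from the original post, with strict and warnings added, and sets the package variable $XML::Simple::PREFERRED_PARSER.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Assumption: XML::SAX::Expat is installed. Naming it here asks
# XML::Simple to use that SAX parser rather than whichever one
# XML::SAX would select by default (possibly XML::SAX::PurePerl).
$XML::Simple::PREFERRED_PARSER = 'XML::SAX::Expat';

my $dumpfile = "QUT_EPrints.xml";
my $xml      = XML::Simple->new;
my $data     = $xml->XMLin($dumpfile);   # dies if the parser is unavailable

print Dumper($data);
```

Setting the variable to 'XML::Parser' instead makes XML::Simple bypass SAX and use XML::Parser directly, which may be worth trying if no SAX driver other than PurePerl is available.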

If that doesn't work, then try to provide the smallest test data that exposes the bug.
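As a starting point for cutting the file down, a short script can print the lines around the reported position with every non-printable or non-ASCII byte made visible. This is a rough sketch using only core Perl; the helper name excerpt_around and the two-line context window are made up for illustration. The excerpt is not well-formed XML on its own, and if the problem really is buffering-related it may not reproduce the error outside the full file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines within $context lines of
# line $target (1-based), each prefixed with its line number, with
# every byte outside printable ASCII shown as \xNN.
sub excerpt_around {
    my ($fh, $target, $context) = @_;
    my @out;
    while (my $line = <$fh>) {
        next if $. < $target - $context;
        last if $. > $target + $context;
        chomp $line;
        (my $visible = $line) =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
        push @out, "$.: $visible";
    }
    return @out;
}

# Usage: perl excerpt.pl QUT_EPrints.xml 30370
if (@ARGV == 2) {
    my ($file, $line_no) = @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    print "$_\n" for excerpt_around($fh, $line_no, 2);
    close $fh;
}
```

A stray carriage return, for example, would show up as \x0d at the end of a line, which makes it easier to compare the reported line with its neighbours.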

PS: www.perlmonks.org is a better place for asking questions like this.

Posted on Fri Aug 31 04:25:33 2007 by libsys in response to 5981
Re: 'Invalid entity' errors when parsing XML
Thanks, XML::SAX::Expat seems to have resolved this problem. Much appreciated!