Thread

Posted on Tue Mar 28 17:31:08 2006 by monicad
Using keyattr causes repeated elements to be ignored
If I use XML::Simple to process the XML as in the code at the end, I get the following data structure when no options are supplied to XMLin:
$VAR1 = {
  'dc:subject' => [
    { 'xsi:type' => 'ebankterms:CompoundClass', 'content' => 'Organic' },
    { 'xsi:type' => 'ebankterms:Keywords', 'content' => 'cycloadditions' },
    { 'xsi:type' => 'ebankterms:Keywords', 'content' => '1,3-dicarbonyl compounds' },
    { 'xsi:type' => 'ebankterms:Keywords', 'content' => 'mesoionics' },
    { 'xsi:type' => 'ebankterms:Keywords', 'content' => 'tautomerism' },
    { 'xsi:type' => 'ebankterms:Keywords', 'content' => 'thioisomnchnones ' },
    { 'xsi:type' => 'ebankterms:Keywords', 'content' => 'Halogen-halogen interactions' }
  ]
};

However, if I pass a keyattr option to XMLin (see code), the multiple values of ebankterms:Keywords are not retained, since no array or hash of hashes is created to accommodate them.
$VAR1 = {
  'dc:subject' => {
    'ebankterms:CompoundClass' => { 'content' => 'Organic' },
    'ebankterms:Keywords' => { 'content' => 'Halogen-halogen interactions' }
  }
};

Is this the expected behaviour? I have looked through the various other options and can't find anything that will modify this behaviour. Is it possible to use the keyattr option to distinguish between the various dc:subject elements (ebankterms:CompoundClass, ebankterms:Keywords) (as in the 2nd example), but still get access to all the values of dc:subject ebankterms:Keywords (as in the first example)? Code with example XML below.

#!/usr/bin/perl5.8.0

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $xml = "<record><dc:subject xsi:type=\"ebankterms:CompoundClass\">Organic</dc:subject>
<dc:subject xsi:type=\"ebankterms:Keywords\">cycloadditions</dc:subject>
<dc:subject xsi:type=\"ebankterms:Keywords\">1,3-dicarbonyl compounds</dc:subject>
<dc:subject xsi:type=\"ebankterms:Keywords\">mesoionics</dc:subject>
<dc:subject xsi:type=\"ebankterms:Keywords\">tautomerism</dc:subject>
<dc:subject xsi:type=\"ebankterms:Keywords\">thioisomnchnones </dc:subject>
<dc:subject xsi:type=\"ebankterms:Keywords\">Halogen-halogen interactions</dc:subject></record>";

my $xp;
# eval { $xp = XMLin($xml) };
eval { $xp = XMLin($xml, keyattr => {'dc:subject'=>'xsi:type'} ) };
# eval { $xp = XMLin($xml, forcearray=>1,keyattr => {'dc:subject'=>'xsi:type'} ) };
die "XMLin failed: $@" if $@;   # report parse errors instead of silently discarding them

print Dumper($xp);

Monica
Posted on Tue Mar 28 23:59:35 2006 by grantm in response to 2051
Re: Using keyattr causes repeated elements to be ignored

Yes, this is the expected behaviour. When multiple values are assigned to the same hash key, each new value overwrites the previous one.
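
To illustrate the overwriting, and one possible way around it, here is a minimal sketch in plain Perl. It does not call XML::Simple at all: the @subjects list is a shortened, hand-written copy of the default XMLin output quoted in the question, and the names %last_wins and %by_type are made up for this example. The idea is to parse without keyattr folding and then group the records by xsi:type yourself.

```perl
use strict;
use warnings;

# Records shaped like the default XMLin output quoted in the question
# (shortened; hand-written here rather than produced by XMLin).
my @subjects = (
    { 'xsi:type' => 'ebankterms:CompoundClass', content => 'Organic' },
    { 'xsi:type' => 'ebankterms:Keywords',      content => 'cycloadditions' },
    { 'xsi:type' => 'ebankterms:Keywords',      content => 'mesoionics' },
    { 'xsi:type' => 'ebankterms:Keywords',      content => 'tautomerism' },
);

# Plain assignment: every record with the same key replaces the previous
# one, so only the last value for each xsi:type survives.
my %last_wins;
$last_wins{ $_->{'xsi:type'} } = $_->{content} for @subjects;

# Grouping: push each value onto an array per key, so nothing is lost.
my %by_type;
push @{ $by_type{ $_->{'xsi:type'} } }, $_->{content} for @subjects;

print "last wins: $last_wins{'ebankterms:Keywords'}\n";
print "grouped:   @{ $by_type{'ebankterms:Keywords'} }\n";
```

With the real data you would loop over @{ $xp->{'dc:subject'} } from the option-free XMLin call instead of the hand-written list.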

You might be better off switching to XML::LibXML. This article compares the two modules.

Since your XML uses namespaces, you'll also need to use XML::LibXML::XPathContext.
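
As a rough sketch of what that could look like: the example below assumes the real document declares the dc and xsi prefixes. The snippet in the question does not, so xmlns declarations have been added here using the standard Dublin Core and XML Schema instance URIs. The XML is shortened, and %by_type is just an illustrative name.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Assumed namespace URIs; check them against the real document.
my $DC  = 'http://purl.org/dc/elements/1.1/';
my $XSI = 'http://www.w3.org/2001/XMLSchema-instance';

my $xml = <<"XML";
<record xmlns:dc="$DC" xmlns:xsi="$XSI">
<dc:subject xsi:type="ebankterms:CompoundClass">Organic</dc:subject>
<dc:subject xsi:type="ebankterms:Keywords">cycloadditions</dc:subject>
<dc:subject xsi:type="ebankterms:Keywords">mesoionics</dc:subject>
</record>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Register a prefix so the XPath expression can refer to the namespace.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(dc => $DC);

# Collect every dc:subject value, grouped by its xsi:type attribute.
my %by_type;
for my $node ($xpc->findnodes('//dc:subject')) {
    my $type = $node->getAttributeNS($XSI, 'type');
    push @{ $by_type{$type} }, $node->textContent;
}

print "$_: @{ $by_type{$_} }\n" for sort keys %by_type;
```

Because each element is visited individually, repeated values of the same type are all kept.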

Either the Perlmonks web site or the Perl-XML mailing lists are better places to get help.
