HTML-TokeParser-Simple - Re: How do I know which end tag goes with which start tag?

Posted on Tue May 17 06:33:06 2005 by ovid in response to 464

Hi Joshua

The problem with HTML is that it is inherently free-form, and stumbling across misnested tags can throw the best algorithms for a loop (no bad pun intended). Assuming your tags are properly nested, though, the best way of dealing with this is either to switch to HTML::TreeBuilder (which would let you treat the spans as nodes in a tree) or to maintain a tag stack or a tag count. I've chosen the latter in the following snippet:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple 3.13;

my $parser = HTML::TokeParser::Simple->new(handle => \*DATA);
while (my $token = $parser->get_token) {
    next unless $token->is_start_tag('span');
    my $html = get_element($parser, 'span');
    print $html;
}

# pass this the parser and the name of the tag you're interested in.
sub get_element {
    my ($parser, $tag) = @_;
    my $html      = '';
    my $more_tags = 0;
    while (my $token = $parser->get_token) {
        return $html if $token->is_end_tag($tag) && ! $more_tags;
        $more_tags++ if $token->is_start_tag($tag);
        $more_tags-- if $token->is_end_tag($tag);
        $html .= $token->as_is;
    }
    return $html;
}

__DATA__
<head>
<body>
<span>
  <span foo="bar">
    stuff
  </span>
</span>
</body>
</head>
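For comparison, here is a rough sketch of the tag stack alternative mentioned above. It is only an illustration: pair_tags and the hash keys start, end and depth are names I made up for this example, and it assumes the tags you care about are properly nested. A stray end tag with nothing open is simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): pair each end tag of
# type $tag with the start tag it closes, using a stack of open tags.
sub pair_tags {
    my ($html, $tag) = @_;
    my $parser = HTML::TokeParser::Simple->new(string => $html);
    my (@open, @pairs);
    while (my $token = $parser->get_token) {
        if ($token->is_start_tag($tag)) {
            # Remember the start tag exactly as it appeared in the source.
            push @open, $token->as_is;
        }
        elsif ($token->is_end_tag($tag)) {
            # The most recently opened tag is the one this end tag closes.
            my $start = pop @open;
            next unless defined $start;    # stray end tag: skip it
            push @pairs, {
                start => $start,
                end   => $token->as_is,
                depth => scalar @open,     # tags still open around this one
            };
        }
    }
    return @pairs;
}

my $html = '<span> <span foo="bar"> stuff </span> </span>';
for my $pair (pair_tags($html, 'span')) {
    print "$pair->{start} is closed by $pair->{end} at depth $pair->{depth}\n";
}
```

Pairs come back in the order the end tags are seen, so the inner span is reported before the outer one. The counter in get_element is enough when you only want the text of one element; a stack is worth the extra bookkeeping when you need to know which start tag (and which attributes) each end tag belongs to.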