XML-TreeBuilder - Re: HTML::TreeBuilder

Posted on Mon Jan 30 08:45:40 2006 by sth in response to 660 (See the whole thread of 3)
Re: HTML::TreeBuilder
I have had the exact same problem with HTML::TreeBuilder. Strictly speaking, it is a problem with the "as_text" method of HTML::Element.

As a workaround, I hacked together a new method called "as_newline_text()". It is ugly copy-paste programming: a modified copy of the original "as_text" method.

    # You may add this to .../HTML/Element.pm
    sub as_newline_text {
        # STh: a special version of as_text that tries to keep the outline structure
        my ($this, %options) = @_;
        my $skip_dels = $options{'skip_dels'} || 0;
        #print "Skip dels: $skip_dels\n";
        my (@pile) = ($this);
        my $tag;
        my $text = '';
        while (@pile) {
            if (!defined($pile[0])) {    # undef!
                shift @pile;             # drop it so the loop can advance
            }
            elsif (!ref($pile[0])) {     # text bit! save it!
                # $text .= "\n{$tag}" if $HTML::Element::canTighten{$tag};
                # $text .= "\n[$tag]" unless $HTML::Element::canTighten{$tag};
                $text .= shift @pile;
            }
            else {                       # it's a ref -- traverse under it
                unshift @pile, @{ $this->{'_content'} || $nillio }
                    unless ($tag = ($this = shift @pile)->{'_tag'}) eq 'style'
                        or $tag eq 'script'
                        or ($skip_dels and $tag eq 'del');
                # $text .= "\n{+$tag}" if $HTML::Element::canTighten{$tag};
                # $text .= "\n[+$tag]" unless $HTML::Element::canTighten{$tag};
                $text .= "\n\n" if $HTML::Element::canTighten{$tag};
            }
        }
        $text =~ s/^\n+//;           # remove all leading \n
        $text =~ s/\n+$//;           # remove all trailing \n
        $text =~ s/\n{3,}/\n\n/g;    # collapse multi \n to a maximum of double \n
        return $text;
    }

If you uncomment the other "$text .= ..." lines, the output shows which tags are visited and when, which makes it easier to see how the procedure works. You can also replace the "\n\n" with a single space if you do not want newlines in your address.
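
If you would rather not edit the installed Element.pm, the following is a minimal sketch of trying the method from your own script. It makes a few assumptions that are not part of the hack above: the sub is injected into the HTML::Element package from the calling script, an empty array reference stands in for $nillio (which may not be visible outside Element.pm), and the sample address HTML is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Assumption: define the method from the script instead of patching
# the installed HTML/Element.pm.
sub HTML::Element::as_newline_text {
    my ($this, %options) = @_;
    my $skip_dels = $options{'skip_dels'} || 0;
    my @pile = ($this);
    my $tag;
    my $text = '';
    while (@pile) {
        if (!defined $pile[0]) {
            shift @pile;                  # discard undef entries
        }
        elsif (!ref $pile[0]) {
            $text .= shift @pile;         # plain text node: keep it
        }
        else {
            # Element: queue its children unless it is style/script
            # (or del, when skip_dels is set). [] replaces $nillio here.
            unshift @pile, @{ $this->{'_content'} || [] }
                unless ($tag = ($this = shift @pile)->{'_tag'}) eq 'style'
                    or $tag eq 'script'
                    or ($skip_dels and $tag eq 'del');
            # Mark a break for every tag listed in %canTighten
            $text .= "\n\n" if $HTML::Element::canTighten{$tag};
        }
    }
    $text =~ s/^\n+//;           # strip leading newlines
    $text =~ s/\n+$//;           # strip trailing newlines
    $text =~ s/\n{3,}/\n\n/g;    # at most one blank line in a row
    return $text;
}

# Hypothetical sample input
my $tree = HTML::TreeBuilder->new_from_content(
    '<p>John Doe<br>12 Main Street</p><p>Springfield</p>'
);
my $text = $tree->as_newline_text();
print "$text\n";
$tree->delete;    # free the tree explicitly
```

With this in place, as_text() stays untouched and both methods can be compared on the same tree. If you want everything on one line after all, running s/\n+/ /g on the returned string should have much the same effect as changing the "\n\n" inside the sub.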