Thread

Posted on Sat May 27 00:22:15 2006 by jstenzel
How to distinguish lists correctly?
Hi, when traversing a document, I wonder how to check the type of a list. The docs say lists have types in OO1, and styles in OO2. But is there an example of how to check them? My current approach is this:
    # get element text
    my $text=$me->{docContent}->getText($element);

    # choose an output format according to the type
    if ($element->isItemList)
     {
      # a list: try to find out whether this list is ordered or not
      # (note: only OO 1 distinguishes ordered and unordered lists directly,
      # OO2 uses generic lists and configures them by style)
      my $type= $me->{docContent}{opendocument} ? 'oasis'
              : $element->isOrderedList         ? 'ordered'
              :                                   'bullet';

      # handle all list elements
      foreach my $item ($me->{docContent}->getItemElementList($element)) {...}
     }
As you can see, for OO2 I have no idea. For OO1, I use the documented methods, but it turns out that they always report an ordered list, regardless of whether the list is ordered or not. This was tested with an SXW document (OO1 format), stored by OO 2.
Thanks in advance
Jochen
Posted on Sat May 27 23:59:03 2006 by jmgdoc in response to 2364
Re: How to distinguish lists correctly?

Ordered and unordered lists are OO1-only objects. The OpenDocument format knows item lists only. The same list object is displayed with numbers, with bullets, or otherwise according to its style. If you really need to automatically distinguish ordered and unordered lists, the only way consists of interpreting the properties of the used list styles.
Posted on Sun May 28 01:58:13 2006 by jstenzel in response to 2367
Re: How to distinguish lists correctly?
For OpenDocument, it seems there is no example, so I will try to investigate the styles and, as you write, decide which style indicates an ordered or unordered list. As OO2 must do this a similar way, there should be a pattern, I hope.
For OO1, there is a problem. As you write, there should be ordered and unordered lists, but the OO::OODoc methods always report *ordered* lists for my document, regardless of whether the lists are ordered or not. Am I using the methods the wrong way, or is this a bug? How do I do it correctly? Thanks!
Posted on Sun May 28 12:10:30 2006 by jmgdoc in response to 2368
Re: How to distinguish lists correctly?

1) What do you mean by "OO2 must do this a similar way"? The OO2 format *is* OpenDocument!

2) Could you provide examples of methods reporting "ordered" regardless of whether the lists are ordered or not? Do you mean that the basic isOrderedList() boolean always returns "true"?
Posted on Sun May 28 15:17:02 2006 by jstenzel in response to 2369
Re: How to distinguish lists correctly?
1) Yes, OpenDocument is the OO2 format. But OO2 *has* GUI buttons that toggle whether the current paragraph is part of an ordered or an unordered list, so when it reads an OpenDocument list it has to distinguish list paragraphs, and it should therefore use a certain pattern for the type decision. Say it treats a bullet prefix style attribute as a sign that the paragraph is part of a bullet list, and a <whatever> prefix style attribute as an indicator that the list is ordered. Likewise, when a user presses the, say, "ordered list" button in OO2, it has to assign that certain prefix style attribute to the list element.
So, as there must be an internal mapping in OO2 from prefix style attributes (bullets, numbers, whatever) to the list types presented in the GUI (ordered, unordered) and vice versa, I hope I can find out this mapping pattern and use it similarly to detect the list types. Or, I'd be glad to hear about a standard OO2 method or code snippet to do this detection.
As an optimization and suggestion, would it be possible to have built-in support for these mappings in the OO::OODoc methods isOrderedList() and isUnorderedList(), so that when they detect they are called for an OpenDocument document they switch to that recognition method transparently?

2) I use the isOrderedList() method on a list element to find out its type. And yes, for my test document in OO1 format (exported by OO2), this method always returns true for ordered *and* unordered lists (I can send the document to your CPAN author address; is this address still valid?). I tracked this down in the Perl debugger and found that OO::OODoc::Element::hasTag() finally returns true for "text:ordered-list", which in fact is what $node->getName() returns for $node. But nevertheless, the list is unordered, so ... ?
Posted on Tue May 30 00:05:31 2006 by jmgdoc in response to 2370
Re: How to distinguish lists correctly?

This is a specific OO2 issue: when a document is saved in OO1 format, the list objects are arbitrarily saved as 'text:ordered-list' XML elements. But each one is saved with an appropriate style, in order to be displayed with or without numbers.
The OO1-compatible XML, when generated by OO2, is not exactly the same as the native OO1 XML. One could regard that as an issue, but it's a transitional issue only. Hopefully, we will have to deal with only one open format soon.
Posted on Sat Jun 3 10:45:25 2006 by jstenzel in response to 2373
Re: How to distinguish lists correctly?
Thank you for this information. So it turns out that at the moment there are three formats: OO1, OO2 Transitional, and OO2/OpenDocument, right? Thinking about this and your last sentence, I think the consequence will be that I do not support OO1 and OO2 Transitional in my tool, as OpenDocument is more universal (more applications use it) and the former two can be converted into OpenDocument by the free OO2 software.
But it turns out OpenDocument has its pitfalls, too. In my example, I cannot find the style. What's wrong with my code?
    # extract document (in content and style parts)
    $me->{docContent}=ooDocument(
                                 archive    => $me->{archive},
                                 member     => 'content',
                                 delimiters => \%delimiters,
                                );
    $me->{docStyles}=ooDocument(
                                archive    => $me->{archive},
                                member     => 'styles',
                                delimiters => \%delimiters,
                               );
    ...
    ...
    # choose an output format according to the type
    if ($element->isItemList)
     {
      # get attributes from style
      my $styleName=$me->{docContent}->textStyle($element);
      my $styleObject=   $me->{docContent}->getStyleElement($styleName)
                      || $me->{docStyles}->getStyleElement($styleName);
      my %attributes=(defined $styleName) ? $me->{docContent}->getStyleAttributes($styleObject) : ();

      # debug: list results so far
      use Data::Dumper;
      warn Dumper [
                   $styleName,
                   $me->{docContent}->getStyleElement($styleName),
                   $me->{docStyles}->getStyleElement($styleName),
                   $styleObject,
                   \%attributes,
                  ];
      ...
     }
This displays
    [OpenOffice::OODoc::Styles::getStyleAttributes] Unknown style
    $VAR1 = [
              'L1',
              undef,
              undef,
              undef,
              {}
            ];
So, I can find out that the internal style assigned to the bullet point is "L1", but how to find the style object for it? Thanks in advance!
Posted on Sat Jun 3 12:17:25 2006 by jmgdoc in response to 2407
Re: How to distinguish lists correctly?

getStyleElement() and getStyleList(), when called with no argument other than the style name, can retrieve only the styles that have the default namespace and the default type. The namespace and type options are mandatory in order to select another search space.

For an OpenDocument list style, the namespace is 'text' and the type is 'list-style' (don't ask me why). So, you could get an 'L1' style element like this:
    $s = $doc->getStyleElement("L1", namespace => 'text', type => 'list-style');

However, a list style is more a style container than a real style. It hosts a set of specialized styles, typically one per possible hierarchical level in the list, whose namespace is 'text' and whose type is 'list-level-style-bullet' or 'list-level-style-number'. Each level style controls the item numbering logic, the bullet character, the space between the number/bullet and the text, and so on. Dealing with this kind of plumbing is a tricky business, and the present methods of OODoc are not really tailored to work easily with the styles of complex elements such as tables and lists. The more practical way consists of creating the needed list style through the OpenOffice.org GUI and using it as a template.

However, the best way to distinguish "bulleted" lists and "numbered" lists consists of looking for the 'list-level-style-bullet' or 'list-level-style-number' children of the given list style.
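
A minimal sketch of that check, not a tested OODoc recipe. It assumes the list style element behaves like an XML::Twig element (OODoc elements are built on XML::Twig), so that children(), tag() and att() are available, and that each level style carries a 'text:level' attribute as in the OpenDocument XML. The helper name list_kind() is hypothetical and not part of OODoc.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a list style at a given level.
# $list_style is assumed to be what getStyleElement() returns when called
# with namespace => 'text', type => 'list-style'.
sub list_kind {
    my ($list_style, $level) = @_;
    $level = 1 unless defined $level;
    return 'unknown' unless defined $list_style;

    # scan the level styles hosted by the list style container
    for my $child ($list_style->children) {
        my $child_level = $child->att('text:level');
        next unless defined $child_level && $child_level == $level;

        my $tag = $child->tag;
        return 'bullet'  if $tag eq 'text:list-level-style-bullet';
        return 'ordered' if $tag eq 'text:list-level-style-number';
    }

    # no bullet or number style found for this level
    return 'unknown';
}
```

With the variables from the question, the call might look like list_kind($me->{docContent}->getStyleElement($styleName, namespace => 'text', type => 'list-style')). The level argument is there because one list style can mix bullets and numbers across levels, so a nested list may need its own level passed in.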

Remember that a list style doesn't control the way the item texts are displayed. Each individual list item contains a paragraph which depends on a regular paragraph style. The paragraph style of an item can be reached using getItemList(), described in the O::O::Text man page.