|
Posted on Thu Aug 17 22:37:05 2006
by iaw4
|
| empty cell or col bug or feature of getTableText in 2.113 dist? |
|
It seems to me that getTableText has some strange behavior when it comes to empty cells/columns.
#!/usr/bin/perl -w
use strict;
use OpenOffice::OODoc;
my $doc= ooDocument(file => "test.ods");
$doc->{'field_separator'}= ",\t";
my $sheet= $doc->getTableText("Sheet1", 1000, 1000);
print $sheet;
and my ods test file contains
A    B    C    D    E    F
1    6         9         13
2              10
3                        15
4    7         12        16
5    8
(i.e., columns C and E are blank; 10 is in column D and 15 is in column F.)
The output of my program, however, is
Argument "1.15_02" isn't numeric in subroutine entry at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/File.pm line 16.
1, 6, , 9, , 13
2, , 10, ,
3, , , , 15
4, 7, , 12, , 16
5, 8,
|
|
|
Posted on Thu Aug 17 23:07:40 2006
by mlcohen
in response to 2789
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
If you replace:
my $sheet= $doc->getTableText("Sheet1", 1000, 1000);
with
my $normsheet = $doc->normalizeSheet("Sheet1", 10, 10);
my $sheet= $doc->getTableText(normsheet, 10, 10);
things will work. Check out the documentation on normalizeSheet. The short version is that StarOffice compresses tables, and decompressing them takes a long time, so rather than doing it fully for you, the module makes you specify how much to decompress. The main symptom of not decompressing, or 'normalizing', the sheet is that funny things happen to blank cells, especially several blank cells in a row. The key is to normalize as small an area as possible, because normalizing a large area really does take a long time. Try to know the number of rows and columns beforehand, if possible.
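To illustrate what normalizing does, here is a minimal sketch in plain Perl. It does not call OpenOffice::OODoc at all: the [value, repeat] pair representation and the name expand_row are assumptions made up for this example, standing in for the way a spreadsheet file stores a run of identical cells once, together with a repeat count.

```perl
use strict;
use warnings;

# Hypothetical model: each stored cell is [value, repeat_count].
# expand_row() turns the compressed form into one entry per column,
# stopping at $max_cols so a huge trailing run is never fully expanded.
sub expand_row {
    my ($stored, $max_cols) = @_;
    my @cells;
    for my $cell (@$stored) {
        my ($value, $repeat) = @$cell;
        for (1 .. $repeat) {
            return @cells if @cells >= $max_cols;
            push @cells, $value;
        }
    }
    return @cells;
}

# Row 2 of the test sheet: 2 in A, two blank cells, 10 in D,
# then a long run of blanks to the right edge of the sheet.
my @row2 = expand_row([[2, 1], ['', 2], [10, 1], ['', 252]], 6);
print join(', ', @row2), "\n";
```

Reading the compressed form cell by cell, without expanding the repeat counts, is what makes the two blank cells before the 10 look like a single one. Capping the expansion at a small width is the reason to keep the normalized area small.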
Hope that helps,
Matt
|
|
|
Posted on Sun Aug 20 05:06:55 2006
by iaw4
in response to 2790
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
hi matt: aha! thanks for the info. this is indeed exactly what I needed. quite a headscratcher.
is there a way to find out what the bottom right cell in a spreadsheet is? right now, I am using 1000,1000 simply because I don't know how to get the latter.
regards,
/ivo
|
|
|
Posted on Sun Aug 20 05:22:53 2006
by iaw4
in response to 2797
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
celebrated too early--the code fragment that first calls normalizeSheet and then getTableText fails:
$ perl ods2csv.pl test2.ods
Argument "1.15_02" isn't numeric in subroutine entry at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/File.pm line 16.
wrong condition 'table:(covered-|)table-cell' at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/Text.pm line 2220
|
|
|
Posted on Mon Aug 21 18:03:44 2006
by iaw4
in response to 2790
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
hi matt: not sure if you saw my response. I cannot get around an error message, no matter what I try (including your example, with the typo normsheet corrected to $normsheet):
wrong condition 'table:(covered-|)table-cell' at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/Text.pm line 2220
the docs
http://www.annocpan.org/~JMGDOC/OpenOffice-OODoc-2.027/OODoc/Text.pod
suggest giving
$doc = ooDocument(file => 'report.sxc');
my $sheet = $doc->normalizeSheet('Sheet1', 7, 9);
but this fails, too.
is this a bug or a feature?
regards, /iaw
|
|
|
Posted on Mon Aug 21 18:16:29 2006
by mlcohen
in response to 2805
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
I don't know what to say...I have your sample code exactly as you posted before:
#!/usr/bin/perl -w
use strict;
use OpenOffice::OODoc;
my $doc= ooDocument(file => "test.ods");
$doc->{'field_separator'}= ",\t";
my $normsheet = $doc->normalizeSheet("Sheet1", 10, 10);
my $sheet= $doc->getTableText($normsheet, 10, 10);
#my $sheet= $doc->getTableText("Sheet1", 1000, 1000);
print $sheet;
and it works perfectly for me. I have your original line commented out, and I get the same results as you had, and then I replace it with the two lines about normsheet and everything works. Did you try the simple test case again, with the new code? Or are you trying in your full ods2csv script? Try the test case again, I bet it will work, and the problem is elsewhere in the script. And don't worry, this package is a bit confusing to use at first, just like anything that's powerful...it will take a bit of time to figure out what makes it tick. And sometimes, there are bugs :)
-Matt
|
|
|
Posted on Wed Sep 6 04:37:48 2006
by phil
in response to 2806
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
using:
XML::Twig 3.26 (and 3.27 development)
OpenOffice::OODoc::Text 2.225
I see the same problem when calling normalizeSheet. It appears Twig doesn't like the expression being passed to it.
I needed to finish something yesterday, so I haven't really looked harder, but as an untested workaround, I did this to _expand_row() in OODoc's Text.pm (line 2220):
original:
    my @cells = $row->selectChildElements
        ('table:(covered-|)table-cell');
changed to:
    my @cells = $row->selectChildElements
        ('table:covered-table-cell');
    push(@cells, $row->selectChildElements
        ('table:table-cell'));
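One thing to watch with a two-pass selection like this: the covered cells end up ahead of the ordinary cells, so @cells is no longer in document order. A minimal sketch with plain strings standing in for element names shows the difference. It makes no OODoc or XML::Twig calls, the sample data is invented, and select_cells is a hypothetical helper.

```perl
use strict;
use warnings;

# Child element names of one row, in document order (illustrative data).
my @children = qw(
    table:table-cell
    table:covered-table-cell
    table:table-cell
);

# Hypothetical helper: a single pass that keeps the original order.
sub select_cells {
    return grep { /^table:(?:covered-)?table-cell$/ } @_;
}

# Two passes, as in the workaround: covered cells first, then the rest.
my @two_pass = (
    (grep { $_ eq 'table:covered-table-cell' } @children),
    (grep { $_ eq 'table:table-cell' } @children),
);

my @one_pass = select_cells(@children);
print "one pass: @one_pass\n";
print "two pass: @two_pass\n";
```

Whether the ordering matters depends on how the caller uses @cells; for anything that maps list position to column position, it probably does.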
matt, what version of twig are you running?
|
|
|
Posted on Wed Sep 6 12:49:08 2006
by phil
in response to 2924
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
p.s. the above hack is only a workaround if you are writing to cells. If you are reading cells, there are more places in Text.pm that use the expression:
('table:(covered-|)table-cell')
All of these places would need looking at, but the right thing is to figure out the real problem and let the appropriate module author know...I'll do that if nobody else does in the next few days. Info from matt on his twig version would help a lot.
|
|
|
Posted on Wed Sep 6 13:17:10 2006
by bernos
in response to 2928
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
Hi,
I have the same problem. This simple code:
use OpenOffice::OODoc;
my $archive = ooFile('chart1.sxc') or die "Cannot open input file\n";
my $content = ooDocument(archive => $archive) or die "Cannot extract content from input file\n";
my $table = $content->getTable(0,10,2);
produces this error:
wrong condition 'table:(covered-|)table-cell' at /usr/lib/perl5/site_perl/5.8.5/OpenOffice/OODoc/Text.pm line 2220
I'm using OpenOffice-OODoc-2.027 and XML-Twig-3.26
|
|
|
Posted on Wed Sep 6 15:35:21 2006
by jmgdoc
in response to 2929
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
The bug has been pinpointed in XPath.pm 2.017, so Text.pm should not be patched, and the XML::Twig version (3.22 or later) doesn't matter.
The fix will be done in the next O::O release. However, in the meantime, the existing XPath.pm should be manually replaced by a provisional one, which is now available at
http://jean.marie.gouarne.online.fr/tech/oodoc/XPath.pm
|
|
|
Posted on Wed Sep 6 17:28:57 2006
by mlcohen
in response to 2928
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
I'm using OpenOffice-OODoc 2.026, with XML-Twig 1.303, it seems. At least, that's what I assume this line at the top of Twig.pm means:
# $Id: Twig_pm.slow,v 1.303 2006/05/26 08:07:14 mrodrigu Exp $
Seems like mine is a lot older than what the rest of you are using. I guess I should talk to my IT guys about upgrading it.
-Matt
|
|
|
Posted on Wed Sep 6 17:54:43 2006
by jmgdoc
in response to 2931
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
Don't worry about this Twig_pm.slow heading comment; it doesn't indicate the real Twig.pm version number.
Look at the $VERSION variable in the BEGIN block (near line 90) in Twig.pm.
Note that you could not run OpenOffice::OODoc 2.026 without XML::Twig 3.22 or later.
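Rather than opening Twig.pm by hand, the installed version can also be read from Perl itself. The following is a minimal sketch; module_version is a hypothetical helper name, and it relies only on the standard require and the VERSION method that every Perl package inherits.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return its $VERSION,
# or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; $module->VERSION };
}

for my $module (qw(XML::Twig OpenOffice::OODoc)) {
    my $version = module_version($module);
    printf "%s: %s\n", $module, defined $version ? $version : 'not installed';
}
```

Call module_version in scalar context, as above, so that a failed load yields undef.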
|
|
|
Posted on Wed Sep 6 18:09:29 2006
by mlcohen
in response to 2932
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
In that case, I have version 3.26.
|
|
|
Posted on Tue Sep 12 00:16:22 2006
by jmgdoc
in response to 2930
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
OpenOffice::OODoc 2.028 has been posted today.
This release fixes a bug affecting some table-related access methods (previously reported in this forum).
In addition, the OpenOffice::OODoc::Text manual section has been updated in order to describe the text box related methods (available but not documented in 2.027).
|
|
|
Posted on Sat Oct 14 02:22:56 2006
by cgrauer
in response to 2998
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
I installed 2.028 but I still experience this problem. getTableText merges empty cells (not all of them, only some; I found no pattern), even on normalized tables (the results differ between normalized and non-normalized tables, but both are wrong).
|
|
|
Posted on Sun Oct 15 02:17:04 2006
by cgrauer
in response to 3257
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
I hope this forum is the right place for the following. I got to know OODoc only two days ago and I'm unfamiliar with its developer community etc. And I apologize in advance for my bad English, as I'm not a native speaker.
I spent part of my weekend examining the source code of OpenOffice::OODoc::Text to understand how it works and why I experienced the problems mentioned above. In case anyone is interested in the solution, I post it here:
My problem was this: I use the function getTableText() to 'export' data from all the tables in a spreadsheet to csv files. I don't know in advance how many tables the spreadsheet contains, nor their names and sizes. This is how I read the content from all the tables and store it in a structure of hashes and arrays:
sub y_getTableData {
    # returns a ref on a list containing hashes with tablename, table number
    # and the content in text format (csv)
    my $ood=shift; # ods document object
    my ( @array );
    my $count = scalar(($ood->getTableList())) - 1;
    for my $n (0 .. $count) {
        my ( %hash );
        $hash{'number'} = $n;
        $hash{'text'} = $ood->getTableText($n);
        $hash{'name'} = $ood->getAttribute($ood->getTable($n),'table:name');
        push ( @array, { ( %hash ) } );
    }
    return [ @array ];
}
What I did not know is that OODoc by default normalizes only 32 rows and 26 columns (cf. the options 'max_cols' and 'max_rows'). This caused the errors in my csv files, because my tables contain up to 2100 rows and could contain even more in the future. So I first changed the option 'max_rows' to 65536. This produced correct csv files, but only after several cups of coffee! So I thought of first counting the rows of the sheet and then calling getTableText with the parameters 'width' and 'length' to make OODoc normalize only the given size. But this is not possible, because the sheet is already normalized by the time I count the rows (in other words: to count the rows, the table has to be normalized first).
So I decided to patch OODoc::Text, telling it to stop normalization as soon as it finds a row whose ROW_REPEAT_ATTRIBUTE (table:number-rows-repeated in the XML file) is equal to or greater than the remaining number of rows to normalize (i.e. the parameter 'length' or the option 'max_rows', minus the number of rows already processed, minus 2), assuming that this row stands for the empty rows up to the end of the sheet. As I found in the XML code of the sheet, OpenOffice leaves the last row as a separate row with no repeat attribute. This is why I decreased the value by 2.
Now I can change the option 'max_rows' to 65536 (and 'max_cols' to 256) to be sure that every table will be normalized properly, irrespective of its size, and in a reasonable time. It may only cause problems if row number 65536 (i.e. the last row!) is not empty, or if there are repeated non-empty rows up to the end of the sheet.
These are the changes I made on OpenOffice::OODoc::Text:
Changing line 2350 to:
while ($rep > 1 && ($rownum < $length) && !$skip )
Changing line 2359 to:
if ( ( $rownum < $length ) && !$skip )
Inserting a new line after line 2349 with:
$skip = $rep > ( $length - $rownum - 2 ) ? 1 : 0;
Inserting a new line after line 2339 with:
my $skip = 0;
That's all. Will this affect the module's functions in any way? I can't imagine that it would. Perhaps it is even a suggestion for the developers of OODoc. I'm not sure, but I think csv export of spreadsheets could be a common task for OODoc.
|
|
|
Posted on Sun Oct 15 16:51:57 2006
by jmgdoc
in response to 3258
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
Hopefully, you should not need to patch O::O::Text in order to change the size of the area to be normalized. In order to control this size, you can provide getTable() with the appropriate values as optional arguments. Example:
my $table = $doc->getTable($tablename, $height, $width);
my $text = $doc->getTableText($table);
When called with optional size arguments, getTable() automatically calls normalizeSheet() with the given values (and the default size is ignored).
In order to make the table safer for getTableText(), you could, *after* the code above, delete every possible extra row. Example:
my ($h, $w) = $doc->getTableSize($table);
for (my $i = $height ; $i < $h ; $i++)
{
$doc->deleteRow($table, $i);
}
In addition, it's possible to remove the extra cells (if any) in each row in the normalized area, according to the difference between the normalized width ($width) and the external width ($w) returned by getTableSize().
|
|
|
Posted on Sun Oct 15 23:28:29 2006
by cgrauer
in response to 3261
|
| Re: empty cell or col bug or feature of getTableText in 2.113 dist? |
|
I know getTable($tablename, $height, $width) - but what if I don't know the height and width (as I described yesterday)?
And as for getTableSize: the problem is not to "get" the empty rows (getTableText truncates empty rows at the end anyway); it is that normalization takes a long time if you normalize 65536 rows.
Btw, I found that my solution from yesterday does not work properly. It was a bit rash to post it, sorry. But I already have a new one ;-) I don't like to patch modules either, but I don't know another solution. So this is what I have now done to O::O::Text:
After line 2316 I insert:
#-----------------------------------------------------------------------------
# checks whether a row contains data in any cell (result: 0) or not (result: 1)
sub _is_empty_row {
    my $self = shift;
    my $row = shift;
    my $cell_values = join('', $self->_get_row_content( $row ) );
    if ( $cell_values =~ /\w/si ) {
        return 0;
    } else {
        return 1;
    }
}
After line 2339 (now 2355) I insert:
if ( $self->{'truncate_empty_cells'} ) {
    # the @rows check stops the loop if every row is empty
    while ( @rows && $self->_is_empty_row( $rows[$#rows] ) ) {
        pop( @rows );
    }
}
I change line 2359 (now 2382) to:
if ( ( $rownum < $length ) && (not $self->{'truncate_empty_cells'}) )
This assumes that the global option 'truncate_empty_cells' is set to '1'.
This takes time as well, but not as much as normalizing the whole table. And unlike yesterday's solution, it works ;-)
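The idea of dropping empty rows from the end can be tried outside the module. The sketch below is plain Perl on arrays of cell values, not the OODoc internals: is_empty_row and trim_trailing_empty_rows are hypothetical names, and the sample rows are made up.

```perl
use strict;
use warnings;

# A row counts as empty when no cell contains a word character,
# mirroring the /\w/ test used in the patch.
sub is_empty_row {
    my ($row) = @_;
    return join('', map { defined $_ ? $_ : '' } @$row) !~ /\w/;
}

# Drop empty rows from the end only; the @rows check ends the loop
# when every row turns out to be empty.
sub trim_trailing_empty_rows {
    my @rows = @_;
    pop @rows while @rows && is_empty_row($rows[-1]);
    return @rows;
}

my @rows = ([1, 6], ['', ''], [5, 8], ['', ''], ['', '']);
my @kept = trim_trailing_empty_rows(@rows);
print scalar(@kept), " rows kept\n";
```

Note that a blank row in the middle of the data is kept; only the trailing run is removed. A cell holding nothing but punctuation would also count as empty under the /\w/ test, which may or may not be what is wanted.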
|
|