Spreadsheet-ParseExcel - memory allocation and CPU usage

Posted on Mon Jun 11 11:07:01 2007 by luna

Hi, I'm using Perl 5.8.7 on Windows XP, on a machine with a 1.8 GHz processor.

I've written a script to parse a spreadsheet of 3300 rows by 25 columns. Every cell is a text string between 3 and 25 characters long, and the file is 2.5 MB.

I've tried two methods, and both seem to use an inordinate amount of memory and CPU time.

When I parse the spreadsheet with method one (see below), CPU usage peaks at 100% for 6-10 seconds and RAM usage is about 55 MB.

In method two I've added a cell handler function and, following John McNamara's recommendation, told Spreadsheet::ParseExcel not to store the parsed cells (the NotSetCell option). In this configuration CPU usage peaks at 100% for up to 30 seconds and RAM usage is about 21 MB.
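With NotSetCell set, the cell handler is the only place cell data is kept, so its logic can be checked on its own. The sketch below is a hedged illustration, not the module's own code: it calls a handler with the same argument order as the post's cell_handler ($workbook, $sheet_index, $row, $col, $cell) directly, using a hypothetical StubBook class in place of a real workbook so no .xls file is needed. It passes a true value (1) to ParseAbort, on the assumption that a true argument is what asks the parser to stop.

```perl
use strict;
use warnings;

my @data;    # rows x columns of raw values from the first sheet

# Same callback shape as the post's handler:
# ($workbook, $sheet_index, $row, $col, $cell)
sub cell_handler {
    my ($workbook, $sheet_index, $row, $col, $cell) = @_;
    if ($sheet_index > 0) {
        $workbook->ParseAbort(1);    # assumed: a true value requests the abort
        return;
    }
    $data[$row][$col] = $cell->{Val};
}

# Hypothetical stub standing in for a Spreadsheet::ParseExcel workbook;
# it only records what ParseAbort was called with.
package StubBook;
sub new        { return bless { abort => 0 }, shift }
sub ParseAbort { my ($self, $v) = @_; $self->{abort} = $v }

package main;

my $book = StubBook->new;
cell_handler($book, 0, 2, 3, { Val => 'abc' });        # first sheet: stored
cell_handler($book, 1, 0, 0, { Val => 'ignored' });    # second sheet: abort
print "$data[2][3] abort=$book->{abort}\n";            # prints "abc abort=1"
```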

If method two is used without the cell handler function, the file parses faster but RAM usage stays the same. The faster parse is not surprising, since without the handler the script doesn't actually process any of the spreadsheet's data.
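To compare the three configurations described above on equal footing, one option is to time each parse from inside Perl. This is a minimal sketch, assuming the file is test.xls as in the post; time_it and the configuration labels are hypothetical names. It uses the core Time::HiRes module and measures elapsed wall-clock time only, not memory, and it skips the comparison when the file is absent.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code ref once and return the elapsed wall-clock time in seconds.
sub time_it {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval($t0);
}

my $file = 'test.xls';    # file name taken from the post

# Only attempt the comparison when the workbook is present.
if (-e $file) {
    require Spreadsheet::ParseExcel;

    my @data;
    my %configs = (
        '1: default Parse' => sub {
            Spreadsheet::ParseExcel->new->Parse($file);
        },
        '2: CellHandler + NotSetCell' => sub {
            Spreadsheet::ParseExcel->new(
                NotSetCell  => 1,
                CellHandler => sub {
                    my ($book, $sheet, $row, $col, $cell) = @_;
                    $data[$row][$col] = $cell->{Val} if $sheet == 0;
                },
            )->Parse($file);
        },
        '3: NotSetCell, no handler' => sub {
            Spreadsheet::ParseExcel->new(NotSetCell => 1)->Parse($file);
        },
    );

    for my $label (sort keys %configs) {
        printf "%-30s %.2f s\n", $label, time_it($configs{$label});
    }
}
```

RAM usage still has to be watched externally (for example in Task Manager), since this only reports time.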

I've been searching for a way to reduce both the parsing time and the memory usage. If anyone has other techniques or comments to share, I'd appreciate it. So far the only viable option seems to be pre-editing the spreadsheet before parsing it, which makes the script considerably less valuable.

Thanks

philc

```perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;

#--------------------------
# Method 1: parse the whole workbook into memory
#--------------------------
my $parser1 = Spreadsheet::ParseExcel->new;
my $book1   = $parser1->Parse('test.xls');
# CPU usage = 100% for 6-10 seconds
# memory usage = 55 MB

#--------------------------
# Method 2: cell handler, cells not stored in the workbook
#--------------------------
my @data;    # rows x columns of raw values from the first sheet

my $parser2 = Spreadsheet::ParseExcel->new(
    CellHandler => \&cell_handler,
    NotSetCell  => 1,
);

sub cell_handler {
    my ($workbook, $sheet_index, $row, $col, $cell) = @_;

    if ($sheet_index > 0) {
        # Only the first sheet is needed: ask the parser to stop.
        # A true value requests the abort; 0 would leave it off.
        $workbook->ParseAbort(1);
    }
    else {
        $data[$row][$col] = $cell->{Val};
    }
}

# Parse the file
my $book2 = $parser2->Parse('test.xls');
# CPU usage = 100% for 30 seconds
# memory usage = 21 MB
#--------------------------
```