Thread

Posted on Tue Nov 20 00:11:36 2007 by jmccarrell
significant OIO overhead with thousands of objects
Hey folks.

Here at Yahoo! we write a fair amount of perl code, and I've been working on a moderately sized project that is near to shipping. The main process of the project uses OIO for a couple of its foundation working classes, and that process is showing performance issues. So I did some profiling with Devel::Profile and found some interesting results.

The profiled test code in question creates 50,001 OIO objects by parsing 50,001 lines of text, returning an OIO object for each line.

The top 5 results reported by Devel::Profile were:
time elapsed (wall): 1524.4849 time running program: 1490.9677 (97.80%) time profiling (est.): 33.5172 (2.20%) number of calls: 6252142 %Time Sec. #calls sec/call F name 91.89 1370.0146 100002 0.013700 Object::InsideOut::_ID 2.44 36.4517 150003 0.000243 <anon>:.../Yggdrasil/Classifier.pm:146 1.08 16.0312 1300033 0.000012 Object::InsideOut::Util::hash_re 0.63 9.3990 300006 0.000031 Object::InsideOut::set 0.47 7.0407 50001 0.000141 Yahoo::Yggdrasil::Classifier::parse_entity

At first, I didn't believe these results, so I ran another experiment that removed OIO from the equation. My Classifier::parse_entity() method parses the line, builds up a hash of parameters for the constructor, and returns an object as the last line by calling

return $class->new($rh_ent);

(this call to new is on line Classifer.pm:146, which shows up as the second line in the profiling output). So I removed OIO from the equation by commenting out the call to new, and ran the experiment again. The top 5 results were:

time elapsed (wall): 36.3511 time running program: 28.6448 (78.80%) time profiling (est.): 7.7063 (21.20%) number of calls: 1451844 %Time Sec. #calls sec/call F name 23.96 6.8634 50001 0.000137 Yahoo::Yggdrasil::Classifier::parse_entity 14.51 4.1571 50001 0.000083 Yahoo::Yggdrasil::Classifier::_parse_rule 8.75 2.5075 50001 0.000050 Regexp::Common::_decache 8.74 2.5042 50001 0.000050 Yahoo::Yggdrasil::Classifier::_parse_line 6.79 1.9455 400004 0.000005 Yahoo::Yggdrasil::Classifier::_decode_entity_part

N.B. that the total run time went from 1,524 seconds to 36 seconds.

I still didn't believe it, so I ran the same experiment without the profiler involved, just measuring the wall clock time with and without the call to new(). The results were 5 minutes 2 seconds with OIO, and 9.8 seconds without OIO.

Additionally, I observed that a significant portion of the test time appeared to be consumed during the DESTROY phase, deleting the 50,001 objects.

I conclude that there is a significant performance penalty using OIO when there are thousands of objects being created and destroyed.

First of all, are these surprising results to OIO gurus?

Second, what can be done to minimize the impact of id handling in OIO? I've read the advice in the "Special Processing" section of the OIO pod; I guess I am not too anxious to go hacking around with object identity issues in the absence of some guidance from those in the know.

At this point, I am looking at ways to remove OIO from being invoked thousands of times, and I'm considering doing an experiment with Class::Std to see if it has this kind of performance issue with tens of thousands of objects.

Any guidance greatly appreciated.

Direct Responses: 6484 | 6487 | 6506 | Write a response
Posted on Tue Nov 20 19:35:50 2007 by jmccarrell in response to 6475
Re: significant OIO overhead with thousands of objects
N.B. These results were obtained with version 3.17 of OIO. I note that v 3.21 claims to provide "significant performance improvements for object creation". Hopefully, simply moving to 3.21 will improve these results.
Direct Responses: 6485 | Write a response
Posted on Tue Nov 20 19:38:57 2007 by jdhedden in response to 6484
Re: significant OIO overhead with thousands of objects
Be sure to use the latest version: 3.33
Direct Responses: 6528 | Write a response
Posted on Tue Nov 20 23:09:33 2007 by jdhedden in response to 6475
Re: significant OIO overhead with thousands of objects
Well, I am not entirely surprised by the results. Object creation and destruction in OIO is expensive because of considerations for thread safety and reclamation of object IDs. (The latter keeps the arrays that store object data from continually growing when there is a lot of object turnover.) I'm hoping your tests using the lastest OIO version be better.

You mentioned Class::Std. My own testing shows it to be on a par with OIO as afar as object creation, but OIO is faster for data access. Because of the nature of inside-out objects, both are slower by an order of magnitude compared to standard blessed hashes.

One drawback to Class::Std that bears mentioning is that it doesn't support threads.

If your application can accommodate it, you could try a caching scheme to recycle old objects. Create an array to 'push' old objects onto, create a method to reinitialize an old object with new data, and use it like this:
# Reuse an old object if (my $obj = pop(@recycle_bin)) { return $obj->reinit($rh_ent); } # Create a new object return $class->new($rh_ent);
If you do feel compelled to switch, you might give consideration to Class::InsideOut. However, I can't speak to its performance.
Write a response
Posted on Thu Nov 22 22:02:04 2007 by shmem in response to 6475
Re: significant OIO overhead with thousands of objects

As an immediate relieve, check out Anno Siegel's Alter package which provides Inside-Out functionality at a bit more than no cost.

The OIO engine is quite heavy, so I'm not surprised by the overhead per se, but by the amount. I made tests with a 50.001 lines file, each line consisting of an int and 52 more fields, as (a..z) alternated with random words from/usr/share/dict/words.

perl -nle 'chop && push@w, $_ if !($.%10) } \ { $w=@w;print join("\t",$_,map{$_,$w[rand$w-1]}a..z) for 0..50000' \ /usr/share/dict/words > text

Well, not 50.001, but 49850, since some values in the a..z hash are undefined. Anyways...

Built OIO-object from each line, then Alter-objects:

perl -MWords -nle 'use Time::HiRes qw(gettimeofday tv_interval);BEGIN{$tz=[gettimeofday]} @l=split/ +\t/,$_;push @o,Words->new("uid",@l) if @l == 53}{print tv_interval($tz,[gettimeofday])' text 29.586015 perl -MWords2 -nle 'use Time::HiRes qw(gettimeofday tv_interval);BEGIN{$tz=[gettimeofday]} @l=split +/\t/,$_;push @o,Words2->new("uid",@l) if @l == 53}{print tv_interval($tz,[gettimeofday])' text 7.258618

Some improvement. I don't know what your constructor does, here are the two packages I used for the test:

package Words; use Object::InsideOut; my @uid :Field :Accessor(uid); my @a :Field :Accessor(a); my @b :Field :Accessor(b); my @c :Field :Accessor(c); my @d :Field :Accessor(d); my @e :Field :Accessor(e); my @f :Field :Accessor(f); my @g :Field :Accessor(g); my @h :Field :Accessor(h); my @i :Field :Accessor(i); my @j :Field :Accessor(j); my @k :Field :Accessor(k); my @l :Field :Accessor(l); my @m :Field :Accessor(m); my @n :Field :Accessor(n); my @o :Field :Accessor(o); my @p :Field :Accessor(p); my @q :Field :Accessor(q); my @r :Field :Accessor(r); my @s :Field :Accessor(s); my @t :Field :Accessor(t); my @u :Field :Accessor(u); my @v :Field :Accessor(v); my @w :Field :Accessor(w); my @x :Field :Accessor(x); my @y :Field :Accessor(y); my @z :Field :Accessor(z); my %init_args :InitArgs = ( map { $_, { Mandatory => 1 } } a..z ); sub init :Init { my ($self, $args) = @_; #no strict 'refs'; $self->$_($args->{$_}) for keys %$args; }; package Words2; use Alter ego => {}; sub new { my $class = shift; my $self = bless( \ my $o, $class); my $href = ego( $self); %$href = @_; $self; } # retrieve values sub AUTOLOAD { my $self = shift; (my $key = $AUTOLOAD) =~ s/.*:://; ego( $self) -> { $key}; }; 1;

What perl version do you have? what version of Scalar::Utils? Plain perl or XS version?

Write a response
Posted on Mon Nov 26 19:43:18 2007 by jmccarrell in response to 6485
Re: significant OIO overhead with thousands of objects
Ok, I re-ran my performance test after updating from version 3.17 to 3.33 of OIO. The results show an increase of a few percent; elapsed time decreased from 302 seconds to 297 seconds.
Direct Responses: 6541 | Write a response
Posted on Tue Nov 27 23:27:10 2007 by jdhedden in response to 6528
Re: significant OIO overhead with thousands of objects
It may be that your code is using some feature (e.g., regexes for argument names) that is not necessary, or any of a number of other causes. If you could send me directly an example of the code that you feel is problematic, I might be able to assist you further. Thanks.
Direct Responses: 6542 | Write a response
Posted on Wed Nov 28 01:25:17 2007 by jmccarrell in response to 6541
Re: significant OIO overhead with thousands of objects
Ok, Jerry now has a tarball to reproduce the results I've been seeing...
Direct Responses: 6543 | Write a response
Posted on Wed Nov 28 03:35:30 2007 by jdhedden in response to 6542
Re: significant OIO overhead with thousands of objects
Yowza! I found the problem. I was putting reclaimed IDs into an array, and then checking each new reclaimed ID against the contents of that array. This is an O(n^2) algorithm - very bad. I have changed the code to use a hash, reducing this check to an O(1) algorithm.

Give OIO 3.34 a try. I'm sure you'll be much happier with the preformance. Thanks for working with me on this.
Direct Responses: 6544 | Write a response
Posted on Wed Nov 28 04:42:49 2007 by jmccarrell in response to 6543
Re: significant OIO overhead with thousands of objects

Ok, this is confirmed. The elapsed time of my test dramatically improved from 302 seconds for version 3.17 to 24.3 seconds for 3.34.

I can't wait to see 3.34 at my local CPAN mirror.

Thanks Jerry!

Direct Responses: 7136 | Write a response
Posted on Tue Feb 19 19:15:44 2008 by arguile in response to 6544
Re: significant OIO overhead with thousands of objects

I'm actually seeing drastically decreased performance (ie. it's slower) when moving from 3.33 to 3.34 or greater when the perl is persistant. While the initial DESTROY phase is a lot faster it seems to be at the expense of later object creation.

To demonstrate this I stored a large (tens of thousands of nodes for a graph structure) result set from our system in a Cache::FileCache. In the test script I retrieve the structure ('GET'), undef it ('CLEAN'), then do the same again.

Version 3.33
Init Cache: 0.000522137 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) Retrival attempt: 0 : GET: 3.16013 wallclock secs ( 2.91 usr + 0.04 sys = 2.95 CPU) CLEAN: 32.3321 wallclock secs (30.80 usr + 0.01 sys = 30.81 CPU) 1 : GET: 3.18668 wallclock secs ( 2.98 usr + 0.01 sys = 2.99 CPU) CLEAN: 33.8066 wallclock secs (31.85 usr + 0.03 sys = 31.88 CPU) 2 : GET: 3.33551 wallclock secs ( 3.09 usr + 0.02 sys = 3.11 CPU) CLEAN: 34.509 wallclock secs (31.89 usr + 0.02 sys = 31.91 CPU)
Version 3.34
Init Cache: 0.000521898 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) Retrival attempt: 0 : GET: 3.04257 wallclock secs ( 2.90 usr + 0.04 sys = 2.94 CPU) CLEAN: 0.754264 wallclock secs ( 0.72 usr + 0.00 sys = 0.72 CPU) 1 : GET: 91.0177 wallclock secs (85.86 usr + 0.04 sys = 85.90 CPU) CLEAN: 0.805352 wallclock secs ( 0.73 usr + 0.00 sys = 0.73 CPU) 2 : GET: 81.2487 wallclock secs (76.66 usr + 0.05 sys = 76.71 CPU) CLEAN: 0.793248 wallclock secs ( 0.75 usr + 0.00 sys = 0.75 CPU)

As you can see version 3.33 had a very very slow destruction but the runs were predictable. In 3.34 and greater (tested to current at 3.36) the destruction is much faster but at the expense of predictability and much slower instantiation of subsequent objects and an overall greater net time.

Devel::DProf segfaults and Devel::Profiler currently produces and invalid tmon.out for some reason, so I haven't yet profiled this further. A sample dataset and the test script are available if it's needed.

Direct Responses: 7137 | Write a response
Posted on Tue Feb 19 20:13:29 2008 by arguile in response to 7136
Re: significant OIO overhead with thousands of objects

I determined the issue was in the ID reclamation with the line:

if ((my $id) = keys(%{$$reuse{$tree}[$thread_id]})) {

For a large number of objects this builds the entire array of IDs and grabs the first. If it's instead changed to:

keys(%{$$reuse{$tree}[$thread_id]}); # Reset hash iterator. if ((my $id) = each %{$$reuse{$tree}[$thread_id]}) {

We see the exact same IDs (each() gives keys/values in the same order as keys()) but without any of the list building overhead. This changes the profiling in the previous post to:

Init Cache: 0.000521898 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) Retrival attempt: 0 : GET: 3.04518 wallclock secs ( 2.88 usr + 0.05 sys = 2.93 CPU) CLEAN: 0.791305 wallclock secs ( 0.72 usr + 0.00 sys = 0.72 CPU) 1 : GET: 3.47432 wallclock secs ( 3.31 usr + 0.01 sys = 3.32 CPU) CLEAN: 0.738527 wallclock secs ( 0.73 usr + 0.00 sys = 0.73 CPU) 2 : GET: 3.59346 wallclock secs ( 3.36 usr + 0.02 sys = 3.38 CPU) CLEAN: 0.755293 wallclock secs ( 0.76 usr + 0.00 sys = 0.76 CPU)

The diff:

683c683,685 < if ((my $id) = keys(%{$$reuse{$tree}[$thread_id]})) { --- > keys(%{$$reuse{$tree}[$thread_id]}); # Reset hash iterator. > > if ((my $id) = each %{$$reuse{$tree}[$thread_id]}) {
Direct Responses: 7138 | Write a response
Posted on Tue Feb 19 20:32:08 2008 by jdhedden in response to 7137
Re: significant OIO overhead with thousands of objects
Brilliant. Thanks for the patch. New version forthcoming.
Write a response