Thread

Posted on Fri Aug 12 00:02:11 2005 by cmv
Bug in Channel.pm when stdin length is just right
Folks-
I believe I've uncovered a bug when I try to do:
$ssh->cmd($cmd, $stdin);
...and the length of $stdin is just right.
I've narrowed it down to the process_outgoing() routine in Channel.pm, but didn't follow it any further. Here is the debug output (to which, I've added some additional debugging statements):
... nimrod.ih.lucent.com: Login completed, opening dummy shell channel. nimrod.ih.lucent.com: channel 0: new [client-session] nimrod.ih.lucent.com: Requesting channel_open for channel 0. nimrod.ih.lucent.com: channel 0: open confirm rwindow 0 rmax 32768 nimrod.ih.lucent.com: Got channel open confirmation, requesting shell. nimrod.ih.lucent.com: Requesting service shell on channel 0. nimrod.ih.lucent.com: channel 1: new [client-session] nimrod.ih.lucent.com: Requesting channel_open for channel 1. nimrod.ih.lucent.com: Entering interactive session. nimrod.ih.lucent.com: Sending command: /opt/nwstools/bin/predecode nimrod.ih.lucent.com: Requesting service exec on channel 1. nimrod.ih.lucent.com: Checking stdin... nimrod.ih.lucent.com: Draining stdin... nimrod.ih.lucent.com: channel 1: drain_outgoing - length=32769 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_window=0 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_maxpacket=32768 nimrod.ih.lucent.com: channel 1: process_outgoing - length=32769 len=0 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_window=0 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_maxpacket=32768 nimrod.ih.lucent.com: channel 1: process_outgoing - length=32769 len=0 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_window=131072 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_maxpacket=32768 nimrod.ih.lucent.com: channel 1: process_outgoing - length=32769 len=32768 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_window=98304 nimrod.ih.lucent.com: channel 1: process_outgoing - remote_maxpacket=32768 nimrod.ih.lucent.com: channel 1: process_outgoing - length=1 len=1 nimrod.ih.lucent.com: channel 1: window 12173 sent adjust 20595

As you can see, the length of the stdin string was just one over the remote_maxpacket. If the stdin string is under remote_maxpacket, it all works fine. You can also make it work fine with particular values over remote_maxpacket, but I haven't been able to pin down a pattern.
Any help is appreciated.
Thanks
-Craig
Direct Responses: 892 | Write a response
Posted on Mon Aug 15 17:06:00 2005 by cmv in response to 873
Re: Bug in Channel.pm when stdin length is just right
I've found the bug, and have fixed it as described below. I don't think this is the best way to fix it, but it did prove my theory on how the bug was occuring. The problem occurs due to the interaction between drain_outgoing (in Channel.pm) and client_loop in (SSH2.pm). If your $stdout is bigger than remote_maxpacket, drain_outgoing tries to repeatedly call client_loop until the length is reduced to zero. The problem is that client_loop will not return back to drain_outgoing, once it is called from inside the while loop the second time. MY FIX: I reasoned that if I could force the client_loop to execute one-and-only-one-time, for each time it was called from the drain_outgoing while loop, there would be no problem. To test this, I did the following:
1.) drain_outgoing: added the following line before the while loop
$c->{ssh}->{DoOneLoop}=1;
2.) drain_outgoing: added the following line after the while loop
undef($c->{ssh}->{DoOneLoop});
3.) client_loop: added the following line just before the end of the first while loop
last if($ssh->{DoOneLoop});
This seems to have fixed it for all scenarios that I've tested.
Please let me know if there is a better way to fix this, and how we can get the fix officially into the module.
-Craig
Direct Responses: 970 | Write a response
Posted on Thu Sep 8 01:35:34 2005 by jongorama in response to 892
Re: Bug in Channel.pm when stdin length is just right
There are two scenarios where this is not a complete fix:
1. when the remote peer is running OpenSSH 3.7.1p3
2. when the remote peer is running Sun's deployed SSH identified by 'SSH-1.99-Sun_SSH_1.1'

In those two cases, it appears the $select_class->select() call in Net::SSH::Perl::SSH2::client_loop blocks indefinitely waiting for input (when the length of $stdin in Net::SSH::Perl::SSH2::cmd() is larger than 32678). Strangely enough, input onto STDIN of the running tty is put into $stdin (not exactly sure how *that* happens, but oh well) and then sent to the remote peer.

To get around this, we've implemented the following:
diff /opt/rcs/lib/Net/SSH/Perl/Channel.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/Channel.pm 191a192,193 > ## COVD FIX: > $c->{ssh}->{DoOneLoop} = 10; 195a198,200 > ## COVD FIX: (yes, I know it's redundant) > undef( $c->{ssh}->{DoOneLoop} ); > delete $c->{ssh}->{DoOneLoop}; diff /opt/rcs/lib/Net/SSH/Perl/Constants.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/Constants.pm 138c138 < 'MAX_PACKET_SIZE' => 256000, --- > 'MAX_PACKET_SIZE' => 8192, diff /opt/rcs/lib/Net/SSH/Perl/SSH2.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/SSH2.pm 302c303,306 < my($rready, $wready) = $select_class->select($rb, $wb); --- > ## COVD FIX: > $ssh->debug("Instantiating a select with $ssh->{DoOneLoop} second timeout.") > if (exists $ssh->{DoOneLoop} && $ssh->{DoOneLoop} > 0); > my($rready, $wready) = $select_class->select($rb, $wb, undef, ($ssh->{DoOneLoop} || undef +)); 313a318,319 > ## COVD FIX: > last if ($ssh->{DoOneLoop});

Note that we've also adjusted the MAX_PACKET_SIZE contstant. The value that was there seemed to increase the timeouts when there was no real good reason to do so. A smaller value seems to work just fine. Files of 128MB have been successfully transferred via $stdin to cmd() with the above fixes to various implementations of SSH (SSH-1.99-OpenSSH_3.7.1p2, SSH-1.99-Sun_SSH_1.0.1, and SSH-1.99-Sun_SSH_1.1).
cmv, thanks for the pointer initially.
-gilbert.
Direct Responses: 1889 | Write a response
Posted on Thu Mar 2 03:29:31 2006 by jongorama in response to 970
Re: Bug in Channel.pm when stdin length is just right
As a quick followup to this - the diffs posted are ambiguous what with the line changes and such. Below you can find a context diff to help you patch Channels.pm, Constants.pm, and SSH2.pm (all from the 1.29 version).
$ diff -c b/Channel.pm ./Channel.pm *** b/Channel.pm 2003-12-03 07:35:21.000000000 -0800 --- ./Channel.pm 2006-02-28 22:00:43.000000000 -0800 *************** *** 189,198 **** --- 189,201 ---- $c->register_handler(SSH2_MSG_CHANNEL_WINDOW_ADJUST, sub { $_[0]->{ssh}->break_client_loop }); + $c->{ssh}->{DoOneLoop} = 10; while ($c->{input}->length) { $c->process_outgoing; $c->{ssh}->client_loop if $c->{input}->length; } + undef( $c->{ssh}->{DoOneLoop} ); + delete $c->{ssh}->{DoOneLoop}; $c->drop_handler(SSH2_MSG_CHANNEL_WINDOW_ADJUST); $c->{ssh}->restore_client_loop; } $ diff -c b/SSH2.pm ./SSH2.pm *** b/SSH2.pm 2005-10-08 14:04:24.000000000 -0700 --- ./SSH2.pm 2006-03-01 17:05:07.340954072 -0800 *************** *** 301,307 **** my $oc = grep { defined } @{ $cmgr->{channels} }; last unless $oc > 1; ! my($rready, $wready) = $select_class->select($rb, $wb); $cmgr->process_input_packets($rready, $wready); for my $a (@$rready) { --- 301,307 ---- my $oc = grep { defined } @{ $cmgr->{channels} }; last unless $oc > 1; ! my($rready, $wready) = $select_class->select($rb, $wb, undef, ($ssh->{DoOneLoop} || undef ) +); $cmgr->process_input_packets($rready, $wready); for my $a (@$rready) { *************** *** 313,318 **** --- 313,319 ---- $ssh->incoming_data->append($buf); } } + last if ($ssh->{DoOneLoop}); } } $ diff -c b/Constants.pm ./Constants.pm *** b/Constants.pm 2001-07-03 00:04:52.000000000 -0700 --- ./Constants.pm 2006-02-28 22:05:34.000000000 -0800 *************** *** 135,141 **** 'PRIVATE_KEY_ID_STRING' => "SSH PRIVATE KEY FILE FORMAT 1.1\n", ! 'MAX_PACKET_SIZE' => 256000, ); use vars qw( %TAGS ); --- 135,142 ---- 'PRIVATE_KEY_ID_STRING' => "SSH PRIVATE KEY FILE FORMAT 1.1\n", ! #'MAX_PACKET_SIZE' => 256000, ! 'MAX_PACKET_SIZE' => 8192, ); use vars qw( %TAGS );
The easiest way to test this is to create two seperate files - one of 32764 bytes, and one of 32765 bytes - and use the 1.29 distro's 'pscp' tool to transfer them to the target node. If you can successfully transfer both files, then the fix is in.
Additionally, note that this fix was tested OK for SSH identifing itself as 'SSH-2.0-OpenSSH_4.2p1 Debian-5'.
Note that while I'm not certian, it's likely that this will resolve the following currently open or new CPAN bugs:
- 17475: Net::SSH::Perl blocks/crashes on IO-Volume > 32kB - 8690: Problem with large stdin to cmd call - 7910: $shh("cat -> file",$data) hangs on large amount of data
...which all appear to be the same thing.
-gilbert.
Direct Responses: 2501 | Write a response
Posted on Fri Jun 16 23:34:52 2006 by cmv in response to 1889
Re: Bug in Channel.pm when stdin length is just right
Folks-

Not only is this bug appearing on many different platforms, but the fix I proposed earlier (and the subsequent jgilbert fix) doesn't work on some of them with v1.30 (ppc-linux in particular).

I've written a perl script to cause the bug to occur, and identify just how many characters your system needs to trigger it. Can you try running it and see if you have the bug and post your results here? I've also sent this to Dave Robins to see if he can figure this but out once and for all.

Attached is my output.

Thanks
-Craig
#!/usr/bin/perl use strict; use warnings; use English; use Data::Dumper; # Set these for your remote host... my $user = "user"; my $password = "password"; my $host = "machine"; # Time to wait for remote command to finish... my $timeout=10; # Consider upping this if you change $high # Remote command... my $cmd = "cat"; # Print heading... print "Net::SSH::Perl bug test...\n"; print "host=$host user=$user remoteCmd=$cmd\n"; # Set variables... my $high=932779; # Maximum number of starting xmitChars my $low=0; my $i=$high; my $delta=$high-$low;; my $ssh; # Main loop... while($delta>1) { undef($ssh); print "xferChars=$i"; # setup SSH... $ssh = newssh() || die "Can't create a new ssh, exiting..."; print '|'; # Indicates newssh worked... $ssh->login($user,$password) || die "Can't login, exiting..."; print '-'; # Indicates successful login... # Proper values of $i will trigger the bug... my $stdin = '=' x $i; # Run remote command... my ($stdout, $stderr, $exit) = runcmd($ssh, $cmd, $stdin, $timeout); # Check output... if(defined($stdout)) { if(length($stdout) != $i) { die "Data mismatch, expected: $i chars, got: ", length($stdout), "chars\n"; } # Go back up... $delta = $high-$i; $low = $i; $i += int($delta/2); print "OK, back up to $i chars\n" }else{ # Go down... $delta = $i-$low; $high = $i; $i -= int($delta/2); print "Error, down to $i chars\n\t$stderr\n"; next; } } if ($i == $high) { print "\nBug did not trigger with $high chars on this system.\n"; }else{ print "\nReport for remote host: $host...\n"; print "xferChars=$low works, xferChars=$high hangs on timeout=$timeout.\n"; } print "exiting...\n"; sub newssh { my $ssh; # Require the correct SSH package, based on OS... if($OSNAME=~/^(MSWin32)/) { require Net::SSH::W32Perl; $ssh = Net::SSH::W32Perl->new( $host, port => 22 , protocol => 2, #debug => 999, ); }else{ require Net::SSH::Perl; $ssh = Net::SSH::Perl->new( $host, port => 22 , protocol => 2, #debug => 999, ); } return($ssh); } sub runcmd { my $ssh = shift || die "Missing ssh object"; my $cmd = shift || die "Missing cmd"; my $stdin = shift || die "Missing stdin"; my $timeout = shift || die "Missing timeout"; my ($stdout, $stderr, $exit); eval { local $SIG{'ALRM'} = sub { die "alarm handler $timeout second timeout"; }; alarm($timeout); ($stdout, $stderr, $exit) = $ssh->cmd($cmd, $stdin); alarm(0); }; if($@) { $stderr=$@; $stderr=~s/( at |\n).*$/ /s; } return ($stdout, $stderr, $exit); }
Here is my output:
Net::SSH::Perl bug test... host=nwsgpb user=watchmrk remoteCmd=cat xferChars=932779|-Error, down to 466390 chars Received disconnect message: Corrupted MAC on input. xferChars=466390|-Error, down to 233195 chars Received disconnect message: Corrupted MAC on input. xferChars=233195|-Error, down to 116598 chars Received disconnect message: Corrupted MAC on input. xferChars=116598|-Error, down to 58299 chars Received disconnect message: Corrupted MAC on input. xferChars=58299|-Error, down to 29150 chars alarm handler 10 second timeout xferChars=29150|-Error, down to 14575 chars alarm handler 10 second timeout xferChars=14575|-Error, down to 7288 chars alarm handler 10 second timeout xferChars=7288|-OK, back up to 10931 chars xferChars=10931|-OK, back up to 12753 chars xferChars=12753|-Error, down to 11842 chars alarm handler 10 second timeout xferChars=11842|-Error, down to 11387 chars alarm handler 10 second timeout xferChars=11387|-OK, back up to 11614 chars xferChars=11614|-Error, down to 11501 chars alarm handler 10 second timeout xferChars=11501|-OK, back up to 11557 chars xferChars=11557|-Error, down to 11529 chars Received disconnect message: Corrupted MAC on input. xferChars=11529|-OK, back up to 11543 chars xferChars=11543|-Error, down to 11536 chars Received disconnect message: Corrupted MAC on input. xferChars=11536|-Error, down to 11533 chars alarm handler 10 second timeout xferChars=11533|-OK, back up to 11534 chars xferChars=11534|-OK, back up to 11535 chars xferChars=11535|-OK, back up to 11535 chars Report for remote host: nwsgpb... xferChars=11535 works, xferChars=11536 hangs on timeout=10. exiting...
Direct Responses: 6967 | Write a response
Posted on Tue Jan 29 19:09:50 2008 by swares in response to 2501
Re: Bug in Channel.pm when stdin length is just right
I seem to have the same issue on several machines running various Linux distributions on varying x86 hardware. I have tried to rebuild using the most recent stable Perl and modules but the issue persists. Here is my output.
Net::SSH::Perl bug test... host=pendulum.boulder.ibm.com user=root remoteCmd=cat xferChars=932779|-Error, down to 466390 chars alarm handler 10 second timeout xferChars=466390|-Error, down to 233195 chars Received disconnect message: Corrupted MAC on input. xferChars=233195|-Error, down to 116598 chars Received disconnect message: Corrupted MAC on input. xferChars=116598|-Error, down to 58299 chars alarm handler 10 second timeout xferChars=58299|-Error, down to 29150 chars Received disconnect message: Corrupted MAC on input. xferChars=29150|-Error, down to 14575 chars alarm handler 10 second timeout xferChars=14575|-Error, down to 7288 chars alarm handler 10 second timeout xferChars=7288|-OK, back up to 10931 chars xferChars=10931|-OK, back up to 12753 chars xferChars=12753|-OK, back up to 13664 chars xferChars=13664|-Error, down to 13209 chars alarm handler 10 second timeout xferChars=13209|-Error, down to 12981 chars alarm handler 10 second timeout xferChars=12981|-OK, back up to 13095 chars xferChars=13095|-Error, down to 13038 chars alarm handler 10 second timeout xferChars=13038|-Error, down to 13010 chars alarm handler 10 second timeout xferChars=13010|-Error, down to 12996 chars Received disconnect message: Corrupted MAC on input. xferChars=12996|-Error, down to 12989 chars Received disconnect message: Corrupted MAC on input. xferChars=12989|-OK, back up to 12992 chars xferChars=12992|-Error, down to 12991 chars Received disconnect message: Corrupted MAC on input. xferChars=12991|-Error, down to 12990 chars Received disconnect message: Corrupted MAC on input. xferChars=12990|-OK, back up to 12990 chars Report for remote host: pendulum... xferChars=12990 works, xferChars=12991 hangs on timeout=10. exiting...
Write a response