It seems that encoding::warnings does not work as advertised: when it is "use"d before any literals containing 8-bit characters, the warnings it should produce are swallowed by the Perl compiler.
For example, the following script won't raise a warning:
(In all of the following examples, note that the actual code uses the corresponding 8-bit characters instead of "\xc0\xc1\xc2\xc3\xc4\xc5"; I could not post those characters to this forum.)
#!/usr/bin/perl
use encoding::warnings; # encoding::warnings is "use"d BEFORE the 8-bit literals
my $str = "\xc0\xc1\xc2\xc3\xc4\xc5"; # first 6 letters of the Cyrillic alphabet in the 8-bit encoding cp1251
my $yo = "\x{0401}"; # 7th letter of the Cyrillic alphabet, E with diaeresis
binmode STDOUT, ":encoding(utf8)"; # Set the :encoding(utf8) PerlIO layer on STDOUT
print "$str$yo\n"; # Concatenate byte and Unicode strings
To raise an "implicitly upgraded" warning, the module must be imported after the problematic literals:
#!/usr/bin/perl
my $str = "\xc0\xc1\xc2\xc3\xc4\xc5"; # first 6 letters of the Cyrillic alphabet in the 8-bit encoding cp1251
use encoding::warnings; # encoding::warnings is "use"d AFTER the 8-bit literals
my $yo = "\x{0401}"; # 7th letter of the Cyrillic alphabet, E with diaeresis
binmode STDOUT, ":encoding(utf8)"; # Set the :encoding(utf8) PerlIO layer on STDOUT
print "$str$yo\n"; # Concatenate byte and Unicode strings
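For reference, the implicit upgrade that encoding::warnings is supposed to flag can be reproduced in plain Perl without the module. This is only a sketch of the upgrade itself (no encoding::warnings loaded, and \x escapes standing in for the 8-bit characters): concatenating the byte string with a wide character turns each byte into the code point with the same number, i.e. the bytes are read as Latin-1 rather than cp1251.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A byte string: six bytes in the 0x80-0xFF range (Cyrillic letters in cp1251).
my $str = "\xc0\xc1\xc2\xc3\xc4\xc5";

# A character string holding one wide character (U+0401).
my $yo = "\x{0401}";

# Concatenation forces Perl to upgrade $str; each byte 0xNN becomes
# the character U+00NN, which is exactly a Latin-1 interpretation.
my $joined = $str . $yo;

# Print the code point of every character in the result.
printf "U+%04X\n", ord $_ for split //, $joined;
```

It prints U+00C0 through U+00C5 followed by U+0401, so the six Cyrillic letters silently become the Latin-1 characters À through Å.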
Since 8-bit literals are scattered throughout legacy code, this module is almost useless in its current form.
The problem lies in the following lines of its source code:
# Don't worry about source code literals.
sub cat_decode {
    my $self = shift;
    return $self->[LATIN1]->cat_decode(@_);
}
These lines silently pass 8-bit source code literals through a Latin-1 decoder. Once encoding::warnings is imported, every 8-bit literal compiled after that moment is fed through this decoder. Because such literals are already character strings by the time the concatenation runs, there is nothing left to upgrade, which is why the first example stays silent. More dangerously, every 8-bit literal is converted to Unicode with the Latin-1 converter in situations where, without "use encoding::warnings", no conversion would happen at all. This can lead to data corruption.
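To make the corruption concrete: the example bytes mean the Cyrillic letters А through Е in cp1251, but a Latin-1 decoder turns them into À through Å. The sketch below calls the core Encode module's decode() directly to show both results side by side; it illustrates only the two decodings and makes no claim about how encoding::warnings is wired internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The same six bytes as in the examples above.
my $bytes = "\xc0\xc1\xc2\xc3\xc4\xc5";

# The intended reading: cp1251, giving U+0410..U+0415 (Cyrillic А..Е).
my $as_cp1251 = decode('cp1251', $bytes);

# What a Latin-1 decoder produces instead: U+00C0..U+00C5 (À..Å).
my $as_latin1 = decode('iso-8859-1', $bytes);

# Show the code points produced by each decoding.
for my $pair ([cp1251 => $as_cp1251], [latin1 => $as_latin1]) {
    my ($name, $text) = @$pair;
    my @points = map { sprintf('U+%04X', ord($_)) } split //, $text;
    printf "%-7s %s\n", $name, join(' ', @points);
}
```

The two output lines differ in every character, which is the silent change of meaning that a Latin-1 cat_decode introduces for cp1251 source.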
The solution is to add code to the cat_decode method of encoding::warnings that issues a warning (and does NOT convert the source!). The only drawback is that it will warn about _every_ 8-bit literal, even those that will never be "implicitly upgraded" to Unicode.
The current workaround is to place the "use encoding::warnings" line late in the program, which does not work when using the module from within other modules.
If the community is interested, I can send a simple patch to the cat_decode method that will work fine.