Alfred, thanks for sharing the PDF::API2 module. It has enabled me to audit more than 10,000 PDF documents without breaking a sweat.
I did run into one problem that I can't lick. Approximately one out of ten PDF documents contains wide characters that $pdf->info() returns as "junk_text" rather than "(OHS).PDF" as seen in Acrobat 6.0 under Document Properties, Description. This also happens with encrypted documents...
The CPAN::Forum would not accept the remainder of this post. Here are some example PDFs.
www.gs.gov.nl.ca/ohs/pdf/ann-rep-whsi.pdf (garbled)
www.gs.gov.nl.ca/cca/cr/pdf/coop/coop21-art-dis.pdf (works)
www.gs.gov.nl.ca/misc/data/gazette/wk/2006-01-13.pdf (garbled, encrypted file)
Can PDF::API2 process Unicode characters in meta info?
Thanks, Rob
