Hi Mark,
Thanks for the sample via email.
I think the thing to do is to assume Latin1 coding unless otherwise specified.
This should fix the problem with your sample image at least. It is a rather significant
change to start translating IPTC text, but I hope I have done it in a way that won't
break things for too many people, and hopefully it will solve more problems than
it creates.
The strategy now is to convert IPTC text if the CodedCharacterSet is recognized,
and to assume Latin1 if the CodedCharacterSet tag doesn't exist. The ISO 2022
escape sequences used to switch between different codings are not yet supported,
and the text is assumed to be all in a single character set. Also, when creating
a new IPTC record from scratch, a CodedCharacterSet value of "UTF8" is written by
default.
The new version will require a lot of testing since this is a fairly significant change.
It would be great if you could help with this effort. I have uploaded a 6.70 pre-release
here for you to play with.
Note that the translations are only performed if the coding is Latin1 or UTF8. Otherwise
no translation is done. This will all be spelled out in the new FAQ #10, which
will read:
IPTC: IPTC text is converted only for recognized values of the
IPTC:CodedCharacterSet tag. Currently recognized encodings are UTF-8
("UTF8" or "ESC % G") and Latin1/ISO-8859-1 ("Latin" or "ESC . A"). "Latin"
is assumed if the CodedCharacterSet tag is missing. No translation is performed
for all other values of CodedCharacterSet. When reading, text is translated to
UTF-8 by default, or Latin1 with the -L option. When
writing, the inverse translation is performed. When creating a new IPTC record,
ExifTool automatically sets CodedCharacterSet to "UTF8" unless
otherwise specified. This causes all text strings to be stored in UTF-8, which
is the preferred encoding.
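
To make the rules above concrete, here is a rough sketch of the decision logic in Python. This is only an illustration, not the actual ExifTool code (which is Perl): the function names and the `latin1_output`/`latin1_input` parameters are made up for the example, and it assumes the escape sequences are stored as raw bytes without the spaces shown in the FAQ text.

```python
ESC = b"\x1b"

# Recognized CodedCharacterSet values mapped to Python codec names.
# Assumption: "ESC % G" and "ESC . A" appear in the file as raw bytes.
RECOGNIZED = {
    b"UTF8": "utf-8",
    ESC + b"%G": "utf-8",
    b"Latin": "latin-1",
    ESC + b".A": "latin-1",
}


def iptc_codec(coded_charset):
    """Return the codec for a CodedCharacterSet value, or None if unrecognized."""
    if coded_charset is None:
        return "latin-1"  # tag missing: assume Latin1
    return RECOGNIZED.get(coded_charset)


def read_iptc_text(raw, coded_charset=None, latin1_output=False):
    """Translate stored IPTC bytes to UTF-8 (or Latin1, mimicking -L)."""
    codec = iptc_codec(coded_charset)
    if codec is None:
        return raw  # unrecognized character set: no translation
    target = "latin-1" if latin1_output else "utf-8"
    return raw.decode(codec, errors="replace").encode(target, errors="replace")


def write_iptc_text(value, coded_charset=None, latin1_input=False):
    """Inverse translation: convert the user's bytes to the stored encoding."""
    codec = iptc_codec(coded_charset)
    if codec is None:
        return value  # unrecognized character set: no translation
    source = "latin-1" if latin1_input else "utf-8"
    return value.decode(source, errors="replace").encode(codec, errors="replace")
```

For example, `read_iptc_text(b"caf\xe9")` treats the bytes as Latin1 because no character set is given, and returns the UTF-8 bytes for the same text. A value such as a multi-byte ISO 2022 designation that is not in the table is passed through unchanged.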
- Phil
