Thread

Posted on Wed Jan 10 19:20:32 2007 by themonk
Non Printable Ascii Chars In XMP
Hi Phil

Happy New Year

What do you make of this ?

I have been duplicating IPTC blocks in XMP using -tagsFromFile ......
We have been getting spurious Photoshop CS2 errors about unreadable data being ignored.
It appears that XMP only supports printable ASCII chars (0-127). The character
that was causing the problem was a £ (pound sign).
If you run "exiftool -XMP:Description='£' testfile.jpg" and then load testfile.jpg in PS you should
see a simpler version of the problem.

-tagsFromFile is a handy solution and I do not want to have to limit myself to extracting
the IPTC values, re-encoding and then re-inserting.

Can you suggest anything ?

Mark Tate
Direct Responses: 4011
Posted on Thu Jan 11 00:38:47 2007 by exiftool in response to 4009
Re: Non Printable Ascii Chars In XMP
Hi Mark,

XMP supports special characters beyond standard ASCII using UTF-8 encoding. The problem is on the IPTC side: IPTC character encoding is fantastically obscure, and not well implemented by other software. Even Photoshop does not adhere to the IPTC specification, and will write Latin1 characters ad hoc in IPTC without properly setting the CodedCharacterSet tag.

For this reason it is very difficult to properly handle special characters in IPTC.
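To make the mismatch concrete, here is a minimal Python sketch (not ExifTool code) showing how the pound sign from the original report is stored in Latin1 versus UTF-8. It assumes the failure comes from a Latin1 byte being copied unchanged into UTF-8 encoded XMP; the helper name is_valid_utf8 is made up for this illustration.

```python
# Illustration only: the same character has different byte representations.
pound = "£"

latin1_bytes = pound.encode("latin-1")  # single byte 0xA3
utf8_bytes = pound.encode("utf-8")      # two bytes 0xC2 0xA3


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xA3 byte is not valid UTF-8, so a UTF-8 reader may reject it.
print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))

# Decoding as Latin1 and re-encoding as UTF-8 repairs the value.
repaired = latin1_bytes.decode("latin-1").encode("utf-8")
print(repaired == utf8_bytes)
```

The catch, as noted above, is that the re-encoding step is only safe when the source really is Latin1, which is exactly what an unset CodedCharacterSet fails to tell you.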

Also, I don't have a very good test set of IPTC containing special characters from other applications, so it is difficult for me to know what the best way to handle this is. Can you tell me what encoding is used in your IPTC samples that contain special characters, and what the CodedCharacterSet tag is set to?

According to this source, it may be sufficient in most cases to just assume Latin-1 encoding if not specified. If this is true, I could add an option which would force ExifTool to assume Latin1 encoding and convert appropriately.

If anyone has any ideas on this matter, I'd love to hear them.

- Phil
Direct Responses: 4021
Posted on Thu Jan 11 21:01:47 2007 by exiftool in response to 4011
Re: Non Printable Ascii Chars In XMP
Hi Mark,

Thanks for the sample via email.

I think the thing to do is to assume Latin1 coding unless otherwise specified. This should fix the problem with your sample image at least. It is a rather significant change to start translating IPTC text, but I hope I have done it in a way that won't break things for too many people, and hopefully it will solve more problems than it creates.

The strategy now is to convert IPTC text if the CodedCharacterSet is recognized, and to assume Latin1 if the CodedCharacterSet tag doesn't exist. The ISO 2022 escape sequences used to switch between different codings are not yet supported, and the text is assumed to be all in a single character set. Also, when creating a new IPTC record from scratch, a CodedCharacterSet value of "UTF8" is written by default.

The new version will require a lot of testing since this is a fairly significant change. It would be great if you could help with this effort. I have uploaded a 6.70 pre-release here for you to play with.

Note that the translations are only performed if the coding is Latin1 or UTF8. Otherwise no translation is done. This will all be spelled out in the new FAQ #10, which will read:

IPTC: IPTC text is converted only for recognized values of the IPTC:CodedCharacterSet tag. Currently recognized encodings are UTF-8 ("UTF8" or "ESC % G") and Latin1/ISO-8859-1 ("Latin" or "ESC . A"). "Latin" is assumed if the CodedCharacterSet tag is missing. No translation is performed for all other values of CodedCharacterSet. When reading, text is translated to UTF-8 by default, or Latin1 with the -L option. When writing, the inverse translation is performed. When creating a new IPTC record, ExifTool automatically sets CodedCharacterSet to "UTF8" unless otherwise specified. This causes all text strings to be stored in UTF-8, which is the preferred encoding.
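The read-side rule in this draft FAQ text can be sketched in Python as follows. This is only an illustration of the rule as worded here, not ExifTool's actual implementation: the function names codec_for and read_iptc_text are hypothetical, and the CodedCharacterSet value is assumed to arrive as the printed string shown in the FAQ.

```python
from typing import Optional, Union

# Values named in the draft FAQ text, mapped to Python codec names.
# Assumption: the tag value is given in its printed form.
RECOGNIZED = {
    "UTF8": "utf-8",
    "ESC % G": "utf-8",
    "Latin": "latin-1",
    "ESC . A": "latin-1",
}


def codec_for(coded_character_set: Optional[str]) -> Optional[str]:
    """Pick a codec; None means 'unrecognized, do not translate'."""
    if coded_character_set is None:
        return "latin-1"  # a missing tag is treated as "Latin"
    return RECOGNIZED.get(coded_character_set)


def read_iptc_text(raw: bytes, coded_character_set: Optional[str]) -> Union[str, bytes]:
    """Decode recognized text; hand back unrecognized text untouched."""
    codec = codec_for(coded_character_set)
    if codec is None:
        return raw
    return raw.decode(codec)


print(read_iptc_text(b"\xa3", None))         # tag missing: Latin1 assumed
print(read_iptc_text(b"\xc2\xa3", "UTF8"))   # tag says UTF-8
print(read_iptc_text(b"\xa3", "Other"))      # unrecognized: raw bytes
```

Returning the raw bytes for unrecognized values mirrors the "no translation is performed" wording; a real tool would also need a policy for invalid byte sequences.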

- Phil
Direct Responses: 4066 | 4068 | 4072
Posted on Wed Jan 17 18:43:06 2007 by themonk in response to 4021
Re: Non Printable Ascii Chars In XMP
Thanks Phil ...

I've put a dozen images through which previously had problems and they open in Photoshop
with the XMP intact...

I will continue to test as and when I come across images but so far so good...

Let me know if anyone spots any issues....

Mark
Direct Responses: 4070
Posted on Wed Jan 17 19:29:38 2007 by exiftool in response to 4068
Re: Non Printable Ascii Chars In XMP
Hi Mark,

I've been reading more about the ISO 2022 specification that is used in IPTC, and there are a couple of things I'm thinking about changing. The first issue won't affect you because you're not writing IPTC, but the second may help if you have translation problems with images where CodedCharacterSet has been set to an unrecognized value.

1) I think I'll change the default behaviour of setting CodedCharacterSet to UTF8 when creating a new IPTC record because it seems there isn't good support for this in other applications.

2) I may try applying the Latin conversion even for unrecognized CodedCharacterSets provided no alternate ISO 2022 character sets have been invoked in the text.

- Phil
Direct Responses: 4092
Posted on Fri Jan 19 16:00:24 2007 by exiftool in response to 4072
Re: Non Printable Ascii Chars In XMP
I've released version 6.70 officially now. This version implements the changes that I mentioned in my last post. So here is the updated FAQ #10 text for IPTC character coding:

IPTC: The value of the IPTC:CodedCharacterSet tag determines how the internal IPTC string values are interpreted. If CodedCharacterSet exists and has a value of "UTF8" (or "ESC % G") then string values are assumed to be stored as UTF-8, otherwise Latin1 (cp1252) coding is assumed. When reading, these strings are translated to UTF-8 by default, or Latin1 with the -L option. When writing, the inverse translation is performed. No translation is done if the internal (IPTC) and external (ExifTool) character sets are the same. Note that ISO 2022 character set shifting is not supported. Instead, a warning is issued and the string is not translated if an ISO 2022 shift code is found. See the IPTC specification for more information about IPTC character coding.
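As a rough Python sketch of the rule in this FAQ text (again an illustration, not ExifTool's code): the function names, the latin_output flag standing in for the -L option, and the set of bytes treated as ISO 2022 shift codes (ESC, SO, SI) are all assumptions made for this example. Undecodable bytes are replaced rather than raising an error, which is also a choice of this sketch.

```python
import warnings
from typing import Optional

UTF8_VALUES = {"UTF8", "ESC % G"}

# Assumption: treat ESC (0x1B), SO (0x0E) and SI (0x0F) as shift indicators.
SHIFT_BYTES = (0x1B, 0x0E, 0x0F)


def internal_codec(coded_character_set: Optional[str]) -> str:
    """UTF-8 only when the tag says so; otherwise assume cp1252."""
    return "utf-8" if coded_character_set in UTF8_VALUES else "cp1252"


def read_iptc_string(raw: bytes, coded_character_set: Optional[str],
                     latin_output: bool = False) -> bytes:
    """Translate stored bytes to the external character set."""
    external = "cp1252" if latin_output else "utf-8"
    internal = internal_codec(coded_character_set)
    if internal == external:
        return raw  # same character set on both sides: nothing to do
    if any(b in SHIFT_BYTES for b in raw):
        warnings.warn("ISO 2022 shift code found; string not translated")
        return raw
    text = raw.decode(internal, errors="replace")
    return text.encode(external, errors="replace")


print(read_iptc_string(b"\xa3", None))                           # to UTF-8
print(read_iptc_string(b"\xc2\xa3", "UTF8", latin_output=True))  # to Latin1
```

Writing would apply the same steps with the internal and external codecs swapped.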

- Phil
Direct Responses: 4095
Posted on Fri Jan 19 19:56:35 2007 by themonk in response to 4092
Re: Non Printable Ascii Chars In XMP
I will let you know of any issues..

Excellent response as usual..
Thanks Phil.....
