Thread

Posted on Fri Apr 7 06:43:45 2006 by pdi
IPTC UTF-8 Support?
According to IIM v4.1 ("IPTC headers") specs, ch. 5 Envelope Record, 1:90 Coded Character Set, there seems to be a provision for utf-8 support. I understand this is not often implemented due both to implementation difficutlies and native utf-8 support in XMP. I've read ExifTool's documentation but I couldn't find anything on this. Is utf-8 for IPTC supported? Many thanks in advance.
Direct Responses: 2115 | Write a response
Posted on Fri Apr 7 16:05:23 2006 by exiftool in response to 2114
Re: IPTC UTF-8 Support?
Yes, you can write UTF-8 to IPTC using ExifTool. If you do this, you are responsible for setting the value the CodedCharacterSet tag to "ESC % G", this is not done automatically. (Note: Use this exact string -- ExifTool translates to the proper byte sequence for the escape code.)
Direct Responses: 2117 | Write a response
Posted on Fri Apr 7 17:15:20 2006 by pdi in response to 2115
Re: IPTC UTF-8 Support?
Thank you for the reply. ExifTool never ceases to amaze me with what it can do. I'm still having some difficulty. Here's what I do: 1. make a txt file with all the iptc tags needed, including the line "-iptc:CodedCharacterSet=ESC % G" 2. pass the file to exiftool using the -@ ARGFILE. The resulting jpg's iptc is not in utf-8. Must the txt file be in utf-8 encoding to begin with? If yes, what is the use of CodedCharacterSet?
Direct Responses: 2121 | Write a response
Posted on Fri Apr 7 19:27:29 2006 by exiftool in response to 2117
Re: IPTC UTF-8 Support?
Perhaps I could have been a bit more clear: You can write IPTC strings containing any arbitrary series of bytes using ExifTool. It is up to you to encode them with the desired character set. ExifTool does not translate IPTC character strings. So to write UTF-8, the values that you assign must be valid UTF-8 byte sequences.

The value of setting CodedCharacterSet is that it informs other applications that your IPTC strings contain UTF-8 characters. As far as ExifTool is concerned though it doesn't matter, because ExifTool places the translation in your hands.

So maybe you shouldn't be so amazed after all... ;)

- Phil
Direct Responses: 2122 | Write a response
Posted on Fri Apr 7 21:10:53 2006 by pdi in response to 2121
Re: IPTC UTF-8 Support?
Phil, This is crystal clear, many thanks for your clarification :-) Pandelis
Direct Responses: 2144 | Write a response
Posted on Tue Apr 11 09:45:41 2006 by pdi in response to 2122
Re: IPTC UTF-8 Support?
Phil,

Just one last point. Googling around I came accross 3 different versions of the 1:90 utf-8 escape sequence. Does it make any difference which string is used, or can ExifTool inerpret all of them?

1. "ESC % G", from your previous reply
2. "ESC %G", from Markus Kuhn's "UTF-8 and Unicode FAQ for Unix/Linux"
3. "ESC%G", from a post by the author (I think) of PictureSync

TIA
Pandelis
Direct Responses: 2146 | Write a response
Posted on Tue Apr 11 14:43:40 2006 by exiftool in response to 2144
Re: IPTC UTF-8 Support?
The actual hex byte sequence that must be written to the file is 0x1b 0x25 0x47 (3 bytes).

ExifTool converts "ESC " to 0x1b only if there is a space after "ESC". Then it packs the remaining bytes by removing spaces and commas. Note that '%' = 0x25 and 'G' = 0x47.

So with your examples, either 1 or 2 will work, but 3 will not because of the way that ExifTool translates the escape character.
Direct Responses: 4877 | Write a response
Posted on Sun Apr 15 18:00:57 2007 by linuxuser in response to 2146
Re: IPTC UTF-8 Support?
-IPTC:CodedCharacterSet="ESC % G" works for me, but why can't it be defined with -IPTC:CodedCharacterSet="UTF-8" (it doesn't work here)? I was a little bit confused with the FAQ #10, because it says: If CodedCharacterSet exists and has a value of "UTF8" (or "ESC % G") ...
Direct Responses: 4878 | Write a response
Posted on Sun Apr 15 18:14:25 2007 by exiftool in response to 4877
Re: IPTC UTF-8 Support?
-IPTC:CodedCharacterSet=UTF8 (not UTF-8) works again as of version 6.86 (somehow this feature got broken before this version).

IPTC uses the ISO 2022 standard for alternate character sets, and character sets are specified via a sequence of escape characters. For UTF-8, the escape sequence is "ESC % G", and for convenience ExifTool allows you to also specify "UTF8" which is translated to the appropriate escape sequence internally.

You can use the -n option to bypass this translation to see actual escape sequence (but in this case, the "ESC" character will appear as a "." because all control characers are converted to "." in the output. But you can add the -b option to prevent this conversion too if you want.)

I hope this makes sense.

- Phil
Write a response