Thread

Posted on Tue May 8 17:39:20 2007 by linuxuser
UTF-8 -> Latin1 conversion, L option
I searched the ExifTool pages but couldn't find out what value I have to specify for -iptc:CodedCharacterSet to get Latin1. Sorry, I don't know the ESC sequence.

What happens if the tag is written in UTF-8 and CodedCharacterSet is removed? Does the encoding remain UTF-8?

How would you recommend converting all the tags from UTF-8 to Latin1 with _no_ CodedCharacterSet tag?

What does ExifTool do with characters that are not contained in Latin1? Is it possible that complete words containing, e.g., only one non-Latin1 character are removed from keywords? E.g. njemacka (where the c is a special character and I don't know how to write it in the forum), which is Germany in Croatian.

I think the CodedCharacterSet tag causes a problem with Zooomr, and I would like to run tests with different metadata tags.
Direct Responses: 5090
Posted on Tue May 8 18:49:02 2007 by exiftool in response to 5088
Re: UTF-8 -> Latin1 conversion, L option
The encoding of existing information is not changed if you change CodedCharacterSet. However, it affects any new information added with ExifTool, and it also affects the way all IPTC information is decoded when reading.
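
To illustrate why the declared character set matters when reading, here is a minimal Python sketch, independent of ExifTool, showing the same stored bytes interpreted under two assumptions. The sample string "Zürich" is only an illustration and not taken from any real file.

```python
# The same stored bytes, interpreted under two different character-set assumptions.
stored = "Z\u00fcrich".encode("utf-8")   # "Zürich" as written by a UTF-8 aware tool

as_utf8 = stored.decode("utf-8")         # reader assumes UTF-8
as_latin1 = stored.decode("latin-1")     # reader assumes Latin1 instead

print(as_utf8)     # the original text
print(as_latin1)   # the u-umlaut shows up as two Latin1 characters
```

The bytes on disk are identical in both cases; only the reader's assumption differs.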

If you want to use Latin1, probably the best thing to do is to delete the CodedCharacterSet tag. Most software will assume Latin1 if there is no CodedCharacterSet specified.

FYI: The proper way to use Latin1 in IPTC is actually very complex, and few software packages would understand it if done properly. (You need to use ISO 2022 and designate one of the alternate graphics character sets to be Latin1 with the appropriate escape sequence in CodedCharacterSet, then invoke the desired character set with another ISO 2022 escape sequence in the actual text when you want to use it.)

But to answer your question, here is how you would change encoding to UTF8:

exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=UTF8

Unfortunately, due to a quirk in the way this is implemented in versions up to 6.89, this doesn't work when the CodedCharacterSet is deleted (although this is exactly what you want to do). So I have changed this, and uploaded a 6.90 pre-release which properly handles the translations when CodedCharacterSet is deleted. With this version, you can also translate the IPTC values back to Latin1 like this:

exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=

- Phil
Direct Responses: 5091 | 5092
Posted on Tue May 8 18:54:49 2007 by exiftool in response to 5090
Re: UTF-8 -> Latin1 conversion, L option
Sorry, I didn't answer your question about conversion of characters which aren't valid Latin1: Only valid Latin1 characters are translated. All other characters are passed straight through without translation, and just encoded into UTF8 directly.
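
To check in advance which characters in a value fall outside Latin1, something like the following Python sketch could be used. It is independent of ExifTool, and the helper name non_latin1_chars is hypothetical. It relies on the fact that Latin1 covers exactly the code points U+0000 through U+00FF.

```python
def non_latin1_chars(text):
    """Return the characters in text that have no Latin1 (ISO 8859-1) code point."""
    return [ch for ch in text if ord(ch) > 0xFF]

# "Njemacka" with a caron on the c (U+010D), as in the Croatian keyword example.
keyword = "Njema\u010dka"
print(non_latin1_chars(keyword))

# A Latin1-only string yields an empty list.
print(non_latin1_chars("Z\u00fcrich"))
```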

- Phil
Posted on Tue May 8 21:22:28 2007 by linuxuser in response to 5090
Re: UTF-8 -> Latin1 conversion, L option
Phil, I would like to do it the other way round, not
exiftool a.jpg -tagsfromfile a.jpg -iptc:all -codedcharacterset=UTF8

I want to create valid Latin1 tags _from_ existing UTF-8 tags. What should the command be to copy all the tags of an image with UTF-8 metadata tags to a new image with Latin1 tags? How can I throw away the characters which are not Latin1? I use bash, so maybe I could use a sed command in a pipe in between. I think the only fields which could contain "real" UTF-8 characters, like Greek characters, would be the keyword field and maybe the description field, so I could exclude these first and then add them back after modification. What is the escape sequence to define Latin1 in CodedCharacterSet? I mean something like ESC .. Thanks a lot
Direct Responses: 5093
Posted on Tue May 8 21:42:06 2007 by exiftool in response to 5092
Re: UTF-8 -> Latin1 conversion, L option
I gave you an example of how to convert UTF8 tags to Latin1, but you need ExifTool 6.90 to do it. If you want to convert when copying to another file, just use a different filename as the source in the -tagsfromfile option.

Thinking about this a bit more carefully: There is no way to throw out non-Latin1 characters, because all byte values 0-255 correspond to a valid Latin1 character.
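
The point about byte values can be checked with a short Python sketch (independent of ExifTool): every byte decodes to some Latin1 character, so UTF-8 bytes misread as Latin1 never raise an error and cannot be recognised as invalid at the byte level.

```python
# Every possible byte value decodes to some Latin1 character, so decoding never fails.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))  # 256

# UTF-8 bytes misread as Latin1 also decode without error;
# they simply come out as different characters.
utf8_bytes = "\u010d".encode("utf-8")    # c with caron, two bytes in UTF-8
misread = utf8_bytes.decode("latin-1")   # two Latin1 characters, no exception
print(utf8_bytes.hex(), len(misread))
```

Any filtering of non-Latin1 characters therefore has to happen while the text is still known to be UTF-8, before it is re-encoded.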

As I said, the Latin1 escape sequence is not simple. "ESC,A", "ESC-A", "ESC.A" and "ESC/A" in CodedCharacterSet will designate Latin1 for graphics character sets 0 through 3 respectively, but then you have to invoke the appropriate set through an ISO 2022 escape sequence in the text itself, otherwise the designated character sets don't get used. This is a real pain, and no software will decode this properly.
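
For reference, the four designation sequences named above can be spelled out as raw bytes. This Python sketch only builds the byte strings exactly as listed in the post (ESC is byte 0x1B); the dictionary name LATIN1_DESIGNATIONS is hypothetical, and nothing here writes to a file or invokes ExifTool.

```python
ESC = b"\x1b"

# Designation sequences as listed in the post: ESC, an intermediate byte
# selecting graphics set G0..G3, then the final byte "A".
LATIN1_DESIGNATIONS = {
    "G%d" % n: ESC + inter + b"A"
    for n, inter in enumerate([b",", b"-", b".", b"/"])
}

for name, seq in LATIN1_DESIGNATIONS.items():
    # Print each sequence as hex so the non-printable ESC byte is visible.
    print(name, seq.hex())
```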

Did I mention that IPTC really sucks when it comes to coded characters?

- Phil