sprak
[PHP] The always popular "Invalid byte 1 of 1-byte UTF-8 sequence." problem
*** UPDATE 4 Feb 2005: It turns out that one field was not being passed through utf8_encode(). Stupid error; you may now all point and laugh.
Greetings all; I recently had to code up a mechanism to insert/select cases in Salesforce via the SOAP API. I built everything and began testing; eventually, some of the test cases included accented characters (umlauts and the like, e.g., ö). In these test cases, the insert call failed with the error message "java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence."
Having encountered this problem before, I scoured my archives and reminded myself that the text should be passed through utf8_encode() before making the SOAP call. No problem then; I wrapped each text item with a utf8_encode() call.
However, the problem persists even after using utf8_encode(); now the accented characters are passed as values that look like garbage (e.g., Ã¼). The same error message is returned from the SOAP call, and sniffing the wire shows that the SOAP headers are set to UTF-8 encoding.
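The two symptoms above can be reproduced at the byte level. This is a minimal Python sketch, not PHP: `utf8_encode` below is a hypothetical stand-in that mimics PHP's utf8_encode(), which treats its input as ISO-8859-1 and converts it to UTF-8. It shows (1) raw Latin-1 bytes being rejected as UTF-8, the kind of input that typically produces "Invalid byte 1 of 1-byte UTF-8 sequence", (2) utf8_encode() fixing genuine Latin-1 input, and (3) utf8_encode() garbling input that is already UTF-8.

```python
def utf8_encode(data: bytes) -> bytes:
    # Hypothetical analogue of PHP's utf8_encode(): read every byte as an
    # ISO-8859-1 character, then encode the resulting text as UTF-8.
    return data.decode("iso-8859-1").encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    # A strict UTF-8 decoder rejects malformed byte sequences.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

latin1_o = "ö".encode("iso-8859-1")   # b'\xf6' (one byte)
utf8_o = "ö".encode("utf-8")          # b'\xc3\xb6' (two bytes)

# 1. Latin-1 bytes sent as if they were UTF-8: rejected.
print(is_valid_utf8(latin1_o))                  # False
# 2. utf8_encode() on genuine Latin-1 input: correct UTF-8.
print(utf8_encode(latin1_o) == utf8_o)          # True
# 3. utf8_encode() on input that is already UTF-8: double-encoded garbage.
print(utf8_encode(utf8_o).decode("utf-8"))      # Ã¶
```

Case 3 is one plausible source of garbage characters: text that is already UTF-8 gets encoded a second time. Case 1 matches the update at the top, where a single field skipped encoding.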
Pretty stumped at this point; any help would be appreciated. Server details follow:
* Fedora Core 2
* Apache 2.0.51-2.7
* PHP 4.3.8 (cgi)
* PEAR::SOAP 0.8RC3
* Latest Salesforce client from SourceForge
Cheers.
Message Edited by sprak on 02-04-2005 10:04 AM
Any help you could provide would be appreciated. Thanks!
~Chad
Where exactly is the kanji ending up as garbage? Is it when you view the record in the Salesforce web interface, and if so, in which browser? Or when you pull it back out of Salesforce via an API call? What encoding is the text in (Shift_JIS, EUC-JP, ISO, etc.) before you pass it to Salesforce?
Cheers.
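Why the source encoding matters: if the page submits Shift_JIS, the bytes are not valid UTF-8, and wrapping them in utf8_encode() cannot help, because that function assumes ISO-8859-1. A Python sketch, assuming the form posts Shift_JIS; the sample string is a line of the form text quoted in this thread. In PHP, a real source-to-UTF-8 conversion would typically go through iconv() or mb_convert_encoding() rather than utf8_encode().

```python
def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

sample = "以下のフォームをご記入ください。"

sjis_bytes = sample.encode("shift_jis")  # assumption: what a Shift_JIS page would submit
utf8_bytes = sample.encode("utf-8")      # what a UTF-8 SOAP message has to carry

# 以 is 0x88 0xC8 in Shift_JIS; 0x88 cannot start a UTF-8 sequence.
print(is_valid_utf8(sjis_bytes))  # False
print(is_valid_utf8(utf8_bytes))  # True

# Correct conversion: decode with the real source encoding, then encode as UTF-8.
converted = sjis_bytes.decode("shift_jis").encode("utf-8")
print(converted == utf8_bytes)    # True
```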
I must have misunderstood your original post; I thought you had resolved the error by using utf8_encode() but were still getting weird characters (e.g., Ã¼).
This is the problem I am having. European characters, accents, etc. have been working fine, but when double-byte kanji come through, the API dies with the same error. I saw your suggestion and applied utf8_encode() to all the variables in the form. When I wrap them in utf8_encode(), the error goes away, but the leads show up in Salesforce as garbage (e.g., Ã¼) instead of the actual characters.
The data is all being sent and passed in UTF-8. I can store it in MySQL and retrieve it in its original state, and I can send UTF-8 email and the characters show up fine. But when I send it to the API, it dies on me.
I have included some kanji if you have a moment to test it. I appreciate your help.
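以下のフォームをご記入ください。 <ご記入上の注意事項> *は必須入力項目です。英数字は半角で入力してください。
(Translation: "Please fill out the form below. <Notes on filling out the form> Items marked * are required. Please enter alphanumeric characters in half-width.")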
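The "error goes away but the text is garbage" symptom fits utf8_encode() being applied to text that is already UTF-8: the function reinterprets each byte as an ISO-8859-1 character, so a three-byte kanji becomes three Latin-1 characters. The result is valid UTF-8 (so the parser stops complaining) but it is no longer the original text. A Python sketch; `utf8_encode` is a hypothetical analogue of the PHP function, not PHP itself.

```python
def utf8_encode(data: bytes) -> bytes:
    # Hypothetical analogue of PHP's utf8_encode(): assumes ISO-8859-1 input.
    return data.decode("iso-8859-1").encode("utf-8")

kanji = "以"
already_utf8 = kanji.encode("utf-8")        # b'\xe4\xbb\xa5' (three bytes)
double_encoded = utf8_encode(already_utf8)  # each byte becomes its own character

text = double_encoded.decode("utf-8")       # decodes fine: no "Invalid byte" error
print(text)                                 # ä»¥
print(text == kanji)                        # False
```

If the form data really is UTF-8 already, the likely fix is to stop calling utf8_encode() on it and look elsewhere for whatever is corrupting the bytes.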
- If you are using an HTML form to pass the data into the API call process, add <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> between the head tags if it is not already there, and try changing the charset to various other charsets (Shift_JIS, etc.). Also, try adding the accept-charset attribute to the <form> tag and changing its value. I'm not sure that will help, but it will rule out a few more variables.
- Start a new thread in this forum titled "Unable to store kanji/double-byte characters via API" or something to that effect. Someone might recognize your problem from a new, better-titled thread.
- Fire off a ticket to Salesforce support; I've found them very responsive.
Sorry I cannot be of more assistance; if you do come across a solution, drop a link to it here if you remember. Cheers.
Message Edited by sprak on 04-21-2006 02:41 PM
“We currently have four instances of our service: NA0, NA1, EMEA, and JP.
While NA1, EMEA, and JP support the UTF8 character set (aka Unicode), NA0 only supports the ISO-8859-1 character set.
What this all means is that customers who signed up from the US web site prior to roughly June 2002 cannot use Asian languages that are based on a double-byte character set. So, those customers would have a tough time putting on divisions or users from Japan, Korea, or China.
Note that these orgs CAN still use the following:
That said, we are migrating to a compatible server, so we should be good to go. Thanks for your quick replies; very helpful.
~Chad
Our data was migrated to the servers that support Asian characters, but when leads are created through the API, it still bombs with that darn error: "Invalid byte 1 of 1-byte UTF-8 sequence."
If I use utf8_encode(), it doesn't die, but all the Asian characters are transformed into something strange.
The error appears to occur at the SOAP level rather than on the Salesforce side, but I could be totally wrong...
Any ideas would help.
Thank you.
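One way to check whether the bad bytes already exist before the SOAP layer is to validate every field right before building the request. A Python sketch; the field names, the sample record, and the `report_bad_fields` helper are hypothetical. It reports the byte offset of the first invalid UTF-8 sequence per field, which would also catch a single field that skipped encoding (the cause of the error in the update at the top).

```python
from typing import Dict, Optional

def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start

def report_bad_fields(fields: Dict[str, bytes]) -> Dict[str, int]:
    """Map each field whose value is not valid UTF-8 to its first bad offset."""
    bad = {}
    for name, value in fields.items():
        offset = first_invalid_utf8_offset(value)
        if offset is not None:
            bad[name] = offset
    return bad

# Hypothetical lead record: one field was converted to UTF-8, one was not.
lead = {
    "LastName": "Müller".encode("utf-8"),
    "Company": "Müller GmbH".encode("iso-8859-1"),
}
print(report_bad_fields(lead))   # {'Company': 1}
```

An empty result would point away from the application code and toward the transport; a non-empty one names the field to fix.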
When dealing with Asian (double-byte) characters, functions like substr(), strtolower(), and strtoupper() will actually distort the original characters (I originally thought they would leave them alone).
We do not have the mbstring functions installed, but using mb_substr(), mb_strtolower(), etc. will resolve this issue; so will removing that formatting from double-byte characters.
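The substr() problem comes down to byte versus character offsets: PHP's substr() counts bytes, while mb_substr() counts characters in the given encoding. A Python sketch of the difference, using bytes slicing to stand in for substr() and str slicing to stand in for mb_substr(); the helper names are hypothetical.

```python
def byte_truncate(text: str, n: int) -> bytes:
    """Keep the first n *bytes* of the UTF-8 encoding (like PHP's substr())."""
    return text.encode("utf-8")[:n]

def char_truncate(text: str, n: int) -> str:
    """Keep the first n *characters* (like mb_substr() with UTF-8)."""
    return text[:n]

def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

sample = "以下のフォーム"          # every character here is 3 bytes in UTF-8

cut = byte_truncate(sample, 4)    # 以 (3 bytes) plus the first byte of 下
print(is_valid_utf8(cut))         # False: the last character was cut in half
print(char_truncate(sample, 4))   # 以下のフ
```

A field truncated like `cut` would make the SOAP call fail even though the original input was valid UTF-8.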
The migration to the new servers was also critical from the SFDC end.
Thanks everyone for your help.
At least we hope they will!
Dear Simon,
But it doesn't work. It's allowing me to enter any characters and saves the record without throwing any error! :-(
Simon,
Could you please tell me the range of Western characters that can be passed in a REGEX? If I get the range, then I am done with my requirement.
Thanks a lot
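If "Western" is taken to mean ISO-8859-1 (the character set quoted earlier in this thread for the NA0 instance), the range is U+0000 to U+00FF. A Python sketch of a "Latin-1 only" check under that assumption, plus a stricter printable-only variant; the pattern and helper names are hypothetical. Java-style regex engines also accept \xhh escapes, but whether Salesforce's REGEX() behaves identically with this character class is not confirmed here.

```python
import re

# Assumption: "Western" = ISO-8859-1 (Latin-1), code points U+0000..U+00FF.
LATIN1_ONLY = re.compile(r"[\x00-\xFF]*")

# Stricter variant: printable Latin-1 only (ASCII 0x20-0x7E plus 0xA0-0xFF),
# which also rejects control characters.
PRINTABLE_LATIN1 = re.compile(r"[\x20-\x7E\xA0-\xFF]*")

def is_latin1_only(value: str, printable: bool = False) -> bool:
    pattern = PRINTABLE_LATIN1 if printable else LATIN1_ONLY
    return pattern.fullmatch(value) is not None

print(is_latin1_only("Müller GmbH"))                # True
print(is_latin1_only("以下のフォーム"))              # False
print(is_latin1_only("tab\there", printable=True))  # False: \t is a control character
```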
Hi,
I am having the same problem you faced earlier, and I was wondering if you found any solution for it.
Sorry, it has probably been many years and you may not remember, but I just wanted to take a chance.
Thanks
Akhil