iOS JSON escaping special characters - objective-c

I'm working in iOS and trying to pass some content to a web server via an NSURLRequest. On the server I have a PHP script set up to accept the request string and convert it into a JSON object using the Zend_JSON framework. The issue I am having is that whenever the character "ø" appears in any part of the request parameters, the request string is cut short by one character.
Request string before going to the server:
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Østfold","udid":"bed164974ea0d436a43f3cdee0e005a1"}]
Request string on the server, before any parsing:
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Østfold","udid":"bed164974ea0d436a43f3cdee0e005a1"}
Everything looks exactly the same except that the final closing ] is missing. I'm thinking it has an issue when converting the string to UTF-8, but I'm not sure of the correct way to fix it.
Does anyone have any ideas why this is happening?

First of all, do not trust the Xcode console in such cases; you never know which encoding the console is actually using.
Second, escape the invalid characters before you build your JSON string. The easiest way is probably to make sure you are using the same Unicode representation, like UTF-8, all the time.
Third, if there are still invalid characters, use a JSON library with a parser (it does the encoding). Validate the output by parsing it back into e.g. an NSString, or validate it manually with a web form like http://jsonformatter.curiousconcept.com/
The worst way is to replace the individual characters in the string, build your JSON, and convert back. One way to do this would be to replace e.g. a German ä with its Unicode representation U+00E4 (http://www.utf8-chartable.de/); a sketch of that idea follows below.
That's the way I do it. I am glad that I never needed to go further than step three, and that is the step you should do anyway to keep your code simple.
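For illustration, here is a minimal Python sketch of that escaping idea (Python stands in for whatever language builds the request; the payload mirrors the one above). Serializing with ASCII-only escapes means characters like "ø" travel as \u00f8 and cannot be mangled by a re-encoding step:

import json

payload = [{"country": "Østfold", "name": "Roscoe Simulator"}]

# ensure_ascii=True (the default) writes every non-ASCII character as a
# \uXXXX escape, so the wire format is plain ASCII end to end.
body = json.dumps(payload, ensure_ascii=True)
print(body)  # [{"country": "\u00d8stfold", "name": "Roscoe Simulator"}]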

Please try using Zend's internal JSON encoder:
Zend_Json::$useBuiltinEncoderDecoder = true;
This should fix your issue.

Related

Sabre Web / .NET - Special Characters in SabreCommandLLSRQ Response Not Handled Properly

I'm using VB.NET to consume Sabre Web Services, primarily using SabreCommandLLSRQ to send native Sabre commands. Sending special characters without any special encoding works fine, but when I try to manipulate any response that contains the Cross of Lorraine via the Response element of SabreCommandLLSRS, all of the Cross of Lorraine characters are missing if I display my string in a MsgBox or try to manipulate it.
If I push that string onto my clipboard and view it in Notepad++, the characters are there, but they seem to be encoded improperly - they come through as something like "‡". I'm pretty new to Unicode encoding, so that's all a bit above my head.
I've tried using the Replace method of StringBuilder to change those characters to something visible, to no avail - does anyone have a way around this issue?
Strangely, the other special characters (e.g. "¤") seem to come through just fine.
This section in Dev Studio includes references to special character hex codes:
https://developer.sabre.com/docs/read/soap_apis/management/utility/Send_Sabre_Command
Does this help?
This is a pain in the behind due to the invisible characters.
String replace does work; you just need to make sure you capture the invisible character after the Â.
Simply put something like the below in the SabreCommandSend function before you send the string to Sabre.
Hopefully this should copy and paste straight out, including the invisible character.
if (tempCommand.Contains("‡"))
{
    // The replacement is "Â" plus the invisible character mentioned above.
    tempCommand = tempCommand.Replace("‡", "Â");
}
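A plausible explanation for the "Â plus invisible character" symptom - offered as an assumption, not something stated in the thread: in Windows-1252 the Cross of Lorraine is byte 0x87, but in Latin-1 that same byte is an invisible C1 control character, which re-encodes to UTF-8 as 0xC2 0x87, i.e. "Â" followed by something unprintable. A small Python sketch of the round trip:

raw = b"\x87"  # assumption: Sabre's response byte for the Cross of Lorraine

print(raw.decode("cp1252"))    # '‡' - decoded with the right code page
ghost = raw.decode("latin-1")  # U+0087, an invisible C1 control character
print(ghost.encode("utf-8"))   # b'\xc2\x87' - the "Â" + invisible character

If that is what is happening, decoding the response bytes as Windows-1252 instead of Latin-1 would recover the character without any byte surgery.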
I figured out how to get this to work, but it's not pretty, so if anyone has a better way to do it, I'm all ears.
I couldn't figure out which char to use in a simple string Replace, so instead I'm casting the string to a byte array, iterating through the array and replacing any strange characters I find, recasting the byte array into a raw string, and doing the string replace on that:
Imports System.Text

'ASCII.GetBytes maps every non-ASCII character to "?" (byte 63), which is
'why the offending characters can be found reliably below. Caveat: any
'literal question marks in the response are replaced as well.
Dim byteArray() As Byte = Encoding.ASCII.GetBytes(sabreResponse)
For i = 0 To byteArray.Length - 1
    If byteArray(i) = 63 Then 'this is a question mark char
        byteArray(i) = 94 'caret, which doesn't exist in native Sabre
    End If
Next
MyClass.respString = Encoding.ASCII.GetString(byteArray)
'Now a plain string replace can swap the placeholder for a visible substitute.
MyClass.respString = MyClass.respString.Replace("^", "¥")
For whatever reason, the string replace method works after I swap out the offending byte with a dummy character but not before.

QueryString Encryption and Related Character Problems

I'm using a Base64 data encryption function to encrypt and decrypt email addresses sent in links and passed back in the QueryString, using:
Encrypt(txtEmail.Text).ToString
' Which generates something like " pqM/rgLD9PSrE+Ofm4pt4kg86+1RChHD "
Decrypt(Request("email").ToString)
But Decrypt didn't work and returned the error "Invalid length for a Base-64 char array" until I found that I could solve it using:
Decrypt(Request("email").Replace(" ", "+").ToString)
since the plus sign "+" character decodes to a space when the value comes back in from the URL.
I also tried UrlEncode, but it didn't help:
Decrypt(Server.UrlEncode(Request("email")))
Now my questions are:
Is this the only problem I may face with the encrypted strings?
Is there a more effective way to solve the problem than the Replace workaround I used?
Thank you all in advance
This would happen if you don't generate the URL properly.
The ASP.NET Request accessors automatically decode the data that you access.
However, you need to URL-encode your string before putting it in the querystring in the first place, as sketched below.
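A quick Python sketch of that round trip (Python for illustration; the token value is just the example from the question). Percent-encoding the value before it goes into the URL keeps the + intact, so no Replace hack is needed on the receiving end:

from urllib.parse import quote_plus, unquote_plus

token = "pqM/rgLD9PSrE+Ofm4pt4kg86+1RChHD"  # Base64 output with '+' and '/'

encoded = quote_plus(token)  # 'pqM%2FrgLD9PSrE%2BOfm4pt4kg86%2B1RChHD'
url = "page.aspx?email=" + encoded  # 'page.aspx' is a placeholder name

# The server-side accessor then decodes it back to the original token.
assert unquote_plus(encoded) == token

The UrlEncode attempt in the question failed because it encoded the value after the querystring parsing had already turned + into a space; the encoding has to happen on the sending side.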

Feed Encoding Problems Ruby 1.9

I am trying to parse RSS/Atom feeds in my Rails app, but I have encountered some serious problems with non-ASCII characters, e.g. the German umlauts ÄÖÜ or ß. Some feeds in the wild use proper UTF-8, but some others make me cry. The general problem is:
I must be able to parse any feed, whatever encoding it might have. The "loss" of characters is not an option (though that is my current status), because I do some text and language analysis on the feed items.
What I use so far:
FeedZirra for fetching and parsing the feeds; works well so far. I also "sanitize" the values I get from FeedZirra.
HTMLEntities (gem) for unescaping special characters, like "&#196;", which means "Ä"
the rchardet19 gem, to figure out which encoding the feed might have, and:
string.encode! to convert from whatever it is to UTF-8
Ruby 1.9.3 (latest) and Rails 3.2.8 on Ubuntu Linux 12.04
The problem is that I literally have no idea what I'm doing wrong.
def self.sanitize_encoding_and_htmlentities str
  cd = CharDet.detect str  # currently unused; kept around for future tweaks
  # No target encoding is given, so this transcodes to the app's
  # default_internal encoding (UTF-8 here) and silently drops any
  # invalid or undefined characters.
  s = str.encode(:invalid => :replace, :undef => :replace, :replace => '')
  coder = HTMLEntities.new
  coder.decode(s)
end
This is my current sanitize method. As a sample feed I use
http://www.N24.de/2/index.rss
So far, the "special" characters simply get stripped out completely. This is the only variant I found that just works without raising an error about invalid bytes. I changed the encode method slightly, because I read in the Ruby docs that without any encoding given, the encode method should "translate" to the app's default_internal encoding, which is UTF-8 in my case. CharDet is only there in case it becomes useful for anything related.
I used the magic_encoding gem, so every file in my project should have the encoding comment on its first line. My database is SQLite3 with UTF-8.
As of 2012, is there anything I should look at? Did I do anything really wrong?
Thanks for the help!
EDIT:
The feeds may be RSS of any kind, Atom, and/or just invalid XML. The encoding may be UTF-8, something different, or the feed may claim "utf-8" while actually being some windows-XXX encoding, and so on. I really need a solution that handles all of this.
Also, the fetching/parsing must be as fast as possible; that's why I picked FeedZirra.
My current idea is to get the feed content, replace every char in the "title" and "description" nodes with HTML entities if possible, use the encode! method to switch to UTF-8, and then unescape the HTML entities. After this, the special characters should be kept, I think, but I can't get anything like this working at the moment. Might this be a good approach?
Finally, I found the main problem:
FeedZirra already returns UTF-8 when accessing entries and their attributes. But I used the sanitize method to access attributes, which returns ASCII-8BIT with the weird characters escaped as HTML entities.
However, I kicked all the sanitizing and encoding stuff out of my code, and now it just works. It seems that FeedZirra has something built in to transcode the feeds if necessary.
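For feeds from libraries that do not normalize encodings for you, the general detect-then-transcode fallback the question circles around looks roughly like this in Python (chardet plays the role of the rchardet19 gem; the function name is made up for illustration):

import chardet  # counterpart of the rchardet19 gem

def to_utf8(raw: bytes) -> str:
    """Best-effort transcode of raw feed bytes to UTF-8 text."""
    guess = chardet.detect(raw)  # e.g. {'encoding': 'windows-1252', ...}
    encoding = guess["encoding"] or "utf-8"
    # errors="replace" mirrors :invalid => :replace in Ruby's String#encode,
    # trading an exception for U+FFFD replacement characters.
    return raw.decode(encoding, errors="replace")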

Char.ConvertFromUtf32 not available in Silverlight

I'm converting a WinForms app to Silverlight (VB.NET). What should I use instead of Char.ConvertFromUtf32, since it's not available in Silverlight?
UTF-32 is currently not part of Silverlight, so you have to find a way around the limitation. I think you should stop for a moment and think about exactly why you need to read UTF-32-encoded text.
If you are reading such text from a database or a file on the server, I would perform the conversion server-side (if possible I would convert everything to UTF-8 and get rid of the UTF-32 data in one shot).
If you are parsing a user-provided file on the client side, I would detect the UTF-32 encoding and gently tell the user that the file encoding is not supported. UTF-32 is pretty rare nowadays, so I guess it should not be a very common case (but I could be wrong, not knowing your exact situation).
In order to detect the file encoding, you have to look at the first few bytes (the byte order mark); if they are not present, the task becomes much harder and involves some kind of heuristic based on character frequency.
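A BOM check is just a prefix test on the first bytes. Here is a small Python sketch of it (illustrative only; a Silverlight port would do the same byte comparisons in VB.NET). Note that the UTF-32 LE mark begins with the UTF-16 LE mark, so the longer patterns must be tested first:

def detect_bom(data: bytes):
    """Return the encoding implied by a byte order mark, or None."""
    boms = [
        ("utf-32-be", b"\x00\x00\xfe\xff"),
        ("utf-32-le", b"\xff\xfe\x00\x00"),  # starts like UTF-16 LE!
        ("utf-8-sig", b"\xef\xbb\xbf"),
        ("utf-16-be", b"\xfe\xff"),
        ("utf-16-le", b"\xff\xfe"),
    ]
    for name, bom in boms:
        if data.startswith(bom):
            return name
    return None  # no BOM: fall back to heuristics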
From: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/types/how-to-convert-between-hexadecimal-strings-and-numeric-types
You can use a direct cast, like:
// Get the character corresponding to the integral value.
string stringValue = Char.ConvertFromUtf32(value);
char charValue = (char)value;
A small warning: it will only work up to 0xFFFF. It will not work for the high range of Unicode, from 0x10000 to 0x10FFFF.
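For the missing range, the workaround is to emit the UTF-16 surrogate pair yourself; the arithmetic below is what Char.ConvertFromUtf32 does under the hood. A Python sketch (the function name is invented for illustration):

def utf16_units(code_point: int):
    """UTF-16 code units for a single Unicode code point."""
    if code_point <= 0xFFFF:
        return [code_point]          # BMP: the plain cast above suffices
    v = code_point - 0x10000         # 20 bits remain
    return [0xD800 + (v >> 10),      # high (lead) surrogate
            0xDC00 + (v & 0x3FF)]    # low (trail) surrogate

# U+1F600 GRINNING FACE becomes the surrogate pair D83D DE00.
assert utf16_units(0x1F600) == [0xD83D, 0xDE00]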
Also, if you need to parse \uXXXX, try this other question: How do I convert Unicode escape sequences to Unicode characters in a .NET string?

Is it safe to convert a mysqlpp::sql_blob to a std::string?

I'm grabbing some binary data out of my MySQL database. It comes out as a mysqlpp::sql_blob type.
It just so happens that this BLOB is a serialized Google Protobuf. I need to de-serialize it so that I can access it normally.
This gives a compile error, since ParseFromString() is not intended for mysqlpp::sql_blob types:
protobuf.ParseFromString( record.data );
However, if I force the cast, it compiles OK:
protobuf.ParseFromString( (std::string) record.data );
Is this safe? I'm particularly worried because of this snippet from the mysqlpp documentation:
"Because C++ strings handle binary data just fine, you might think you can use std::string instead of sql_blob, but the current design of String converts to std::string via a C string. As a result, the BLOB data is truncated at the first embedded null character during population of the SSQLS. There’s no way to fix that without completely redesigning either String or the SSQLS mechanism."
Thanks for your assistance!
Judging by that quote, it doesn't look like it would be a problem: it is basically saying that if a null character is found in the blob, the string stops there, and ASCII strings won't have random nulls in the middle of them. However, this might present a problem for internationalization (multibyte charsets may have nulls in the middle).
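The quoted truncation is also worth demonstrating for binary data like a serialized protobuf, which can legitimately contain 0x00 bytes. A Python sketch (the byte values are illustrative):

# A binary blob with an embedded 0x00 byte; a field whose encoded
# varint is zero is enough to put one in a serialized protobuf.
blob = b"\x08\x00\x12\x03abc"

# C-string style handling stops at the first NUL, as the docs warn:
truncated = blob.split(b"\x00")[0]

print(len(blob), len(truncated))  # 7 1 - nearly the whole message is gone

So whether the forced cast is safe depends on whether it goes through that C-string path; if it does, a blob like this one would come through as a single byte.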