I am trying to parse JSON returned by a script on my server. The JSON contains a £ (pound) sign, and parsing it returns null. I had this problem before, so I temporarily switched to the dollar or euro sign, but I need to be able to parse the pound sign and I'm unsure how to fix this. I created a test project that just uses the stringWithContentsOfURL: method. All the other JSON files load fine, but the one with the pound sign in it returns null.
NSString *get5 = [NSString stringWithContentsOfURL:url5 encoding:NSUTF8StringEncoding error:nil];
I tried the other NSUTF… encodings too, but they don't seem to work either: some return null and some return Chinese characters, so they are not much good.
Any help would be much appreciated!!
Edit:
I used the NSError object and got this message back:
"The operation couldn’t be completed. (Cocoa error 261.)"
UserInfo=0x68294b0
{NSURL=http://myserver.com/test.jsp,
NSStringEncoding=4}
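Cocoa error 261 is an encoding error (NSFileReadInapplicableStringEncodingError: the data could not be read with the requested encoding, which here is NSStringEncoding=4, i.e. NSUTF8StringEncoding). The service returning the JSON evidently isn't returning it as UTF-8. Either make the service return UTF-8 if you can, or find out which encoding it does return and use that.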
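To see why the pound sign in particular triggers this, here is a small Ruby sketch. It assumes (hypothetically) that the server sends ISO-8859-1 (Latin-1), where £ is the single byte 0xA3. That byte is not valid UTF-8 on its own, so a strict UTF-8 decode fails, while decoding with the encoding the server actually used succeeds. In Cocoa the analogous change would be passing NSISOLatin1StringEncoding instead of NSUTF8StringEncoding, if Latin-1 is indeed what the server sends.

```ruby
# Hypothetical response body "Price: £5" as a server might send it
# if it encoded the text as ISO-8859-1 (Latin-1) instead of UTF-8.
latin1_bytes = "Price: \xA35".b

# Reading those bytes as UTF-8 fails: 0xA3 is a continuation byte with no
# lead byte, the kind of input that makes a strict UTF-8 decode give up.
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts "valid UTF-8? #{as_utf8.valid_encoding?}"   # valid UTF-8? false

# Decoding with the encoding that was actually used works; the result can
# then be converted to UTF-8 for the rest of the program.
text = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts text   # Price: £5

# Encoded properly as UTF-8, the same character takes two bytes: C2 A3.
puts "£".bytes.map { |b| format("%02X", b) }.join(" ")   # C2 A3
```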
See this question for more info:
Encoding issue: Cocoa Error 261?
Can you check that the JSON is not saved with Windows-style settings, i.e. 1) CRLF (Windows) line endings or 2) a "Western" encoding such as Windows-1252 or ISO-8859-1?
Make sure the encoding is UTF-8.
Related
I've got a WKWebView that works as a browser. I can't manage to load addresses with special characters such as "http://www.håbo.se" (a Swedish character).
I'm using:
parsedUrl = [parsedUrl stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
which is promising, as it creates an address that looks like this:
http://www.h%c3%a5bo.se/
If I enter that in Chrome it works. But when I try to load it in the WKWebView I get the following error (all other pages load fine).
Here's the full NSError:
Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={_WKRecoveryAttempterErrorKey=<WKReloadFrameErrorRecoveryAttempter: 0x7f82ca502290>, NSErrorFailingURLStringKey=http://www.h%c3%a5bo.se/, NSErrorFailingURLKey=http://www.h%c3%a5bo.se/, NSUnderlyingError=0x7f82ca692200 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "A server with the specified hostname could not be found." UserInfo={NSErrorFailingURLStringKey=http://www.h%c3%a5bo.se/, NSErrorFailingURLKey=http://www.h%c3%a5bo.se/, _kCFStreamErrorCodeKey=8, _kCFStreamErrorDomainKey=12, NSLocalizedDescription=A server with the specified hostname could not be found.}},
This one is complicated. From this article:
Resolving a domain name
If the string that represents the domain name is not in Unicode, the
user agent converts the string to Unicode. It then performs some
normalization functions on the string to eliminate ambiguities that
may exist in Unicode encoded text.
Normalization involves such things as converting uppercase characters
to lowercase, reducing alternative representations (eg. converting
half-width kana to full), eliminating prohibited characters (eg.
spaces), etc.
Next, the user agent converts each of the labels (ie. pieces of text
between dots) in the Unicode string to a punycode representation. A
special marker ('xn--') is added to the beginning of each label
containing non-ASCII characters to show that the label was not
originally ASCII. The end result is not very user friendly, but
accurately represents the original string of characters while using
only the characters that were previously allowed for domain names.
For example, the following domain name:
JP納豆.例.jp
converts to the following representation:
xn--jp-cd2fp15c.xn--fsq.jp
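You can use a punycode (IDNA) conversion routine to perform this conversion; the combined code at the end of this answer uses the IDNAEncodedString category method for it.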
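What that host conversion does can be sketched in Ruby. The code below is a minimal, hand-written punycode encoder following the algorithm in RFC 3492, plus a simplified "ToASCII" step. It is an illustration under a stated simplification: labels are only lowercased, whereas real IDNA also applies nameprep/UTS #46 mapping and validity checks, so a proper IDNA library should be used in production.

```ruby
# Minimal punycode encoder (RFC 3492) plus a simplified IDNA ToASCII.
module Punycode
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128

  # Bias adaptation function (RFC 3492, section 6.1).
  def self.adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Digit values 0..25 become "a".."z", 26..35 become "0".."9".
  def self.digit(d)
    (d < 26 ? d + 97 : d + 22).chr
  end

  # Encodes one label (no dots) into punycode, without the "xn--" prefix.
  def self.encode(label)
    code_points = label.codepoints
    output = code_points.select { |c| c < 0x80 }.pack("U*")
    basic = handled = output.length
    output << "-" if basic > 0
    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS
    while handled < code_points.length
      m = code_points.select { |c| c >= n }.min   # next code point to insert
      delta += (m - n) * (handled + 1)
      n = m
      code_points.each do |c|
        delta += 1 if c < n
        next unless c == n
        q = delta
        k = BASE
        loop do
          t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias)
          break if q < t
          output << digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << digit(q)
        bias = adapt(delta, handled + 1, handled == basic)
        delta = 0
        handled += 1
      end
      delta += 1
      n += 1
    end
    output
  end
end

# Simplified ToASCII: lowercase each label; punycode-encode the labels that
# contain non-ASCII characters and mark them with the "xn--" prefix.
def idna_to_ascii(host)
  host.split(".").map do |label|
    lower = label.downcase
    lower.ascii_only? ? lower : "xn--" + Punycode.encode(lower)
  end.join(".")
end

puts idna_to_ascii("JP納豆.例.jp")   # xn--jp-cd2fp15c.xn--fsq.jp
```

The expected output matches the article's example above; for the host in the question you would call `idna_to_ascii("www.håbo.se")` and load the resulting ASCII hostname.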
Resolving a path
If the string is input by the user or stored in a non-Unicode
encoding, it is converted to Unicode, normalized using Unicode
Normalization Form C, and encoded using the UTF-8 encoding.
The user agent then converts the non-ASCII bytes to percent-escapes.
For example, the following path:
/dir1/引き割り.html
converts to the following representation:
/dir1/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
For this purpose, you may use the following code:
path = [URL.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
Note that stringByAddingPercentEscapesUsingEncoding: is deprecated, because each URL component or subcomponent has different rules for what characters are valid.
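The path steps described above (normalize to NFC, encode as UTF-8, percent-escape the non-ASCII bytes) can be sketched in Ruby as follows. The allowed set used here (unreserved characters plus "/") is an assumption chosen for simplicity: it is stricter than URLPathAllowedCharacterSet (characters such as ":" and "@" are legal in paths but get escaped here), which is safe but conservative.

```ruby
# Percent-encode a URL path: NFC-normalize, encode as UTF-8, then escape
# every byte that is not an unreserved character or a "/" separator.
def percent_encode_path(path)
  utf8 = path.unicode_normalize(:nfc).encode(Encoding::UTF_8)
  # Work on the raw bytes so each non-ASCII byte is escaped individually.
  escaped = utf8.b.gsub(%r{[^A-Za-z0-9\-._~/]}) { |byte| format("%%%02X", byte.ord) }
  escaped.force_encoding(Encoding::UTF_8)
end

puts percent_encode_path("/dir1/引き割り.html")
# /dir1/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
```

The NFC step matters for input like a decomposed "å" (a + combining ring), which normalizes to U+00E5 and then escapes to %C3%A5, the same bytes seen in the question's URL.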
Putting it all together
Resulting code:
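@implementation NSURL (Normalization)
// Returns a copy of the URL with a punycode (IDNA) host and a percent-encoded path.
- (NSURL *)normalizedURL {
    NSURLComponents *components = [NSURLComponents componentsWithURL:self resolvingAgainstBaseURL:YES];
    // IDNAEncodedString comes from a third-party category, not Foundation:
    // https://github.com/OnionBrowser/iOS-OnionBrowser/blob/master/OnionBrowser/NSStringPunycodeAdditions.h
    components.host = [components.host IDNAEncodedString];
    // Assign percentEncodedPath, not path: the path setter expects unencoded
    // text and would escape the '%' signs a second time.
    components.percentEncodedPath = [components.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
    return components.URL;
}
@end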
Unfortunately, actual URL "normalization" is more complicated - you need to handle all remaining URL components too. But I hope I've answered your question.
I am trying to parse RSS/Atom feeds in my Rails app, but I encountered some serious problems with non-ASCII characters, e.g. the German umlauts ÄÖÜ or ß. Some feeds in the wild use proper UTF-8, but others make me cry. The general problem is:
I must be able to parse any feed, whatever encoding it might have. The "loss" of characters is not an option (though it's my current status), because I do some text and language analysis on the feed items.
What I use so far:
Feedzirra for fetching and parsing the feeds; it works well so far. I also "sanitize" the values I get from Feedzirra.
HTMLEntities (gem) for unescaping special characters, like "&Auml;" which means "Ä"
rCharDet19 gem, to figure out which encoding the feed might have, and
string.encode! to convert from whatever it is to UTF-8
Ruby 1.9.3 (latest) and Rails 3.2.8 on Ubuntu Linux 12.04
The problem is that I literally have no idea what I'm doing wrong.
def self.sanitize_encoding_and_htmlentities str
cd = CharDet.detect str
s = str.encode(:invalid => :replace, :undef => :replace, :replace => '')
coder = HTMLEntities.new
coder.decode(s)
end
This is my current sanitize method. As a sample feed I use
http://www.N24.de/2/index.rss
So far, the "special" characters get removed completely. This is the only variant I found that works without raising an error about invalid byte sequences. I changed the encode call slightly, because I read in the Ruby docs that without a target encoding, encode converts to the app's default_internal encoding, which is UTF-8 in my case. The CharDet line is only there for possible future changes; it might be useful.
I used the magic_encoding gem, so every file in my project should have the encoding comment on the first line. My database is SQLite3 with UTF-8.
As of 2012, is there anything i should look at? Did i make anything really wrong?
Thanks for help!
EDIT:
The feeds may be RSS of any kind, Atom, and/or just invalid XML. The encoding may be UTF-8, something different, or the feed may claim "utf-8" while actually being some windows-XXX encoding, and so on. I really need a solution that handles all of this.
Also, fetching and parsing must be as fast as possible; that's why I picked Feedzirra.
My current idea is to get the feed content, replace every character in the "title" and "description" nodes with HTML entities where possible, use encode! to switch to UTF-8, and then unescape the HTML entities. After this, special characters should be kept, I think, but I can't get something like this working at the moment. Might this be a good approach?
Finally I found the main problem:
Feedzirra already returns UTF-8 when accessing entries and their attributes. But I used the sanitize method to access attributes, which returns ASCII-8BIT strings with special characters escaped as HTML entities.
So I removed all the sanitizing and encoding code, and now it just works. It seems Feedzirra has something built in to transcode the feeds if necessary.
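Why the umlauts vanished can be reproduced in a few lines. This sketch uses a made-up title and assumes, as the answer describes, that the text arrived as UTF-8 bytes labelled ASCII-8BIT. Transcoding binary to UTF-8 has no mapping for bytes 0x80 and above, so `encode` with `:undef => :replace, :replace => ''` silently drops every one of them. If the bytes already are UTF-8, relabelling them with `force_encoding` is the right operation instead.

```ruby
# Hypothetical feed title: UTF-8 bytes that ended up labelled ASCII-8BIT.
raw = "Größe über Ärger".b

# What the sanitize method effectively did: each byte >= 0x80 has no
# defined conversion from binary, so it is replaced with "" (dropped).
lossy = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
puts lossy   # Gre ber rger

# The bytes are already valid UTF-8, so relabel instead of transcoding.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
puts fixed.valid_encoding?   # true
puts fixed                   # Größe über Ärger
```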
I'm working in iOS and trying to pass some content to a web server via an NSURLRequest. On the server I have a PHP script set up to accept the request string and convert it into a JSON object using the Zend_Json framework. The issue I am having is that whenever the character "ø" is in any part of the request parameters, the request string is cut short by one character.
Request string before going to the server:
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Østfold","udid":"bed164974ea0d436a43f3cdee0e005a1"}]
Request string on the server before any parsing:
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Nord-Trøndelag","udid":"bed164974ea0d436a43f3cdee0e005a1"}
Everything looks exactly the same except that the final closing ] is missing. I think it has an issue when converting the string to UTF-8, but I'm not sure of the correct way to fix it.
Does anyone have any ideas why this is happening?
First of all, do not trust the Xcode console in such cases; you never know which encoding the console is actually using.
Second, escape the invalid characters before you build your JSON string. The easiest way is probably to make sure you use the same Unicode encoding, such as UTF-8, all the time.
Third, if there are still invalid characters, use a JSON library with a parser (it handles the encoding). Validate the output by parsing it back, e.g. into an NSString, or validate it manually with a web form like http://jsonformatter.curiousconcept.com/
The worst way is to replace the individual characters in the string, build your JSON and convert back. One way to do this would be to replace, e.g., a German ä with its Unicode escape \u00e4 (code point U+00E4, see http://www.utf8-chartable.de/).
That's the way I do it. I'm glad I never needed to go further than step three, and that is the step you should take anyway to keep your code simple.
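Step three (serialize with a JSON library, then validate by parsing back) looks roughly like this in Ruby, with a shortened, hypothetical version of the question's payload. The sketch also illustrates one *possible* cause of the missing `]`, which is an assumption not confirmed in this thread: "Ø" is one character (and one UTF-16 unit in an NSString) but two bytes in UTF-8, so if the request's Content-Length were computed from the character count instead of the UTF-8 byte count, the server would read one byte short per such character.

```ruby
require "json"

# Hypothetical, shortened payload modelled on the one in the question.
payload = [{ "name" => "Roscoe Simulator", "country" => "Østfold" }]

# Let the JSON library build the string (it handles quoting and escaping)...
body = JSON.generate(payload)

# ...then validate the output by parsing it back and comparing.
raise "round trip failed" unless JSON.parse(body) == payload

# "Ø" is one character but two bytes in UTF-8.
puts body.length    # character count
puts body.bytesize  # byte count actually sent on the wire (one more here)
```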
Please try using Zend's built-in JSON encoder/decoder:
Zend_Json::$useBuiltinEncoderDecoder = true;
This should fix your issue.
I am using libical, which is a library to parse the iCalendar format (RFC 2445).
The problem is that there may be some German umlauts, for example in the location field.
libical returns a const char * for each value, like:
"K\303\203\302\274nstlerhaus in M\303\203\302\274nchen"
I tried to convert it to NSString with:
[NSString stringWithCString:icalvalue_as_ical_string_r(value) encoding:NSUTF8StringEncoding];
But what I get is:
KÃ¼nstlerhaus in MÃ¼nchen
Any suggestions? I would appreciate any help!
It seems your string got doubly UTF-8-encoded, because "KÃ¼nstlerhaus in MÃ¼nchen" actually is UTF-8: if you UTF-8-decode that again you should get the correct string.
Bear in mind, though, that you shouldn't be satisfied with that result. There are combinations where a doubly UTF-8-encoded string can't simply be decoded by doing a double UTF-8 decode; some encoding combinations are irreversible. So in your situation I'd suggest you find out why the string got doubly UTF-8-encoded in the first place: probably the iCal file is stored in the wrong encoding on disk, or libical uses the wrong character set to read it, or, if you're getting the iCal data from a server, perhaps the charset declared there for text/calendar is wrong, etc.
The C string does not seem to be encoded in UTF-8, as there are four bytes for each of these characters. For example, ü would be encoded as \xc3\xbc in UTF-8 (\303\274 in octal, 195 188 in decimal). So the input is either already garbled when you receive it or it uses some other encoding.
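The double encoding can be reproduced and undone in Ruby. This sketch assumes the intermediate mistake was reading UTF-8 bytes as ISO-8859-1 (the actual culprit in libical or the file is unknown). Encoding the correct text that way yields exactly the bytes libical printed for "ü" (\303\203\302\274), and one layer can be undone by converting back to ISO-8859-1 and relabelling the bytes as UTF-8. As the first answer warns, this only works when every step was a lossless single-byte mapping.

```ruby
original = "Künstlerhaus in München"

# Simulate double encoding: take the UTF-8 bytes, misread them as
# ISO-8859-1 and encode the result to UTF-8 a second time.
doubled = original.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts doubled                                              # KÃ¼nstlerhaus in MÃ¼nchen
puts doubled.bytes[1, 4].map { |b| b.to_s(8) }.join(" ")  # 303 203 302 274

# Undo one layer: back to one byte per character, then relabel as UTF-8.
repaired = doubled.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
puts repaired == original   # true
```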
I am trying to retrieve some information from the server via the following Objective-C resource path. However, I was unable to get my results, as the resource path passed to the server is altered as shown below (server console):
//Objective C code
NSString *resourcePath = [NSString stringWithFormat:@"/sm/search?limit=100&term=%@&types%5B%5D=users&types%5B%5D=questions&types%5B%5D=topics",searchString];
//Server console
[GET /sm/search?limit=100&term=Afhd&types5803200164=users&types51107296256=questions&types5368849=topics]
How can I update my code so that the percent-escapes (%5B%5D, the encoded "[]") in my resource path reach the server unchanged instead of being converted?
As you use stringWithFormat:, format specifiers start with %.
If you want to leave %5B, %5D etc. intact in the output, you have to double the percent signs: %%5B%%5D.
So you have to double all of them except the one in term=%@, so that the value of searchString gets into the result.
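The same rule holds for Ruby's `format` (and C's `printf`), so the fix can be checked in a quick Ruby sketch. It also shows an alternative that avoids format strings altogether: build the query from key/value pairs with `URI.encode_www_form`, which percent-encodes "types[]" as "types%5B%5D" and would also escape the search term (the term "Afhd" is taken from the server log).

```ruby
require "uri"

search_string = "Afhd"

# "%" starts a conversion in a format string, so each literal "%" of the
# escapes is doubled; only "%s" consumes the argument.
path = format("/sm/search?limit=100&term=%s&types%%5B%%5D=users" \
              "&types%%5B%%5D=questions&types%%5B%%5D=topics", search_string)
puts path
# /sm/search?limit=100&term=Afhd&types%5B%5D=users&types%5B%5D=questions&types%5B%5D=topics

# Alternative: let the library percent-encode keys and values.
query = URI.encode_www_form([["limit", "100"], ["term", search_string],
                             ["types[]", "users"], ["types[]", "questions"],
                             ["types[]", "topics"]])
puts "/sm/search?" + query
```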