WKWebView load webpage with special characters - objective-c

I've got a WKWebView that works as a browser. I can't manage to load addresses with special characters such as "http://www.håbo.se" (a Swedish character).
I'm using:
parsedUrl = [parsedUrl stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
which looks promising, as it creates an address like this:
http://www.h%c3%a5bo.se/
If I enter that in Chrome, it works. But when I try to load it in the WKWebView, I get the following error (all other pages load fine).
Here's the full NSError:
Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={_WKRecoveryAttempterErrorKey=<WKReloadFrameErrorRecoveryAttempter: 0x7f82ca502290>, NSErrorFailingURLStringKey=http://www.h%c3%a5bo.se/, NSErrorFailingURLKey=http://www.h%c3%a5bo.se/, NSUnderlyingError=0x7f82ca692200 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "A server with the specified hostname could not be found." UserInfo={NSErrorFailingURLStringKey=http://www.h%c3%a5bo.se/, NSErrorFailingURLKey=http://www.h%c3%a5bo.se/, _kCFStreamErrorCodeKey=8, _kCFStreamErrorDomainKey=12, NSLocalizedDescription=A server with the specified hostname could not be found.}},

This one is complicated. From this article:
Resolving a domain name
If the string that represents the domain name is not in Unicode, the
user agent converts the string to Unicode. It then performs some
normalization functions on the string to eliminate ambiguities that
may exist in Unicode encoded text.
Normalization involves such things as converting uppercase characters
to lowercase, reducing alternative representations (eg. converting
half-width kana to full), eliminating prohibited characters (eg.
spaces), etc.
Next, the user agent converts each of the labels (ie. pieces of text
between dots) in the Unicode string to a punycode representation. A
special marker ('xn--') is added to the beginning of each label
containing non-ASCII characters to show that the label was not
originally ASCII. The end result is not very user friendly, but
accurately represents the original string of characters while using
only the characters that were previously allowed for domain names.
For example, the following domain name:
JP納豆.例.jp
converts to the following representation:
xn--jp-cd2fp15c.xn--fsq.jp
You can use a Punycode helper to perform this conversion (see the IDNAEncodedString category referenced in the final code).
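For illustration, the label-by-label conversion described above can be sketched in plain C (also valid in an Objective-C file), following the Punycode algorithm of RFC 3492. This is a minimal sketch with several assumptions:

- The input is already valid, lowercase UTF-8. The Unicode normalization step described above is not performed.
- A label has at most 255 code points.
- RFC 3492's overflow checks are omitted.
- `punycode_encode` and `idna_host` are hypothetical names.

In an app, prefer a maintained IDNA implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 3492 parameters used by IDNA. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700, INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Digit values 0..25 map to 'a'..'z', 26..35 map to '0'..'9'. */
static char digit_char(uint32_t d) { return (char)(d < 26 ? 'a' + d : '0' + d - 26); }

/* Bias adaptation (RFC 3492, section 6.1). */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime) {
    uint32_t k = 0;
    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) { delta /= BASE - TMIN; k += BASE; }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Decode UTF-8 into code points; returns the count, or -1 if malformed or too long. */
static long utf8_decode(const char *s, uint32_t *cp, size_t cap) {
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    while (*p) {
        int extra = *p < 0x80 ? 0 : (*p & 0xE0) == 0xC0 ? 1
                  : (*p & 0xF0) == 0xE0 ? 2 : (*p & 0xF8) == 0xF0 ? 3 : -1;
        if (extra < 0 || n == cap) return -1;
        uint32_t c = extra ? (uint32_t)(*p & (0x3F >> extra)) : *p;
        for (p++; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80) return -1;   /* also catches a premature NUL */
            c = (c << 6) | (*p & 0x3F);
        }
        cp[n++] = c;
    }
    return (long)n;
}

#define PUT(ch) do { if (o + 1 >= outcap) return -1; out[o++] = (char)(ch); } while (0)

/* Encode one UTF-8 label as Punycode, without the "xn--" prefix. Returns 0 on success. */
int punycode_encode(const char *label, char *out, size_t outcap) {
    uint32_t cp[256], b = 0, n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    long len = utf8_decode(label, cp, 256);
    size_t o = 0;
    if (len < 0 || outcap == 0) return -1;
    for (long j = 0; j < len; j++)                /* copy basic (ASCII) code points */
        if (cp[j] < 0x80) { PUT(cp[j]); b++; }
    if (b > 0) PUT('-');
    for (uint32_t h = b; h < (uint32_t)len; delta++, n++) {
        uint32_t m = UINT32_MAX;                  /* smallest code point >= n */
        for (long j = 0; j < len; j++) if (cp[j] >= n && cp[j] < m) m = cp[j];
        delta += (m - n) * (h + 1);               /* RFC overflow checks omitted */
        n = m;
        for (long j = 0; j < len; j++) {
            if (cp[j] < n) delta++;
            if (cp[j] != n) continue;
            uint32_t q = delta;                   /* emit delta as a variable-length integer */
            for (uint32_t k = BASE; ; k += BASE) {
                uint32_t t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                if (q < t) break;
                PUT(digit_char(t + (q - t) % (BASE - t)));
                q = (q - t) / (BASE - t);
            }
            PUT(digit_char(q));
            bias = adapt(delta, h + 1, h == b);
            delta = 0;
            h++;
        }
    }
    out[o] = '\0';
    return 0;
}
#undef PUT

/* Convert a lowercase UTF-8 host label by label: ASCII labels are copied,
   other labels become "xn--" + Punycode. No Unicode normalization is done. */
int idna_host(const char *host, char *out, size_t outcap) {
    size_t o = 0;
    for (const char *p = host;; p++) {
        size_t len = strcspn(p, "."), elen, i = 0;
        char label[256], enc[260] = "xn--";
        if (len >= sizeof label) return -1;
        memcpy(label, p, len);
        label[len] = '\0';
        while (i < len && (unsigned char)label[i] < 0x80) i++;
        if (i == len) strcpy(enc, label);          /* all-ASCII label: keep as is */
        else if (punycode_encode(label, enc + 4, sizeof enc - 4) != 0) return -1;
        elen = strlen(enc);
        if (o + elen + 2 > outcap) return -1;      /* room for '.' or the NUL */
        memcpy(out + o, enc, elen);
        o += elen;
        p += len;
        if (*p == '\0') break;
        out[o++] = '.';
    }
    out[o] = '\0';
    return 0;
}
```

Under these assumptions, `idna_host` turns `www.håbo.se` into `www.xn--hbo-ula.se`. The article's example, once lowercased to `jp納豆.例.jp`, becomes `xn--jp-cd2fp15c.xn--fsq.jp`.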
Resolving a path
If the string is input by the user or stored in a non-Unicode
encoding, it is converted to Unicode, normalized using Unicode
Normalization Form C, and encoded using the UTF-8 encoding.
The user agent then converts the non-ASCII bytes to percent-escapes.
For example, the following path:
/dir1/引き割り.html
converts to the following representation:
/dir1/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
For this purpose, you may use the following code:
path = [URL.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
Note that stringByAddingPercentEscapesUsingEncoding: is deprecated, because each URL component or subcomponent has different rules for what characters are valid.
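To show what this path encoding does at the byte level, here is a plain-C sketch. Each UTF-8 byte outside an allowed set becomes `%XX`. The allowed set is an assumption: RFC 3986 "pchar" (unreserved, sub-delims, `:` and `@`) plus `/`. It approximates Foundation's `URLPathAllowedCharacterSet` but is not guaranteed to match it exactly. `percent_encode_path` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Assumed allowed set: RFC 3986 "pchar" plus '/'. Not Foundation's exact table. */
static int is_path_allowed(unsigned char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        return 1;
    return c != '\0' && strchr("-._~!$&'()*+,;=:@/", c) != NULL;
}

/* Percent-encode every byte of the UTF-8 path that is not in the allowed set.
   Returns 0 on success, -1 if `out` is too small. */
int percent_encode_path(const char *path, char *out, size_t outcap) {
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    if (outcap == 0) return -1;
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        if (is_path_allowed(*p)) {
            if (o + 1 >= outcap) return -1;
            out[o++] = (char)*p;
        } else {                      /* e.g. byte 0xE5 -> "%E5" */
            if (o + 3 >= outcap) return -1;
            out[o++] = '%';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        }
    }
    out[o] = '\0';
    return 0;
}
```

Fed the UTF-8 bytes of `/dir1/引き割り.html`, this should produce the article's `/dir1/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html`.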
Putting it all together
Resulting code:
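#import <Foundation/Foundation.h>
#import "NSStringPunycodeAdditions.h" // IDNAEncodedString, from https://github.com/OnionBrowser/iOS-OnionBrowser/blob/master/OnionBrowser/NSStringPunycodeAdditions.h

@implementation NSURL (Normalization)
- (NSURL *)normalizedURL {
    NSURLComponents *components = [NSURLComponents componentsWithURL:self resolvingAgainstBaseURL:YES];
    // Convert the host to its Punycode ("xn--") form.
    components.host = [components.host IDNAEncodedString];
    // components.path is returned decoded; assign the encoded result to percentEncodedPath,
    // because assigning it to .path would percent-encode the '%' signs a second time.
    components.percentEncodedPath = [components.path stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
    return components.URL;
}
@end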
Unfortunately, actual URL "normalization" is more complicated: you need to handle all the remaining URL components too. But I hope I've answered your question.


Inconsistencies in URL encoding methods across Objective-C and Swift

I have the following Objective-C code:
[@"http://www.google.com" stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
// http%3A//www.google.com
And yet, in Swift:
"http://www.google.com".addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
// http://www.google.com
To what can I attribute this discrepancy?
...and for extra credit: can I rely on this code to percent-encode reserved URL path characters when I pass it a full URL like this?
The issue actually rests in the difference between the NSString method stringByAddingPercentEncodingWithAllowedCharacters: and the String method addingPercentEncoding(withAllowedCharacters:). This behavior has also been changing from version to version. (It looks like the latest beta of iOS 11 now restores the behavior we used to see.)
I believe the root of the issue rests in the particulars of how paths are percent encoded. Section 3.3 of RFC 3986 says that colons are permitted in paths except in the first segment of a relative path.
The NSString method captures this notion, e.g. imagine a path whose first directory was foo: (with a colon) and a subdirectory of bar: (also with a colon):
NSString *string = @"foo:/bar:";
NSCharacterSet *cs = [NSCharacterSet URLPathAllowedCharacterSet];
NSLog(@"%@", [string stringByAddingPercentEncodingWithAllowedCharacters:cs]);
That results in:
foo%3A/bar:
The : in the first segment of the path is percent-encoded, but the : in subsequent segments is not. This captures the RFC 3986 logic for handling colons in relative paths.
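To make the position-sensitive rule concrete, here is a plain-C sketch that escapes `:` only in the first segment of a relative path (one that does not start with `/`). It is an illustration of the RFC 3986 section 3.3 rule, not Foundation's actual implementation. The allowed set (RFC 3986 pchar plus `/`) and the name `encode_path_rfc3986` are assumptions. Under those assumptions, it reproduces both outputs reported here: `foo:/bar:` becomes `foo%3A/bar:`, and `http://www.google.com`, treated as a relative path, becomes `http%3A//www.google.com`.

```c
#include <stddef.h>
#include <string.h>

/* Assumed allowed set: RFC 3986 unreserved + sub-delims + ':' '@', plus '/'. */
static int path_char_allowed(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
        || (c != '\0' && strchr("-._~!$&'()*+,;=:@/", c) != NULL);
}

/* Percent-encode a path; in a relative path (no leading '/'), ':' in the first
   segment is escaped as well (RFC 3986 "path-noscheme"). Returns 0 or -1. */
int encode_path_rfc3986(const char *path, char *out, size_t outcap) {
    static const char hex[] = "0123456789ABCDEF";
    int in_first_relative_segment = path[0] != '/';
    size_t o = 0;
    if (outcap == 0) return -1;
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        if (*p == '/') in_first_relative_segment = 0;   /* first segment has ended */
        int escape = !path_char_allowed(*p) || (*p == ':' && in_first_relative_segment);
        if (o + (escape ? 3 : 1) >= outcap) return -1;
        if (escape) {
            out[o++] = '%';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        } else {
            out[o++] = (char)*p;
        }
    }
    out[o] = '\0';
    return 0;
}
```

This also shows why passing a whole URL through a path encoder goes wrong: the scheme's colon ends up in a "first relative segment" and gets escaped.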
The String method addingPercentEncoding(withAllowedCharacters:), however, does not do this:
import os.log

let string = "foo:/bar:"
os_log("%@", string.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)!)
Yields:
foo:/bar:
Clearly, the String method does not attempt that position-sensitive logic. This implementation is more in keeping with the name of the method (it considers solely what characters are "allowed" with no special logic that tries to guess, based upon where the allowed character appears, whether it's truly allowed or not.)
I gather that you are saddled with the code supplied in the question, but we should note that this behavior of percent escaping colons in relative paths, while interesting to explain what you experienced, is not really relevant to your immediate problem. The code you have been provided is simply incorrect. It is attempting to percent encode a URL as if it was just a path. But, it’s not a path; it’s a URL, which is a different thing with its own rules.
The deeper insight in percent encoding URLs is to acknowledge that different components of a URL allow different sets of characters, i.e. they require different percent encoding. That’s why NSCharacterSet has so many different URL-related character sets.
You really should percent-encode the individual components, encoding each with the character set allowed for that type of component. Only when the individual components are percent-encoded should they be concatenated to form the whole URL.
Alternatively, NSURLComponents is designed precisely for this purpose, getting you out of the weeds of percent-encoding the individual components yourself. For example:
var components = URLComponents(string: "http://httpbin.org/post")!
let foo = URLQueryItem(name: "foo", value: "bar & baz")
let qux = URLQueryItem(name: "qux", value: "42")
components.queryItems = [foo, qux]
let url = components.url!
That yields the following, with the & and the two spaces properly percent-escaped within the foo value, while the & between the foo and qux items is correctly left as is:
http://httpbin.org/post?foo=bar%20%26%20baz&qux=42
It’s worth noting, though, that NSURLComponents has a small, yet fairly fundamental flaw: Specifically, if you have query values, NSURLQueryItem, that could have + characters, most web services need that percent escaped, but NSURLComponents won’t. If your URL has query components and if those query values might include + characters, I’d advise against NSURLComponents and would instead advise percent encoding the individual components of a URL yourself.
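If you go the manual route for query values, as suggested above, here is a minimal plain-C sketch. It assumes only RFC 3986 "unreserved" characters may stay literal inside a query name or value. That is stricter than Foundation's `URLQueryAllowedCharacterSet`, which leaves characters such as `+` unescaped, so `+`, `&`, `=` and space all get escaped here. `build_query` is a hypothetical helper, not a Foundation API.

```c
#include <stddef.h>
#include <string.h>

/* Strict set for query names/values: only RFC 3986 "unreserved" stays literal. */
static int query_char_allowed(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Append the percent-encoded form of s at out[*o]; returns -1 if out is too small. */
static int append_encoded(const char *s, char *out, size_t outcap, size_t *o) {
    static const char hex[] = "0123456789ABCDEF";
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        int literal = query_char_allowed(*p);
        if (*o + (literal ? 1 : 3) >= outcap) return -1;
        if (literal) { out[(*o)++] = (char)*p; continue; }
        out[(*o)++] = '%';
        out[(*o)++] = hex[*p >> 4];
        out[(*o)++] = hex[*p & 0x0F];
    }
    return 0;
}

/* Build "name1=value1&name2=value2..." from parallel arrays, encoding each part separately. */
int build_query(const char *const names[], const char *const values[], size_t count,
                char *out, size_t outcap) {
    size_t o = 0;
    if (outcap == 0) return -1;
    for (size_t i = 0; i < count; i++) {
        if (i > 0) { if (o + 1 >= outcap) return -1; out[o++] = '&'; }
        if (append_encoded(names[i], out, outcap, &o) != 0) return -1;
        if (o + 1 >= outcap) return -1;
        out[o++] = '=';
        if (append_encoded(values[i], out, outcap, &o) != 0) return -1;
    }
    out[o] = '\0';
    return 0;
}
```

With names `foo`, `qux` and values `bar & baz`, `42`, this produces `foo=bar%20%26%20baz&qux=42`, the same query string as the URLComponents example. A value such as `1+1=2` becomes `1%2B1%3D2`.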

JSON parsing with £ symbol returning null

I am trying to parse JSON from my server that contains a £ (pound sign), but the result is null. I had this problem before and temporarily switched to the dollar or euro sign, but I need to be able to parse the pound sign, and I'm unsure how to fix this. In a test project I'm temporarily just using the stringWithContentsOfURL: method; all the other JSON files load fine, but the one containing the pound sign returns null.
NSString *get5 = [NSString stringWithContentsOfURL:url5 encoding:NSUTF8StringEncoding error:nil];
I tried the other UTF encodings, but they don't seem to work either: some return null and some return Chinese characters, so they're not much use.
Any help would be much appreciated!!
Edit:
I used the NSError object and got this message back:
"The operation couldn’t be completed. (Cocoa error 261.)"
UserInfo=0x68294b0
{NSURL=http://myserver.com/test.jsp,
NSStringEncoding=4}
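Cocoa error 261 (NSFileReadInapplicableStringEncodingError) is an encoding error: the data could not be decoded with the encoding you asked for (NSStringEncoding=4 in the userInfo is NSUTF8StringEncoding). The service returning the JSON obviously isn't sending it as UTF-8. Either make the service return UTF-8 if you can, or find out which encoding it is returning and use that.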
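To see why decoding fails, here is a plain-C sketch. It assumes, hypothetically, that the server sends ISO-8859-1 (Latin-1), where £ is the single byte 0xA3. In UTF-8 the same character is the two bytes 0xC2 0xA3, so a lone 0xA3 is not valid UTF-8. The validator only checks byte structure (no overlong or surrogate checks). The conversion assumes true Latin-1, not Windows-1252, which differs in the 0x80 to 0x9F range. Both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Structural UTF-8 check: each lead byte must be followed by the right number
   of 10xxxxxx continuation bytes. Overlong forms and surrogates are not rejected. */
int is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra = c < 0x80 ? 0 : (c & 0xE0) == 0xC0 ? 1
                     : (c & 0xF0) == 0xE0 ? 2 : (c & 0xF8) == 0xF0 ? 3 : 4;
        if (extra == 4 || i + extra >= len) return 0;   /* bad lead byte or truncated */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* ISO-8859-1 bytes map 1:1 to U+0000..U+00FF, so each high byte becomes two
   UTF-8 bytes. Returns the output length, or -1 if `out` is too small. */
long latin1_to_utf8(const unsigned char *in, size_t len, char *out, size_t outcap) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            if (o + 1 >= outcap) return -1;
            out[o++] = (char)in[i];
        } else {
            if (o + 2 >= outcap) return -1;
            out[o++] = (char)(0xC0 | (in[i] >> 6));
            out[o++] = (char)(0x80 | (in[i] & 0x3F));
        }
    }
    if (o >= outcap) return -1;
    out[o] = '\0';
    return (long)o;
}
```

In Foundation terms, the equivalent fix is to pass the encoding the server actually uses, for example NSISOLatin1StringEncoding or NSWindowsCP1252StringEncoding, to stringWithContentsOfURL:encoding:error:. Better still, have the server emit UTF-8.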
See this question for more info:
Encoding issue: Cocoa Error 261?
Can you check that the JSON file is not saved with 1) Windows (CRLF) settings or 2) a Western encoding, etc.?
Make sure the encoding is UTF-8.

Objective-C - char with umlauts to NSString

I am using libical which is a library to parse the icalendar format (RFC 2445).
The problem is that there may be German umlauts, for example in the location field.
Now libical returns a const char * for each value like:
"K\303\203\302\274nstlerhaus in M\303\203\302\274nchen"
I tried to convert it to NSString with:
[NSString stringWithCString:icalvalue_as_ical_string_r(value) encoding:NSUTF8StringEncoding];
But what I get is:
KÃ¼nstlerhaus in MÃ¼nchen
Any suggestions? I would appreciate any help!
It seems your string was UTF-8-encoded twice. "KÃ¼nstlerhaus in MÃ¼nchen" is what you get when each ü's two UTF-8 bytes are read as the two characters Ã and ¼ and then encoded again, so if you UTF-8-decode it once more you should get the correct string.
Bear in mind, though, that you shouldn't be satisfied with that result. There are combinations where a doubly UTF-8-encoded string can't simply be repaired by decoding twice; some encoding combinations are irreversible. So in your situation I'd suggest finding out why the string got double-encoded in the first place. Perhaps the iCal file is stored in the wrong encoding on disk, or libical uses the wrong character set to read it, or, if you're getting the iCal data from a server, perhaps the charset declared there for text/calendar is wrong, etc.
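Here is a minimal plain-C sketch of the "decode once more" repair. It assumes the extra encoding pass read the original UTF-8 bytes as ISO-8859-1, so every decoded code point must be at most U+00FF and is turned back into one byte. `undo_double_utf8` is a hypothetical helper. As warned above, this is not a general fix; it returns -1 when a character cannot be mapped back.

```c
#include <stddef.h>
#include <string.h>

/* Undo one extra UTF-8 encoding pass, assuming the original UTF-8 bytes were
   misread as ISO-8859-1 and re-encoded. Every decoded code point must be
   <= U+00FF and becomes one output byte. Returns the output length, or -1. */
long undo_double_utf8(const char *in, char *out, size_t outcap) {
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;
    if (outcap == 0) return -1;
    while (*p) {
        unsigned cp;
        if (*p < 0x80) {
            cp = *p++;
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            cp = ((unsigned)(*p & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
        } else {
            return -1;   /* stray byte, or a 3/4-byte sequence (always > U+00FF) */
        }
        if (cp > 0xFF || o + 1 >= outcap) return -1;
        out[o++] = (char)cp;
    }
    out[o] = '\0';
    return (long)o;
}
```

Applied to the libical value above, this should yield `K\303\274nstlerhaus in M\303\274nchen`, which is the correct UTF-8 and could then be passed to stringWithUTF8String:.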
The C string does not seem to be encoded in (single) UTF-8, as there are four bytes for each umlaut. For example, ü would be encoded as \xc3\xbc (\303\274 in the octal notation used above) in UTF-8. So the input is either already garbled when you receive it, or it uses some other encoding.

Unable to correctly interpret format specifiers in resource path

I am trying to retrieve some information from the server via the following Objective-C resource path. However, I don't get my results, because the resource path that reaches the server is altered, as shown below (server console):
// Objective-C code
NSString *resourcePath = [NSString stringWithFormat:@"/sm/search?limit=100&term=%@&types%5B%5D=users&types%5B%5D=questions&types%5B%5D=topics",searchString];
//Server console
[GET /sm/search?limit=100&term=Afhd&types5803200164=users&types51107296256=questions&types5368849=topics]
How can I update my code so that the server receives the percent-escaped brackets (%5B%5D) in my resource path instead of having them converted?
Because you use stringWithFormat:, every % starts a format specifier.
If you want to keep %5B, %5D, etc. intact in the output, you have to double the percent signs: %%5B%%5D.
So you have to double all of them, except the one in term=%@, so that the value of searchString gets into the result.
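The same rule holds for C's printf family: `%%` yields a literal percent sign. Here is a minimal sketch; `build_resource_path` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "%%" in a printf-style format emits a literal '%'; "%s" is the only real
   conversion here. Returns 0, or -1 on error or truncation. */
int build_resource_path(char *out, size_t outcap, const char *term) {
    int n = snprintf(out, outcap,
                     "/sm/search?limit=100&term=%s"
                     "&types%%5B%%5D=users&types%%5B%%5D=questions&types%%5B%%5D=topics",
                     term);
    return (n < 0 || (size_t)n >= outcap) ? -1 : 0;
}
```

The corresponding Objective-C format string follows the same pattern: @"/sm/search?limit=100&term=%@&types%%5B%%5D=users&types%%5B%%5D=questions&types%%5B%%5D=topics".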

How do I match non-ASCII characters with RegexKitLite?

I am using RegexKitLite and I'm trying to match a pattern.
The following regex patterns do not capture my word, which includes an n with a tilde: ñ.
Is there a string conversion I am missing?
subjectString = @"define_añadir";
//regexString = @"^define_(.*)"; //this pattern does not match, so I assumed I needed to add the ñ
//regexString = @"^define_([.ñ]*)"; //tried this pattern first, with a character class
regexString = @"^define_((?:\\w|ñ)*)"; //tried second
NSString *captured= [subjectString stringByMatching:regexString capture:1L];
//I want captured == añadir
Looks like an encoding problem to me. Either you're saving the source code in an encoding that can't handle that character (like ASCII), or the compiler is using the wrong encoding to read the source files. Going back to the original regex, try creating the subject string like this:
subjectString = @"define_a\xC3\xB1" @"adir"; // split so the \x escape doesn't swallow the following "ad"
or this:
subjectString = @"define_a\u00F1adir";
If that works, check the encoding of your source code files and make sure it's the same encoding the compiler expects.
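To check what the compiler actually stored for a literal, you can dump its bytes. Here is a plain-C sketch; `hex_bytes` is a hypothetical helper. If the ñ in "define_añadir" shows up as C3 B1, the literal was read as UTF-8. A single F1 suggests the file was read as ISO-8859-1.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of `s` as space-separated uppercase hex, e.g. "61 C3 B1".
   Returns 0 on success, -1 if `out` is too small. */
int hex_bytes(const char *s, char *out, size_t outcap) {
    size_t o = 0;
    if (outcap == 0) return -1;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        size_t need = o ? 3 : 2;             /* optional space + two hex digits */
        if (o + need >= outcap) return -1;   /* keep room for the terminating NUL */
        o += (size_t)snprintf(out + o, outcap - o, o ? " %02X" : "%02X", (unsigned)*p);
    }
    return 0;
}
```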
EDIT: I've never worked with the iPhone technology stack, but according to this doc you should be using the stringWithUTF8String: method to create the NSString, not the @"" literal syntax. In fact, it says you should never use non-ASCII characters (that is, anything outside the range 0x00..0x7F) in your code; that way you never have to worry about the source file's encoding. That's good advice no matter what language or toolset you're using.