Prevent NSString to escape contents - objective-c

Sorry for what may be a newbie question.
For various reasons I am stuck with a peculiar string that looks like this:
NSString *myString = @"A\\314\\212A\\314\\210O\\314\\210.jpg";
Can I, in some ninja way, remove the double \\ and force NSString to understand that the string is Unicode-encoded and should be read like this:
NSString *myString = @"A\314\212A\314\210O\314\210.jpg"; // Displays ÅÄÖ as expected
I have tried different strategies to replace all backslashes ("\"), but as soon as I add a backslash ("\"), NSString adds another one to escape the first one. And I get stuck here...
Is it possible to prevent NSString from escaping my string?
UPDATE
I am aware this is a special case. I am reading the output from a terminal program which reads files on the user's drive. Via an NSTask I am capturing the output into an NSString for parsing and splitting it into an array. It works great as long as there are no non-ASCII characters. HFS+ encodes non-ASCII characters in a slightly different Unicode form called NFD.
When I capture the response, the ÅÄÖ are already encoded inside quotes like this:
file.jpg
file2.jpg
"A\314\212A\314\210O\314\210.jpg"
When I create an NSString from the captured response, it gets escaped by NSString a second time:
A\\314\\212A\\314\\210O\\314\\210.jpg
I am aware that this is not optimal, but right now I have no control over what the terminal program outputs. Usually when an NSString is created with this NFD encoding, Objective-C takes care of the encoding/decoding for you. But since I have a string with mixed and double-escaped content, I have a hard time creating it and making NSString understand that the content is encoded this way.
Basically I would like to do this:
decodedString = [output stringByReplacingOccurrencesOfString:@"\\\\"
withString:@"\\"];
But behind the scenes NSString always escapes \ with another \ for you, so I would like a way to create "raw" strings without NSString interfering.
I have tried various ways of enforcing a Unicode encoding on NSString, but it all boils down to NSString always capturing and escaping \.
Any tips or pointers appreciated!

I did not find any way around this other than to go the other way around and change the terminal program so that it does not encode its output this way.
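If changing the tool's output is not an option, one possible alternative (not taken from the answer above) is to decode the octal escapes yourself before building the NSString. The sketch below is plain C, which compiles unchanged inside an Objective-C file. It assumes the tool emits exactly three octal digits after each backslash and that the decoded bytes are UTF-8; `decode_octal_escapes` is a hypothetical helper name, and surrounding quotes or other escapes such as `\"` are not handled.

```c
#include <stddef.h>

/* Decode backslash-octal escapes such as "\314\212" into raw bytes.
 * `out` must have room for strlen(in) + 1 bytes. Returns the number of
 * bytes written, not counting the terminating NUL.
 * Hypothetical helper for illustration, not a system API. */
size_t decode_octal_escapes(const char *in, char *out)
{
    size_t n = 0;
    while (*in) {
        if (in[0] == '\\' &&
            in[1] >= '0' && in[1] <= '3' &&
            in[2] >= '0' && in[2] <= '7' &&
            in[3] >= '0' && in[3] <= '7') {
            /* Three octal digits encode one byte (000 to 377). */
            out[n++] = (char)(((in[1] - '0') << 6) |
                              ((in[2] - '0') << 3) |
                               (in[3] - '0'));
            in += 4;
        } else {
            /* Anything else is copied through unchanged. */
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

The decoded buffer can then be handed to +[NSString stringWithUTF8String:], which should yield the decomposed (NFD) form of the file name.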

Related

Parsing file with percent signs (%) in Objective-C

I'm writing a parser for fortune files. Fortune is a small app on *nix platforms that just prints out a random "fortune". The fortune files are straight text, with each fortune being separated by a percent sign on its own line. For example:
A little suffering is good for the soul.
-- Kirk, "The Corbomite Maneuver", stardate 1514.0
%
A man either lives life as it happens to him, meets it head-on and
licks it, or he turns his back on it and starts to wither away.
-- Dr. Boyce, "The Menagerie" ("The Cage"), star date unknown
%
What I've found is that when parsing the file, stringWithContentsOfFile returns a string with the % signs in place. For example:
@"A little suffering is good for the soul.\n\t\t-- Kirk, \"The Corbomite Maneuver\", stardate 1514.0\n%\nA man either lives life as it happens to him, meets it head-on and\nlicks it, or he turns his back on it and starts to wither away.\n\t\t-- Dr. Boyce, \"The Menagerie\" (\"The Cage\"), stardate unknown\n%"
However, when I call componentsSeparatedByCharactersInSet on the file contents, everything is parsed as a string, with the exception of the percent signs, which are NSTaggedPointerString. When I print out the lines, the percent signs are gone.
Is this because the percent sign is a format specifier for strings? I would think in that case that the initial content pull would escape those.
Here's the code:
NSFileManager *fileManager;
fileManager = [NSFileManager defaultManager];
NSStringEncoding stringEncoding;
// NSString *fileContents = [NSString stringWithContentsOfFile:fileName encoding:NSASCIIStringEncoding error:nil];
NSString *fileContents = [NSString stringWithContentsOfFile:fileName usedEncoding:&stringEncoding error:nil];
NSArray *fileLines = [fileContents componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
The used encoding ends up being UTF-8. You can see I have also tried specifying plain ASCII, but it yields the same results.
So the question is, how do I retain the percent signs? Or maybe I should use the percent sign as the separator character and then parse each of the resulting pieces individually.
You are calling NSLog() but passing the line strings as the format string. Something like:
NSLog(lineString);
Therefore, any percent characters in the line strings are interpreted as format specifiers. You should (almost) never pass strings that come from outside sources — i.e. strings which are not hard-coded in your code — as format strings to any function (NSLog(), printf(), +[NSString stringWithFormat:], etc.). It's not safe and you'll sometimes get unexpected results like you've seen.
You should always log a single string like this:
NSLog(@"%@", lineString);
That is, you need to pass a hard-coded format string and use the foreign string as data for that to format.
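The same rule applies to the C printf family. As a minimal sketch (the helper name `format_line_safely` is made up for illustration), the untrusted line is always passed as an argument to a fixed "%s" format, so any percent signs inside it are treated as data:

```c
#include <stdio.h>

/* Copy an untrusted line into `out` through a hard-coded "%s" format.
 * Any '%' characters in `line` are plain data here, never specifiers.
 * Returns what snprintf returns: the length the full result needs. */
int format_line_safely(char *out, size_t cap, const char *line)
{
    return snprintf(out, cap, "%s", line);
}
```

Calling snprintf(out, cap, line) instead would let a line such as "%" or "100% %d" be parsed as a format, which is undefined behavior when the matching arguments are missing.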
NSTaggedPointerString is just a subclass of NSString. You can use it anywhere an NSString is expected.
But in your string
@"A little suffering is good for the soul.\n\t\t-- Kirk, \"The Corbomite Maneuver\", stardate 1514.0\n%\nA man either lives life as it happens to him, meets it head-on and\nlicks it, or he turns his back on it and starts to wither away.\n\t\t-- Dr. Boyce, \"The Menagerie\" (\"The Cage\"), stardate unknown\n%"
the % sign is not a literal percent sign when the string is used as a format string. In an Objective-C format string, a literal percent sign is written as a doubled % mark:
@"%%"

Objc-DataFile-Unreadable Substring-Unknown to any encoding

I have a data file built by a subsidiary application. I need to locate some substrings contained in the data file. They are identifiable by the character symbols delimiting them, for instance: *!substringqSxt. The substrings vary from one project to another, so I need to locate the delimiting symbols in order to read the substring that follows. I also printed the file using different encodings to find out which one matched the original data file, and found it was NSMacOSRomanStringEncoding.
I use rangeOfString: to locate the delimiting symbols. Here is my code:
char *debutAudio ="jjbj";
char *finAudio ="qSxt";
NSString *debutAudioConverted = [[NSString alloc]
initWithCString: debutAudio
encoding:NSMacOSRomanStringEncoding];
NSString *finAudioConverted = [[NSString alloc]
initWithCString: finAudio
encoding:NSMacOSRomanStringEncoding];
NSRange debutaudioRange =[dataFileContent rangeOfString:debutAudioConverted];
NSRange finaudioRange =[dataFileContent rangeOfString:finAudioConverted];
NSLog(@"range is %@",NSStringFromRange(debutaudioRange));
NSLog(@"range is %@",NSStringFromRange(finaudioRange));
Both NSLog calls print range is {9223372036854775807, 0} (that is, NSNotFound),
so the delimiting strings are not being located.
If I look for other strings contained in the file, like "Settings", rangeOfString: returns the proper location and length.
I thought the file might contain multiple encodings, and tried converting with initWithCString: using every possible encoding, but nothing worked.
Also, if I open the file in TextEdit and use the "Find" function, it does not locate the delimiting string, but does locate other words. My gut tells me it's related. I don't know where to look for info. Could the file be protected? I am reading a copy of it, though.
I have found where the problem occurs. The proper encoding is still Mac OS Roman. The problem is that in the prefix string debutAudio, "jjbj", there is actually a tiny space, like a quarter space, between each pair of characters. I have tried every Unicode space listed here: https://www.cs.tut.fi/~jkorpela/chars/spaces.html#adj
without any success. Now I will try to find some half or quarter space in Mac OS Roman to see if that works.

How to put string into string after specific string

I am new to programming. I have the string NSString *string = @"\U0420\U043e\U0437\U044b"; and after each backslash ('\') I need to put another backslash to get a string like this: @"\\U0420\\U043e\\U0437\\U044b"
I am new to programming and Objective-C. Please help.
My original answer was:
Use [NSString stringByReplacingOccurrencesOfString:withString:] (reference).
NSString *string = @"\U0420\U043e\U0437\U044b";
NSString *converted = [string stringByReplacingOccurrencesOfString:@"\\"
withString:@"\\\\"];
However, I now don't think that's right, given that the \ characters won't actually exist in string; instead, the compiler will convert each of those sequences into a Unicode character. You will need to write string like this:
NSString *string = @"\\U0420\\U043e\\U0437\\U044b";
in order to use the above code. I cannot see any alternative to this.
Further Update: Often when I've come across questions like this, there is confusion between string literals and string data. In your question those \ characters won't appear, as the compiler will have converted them into Unicode characters (\Uxxx is a Unicode escape sequence for a single character). However, if you obtained a string like that at runtime (say, read from a text file), then those \ characters will exist and you can use the code above.
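To make the literal-versus-data distinction concrete, here is a small C sketch of the runtime case, where the backslashes really are present in the data. The helper name `double_backslashes` is hypothetical; the caller is assumed to supply an output buffer of at least 2 * strlen(in) + 1 bytes.

```c
#include <stddef.h>

/* Write `in` to `out`, emitting two backslashes for every one found.
 * `out` needs room for 2 * strlen(in) + 1 bytes.
 * Returns the length written, not counting the terminating NUL. */
size_t double_backslashes(const char *in, char *out)
{
    size_t n = 0;
    for (; *in; in++) {
        if (*in == '\\')
            out[n++] = '\\';   /* extra backslash in front of the original */
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Note that the C literal "\\U0420" used to feed this function already contains a single real backslash; the doubling in the source code is only the compiler's escape syntax, which is exactly the confusion described above.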

objective c UTF8String not working with japanese

I would like to show the NSString below on my UILabel:
NSString *strValue=@"你好";
but I cannot show it on my UILabel; I get strange characters!
I use this code to show the text:
[NSString stringWithCString:[strValue UTF8String] encoding:NSUTF8StringEncoding];
I tried [NSString stringWithCString:[strValue cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding] and it worked,
but I cannot show emoticons with cStringUsingEncoding:NSISOLatin1StringEncoding, so I have to use UTF8String.
Any help appreciated.
Your source file is in UTF-8, but the compiler you are using thinks it's ISO-Latin 1. What you think is the string @"你好" is actually the string @"ä½ å¥½". But when you ask NSString* to give you this back as ISO-Latin 1, and treat it as UTF-8, you've reversed the process the compiler took and you end up with the original string.
One solution that you can use here is to tell your compiler what encoding your source file is in. There is a compiler flag (for GCC it's -finput-charset=UTF-8, not sure about clang) that will tell the compiler what encoding to use. Curiously, UTF-8 should be the default already, but perhaps you're overriding this with a locale.
A more portable solution is to use only ASCII in your source file. You can accomplish this by replacing the non-ASCII chars with a string escape using \u1234 or \U12345678. In your case, you'd use
NSString *strValue=@"\u4F60\u597D";
Of course, once you get your string constant to be correct, you can ditch the whole encoding stuff and just use strValue directly.
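To see why the misreading happens, it can help to look at the bytes involved. The following C sketch (with `utf8_encode` as a hypothetical helper name) encodes a single code point as UTF-8. For U+4F60 it produces the three bytes E4 BD A0, and a compiler that reads the source as ISO-Latin 1 treats each of those bytes as a separate character.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into `out` (at least 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;  /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Emoji lie above U+FFFF and take the four-byte branch, which is consistent with the ISO-Latin 1 round trip in the question failing for them.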

How do I read a specific line from a large text file with Objective-C?

Say I have text file my.txt like this
this is line 1
this is line 2
....
this is line 999999
this is line 1000000
In Unix I can get the line of "this is line 1000" by issuing command like "head -1000 my.txt | tail -1". What is the corresponding way to get this in Objective-C?
If it's not too inefficient to have the whole thing in memory at once then the most compact sequence of calls (which I've expanded onto multiple lines for simpler exposition) would be:
NSError *error = nil;
NSString *sourceString = [NSString stringWithContentsOfFile:@"..."
encoding:NSUTF8StringEncoding error:&error];
NSArray *lines = [sourceString componentsSeparatedByCharactersInSet:
[NSCharacterSet newlineCharacterSet]];
// Indexes are zero-based, so line 1000 is at index 999.
NSString *relevantLine = [lines objectAtIndex:999];
You should check the value of error and the count of lines for validation.
EDIT: to compare to Nathan's answer, the benefit of splitting by characters in a set is that you'll accept any of the Unicode characters that can delimit a line break. Note, however, that adjacent separators (such as \r\n) produce empty strings in the resulting array, so files with CRLF line endings need those empty components filtered out before indexing.
NSInputStream is probably what you're going to have to deal with if memory footprint is an issue. It is barely more evolved than C's stdio.h fopen/fread/etc., so you're going to have to write your own little loop to dash through the file.
The answer does not explain how to read a file too LARGE to keep in memory. There is no nice solution in Objective-C for reading large text files without putting them into memory (which isn't always an option).
In these cases I like to use the C functions:
FILE *file = fopen("path to my file", "r");
size_t length;
char *cLine;
// fgetln (BSD/macOS) returns NULL at end of file or on error.
while (file != NULL && (cLine = fgetln(file, &length)) != NULL) {
    char str[length + 1];
    memcpy(str, cLine, length);
    str[length] = '\0';
    // Assumes the file is UTF-8; the result is nil for invalid input.
    NSString *line = [NSString stringWithUTF8String:str];
    // Do what you want here.
}
if (file != NULL) fclose(file);
Note that the line returned by fgetln includes its trailing newline character (if the line has one) and is not NUL-terminated. That is why we +1 the length of str: we want to make space for the NUL terminator.
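Since fgetln is only available on BSD-derived systems, here is a portable sketch of the same idea using nothing but fgetc. It counts newlines until it reaches the requested line and copies only that line, so memory use stays constant regardless of file size. The function name `read_nth_line` and its 1-based numbering are choices made for this illustration.

```c
#include <stdio.h>

/* Copy line number `n` (1-based) from `fp` into `out`, without its newline.
 * Returns 1 on success, 0 if the stream has fewer than `n` lines.
 * Lines longer than cap - 1 bytes are truncated. */
int read_nth_line(FILE *fp, long n, char *out, size_t cap)
{
    long current = 1;   /* number of the line currently being read */
    size_t len = 0;
    int seen = 0;       /* set once any byte of line n has been read */
    int c;

    if (n < 1 || cap == 0)
        return 0;

    while ((c = fgetc(fp)) != EOF) {
        if (current == n) {
            if (c == '\n') {
                out[len] = '\0';
                return 1;
            }
            seen = 1;
            if (len + 1 < cap)
                out[len++] = (char)c;
        } else if (c == '\n') {
            current++;
        }
    }

    /* End of file: accept a final line that lacks a trailing newline. */
    if (current == n && seen) {
        out[len] = '\0';
        return 1;
    }
    return 0;
}
```

The resulting buffer can be converted with +[NSString stringWithUTF8String:] if the file is known to be UTF-8. This is roughly what "head -1000 my.txt | tail -1" does, except that it stops reading as soon as the line is found.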
The simplest is to just load the file using one of the NSString file methods and then use the -[NSString componentsSeparatedByString:] method to get an array of every line.
Or you could use NSScanner to scan for newline/carriage return characters, counting them until you get to your line of interest.
If you are really concerned about memory usage, you could look at NSInputStream and use that to read in the file, keeping count of the number of newlines. It's a shame that NSScanner doesn't work with NSInputStream.
I don't think this is an exact duplicate, because it sounds like you want to skip some lines in the file, but you could easily use an approach like the one here:
Objective-C: Reading a file line by line (Specific answer that has some sample code)
Loop on the input file, reading in a chunk of data, and look for newlines. Count them up and when you hit the right number, output the data after that one and until the next.
Your example looks like you might have hundreds of thousands of lines, so definitely don't just read the file into an NSString, and definitely don't convert it to an NSArray.
If you want to do it the fancier NSInputStream way (which has some key advantages in character set decoding), here is a great example that shows the basic idea of polling to consume all of the data from a stream source (for a file, it's somewhat overkill). It's for output, but the idea is fine for input too:
Polling versus Run Loop Scheduling