I am parsing a tab-separated list with an NSScanner, splitting on each line and then on the tabs. However, for some reason the last field in the array (parsed from each row) contains a \r character.
How can I strip this from the NSString that represents the line (or the field)?
If the \r character is at the end (probably because the file being parsed is CRLF), you can just do something like [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]. (You might want to use an explicitly created '\r' character set instead if you don't want to strip whitespace as well.)
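As a minimal sketch of the "explicitly created '\r' character set" variant, the helper below (StripCarriageReturns is a hypothetical name, not part of Foundation) trims only carriage returns from the ends of a field and leaves spaces and tabs alone:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: trims carriage returns (and nothing else)
// from both ends of a field.
static NSString *StripCarriageReturns(NSString *field) {
    NSCharacterSet *crSet =
        [NSCharacterSet characterSetWithCharactersInString:@"\r"];
    return [field stringByTrimmingCharactersInSet:crSet];
}
```

Calling it on each field, or once on the whole line before splitting on tabs, should be enough for a CRLF file.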
Try using the +[NSCharacterSet newlineCharacterSet] method with NSScanner in your various scanning method calls.
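One way this could look, as a rough sketch: ParseTSV is a hypothetical helper, and it assumes the whole file is already loaded into an NSString. Because newlineCharacterSet contains both \r and \n, the scanner stops before the \r, so it never ends up in the last field. Note that this version drops blank lines.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits tab-separated text into rows of fields.
static NSArray<NSArray<NSString *> *> *ParseTSV(NSString *text) {
    NSMutableArray<NSArray<NSString *> *> *rows = [NSMutableArray array];
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    // Do not skip anything implicitly, so leading tabs (empty fields) survive.
    scanner.charactersToBeSkipped = nil;

    while (![scanner isAtEnd]) {
        NSString *line = nil;
        // Read everything up to, but not including, the next \r or \n.
        if ([scanner scanUpToCharactersFromSet:newlines intoString:&line]) {
            [rows addObject:[line componentsSeparatedByString:@"\t"]];
        }
        // Consume the line terminator itself (\n, \r\n, or several of them).
        [scanner scanCharactersFromSet:newlines intoString:NULL];
    }
    return rows;
}
```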
Just FYI, the \r is part of the line ending (CRLF) for a file created in a Windows environment.
I have a data file built by a subsidiary application. I need to locate some substrings contained in the data file. They are identifiable by the character symbols delimiting them. For instance: *!substringqSxt. The substring will vary from one project to another, so I need to locate the delimiting symbols in order to read the substring between them. I also printed the file in different encodings to find out which one was used and matched the original data file. It turned out to be NSMacOSRomanStringEncoding.
I use rangeOfString: to locate the delimiting symbols. Here is my code:
const char *debutAudio = "jjbj";
const char *finAudio = "qSxt";
NSString *debutAudioConverted = [[NSString alloc] initWithCString:debutAudio
                                                         encoding:NSMacOSRomanStringEncoding];
NSString *finAudioConverted = [[NSString alloc] initWithCString:finAudio
                                                       encoding:NSMacOSRomanStringEncoding];
NSRange debutaudioRange = [dataFileContent rangeOfString:debutAudioConverted];
NSRange finaudioRange = [dataFileContent rangeOfString:finAudioConverted];
NSLog(@"range is %@", NSStringFromRange(debutaudioRange));
NSLog(@"range is %@", NSStringFromRange(finaudioRange));
Both NSLog calls print range is {9223372036854775807, 0} (the location is NSNotFound),
so the delimiting strings are not being located.
If I search for other strings contained in the file, such as "Settings", rangeOfString: returns the proper location and length.
I thought the file might contain multiple encodings, and tried converting with initWithCString: to every possible encoding, but nothing worked.
Also, if I open the file in TextEdit and use the "Find" function, it does not locate the delimiting string, but it does locate other words. My gut tells me this is related. I don't know where to look for information. Could the file be protected? I am reading a copy of it, though.
I have found where the problem occurs. The proper encoding is still Mac OS Roman. The problem is the prefix string debutAudio, "jjbj": there is actually a tiny space, something like a quarter space, between each pair of characters. I have tried every Unicode space listed here: https://www.cs.tut.fi/~jkorpela/chars/spaces.html#adj
without any success. Now I will try to find some half or quarter space in Mac OS Roman to see if that works.
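Rather than guessing which space it might be, it can help to print the numeric value of every character around the delimiter. A small sketch, where HexDump is a hypothetical helper and the input is assumed to be a short substring cut from the decoded file content:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the UTF-16 code units of a string as
// space-separated four-digit hex values, so invisible characters show up.
static NSString *HexDump(NSString *s) {
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSUInteger i = 0; i < s.length; i++) {
        unichar c = [s characterAtIndex:i];
        [parts addObject:[NSString stringWithFormat:@"%04X", (unsigned)c]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

Any value in the output that does not correspond to a visible letter is a candidate for the "tiny space".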
After copying and pasting some text from the web into my Mac app's NSTextArea, I see
EE
If I copy these 2 letters in a browser I see:
E?E
If I copy them into Google Translate I get
E 'E
I cannot identify the character between the two Es. But the question is: how do I remove these hidden characters from my NSString?
In your uploaded file the specific hex code for the hidden character is 0x18
(found via Hex Fiend)
This character, along with others, is part of the set of control characters. That set also contains characters such as the tab (0x09) and newline (0x0A), which we obviously don't want to remove.
In Objective-C, we can use the NSCharacterSet controlCharacterSet in conjunction with whitespaceAndNewlineCharacterSet to get just the blank characters that have no rendered width.
NSMutableCharacterSet* zeroWidthCharacterSet = [[NSCharacterSet controlCharacterSet] mutableCopy];
[zeroWidthCharacterSet formIntersectionWithCharacterSet:[[NSCharacterSet whitespaceAndNewlineCharacterSet] invertedSet]];
Then we can simply use the good old split-by-character-set method:
string = [[string componentsSeparatedByCharactersInSet:zeroWidthCharacterSet] componentsJoinedByString:@""];
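Note that a character encoded as more than one UTF-8 byte or UTF-16 code unit (such as an emoji) never contains 0x18 as part of its encoding, so stripping 0x18 cannot split such a character.
However, controlCharacterSet covers Unicode general categories Cc and Cf, so it also contains format characters such as the zero-width joiner (U+200D) that some emoji sequences rely on. Stripping the whole set can therefore break joined emoji; if that matters, remove only the specific characters you have identified, such as 0x18.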
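A narrower variant, sketched under the assumption that only the ASCII control characters (U+0000 to U+001F, plus DEL) are unwanted, and that tab, line feed and carriage return should be kept. StripAsciiControls is a hypothetical name:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes ASCII control characters except
// tab, line feed and carriage return. Format characters such as
// U+200D (zero-width joiner) are left untouched.
static NSString *StripAsciiControls(NSString *s) {
    NSMutableCharacterSet *controls =
        [NSMutableCharacterSet characterSetWithRange:NSMakeRange(0x00, 0x20)];
    [controls addCharactersInRange:NSMakeRange(0x7F, 1)]; // DEL
    [controls removeCharactersInString:@"\t\n\r"];        // keep these
    return [[s componentsSeparatedByCharactersInSet:controls]
            componentsJoinedByString:@""];
}
```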
I would like to create a character set that includes all of its own characters, as well as those from another character set. Append in other words.
I thought there'd be an obvious way, but after control-space completion in the IDE, and then poking around the docs, I couldn't find anything.
I can see how to append all the characters from a string. But I need to append the characters from another set. I guess I could to-string the second set, if there's a to-string method.
How do I do this?
You are probably searching for this method in NSMutableCharacterSet:
- (void)formUnionWithCharacterSet:(NSCharacterSet *)otherSet
From the documentation:
Modifies the receiver so it contains all characters that exist in
either the receiver or otherSet.
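A short sketch of how that might be used. UnionOfSets is a hypothetical helper; it takes a mutable copy first so the original set is left unchanged:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a new set containing every character
// that is in either `a` or `b`, without modifying the inputs.
static NSCharacterSet *UnionOfSets(NSCharacterSet *a, NSCharacterSet *b) {
    NSMutableCharacterSet *combined = [a mutableCopy];
    [combined formUnionWithCharacterSet:b];
    return combined;
}
```

For example, passing [NSCharacterSet decimalDigitCharacterSet] and a set built from @"+-" gives a set that accepts digits and signs.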
For Swift 3:
let fullCharset = aCharset.union(anotherCharset)
I download strings from a web server and they contain special characters such as /n, /p and so on. What is the best way to get rid of these?
Do you have a list of characters that need stripping?
Or you could use
[string stringByTrimmingCharactersInSet:[NSCharacterSet newlineCharacterSet]];
I'd have thought your best bet would be to use one of the NSString methods such as stringByReplacingCharactersInRange:withString: (replacing the characters in question with an empty string) or stringByTrimmingCharactersInSet:.
You can use
Str = [Str stringByReplacingOccurrencesOfString:@"/n" withString:@""];
If you have a specific set of special characters, put them in an array and apply the same call to each one; that will also work.
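The array approach could be sketched like this. RemoveTokens is a hypothetical helper, and the token list is whatever sequences the server actually sends:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every occurrence of each token from `s`.
static NSString *RemoveTokens(NSString *s, NSArray<NSString *> *tokens) {
    NSString *result = s;
    for (NSString *token in tokens) {
        result = [result stringByReplacingOccurrencesOfString:token
                                                   withString:@""];
    }
    return result;
}
```

For instance, RemoveTokens(text, @[@"/n", @"/p"]) strips both sequences in one call.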
I'm having an NSString like this:
music that rocks.mp3 e99a65fb
The problem is that between the filename (music that rocks.mp3) and the CRC32 checksum (e99a65fb) there could be many more spaces than one. How can I split the line into an array?
I thought of using componentsSeparatedByString, but my problem is that the filename can also contain spaces.
Thanks in advance.
If you're OK with regular expressions, you could do it that way. For example (using RegexKitLite):
NSString *fileName = [line stringByMatching:@"(.*)\\s" capture:1];
An explanation of the regex: (.*) This will match as many characters as it can, until it finds a space character. However, the capture is greedy, which means it's going to grab as many characters as it can before the last space character (in a nutshell).
Or you can use NSString methods to find the last occurrence of the space character and get a substring from the beginning of the line to the last space character.
Or you can split the string based on @" ", throw away the last object in the array, and then recombine the array with @" ".
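The NSString-only variant (search backwards for the last space) might look roughly like this. SplitNameAndChecksum is a hypothetical helper; it assumes the checksum itself contains no spaces and trims any run of spaces between the two parts:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[fileName, checksum], or nil if the
// line contains no whitespace to split on.
static NSArray<NSString *> *SplitNameAndChecksum(NSString *line) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceCharacterSet];
    NSString *trimmed = [line stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    // Find the last whitespace character in the line.
    NSRange lastSpace = [trimmed rangeOfCharacterFromSet:ws
                                                 options:NSBackwardsSearch];
    if (lastSpace.location == NSNotFound) {
        return nil;
    }
    NSString *checksum = [trimmed substringFromIndex:NSMaxRange(lastSpace)];
    // Everything before the last space is the name; drop the padding.
    NSString *name = [[trimmed substringToIndex:lastSpace.location]
                      stringByTrimmingCharactersInSet:ws];
    return @[name, checksum];
}
```

Because the name is trimmed, a filename that genuinely ends in spaces would lose them with this approach.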
1. Use componentsSeparatedByString:.
2. Get the lastObject, which is the CRC32.
3. Use subarrayWithRange: to create a new array without the CRC32.
4. Use componentsJoinedByString: to reconstitute the filename.
You may want to do step 4 in a loop, repeatedly deleting the last object until either the filename exists in the target directory or you run out of empty strings at the end of the array. This handles the case of multiple spaces between filename and CRC32, as well as the case of a filename ending in spaces (pathological but possible).