How do I remove hidden characters from a NSString? - objective-c

After copying and pasting some text from the web into the NSTextArea of my Mac app, I see
EE
If I copy these 2 letters in a browser I see:
E?E
If I copy them in google translator I get
E 'E
I cannot identify the character between the two Es. But the question is: how do I remove these hidden characters from my NSString?

In your uploaded file the specific hex code for the hidden character is 0x18
(found via Hex Fiend)
This character, along with others, is part of the 'control character set'. The set also contains characters such as tab (0x09) and newline (0x0A), which we obviously don't want to remove.
In Objective-C, we can use the NSCharacterSet controlCharacterSet in conjunction with whitespaceAndNewlineCharacterSet to get just the blank characters that have no rendered width.
NSMutableCharacterSet* zeroWidthCharacterSet = [[NSCharacterSet controlCharacterSet] mutableCopy];
[zeroWidthCharacterSet formIntersectionWithCharacterSet:[[NSCharacterSet whitespaceAndNewlineCharacterSet] invertedSet]];
Then we can simply use the good old split by character set method
string = [[string componentsSeparatedByCharactersInSet:zeroWidthCharacterSet] componentsJoinedByString:@""];
Note that if a special character that needs more than one UTF-8 code unit to represent itself (like an emoji) used 0x18, stripping it would break the character sequence.
Because the control characters are special, I don't believe you'd ever find them in an Emoji sequence.
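If a hex editor is not at hand, the hidden character can also be identified from code by listing the UTF-16 code units of the string. The following is a minimal sketch; GRCodeUnitDump is a hypothetical helper name, not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): returns every
// UTF-16 code unit of the string formatted as "U+XXXX", separated by spaces.
static NSString *GRCodeUnitDump(NSString *string) {
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSUInteger i = 0; i < string.length; i++) {
        unichar c = [string characterAtIndex:i];
        [parts addObject:[NSString stringWithFormat:@"U+%04X", (unsigned int)c]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

For a string made of E, the 0x18 control character, and E, this returns "U+0045 U+0018 U+0045", which makes the invisible character easy to spot and look up.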

Related

Displaying special characters in a UILabel

I have a string which contains a mix of normal text and special characters. When setting the label text with this string the character codes are being displayed rather than the actual character. I was wondering is there any support for special characters or if there is a way to decode the values?
I have tried stringWithCString but haven't had any luck with it.
Setting:
self.stringLabel.text = myNSString
Result:
Hello world!
èéêëÄÄÄ
ÿ
ûüùúÅ«
Anyone else come across a similar issue?

UTF8String giving different value for same string objective-c

I am comparing two strings in a condition. Both strings look exactly the same, and I have also trimmed all whitespace and newline characters, but the comparison says they are not the same.
After a lot of investigation, I found that the two strings have different UTF8String values.
po otherPersonName
"76000 13590"
po [otherPersonName UTF8String]
"76000 13590"
po findPersonName
"76000 13590"
po [findPersonName UTF8String]
"\xffffffc2\xffffffa076000\xffffffc2\xffffffa013590\xffffffe2\xffffff80\xffffffac"
Can anyone explain how to match these strings correctly?
In findPersonName there are non-breaking spaces (U+00A0, UTF-8 C2 A0, which po is showing as \xffffffc2\xffffffa0) at the start and between the numbers, and a POP DIRECTIONAL FORMATTING (U+202C, UTF-8 E2 80 AC, \xffffffe2\xffffff80\xffffffac) at the end (suggesting the value has come from a larger text with mixed scripts, left-to-right and right-to-left).
If these are the only characters that might occur, a couple of calls to stringByReplacingOccurrencesOfString:withString: may be used to replace/remove them. However, if there are other white space characters, then look at other approaches to clean up the string - see NSString, NSCharacterSet, NSRegularExpression etc.
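As a rough sketch of the replace/remove approach, assuming U+00A0 and U+202C really are the only stray characters: the helper name GRCleanedName is hypothetical, and a literal search is used so that the formatting character is matched exactly.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): turns non-breaking
// spaces into ordinary spaces, drops U+202C, then trims both ends.
static NSString *GRCleanedName(NSString *input) {
    // Replace NO-BREAK SPACE (U+00A0) with an ordinary space.
    NSString *s = [input stringByReplacingOccurrencesOfString:@"\u00A0"
                                                   withString:@" "
                                                      options:NSLiteralSearch
                                                        range:NSMakeRange(0, input.length)];
    // Remove POP DIRECTIONAL FORMATTING (U+202C) entirely.
    s = [s stringByReplacingOccurrencesOfString:@"\u202C"
                                     withString:@""
                                        options:NSLiteralSearch
                                          range:NSMakeRange(0, s.length)];
    // Trim the ordinary whitespace that is now left at either end.
    return [s stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
```

With this, GRCleanedName(findPersonName) and otherPersonName could be compared with isEqualToString:. If other invisible characters can appear, a character-set based cleanup is probably the more robust choice.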
HTH

Objc-DataFile-Unreadable Substring-Unknown to any encoding

I have a data file built by a subsidiary application. I need to locate some substrings contained in the data file. They are identifiable by the character symbols delimiting them, for instance: *!substringqSxt. The substrings vary from one project to another, so I need to locate the delimiting symbols in order to read the substring that follows. I also printed the file in different encodings to find which one was used and matched the original data file; it turned out to be NSMacOSRomanStringEncoding.
I use rangeOfString: to locate the delimiting symbols. Here is my code:
char *debutAudio ="jjbj";
char *finAudio ="qSxt";
NSString *debutAudioConverted = [[NSString alloc]
initWithCString: debutAudio
encoding:NSMacOSRomanStringEncoding];
NSString *finAudioConverted = [[NSString alloc]
initWithCString: finAudio
encoding:NSMacOSRomanStringEncoding];
NSRange debutaudioRange =[dataFileContent rangeOfString:debutAudioConverted];
NSRange finaudioRange =[dataFileContent rangeOfString:finAudioConverted];
NSLog(@"range is %@",NSStringFromRange(debutaudioRange));
NSLog(@"range is %@",NSStringFromRange(finaudioRange));
Both NSLog calls print range is {9223372036854775807, 0}, i.e. a location of NSNotFound,
so the delimiting strings are not being located.
And if I ask to look for other strings contained in the file like "Settings" the rangeOfString will return the proper location and length.
I thought the file might contain multiple encodings, and tried converting with initWithCString: to every possible encoding, but nothing worked.
Also, if I open the file in TextEdit and use the "Find" function, it will not locate the delimiting string, but will locate other words. My gut tells me it's related. I don't know where to look for info. Could the file be protected? I am reading a copy of it, though.
I have found where the problem occurs. The proper encoding is still Mac OS Roman. The problem is the prefix string *debutAudio "jjbj": there is actually a tiny space, like a quarter space, between each character. I have tried every Unicode space listed here: https://www.cs.tut.fi/~jkorpela/chars/spaces.html#adj
without any success. Now I will try to find some half or quarter space in Mac OS Roman to see if that works.
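One way to find out what actually sits between the characters, rather than guessing at space characters, is to look at the raw bytes of the file before any encoding is applied. This is only a sketch; GRHexStringFromData is a hypothetical helper, and the data would come from something like +[NSData dataWithContentsOfFile:] with your own path.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): formats every byte
// of the data as two uppercase hex digits, separated by spaces.
static NSString *GRHexStringFromData(NSData *data) {
    const unsigned char *bytes = data.bytes;
    NSMutableArray<NSString *> *parts = [NSMutableArray arrayWithCapacity:data.length];
    for (NSUInteger i = 0; i < data.length; i++) {
        [parts addObject:[NSString stringWithFormat:@"%02X", bytes[i]]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

Dumping a small subdataWithRange: around the *! marker would show the exact byte values between the letters, which can then be used to build the search string.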

Cocoa: Anomaly Writing TAB Character to File?

I'm using the following to define an NSString containing the Tab character:
#define TAB @"\t"
I'm writing a longer NSString to a file using writeToFile:atomically:encoding:error:, using encoding: NSUTF8StringEncoding. This longer NSString contains TAB characters.
When I open the resulting file in TextEdit, I see a character that looks like a Japanese glyph in the place of the TAB character. Here is a screen shot of a line that is intended to have two tab characters, but which has these odd characters instead:
odd characters http://www.market-research-services.com/starpowermedia/for_distribution/tab-char-anomaly.png
What is the correct way to #define an NSString that will contain a TAB character to be written to a file of NSUTF8StringEncoding?
Thanks in advance to all for any info.

How do I remove illegal characters from an NSString?

I am parsing a tab-separated list using an NSScanner, based upon each line and the tabs. However, for some reason the last field in the array (parsed from each row) contains a \r character.
How can I strip this from the NSString that represents the line (or the field)?
If the \r character is at the end (probably because the file being parsed is CRLF), you can just do something like [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]. (You might want to use an explicitly created '\r' character set instead if you don't want to strip whitespace as well.)
Try using the +[NSCharacterSet newlineCharacterSet] method with NSScanner in your various scanning method calls.
Just FYI, the \r is part of the line ending for a file created in a Windows environment.
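A minimal sketch of the "explicitly created '\r' character set" variant mentioned above, for the case where other whitespace in the field should be kept. GRTrimCarriageReturns is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): trims carriage
// returns from both ends of a field and leaves all other whitespace alone.
static NSString *GRTrimCarriageReturns(NSString *field) {
    NSCharacterSet *cr = [NSCharacterSet characterSetWithCharactersInString:@"\r"];
    return [field stringByTrimmingCharactersInSet:cr];
}
```

Calling it on the last field of each row removes the stray \r left over from a CRLF line ending without touching spaces or tabs inside or around the field.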