Objective-C Case-Insensitivity and Turkish Characters

I have a regular expression that searches strings and then wraps them within certain html tags. The problem is that two Turkish characters (İ and ı) do not get matched against their lower or upper cases. So they cannot be wrapped properly.
To be more precise:
i and even İ are not matched against İ (it probably becomes "I")
I is not matched against ı (it probably becomes "i")
Example:
Search term is İskendername.
The string contains it exactly as it is (İskendername) but there are no matches at all.
Here is my code:
NSString *regex_pattern = [[NSArray arrayWithObjects:@"(", search_term, @")(?![^<>]*>)", nil] componentsJoinedByString:@""];
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression
regularExpressionWithPattern:regex_pattern options:NSRegularExpressionCaseInsensitive error:&error];
string_to_be_searched = [regex stringByReplacingMatchesInString:string_to_be_searched options:0 range:NSMakeRange(0, [string_to_be_searched length]) withTemplate:@"<div class=\"highlight\">$1</div>"];

Solved it myself. Here is how:
I could not get any of the NS.. options to support the Turkish characters, and a lossy conversion corrupts my rendered content. So here is how I sorted it out:
As I have stated, the problem is that -I- is treated as the uppercase of -i- and -i- as the lowercase of -I-, but that is not the case in the Turkish alphabet. We have a dotless lowercase -ı- and a dotted uppercase -İ-.
What I did was change my regular expression. I go through all the letters in the NSString and replace the problematic ones with a character class: ı and I become [ıI], while i and İ become [iİıI], so the regular expression accepts them regardless of whether they have a dot on top.
Here is the code in case someone needs it:
- (NSString *)returnRegexPatternForSearchString:(NSString *)search_string
{
    NSMutableString *regex_pattern = [NSMutableString string];
    for (NSUInteger i = 0; i < [search_string length]; i++)
    {
        // Look at one UTF-16 unit at a time.
        NSString *letter = [search_string substringWithRange:NSMakeRange(i, 1)];
        if ([letter isEqualToString:@"ı"] || [letter isEqualToString:@"I"])
        {
            // Dotless pair: accept either case of the dotless letter.
            [regex_pattern appendString:@"[ıI]"];
        }
        else if ([letter isEqualToString:@"i"] || [letter isEqualToString:@"İ"])
        {
            // Dotted letters: accept all four variants.
            [regex_pattern appendString:@"[iİıI]"];
        }
        else
        {
            // Everything else is copied through unchanged (not regex-escaped).
            [regex_pattern appendString:letter];
        }
    }
    return regex_pattern;
}
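The pattern builder above copies every other character into the pattern unchanged, so a search term containing regex metacharacters such as . or ( would change the meaning of the pattern. Below is a minimal sketch of a variant that also escapes those characters. The function name TurkishTolerantPattern is hypothetical, it maps all four i variants to the same class (slightly broader than the code above), and it assumes the text being searched stores İ as a single precomposed character.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer): every dotted or
// dotless i variant becomes the class [iİıI]; all other characters are
// escaped so that regex metacharacters in the search term match literally.
static NSString *TurkishTolerantPattern(NSString *searchTerm) {
    NSSet<NSString *> *iVariants = [NSSet setWithObjects:@"i", @"İ", @"ı", @"I", nil];
    // Assumption: normalising to precomposed form is acceptable, so that
    // "I" + combining dot above is seen as the single character "İ".
    NSString *term = [searchTerm precomposedStringWithCanonicalMapping];
    NSMutableString *pattern = [NSMutableString string];
    NSUInteger index = 0;
    while (index < term.length) {
        // Step by composed character sequence so surrogate pairs stay intact.
        NSRange range = [term rangeOfComposedCharacterSequenceAtIndex:index];
        NSString *letter = [term substringWithRange:range];
        if ([iVariants containsObject:letter]) {
            [pattern appendString:@"[iİıI]"];
        } else {
            [pattern appendString:[NSRegularExpression escapedPatternForString:letter]];
        }
        index = NSMaxRange(range);
    }
    return pattern;
}
```

Under these assumptions, TurkishTolerantPattern(@"İskendername") yields [iİıI]skendername, which can be dropped into the capture group of the original regex in place of the raw search term.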

Related

How to display persian script through unicode

Someone please help me displaying this string in persian script: "\u0622\u062f\u0631\u0633 \u0627\u06cc\u0645\u06cc\u0644"
I have tried using
NSData *data = [yourtext dataUsingEncoding:NSUTF8StringEncoding];
NSString *decodevalue = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];
and this gets returned: u0622u062fu0631u0633 u0627u06ccu0645u06ccu0644
I want the same solution for objective C: https://www.codeproject.com/Questions/714169/Conversion-from-Unicode-to-Original-format-csharp
I assume that your input string contains backslash-escaped codes (as if it had been copied verbatim from a source code file), that you want to parse the escape sequences into a Unicode string, and that you also want to preserve the unescaped characters as they are.
This is what I've come up with:
NSError *badRegexError;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(\\\\u([a-f0-9]{4})|.)" options:0 error:&badRegexError];
if (badRegexError) {
    NSLog(@"bad regex: %@", badRegexError);
    return;
}
NSString *input = @"\\u0622\\u062f\\u0631\\u0633 123 test -_- \\u0627\\u06cc\\u0645\\u06cc\\u0644";
NSMutableString *output = [NSMutableString new];
[regex enumerateMatchesInString:input options:0 range:NSMakeRange(0, input.length)
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop)
{
    // Group 2 is only set when the match is a \uXXXX escape.
    NSRange codeRange = [result rangeAtIndex:2];
    if (codeRange.location != NSNotFound) {
        NSString *codeStr = [input substringWithRange:codeRange];
        NSScanner *scanner = [NSScanner scannerWithString:codeStr];
        unsigned int code;
        if ([scanner scanHexInt:&code]) {
            unichar c = (unichar)code;
            [output appendString:[NSString stringWithCharacters:&c length:1]];
        }
    } else {
        // Any other character is copied through as is.
        [output appendString:[input substringWithRange:result.range]];
    }
}];
NSLog(@"  actual: %@", output);
NSLog(@"expected: %@", @"\u0622\u062f\u0631\u0633 123 test -_- \u0627\u06cc\u0645\u06cc\u0644");
Explanation
This is using a regex that finds blocks of 6 characters like \uXXXX, for example \u062f. It extracts the code as a string like 062f, and then uses NSScanner.scanHexInt to convert it to a number. It assumes that this number is a valid unichar, and builds a string from it.
Note the \\\\ in the regex: first the Objective-C compiler removes one layer of backslashes, so it becomes \\, and then the regex compiler removes the second layer, so it becomes \, which is matched literally. If you have just "u0622u062f..." (without backslashes), try removing \\\\ from the regex.
The second part of the regex (|.) treats non-escaped characters as is.
Caveats
You also might want to make the matching case insensitive by setting proper regex options.
This doesn't handle invalid character codes.
This is not the most performant solution, and you'd better use a proper parsing library to do this at scale.
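As a sketch of the first caveat, the variant below matches only the \uXXXX escapes, accepts upper- and lowercase hex digits through NSRegularExpressionCaseInsensitive, and copies the text between matches in bulk instead of matching it character by character. The function name DecodeUnicodeEscapes is hypothetical, and invalid codes are still not validated.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes \uXXXX escapes (hex digits in either case)
// and leaves all other text untouched.
static NSString *DecodeUnicodeEscapes(NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\\\u([0-9a-f]{4})"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"bad regex: %@", error);
        return input;
    }
    NSMutableString *output = [NSMutableString string];
    __block NSUInteger cursor = 0;  // end of the previous match
    [regex enumerateMatchesInString:input
                            options:0
                              range:NSMakeRange(0, input.length)
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSRange whole = result.range;
        // Copy the unescaped text between the previous match and this one.
        [output appendString:[input substringWithRange:NSMakeRange(cursor, whole.location - cursor)]];
        NSString *hex = [input substringWithRange:[result rangeAtIndex:1]];
        unsigned int code = 0;
        if ([[NSScanner scannerWithString:hex] scanHexInt:&code]) {
            unichar c = (unichar)code;
            [output appendString:[NSString stringWithCharacters:&c length:1]];
        }
        cursor = NSMaxRange(whole);
    }];
    // Copy whatever follows the last escape.
    [output appendString:[input substringFromIndex:cursor]];
    return output;
}
```

Because each escape is appended as one UTF-16 unit, an escaped surrogate pair such as \ud83d\ude00 ends up as two adjacent units, which together form the intended character.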
Related docs and links
https://developer.apple.com/documentation/foundation/nsregularexpression?language=objc
https://developer.apple.com/documentation/foundation/nsregularexpression/1409687-enumeratematchesinstring?language=objc
How do you use NSRegularExpression's replacementStringForResult:inString:offset:template:
https://developer.apple.com/documentation/foundation/nstextcheckingresult?language=objc
xcode UTF-8 literals
Objective-C parse hex string to integer
Just copy and paste this phrase into a Python shell and press "Enter"; you will see it in Persian (Farsi). The result is: ایمیل آدرس

Objective-C NSString character substitution

I have an NSString category I am working on to perform character substitution similar to PHP's strtr. The method takes a string and replaces every occurrence of each character in fromString with the character at the same index in toString. I have a working method, but it is not very performant, and I would like to make it quicker and able to handle megabytes of data.
Edit (for clarity):
stringByReplacingOccurrencesOfString:withString:options:range: will not work. I have to take a string like "ABC" and after replacing "A" with "B" and "B" with "A" end up with "BAC". Successive invocations of stringByReplacingOccurrencesOfString:withString:options:range: would make a string like "AAC" which would be incorrect.
Suggestions would be great, sample code would be even better!
Code:
- (NSString *)stringBySubstitutingCharactersFromString:(NSString *)fromString
                                              toString:(NSString *)toString
{
    NSMutableString *substitutedString = [self mutableCopy];
    NSString *aCharacterString;
    NSUInteger characterIndex;
    NSUInteger stringLength = substitutedString.length;
    for (NSUInteger i = 0; i < stringLength; ++i) {
        aCharacterString = [NSString stringWithFormat:@"%C", [substitutedString characterAtIndex:i]];
        characterIndex = [fromString rangeOfString:aCharacterString].location;
        if (characterIndex == NSNotFound) continue;
        [substitutedString replaceCharactersInRange:NSMakeRange(i, 1)
                                         withString:[NSString stringWithFormat:@"%C", [toString characterAtIndex:characterIndex]]];
    }
    return substitutedString;
}
Also this code is executed after every change to text in a text view. It is passed the entire string every time. I know that there is a better way to do it, but I do not know how. Any suggestions for this would be most certainly appreciated!
You can make that kind of string substitution with NSRegularExpression, either modifying a mutable string or creating a new immutable string. It will work with any two strings to substitute (even if they are longer than one character), but you will need to escape any character that has a special meaning in a regular expression (like \ [ ( . * ? + etc.).
The pattern finds either of the two substrings, followed by an optional "anything", followed by either substring again, and then swaps the two substrings with each other while preserving the optional string between them.
// These strings can be of any length
NSString *firstString = @"Axa";
NSString *secondString = @"By";
// Escaping of characters used in regular expressions has NOT been done here
NSString *pattern = [NSString stringWithFormat:@"(%@|%@)(.*?)(%@|%@)", firstString, secondString, firstString, secondString];
NSString *string = @"AxaByCAxaCBy";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
if (error) {
    // Insert error handling here...
}
NSString *modifiedString = [regex stringByReplacingMatchesInString:string
                                                           options:0
                                                             range:NSMakeRange(0, [string length])
                                                      withTemplate:@"$3$2$1"];
NSLog(@"Before:\t%@", string); // AxaByCAxaCBy
NSLog(@"After: \t%@", modifiedString); // ByAxaCByCAxa
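For the single-character case described in the question, a one-pass alternative to the regex is to copy the string into a unichar buffer and map every unit through a lookup table. The sketch below is not the answer's method; the function name SubstituteCharacters is hypothetical, it assumes ARC, and it works on UTF-16 units, so characters outside the Basic Multilingual Plane are not supported in fromString or toString.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, similar in spirit to PHP's strtr for single UTF-16
// units: each unit found in `from` is replaced by the unit at the same
// index in `to`. Returns nil if memory cannot be allocated.
static NSString *SubstituteCharacters(NSString *source, NSString *from, NSString *to) {
    NSUInteger length = source.length;
    NSUInteger pairCount = MIN(from.length, to.length);
    if (length == 0 || pairCount == 0) {
        return [source copy];
    }

    // Identity table covering every possible UTF-16 code unit (128 KB).
    const NSUInteger tableSize = 0x10000;
    unichar *table = malloc(tableSize * sizeof(unichar));
    unichar *buffer = malloc(length * sizeof(unichar));
    if (table == NULL || buffer == NULL) {
        free(table);
        free(buffer);
        return nil;
    }
    for (NSUInteger unit = 0; unit < tableSize; unit++) {
        table[unit] = (unichar)unit;
    }
    // Fill in reverse so that the first occurrence in `from` wins,
    // matching the rangeOfString: lookup in the question's code.
    for (NSUInteger i = pairCount; i-- > 0;) {
        table[[from characterAtIndex:i]] = [to characterAtIndex:i];
    }

    // One bulk copy out, one pass over the buffer, one string created.
    [source getCharacters:buffer range:NSMakeRange(0, length)];
    for (NSUInteger i = 0; i < length; i++) {
        buffer[i] = table[buffer[i]];
    }
    free(table);
    // The returned string takes ownership of `buffer` and frees it later.
    return [[NSString alloc] initWithCharactersNoCopy:buffer length:length freeWhenDone:YES];
}
```

With this sketch, SubstituteCharacters(@"ABC", @"AB", @"BA") gives BAC, because every unit is looked up against the original mapping exactly once rather than being rewritten by successive replacements.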

Objective-C: Regular expression beyond a pattern of hex value

I'm trying to detect whether the text in a text view contains anything outside the hex range \u00 - \u7f, and then do something. Please take a look at this code:
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"[\x00-\x7f]"
options:NSRegularExpressionCaseInsensitive
error:&error];
NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString:textView.text
options:0
range:NSMakeRange(0, [textView.text length])];
if (!NSEqualRanges(rangeOfFirstMatch, NSMakeRange(NSNotFound, 0)))
{
// do statement 1
}
else
{
// do statement 2
}
With the code above, if the text view contains characters both inside and outside [\u00 - \u7f], statement 1 is executed, but what I want is statement 2.
I think I need a regular expression that is the opposite of this pattern, but I don't know what it is. Any suggestions are welcome, thank you.
A caret ('^') negates a character class, so [^\u00-\u7f] will match any character except those in the range '\u00' through '\u7f'.
You could also use rangeOfCharacterFromSet: or canBeConvertedToEncoding: to check whether a string has any non-ASCII characters.
rangeOfCharacterFromSet:
NSRange ASCIIRange = NSMakeRange(0, 0x80);
NSCharacterSet *nonASCIICharSet = [[NSCharacterSet characterSetWithRange:ASCIIRange] invertedSet];
NSRange nonASCIIChars = [textView.text rangeOfCharacterFromSet:nonASCIICharSet];
if (nonASCIIChars.location == NSNotFound) {
...
} else {
// textView.text contains non-ASCII characters
...
}
canBeConvertedToEncoding:
if ([textView.text canBeConvertedToEncoding:NSASCIIStringEncoding]) {
...
} else {
// textView.text contains non-ASCII characters
...
}
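For completeness, the negated character class from the start of this answer can be sketched with NSRegularExpression as follows. The function name ContainsNonASCII is hypothetical, and the range is written with four-digit \u escapes, doubled so that the backslashes reach the regex engine instead of being consumed by the Objective-C compiler.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES when `text` contains at least one
// character outside U+0000 through U+007F.
static BOOL ContainsNonASCII(NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"[^\\u0000-\\u007F]"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"bad regex: %@", error);
        return NO;
    }
    NSRange first = [regex rangeOfFirstMatchInString:text
                                             options:0
                                               range:NSMakeRange(0, text.length)];
    return first.location != NSNotFound;
}
```

In the question's terms, "statement 2" would run when ContainsNonASCII(textView.text) returns YES. The two Foundation-only checks above avoid compiling a regex at all and are probably the simpler choice.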

How to get regex to grab all letters from an objective c string?

I'm trying to get the following regular expression to grab only the letters from an alpha-numeric character input box, however it's always returning the full string, and not any of the A-Z letters.
What am I doing wrong?
It needs to grab all the letters only. No weird characters and no numbers, just A-Z and put it into a string for me to use later on.
// A default follows
NSString *TAXCODE = txtTaxCode.text;
// Setup default for taxcode
if ([TAXCODE length] == 0)
{
    TAXCODE = @"647L";
}
NSError *error = NULL;
NSRegularExpression *regex;
regex = [NSRegularExpression regularExpressionWithPattern:@"/[^A-Z]/gi"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
NSLog(@"TAXCODE = %@", TAXCODE);
NSLog(@"TAXCODE.length = %lu", (unsigned long)[TAXCODE length]);
NSLog(@"STC (before regex) = %@", STC);
STC = [regex stringByReplacingMatchesInString:TAXCODE
                                      options:0
                                        range:NSMakeRange(0, [TAXCODE length])
                                 withTemplate:@""];
NSLog(@"STC (after regex) = %@", STC);
My debug output is as follows:
TAXCODE = 647L
TAXCODE.length = 4
STC (before regex) =
STC (after regex) = 647L
If you are only ever going to have letters on one end, then you could use:
NSString *TAXCODE = @"647L";
NSString *newcode = [TAXCODE stringByTrimmingCharactersInSet:[NSCharacterSet decimalDigitCharacterSet]];
If the letters are intermixed with digits, you can get an array that you can then work with:
NSString *TAXCODE = @"L6J47L";
NSArray *newcodeArray = [TAXCODE componentsSeparatedByCharactersInSet:[NSCharacterSet decimalDigitCharacterSet]];
I think you need to drop the Perl syntax from the regexp. Use @"[^A-Z]" as the match string.
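To illustrate the last suggestion: NSRegularExpression takes the bare pattern, without the /.../gi delimiters and flags, and stringByReplacingMatchesInString: already replaces every match. The sketch below spells out both letter cases in the class rather than relying on the case-insensitive option; the function name LettersOnly is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes everything that is not an ASCII letter.
static NSString *LettersOnly(NSString *input) {
    NSError *error = nil;
    // No Perl-style delimiters or flags: just the pattern itself.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"[^A-Za-z]"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"bad regex: %@", error);
        return input;
    }
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

With this sketch, LettersOnly(@"647L") gives L and LettersOnly(@"L6J47L") gives LJL.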

How to capitalize the first word of the sentence in Objective-C?

I've already found how to capitalize all words of the sentence, but not the first word only.
NSString *txt = @"hi my friends!";
[txt capitalizedString];
I don't want to lowercase the text and then capitalize the first character. I'd like to capitalize the first word only, without changing the others.
Here is another go at it:
NSString *txt = @"hi my friends!";
txt = [txt stringByReplacingCharactersInRange:NSMakeRange(0,1) withString:[[txt substringToIndex:1] uppercaseString]];
For Swift language:
txt.replaceRange(txt.startIndex...txt.startIndex, with: String(txt[txt.startIndex]).capitalizedString)
The accepted answer is wrong. First, it is not correct to treat the units of NSString as "characters" in the sense that a user expects. There are surrogate pairs. There are combining sequences. Splitting those will produce incorrect results. Second, it is not necessarily the case that uppercasing the first character produces the same result as capitalizing a word containing that character. Languages can be context-sensitive.
The correct way to do this is to get the frameworks to identify words (and possibly sentences) in the locale-appropriate manner. And also to capitalize in the locale-appropriate manner.
[aMutableString enumerateSubstringsInRange:NSMakeRange(0, [aMutableString length])
options:NSStringEnumerationByWords | NSStringEnumerationLocalized
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
[aMutableString replaceCharactersInRange:substringRange
withString:[substring capitalizedStringWithLocale:[NSLocale currentLocale]]];
*stop = YES;
}];
It's possible that the first word of a string is not the same as the first word of the first sentence of a string. To identify the first (or each) sentence of the string and then capitalize the first word of that (or those), then surround the above in an outer invocation of -enumerateSubstringsInRange:options:usingBlock: using NSStringEnumerationBySentences | NSStringEnumerationLocalized. In the inner invocation, pass the substringRange provided by the outer invocation as the range argument.
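The nested enumeration described above can be sketched as follows. This variant collects the word ranges first and applies the replacements from the end of the string backwards, so that a replacement whose length differs from the original word cannot shift ranges that are still pending. The function name CapitalizeFirstWordOfEachSentence is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: capitalizes the first word of every sentence,
// letting Foundation find sentence and word boundaries.
static NSString *CapitalizeFirstWordOfEachSentence(NSString *text, NSLocale *locale) {
    NSMutableArray<NSValue *> *firstWordRanges = [NSMutableArray array];

    // Outer pass: one callback per sentence.
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences | NSStringEnumerationLocalized
                          usingBlock:^(NSString *sentence, NSRange sentenceRange,
                                       NSRange enclosingRange, BOOL *stopSentences) {
        // Inner pass: only the first word inside this sentence is recorded.
        [text enumerateSubstringsInRange:sentenceRange
                                 options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                              usingBlock:^(NSString *word, NSRange wordRange,
                                           NSRange enclosingWordRange, BOOL *stopWords) {
            [firstWordRanges addObject:[NSValue valueWithRange:wordRange]];
            *stopWords = YES;
        }];
    }];

    // Replace back to front so earlier ranges stay valid.
    NSMutableString *result = [text mutableCopy];
    for (NSValue *value in [firstWordRanges reverseObjectEnumerator]) {
        NSRange range = value.rangeValue;
        NSString *word = [result substringWithRange:range];
        [result replaceCharactersInRange:range
                              withString:[word capitalizedStringWithLocale:locale]];
    }
    return result;
}
```

As in the snippet above, capitalizedStringWithLocale: also lowercases the rest of the word it is applied to, so a first word such as "iPhone" would come out as "Iphone". Passing [NSLocale currentLocale] as the locale matches the original snippet.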
Use
- (NSArray *)componentsSeparatedByCharactersInSet:(NSCharacterSet *)separator
and capitalize the first object in the array and then use
- (NSString *)componentsJoinedByString:(NSString *)separator
to join them back
pString = [pString
stringByReplacingCharactersInRange:NSMakeRange(0,1)
withString:[[pString substringToIndex:1] capitalizedString]];
You can also use a regular expression. This works for me; you can simply paste the code below:
+ (NSString *)CaptializeFirstCharacterOfSentence:(NSString *)sentence {
    NSMutableString *firstCharacter = [sentence mutableCopy];
    // Group 2 is the first letter at the start of the string or after . ? !
    NSString *pattern = @"(^|\\.|\\?|\\!)\\s*(\\p{Letter})";
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    [regex enumerateMatchesInString:sentence options:0 range:NSMakeRange(0, [sentence length]) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        //NSLog(@"%@", result);
        NSRange r = [result rangeAtIndex:2];
        [firstCharacter replaceCharactersInRange:r withString:[[sentence substringWithRange:r] uppercaseString]];
    }];
    NSLog(@"%@", firstCharacter);
    return firstCharacter;
}
// Call this method
NSString *resultSentence = [UserClass CaptializeFirstCharacterOfSentence:yourTexthere];
An alternative solution in Swift:
var str = "hello"
if count(str) > 0 {
str.splice(String(str.removeAtIndex(str.startIndex)).uppercaseString, atIndex: str.startIndex)
}
For the sake of having options, I'd suggest:
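NSString *myString = @"this is a string...";
/* UTF-8 may need several bytes per UTF-16 unit, so ask for the worst case */
NSUInteger bufferSize = [myString maximumLengthOfBytesUsingEncoding:NSUTF8StringEncoding] + 1;
char *tmpStr = calloc(bufferSize, sizeof(char));
[myString getCString:tmpStr maxLength:bufferSize encoding:NSUTF8StringEncoding];
int sIndex = 0;
/* skip non-alpha characters at beginning of string, stopping at the terminator */
while (tmpStr[sIndex] != '\0' && !isalpha((unsigned char)tmpStr[sIndex])) {
    sIndex++;
}
/* toupper returns the converted character; it does not modify its argument */
tmpStr[sIndex] = toupper((unsigned char)tmpStr[sIndex]);
myString = [NSString stringWithCString:tmpStr encoding:NSUTF8StringEncoding];
free(tmpStr);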
I'm at work and don't have my Mac to test this on, but if I remember correctly, you couldn't use [myString cStringUsingEncoding:NSUTF8StringEncoding] because it returns a const char *.
In Swift you can do it with this extension:
extension String {
func ucfirst() -> String {
return (self as NSString).stringByReplacingCharactersInRange(NSMakeRange(0, 1), withString: (self as NSString).substringToIndex(1).uppercaseString)
}
}
calling your string like this:
var ucfirstString:String = "test".ucfirst()
I know the question asks specifically for an Objective C answer, however here is a solution for Swift 2.0:
let txt = "hi my friends!"
var sentencecaseString = ""
for (index, character) in txt.characters.enumerate() {
if 0 == index {
sentencecaseString += String(character).uppercaseString
} else {
sentencecaseString.append(character)
}
}
Or as an extension:
extension String {
    func sentencecaseString() -> String {
        var sentencecaseString = ""
        for (index, character) in self.characters.enumerate() {
            if 0 == index {
                sentencecaseString += String(character).uppercaseString
            } else {
                sentencecaseString.append(character)
            }
        }
        return sentencecaseString
    }
}