NSDiacriticInsensitiveSearch and Arabic search - iOS 7

As is well known, NSDiacriticInsensitiveSearch does not have the same effect on Arabic letters as it does on French ones, so I am trying to reproduce that effect for Arabic. For example, if the user enters the letter "ا", the search bar should show all the words containing either "ا" or "أ".
Using the following line:
text = [text stringByReplacingOccurrencesOfString:@"ا" withString:@"أ"];
no longer shows the words starting with "ا".
In the search bar, I tried the same NSDiacriticInsensitiveSearch approach that worked for French, and it did not work:
NSRange nameRange = [author.name rangeOfString:text options:NSAnchoredSearch | NSDiacriticInsensitiveSearch];
Any ideas on how to get this done?

You can use a regular expression to handle the different shapes of the Arabic letter Alif.
Assume the text to search is "محمد بن إبراهيم الابراهيمي" and the pattern to search for is "إبراهيم". You can convert the pattern into a regular expression that accepts any form of Alif wherever an Alif appears: "(أ|إ|ا)بر(أ|إ|ا)هيم". This matches the pattern in all of its possible spellings.
Here is some simple code that I wrote:
#import <Foundation/Foundation.h>
// Replaces every Alif variant in the search text with an alternation of all variants.
NSString * arabify(NSString * string)
{
    NSRegularExpression * alifRegex = [NSRegularExpression regularExpressionWithPattern:@"(أ|ا|إ)" options:0 error:nil];
    return [alifRegex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@"(أ|ا|إ)"];
}
int main(int argc, const char * argv[])
{
    @autoreleasepool {
        NSString * context = @"محمد بن إبراهيم الابراهيمي";
        NSString * pattern = @"إبراهيم";
        // Get the regex for the Arabic word.
        NSString * regex = arabify(pattern);
        NSLog(@"context = %@", context);
        NSLog(@"pattern = %@", pattern);
        NSLog(@"regex = %@", regex);
        NSRange range = [context rangeOfString:regex options:NSRegularExpressionSearch];
        if (range.location == NSNotFound)
        {
            NSLog(@"Not found.");
        }
        else
        {
            NSLog(@"Found.");
            NSLog(@"location = %lu, length = %lu", (unsigned long)range.location, (unsigned long)range.length);
        }
    }
    return 0;
}
Good luck brother.
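Two details from the question are not covered by the code above: text typed into a search bar may contain regular-expression metacharacters, and the original search was anchored to the start of the name. The following is only a sketch of one way to handle both. AlifInsensitivePattern is a hypothetical helper name, and including آ (U+0622) in the character class is my assumption, not part of the answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a regex pattern in which every Alif variant
// in the user's text matches any Alif variant.
static NSString *AlifInsensitivePattern(NSString *text) {
    // Escape metacharacters first so input such as "a.b" is matched literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:text];
    NSRegularExpression *alif =
        [NSRegularExpression regularExpressionWithPattern:@"[اأإآ]" options:0 error:NULL];
    // Assumption: آ (U+0622) is treated as one more Alif variant.
    return [alif stringByReplacingMatchesInString:escaped
                                          options:0
                                            range:NSMakeRange(0, escaped.length)
                                     withTemplate:@"[اأإآ]"];
}

// Prefix search, as in the question: the match must start at index 0.
static BOOL NameHasAlifInsensitivePrefix(NSString *name, NSString *typed) {
    NSRange range = [name rangeOfString:AlifInsensitivePattern(typed)
                                options:NSRegularExpressionSearch | NSAnchoredSearch];
    return range.location != NSNotFound;
}
```

With this, NameHasAlifInsensitivePrefix(author.name, text) could stand in for the rangeOfString:options: call in the question.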

It seems that you are using the precomposed character (U+0623), which does not collate with other representations of Alif.
Did you consider other encodings of the Alif? You could use the decomposed variant, which would then collate with the "plain" Alif (U+0627) just as you intend:
ARABIC LETTER ALEF (U+0627) + ARABIC HAMZA ABOVE (U+0654)
See here: http://www.fileformat.info/info/unicode/char/0623/index.htm
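As a rough sketch of the decomposition idea: one can decompose both the stored name and the typed text and drop the combining hamza/madda marks before comparing. FoldAlifVariants is a hypothetical name, and removing U+0653 to U+0655 outright is my assumption. It also strips the hamza from letters such as ؤ and ئ, which may or may not be what you want.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps أ, إ and آ onto the bare Alif (U+0627).
static NSString *FoldAlifVariants(NSString *string) {
    // Canonical decomposition turns U+0623 into U+0627 U+0654, and so on.
    NSString *decomposed = [string decomposedStringWithCanonicalMapping];
    NSMutableString *result = [NSMutableString stringWithCapacity:decomposed.length];
    for (NSUInteger i = 0; i < decomposed.length; i++) {
        unichar c = [decomposed characterAtIndex:i];
        // Skip MADDAH ABOVE (U+0653), HAMZA ABOVE (U+0654), HAMZA BELOW (U+0655).
        if (c >= 0x0653 && c <= 0x0655) {
            continue;
        }
        [result appendString:[NSString stringWithCharacters:&c length:1]];
    }
    // Recompose whatever else was decomposed (for example accented Latin letters).
    return [result precomposedStringWithCanonicalMapping];
}
```

Folding both sides, for example [FoldAlifVariants(author.name) rangeOfString:FoldAlifVariants(text) options:NSAnchoredSearch], keeps the rest of the search code unchanged.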

Related

In my macOS application, I am working with UserDefaults dictionaryRepresentation. Sometimes I get strings with unknown encoding. Any suggestion?

I am working on an Objective-C application. Specifically, I am gathering the dictionary representation of NSUserDefaults with this code:
NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
NSDictionary *userDefaultsDict = [defaults dictionaryRepresentation];
While enumerating the keys and objects of the resulting dictionary, I sometimes find a kind of opaque string.
So it seems like an encoding problem.
If I print the description of the string, the debugger correctly prints:
Printing description of obj:
tsuqsx
However, if I try to write obj to a file, or use it in any other way, I get unreadable output.
What I would like to achieve is the following:
Detect in some way that the string has the encoding problem.
Convert the string to UTF-8 encoding to use it in the rest of the program.
Any help is greatly appreciated. Thanks
EDIT: A very hacky possible solution that helps explain what I am trying to do.
After trying all possible solutions based on dataUsingEncoding and back, I ended up with the following solution. It is absolutely weird, but I post it here in the hope that it helps somebody guess the encoding and what to do with the unprintable characters:
- (BOOL)isProblematicString:(NSString *)candidateString {
    BOOL returnValue = YES;
    if ([candidateString length] <= 2) {
        return NO;
    }
    const char *temp = [candidateString UTF8String];
    long length = temp[0];
    char *dest = malloc(length + 1);
    long ctr = 1;
    long usefulCounter = 0;
    for (ctr = 1;ctr <= length;ctr++) {
        if ((ctr - 1) % 3 == 0) {
            memcpy(&dest[ctr - usefulCounter - 1],&temp[ctr],1);
        } else {
            if (ctr != 1 && ctr < [candidateString length]) {
                if (temp[ctr] < 0x10 || temp[ctr] > 0x1F) {
                    returnValue = NO;
                }
            }
            usefulCounter += 1;
        }
    }
    memset(&dest[length],0,1);
    free(dest);
    return returnValue;
}
- (NSString *)utf8StringFromUnknownEncodedString:(NSString*)originalUnknownString {
    const char *temp = [originalUnknownString UTF8String];
    long length = temp[0];
    char *dest = malloc(length + 1);
    long ctr = 1;
    long usefulCounter = 0;
    for (ctr = 1;ctr <= length;ctr++) {
        if ((ctr - 1) % 3 == 0) {
            memcpy(&dest[ctr - usefulCounter - 1],&temp[ctr],1);
        } else {
            usefulCounter += 1;
        }
    }
    memset(&dest[length],0,1);
    NSString *returnValue = [[NSString alloc] initWithUTF8String:dest];
    free(dest);
    return returnValue;
}
This returns me a string that I can use to build a full UTF8 string. I am looking for a clean solution. Any help is greatly appreciated. Thanks

We're talking about a string which comes from /Library/Preferences/.GlobalPreferences.plist (key com.apple.preferences.timezone.new.selected_city).
NSString *city = [[NSUserDefaults standardUserDefaults]
    stringForKey:@"com.apple.preferences.timezone.new.selected_city"];
NSLog(@"%@", city); // \^Zt\^\\^]s\^]\^\u\^V\^_q\^]\^[s\^W\^Zx\^P
(lldb) p [city description]
(__NSCFString *) $1 = 0x0000600003f6c240 @"\x1at\x1c\x1ds\x1d\x1cu\x16\x1fq\x1d\x1bs\x17\x1ax\x10"
You wrote that you would like to:
Detect in some way that the string has the encoding problem.
Convert the string to UTF8 encoding to use it in the rest of the program.
and that you have tried "all possible solutions based on dataUsingEncoding and back".
This string has no encoding problem: characters like \x1a, \x1c, ... are valid characters. You can call dataUsingEncoding: with ASCII, UTF-8, ... but all these characters will still be present. They're called control characters (or non-printing characters). The Wikipedia article on control characters explains what they are and how they're defined in ASCII, extended ASCII and Unicode.
What you're looking for is a way to remove control characters from a string.
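For the "detect" half of the question, a minimal sketch is to ask whether the string contains any member of the control character set at all. ContainsControlCharacters is a hypothetical helper name. Note that newlines and tabs are control characters too, so they also count here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the string holds at least one character from
// Unicode General Category Cc or Cf (which includes \n and \t).
static BOOL ContainsControlCharacters(NSString *string) {
    NSRange range = [string rangeOfCharacterFromSet:[NSCharacterSet controlCharacterSet]];
    return range.location != NSNotFound;
}
```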
Remove control characters
We can create a category for our new method:
@interface NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters;
@end
@implementation NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters {
    // TODO Remove control characters
    return self;
}
@end
In all examples below, the city variable is created in this way ...
NSString *city = [[NSUserDefaults standardUserDefaults]
    stringForKey:@"com.apple.preferences.timezone.new.selected_city"];
... and contains @"\x1at\x1c\x1ds\x1d\x1cu\x16\x1fq\x1d\x1bs\x17\x1ax\x10". Also, all examples below were tested with the following code:
NSString *cityWithoutCC = [city stringByRemovingControlCharacters];
// tsuqsx
NSLog(@"%@", cityWithoutCC);
// {length = 6, bytes = 0x747375717378}
NSLog(@"%@", [cityWithoutCC dataUsingEncoding:NSUTF8StringEncoding]);
Split & join
One way is to use NSCharacterSet.controlCharacterSet.
There's a stringByTrimmingCharactersInSet: method (NSString), but it removes these characters from the beginning and end only, which is not what you're looking for. There's a trick you can use:
- (NSString *)stringByRemovingControlCharacters {
    NSArray<NSString *> *components = [self componentsSeparatedByCharactersInSet:NSCharacterSet.controlCharacterSet];
    return [components componentsJoinedByString:@""];
}
It splits the string at control characters and then joins the components back together. Not a very efficient way, but it works.
ICU transform
Another way is to use an ICU transform (see the ICU User Guide).
There's a stringByApplyingTransform:reverse: method (NSString), but it only accepts predefined constants. The documentation says:
The constants defined by the NSStringTransform type offer a subset of the functionality provided by the underlying ICU transform functionality. To apply an ICU transform defined in the ICU User Guide that doesn't have a corresponding NSStringTransform constant, create an instance of NSMutableString and call the applyTransform:reverse:range:updatedRange: method instead.
Let's update our implementation:
- (NSString *)stringByRemovingControlCharacters {
    NSMutableString *result = [self mutableCopy];
    [result applyTransform:@"[[:Cc:] [:Cf:]] Remove"
                   reverse:NO
                     range:NSMakeRange(0, self.length)
              updatedRange:nil];
    return result;
}
[:Cc:] represents control characters and [:Cf:] represents format characters. Together they cover the same characters as the already mentioned NSCharacterSet.controlCharacterSet. Documentation:
A character set containing the characters in Unicode General Category Cc and Cf.
Iterate over characters
NSCharacterSet also offers the characterIsMember: method. Here we need to iterate over the characters (unichar) and check whether each one is a control character.
Let's update our implementation:
- (NSString *)stringByRemovingControlCharacters {
    if (self.length == 0) {
        return self;
    }
    NSUInteger length = self.length;
    unichar characters[length];
    // getCharacters:range: is bounded, unlike the unbounded getCharacters:
    [self getCharacters:characters range:NSMakeRange(0, length)];
    NSUInteger resultLength = 0;
    unichar result[length];
    NSCharacterSet *controlCharacterSet = NSCharacterSet.controlCharacterSet;
    for (NSUInteger i = 0 ; i < length ; i++) {
        if ([controlCharacterSet characterIsMember:characters[i]] == NO) {
            result[resultLength++] = characters[i];
        }
    }
    return [NSString stringWithCharacters:result length:resultLength];
}
Here we filter out all characters (unichar) which belong to the controlCharacterSet.
Other ways
There are other ways to iterate over characters; see, for example, "Most efficient way to iterate over all the chars in an NSString".
BBEdit & others
Let's write this string to a file:
NSString *city = [[NSUserDefaults standardUserDefaults]
    stringForKey:@"com.apple.preferences.timezone.new.selected_city"];
[city writeToFile:@"/Users/zrzka/city.txt"
       atomically:YES
         encoding:NSUTF8StringEncoding
            error:nil];
It's up to the editor how these control characters are handled and displayed. Visual Studio Code, for example, lets you switch their rendering on and off with View - Render Control Characters.
BBEdit displays upside-down question marks, but I'm sure there's a way to toggle control character rendering there too. I don't have BBEdit installed to verify it.

Create NSArray of keywords starting/ending with % inside a string

I have a long string that contains keywords that start and end with the percent sign. E.g.:
My name is %user_username% and I live at %location_address%. You can reach me at %user_phone%.
What method would I use to extract all strings that begin and end with % and put them into an NSArray, so that I can replace them with their correct text representations?

Assuming that there are no % signs inside your strings of interest (e.g "a%ab%b%c"), you could use the componentsSeparatedByString: or componentsSeparatedByCharactersInSet: to get an array of strings separated by the % sign. From there, it's pretty easy to figure out which strings in that array are between the percent signs, and which are unnecessary.
I think internally though, those methods are likely implemented as something like a loop looking for %s. Maybe they parallelize the search on big strings, or use special knowledge of the internal structure of the string to make things faster -- those are the only ways I can see to speed up the search, assuming that you're stuck with keeping it all in a % delimited string (if speed is really an issue, then the answer is probably to use an alternative representation).
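A minimal sketch of the split approach described above, under the same assumption that % never appears inside a keyword. After splitting on %, the pieces at odd indexes are the ones that sat between two percent signs. KeywordsBetweenPercentSigns is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text found between each pair of % signs.
static NSArray<NSString *> *KeywordsBetweenPercentSigns(NSString *text) {
    NSArray<NSString *> *parts = [text componentsSeparatedByString:@"%"];
    NSMutableArray<NSString *> *keywords = [NSMutableArray array];
    // Odd-indexed parts lie between an opening and a closing %.
    // The last part is skipped: it is only preceded by a %, never followed by one.
    for (NSUInteger i = 1; i + 1 < parts.count; i += 2) {
        if (parts[i].length > 0) { // ignore an empty "%%"
            [keywords addObject:parts[i]];
        }
    }
    return keywords;
}
```

For the example string in the question this yields user_username, location_address and user_phone, in that order.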

This is what I came up with that works:
- (NSArray *)replaceKeywords:(NSString *)keywordString {
    NSString *start = @"%";
    NSString *end = @"%";
    NSMutableArray* strings = [NSMutableArray arrayWithCapacity:0];
    NSRange startRange = [keywordString rangeOfString:start];
    for( ;; ) {
        if (startRange.location != NSNotFound) {
            // Search for the closing % in everything after the opening %
            NSRange targetRange;
            targetRange.location = startRange.location + startRange.length;
            targetRange.length = [keywordString length] - targetRange.location;
            NSRange endRange = [keywordString rangeOfString:end options:0 range:targetRange];
            if (endRange.location != NSNotFound) {
                targetRange.length = endRange.location - targetRange.location;
                [strings addObject:[keywordString substringWithRange:targetRange]];
                // Continue with the text after the closing %
                NSRange restOfString;
                restOfString.location = endRange.location + endRange.length;
                restOfString.length = [keywordString length] - restOfString.location;
                startRange = [keywordString rangeOfString:start options:0 range:restOfString];
            } else {
                break;
            }
        } else {
            break;
        }
    }
    return strings;
}
I slightly modified the method from Get String Between Two Other Strings in ObjC
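Since the stated goal is to replace the keywords, here is a rough sketch of doing the extraction and the replacement in one step. It uses NSRegularExpression, which is a different technique from the method above. FillKeywords and the values dictionary are hypothetical, and keywords missing from the dictionary are left untouched.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces each %keyword% with values[keyword], if present.
static NSString *FillKeywords(NSString *text, NSDictionary<NSString *, NSString *> *values) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"%([^%]+)%" options:0 error:NULL];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    NSMutableString *result = [text mutableCopy];
    // Walk backwards so the ranges of earlier matches stay valid while editing.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *keyword = [text substringWithRange:[match rangeAtIndex:1]];
        NSString *value = values[keyword];
        if (value != nil) {
            [result replaceCharactersInRange:match.range withString:value];
        }
    }
    return result;
}
```

A stray % in the text (as in "100% sure") pairs up with the next % it finds, so this sketch assumes percent signs are used only as keyword delimiters.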

How to check if an NSString contains fancy characters?

I have a game that renders the player's nickname.
Normally, I use a nice, styled, bitmap font to render the nickname. However, I only have bitmaps for "normal" characters - A,B,C,...,1,2,3,...!@#$%^,.... There are no bitmaps for Chinese, Japanese or whatever other "fancy" characters in any other language.
Trying to render such text with a bitmap will crash because I don't supply such bitmaps. Therefore I decided to detect whether the given string was a "fancy" string, and if that was the case, render the nickname using some generated system font.
How can I detect if a string has fancy characters? My current solution is something like
-(BOOL)isNormalText:(NSString *)text {
    char accepted[] = {"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*()_+{}/\\\"\'?.,"};
    for (int i = 0; i < [text length]; ++i) {
        char character = [text characterAtIndex:i];
        BOOL found = NO;
        for (int j = 0; j < 84 && !found; ++j) {
            char acceptedChar = accepted[j];
            if (character == acceptedChar) {
                found = YES;
            }
        }
        if (!found) {
            return NO;
        }
    }
    return YES;
}
Which does NOT work, I think. Because a fancy character is not one character - it is a sequence like "\u123".
I have seen a question, in Java, about something similar here: How to check if the word is Japanese or English?
They check if the character value is within the 255 range. But when I do this check in Objective-C, it tells me it is redundant because a char will always be within such range - which makes sense as I imagine the fancy characters to be actually a sequence like "\u123"...

Use an NSCharacterSet, fill it with the characters that you have bitmaps for, then invert the set so that it represents all characters that you don't have. Then use -[NSString rangeOfCharacterFromSet:]. If it returns NSNotFound then the string contains only valid characters.
Just as an example to illustrate what I mean:
- (BOOL) isNormalText:(NSString *) str
{
    if (str == nil)
        return NO;
    NSCharacterSet *allowedChars = [NSCharacterSet characterSetWithCharactersInString:@"ABCDEFG"];
    NSCharacterSet *notAllowedChars = [allowedChars invertedSet];
    return [str rangeOfCharacterFromSet:notAllowedChars].location == NSNotFound;
}
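As for the range check mentioned in the question: characterAtIndex: returns a unichar (a 16-bit UTF-16 code unit), so storing it in a char truncates it, which is why the compiler called the comparison redundant. A minimal sketch of a range check done on unichar follows. IsPrintableASCII is a hypothetical name, and limiting the range to printable ASCII (0x20 to 0x7E) is my assumption about what the bitmap font covers.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if every UTF-16 code unit is printable ASCII.
static BOOL IsPrintableASCII(NSString *text) {
    for (NSUInteger i = 0; i < text.length; i++) {
        unichar c = [text characterAtIndex:i]; // keep the full 16 bits
        if (c < 0x20 || c > 0x7E) {
            return NO;
        }
    }
    return YES;
}
```

The character set approach above remains the better fit when the font covers only some of these characters.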

Use a regular expression check:
-(BOOL)isNormalText:(NSString *)text {
    NSString * regex = @"(^[A-Za-z0-9]*$)";
    NSPredicate * pred = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    BOOL isMatch = [pred evaluateWithObject:text];
    return isMatch;
}

Replace substrings with characters in Cocoa

I'm quite new to Cocoa programming and I'm trying my best to create a program in which the user enters text into a text field and then presses a button. When the button is pressed, certain substrings of the text are supposed to be replaced with certain characters. None of the substrings is longer than 2 characters, and some are a single character long. After the replacement has been performed, the resulting text is to be put into another text field.
For example, "n" is supposed to be changed to "5", "nj" to "g" and "ng" to "s". So the text "Inject the syringe now!" would be changed to "Igect the syrise 5ow!"
How can I achieve this in a simple and elegant way? I have tried the following code but it doesn't seem to work.
- (IBAction)convert:(id)sender {
    NSMutableString *x;
    [x setString:[self.input stringValue]];
    NSMutableString *output1;
    [output1 setString:@""];
    NSMutableString *middle;
    middle = [[NSMutableString alloc] init];
    int s;
    unsigned long length = [x length];
    for (s = 0; s < length; s = s + 1) {
        if (s + 2 <= length) { // if more than or equal to two characters left
            [middle setString:[x substringWithRange:NSMakeRange(s, 2)]];
            if ([middle isEqualToString:@"nj"]) {
                [output1 appendToString:@"g"];
                s = s+1;
            } else if ([middle isEqualToString:@"ng"]) {
                [output1 appendToString:@"s"];
                s = s+1;
            } else { // if no two-character sequence matched
                [middle setString:[x substringWithRange:NSMakeRange(s, 1)]];
                if ([middle isEqualToString:@"n"]) {
                    [output1 appendString:@"5"];
                }
            }
        } else { // if less than two characters left
            [middle setString:[x substringWithRange:NSMakeRange(s, 1)]];
            if ([middle isEqualToString:@"n"]) {
                [output1 appendString:@"5"];
            }
        }
    }
    [self.output setStringValue:output1];
}
Here, *x is where the text from input goes, *output1 is where the result is stored, *middle consists of the piece of text being tested, and input and output are the NSTextFields.

I guess you could achieve what you want with quite a few different methods. Here is a simple one:
1. Define a map of values/replacements
2. Sort them by length (largest length first)
3. Match and replace
Something like this perhaps:
#import <Foundation/Foundation.h>
NSString * matchAndReplace(NSString *input, NSDictionary *map){
    NSMutableString *_input = [input mutableCopy];
    // Get all keys sorted by greatest length
    NSArray *keys = [map.allKeys sortedArrayUsingComparator: ^(NSString *key1, NSString *key2) {
        return [@(key2.length) compare:@(key1.length)];
    }];
    // Replace longer keys first so "nj" and "ng" win over "n"
    for (NSString *key in keys) {
        [_input replaceOccurrencesOfString:key
                                withString:map[key]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0,_input.length)];
    }
    return [_input copy];
}
int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSDictionary *mapping = @{
            @"n": @"5",
            @"nj": @"g",
            @"ng": @"s"
        };
        NSString *input = @"Inject the syringe now!";
        NSLog(@"Output: %@", matchAndReplace(input, mapping));
    }
}
Which will produce:
Output: Igect the syrise 5ow!
Note: This is an over-simplified way to achieve what you want (obviously) and maybe requires a few adjustments to cover every edge case, but it's simpler than your version and I hope that will be helpful to you.
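One edge case that replacing key by key can run into: text produced by an earlier replacement may be matched again by a later key (with a map such as a -> b, b -> c, the input "ab" can end up as "cc"). The following is only a sketch of a single-pass variant that avoids this by scanning the input once and never rescanning its own output. ReplaceInOnePass is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical single-pass variant: at each position, try the longest key first.
static NSString *ReplaceInOnePass(NSString *input, NSDictionary<NSString *, NSString *> *map) {
    NSArray<NSString *> *keys = [map.allKeys sortedArrayUsingComparator:
        ^NSComparisonResult(NSString *a, NSString *b) {
            return [@(b.length) compare:@(a.length)];
        }];
    NSMutableString *output = [NSMutableString stringWithCapacity:input.length];
    NSUInteger i = 0;
    while (i < input.length) {
        NSString *matched = nil;
        for (NSString *key in keys) {
            if (key.length == 0 || i + key.length > input.length) {
                continue;
            }
            NSRange here = NSMakeRange(i, key.length);
            if ([input compare:key options:NSLiteralSearch range:here] == NSOrderedSame) {
                matched = key;
                break;
            }
        }
        if (matched != nil) {
            [output appendString:map[matched]];
            i += matched.length; // skip the consumed input
        } else {
            [output appendString:[input substringWithRange:NSMakeRange(i, 1)]];
            i += 1;
        }
    }
    return output;
}
```

For the mapping and input used above, it produces the same "Igect the syrise 5ow!".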

How to read input in Objective-C?

I am trying to write some simple code that searches two dictionaries for a string and prints to the console if the string appears in both dictionaries. I want the user to be able to input the string via the console, and then pass the string as a variable into a message. I was wondering how I could go about getting a string from the console and using it as the argument in the following method call.
[x rangeOfString:"the string goes here" options:NSCaseInsensitiveSearch];
I am unsure how to get the string from the user. Do I use scanf() or fgets() to read into a char buffer and then convert it into an NSString, or can I simply scan into an NSString itself? I am also wondering how to pass that string as an argument. Please help.
Here is the code I have so far. I know it is not succinct, but I just want to get the job done:
#import <Foundation/Foundation.h>
#include <stdio.h>
#include "stdlib.h"
int main(int argc, const char* argv[]){
    @autoreleasepool {
        char *name[100];
        printf("Please enter the name you wish to search for");
        scanf("%s", *name);
        NSString *name2 = [NSString stringWithFormat:@"%s" , *name];
        NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames" encoding:NSUTF8StringEncoding error:NULL];
        NSString *dictionary = [NSString stringWithContentsOfFile:@"/usr/share/dict/words" encoding:NSUTF8StringEncoding error:NULL];
        NSArray *nameString2 = [nameString componentsSeparatedByString:@"\n"];
        NSArray *dictionary2 = [dictionary componentsSeparatedByString:@"\n"];
        int nsYES = 0;
        int dictYES = 0;
        for (NSString *n in nameString2) {
            NSRange r = [n rangeOfString:name2 options:NSCaseInsensitiveSearch];
            if (r.location != NSNotFound){
                nsYES = 1;
            }
        }
        for (NSString *x in dictionary2) {
            NSRange l = [x rangeOfString:name2 options:NSCaseInsensitiveSearch];
            if (l.location != NSNotFound){
                dictYES = 1;
            }
        }
        if (dictYES && nsYES){
            NSLog(@"glen appears in both dictionaries");
        }
    }
}
Thanks.

Safely reading from standard input in an interactive manner in C is kind of involved. The standard functions require a fixed-size buffer, which means either some input will be too long (and corrupt your memory!) or you'll have to read in a loop. And unfortunately, Cocoa doesn't offer us a whole lot of help.
For reading standard input entirely (as in, if you're expecting an input file over standard input), there is NSFileHandle, which makes it pretty succinct. But for interactively reading and writing like you want to do here, you pretty much have to go with the linked answer for reading.
Once you have read some input into a C string, you can easily turn it into an NSString with, for example, +[NSString stringWithUTF8String:].
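As a rough sketch of the interactive case, one option is the POSIX getline function, which grows its buffer as needed and so avoids the fixed-size buffer problem described above. ReadLine is a hypothetical helper name, and treating the input as UTF-8 is an assumption.

```objc
#import <Foundation/Foundation.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: reads one line of any length from `stream`.
// Returns nil at end of input, on a read error, or if the bytes are not valid UTF-8.
static NSString *ReadLine(FILE *stream) {
    char *buffer = NULL;   // getline allocates and grows this buffer
    size_t capacity = 0;
    ssize_t length = getline(&buffer, &capacity, stream);
    if (length < 0) {
        free(buffer);
        return nil;
    }
    // Drop the trailing newline (and a carriage return, if present).
    while (length > 0 && (buffer[length - 1] == '\n' || buffer[length - 1] == '\r')) {
        buffer[--length] = '\0';
    }
    NSString *line = [NSString stringWithUTF8String:buffer];
    free(buffer);
    return line;
}
```

In the question's program, NSString *name2 = ReadLine(stdin); could then replace the scanf() call and the stringWithFormat: conversion.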
