I need to gather data from a website based on the user's input.
searchString is the user inputted value, such as "search this string".
NSString *withoutSpaces = [searchString stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
Here, I need to replace spaces with %20
Next, I need to put the new string without spaces (replaced with %20) into another string.
NSString *unescapedSearchString = [NSString stringWithFormat:
@"website.com/query?=%22%@%22", withoutSpaces];
The site I need is not really "website.com", but that's just an example. I also need the %22 to remain at the beginning and end.
As you can see, I need the %@ to format the new withoutSpaces user input into the website URL.
I did a search and found examples, but I could not find any that use formatting with %@ as in my case.
What's the best way to "escape" the characters and keep my formatted string? Currently, when I try to access data from the website, it comes back as null. However, when I try a string without the %@ formatting and an actual value, I successfully retrieve the data from the website.
Any help is greatly appreciated.
You should do things this way:
NSString *searchString = ... // the raw search string with spaces and all
NSString *quoted = [NSString stringWithFormat:@"\"%@\"", searchString];
NSString *escaped = [quoted stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSString *urlString = [NSString stringWithFormat:@"website.com?query=%@&value=all", escaped];
BTW - the URL seems a little off. There should be a variable name before the = and after the ?.
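The null result in the question is likely caused by the format string itself: inside stringWithFormat:, a bare % starts a format specifier, so a literal percent sign has to be written as %%. Quoting and escaping the term before it reaches the format string avoids the problem. Below is a minimal sketch of that idea using the newer percent-encoding call, assuming ARC. The helper name SearchURLString is hypothetical, and "website.com" and the "query" parameter are the placeholders used in this thread.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the query URL string for a raw search term.
// "website.com" and "query" are placeholders taken from the thread.
static NSString *SearchURLString(NSString *searchString) {
    // Wrap the raw term in double quotes first, then escape the whole thing.
    NSString *quoted = [NSString stringWithFormat:@"\"%@\"", searchString];

    // URLQueryAllowedCharacterSet leaves '&', '=' and '+' unescaped, so remove
    // them from the allowed set to keep them from acting as query delimiters.
    NSMutableCharacterSet *allowed =
        [[NSCharacterSet URLQueryAllowedCharacterSet] mutableCopy];
    [allowed removeCharactersInString:@"&=+"];

    NSString *escaped =
        [quoted stringByAddingPercentEncodingWithAllowedCharacters:allowed];

    // The only format specifier here is %@, so no stray % is misread.
    return [NSString stringWithFormat:@"website.com?query=%@", escaped];
}
```

With this, the spaces and the surrounding quotes are escaped in one step, so the manual replacement of spaces with %20 is no longer needed.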
I'm looking for a way in the iPhone SDK to read in a Properties file (not the XML flavor) for example this one:
# a comment
! a comment
a = a string
b = a string with escape sequences \t \n \r \\ \" \' \ (space) \u0123
c = a string with a continuation line \
continuation line
d.e.f = another string
would result in four key/value pairs.
I can't change this format as it is sent to me by a web service. Can you please direct me?
Thanks,
Emmanuel
I would take a look at ParseKit http://parsekit.com/. Otherwise you could use RegexKitLite and create some regular expressions.
I ended up with this solution, in case anybody is interested:
@interface NSDictionary (PropertiesFile)
+ (NSDictionary *)dictionaryWithPropertiesFile:(NSString *)file;
@end
@implementation NSDictionary (PropertiesFile)
+ (NSDictionary *)dictionaryWithPropertiesFile:(NSString *)file {
NSError *error = nil;
NSString *propertyFileContent = [[NSString alloc] initWithContentsOfFile:file encoding:NSUTF8StringEncoding error:&error];
// Check the return value; the error pointer is only meaningful when the read fails.
if (!propertyFileContent) return nil;
NSArray *properties = [propertyFileContent componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
if (properties.count == 0) return nil;
NSMutableDictionary *result = [[NSMutableDictionary alloc] initWithCapacity:properties.count];
for (NSString *propertyString in properties) {
// Naive split: any line without exactly one "=" (comments included) is skipped.
NSArray *property = [propertyString componentsSeparatedByString:@"="];
if (property.count != 2) continue;
[result setObject:property[1] forKey:property[0]];
}
return result.count > 0 ? result : nil;
}
@end
It's a perfectly simple parsing problem. Read a line. Ignore if comment. Check for continuation, and read/append continuation lines as needed. Look for the "=". Make the left side of the "=" (after trimming white space) the key. Either parse the right side yourself or put it into an NSString and use stringWithFormat on it to "reduce" any escapes to pure character form. Return key and reduced right side.
(But refreshing my memory on the properties file format reminds me that:
The key contains all of the characters in the line starting with the
first non-white space character and up to, but not including, the
first unescaped '=', ':', or white space character other than a line
terminator. All of these key termination characters may be included in
the key by escaping them with a preceding backslash character;
So a little scanning of the line is required to separate the key from the rest. Nothing particularly difficult, though.)
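The steps described above can be sketched roughly as follows, assuming ARC. This is a simplified illustration, not a full implementation of the properties format: the helper names AddProperty and ParseProperties are hypothetical, whitespace is not treated as a key separator, values are trimmed, and escape sequences such as \t or \u0123 are left as they are.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits one logical line at the first unescaped '=' or ':'
// and stores the trimmed key and value in the dictionary.
static void AddProperty(NSString *logicalLine, NSMutableDictionary *result) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    for (NSUInteger i = 0; i < logicalLine.length; i++) {
        unichar c = [logicalLine characterAtIndex:i];
        if (c == '\\') { i++; continue; }   // skip the escaped character
        if (c != '=' && c != ':') continue;
        NSString *key = [[logicalLine substringToIndex:i]
                         stringByTrimmingCharactersInSet:whitespace];
        NSString *value = [[logicalLine substringFromIndex:i + 1]
                           stringByTrimmingCharactersInSet:whitespace];
        if (key.length > 0) result[key] = value;
        return;
    }
}

// Hypothetical helper: parses a simplified subset of the properties format.
static NSDictionary *ParseProperties(NSString *content) {
    NSMutableDictionary *result = [NSMutableDictionary dictionary];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSMutableString *pending = [NSMutableString string];
    BOOL inContinuation = NO;

    NSArray *lines = [content componentsSeparatedByCharactersInSet:
                      [NSCharacterSet newlineCharacterSet]];
    for (NSString *rawLine in lines) {
        NSString *line = [rawLine stringByTrimmingCharactersInSet:whitespace];

        // Blank lines and comments are only skipped outside a continuation.
        if (!inContinuation &&
            (line.length == 0 || [line hasPrefix:@"#"] || [line hasPrefix:@"!"])) {
            continue;
        }

        // An odd number of trailing backslashes means the line continues.
        NSUInteger backslashes = 0;
        while (backslashes < line.length &&
               [line characterAtIndex:line.length - 1 - backslashes] == '\\') {
            backslashes++;
        }
        inContinuation = (backslashes % 2 == 1);

        if (inContinuation) {
            // Drop the continuation backslash and wait for the next line.
            [pending appendString:[line substringToIndex:line.length - 1]];
            continue;
        }
        [pending appendString:line];
        AddProperty(pending, result);
        [pending setString:@""];
    }
    if (pending.length > 0) AddProperty(pending, result);  // file ended mid-continuation
    return result;
}
```

Reducing the escape sequences in the value would be a further pass over each value and is deliberately left out here.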
Have you considered using lex/yacc or flex/bison to generate your own compiler code from a description of the grammar for properties files? I'm not sure if there are any existing grammars defined for a Java properties file, but it seems like it would be a pretty simple grammar to write.
Here's another SO post that mentions this approach for general purpose parsing
Take a look at this PropertyParser
NSString *text = @"sample key = sample value";
PropertyParser *propertyParser = [[PropertyParser alloc] init];
NSMutableDictionary *keyValueMap = [propertyParser parse:text];
You can now use NSRegularExpression class to do this.
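As a rough illustration of the NSRegularExpression route, here is a sketch that handles only single-line "key = value" and "key: value" entries and skips comment lines. It assumes ARC, the helper name ParseSimpleProperties is hypothetical, and continuation lines and escape sequences are not handled.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: extracts single-line key/value pairs with a regular expression.
static NSDictionary *ParseSimpleProperties(NSString *text) {
    // Group 1: key (must not start with '#', '!', a separator, or whitespace).
    // Group 2: value, with surrounding spaces and tabs excluded.
    NSString *pattern =
        @"^[ \\t]*([^#!=:\\s][^=:\\n]*?)[ \\t]*[=:][ \\t]*(.*?)[ \\t]*$";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionAnchorsMatchLines
                                                    error:&error];
    if (!regex) return nil;

    NSMutableDictionary *result = [NSMutableDictionary dictionary];
    [regex enumerateMatchesInString:text
                            options:0
                              range:NSMakeRange(0, text.length)
                         usingBlock:^(NSTextCheckingResult *match,
                                      NSMatchingFlags flags, BOOL *stop) {
        NSString *key = [text substringWithRange:[match rangeAtIndex:1]];
        NSString *value = [text substringWithRange:[match rangeAtIndex:2]];
        result[key] = value;
    }];
    return result;
}
```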
I have a string in my cocoa GUI that needs to have special formatting (fonts, colors, etc.). Naturally, I'm using an attributed string. For convenience, I Init the string as an RTF:
NSString *inputString = @"This string has special characters";
NSString *rtfString = [NSString stringWithFormat:@"{@"***LENGTHY RTF FORMATTING STRING *** %@", inputString];
NSAttributedString *testString = [[NSAttributedString alloc] initWithRTF:[rtfString dataUsingEncoding:NSUTF8StringEncoding] documentAttributes:nil];
The problem is, the "inputString" might have special characters, which are not displayed properly due to the UTF8Encoding. They're replaced with other symbols.
é is left as Å©.
So, right now I'm doing this:
NSData* intermediateDataString=[inputString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
inputString = [[[NSString alloc] initWithData:intermediateDataString encoding:NSUTF8StringEncoding] autorelease];
This does not display the unexpected characters, but it does remove all accents and leaves in their stead the unaccented letter - é is left as e.
This is an improvement since everything can be read, but it is far from ideal.
Thoughts?
I would do something like this. First, create a dummy attributed string:
NSString *dummyRTFString = @"***LENGTHY RTF FORMATTING STRING *** A";
NSAttributedString *dummyAS = [[NSAttributedString alloc]
initWithRTF:[dummyRTFString dataUsingEncoding:NSUTF8StringEncoding]
documentAttributes:nil];
and obtain the attributes:
NSDictionary*attributes=[dummyAS attributesAtIndex:0 effectiveRange:NULL];
[dummyAS release];
Now I will use this attribute to create another attributed string:
NSAttributedString* as=[[NSAttributedString alloc] initWithString:inputString attributes:attributes];
Another approach is to use HTML instead of RTF; then you can include non-ascii characters as unicode in it.
In your first line of code, I assume that's really @"This string has special characters", otherwise you'd get a compile error. And it looks like your second line has an extra @".
If you know you're using UTF-8, why say NSASCIIStringEncoding?
Really, you should put the RTF including the string with special characters in a resource, not embedded in your code.
I'm trying to compare names without any punctuation, spaces, accents etc.
At the moment I am doing the following:
-(NSString*) prepareString:(NSString*)a {
//remove any accents and punctuation;
a=[[[NSString alloc] initWithData:[a dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];
a=[a stringByReplacingOccurrencesOfString:@" " withString:@""];
a=[a stringByReplacingOccurrencesOfString:@"'" withString:@""];
a=[a stringByReplacingOccurrencesOfString:@"`" withString:@""];
a=[a stringByReplacingOccurrencesOfString:@"-" withString:@""];
a=[a stringByReplacingOccurrencesOfString:@"_" withString:@""];
a=[a lowercaseString];
return a;
}
However, I need to do this for hundreds of strings and I need to make this more efficient. Any ideas?
NSString* finish = [[start componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:@""];
Before using any of these solutions, don't forget to use decomposedStringWithCanonicalMapping to decompose any accented letters. This will turn, for example, é (U+00E9) into e ́ (U+0065 U+0301). Then, when you strip out the non-alphanumeric characters, the unaccented letters will remain.
The reason why this is important is that you probably don't want, say, “dän” and “dün”* to be treated as the same. If you stripped out all accented letters, as some of these solutions may do, you'll end up with “dn”, so those strings will compare as equal.
So, you should decompose them first, so that you can strip the accents and leave the letters.
*Example from German. Thanks to Joris Weimar for providing it.
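A minimal sketch of the decompose-then-strip step, assuming ARC and a hypothetical helper name StringByStrippingAccents. One caveat: Apple documents letterCharacterSet as covering Unicode categories L* and M*, so the combining marks produced by decomposition still count as "letters" there. The sketch therefore removes them explicitly with nonBaseCharacterSet before any other filtering.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decomposes accented letters, then drops the combining marks.
static NSString *StringByStrippingAccents(NSString *input) {
    // "é" (U+00E9) becomes "e" followed by U+0301.
    NSString *decomposed = [input decomposedStringWithCanonicalMapping];

    // nonBaseCharacterSet holds the combining marks (Unicode category M*).
    NSArray *parts = [decomposed componentsSeparatedByCharactersInSet:
                      [NSCharacterSet nonBaseCharacterSet]];
    return [parts componentsJoinedByString:@""];
}
```

The result can then be passed through the letter filter shown earlier; "dän" and "dün" stay distinguishable because only the marks are removed.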
On a similar question, Ole Begemann suggests using stringByFoldingWithOptions: and I believe this is the best solution here:
NSString *accentedString = @"ÁlgeBra";
NSString *unaccentedString = [accentedString stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:[NSLocale currentLocale]];
Depending on the nature of the strings you want to convert, you might want to set a fixed locale (e.g. English) instead of using the user's current locale. That way, you can be sure to get the same results on every machine.
One important clarification regarding BillyTheKid18756's answer (Luiz corrected the problem, but the explanation of his code did not make it obvious):
DO NOT USE stringWithCString as a second step to remove accents: it can add unwanted characters at the end of your string, because the NSData is not NULL-terminated (as stringWithCString expects it to be).
Or use it and add an additional NULL byte to your NSData, like Luiz did in his code.
I think a simpler answer is to replace:
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
By:
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
Taking BillyTheKid18756's code again, here is the complete corrected version:
// The input text
NSString *text = @"BûvérÈ!#$&%^&(*^(_()-*/48";
// Defining what characters to accept
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:@" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
// Corrected back-conversion from NSData to NSString
NSString *sanitizedText = [[[NSString alloc] initWithData:sanitizedData encoding:NSASCIIStringEncoding] autorelease];
// Removing unaccepted characters
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:@""];
If you are trying to compare strings, use one of these methods. Don't try to change data.
- (NSComparisonResult)localizedCompare:(NSString *)aString
- (NSComparisonResult)localizedCaseInsensitiveCompare:(NSString *)aString
- (NSComparisonResult)compare:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)range locale:(id)locale
You NEED to consider the user's locale to do things right with strings, particularly things like names.
In most languages, characters like ä and å are not the same other than they look similar. They are inherently distinct characters with meaning distinct from others, but the actual rules and semantics are distinct to each locale.
The correct way to compare and sort strings is by considering the user's locale. Anything else is naive, wrong and very 1990's. Stop doing it.
If you are trying to pass data to a system that cannot support non-ASCII, well, this is just a wrong thing to do. Pass it as data blobs.
https://developer.apple.com/library/ios/documentation/cocoa/Conceptual/Strings/Articles/SearchingStrings.html
Also normalize your strings first (see Peter Hosey's post), either precomposing or decomposing; basically, pick one normalized form and use it consistently.
- (NSString *)decomposedStringWithCanonicalMapping
- (NSString *)decomposedStringWithCompatibilityMapping
- (NSString *)precomposedStringWithCanonicalMapping
- (NSString *)precomposedStringWithCompatibilityMapping
No, it's not nearly as simple and easy as we tend to think.
Yes, it requires informed and careful decision making. (and a bit of non-English language experience helps)
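As an illustration of the third method listed above, here is a sketch that compares two names without modifying either string. The helper name NamesMatch is hypothetical, and whether ignoring case and diacritics is appropriate depends on the locale and the data, as discussed above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when two names match, ignoring case and diacritics.
// Pass a locale to apply its rules, or nil for a non-localized comparison.
static BOOL NamesMatch(NSString *a, NSString *b, NSLocale *locale) {
    NSStringCompareOptions options =
        NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch;
    NSComparisonResult result = [a compare:b
                                   options:options
                                     range:NSMakeRange(0, a.length)
                                    locale:locale];
    return result == NSOrderedSame;
}
```

For user-facing comparisons, [NSLocale currentLocale] would be the usual argument; nil keeps the behaviour the same on every machine.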
Consider using the RegexKit framework. You could do something like:
NSString *searchString = @"This is neat.";
NSString *regexString = @"[\\W]"; // the backslash must be doubled inside a string literal
NSString *replaceWithString = @"";
NSString *replacedString = [searchString stringByReplacingOccurrencesOfRegex:regexString withString:replaceWithString];
NSLog(@"%@", replacedString);
//... Thisisneat
Consider using NSScanner, and specifically the methods -setCharactersToBeSkipped: (which accepts an NSCharacterSet) and -scanString:intoString: (which accepts a string and returns the scanned string by reference).
You may also want to couple this with -[NSString localizedCompare:], or perhaps -[NSString compare:options:] with the NSDiacriticInsensitiveSearch option. That could simplify having to remove/replace accents, so you can focus on removing punctuation, whitespace, etc.
If you must use an approach like you presented in your question, at least use an NSMutableString and replaceOccurrencesOfString:withString:options:range: — that will be much more efficient than creating tons of nearly-identical autoreleased strings. It could be that just reducing the number of allocations will boost performance "enough" for the time being.
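A sketch of the single-buffer approach, assuming ARC. The helper name PreparedName is hypothetical; it folds case and accents with stringByFoldingWithOptions:locale: under a fixed locale (an assumption made here for repeatable results) and then removes the same five characters as the question's method from one mutable string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper mirroring prepareString: from the question, but mutating
// one buffer instead of creating a new string for every replacement.
static NSString *PreparedName(NSString *name) {
    // Fold case and diacritics in one pass; a fixed locale keeps results stable.
    NSLocale *locale = [NSLocale localeWithLocaleIdentifier:@"en_US_POSIX"];
    NSString *folded =
        [name stringByFoldingWithOptions:(NSCaseInsensitiveSearch |
                                          NSDiacriticInsensitiveSearch)
                                  locale:locale];

    NSMutableString *buffer = [folded mutableCopy];
    for (NSString *unwanted in @[@" ", @"'", @"`", @"-", @"_"]) {
        [buffer replaceOccurrencesOfString:unwanted
                                withString:@""
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, buffer.length)];
    }
    return [buffer copy];
}
```

Whether this is fast enough for hundreds of strings is best confirmed by measuring; the sketch only removes the per-replacement allocations.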
To give a complete example by combining the answers from Luiz and Peter, adding a few lines, you get the code below.
The code does the following:
Creates a set of accepted characters
Turn accented letters into normal letters
Remove characters not in the set
Objective-C
// The input text
NSString *text = @"BûvérÈ!#$&%^&(*^(_()-*/48";
// Create set of accepted characters
NSMutableCharacterSet *acceptedCharacters = [[NSMutableCharacterSet alloc] init];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet letterCharacterSet]];
[acceptedCharacters formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
[acceptedCharacters addCharactersInString:@" _-.!"];
// Turn accented letters into normal letters (optional)
NSData *sanitizedData = [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *sanitizedText = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
// Remove characters not in the set
NSString* output = [[sanitizedText componentsSeparatedByCharactersInSet:[acceptedCharacters invertedSet]] componentsJoinedByString:@""];
Swift (2.2) example
let text = "BûvérÈ!#$&%^&(*^(_()-*/48"
// Create set of accepted characters
let acceptedCharacters = NSMutableCharacterSet()
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.letterCharacterSet())
acceptedCharacters.formUnionWithCharacterSet(NSCharacterSet.decimalDigitCharacterSet())
acceptedCharacters.addCharactersInString(" _-.!")
// Turn accented letters into normal letters (optional)
let sanitizedData = text.dataUsingEncoding(NSASCIIStringEncoding, allowLossyConversion: true)
let sanitizedText = String(data: sanitizedData!, encoding: NSASCIIStringEncoding)
// Remove characters not in the set
let components = sanitizedText!.componentsSeparatedByCharactersInSet(acceptedCharacters.invertedSet)
let output = components.joinWithSeparator("")
Output
The output for both examples would be: BuverE!_-48
Just bumped into this; maybe it's too late, but here is what worked for me:
// text is the input string, and this just removes accents from the letters
// lossy encoding turns accented letters into normal letters
// (dataUsingEncoding: returns an immutable NSData, so copy it into a mutable one)
NSMutableData *sanitizedData = [NSMutableData dataWithData:
[text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES]];
// increase length by 1 adds a 0 byte (increaseLengthBy
// guarantees to fill the new space with 0s), effectively turning
// sanitizedData into a c-string
[sanitizedData increaseLengthBy:1];
// now we just create a string with the c-string in sanitizedData
NSString *final = [NSString stringWithCString:[sanitizedData bytes] encoding:NSASCIIStringEncoding];
@interface NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet;
@end
@implementation NSString (Filtering)
- (NSString*)stringByFilteringCharacters:(NSCharacterSet*)charSet {
NSMutableString * mutString = [NSMutableString stringWithCapacity:[self length]];
for (NSUInteger i = 0; i < [self length]; i++){
// characterAtIndex: returns a unichar; storing it in a char would truncate non-ASCII characters
unichar c = [self characterAtIndex:i];
if(![charSet characterIsMember:c]) [mutString appendFormat:@"%C", c];
}
return [NSString stringWithString:mutString];
}
@end
These answers didn't work as expected for me. Specifically, decomposedStringWithCanonicalMapping didn't strip accents/umlauts as I'd expected.
Here's a variation on what I used that answers the brief:
// replace accents, umlauts etc with equivalent letter i.e 'é' becomes 'e'.
// Always use en_GB (or a locale without the characters you wish to strip) as locale, no matter which language we're taking as input
NSString *processedString = [string stringByFoldingWithOptions: NSDiacriticInsensitiveSearch locale: [NSLocale localeWithLocaleIdentifier: @"en_GB"]];
// remove non-letters
processedString = [[processedString componentsSeparatedByCharactersInSet:[[NSCharacterSet letterCharacterSet] invertedSet]] componentsJoinedByString:@""];
// trim whitespace
processedString = [processedString stringByTrimmingCharactersInSet: [NSCharacterSet whitespaceCharacterSet]];
return processedString;
Peter's Solution in Swift:
let newString = oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
Example:
let oldString = "Jo_ - h !. nn y"
// "Jo_ - h !. nn y"
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet)
// ["Jo", "h", "nn", "y"]
oldString.componentsSeparatedByCharactersInSet(NSCharacterSet.letterCharacterSet().invertedSet).joinWithSeparator("")
// "Johnny"
I wanted to filter out everything except letters and numbers, so I adapted Lorean's implementation of a Category on NSString to work a little different. In this example, you specify a string with only the characters you want to keep, and everything else is filtered out:
@interface NSString (PraxCategories)
+ (NSString *)lettersAndNumbers;
- (NSString*)stringByKeepingOnlyLettersAndNumbers;
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string;
@end
@implementation NSString (PraxCategories)
+ (NSString *)lettersAndNumbers { return @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; }
- (NSString*)stringByKeepingOnlyLettersAndNumbers {
return [self stringByKeepingOnlyCharactersInString:[NSString lettersAndNumbers]];
}
- (NSString*)stringByKeepingOnlyCharactersInString:(NSString *)string {
NSCharacterSet *characterSet = [NSCharacterSet characterSetWithCharactersInString:string];
NSMutableString * mutableString = @"".mutableCopy;
for (NSUInteger i = 0; i < [self length]; i++){
// use unichar, not char, so non-ASCII characters are not truncated into false matches
unichar character = [self characterAtIndex:i];
if([characterSet characterIsMember:character]) [mutableString appendFormat:@"%C", character];
}
return mutableString.copy;
}
@end
Once you've made your Categories, using them is trivial, and you can use them on any NSString:
NSString *string = someStringValueThatYouWantToFilter;
string = [string stringByKeepingOnlyLettersAndNumbers];
Or, for example, if you wanted to get rid of everything except vowels:
string = [string stringByKeepingOnlyCharactersInString:@"aeiouAEIOU"];
If you're still learning Objective-C and aren't using Categories, I encourage you to try them out. They're the best place to put things like this because it gives more functionality to all objects of the class you Categorize.
Categories simplify and encapsulate the code you're adding, making it easy to reuse on all of your projects. It's a great feature of Objective-C!
I have a fairly simple question concerning NSString however it doesn't seem to do what I want.
This is what I have:
NSString *title = [NSString stringWithFormat: character.name, @"is the character"];
This line in my parser takes the character name and inserts it into a plist. However, it doesn't insert the @"is the character" part. Is there something I'm doing wrong?
Your code is wrong. It should be:
NSString *title
= [NSString stringWithFormat:@"%@ is the character", character.name];
assuming that character.name is another NSString.
Read the Formatting String Objects paragraph of the String Programming Guide for Cocoa to learn everything about formatting strings.
stringWithFormat takes a format string as the first argument so, assuming character.name is the name of your character, you need:
NSString *title = [NSString stringWithFormat: @"%s is the character",
character.name];
What you have is the character name as the format string so, if it's @"Bob" then Bob is what you'll get. If it was @"Bob %s", that would work but would probably stuff up somewhere else that you display just the character name :-)
Note that "%s" is for a C string; "%@" is the correct format specifier if character.name is an NSString itself.
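To make the difference between the two specifiers concrete, here is a small sketch. The helper names TitleForName and TitleForCName are hypothetical, and the parameter stands in for character.name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: %@ formats any Objective-C object, here an NSString.
static NSString *TitleForName(NSString *name) {
    return [NSString stringWithFormat:@"%@ is the character", name];
}

// Hypothetical helper: %s expects a NUL-terminated C string instead.
static NSString *TitleForCName(const char *name) {
    return [NSString stringWithFormat:@"%s is the character", name];
}
```

Passing an NSString where %s is expected, or a C string where %@ is expected, is undefined behaviour and typically crashes or prints garbage.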