Regular expression for separating words by uppercase letters and numbers - objective-c

I was wondering if anyone might know what the regular expression would be to turn this:
West4thStreet
into this:
West 4th Street
I'm going to add the spaces to the string in Objective-C.
Thanks!

I don't know exactly where you want to put in spaces, but try something like [a-z.-][^a-z .-] and then put a space between the two characters in each match.

Something like this Perl regex substitution would put a space before each group of capital letters or digits. (You'd also want to trim the leading space from the result in this case.) I assume you don't want it to break up runs of digits, e.g. 45thStreet into 4 5th Street.
I'm less certain about how runs of letters should be handled.
s/([A-Z]+|[0-9]+)/ \1/g
For my own amusement, I also wrote a pattern that does not match at the beginning of the line:
s/([^\^])([A-Z]+|[0-9]+)/\1 \2/g
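
The substitution above can also be written with zero-width lookarounds, which avoids the leading space entirely. This is a minimal sketch in Java rather than Perl or Objective-C; the class name WordSplitter and the method addSpaces are made up for illustration, and the boundary rules (a letter followed by a digit, or a lowercase letter or digit followed by a capital) are an assumption about what the asker wants.

```java
import java.util.regex.Pattern;

final class WordSplitter {
    // Zero-width boundaries: a letter followed by a digit, or a lowercase
    // letter or digit followed by an uppercase letter. Nothing is consumed,
    // so runs such as "45" or "NS" are never split.
    private static final Pattern BOUNDARY = Pattern.compile(
        "(?<=[A-Za-z])(?=[0-9])|(?<=[a-z0-9])(?=[A-Z])");

    // Hypothetical helper: inserts one space at every boundary.
    static String addSpaces(String s) {
        return BOUNDARY.matcher(s).replaceAll(" ");
    }
}
```

With these rules West4thStreet becomes West 4th Street, and 45thStreet becomes 45th Street. The pattern only covers ASCII letters and digits.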

This should work, if all your strings truly match the format of your example:
([A-Z][a-z]+)(\d+[a-z]+)([A-Z][a-z]+)
You can then separate the groups with spaces.

Another option would be to skip RegExKit and loop through the characters in code, inserting a space before each capital letter and before each run of digits:
    NSMutableString *myText2 = [[NSMutableString alloc] initWithString:@"The1stTest"];
    BOOL isNumber = NO;
    // Walk backwards so that inserting a space never shifts an index still to be visited.
    for (NSInteger x = (NSInteger)myText2.length - 1; x > 1; x--)
    {
        unichar c = [myText2 characterAtIndex:x];
        BOOL isUpperCase = [[NSCharacterSet uppercaseLetterCharacterSet] characterIsMember:c];
        BOOL isLowerCase = [[NSCharacterSet lowercaseLetterCharacterSet] characterIsMember:c];
        if ([[NSCharacterSet decimalDigitCharacterSet] characterIsMember:c])
            isNumber = YES;
        // A letter directly in front of a digit run: put the space between them.
        if ((isUpperCase || isLowerCase) && isNumber)
        {
            [myText2 insertString:@" " atIndex:x + 1];
            isNumber = NO;
        }
        if (isUpperCase)
            [myText2 insertString:@" " atIndex:x];
    }
    NSLog(@"%@", myText2); // Output: "The 1st Test"


additional logic to this exercise missing

Writing a basic program to count the number of words in a string. I've changed my original code to account for multiple spaces between words. By setting one variable to the current index and one variable to the previous index and comparing them, I can say "if this current index is a space, but the previous index contains something other than a space (basically saying a character), then increase the word count".
    #import <Foundation/Foundation.h>

    int main(int argc, const char * argv[]) {
        @autoreleasepool {
            // The string that we'll be parsing.
            NSString *paragraph = @"This is a test paragraph and we will be testing out a string counter.";
            // Counter that tracks the number of words, starting at 0.
            int wordCount = 0;
            /* current starts as a blank space, so on the first pass the if statement
               compares [paragraph characterAtIndex:0] with a blank space. After that
               first pass, current holds characterAtIndex:0 while the if statement in
               the for loop looks at characterAtIndex:1, and so on. */
            unichar current = ' ';
            for (NSUInteger i = 0; i < paragraph.length; i++) {
                if ([paragraph characterAtIndex:i] == ' ' && (current != ' ')) {
                    wordCount++;
                }
                current = [paragraph characterAtIndex:i];
                // After one iteration, current is T and is compared with the character at index 1, which is h.
            }
            wordCount++;
            NSLog(@"%i", wordCount);
        }
        return 0;
    }
I tried adding "or" conditions to account for delimiters such as ";", "," and "." instead of just looking at a space, but it didn't work. Any idea what I can do, logically speaking, to account for anything that isn't a letter (preferably limited to these four delimiters: ".", ",", ";" and space)?
A standard way to solve this type of problem is to build a finite state machine. Your code isn't quite one, but it's close.
Instead of thinking about comparing the previous and current characters think in terms of states - you can start with just two, in a word and not in a word.
Now for each state you consider what the current character implies in terms of actions and changes to the state. For example, if the state is not in a word and the current character is a letter then the action is increment word count and the next state is in a word.
In (Objective-)C you can build a simple finite state machine using an enum to give the states names and a case statement inside a loop. In pseudo-code this is something like:
    typedef enum { NotInWord, InWord } State;
    State currentState = NotInWord;
    NSUInteger wordCount = 0;
    for currentChar in sourceString
       case currentState of
          NotInWord:
             if currentChar is a word start character -- e.g. a letter
             then
                increment wordCount;
                currentState = InWord;
          InWord:
             if currentChar is not a word character -- e.g. not a letter
             then
                currentState = NotInWord;
       end case
    end for
The above is just a step from your original algorithm - recasting it in terms of states rather than the previous character.
Now if you want to get smarter you can add more states. For example, how many words are there in "Karan's question"? Two. So you might want to allow a single apostrophe in a word. To handle that you can add a state AfterApostrophe whose logic is the same as the current InWord, and extend the InWord logic so that an apostrophe moves to AfterApostrophe. That allows one apostrophe inside a word (or at its end, which is also valid). Next you might want to consider hyphenated words, etc.
To test if a character is a particular type you have two easy choices:
If this is just an exercise and you are happy to stick with the ASCII range of characters, there are C library functions such as isdigit(), isalpha(), etc., declared in <ctype.h>.
If you want to handle full Unicode you can use the NSCharacterSet type with its pre-defined sets for letters, digits, etc.
See the documentation for both of the above choices.
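
The state-machine idea described above, including the AfterApostrophe state, can be sketched as follows. This is an illustration in Java, not Objective-C; the class WordCounter and the method countWords are hypothetical names, and treating only letters as word characters is an assumption taken from the "e.g. a letter" hint in the pseudo-code.

```java
final class WordCounter {
    private enum State { NOT_IN_WORD, IN_WORD, AFTER_APOSTROPHE }

    // Counts words by tracking the current state instead of the previous character.
    static int countWords(String text) {
        State state = State.NOT_IN_WORD;
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isWordChar = Character.isLetter(c); // assumption: letters only
            switch (state) {
                case NOT_IN_WORD:
                    if (isWordChar) {          // a word starts here
                        count++;
                        state = State.IN_WORD;
                    }
                    break;
                case IN_WORD:
                    if (c == '\'') {
                        state = State.AFTER_APOSTROPHE; // first apostrophe is allowed
                    } else if (!isWordChar) {
                        state = State.NOT_IN_WORD;
                    }
                    break;
                case AFTER_APOSTROPHE:
                    // Same as the two-state InWord: any non-letter, including a
                    // second apostrophe, ends the word.
                    if (!isWordChar) {
                        state = State.NOT_IN_WORD;
                    }
                    break;
            }
        }
        return count;
    }
}
```

Because every non-letter ends a word, delimiters such as ".", "," and ";" need no special cases, and repeated spaces are handled automatically.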
HTH
I don't understand why that failed; you should be able to add "or" conditions:
    #include <stdio.h>

    int main(void) {
        char paragraph[] = "This is a test paragraph,EXTRAWORDHERE and we will be testing out a string.";
        char current = ' ';
        int i;
        int wordCount = 0;
        for (i = 0; paragraph[i] != '\0'; i++) {
            /* A word ends where a delimiter (space or comma) follows a non-delimiter. */
            if ((paragraph[i] == ' ' || paragraph[i] == ',') && !(current == ' ' || current == ',')) {
                wordCount++;
            }
            current = paragraph[i];
        }
        wordCount++; /* count the last word, which has no delimiter after it */
        printf("%d\n", wordCount);
        return 0;
    }
The important change is in the test on current: instead of a single "not equal to space" comparison, it negates an "equal to any delimiter" check. Hopefully that helps.

java.text.DecimalFormat equivalent in Objective C

In java, I have
String snumber = null;
String mask = "000000000000";
DecimalFormat df = new DecimalFormat(mask);
snumber = df.format(number); //'number' is of type 'long' passed to a function
//which has this code in it
I am not familiar with how DecimalFormat works in Java, so I am finding it hard to write equivalent Objective-C code.
How can I achieve this? Any help would be appreciated.
For that particular case you can use some C-style formatting inside Objective-C:
    long number = 123;
    int desiredLength = 10;
    NSString *format = [NSString stringWithFormat:@"%%0%dld", desiredLength];
    NSString *snumber = [NSString stringWithFormat:format, number];
The result is 0000000123.
The format here will be %010ld.
The 10 means that the number gets 10 positions and is aligned to the right. The 0 in front causes all "empty" positions to be filled with 0. The ld matches the long argument.
If the number is longer than desiredLength, it is formatted just as it is (without truncation or leading zeros).
Of course, the above code is only valid when you want numbers of a specified length with the gaps filled by zeros.
For other scenarios you could, for example, write your own class that uses appropriate printf/NSLog formats to produce strings formatted as you wish.
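In Objective-C, the simplest substitute for a DecimalFormat "mask" is a string format. If you need more than zero padding, NSNumberFormatter (for example its minimumIntegerDigits property) is the closer equivalent.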
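
For readers who, like the asker, are unsure what the mask does on the Java side, here is a small sketch comparing it with a printf-style format. The class and method names are hypothetical, and pinning both calls to Locale.ROOT is an added assumption so that the digits do not depend on the machine's locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class ZeroPadding {
    // A mask of twelve zeros means "at least twelve integer digits, padded with 0".
    static String withMask(long number) {
        DecimalFormat df = new DecimalFormat(
            "000000000000", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(number);
    }

    // The printf-style counterpart: minimum width 12, zero-filled.
    static String withFormat(long number) {
        return String.format(Locale.ROOT, "%012d", number);
    }
}
```

For non-negative values the two produce the same string, and neither truncates a number that has more than twelve digits.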

Need regular expression that will work to find numeric and alpha characters in a string

Here's what I'm trying to do. A user can type in a search string, which can include '*' or '?' wildcard characters. I'm finding this works with regular strings but not with ones including numeric characters.
e.g:
414D512052524D2E535441524B2E4E45298B8751202AE908
1208
If I look for a section of that hex string, it returns false. If I look for "120" or "208" in the "1208" string, it also fails.
Right now, my regular expression pattern ends up looking like this when a user enters, say "w?f": '\bw.?f\b'
I'm (obviously) not well-versed in regular expressions at the moment, but would appreciate any pointers someone may have to handle numeric characters in the way I need to - thanks!
Code in question:
    import java.util.regex.Pattern;

    /**
     * Checks whether the search string (which may contain wildcards) occurs in the text.
     *
     * @param searchString the text the user typed, possibly containing '*' or '?'
     * @param strToBeSearched the text to search in
     * @return true if the converted pattern is found
     */
    public boolean findString(String searchString, String strToBeSearched) {
        Pattern pattern = Pattern.compile(wildcardToRegex(searchString));
        return pattern.matcher(strToBeSearched).find();
    }

    private String wildcardToRegex(String wildcard) {
        StringBuffer s = new StringBuffer(wildcard.length());
        s.append("\\b");
        for (int i = 0, is = wildcard.length(); i < is; i++) {
            char c = wildcard.charAt(i);
            switch (c) {
                case '*':
                    s.append(".*");
                    break;
                case '?':
                    s.append(".?");
                    break;
                default:
                    s.append(c);
                    break;
            }
        }
        s.append("\\b");
        return s.toString();
    }
Let's assume your string to search in is
1208
The search "term" the user enters is
120
The pattern then is
\b120\b
The \b (word boundary) meta-character matches at the beginning and end of "words".
In our example, this can't work: in 1208 the 120 is followed by 8, which is also a word character, so there is no word boundary after 120.
The pattern has to be
\b.*120.*\b
where .* means match any number of characters (including none).
Solution:
either add the .*s to your wildcardToRegex(...) method to make this functionality work out-of-the-box,
or tell your users to search for *120*, because your * wildcard character does exactly the same.
This is, in fact, my preference because the user can then define whether to search for entries starting with something (search for something*), including something (*something*), ending with something (*something), or exactly something (something).
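
The preference described above (let the user write the wildcards and match the whole value) could be sketched like this. It is an illustration only: the class WildcardSearch and the method matchesWhole are hypothetical, matching the entire candidate string instead of words inside a longer text is an assumption, and quoting literal characters with Pattern.quote is an addition so that input such as "." is not treated as a regex meta-character. The '?' mapping to ".?" is kept from the question's code.

```java
import java.util.regex.Pattern;

final class WildcardSearch {
    // Converts the user's wildcard string into a regular expression.
    static String wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            switch (c) {
                case '*':
                    sb.append(".*");
                    break;
                case '?':
                    sb.append(".?"); // same mapping as the question's code
                    break;
                default:
                    // Quote everything else so it is matched literally.
                    sb.append(Pattern.quote(String.valueOf(c)));
                    break;
            }
        }
        return sb.toString();
    }

    // True if the whole candidate matches the wildcard expression.
    static boolean matchesWhole(String wildcard, String candidate) {
        return Pattern.matches(wildcardToRegex(wildcard), candidate);
    }
}
```

With this sketch, 120 does not match 1208, while 120*, *208 and *120* all do.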

Regular Expression for validate price in decimal

I am unable to find a regular expression that validates a price entered as a decimal number.
This is what I want to accept:
12345
12345.1
12345.12
12345.123
.123
0.123
I also want to restrict the number of digits.
I created one, but it is not validating as I expected:
^([0-9]{1,5}|([0-9]{1,5}\.([0-9]{1,3})))$
I would also like to know how the expression above differs from this one, which works fine:
^([0-9]{1,5}|([0-9].([0-9]{1,3})))$
An answer with a good explanation would be appreciated.
I am using NSRegularExpression (Objective-C), if this helps to answer more precisely:
    - (IBAction)btnTapped {
        NSError *error = nil;
        // The non-capturing group makes ^ and $ apply to both alternatives.
        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:
            @"^(?:\\d{1,5}([.]\\d{1,3})?|[.]\\d{1,3})$" options:NSRegularExpressionCaseInsensitive error:&error];
        if ([regex numberOfMatchesInString:txtInput.text options:0 range:NSMakeRange(0, [txtInput.text length])])
            NSLog(@"Matched : %@", txtInput.text);
        else
            NSLog(@"Not Matched : %@", txtInput.text);
    }
I am doing this in a button-tap method.
This simple one should suit your needs:
\d*[.]?\d+
"Digits (\d+) that can be preceded by a dot ([.]?), which can itself be preceded by digits (\d*)."
Since you're talking about prices, neither scientific notation nor negative numbers are necessary.
Just as a point of interest, here's the one I usually used, scientific notation and negative numbers included:
[-+]?\d*[.]?\d+(?:[eE][-+]?\d+)?
For the new requirements (cf. comments), you can't specify how many digits you want on the first regex I gave, since it's not the way it has been built.
This one should suit your needs better:
\d{1,5}([.]\d{1,3})?|[.]\d{1,3}
"Max 5 digits (\d{1,5}) possibly followed ((...)?) by a dot itself followed by max 3 digits ([.]\d{1,3}), or (|) simply a dot followed by max 3 digits ([.]\d{1,3})".
Let's do this part by part:
Sign at the beginning: [+-]?
Fraction part: \.\d+
Possible combinations (after the sign):
Number: \d+
Fraction without a leading zero: \.\d+
Number with fraction: \d+\.\d+
So, joining it all together as <sign>(number|fraction without zero|number with fraction):
^[+-]?(\d+|\.\d+|\d+\.\d+)$
If you're not restricting the lengths to 5 digits before the decimal point and 3 digits after, then you could use this:
^[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)$
If you are restricting it to a maximum of 5 before and 3 after, then you'd need something like this:
^[+-]?(?:[0-9]{0,5}\.[0-9]{1,3}|[0-9]{1,5})$
As for the difference between your regexes: the first one limits the number of digits before the decimal point to 1-5, with or without a fractional part. The second one allows only a single digit in front of the decimal point (and 1-5 digits if there is no fractional part). Its . is also unescaped, so it matches any character, not just a decimal point.
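
To check candidate patterns against the sample inputs from the question, a small harness helps. The sketch below is in Java rather than NSRegularExpression; the class PriceValidator and the method isValidPrice are hypothetical names, and the pattern is the unsigned "max 5 digits, optional dot plus max 3 digits, or a dot plus max 3 digits" form, matched against the whole input.

```java
import java.util.regex.Pattern;

final class PriceValidator {
    // Either 1-5 digits with an optional fraction of 1-3 digits,
    // or a bare fraction such as ".123". Note the escaped dot.
    private static final Pattern PRICE = Pattern.compile(
        "[0-9]{1,5}(?:\\.[0-9]{1,3})?|\\.[0-9]{1,3}");

    static boolean isValidPrice(String s) {
        // matches() requires the whole input to match, so no ^ or $ is needed
        // and the alternation cannot match just part of the string.
        return PRICE.matcher(s).matches();
    }
}
```

Using matches() instead of find() sidesteps the precedence trap where ^ binds only to the first alternative and $ only to the last.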
How about this: ^([+-])?(\d+)?([.,])?(\d+)?$
    string input = "bla";
    if (!string.IsNullOrWhiteSpace(input))
    {
        string pattern = @"^(\s+)?([-])?(\s+)?(\d+)?([,.])?(\d+)(\s+)?$";
        input = input.Replace("\'", ""); // Remove thousand's separator
        System.Text.RegularExpressions.Regex.IsMatch(input, pattern);
        // if server culture = de then reverse the below replace
        input = input.Replace(',', '.');
    }
Edit:
Oh oh - just realized that's where we run into a little bit of a problem if an en-us user uses ',' as thousand's separator....
So here a better one:
    string input = "+123,456";
    if (!string.IsNullOrWhiteSpace(input))
    {
        string pattern = @"^(\s+)?([+-])?(\s+)?(\d+)?([.,])?(\d+)(\s+)?$";
        input = input.Replace(',', '.'); // Ensure no en-us thousand's separator
        input = input.Replace("\'", ""); // Remove thousand's separator
        input = System.Text.RegularExpressions.Regex.Replace(input, @"\s", ""); // Remove whitespaces
        bool foo = System.Text.RegularExpressions.Regex.IsMatch(input, pattern);
        if (foo)
        {
            bool de = false;
            if (de) // if server-culture = de
                input = input.Replace('.', ',');
            double d = 0;
            bool bar = double.TryParse(input, out d);
            System.Diagnostics.Debug.Assert(foo == bar);
            Console.WriteLine(foo);
            Console.WriteLine(input);
        }
        else
            throw new ArgumentException("input");
    }
    else
        throw new NullReferenceException("input");
Edit2:
Instead of going through the hassle of getting the server culture, just use the tryparse overload with the culture and don't resubstitute the decimal separator.
    double.TryParse(input
        , System.Globalization.NumberStyles.Any
        , new System.Globalization.CultureInfo("en-US")
        , out d
    );

Converting uppercase string to title case in Objective-C

I created the following method, which starts by using the built-in capitalizedString method on NSString, but that really just capitalizes the first letter of each word. I see that .NET has a TextInfo.ToTitleCase method which attempts what I'd like to do in Objective-C, but it also falls short.
http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx
The method I wrote to start is below. How would you handle properly casing an uppercase string? Would a database of words to convert to all uppercase/lowercase help?
    - (NSString *)convertStringToTitleCase:(NSString *)str {
        NSMutableString *convertedStr = [NSMutableString stringWithString:[str capitalizedString]];
        NSRange range = NSMakeRange(0, convertedStr.length);
        // a list of words to always make lowercase could be placed here
        [convertedStr replaceOccurrencesOfString:@" De "
                                      withString:@" de "
                                         options:NSLiteralSearch
                                           range:range];
        // a list of words to always make uppercase could be placed here
        [convertedStr replaceOccurrencesOfString:@" Tv "
                                      withString:@" TV "
                                         options:NSLiteralSearch
                                           range:range];
        return convertedStr;
    }
As noted in comments, the .NET method you refer to doesn't do "proper" title case (that is, follow a list of exception words to be left in either all-caps or all-lowercase), so -[NSString capitalizedString] is as equivalent as you'll get. If you want exception words, you'll have to write your own method (or find someone else who did, as a google search for NSString "title case" might).
How "proper" your title casing gets depends on how many exception words you're willing to throw at it. How much of the English language do you want it to support? What about other languages? It'll also depend on how far you go in analyzing word boundaries -- you might want "TV" to stay all-caps regardless of whether it's in quotes, at the end of a sentence, etc., but you probably also don't want "you've" to come out "You'Ve".
If you want to process exception words, your plan of repeatedly running replaceOccurrencesOfString... will get slower the more exception words you have. (Also, using spaces in your search/replace strings means you aren't considering other word boundaries you might want to.)
It might be useful to consider NSRegularExpression, since regular expressions already have pretty robust notions of case and word boundaries. If that doesn't work well for you, using a scanner to read through the input string while producing a transformed output string would be more efficient than running multiple search/replace operations.
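
The single-pass approach suggested above (read through the input once and emit the transformed output) might look roughly like this. The sketch is in Java rather than Objective-C, so it shows the shape of the algorithm and not the NSScanner API; the class TitleCaser and both exception lists are hypothetical, and treating an apostrophe as part of a word is an assumption made so that "you've" does not become "You'Ve".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TitleCaser {
    // Hypothetical exception lists; a real one would be much longer.
    private static final Set<String> LOWER =
        new HashSet<>(Arrays.asList("a", "an", "and", "de", "of", "the"));
    private static final Set<String> UPPER =
        new HashSet<>(Arrays.asList("tv", "usa"));

    static String titleCase(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean firstWord = true;
        int i = 0;
        while (i < input.length()) {
            if (!isWordChar(input.charAt(i))) {
                out.append(input.charAt(i++)); // copy separators unchanged
                continue;
            }
            int start = i;
            while (i < input.length() && isWordChar(input.charAt(i))) i++;
            String word = input.substring(start, i).toLowerCase(Locale.ROOT);
            if (UPPER.contains(word)) {
                out.append(word.toUpperCase(Locale.ROOT));
            } else if (LOWER.contains(word) && !firstWord) {
                out.append(word);              // stays lowercase unless it leads
            } else {
                out.append(Character.toUpperCase(word.charAt(0)))
                   .append(word.substring(1));
            }
            firstWord = false;
        }
        return out.toString();
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '\'';
    }
}
```

Each word costs one set lookup, so the running time does not grow with the number of exception words, and any non-word character counts as a boundary, not just a space.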
A nice one-liner (not a general solution, and probably very inefficient on huge strings):
[[str lowercaseString] capitalizedString];
    extension String {
        /**
         Get the title case string.
         */
        var titleCase: String {
            get {
                return getTitleCaseString()
            }
        }

        // MARK: Private methods.

        /**
         Get title case string.
         - returns: The title case string regarding the lowercase words.
         */
        private func getTitleCaseString() -> String {
            struct Holder {
                static let lowercaseWords = ["a", "an", "and", "at", "but", "by", "else", "for",
                                             "from", "if", "in", "into", "is", "nor", "of", "off",
                                             "on", "or", "out", "the", "to", "via", "vs", "with"]
            }
            return replaceToLowercaseAllOccurrencesOfWords(Holder.lowercaseWords).capitalizeFirst
        }

        /**
         Replace to lowercase all occurrences of lowercase words.
         - parameter lowercaseWords: The lowercase words to replace.
         - returns: String with all occurrences replace to the lowercase words.
         */
        private func replaceToLowercaseAllOccurrencesOfWords(lowercaseWords: [String]) -> String {
            let capitalizedSelf = NSMutableString(string: self.capitalizedString)
            for word in lowercaseWords {
                if let lowercaseWordRegex = try? NSRegularExpression(pattern: "\\b\(word)\\b", options: .CaseInsensitive) {
                    lowercaseWordRegex.replaceMatchesInString(capitalizedSelf,
                                                              options: NSMatchingOptions(),
                                                              range: NSMakeRange(0, capitalizedSelf.length),
                                                              withTemplate: word)
                }
            }
            return capitalizedSelf as String
        }

        /**
         Capitalize first char.
         */
        private var capitalizeFirst: String {
            if isEmpty { return "" }
            var result = self
            result.replaceRange(startIndex...startIndex, with: String(self[startIndex]).uppercaseString)
            return result
        }
    }