Collect a word between two spaces in Objective-C

I'm trying to implement something similar to a spell checker, but I need to get each word delimited by spaces. For example, given "HI HOW R U", I need to collect HI, HOW and so on as the user types, i.e. after the user types HI and a space, I need to collect HI and spell-check it.

Check the NSString documentation. You want the message componentsSeparatedByString:.
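To make the "as they type" part concrete, here is a minimal sketch in Python (used only to keep it short and testable). It assumes keystrokes arrive one character at a time; in an iOS app they would come from a text-change delegate callback instead. WordCollector and feed are hypothetical names, and str.split(" ") plays the role of componentsSeparatedByString:@" ".

```python
from typing import Optional


class WordCollector:
    """Hypothetical helper: buffers typed characters and hands back a
    completed word each time a space is typed."""

    def __init__(self):
        self._chars = []

    def feed(self, ch: str) -> Optional[str]:
        """Feed one typed character; return the finished word on a space."""
        if ch == " ":
            word = "".join(self._chars)
            self._chars.clear()
            # Consecutive spaces produce an empty word, which is skipped.
            return word or None
        self._chars.append(ch)
        return None


if __name__ == "__main__":
    collector = WordCollector()
    for ch in "HI HOW R U ":
        word = collector.feed(ch)
        if word is not None:
            print("spell-check:", word)

    # One-shot split, analogous to componentsSeparatedByString:@" "
    print("HI HOW R U".split(" "))
```

Like componentsSeparatedByString:, a plain split yields empty strings for repeated spaces, which is why the incremental version filters them out.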

I don't know Objective-C, but I'm fairly sure it has a regexp library, although it would be straightforward to code this without one.
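Regexp: \b([^\s]+)\b
\b = word boundary (a zero-width position between a word character and a non-word character such as whitespace, a comma, a dot or an exclamation mark)
\s = whitespace character
[...] = character set
[^...] = negated character set (any character EXCEPT those listed)
() = grouping (capturing) construct
+ = one or more times
So the suggested expression starts matching at a word boundary, then matches every subsequent character that is not a whitespace character, then requires another word boundary. Putting the quantifier inside the group makes the group capture the whole word, and using + rather than * avoids empty matches.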
Your stated case is so simple that you may just want to scan for spaces (one character at a time) and take the substring. Still, regular expressions are very widely used across languages and platforms, so it's fairly easy to find an expression when you need one, and you often do for common tasks such as checking whether zip codes, phone numbers or email addresses are syntactically valid. So they're worth learning in any case. :)
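A quick way to try the expression \b([^\s]+)\b is a short Python sketch; Python's re stands in here for whatever regex library the platform offers (on Apple platforms that would be NSRegularExpression), and extract_words is a hypothetical name. It shows that the trailing \b makes the engine back off trailing punctuation such as "," or "!".

```python
import re

# Start at a word boundary, take one or more non-whitespace characters,
# and end at a word boundary (backtracking off trailing punctuation).
WORD_PATTERN = re.compile(r"\b([^\s]+)\b")


def extract_words(text: str) -> list:
    """Return every word found by WORD_PATTERN, in order."""
    return WORD_PATTERN.findall(text)


if __name__ == "__main__":
    print(extract_words("HI HOW R U"))
    print(extract_words("hello, world!"))
```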

Related

VBA replace certain carriage returns

Hi all,
I am used to programming VBA in Excel, but am new to the structures in Word.
I am working through a library of text files to update them. Many of them are either OCR documents, or were manually entered.
Each has a recurring pattern, the most common of which is unnecessary carriage returns.
For example, I am looking at several text files where there is a double return after each line. A search and replace of all double carriage returns removes all paragraph distinctions.
However, each line is approximately 30 characters long, and if I manually perform the following logic, it gives me a functional document.
If there is a double carriage return after 30+ characters, I replace them with a space.
If there were fewer than 30 characters before the double return, I replace them with a single return.
Can anyone help me with some rudimentary code that would help me get started on that? I could then modify it for each "pattern" of text documents I have.
e.g.
In this case, there are more than
thirty characters per line. And I
will keep going to illustrate this
example.
This would be a new paragraph, and
would be separated by another of
the single returns.
I want code that would return:
In this case, there are more than thirty characters per line. And I will keep going to illustrate this example.
This would be a new paragraph, and would be separated by another of the single returns.
Let me know if anyone can throw something out that I can play with!
You can do this without any code (which a RegEx solution would require), simply by using Word's own wildcard Find/Replace, where:
Find = ([!^13]{30,})[^13]{1,}
Replace = \1^32
and, to clean up the residual multi-paragraph breaks:
Find = [^13]{2,}
Replace = ^p
You could, of course, record the above as a macro...
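If you later want the same two passes outside Word, here is a rough Python sketch of the wildcard logic above. It assumes plain-text files with \n (or \r\n) line endings; [!^13] becomes [^\n], ^32 becomes a space and ^p becomes a single newline. join_wrapped_lines and its min_len parameter are hypothetical names, with min_len standing in for the 30-character threshold you would tune per "pattern" of document.

```python
import re


def join_wrapped_lines(text: str, min_len: int = 30) -> str:
    """Hypothetical helper mirroring the two wildcard Find/Replace passes.

    Pass 1: a run of at least min_len non-newline characters followed by one
    or more newlines is treated as a wrapped line; its newlines become a space.
    Pass 2: any remaining run of two or more newlines becomes one line break.
    """
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    text = re.sub(r"([^\n]{%d,})\n+" % min_len, r"\1 ", text)
    return re.sub(r"\n{2,}", "\n", text)


if __name__ == "__main__":
    sample = "\n\n".join([
        "In this case, there are more than",
        "thirty characters per line. And I",
        "will keep going to illustrate this",
        "example.",
        "This would be a new paragraph, and",
        "would be separated by another of",
        "the single returns.",
    ])
    print(join_wrapped_lines(sample))
```

Short lines such as "example." are left alone by the first pass, so their double return survives long enough to be collapsed into a paragraph break by the second.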
Here is a RegEx that might work for you:
(\n\n)(?<!\.(\n\n))
The substitution is just a plain space, you can try it out (and modify / tweak it) here: https://regex101.com/r/zG9GPw/4
This 'pattern' tells the RegEx engine to look for the newline character \n occurring twice, like this: \n\n (worth noting this comes from your question and might be different in your files, e.g. it could be \r\n). It assumes that a valid paragraph break will be preceded by a full stop: \..
In RegEx the full stop is a single-character wildcard, so it needs to be escaped with a backslash '\'. (n and r, by contrast, are normal characters; escaping them tells the RegEx engine they represent the newline and carriage-return characters.)
So... the expression looks for a pair of newline characters, but then uses a negative look-behind to exclude any match where the character before the pair is a full stop.
Anyway, it's all explained on the regex101 page linked above.
You could run this RegEx find and replace in Notepad++ (I'm not sure whether it supports RegEx out of the box or needs a plugin; either way it is easy). It lets you set a location, filters (to target specific file types) and other options (such as searching in sub-directories).
Other than that, as @MacroPod pointed out, you could also do this in MS Word, document by document, without any code :)
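The look-behind expression (\n\n)(?<!\.(\n\n)) can also be checked outside regex101. This Python sketch assumes \n line endings, and join_unless_after_full_stop is a hypothetical name; Python's re accepts this look-behind because it has a fixed width of three characters.

```python
import re

# A pair of newlines, unless the look-behind sees a full stop right before it.
PARAGRAPH_SAFE_JOIN = re.compile(r"(\n\n)(?<!\.(\n\n))")


def join_unless_after_full_stop(text: str) -> str:
    """Replace each double newline with a space, except after a full stop."""
    return PARAGRAPH_SAFE_JOIN.sub(" ", text)


if __name__ == "__main__":
    sample = "thirty characters per line. And I\n\nwill keep going\n\nexample.\n\nThis would be"
    print(join_unless_after_full_stop(sample))
```

The trade-off compared with the length-based approach is that it relies on punctuation: a paragraph that ends without a full stop gets merged, and a wrapped line that happens to end in one gets split.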

REGEXP_REPLACE explanation

Hi, may I know what the query below means?
REGEXP_REPLACE(number,'[^'' ''-/0-9:-@A-Z''[''-`a-z{-~]', 'xy') ext_number
part 1
In terms of explaining what the function call is doing:
It analyses the input string number with a regex (2nd argument) and replaces every part of the string that matches the regex with the replacement string 'xy' (3rd argument). As for the name after the closing parenthesis, ext_number, I am not sure, but it is most likely a column alias; check the documentation for your database's REGEXP_REPLACE function.
part 2
Sorry to be writing a question within an answer here, but I cannot respond in comments yet (not enough rep).
Does this regex work? Unless SQL uses a different syntax, this would appear to be a non-functional regex. There are some red flags, e.g.:
The entire regex is wrapped in square brackets, indicating a set of characters, but it seems to predominantly hold an expression
There is a range indicator between a single quote and a character (an invalid range: if a literal dash is required in the match, it should be escaped with a '\' (backslash))
One set of square brackets is never closed
After some minor tweaks, this regex is valid syntax:
^'' ''\-\/0-9:-@A-Z''[''-`a-z{-~]
but it does not match anything I can think of. It is important to know what string is being examined and what the context of the program is in order to identify what the regex might be attempting to do.
It seems like it is meant to replace all ASCII control characters in the column or variable number with xy.
[] encloses a class of characters; any character in that class matches. [^] negates that, so every character that is not in the class matches.
- is a range operator, e.g. a-z means all characters from a to z, like abc...xyz.
It seems like characters enclosed in ' should be escaped. (The second ' escapes the ' in the string itself.) At least that would make some sense. (But for none of the DBMSs I found that have a regexp_replace() function (Postgres, Oracle, DB2, MariaDB, MySQL) did I find anything in the docs indicating this escape mechanism. They all use \, but maybe I missed something? Unfortunately you didn't tag which DBMS you're actually using!)
Now if you take an ASCII table, you'll see that the ranges in the expression make up all printable characters (counting space as printable) in groups: space to /, 0 to 9, : to @, etc. Actually, it might have been shorter to express it as '' ''-~, i.e. space to ~.
Given the negation, none of these match. The characters left are NUL to US (0x00 to 0x1F) and DEL (0x7F). These match and are replaced by xy one by one.
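The claim that the expanded class equals "space to ~" can be checked mechanically. This Python sketch makes two assumptions: each '' '' pair is read as a quoted literal, as this answer suggests, and the ':-#' range is read as ':-@' (the '#' looks like a garbled '@', and ':' to '#' would be an invalid, reversed range). Python is used only because the DBMS is unknown, and mask_control_chars is a hypothetical name.

```python
import re

# Expanded class under the assumptions above:
# space-/, 0-9, :-@, A-Z, [-`, a-z, {-~  (negated)
EXPANDED = re.compile(r"[^ -/0-9:-@A-Z\[-`a-z{-~]")
# The shorter form suggested above: everything from space to ~ (negated).
SHORT = re.compile(r"[^ -~]")


def mask_control_chars(value: str) -> str:
    """Replace each ASCII control character (NUL to US, and DEL) with 'xy'."""
    return SHORT.sub("xy", value)


if __name__ == "__main__":
    same = all(
        bool(EXPANDED.match(chr(c))) == bool(SHORT.match(chr(c)))
        for c in range(128)
    )
    print("classes agree on ASCII:", same)
    print(mask_control_chars("a\tb\x7fc"))
```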

Objective C - RegEx - Invalid Range when trying to match spaces [duplicate]

How can I rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match a hyphen along with the existing characters?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
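The four cases above can be confirmed with a small Python sketch that tests each one-character class against a few candidates; Python's re follows the same hyphen rules here, and CASES and accepted are hypothetical names.

```python
import re

# Each single-character class, paired with the candidates it should accept.
CASES = {
    r"[-]": "-",
    r"[abc-]": "abc-",
    r"[-abc]": "abc-",
    r"[ab-d]": "abcd",  # only here does the hyphen form a range (b to d)
}


def accepted(pattern: str, candidates: str) -> str:
    """Return the candidate characters that the class matches, in order."""
    return "".join(ch for ch in candidates if re.fullmatch(pattern, ch))


if __name__ == "__main__":
    for pattern, expected in CASES.items():
        print(pattern, accepted(pattern, "abcd-"), "expected:", expected)
```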
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}\s!$*] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
[-a-z0-9]+, [a-z0-9-]+ and [a-z-0-9]+ are all the same: a hyphen between two ranges is treated as a literal character. Likewise, [a-z0-9-+()]+ allows a hyphen.
Use \p{Pd} to match any type of hyphen or dash. The '-' character is just one type of hyphen, and it also happens to be a special character in regex.
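Python's standard re module has no \p{...} syntax, so this sketch checks the same Unicode general category (Pd, dash punctuation) with unicodedata to show which characters \p{Pd} covers; is_dash is a hypothetical name. Note that the MINUS SIGN (U+2212) is a math symbol (Sm), not a dash, so \p{Pd} does not match it.

```python
import unicodedata


def is_dash(ch: str) -> bool:
    """True for characters in Unicode general category Pd (dash punctuation),
    the set that \\p{Pd} matches in engines that support it."""
    return unicodedata.category(ch) == "Pd"


if __name__ == "__main__":
    for ch in ["-", "\u2010", "\u2013", "\u2014", "\u2212", "a"]:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_dash(ch))
```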
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");

Regular expression to extract a number of steps

I have a localized string that looks something like this in English:
"
5 Mile(s)
5,252 Step(s)
"
My app is localized both in left-to-right and right-to-left languages so I don't want to make assumptions either about the ordering of the step(s) or about the formatting of the number (e.g. 5,252 can be 5.252 depending on user locale). So I need to account for possibilities that can include things like
Step(s) 5.252
as well as what's above.
A few other caveats:
All I know is that if the Step(s) line is in there, it will be on its own line (hence in my regex I require \n at each end of the string)
No guarantee that the Mile(s) information will be in the string at all, let alone whether it will be before or after Step(s)
Here's my attempt at pattern extraction:
NSString *patternString = [NSString stringWithFormat:@"\\n(([0-9,\\.]*)\s*%@|%@\s*([0-9,\\.]*))\\n",
NSLocalizedString(@"Step(s)",nil), NSLocalizedString(@"Step(s)",nil)];
There appear to be two problems with this:
Xcode reports Unknown escape sequence '\s' for the second \s in the pattern string above
No matches are being found even for strings like the following:
0.2 Mile(s)
1,482 Step(s)
Ideally I would extract the 1,482 out of this string in a way that is localization friendly. How should I modify my regex?
As far as the regex goes, perhaps this approach might work: it simply matches (with named groups) each pair of numbers in sequence, assuming the first is miles and the second is steps. Decimals in the . or , form are optional:
(?<miles>\d+(?:[.,]\d+)?).*?(?<steps>\d+(?:[.,]\d+)?)
(And I think it should be \\s.) I'm not an iOS guy, but if you can use a regex literal, it would be far more readable.
First I'd like to ask: why is Mile(s) mentioned in the question at all?
And now to my two bits - you could simply use a positive look-ahead:
^(?=.*Step\(s\))[^\d]*(\d+(?:[.,]\d+)?)
It makes sure the expected word is present on the line, and then captures the number on it, allowing for an optional localized decimal separator and decimals. This way it doesn't matter whether the number comes before or after the "word".
It doesn't take localization of the "word" into account, but that you seem to have handled by yourself ;)
See it here at regex101.
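The look-ahead pattern can be exercised in Python as a quick sketch. re.MULTILINE is assumed so that ^ matches at every line start, as it would need to for a multi-line string; steps_from is a hypothetical name, and the literal Step\(s\) would be replaced by the escaped localized label.

```python
import re

# The look-ahead pattern from the answer; MULTILINE lets ^ match per line.
STEPS_LINE = re.compile(r"^(?=.*Step\(s\))[^\d]*(\d+(?:[.,]\d+)?)", re.MULTILINE)


def steps_from(text: str):
    """Return the number on the Step(s) line as a string, or None."""
    m = STEPS_LINE.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(steps_from("0.2 Mile(s)\n1,482 Step(s)"))
    print(steps_from("Step(s) 5.252"))
```

Because "." does not cross newlines by default, the look-ahead only succeeds on the line that actually contains Step(s), so the Mile(s) line is skipped.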
Your regex is close, although in Obj-C you need to double-escape the \s and (s):
^(([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$
In your NSLocalizedString you likely also need to escape the parentheses enclosing (s):
NSString *patternString = [NSString stringWithFormat:@"^(([\\d,.]+)\\s%@|%@\\s([\\d,.]+))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
If you don't escape (s) then the regex engine is probably going to interpret it as a capture group.
Looking at NSLog you can see what the pattern actually reads like:
NSLog(@"patternString: %@", patternString);
Output:
patternString: ^(([\d,.]+)\sStep\(s\)|Step\(s\)\s([\d,.]+))$
Since you mentioned the Mile(s) part may not be in the string at all I'm assuming it isn't relevant to the regular expression. As I understand from the question, you just need to capture the number of steps and nothing else. On this basis, here's a modified version of your existing regex:
NSString *patternString =
[NSString stringWithFormat:@"^(?:([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
Demo:
https://www.regex101.com/r/Q6ff1b/1
This is based on the following tips/modifications:
Use the m flag (UREGEX_MULTILINE in ICU, NSRegularExpressionAnchorsMatchLines when creating an NSRegularExpression) so that ^ and $ match the start and end of each line. This is more robust than matching \n, because it also handles the start and end of the string, where there may be no newline.
Always use a double backslash (\\) for regex escaping - otherwise NSString will interpret the single backslash to be escaping the next character and convert it before it gets to the regex.
Literal parentheses need to be escaped - e.g. Step\\(s\\) instead of Step(s).
Most characters within a character class (i.e. anything within the [] square brackets) don't need to be escaped. A dot, for example, is already literal there, so . is enough and \\. is unnecessary.
If you are using (x|y|...) as a choice and don't need it to be a capturing group, use ?: after the first parenthesis to ensure it doesn't get captured - i.e. (?:x|y|...).
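Putting those tips together, here is a Python sketch of the modified pattern. Assumptions: label stands in for the NSLocalizedString value, and re.escape() does the escaping of the literal parentheses (Foundation's +[NSRegularExpression escapedPatternForString:] serves a similar purpose in Objective-C). extract_steps is a hypothetical name.

```python
import re
from typing import Optional


def extract_steps(text: str, label: str = "Step(s)") -> Optional[str]:
    """Return the number on the line holding `label`, whichever side it is on.

    re.escape() turns the literal parentheses in "Step(s)" into \\( and \\).
    """
    lab = re.escape(label)
    pattern = re.compile(
        # (?:...|...) is non-capturing; only the two number groups capture.
        rf"^(?:([0-9,.]*)\s*{lab}|{lab}\s*([0-9,.]*))$",
        re.MULTILINE,  # ^ and $ match at every line, like the m flag
    )
    m = pattern.search(text)
    if m is None:
        return None
    # Exactly one alternative matched, so one of the two groups is set.
    return m.group(1) if m.group(1) is not None else m.group(2)


if __name__ == "__main__":
    print(extract_steps("0.2 Mile(s)\n1,482 Step(s)"))
    print(extract_steps("Step(s) 5.252"))
```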

How to separate words characters and non word characters?

Unicode has categories of characters. Some are alphanumeric. Some are punctuation.
What if I want to know whether a character belongs to a keyword or not?
For example,
A, a, b, c tend to belong to words. So do Ƈ, Ǝ, ǟ, and so do all Chinese characters.
Sentences like
Hello World, I "like" (to) eat ƇƎǟ and 款开源 ©
Have keywords:
Hello
World
I
like
to
eat
ƇƎǟ
款
开
源
Here, the comma, the quotation marks, the parentheses and © are not word characters and should just be ignored.
© doesn't count as punctuation either: Char.IsPunctuation("©"c) returns False in VB.NET, but I want to get rid of it too.
Now I want to make a program that can split sentences into keywords. For that I need to know which characters are word characters and which are not.
Is there a VB.NET function for that?
Do it the other way round: use IsLetter for your test. Or better yet, use regular expressions to split your string by words:
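Imports System.Text.RegularExpressions

Dim str = "Hello World, I ""like"" (to) eat ƇƎǟ and 款开源 ©"
' \p{L}+ = a run of one or more Unicode letters
Dim wordPattern As New Regex("\p{L}+")
For Each match As Match In wordPattern.Matches(str)
    Console.WriteLine(match.Value)
Next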
Here, \p{L} matches any letter (Unicode general category L). However, the code above returns “款开源” as a single match rather than three separate ones, since there is no separator between the characters.
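If each Chinese character should come out as its own keyword, as in the question, one option is to emit CJK ideographs individually while grouping other letters into runs. This Python sketch relies on str.isalpha(), which covers the same Unicode letter categories as \p{L} (Lu, Ll, Lt, Lm, Lo). Treating every character whose Unicode name starts with "CJK UNIFIED IDEOGRAPH" as a separate keyword is a heuristic assumption, and split_keywords is a hypothetical name.

```python
import unicodedata


def split_keywords(text: str) -> list:
    """Split text into runs of letters, emitting each CJK ideograph on its own.

    Letters are characters for which str.isalpha() is True (Unicode
    categories Lu, Ll, Lt, Lm, Lo). Splitting on "CJK UNIFIED IDEOGRAPH"
    names is a heuristic, not a full word-segmentation algorithm.
    """
    keywords, current = [], []

    def flush():
        if current:
            keywords.append("".join(current))
            current.clear()

    for ch in text:
        if not ch.isalpha():
            flush()  # spaces, punctuation and symbols such as © end a word
        elif unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
            flush()
            keywords.append(ch)
        else:
            current.append(ch)
    flush()
    return keywords


if __name__ == "__main__":
    print(split_keywords('Hello World, I "like" (to) eat ƇƎǟ and 款开源 ©'))
```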
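You need to deal with character codes.
For example, if you only want the letters a-z,
then:
if (c >= 'a' && c <= 'z') {
    // c is a lowercase ASCII letter
}
or
if (c >= 97 && c <= 122) {
    // same test using the ASCII codes of 'a' and 'z'
}
Note that this only covers lowercase ASCII letters, so it will not accept characters such as Ƈ or Chinese characters.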