Match words but could possibly contain spaces within word - sql

Is there a way to match words in regexp (or SQL) with spaces so, for example,
This would match to
T h i s
T hi s
Th is

You can use \s* after each letter, meaning zero or more whitespace characters. But a simpler solution is to strip the spaces with replace() and compare, ignoring case:
WordThis.replace(" ", "").equalsIgnoreCase("this")

You can try an expression like this (t\s*h\s*i\s*s\s*).
You'll need to ensure your settings are case insensitive.

I suppose the only way is to add \s* manually between every pair of characters in the word.
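The \s* pattern does not have to be typed by hand; it can be generated from the word. The following is a minimal Python sketch of both suggestions above (the generated regex and the strip-and-compare approach). The function names are illustrative and not from the original answers, and it assumes case-insensitive matching is wanted.

```python
import re

def spaced_pattern(word):
    # Put \s* between every pair of characters; re.escape guards
    # against letters that are regex metacharacters.
    return r"\s*".join(re.escape(ch) for ch in word)

def matches_with_spaces(word, candidate):
    # Regex approach: the whole candidate must be the word with
    # optional whitespace between its letters.
    return re.fullmatch(spaced_pattern(word), candidate, re.IGNORECASE) is not None

def matches_after_stripping(word, candidate):
    # Simpler approach: drop all whitespace, then compare case-insensitively.
    return "".join(candidate.split()).lower() == word.lower()

for text in ["T h i s", "T hi s", "Th is", "That"]:
    print(text, matches_with_spaces("this", text), matches_after_stripping("this", text))
```

The same generated pattern could be passed to a database regex function, provided that dialect supports \s.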

Related

sql looking for pattern

I have a string similar to this 'MSH|^~\&|STF_ALL_LAB_IN_C...
I'm trying to find some sql that will bring back all messages that contain
MSH|^~\&|(any 3 characters)_(anything after the underscore).
Tried something like this
WHERE TransText LIKE 'MSH|^~\&|%_%_%_'
But that doesn't seem to require the underscore.
Any suggestions?
MSH|^~&|(any 3 characters)_(anything after the underscore).
The pattern would be:
where TransText like 'MSH|^~\&|___\_%'
In some databases, the backslash would need to be escaped, so that would be:
where TransText like 'MSH|^~\\&|___\_%'
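_ is a special character in a LIKE clause. It matches any one character, whereas % matches any series of 0 or more characters.
You need to escape it, using \_. In databases that have no default escape character, declare one by adding an ESCAPE clause after the pattern.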
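To see the difference between a bare _ and an escaped one, here is a small sketch using Python's built-in sqlite3 module. It is an illustration under assumptions: the table name and the sample rows are made up, and '!' is chosen as the escape character (via an ESCAPE clause) so that the backslash in the data stays an ordinary character.

```python
import sqlite3

rows = [
    r"MSH|^~\&|STF_ALL_LAB_IN_C",  # three characters, then an underscore
    r"MSH|^~\&|STFX_ALL",          # fourth character is not an underscore
    r"MSH|^~\&|AB_CD",             # underscore arrives one position too early
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (TransText TEXT)")
conn.executemany("INSERT INTO messages VALUES (?)", [(r,) for r in rows])

# '!' is declared as the escape character, so '!_' is a literal underscore
# while each bare '_' still matches any single character.
pattern = r"MSH|^~\&|___!_%"
matches = [
    text
    for (text,) in conn.execute(
        "SELECT TransText FROM messages WHERE TransText LIKE ? ESCAPE '!'",
        (pattern,),
    )
]
print(matches)
```

Only the first row is returned; the other two fail because the character in the fourth position is not a literal underscore.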

SQL Special Characters on Tables

We are using the following SQL to identify rows that contain only special characters, such as 'F#C(!!'. But it does not seem to capture some special-character values like 'F#C(!!'. The script should capture only special characters. Could you please share an optimized script for this scenario?
case
when regexp_like(column_name, '^[^a-zA-Z]*$') then 'number'
when regexp_like(column_name, '^[^g-zG-Z]*$') then 'hex'
else 'string'
end
Try removing the ^ and $ anchors from the regex pattern you are using with regexp_like:
select *
from your_table
where regexp_like(column_name, '[^a-zA-Z]');
The above logic would check for the presence of one or more characters which are not letters. If you instead want to check for non alphanumeric, then use [^A-Za-z0-9].
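The difference between the anchored and unanchored patterns can be checked outside the database. This is a rough Python sketch using the re module, not Oracle's regexp_like; the function names are hypothetical.

```python
import re

def only_non_letters(value):
    # Equivalent of '^[^a-zA-Z]*$': every character must be a non-letter.
    # Note that the empty string also passes.
    return re.fullmatch(r"[^a-zA-Z]*", value) is not None

def has_non_letter(value):
    # Equivalent of '[^a-zA-Z]' without anchors: one non-letter anywhere is enough.
    return re.search(r"[^a-zA-Z]", value) is not None

for value in ["#(!!", "F#C(!!", "abc"]:
    print(repr(value), only_non_letters(value), has_non_letter(value))
```

A value such as 'F#C(!!' fails the anchored test because it contains letters, but passes the unanchored one.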

REGEXP_REPLACE explanation

Hi, may I know what the query below means?
REGEXP_REPLACE(number,'[^'' ''-/0-9:-#A-Z''[''-`a-z{-~]', 'xy') ext_number
part 1
In terms of explaining what the function call is doing:
It is a function call that analyses an input string 'number' with a regex (2nd argument) and replaces any parts of the string that match with a specific string (3rd argument). As for the name after the parenthesis I am not sure, but the documentation for the function is here
part 2
Sorry to be writing a question within an answer here but I cannot respond in comments yet (not enough rep)
Does this regex work? Unless SQL uses different syntax, this would appear to be a non-functional regex. There are some red flags, e.g.:
The entire regex is wrapped in square brackets, indicating a set of characters, but it seems to predominantly hold an expression
There is a range indicator between a single quote and a character (invalid range: if a dash was required in the match it should be escaped with a '\' (backslash))
One set of square brackets is never closed
After some minor tweaks this regex is valid syntax:
^'' ''\-\/0-9:-#A-Z''[''-a-z{-~]`, but it does not match anything I can think of. It is important to know what string is being examined and what the context of the program is in order to identify what the regex might be attempting to do.
It seems like it is meant to replace all ASCII control characters in the column or variable number with xy.
[] encloses a class of characters. Any character in that class matches. [^] negates that, so every character that is not in the class matches.
- is a range operator, e.g. a-z means all characters from a to z, like abc...xyz.
It seems like characters enclosed in ' should be escaped. (The second ' is to escape the ' in the string itself.) At least this would make some sense. But for none of the DBMS I found that have a regexp_replace() function (Postgres, Oracle, DB2, MariaDB, MySQL) did I find anything in the docs that would indicate this escape mechanism. They all use \, but maybe I missed something? Unfortunately you didn't tag which DBMS you're actually using!
Now if you take an ASCII table you'll see that the ranges in the expression make up all printable characters (counting space as printable) in groups from space to /, 0 to 9, : to #, etc. Actually it might have been shorter to express it as '' ''-~, space to ~.
Given the negation, none of these match. The ones left are from NUL to US and DEL. These match and get replaced by xy one by one.
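That reading can be tried out with a short Python sketch. It assumes ASCII input and uses the shorter class (space to ~) suggested above rather than the original pattern; the function name is illustrative only.

```python
import re

def replace_control_chars(value, replacement="xy"):
    # [^ -~] matches anything outside the printable ASCII range
    # (space through tilde). For ASCII input that means NUL..US and DEL.
    return re.sub(r"[^ -~]", replacement, value)

print(replace_control_chars("ab\tc\x7f"))
```

Each control character is replaced individually, so a tab and a DEL in the same string each become their own xy. For input beyond ASCII, the same class would also match every non-ASCII character.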

Collect a word between two spaces in objective c

I'm trying to implement stuff similar to spell check, but I need to get the word that is limited by a space. EX: "HI HOW R U", I need to collect HI, HOW and so on as they type. i.e. After user hits HI and space I need to collect HI and do a spell check.
Check the documentation for NSString here. You want the message componentsSeparatedByString:.
I don't know objective-C, but I'm fairly sure it'll have a Regexp library - although it'd be straightforward to code it without one.
Regexp: \b([^\s])*\b
\b = word boundary (the zero-width position between a word character and a non-word character such as whitespace, comma, dot, exclamation mark, etc.)
\s = whitespace character
[...] = character set
[^...] = negated character set (any character(s) EXCEPT ...)
() = grouping construct
* = zero or more times
So the suggested expression would start matching at any word boundary, then match every subsequent character that is not a whitespace character, then match a word boundary.
Your stated case is so simple you may just want to look for spaces (one char at a time) and get the substring, but RegExp is very widely used across a range of languages and platforms, and so it's fairly easy to find an expression when you need to - and one often does for common stuff like checking if zip codes, phone numbers, email addresses and so on are syntactically correct. So it's worth learning in any case. :)
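As a language-neutral illustration of the "collect the word when the user hits space" idea, here is a minimal sketch in Python rather than Objective-C. The function name and the notion of a text buffer are assumptions for the example, and it uses \S+ (a run of non-whitespace) instead of the \b-based expression above.

```python
import re

def completed_word(buffer):
    # Return the word the user has just finished typing, i.e. the run of
    # non-whitespace characters immediately before a trailing whitespace
    # character. Returns None while a word is still in progress.
    match = re.search(r"(\S+)\s$", buffer)
    return match.group(1) if match else None

for typed in ["HI", "HI ", "HI HOW", "HI HOW "]:
    print(repr(typed), completed_word(typed))
```

The returned word is what would be handed to the spell checker each time a space is typed.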

How to remove strings contained in a list in VB.NET?

How can I find words like and, or, to, a, no, with, for etc. in a sentence using VB.NET and remove them. Also where can I find all words list like above.
Note that unless you use Regex word boundaries you risk falling afoul of the Scunthorpe (Sfannythorpe) problem.
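using System.Text.RegularExpressions;
string pattern = @"\band\b"; // verbatim string, so the backslashes reach the regex engine
Regex re = new Regex(pattern);
string input = "a band loves and its fans";
string output = re.Replace(input, ""); // "a band loves  its fans" (the spaces around the removed word remain)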
Notice the 'and' in 'band' is untouched.
You can indeed replace your list of words using the .Replace function (as colithium described) ...
myString.Replace("and", "")
Edit:
... but indeed, a nicer way is to use Regular Expressions (as edg suggested) to avoid replacing parts of words.
As your question suggests that you would like to clean up a sentence to keep meaningful words, you have to do more than just remove two- and three-letter words.
What you need is a list of stop-words:
http://en.wikipedia.org/wiki/Stop_word
A comma-separated list of stop-words for the English language can be found here:
http://www.textfixer.com/resources/common-english-words.txt
The easiest way is:
myString.Replace("and", "")
You'd loop over your word list and have a statement like the above. Google for a list of common English words?
List of English 2 Letter Words
List of English 3 Letter Words
You can match the words and remove them using regular expressions.
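Putting the suggestions together (a stop-word list plus word boundaries), a minimal sketch could look like the following. It is written in Python rather than VB.NET, the function name is hypothetical, and the short word list is just the examples from the question, not a complete stop-word list.

```python
import re

def remove_stop_words(sentence, stop_words):
    # Build one alternation such as \b(?:and|or|to)\b so that only whole
    # words are removed ("and" inside "band" is left alone).
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in stop_words) + r")\b"
    stripped = re.sub(pattern, "", sentence, flags=re.IGNORECASE)
    # Collapse the extra spaces left behind by the removed words.
    return " ".join(stripped.split())

stop_words = ["and", "or", "to", "a", "no", "with", "for"]
print(remove_stop_words("a band loves and its fans", stop_words))
```

A plain string Replace in a loop would turn "band" into "b", which is the Scunthorpe problem mentioned above; the \b anchors are what prevent it.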