SQL (MySQL): Match first letter of any word in a string? - sql

(Note: This is for MySQL's SQL, not SQL Server.)
I have a database column with values like "abc def GHI JKL". I want to write a WHERE clause that includes a case-insensitive test for any word that begins with a specific letter. For example, that example would test true for the letters a,c,g,j because there's a 'word' beginning with each of those letters. The application is for a search that offers to find records that have only words beginning with the specified letter. Also note that there is not a fulltext index for this table.

You can use a LIKE operation. If your words are space-separated, append a space to the start of the string to give the first word a chance to match:
SELECT
StringCol
FROM
MyTable
WHERE
' ' + StringCol LIKE '% ' + MyLetterParam + '%'
Where MyLetterParam could be something like this:
'[acgj]'
To look for more than a space as a word separator, you can expand that technique. The following would treat TAB, CR, LF, space and NBSP as word separators.
WHERE
' ' + StringCol LIKE '%['+' '+CHAR(9)+CHAR(10)+CHAR(13)+CHAR(160)+'][acgj]%'
This approach has the nice touch of being standard SQL. It would work unchanged across the major SQL dialects.

Using REGEXP opearator:
SELECT * FROM `articles` WHERE `body` REGEXP '[[:<:]][acgj]'
It returns records where column body contains words starting with a,c,g or i (case insensitive)
Be aware though: this is not a very good idea if you expect any heavy load (not using index - it scans every row!)

Check the Pattern Matching and Regular Expressions sections of the MySQL Reference Manual.

Related

How to search using similar characters in random positions in Microsoft SQL?

I'm looking for a query that would allow the user to use a variation of characters while searching for a result. The character positions are completely random. We use special characters È,Š,Ć,Č,Ž and Đ so all of the variations have to match, because most of users do not know how to spell correctly.
Example:
MISIC
MISIĆ
MISIČ
MIŠIC
MIŠIĆ
MIŠIČ
You can search it by using COLLATE
SELECT *
FROM TableNAme
WHERE
columnName COLLATE Like '%MISIC%' COLLATE Latin1_general_CI_AI
latin1 makes the server treat strings using charset latin 1,
basically ascii.
CI specifies case-insensitive, so "ABC" equals to "abc".
AI specifies accent-insensitive,so 'ü' equals to 'u'.
for more information collation go through the
Collete
refereance : #JINO SHAJI
as per #Adephx comment this is working as expected with few modification
SELECT * FROM [TABLE] WHERE [COLUMN] LIKE '%NAME%' COLLATE Latin1_general_CI_AI
Applying COLLATION is a great practice, especially if we want to get rid of all Accent-marks, however, if we need more granular control over individual accent-characters (È,Š,Ć,Č,Ž), we can do something like below to selectively compare individual accent-characters.
Most DBMSs provide string-comparison functionality based on how the words sound (pronounced). SQL Server provides two built-in functions for this: SOUNDEX() and DIFFERENCE(). In this scenario we can do this:
IF (DIFFERENCE('MISIC', 'MISIĆ')>=4)
AND (DIFFERENCE('MISIC', 'MISIČ')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIC')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIĆ')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIČ')>=4)
PRINT 'Same word'
ELSE
PRINT 'Different word.'
Actually, in many languages 'Š' sounds quite different than 'S', therefore SQL Server considers them as less-compatible, but here is a workaround to impose equivalence:
WITH words AS (SELECT value FROM STRING_SPLIT(N'MISIĆ,MISIČ,MIŠIC,MIŠIĆ,MIŠIČ', ','))
SELECT
value,
CASE WHEN (DIFFERENCE('MISIC', replace(value,'Š','S'))>=4)
THEN 'Same word'
ELSE 'Not same'
END AS 'Comparison'
FROM words
Output:
value comparison
----- ----------
MISIĆ Same word
MISIČ Same word
MIŠIC Same word
MIŠIĆ Same word
MIŠIČ Same word
Above example will work in "Microsoft SQL Server 2016" or above, note that the STRING_SPLIT() function is only used to iterate over the array of words/strings, this function is not available in SQL Server 2014 or below.
Hope this helps.

Regex not matching correct string

I am busy building a lookup table for specific names of merchants. I tried to make use of the following regex but it's returning less results than the standard "like" function in Netezza SQL. Please refer to below:
SQL Like function: where trim(upper(a.MRCH_NME)) like '%CNA %' -- returns 4622 matches
Regex function in Netezza SQL: where array_combine(regexp_extract_all(trim(upper(a.MRCH_NME)),'.*CNA\s','i'),'|') = 'CNA' -- returns 2226 matches
I looked at the two result sets and found that strings such as the following aren't matched:
!C CNA INT ARR
*CNA PLATZ 0400
015764 CNA CRAD
C#CNA PARK 0
I made use of the following regex expression: /.*CNA\s'/
Any idea why the above strings aren't being returned as matches?
Thank you.
You probably should be using regexp_like:
SELECT *
FROM yourTable
WHERE REGEXP_LIKE(MRCH_NME, 'CNA[ ]', 'i');
This would be logically identical to the following query using LIKE:
SELECT *
FROM yourTable
WHERE MRCH_NME LIKE '%CNA ';
It seems to me the problem is more with your code rather than the regex. Look: like '%CNA %' returns all entries that contain a CNA substring followed with a literal space anywhere inside the entry. The '.*CNA\s' regex matches any 0+ chars other than newline followed with CNA and **any whitespace char*.
Acc. to this reference, \s matches "a white space character. White space is defined as [\t\n\f\r\p{Z}].
Thus, you should in fact just use
WHERE REGEXP_LIKE(MRCH_NME, 'CNA ', 'i')
or, better with a word boundary check:
WHERE REGEXP_LIKE(MRCH_NME, '\bCNA\b', 'i')
where \b marks a transition from a word to non-word and non-word to word character, thus ensuring a whole word search and justifying the regex usage.
If you do not need to match the merchant name as a whole word, use the regular LIKE with '%CNA %', it should be more efficient.

Sqlite - How to remove substring in a field using wildcards

I would like to remove all occurrences of a certain string pattern across all fields in a table.
For example, find all occurrences of the pattern "<XY>*<Xy>" where * represents any possible configuration of characters. I want to remove just those substrings and leave the remainder of the string intact.
This is an example of what I would like to use as my SQL command, but of course this doesn't work:
UPDATE Table SET Field = replace(Field, '<XY>*<Xy>', '');
What is the solution?
Here is an option which attempts to splice around the <XY>...</XY> tags:
UPDATE yourTable
SET Field = SUBSTR(Field, 1, INSTR(Field, '<XY>') - 1) ||
SUBSTR(Field, INSTR(Field, '</XY>') + 5)
WHERE Field LIKE '%<XY>%</XY>%'
It updates fields containing this pattern with the concatenation of everything coming before the first <XY> and everything coming after the second </XY>.
Note that I used <XY>...</XY> rather than what you had originally, because INSTR() is not case sensitive, and both tags would appear as being the same thing.
Demo
The demo is for MySQL but the sytnax is almost identical to SQLite.

Search for “whole word match” with SQL Server LIKE pattern

Does anyone have a LIKE pattern that matches whole words only?
It needs to account for spaces, punctuation, and start/end of string as word boundaries.
I am not using SQL Full Text Search as that is not available. I don't think it would be necessary for a simple keyword search when LIKE should be able to do the trick. However if anyone has tested performance of Full Text Search against LIKE patterns, I would be interested to hear.
Edit:
I got it to this stage, but it does not match start/end of string as a word boundary.
where DealTitle like '%[^a-zA-Z]pit[^a-zA-Z]%'
I want this to match "pit" but not "spit" in a sentence or as a single word.
E.g. DealTitle might contain "a pit of despair" or "pit your wits" or "a pit" or "a pit." or "pit!" or just "pit".
Full text indexes is the answer.
The poor cousin alternative is
'.' + column + '.' LIKE '%[^a-z]pit[^a-z]%'
FYI unless you are using _CS collation, there is no need for a-zA-Z
you can just use below condition for whitespace delimiters:
(' '+YOUR_FIELD_NAME+' ') like '% doc %'
it works faster and better than other solutions. so in your case it works fine with "a pit of despair" or "pit your wits" or "a pit" or "a pit." or just "pit", but not works for "pit!".
I think the recommended patterns exclude words with do not have any character at the beginning or at the end. I would use the following additional criteria.
where DealTitle like '%[^a-z]pit[^a-z]%' OR
DealTitle like 'pit[^a-z]%' OR
DealTitle like '%[^a-z]pit'
I hope it helps you guys!
Surround your string with spaces and create a test column like this:
SELECT t.DealTitle
FROM yourtable t
CROSS APPLY (SELECT testDeal = ' ' + ISNULL(t.DealTitle,'') + ' ') fx1
WHERE fx1.testDeal LIKE '%[^a-z]pit[^a-z]%'
If you can use regexp operator in your SQL query..
For finding any combination of spaces, punctuation and start/end of string as word boundaries:
where DealTitle regexp '(^|[[:punct:]]|[[:space:]])pit([[:space:]]|[[:punct:]]|$)'
Another simple alternative:
WHERE DealTitle like '%[^a-z]pit[^a-z]%' OR
DealTitle like '[^a-z]pit[^a-z]%' OR
DealTitle like '%[^a-z]pit[^a-z]'
This is a good topic and I want to complement this to someone how needs to find some word in some string passing this as element of a query.
SELECT
ST.WORD, ND.TEXT_STRING
FROM
[ST_TABLE] ST
LEFT JOIN
[ND_TABLE] ND ON ND.TEXT_STRING LIKE '%[^a-z]' + ST.WORD + '[^a-z]%'
WHERE
ST.WORD = 'STACK_OVERFLOW' -- OPTIONAL
With this you can list all the incidences of the ST.WORD in the ND.TEXT_STRING and you can use the WHERE clausule to filter this using some word.
You could search for the entire string in SQL:
select * from YourTable where col1 like '%TheWord%'
Then you could filter the returned rows client site, adding the extra condition that it must be a whole word. For example, if it matches the regex:
\bTheWord\b
Another option is to use a CLR function, available in SQL Server 2005 and higher. That would allow you to search for the regex server-side. This MSDN artcile has the details of how to set up a dbo.RegexMatch function.
Try using charindex to find the match:
Select *
from table
where charindex( 'Whole word to be searched', columnname) > 0

MySQL match existence of a term amongst comma seperated values using REGEXP

I have a MySQL select statement dilemma that has thrown me off course a little. In a certain column I have a category term. This term could look something like 'category' or it could contain multiple CSV's like this: 'category, bank special'.
I need to retrieve all rows containing the term 'bank special' in the comma separated value. Currently I am using the following SQL statement:
SELECT * FROM tblnecklaces WHERE nsubcat REGEXP 'bank special';
Now this works OK, but if I had the category as follows: 'diamond special' for example then the row is still retrieved because the term 'special' in my column seems to be matching up to the term 'bank special' in my REGEXP statement.
How would I go abut checking for the existence of the whole phrase 'bank special' only and not partially matching the words?
Many thanks for your time and help
The simplest solution is to use the LIKE clause (% is wildcard):
SELECT * FROM tblnecklaces WHERE nsubcat LIKE '%bank special%';
Note that LIKE is also a lot faster than REGEXP.
You can prefix the column with comma's, and compare it to the bank special:
SELECT *
FROM tblnecklaces
WHERE ',' + replace(nsubcat,', ','') + ',' LIKE ',bank special,'
I put in a replace to remove optional space after a comma, because your example has one.
Not tested but this RegExp should work (LIKE works too but will match items that starts or ends with the same phrase):
SELECT * FROM tblnecklaces WHERE nsubcat REGEXP '(^|, )bank special($|,)';