How to search using similar characters in random positions in Microsoft SQL? - sql

I'm looking for a query that lets the user search with any variation of certain characters. The character positions are completely random. We use the special characters È, Š, Ć, Č, Ž and Đ, so all of the variations have to match, because most users do not know the correct spelling.
Example:
MISIC
MISIĆ
MISIČ
MIŠIC
MIŠIĆ
MIŠIČ

You can do this with COLLATE:
SELECT *
FROM TableName
WHERE
columnName LIKE '%MISIC%' COLLATE Latin1_General_CI_AI
Latin1_General selects the sorting and comparison rules for Latin-script languages (and code page 1252 for non-Unicode data); it is not limited to ASCII.
CI specifies case-insensitive, so "ABC" equals "abc".
AI specifies accent-insensitive, so 'ü' equals 'u'.
For more information, see the SQL Server documentation on COLLATE.
Reference: #JINO SHAJI
As per #Adephx's comment, this works as expected with a few modifications:
SELECT * FROM [TABLE] WHERE [COLUMN] LIKE '%NAME%' COLLATE Latin1_general_CI_AI
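To make concrete what a case-insensitive, accent-insensitive comparison does, here is a minimal Python sketch of the same folding idea. It is not how SQL Server implements collations. The fold and matches helpers are hypothetical names, and the explicit Đ mapping is an assumption added for illustration (Đ has no Unicode decomposition, so it has to be mapped by hand here).

```python
import unicodedata

# Assumption: characters without a Unicode decomposition are mapped explicitly.
EXTRA = str.maketrans({"Đ": "D", "đ": "d"})


def fold(text: str) -> str:
    """Strip accent marks and case so that variants compare equal."""
    decomposed = unicodedata.normalize("NFD", text.translate(EXTRA))
    # Drop the combining marks that NFD separated from their base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(needle: str, haystack: str) -> bool:
    """Rough equivalent of: haystack LIKE '%needle%' under a CI_AI collation."""
    return fold(needle) in fold(haystack)


variants = ["MISIC", "MISIĆ", "MISIČ", "MIŠIC", "MIŠIĆ", "MIŠIČ"]
all_match = all(matches("misic", v) for v in variants)
```

Every spelling in the question folds to the same string, so a search for any one of them finds all of them.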

Applying a collation is good practice, especially if we want to ignore all accent marks. However, if we need more granular control over individual accented characters (È, Š, Ć, Č, Ž), we can compare them selectively, as follows.
Most DBMSs provide string comparison based on how words sound when pronounced. SQL Server provides two built-in functions for this: SOUNDEX() and DIFFERENCE(). DIFFERENCE() returns an integer from 0 to 4, where 4 means the SOUNDEX codes of the two strings are most similar. In this scenario we can do this:
IF (DIFFERENCE('MISIC', 'MISIĆ')>=4)
AND (DIFFERENCE('MISIC', 'MISIČ')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIC')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIĆ')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIČ')>=4)
PRINT 'Same word'
ELSE
PRINT 'Different word.'
Actually, in many languages 'Š' sounds quite different from 'S', so SQL Server treats them as less compatible, but here is a workaround to impose equivalence:
WITH words AS (SELECT value FROM STRING_SPLIT(N'MISIĆ,MISIČ,MIŠIC,MIŠIĆ,MIŠIČ', ','))
SELECT
value,
CASE WHEN (DIFFERENCE('MISIC', replace(value,'Š','S'))>=4)
THEN 'Same word'
ELSE 'Not same'
END AS 'Comparison'
FROM words
Output:
value  Comparison
-----  ----------
MISIĆ  Same word
MISIČ  Same word
MIŠIC  Same word
MIŠIĆ  Same word
MIŠIČ  Same word
The example above works in SQL Server 2016 or later. STRING_SPLIT() is used only to iterate over the list of words; it is not available in SQL Server 2014 or earlier, and it also requires database compatibility level 130 or higher.
Hope this helps.
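The "granular control" idea, where only chosen characters are treated as equivalent, can be sketched in Python as follows. This is an illustration under assumptions: the EQUIVALENT table and same_word helper are hypothetical, and the comparison is exact rather than phonetic, so it does not reproduce DIFFERENCE().

```python
# Hypothetical equivalence table: only the characters listed here are folded.
EQUIVALENT = str.maketrans({"Š": "S", "š": "s"})


def same_word(a: str, b: str) -> bool:
    """Compare two words, treating only the mapped characters as equal."""
    left = a.translate(EQUIVALENT).casefold()
    right = b.translate(EQUIVALENT).casefold()
    return left == right


s_is_folded = same_word("MIŠIC", "MISIC")   # Š is in the table
c_is_folded = same_word("MISIĆ", "MISIC")   # Ć is not in the table
```

Adding or removing entries in the table decides, character by character, which spellings count as the same word.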

Related

How to pull a string of numbers out of a table that are placed randomly

I'm attempting to isolate eight digits from a cell that contains other numbers as well as text and no rhyme or reason to where it is placed. An example return would look something like this:
will deliver 11/07 in USA at 12:30 with conf# 12345678
I need the conf# only, but it could be at the end, beginning or middle of the string, and I don't know how to isolate it. I'm working in DB2, so I can't use functions such as PATINDEX or CHARINDEX. What are my other options for pulling out only "12345678" regardless of where it is located?
While DB2 doesn't have PATINDEX or CHARINDEX, it does have LOCATE.
If your DB2 version supports pureXML, you can use the regular expression support in XQuery, something like:
select xmlcast(
xmlquery(
' if (fn:matches( $YOURCOLUMN, "(^|.*[^\d])(\d{8})([^\d].*$|$)")) then fn:replace( $YOURCOLUMN,"(^|.*[^\d])(\d{8})([^\d].*$|$)","$2") else "" '
)
as varchar(20)
)
from YOURTABLE
This assumes that the 8-digit sequence appears only once in the column. You may need to tweak the regex to handle some edge cases.
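For testing the pattern outside the database, the same idea (exactly eight digits, with no digit directly before or after) can be sketched in Python. The extract_conf name is hypothetical, and lookarounds are used here instead of the capture groups in the XQuery version.

```python
import re

# Exactly eight digits, not preceded or followed by another digit.
CONF_RE = re.compile(r"(?<!\d)\d{8}(?!\d)", re.ASCII)


def extract_conf(text: str) -> str:
    """Return the first standalone 8-digit run, or "" if there is none."""
    match = CONF_RE.search(text)
    return match.group(0) if match else ""


sample = "will deliver 11/07 in USA at 12:30 with conf# 12345678"
conf_number = extract_conf(sample)
```

Like the XQuery version, this returns an empty string when no standalone 8-digit run exists, and a longer digit run such as a 9-digit number is not matched.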

Is it possible to use the 'WHERE' clause in SQL to only show a field containing only letters & numbers?

I want to select only the rows where a certain field contains both letters and numbers. For example:
Select [field1], [field2]
from [db1].[table1]
where [field2] = *LETTERS AND NUMBERS*
I'm using SQL Server 2005. Also, I'm sorry, but I'm not a hundred percent sure about the data type of the field, because it is on a linked server and inaccessible at the minute. Hope you can help
:)
LIKE will do it. This is a double negative:
where [field2] NOT LIKE '%[^0-9a-z]%'
It says:
%[^0-9a-z]% matches any value containing at least one non-alphanumeric character
NOT LIKE '%[^0-9a-z]%' means not(contains a non-alphanumeric character) -> alphanumeric only
Edit:
All numbers (returns "it works"):
SELECT 'it works' WHERE '1234567' NOT LIKE '%[^0-9a-z]%'
All letters (returns "it works"):
SELECT 'it works' WHERE 'abcdefg' NOT LIKE '%[^0-9a-z]%'
Contains a non-alphanumeric character (returns no row):
SELECT 'it works' WHERE 'abc_123' NOT LIKE '%[^0-9a-z]%'
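The double negative can be checked outside the database with a small Python sketch. The is_alphanumeric name is hypothetical, and the IGNORECASE flag stands in for a case-insensitive collation, which is an assumption about the server setup.

```python
import re

# Matches any single character that is NOT a digit or an ASCII letter.
NON_ALNUM = re.compile(r"[^0-9a-z]", re.IGNORECASE | re.ASCII)


def is_alphanumeric(value: str) -> bool:
    """True when no non-alphanumeric character is found (the double negative)."""
    return NON_ALNUM.search(value) is None


all_digits = is_alphanumeric("1234567")
all_letters = is_alphanumeric("abcdefg")
has_underscore = is_alphanumeric("abc_123")
```

As in the SQL version, a value made only of digits or only of letters passes; the check only rejects values containing some other character.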
Edit 2:
This solution matches values that are purely alphanumeric: any mixture of letters and numbers.
Edit 3:
letters followed by numbers
where [field2] NOT LIKE '%[^0-9a-z]%' AND [field2] LIKE '[a-z]%[0-9]'
Edit:
Finally, 2 letters and up to 3 numbers:
where
[field2] LIKE '[a-z][a-z][0-9]'
OR
[field2] LIKE '[a-z][a-z][0-9][0-9]'
OR
[field2] LIKE '[a-z][a-z][0-9][0-9][0-9]'
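Because LIKE has no repetition counts, the three OR-ed patterns spell out each allowed length. For comparison, here is a hedged Python sketch of the same rule with a quantifier; the matches_code name is hypothetical, and case-insensitive matching is assumed.

```python
import re

# Two letters followed by one to three digits, and nothing else.
CODE_RE = re.compile(r"[a-z]{2}[0-9]{1,3}", re.IGNORECASE | re.ASCII)


def matches_code(value: str) -> bool:
    """Same rule as the three OR-ed LIKE patterns, using a quantifier."""
    return CODE_RE.fullmatch(value) is not None


accepted = [v for v in ["ab1", "ab12", "ab123", "ab1234", "a12", "ab"] if matches_code(v)]
```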
If you need it to contain both numerics and letters, and no other characters, I think you have to use three LIKE clauses: one NOT LIKE, as #gbn said, then two LIKEs to ensure both character classes are represented:
select * from (select '123' union all select 'abc' union all select 'a2') t(Field)
where Field LIKE '%[0-9]%' and Field like '%[a-z]%'
AND Field NOT LIKE '%[^0-9a-z]%'
returns one row, with 'a2'.
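The three conditions (at least one digit, at least one letter, nothing else) can be mirrored in Python to test sample values. This is only a sketch; the helper name is hypothetical, and ASCII letters with case-insensitive matching are assumed.

```python
import string

DIGITS = set(string.digits)
LETTERS = set(string.ascii_lowercase)


def has_letters_and_digits_only(value: str) -> bool:
    """True when value has a digit, a letter, and no other kind of character."""
    chars = value.lower()
    has_digit = any(c in DIGITS for c in chars)        # LIKE '%[0-9]%'
    has_letter = any(c in LETTERS for c in chars)      # LIKE '%[a-z]%'
    only_alnum = all(c in DIGITS or c in LETTERS for c in chars)  # NOT LIKE '%[^0-9a-z]%'
    return has_digit and has_letter and only_alnum


kept = [v for v in ["123", "abc", "a2"] if has_letters_and_digits_only(v)]
```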
If it should only be letters followed by numbers, I'm thinking you might be able to achieve this with a further not like, again inspired by #gbn:
NOT LIKE '%[0-9]%[a-z]%'
But it is starting to look like a regex in CLR might be the preferred route.
What you would want to do is SQL-based regexp matching. Check this out: http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Quotes:
"Although T-SQL is extremely powerful for most data processing, it provides little support for text analysis or manipulation. Attempting to perform any sophisticated text analysis using the built-in string functions results in massively large functions and stored procedures that are difficult to debug and maintain."
And:
"However there's SQLCLR, a CLR user-defined function (UDF) that lets you create an efficient and less error-prone set of functions using the Microsoft® .NET Framework."
Then you get code examples. Isn't Microsoft great? :D
I believe PATINDEX will do the trick for you. The query below looks for any character outside 0-9 and a-z, and PATINDEX returns 0 if it doesn't find one (i.e., the value contains only numbers and letters):
Select [field1], [field2]
from [db1].[table1]
where patindex('%[^0-9a-z]%', [field2]) = 0
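In a DBMS that has a REGEXP operator (MySQL, for example; SQL Server 2005 has none), you can use a regular expression:
Select [field1], [field2]
from [db1].[table1]
where [field2] REGEXP '^[0-9a-zA-Z]*$'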

Search for “whole word match” with SQL Server LIKE pattern

Does anyone have a LIKE pattern that matches whole words only?
It needs to account for spaces, punctuation, and start/end of string as word boundaries.
I am not using SQL Full Text Search as that is not available. I don't think it would be necessary for a simple keyword search when LIKE should be able to do the trick. However if anyone has tested performance of Full Text Search against LIKE patterns, I would be interested to hear.
Edit:
I got it to this stage, but it does not match start/end of string as a word boundary.
where DealTitle like '%[^a-zA-Z]pit[^a-zA-Z]%'
I want this to match "pit" but not "spit" in a sentence or as a single word.
E.g. DealTitle might contain "a pit of despair" or "pit your wits" or "a pit" or "a pit." or "pit!" or just "pit".
Full-text indexes are the answer.
The poor cousin alternative is
'.' + column + '.' LIKE '%[^a-z]pit[^a-z]%'
FYI unless you are using _CS collation, there is no need for a-zA-Z
You can just use the condition below for whitespace delimiters:
(' '+YOUR_FIELD_NAME+' ') like '% doc %'
It is simple and fast. In your case (with 'pit' in place of 'doc') it matches "a pit of despair", "pit your wits", "a pit" and just "pit", but it does not match "a pit." or "pit!", because punctuation is not treated as a delimiter.
I think the recommended patterns miss words that have no character before them (start of string) or after them (end of string). I would use the following additional criteria.
where DealTitle like '%[^a-z]pit[^a-z]%' OR
DealTitle like 'pit[^a-z]%' OR
DealTitle like '%[^a-z]pit' OR
DealTitle = 'pit'
I hope it helps you guys!
Surround your string with spaces and create a test column like this:
SELECT t.DealTitle
FROM yourtable t
CROSS APPLY (SELECT testDeal = ' ' + ISNULL(t.DealTitle,'') + ' ') fx1
WHERE fx1.testDeal LIKE '%[^a-z]pit[^a-z]%'
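The padding trick can be tried against the titles from the question with a short Python sketch. The contains_whole_word helper is a hypothetical name, and the IGNORECASE flag stands in for a case-insensitive collation, which is an assumption.

```python
import re
from typing import Optional


def contains_whole_word(text: Optional[str], word: str) -> bool:
    """Pad with spaces so start/end of string count as word boundaries."""
    padded = f" {text or ''} "  # mirrors ' ' + ISNULL(t.DealTitle,'') + ' '
    pattern = rf"[^a-z]{re.escape(word)}[^a-z]"
    return re.search(pattern, padded, re.IGNORECASE | re.ASCII) is not None


titles = ["a pit of despair", "pit your wits", "a pit", "a pit.", "pit!", "pit", "spit", "pitfall"]
matched = [t for t in titles if contains_whole_word(t, "pit")]
```

With the padding in place, a single pattern covers the start, middle and end of the string, so the extra OR branches are not needed.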
If your DBMS supports a REGEXP operator (MySQL does, for example), you can treat any combination of spaces, punctuation and start/end of string as word boundaries:
where DealTitle regexp '(^|[[:punct:]]|[[:space:]])pit([[:space:]]|[[:punct:]]|$)'
Another simple alternative:
WHERE DealTitle like '%[^a-z]pit[^a-z]%' OR
DealTitle like 'pit[^a-z]%' OR
DealTitle like '%[^a-z]pit' OR
DealTitle = 'pit'
This is a good topic, and I want to add something for anyone who needs to find a word in a string when the word itself comes from a query.
SELECT
ST.WORD, ND.TEXT_STRING
FROM
[ST_TABLE] ST
LEFT JOIN
[ND_TABLE] ND ON ND.TEXT_STRING LIKE '%[^a-z]' + ST.WORD + '[^a-z]%'
WHERE
ST.WORD = 'STACK_OVERFLOW' -- OPTIONAL
With this you can list all occurrences of ST.WORD in ND.TEXT_STRING, and you can use the WHERE clause to filter on a particular word.
You could search for the entire string in SQL:
select * from YourTable where col1 like '%TheWord%'
Then you could filter the returned rows client side, adding the extra condition that it must be a whole word. For example, if it matches the regex:
\bTheWord\b
Another option is to use a CLR function, available in SQL Server 2005 and higher. That would allow you to search for the regex server-side. This MSDN article has the details of how to set up a dbo.RegexMatch function.
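The client-side filtering step could look roughly like this in Python, assuming the rows have already been fetched with the broad LIKE '%TheWord%' query. The filter_whole_word name and the sample rows are hypothetical.

```python
import re
from typing import Iterable, List


def filter_whole_word(rows: Iterable[str], word: str) -> List[str]:
    """Keep only rows where word appears delimited by \\b word boundaries."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [row for row in rows if pattern.search(row)]


# Rows as they might come back from the broad LIKE query.
fetched = ["a pit of despair", "spit", "pit!", "pitfall"]
whole_word_rows = filter_whole_word(fetched, "pit")
```

The database still does the coarse filtering, so only the candidate rows travel to the client for the regex check.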
Try using charindex to find the match:
Select *
from table
where charindex( 'Whole word to be searched', columnname) > 0

I have a problem in SQL Server 2000 when searching for a term in farsi

I have a problem in SQL Server 2000 with farsi search.
I have a table with nvarchar fields holding Unicode (Farsi) values, and I need to search their content using Unicode (Farsi) text.
I am using
select * from table1
where fieldname like '%[farsi word]%'
My Farsi word exists in the table, but the query returns 0 rows.
What can I do?
thanks all.
If you're using NVARCHAR fields, you should also use Unicode when searching! You do this by prepending an N to your search term:
select * from table1
where fieldname like N'%[farsi word]%'
Also: be aware that if your search term begins with a % wildcard, you've basically disabled all use of any indices there might be to speed up your search. Using LIKE %...% for searching will always result in a pretty slow table scan....
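Why the N prefix matters can be illustrated with a Python sketch of the encoding step. It assumes the database's default code page is 1252 (Latin), and the search term is an arbitrary Farsi word chosen for the example; neither comes from the question.

```python
# Arbitrary example word, not taken from the question.
term = "سلام"

# Without N: the literal is squeezed into a single-byte code page that has
# no Farsi letters, so every character is replaced by '?'.
without_n_prefix = term.encode("cp1252", errors="replace").decode("cp1252")

# With N: the literal stays in a Unicode encoding (UTF-16), so it survives.
with_n_prefix = term.encode("utf-16-le").decode("utf-16-le")

survives_without_n = without_n_prefix == term
survives_with_n = with_n_prefix == term
```

Under that assumption the server ends up searching for a string of question marks, which explains the 0 rows.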

SQL (MySQL): Match first letter of any word in a string?

(Note: This is for MySQL's SQL, not SQL Server.)
I have a database column with values like "abc def GHI JKL". I want to write a WHERE clause that includes a case-insensitive test for any word that begins with a specific letter. For example, that example would test true for the letters a, d, g and j, because there's a 'word' beginning with each of those letters. The application is for a search that offers to find records that have only words beginning with the specified letter. Also note that there is not a fulltext index for this table.
You can use a LIKE operation. If your words are space-separated, append a space to the start of the string to give the first word a chance to match:
SELECT
StringCol
FROM
MyTable
WHERE
' ' + StringCol LIKE '% ' + MyLetterParam + '%'
Where MyLetterParam could be something like this:
'[acgj]'
To look for more than a space as a word separator, you can expand that technique. The following would treat TAB, CR, LF, space and NBSP as word separators.
WHERE
' ' + StringCol LIKE '%['+' '+CHAR(9)+CHAR(10)+CHAR(13)+CHAR(160)+'][acgj]%'
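Note that this is T-SQL syntax rather than standard SQL: both the + string concatenation and the [...] character classes in LIKE are SQL Server features. MySQL treats + as numeric addition and its LIKE supports only the % and _ wildcards, so there you would use CONCAT() together with a regular expression instead.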
Using the REGEXP operator:
SELECT * FROM `articles` WHERE `body` REGEXP '[[:<:]][acgj]'
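It returns records where the column body contains words starting with a, c, g or j (case-insensitive). The [[:<:]] word-boundary marker works in MySQL versions before 8.0.4; later versions use ICU regular expressions, where the equivalent marker is \\b.
Be aware though: this is not a very good idea if you expect any heavy load (it cannot use an index, so it scans every row!)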
Check the Pattern Matching and Regular Expressions sections of the MySQL Reference Manual.
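To experiment with the "word starts with one of these letters" test outside MySQL, here is a minimal Python sketch. The has_word_starting_with helper is a hypothetical name, and \b is used as the word-boundary marker in place of [[:<:]].

```python
import re


def has_word_starting_with(text: str, letters: str) -> bool:
    """True when any word in text begins with one of the given letters."""
    pattern = rf"\b[{re.escape(letters)}]"
    return re.search(pattern, text, re.IGNORECASE) is not None


value = "abc def GHI JKL"
starts = {letter: has_word_starting_with(value, letter) for letter in "abdgj"}
```

For the sample value, words start with a, d, G and J, so a letter such as b, which only appears inside a word, does not match.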