SQL Server | Look for specific keywords in strings - sql

I need your help.
I try to match a manually created lookup of specific keywords with a fact comment table. Purpose: an attempt to categorize these comments.
Example
comment: A lot more power than the equivalent from Audi.
keyword from keyword-list: Audi
category from keyword-list: competitor
I tried something like
SELECT
FC.comment_id, KWM.keyword, KWM.category
FROM
dbo.factcomments FC
INNER JOIN
(SELECT
keywordmatcher = '%[,. ]' + keyword + '[ .,]%',
keyword,
category
FROM
dbo.keywordlist) KWM ON FC.comment LIKE KWM.keywordmatcher
Maybe a bad example, but I only want specific matches --> no matches if the keyword is part of another word in the fact comments (e.g. 'part' but not 'apart').
Because my first try didn't match keywords at the beginning/end of strings I did something really nasty:
SELECT
FC.comment_id, KWM.keyword, KWM.category
FROM
dbo.factcomments FC
INNER JOIN
(SELECT
keyword,
category
FROM
dbo.keywordlist) KWM ON FC.comment LIKE '%[,. ]' + KWM.keyword + '[ .,]%'
OR FC.comment LIKE KWM.keyword + '[ .,]%'
OR FC.comment LIKE '%[,. ]' + KWM.keyword
I know...
Besides the fact that I also want to detect those comments where there are '!', '?', ''', '-' or '_' before or after these keywords - is there any clever way to do so?
In fact I want any comments where there are no word characters before or after the keyword, any other character is OK.

In the JOIN condition, REPLACE() all non-alphanumeric characters in FC.Comment with a space character, and surround it with spaces. Something like this:
' '+REPLACE(FC.Comment, ...)+' '
Then do your LIKE Comparison like this:
LIKE '% '+KWM.Keyword+' %'

A different approach may be.
declare #comment varchar(255)=concat(' ','A lot more power than the equivalent from Audi.',' ')
declare #keyword varchar(50)='Audi'
DECLARE #allowedStrings VARCHAR(100)
DECLARE #teststring VARCHAR(100)
SET #allowedStrings = '><()!?#_-.\/?!*&^%$#()~'
;WITH CTE AS
(
SELECT SUBSTRING(#allowedStrings, 1, 1) AS [String], 1 AS [Start], 1 AS [Counter]
UNION ALL
SELECT SUBSTRING(#allowedStrings, [Start] + 1, 1) AS [String], [Start] + 1, [Counter] + 1
FROM CTE
WHERE [Counter] < LEN(#allowedStrings)
)
SELECT #comment = REPLACE(#comment, CTE.[String], '') FROM CTE
Change the #comment variable however you like and check the result
SELECT
#comment as Comment , #keyword as KeyWord,
iif(substring(#comment,PATINDEX(concat('%',#keyword,'%'),#comment)-1,len(#keyword)+2)=' Audi ',1,0) as isMatch
This is a borrowed idea from https://stackoverflow.com/a/29162400/10735793

Related

Compare all possible substring of in two strings sql server 2008

I have a table with a column and value JobSkill = ".net sap lead". Now user enter the value "abap sap hana". I want to include a where condition which match exactly 3 or more continuous characters including space. In above scenario both have common "sap" substring so the condition should result in true. Below is my query. Please help. Previously I am using charindex but it does not resolve the purpose. I am using sql server 2008
SELECT Email_Id, JobSkill FROM Jobs
WHERE CHARINDEX(JobSkill, "abap sap hana") > 0
You need to create a function which loops through all positions of characters of String1 except the last 2, and check if String2 is like '%' + [(x,x+1,x+2)] + '%' string, where x is current position.
So for stings ('abcd acd g', 'ert acd'),
it should check
'ert acd' like '%abc%'
'ert acd' like '%bcd%'
'ert acd' like '%cd %'
'ert acd' like '%d a%'
and so on...
If like returns TRUE, break the loop.
Try like this,
SELECT j.Email_Id
,j.JobSkill
FROM Jobs j
INNER JOIN (
SELECT LTRIM(RTRIM(m.n.value('.[1]', 'varchar(8000)'))) SearchString
FROM (
SELECT CAST('<XMLRoot><RowData>' + REPLACE(#Input, ' ', '</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
) t
CROSS APPLY x.nodes('/XMLRoot/RowData') m(n)
) T ON j.JobSkill LIKE '%' + T.SearchString + '%'

Using SQL Server to replace line breaks in columns with spaces

I have a large table of data where some of my columns contain line breaks. I would like to remove them and replace them with some spaces instead.
Can anybody tell me how to do this in SQL Server?
Thanks in advance
SELECT REPLACE(REPLACE(#str, CHAR(13), ''), CHAR(10), '')
This should work, depending on how the line breaks are encoded:
update t
set col = replace(col, '
', ' ')
where col like '%
%';
That is, in SQL Server, a string can contain a new line character.
#Gordon's answer should work, but in case you're not sure how your line breaks are encoded, you can use the ascii function to return the character value. For example:
declare #entry varchar(50) =
'Before break
after break'
declare #max int = len(#entry)
; with CTE as (
select 1 as id
, substring(#entry, 1, 1) as chrctr
, ascii(substring(#entry, 1, 1)) as code
union all
select id + 1
, substring(#entry, ID + 1, 1)
, ascii(substring(#entry, ID + 1, 1))
from CTE
where ID <= #max)
select chrctr, code from cte
print replace(replace(#entry, char(13) , ' '), char(10) , ' ')
Depending where your text is coming from, there are different encodings for a line break. In my test string I put the most common.
First I replace all CHAR(10) (Line feed) with CHAR(13) (Carriage return), then all doubled CRs to one CR and finally all CRs to the wanted replace (you want a blank, I put a dot for better visability:
Attention: Switch the output to "text", otherwise you wont see any linebreaks...
DECLARE #text VARCHAR(100)='test single 10' + CHAR(10) + 'test 13 and 10' + CHAR(13) + CHAR(10) + 'test single 13' + CHAR(13) + 'end of test';
SELECT #text
DECLARE #ReplChar CHAR='.';
SELECT REPLACE(REPLACE(REPLACE(#text,CHAR(10),CHAR(13)),CHAR(13)+CHAR(13),CHAR(13)),CHAR(13),#ReplChar);
I have the same issue, means I have a column having values with line breaks in it. I use the query
update `your_table_name` set your_column_name = REPLACE(your_column_name,'\n','')
And this resolves my issue :)
Basically '\n' is the character for Enter key or line break and in this query, I have replaced it with no space (which I want)
Keep Learning :)
zain

how to compare string in SQL without using LIKE

I am using SQL query shown below to compare AMCcode. But if I compare the AMCcode '1' using LIKE operator it will compare all the entries with AMCcode 1, 10,11,12,13. .. 19, 21,31.... etc. But I want to match the AMCcode only with 1. Please suggest how can I do it. The code is given below :
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like
(
CASE
WHEN #AMCCode IS NULL THEN '%'
ELSE '%'+#AMCCode+ '%'
END
)
This is part of the code where I need to replace the LIKE operator with any other operator which will give the AMCcode with 1 when I want to search AMCcode of 1, not all 10,11,12..... Please help
I think you are looking for something like this:
where ',' + cast(PM.PA_AMCCode as varchar(255)) + ',' like '%,' + #AMCCodes + ',%'
This includes the delimiters in the comparison.
Note that a better method is to split the string and use a join, something like this:
select t.*
from t cross apply
(select cast(code as int) as code
from dbo.split(#AMCCodes, ',') s(code)
) s
where t.AMCCode = s.code;
This is better because under some circumstances, this version can make use of an index on AMCCode.
If you want to exactly match the value, you don't need to use '%' in your query. You can just use like as below
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like
(
CASE
WHEN #AMCCode IS NULL THEN '%'
ELSE #AMCCode
END
)
Possibly you can remove case statement and query like
ISNULL(CONVERT(VARCHAR(10),PM.PA_AMCCode), '') like #AMCCode

How can I remove leading and trailing quotes in SQL Server?

I have a table in a SQL Server database with an NTEXT column. This column may contain data that is enclosed with double quotes. When I query for this column, I want to remove these leading and trailing quotes.
For example:
"this is a test message"
should become
this is a test message
I know of the LTRIM and RTRIM functions but these workl only for spaces. Any suggestions on which functions I can use to achieve this.
I have just tested this code in MS SQL 2008 and validated it.
Remove left-most quote:
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 2, LEN(FieldName))
WHERE LEFT(FieldName, 1) = '"'
Remove right-most quote: (Revised to avoid error from implicit type conversion to int)
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 1, LEN(FieldName)-1)
WHERE RIGHT(FieldName, 1) = '"'
I thought this is a simpler script if you want to remove all quotes
UPDATE Table_Name
SET col_name = REPLACE(col_name, '"', '')
You can simply use the "Replace" function in SQL Server.
like this ::
select REPLACE('this is a test message','"','')
note: second parameter here is "double quotes" inside two single quotes and third parameter is simply a combination of two single quotes. The idea here is to replace the double quotes with a blank.
Very simple and easy to execute !
My solution is to use the difference in the the column values length compared the same column length but with the double quotes replaced with spaces and trimmed in order to calculate the start and length values as parameters in a SUBSTRING function.
The advantage of doing it this way is that you can remove any leading or trailing character even if it occurs multiple times whilst leaving any characters that are contained within the text.
Here is my answer with some test data:
SELECT
x AS before
,SUBSTRING(x
,LEN(x) - (LEN(LTRIM(REPLACE(x, '"', ' ')) + '|') - 1) + 1 --start_pos
,LEN(LTRIM(REPLACE(x, '"', ' '))) --length
) AS after
FROM
(
SELECT 'test' AS x UNION ALL
SELECT '"' AS x UNION ALL
SELECT '"test' AS x UNION ALL
SELECT 'test"' AS x UNION ALL
SELECT '"test"' AS x UNION ALL
SELECT '""test' AS x UNION ALL
SELECT 'test""' AS x UNION ALL
SELECT '""test""' AS x UNION ALL
SELECT '"te"st"' AS x UNION ALL
SELECT 'te"st' AS x
) a
Which produces the following results:
before after
-----------------
test test
"
"test test
test" test
"test" test
""test test
test"" test
""test"" test
"te"st" te"st
te"st te"st
One thing to note that when getting the length I only need to use LTRIM and not LTRIM and RTRIM combined, this is because the LEN function does not count trailing spaces.
I know this is an older question post, but my daughter came to me with the question, and referenced this page as having possible answers. Given that she's hunting an answer for this, it's a safe assumption others might still be as well.
All are great approaches, and as with everything there's about as many way to skin a cat as there are cats to skin.
If you're looking for a left trim and a right trim of a character or string, and your trailing character/string is uniform in length, here's my suggestion:
SELECT SUBSTRING(ColName,VAR, LEN(ColName)-VAR)
Or in this question...
SELECT SUBSTRING('"this is a test message"',2, LEN('"this is a test message"')-2)
With this, you simply adjust the SUBSTRING starting point (2), and LEN position (-2) to whatever value you need to remove from your string.
It's non-iterative and doesn't require explicit case testing and above all it's inline all of which make for a cleaner execution plan.
The following script removes quotation marks only from around the column value if table is called [Messages] and the column is called [Description].
-- If the content is in the form of "anything" (LIKE '"%"')
-- Then take the whole text without the first and last characters
-- (from the 2nd character and the LEN([Description]) - 2th character)
UPDATE [Messages]
SET [Description] = SUBSTRING([Description], 2, LEN([Description]) - 2)
WHERE [Description] LIKE '"%"'
You can use following query which worked for me-
For updating-
UPDATE table SET colName= REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') WHERE...
For selecting-
SELECT REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') FROM TableName
you could replace the quotes with an empty string...
SELECT AllRemoved = REPLACE(CAST(MyColumn AS varchar(max)), '"', ''),
LeadingAndTrailingRemoved = CASE
WHEN MyTest like '"%"' THEN SUBSTRING(Mytest, 2, LEN(CAST(MyTest AS nvarchar(max)))-2)
ELSE MyTest
END
FROM MyTable
Some UDFs for re-usability.
Left Trimming by character (any number)
CREATE FUNCTION [dbo].[LTRIMCHAR] (#Input NVARCHAR(max), #TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(#Input,' ','¦'), #TrimChar, ' ')), ' ', #TrimChar),'¦',' ')
END
Right Trimming by character (any number)
CREATE FUNCTION [dbo].[RTRIMCHAR] (#Input NVARCHAR(max), #TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(RTRIM(REPLACE(REPLACE(#Input,' ','¦'), #TrimChar, ' ')), ' ', #TrimChar),'¦',' ')
END
Note the dummy character '¦' (Alt+0166) cannot be present in the data (you may wish to test your input string, first, if unsure or use a different character).
To remove both quotes you could do this
SUBSTRING(fieldName, 2, lEN(fieldName) - 2)
you can either assign or project the resulting value
You can use TRIM('"' FROM '"this "is" a test"') which returns: this "is" a test
CREATE FUNCTION dbo.TRIM(#String VARCHAR(MAX), #Char varchar(5))
RETURNS VARCHAR(MAX)
BEGIN
RETURN SUBSTRING(#String,PATINDEX('%[^' + #Char + ' ]%',#String)
,(DATALENGTH(#String)+2 - (PATINDEX('%[^' + #Char + ' ]%'
,REVERSE(#String)) + PATINDEX('%[^' + #Char + ' ]%',#String)
)))
END
GO
Select dbo.TRIM('"this is a test message"','"')
Reference : http://raresql.com/2013/05/20/sql-server-trim-how-to-remove-leading-and-trailing-charactersspaces-from-string/
I use this:
UPDATE DataImport
SET PRIO =
CASE WHEN LEN(PRIO) < 2
THEN
(CASE PRIO WHEN '""' THEN '' ELSE PRIO END)
ELSE REPLACE(PRIO, '"' + SUBSTRING(PRIO, 2, LEN(PRIO) - 2) + '"',
SUBSTRING(PRIO, 2, LEN(PRIO) - 2))
END
Try this:
SELECT left(right(cast(SampleText as nVarchar),LEN(cast(sampleText as nVarchar))-1),LEN(cast(sampleText as nVarchar))-2)
FROM TableName

How can I find Unicode/non-ASCII characters in an NTEXT field in a SQL Server 2005 table?

I have a table with a couple thousand rows. The description and summary fields are NTEXT, and sometimes have non-ASCII chars in them. How can I locate all of the rows with non ASCII characters?
I have sometimes been using this "cast" statement to find "strange" chars
select
*
from
<Table>
where
<Field> != cast(<Field> as varchar(1000))
First build a string with all the characters you're not interested in (the example uses the 0x20 - 0x7F range, or 7 bits without the control characters.) Each character is prefixed with |, for use in the escape clause later.
-- Start with tab, line feed, carriage return
declare #str varchar(1024)
set #str = '|' + char(9) + '|' + char(10) + '|' + char(13)
-- Add all normal ASCII characters (32 -> 127)
declare #i int
set #i = 32
while #i <= 127
begin
-- Uses | to escape, could be any character
set #str = #str + '|' + char(#i)
set #i = #i + 1
end
The next snippet searches for any character that is not in the list. The % matches 0 or more characters. The [] matches one of the characters inside the [], for example [abc] would match either a, b or c. The ^ negates the list, for example [^abc] would match anything that's not a, b, or c.
select *
from yourtable
where yourfield like '%[^' + #str + ']%' escape '|'
The escape character is required because otherwise searching for characters like ], % or _ would mess up the LIKE expression.
Hope this is useful, and thanks to JohnFX's comment on the other answer.
Here ya go:
SELECT *
FROM Objects
WHERE
ObjectKey LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?#\[\^_`{|}~\]\\]%' ESCAPE '\'
It's probably not the best solution, but maybe a query like:
SELECT *
FROM yourTable
WHERE yourTable.yourColumn LIKE '%[^0-9a-zA-Z]%'
Replace the "0-9a-zA-Z" expression with something that captures the full ASCII set (or a subset that your data contains).
Technically, I believe that an NCHAR(1) is a valid ASCII character IF & Only IF UNICODE(#NChar) < 256 and ASCII(#NChar) = UNICODE(#NChar) though that may not be exactly what you intended. Therefore this would be a correct solution:
;With cteNumbers as
(
Select ROW_NUMBER() Over(Order By c1.object_id) as N
From sys.system_columns c1, sys.system_columns c2
)
Select Distinct RowID
From YourTable t
Join cteNumbers n ON n <= Len(CAST(TXT As NVarchar(MAX)))
Where UNICODE(Substring(TXT, n.N, 1)) > 255
OR UNICODE(Substring(TXT, n.N, 1)) <> ASCII(Substring(TXT, n.N, 1))
This should also be very fast.
I started with #CC1960's solution but found an interesting use case that caused it to fail. It seems that SQL Server will equate certain Unicode characters to their non-Unicode approximations. For example, SQL Server considers the Unicode character "fullwidth comma" (http://www.fileformat.info/info/unicode/char/ff0c/index.htm) the same as a standard ASCII comma when compared in a WHERE clause.
To get around this, have SQL Server compare the strings as binary. But remember, nvarchar and varchar binaries don't match up (16-bit vs 8-bit), so you need to convert your varchar back up to nvarchar again before doing the binary comparison:
select *
from my_table
where CONVERT(binary(5000),my_table.my_column) != CONVERT(binary(5000),CONVERT(nvarchar(1000),CONVERT(varchar(1000),my_table.my_column)))
If you are looking for a specific unicode character, you might use something like below.
select Fieldname from
(
select Fieldname,
REPLACE(Fieldname COLLATE Latin1_General_BIN,
NCHAR(65533) COLLATE Latin1_General_BIN,
'CustomText123') replacedcol
from table
) results where results.replacedcol like '%CustomText123%'
My previous answer was confusing UNICODE/non-UNICODE data. Here is a solution that should work for all situations, although I'm still running into some anomalies. It seems like certain non-ASCII unicode characters for superscript characters are being confused with the actual number character. You might be able to play around with collations to get around that.
Hopefully you already have a numbers table in your database (they can be very useful), but just in case I've included the code to partially fill that as well.
You also might need to play around with the numeric range, since unicode characters can go beyond 255.
CREATE TABLE dbo.Numbers
(
number INT NOT NULL,
CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (number)
)
GO
DECLARE #i INT
SET #i = 0
WHILE #i < 1000
BEGIN
INSERT INTO dbo.Numbers (number) VALUES (#i)
SET #i = #i + 1
END
GO
SELECT *,
T.ID, N.number, N'%' + NCHAR(N.number) + N'%'
FROM
dbo.Numbers N
INNER JOIN dbo.My_Table T ON
T.description LIKE N'%' + NCHAR(N.number) + N'%' OR
T.summary LIKE N'%' + NCHAR(N.number) + N'%'
and t.id = 1
WHERE
N.number BETWEEN 127 AND 255
ORDER BY
T.id, N.number
GO
-- This is a very, very inefficient way of doing it but should be OK for
-- small tables. It uses an auxiliary table of numbers as per Itzik Ben-Gan and simply
-- looks for characters with bit 7 set.
SELECT *
FROM yourTable as t
WHERE EXISTS ( SELECT *
FROM msdb..Nums as NaturalNumbers
WHERE NaturalNumbers.n < LEN(t.string_column)
AND ASCII(SUBSTRING(t.string_column, NaturalNumbers.n, 1)) > 127)