MS SQL Server - replace names while avoiding words containing the names - sql

This is my first time posting on Stack Overflow, so please let me know if I can do anything better or provide more information.
I have been working on this issue for a few days now. I have a table with comments from employees about the company. Some of them could refer to specific employees in the company. For HR reasons, we want to replace any occurrence of an employee name with the word 'employee'. We aren't accounting for typos or misspellings.
An example of my desired outcome would be:
Input: 'I dislike dijon mustard. My boss Jon sucks.'
Name to search for: 'Jon'
Output: 'I dislike dijon mustard. My boss employee sucks.'
Another example:
Input: 'Aggregating data is boring. Greg is the worst person ever.'
Name to search for: 'Greg'
Output: 'Aggregating data is boring. employee is the worst person ever.'
I want to replace occurrences of the employee names in the comments, but only when the name is not adjacent to other letters or numbers on either side. Occurrences with spaces or punctuation on either side of the name should be replaced.
So far I have tried the suggestions in the following threads:
How to replace a specific word in a sentence without replacing in substring in SQL Server
This yielded the following
update c
set c.Comment = rtrim(ltrim(Replace(replace(' ' + c.Comment + ' ',' ' + en.FirstName + ' ', 'employee'), ' ' + en.FirstName + ' ', 'employee')))
from AnswerComment c
join #EmployeeNames en on en.SurveyId = c.SurveyId
and c.Comment like '%' + en.FirstName + '%'
However, I got results like this:
Input: 'I hate bob.'
Name to search for: 'Bob'
Output: 'I hate bob.'
Input: 'Jon sucks'
Name to search for: 'Jon'
Output: 'employeesucks'
A coworker looked at this thread Replace whole word using ms sql server "replace"
and gave me the following based off of it:
DECLARE @token VARCHAR(10) = 'bob';
DECLARE @replaceToken VARCHAR(10) = 'employee';
DECLARE @paddedToken VARCHAR(10) = ' ' + @token + ' ';
DECLARE @paddedReplaceToken VARCHAR(10) = ' ' + @replaceToken + ' ';
;WITH Step1 AS (
SELECT CommentorId
, QuestionId
, Comment
, REPLACE(Comment, @paddedToken, @paddedReplaceToken) AS [Value]
FROM AnswerComment
WHERE SurveyId = 90492
AND Comment LIKE '%' + @token + '%'
), Step2 AS (
SELECT CommentorId
, QuestionId
, Comment
, REPLACE([Value], @paddedToken, @paddedReplaceToken) AS [Value]
FROM Step1
), Step3 AS (
SELECT CommentorId
, QuestionId
, Comment
, IIF(CHARINDEX(LTRIM(@paddedToken), [Value]) = 1, STUFF([Value], 1, LEN(TRIM(@paddedToken)), TRIM(@paddedReplaceToken)), [Value]) AS [Value]
FROM Step2
)
SELECT CommentorId
, QuestionId
, Comment
, IIF(CHARINDEX(REVERSE(RTRIM(@paddedToken)), REVERSE([Value])) = 1,
REVERSE(STUFF(REVERSE([Value]), CHARINDEX(REVERSE(RTRIM(@paddedToken)), REVERSE([Value])), LEN(RTRIM(@paddedToken)), REVERSE(RTRIM(@paddedReplaceToken)))),
[Value])
FROM Step3;
But I have no idea how I would implement this.
Another thread I can't find anymore suggested using %[^a-z0-9A-Z]% for searching, like this:
update c
set c.Comment = REPLACE(c.Comment, en.FirstName, 'employee')
from AnswerComment c
join #EmployeeNames en on en.SurveyId = c.SurveyId
and c.Comment like '%' + en.FirstName + '%'
and c.Comment not like '%[^a-z0-9A-Z]%' + en.FirstName + '%[^a-z0-9A-Z]%'
select @@ROWCOUNT [first names replaced]
This doesn't work for me. It replaces occurrences of the employee names even if they're part of a larger word, like in this example:
Input: 'I dislike dijon mustard.'
Name to search for: 'Jon'
Output: 'I dislike diemployee mustard.'
At this point it seems to me that it's impossible to accomplish this. Is there anything wrong with how I've implemented these, or anything obvious that I'm missing?

Here is a method that uses a combination of STUFF and PATINDEX.
It only replaces the first occurrence of the name in each comment,
so it may have to be executed repeatedly until it no longer updates any rows.
UPDATE c
SET c.Comment = STUFF(c.Comment, PATINDEX('%[^a-z0-9]'+en.FirstName+'[^a-z0-9]%', '/'+c.Comment+'/'), len(en.FirstName), 'employee')
FROM AnswerComment c
JOIN #EmployeeNames en ON en.SurveyId = c.SurveyId
WHERE '/'+c.Comment+'/' LIKE '%[^a-z0-9]'+en.FirstName+'[^a-z0-9]%';
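The "run it until nothing gets updated" step could be sketched as a loop that repeats the update while @@ROWCOUNT is non-zero. This is only a sketch under some assumptions: the temp tables below are hypothetical stand-ins for the tables in the question, @rows is a name introduced here, the collation is case-insensitive, and no employee is literally named 'employee' (otherwise the loop would never finish).

```sql
-- Hypothetical stand-ins for the tables in the question.
CREATE TABLE #AnswerComment (SurveyId INT, Comment NVARCHAR(400));
CREATE TABLE #EmployeeNames (SurveyId INT, FirstName NVARCHAR(50));
INSERT #AnswerComment VALUES (1, N'Jon likes dijon. I told Jon so.');
INSERT #EmployeeNames VALUES (1, N'Jon');

DECLARE @rows INT = 1;  -- rows changed by the previous pass

WHILE @rows > 0
BEGIN
    -- Each pass replaces at most one standalone name per comment.
    UPDATE c
    SET c.Comment = STUFF(c.Comment,
                          PATINDEX('%[^a-z0-9]' + en.FirstName + '[^a-z0-9]%', '/' + c.Comment + '/'),
                          LEN(en.FirstName),
                          'employee')
    FROM #AnswerComment c
    JOIN #EmployeeNames en ON en.SurveyId = c.SurveyId
    WHERE '/' + c.Comment + '/' LIKE '%[^a-z0-9]' + en.FirstName + '[^a-z0-9]%';

    SET @rows = @@ROWCOUNT;  -- becomes 0 once no standalone name is left
END;

SELECT Comment FROM #AnswerComment;
```

The '/' padding is what lets a name at the very start or end of a comment match the same pattern as one in the middle.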

Something like this seems to work.
declare @charsTable table (notallowed char(1))
insert into @charsTable (notallowed) values (',')
insert into @charsTable (notallowed) values ('.')
insert into @charsTable (notallowed) values (' ')
declare @input nvarchar(max) = 'Aggregating data is boring. Greg is the worst person ever.'
declare @name nvarchar(50) = 'Greg'
--declare @input nvarchar(max) = 'I dislike dijon mustard. You know who sucks? My boss Jon.'
--declare @name nvarchar(50) = 'Jon'
select case when @name + notallowed = value or notallowed + @name = value or notallowed + @name + notallowed = value then replace(value, @name, 'employee') else value end 'data()' from string_split(@input, ' ')
left join @charsTable on @name + notallowed = value or notallowed + @name = value or notallowed + @name + notallowed = value
for xml path('')
Results:
Aggregating data is boring. employee is the worst person ever.
I dislike dijon mustard. You know who sucks? My boss employee.

Related

SQL how to do a LIKE search on each value of a declared variable

I have a query where I am trying to do a LIKE search on each value of a declared variable, instead of doing a like search on the entire field value/string.
Example:
DECLARE @name VARCHAR(30)
SET @name = 'John Smith'
SELECT name FROM customers WHERE name like '%'+ @name + '%'
The record I am looking for is "John and Jane Smith". The query above returns NO result. If the user searches just 'John' OR just 'Smith' there are too many results returned.
I am trying to get the query to search like the query below:
SELECT name from customers WHERE name LIKE '%John% %Smith%'
I've searched for many options, but I'm not sure my search terms are correct, and I have yet to find a solution.
I would try replacing spaces in your @name with '% %'
Something like
SET @nameFilter = REPLACE(@name,' ','% %')
SELECT name FROM customers WHERE name like '%'+ @nameFilter + '%'
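A self-contained sketch of that idea, assuming a small made-up @customers table variable in place of the real customers table (the sample rows and the @nameFilter declaration are illustrative only):

```sql
-- Hypothetical sample data standing in for the customers table.
DECLARE @customers TABLE (name VARCHAR(60));
INSERT @customers VALUES ('John and Jane Smith'), ('John Doe'), ('Will Smith');

DECLARE @name VARCHAR(30) = 'John Smith';

-- Turn each space into '% %' so every word becomes its own wildcard segment.
DECLARE @nameFilter VARCHAR(100) = REPLACE(@name, ' ', '% %');  -- 'John% %Smith'

-- Effective pattern: '%John% %Smith%'
SELECT name
FROM @customers
WHERE name LIKE '%' + @nameFilter + '%';  -- matches 'John and Jane Smith' only
```

Note that the words still have to appear in the same order as in the search term, so 'Smith, John' would not match.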
A full-text search seems like the best approach. But you can approximate this at the word level by splitting the search term and looking for each individual word:
with words as (
select value as word
from string_split(@name, ' ')
)
select c.name
from customers c cross apply
(select count(*) as cnt
from words w
where c.name like '%' + w.word + '%'
) w
where w.cnt = (select count(*) from words);
This uses the string_split() functionality available in more recent versions of SQL Server. There are online versions of the function for older versions.
This was answered/accepted before I could post and what @sugar2Code posted is how I would do it.
That said, I was unclear whether you needed both the first and last name to match or just one of them. What I put together lets you decide using a parameter.
-- Sample Data
DECLARE @t TABLE (CustomerName VARCHAR(30))
INSERT @t VALUES('Johny Smith'),('Freddie Roach'),('Mr. Smithers'),('Johnathan Smithe');
-- User Arguments
DECLARE
@name VARCHAR(30) = 'John Smith',
@partialmatch BIT = 1;
-- Dynamic Solution
SELECT
t.CustomerName,
FNMatch = SIGN(pos.F),
LNMatch = SIGN(pos.L)
FROM @t AS t
CROSS JOIN
(
SELECT SUBSTRING(@name,1,f.Mid-1), SUBSTRING(@name,f.Mid+1,8000)
FROM (VALUES(CHARINDEX(' ',@name))) AS f(Mid)
) AS f(FName,LName)
CROSS APPLY (VALUES (CHARINDEX(f.FName,t.CustomerName), CHARINDEX(f.LName,t.CustomerName))) AS pos(F,L)
WHERE (@partialmatch = 0 AND pos.F*pos.L > 0)
OR (@partialmatch = 1 AND pos.F+pos.L > 0);
When @partialmatch = 1 you get:
CustomerName                   FNMatch     LNMatch
------------------------------ ----------- -----------
Johny Smith                    1           1
Mr. Smithers                   0           1
Johnathan Smithe               1           1
Setting @partialmatch to 0 will exclude "Mr. Smithers".

SQL Server - COALESCE WHEN NOTHING RETURNS , GET DEFAULT VALUE

I'm trying to use the COALESCE function in SQL Server to concatenate multiple names. But when the condition in the query returns no rows, I need to return a default value. I tried some conditions using a CASE statement, but I can't figure out what I missed.
declare @Names varchar(max) = '',
@Key varchar(max) = 'ABC'
select @Names = COALESCE(@Names, '') + isnull(T0.A, @Key) + ', '
from TData P
left join TNames T0 on T0.C + '\' + T0.D = P.@Key
where OBID=581464
and ID < 1432081
select @Names
You can do it with two minor changes to your current code, but I suspect this is an XY problem, and you might benefit more from editing your question to include sample data and desired results (so perhaps we can suggest a better solution).
Anyway, what I had in mind is this:
declare @Names varchar(max), -- remove the = '', so that @Names starts as null
@Key varchar(max) = 'ABC'
select @Names = COALESCE(@Names, '') + isnull(T0.A, @Key) + ', '
from TData P
left join TNames T0 on T0.C + '\' + T0.D = P.@Key -- ' (This is just to fix the coding colors)
where OBID=581464
and ID < 1432081
select COALESCE(@Names, 'default value') -- since @Names started as null, and the query did not return any results, it's still null...
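To see why starting from NULL matters, here is a small self-contained sketch. The @TNames table variable and its rows are made up for illustration, and the filter deliberately matches nothing. The second query shows the same idea with STRING_AGG, which assumes SQL Server 2017 or later.

```sql
-- Hypothetical sample table; the WHERE clauses below match no rows on purpose.
DECLARE @TNames TABLE (ID INT, A VARCHAR(50));
INSERT @TNames VALUES (1, 'Ann'), (2, 'Bob');

-- Variable-concatenation pattern: @Names starts as NULL.
DECLARE @Names VARCHAR(MAX);

SELECT @Names = COALESCE(@Names, '') + A + ', '
FROM @TNames
WHERE ID > 100;   -- no rows, so @Names is never assigned and stays NULL

SELECT COALESCE(@Names, 'default value') AS Names;   -- 'default value'

-- Alternative: an aggregate over zero rows yields NULL, so COALESCE supplies the default.
SELECT COALESCE(STRING_AGG(A, ', '), 'default value') AS Names
FROM @TNames
WHERE ID > 100;   -- 'default value'
```

The STRING_AGG form also avoids the trailing ', ' that the variable-concatenation pattern leaves behind.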

deleting second comma in data

OK, so I have a table called PEOPLE that has a name column. The name column holds a name, but it's a total mess. For some reason it's not stored as "last, first middle". Instead it's stored as last,first,middle: the last and first names (and the middle name, if there is one) are separated by commas, so there are two commas if the person has a middle name.
example:
smith,steve
smith,steve,j
smith,ryan,tom
I'd like the second comma removed (for parsing reasons) and a space put after the existing first comma, so the above would come out looking like:
smith, steve
smith, steve j
smith, ryan tom
Ultimately I'd like to be able to parse the names into first, middle, and last name fields, but that's for another post. I appreciate any help.
thank you.
Drop table T1;
Create table T1(Name varchar(100));
Insert T1 Values
('smith,steve'),
('smith,steve,j'),
('smith,ryan,tom');
UPDATE T1
SET Name=
CASE CHARINDEX(',',name, CHARINDEX(',',name)+1) WHEN
0 THEN Name
ELSE
LEFT(name,CHARINDEX(',',name, CHARINDEX(',',name)+1)-1)+' ' +
RIGHT(name,LEN(Name)-CHARINDEX(',',name, CHARINDEX(',',name)+1))
END
Select * from T1
This seems to work. Not the most concise but avoids cursors.
DECLARE @people TABLE (name varchar(50))
INSERT INTO @people
SELECT 'smith,steve'
UNION
SELECT 'smith,steve,j'
UNION
SELECT 'smith,ryan,tom'
UNION
SELECT 'commaless'
SELECT name,
CASE
WHEN CHARINDEX(',',name) > 0 THEN
CASE
WHEN CHARINDEX(',',name,CHARINDEX(',',name) + 1) > 0 THEN
STUFF(STUFF(name, CHARINDEX(',',name,CHARINDEX(',',name) + 1), 1, ' '),CHARINDEX(',',name),1,', ')
ELSE
STUFF(name,CHARINDEX(',',name),1,', ')
END
ELSE name
END AS name2
FROM @people
Using a table function to split apart the names with a delimiter and for XML Path to stitch them back together, we can get what you're looking for! Hope this helps!
Declare @People table(FullName varchar(200))
Insert Into @People Values ('smith,steve')
Insert Into @People Values ('smith,steve,j')
Insert Into @People Values ('smith,ryan,tom')
Insert Into @People Values ('smith,john,joseph Jr')
Select p.*,stuff(fn.FullName,1,2,'') as ModifiedFullName
From @People p
Cross Apply (
select
Case When np.posID<=2 Then ', ' Else ' ' End+np.Val
From @People n
Cross Apply Custom.SplitValues(n.FullName,',') np
Where n.FullName=p.FullName
For XML Path('')
) fn(FullName)
Output:
ModifiedFullName
smith, steve
smith, steve j
smith, ryan tom
smith, john joseph Jr
SplitValues table function definition:
/*
This Function takes a delimited list of values and returns a table containing
each individual value and its position.
*/
CREATE FUNCTION [Custom].[SplitValues]
(
@List varchar(max)
, @Delimiter varchar(1)
)
RETURNS
@ValuesTable table
(
posID int
,val varchar(1000)
)
AS
BEGIN
WITH Cte AS
(
SELECT CAST('<v>' + REPLACE(@List, @Delimiter, '</v><v>') + '</v>' AS XML) AS val
)
INSERT @ValuesTable (posID,val)
-- A nodes() column cannot be used directly in ORDER BY, so number the rows in the order they are returned
SELECT row_number() over(Order By (Select Null)) as posID, RTRIM(LTRIM(Split.x.value('.', 'VARCHAR(1000)'))) AS val
FROM Cte
CROSS APPLY val.nodes('/v') Split(x)
RETURN
END
GO
String manipulation in SQL Server, outside of writing your own user-defined function, is limited, but you can use the PARSENAME function for your purposes here. It takes a string, splits it on the period character, and returns the segment you specify.
Try this:
DECLARE @name VARCHAR(100) = 'smith,ryan,tom'
SELECT REVERSE(PARSENAME(REPLACE(REVERSE(@name), ',', '.'), 1)) + ', ' +
REVERSE(PARSENAME(REPLACE(REVERSE(@name), ',', '.'), 2)) +
COALESCE(' ' + REVERSE(PARSENAME(REPLACE(REVERSE(@name), ',', '.'), 3)), '')
Result: smith, ryan tom
If you set @name to 'smith,steve' instead, you'll get:
Result: smith, steve
Segment 1 actually gives you the last segment, segment 2 the second to last, etc. Hence I've used REVERSE to get the order you want. In the case of 'smith,steve', segment 3 will be NULL, hence the COALESCE to add an empty string if that is the case. The REPLACE of course changes the commas to periods so that the split will work.
Note that this is a bit of a hack. PARSENAME will not work if there are more than four parts and this will fail if the name happens to contain a period. However if your data conforms to these limitations, hopefully it provides you with a solution.
Caveat: it sounds like your data may be inconsistently formatted. In that case, applying any automated treatment to it is going to be risky. However, you could try:
UPDATE people SET name = REPLACE(name, ',', ' ')
UPDATE people SET name = LEFT(name, CHARINDEX(' ', name)-1)+ ', '
+ RIGHT(name, LEN(name) - CHARINDEX(' ', name))
That'll work for the three examples you give. What it will do to the rest of your set is another question.
Here's an example with CHARINDEX() and SUBSTRING
WITH yourTable
AS
(
SELECT names
FROM
(
VALUES ('smith,steve'),('smith,steve,j'),('smith,ryan,tom')
) A(names)
)
SELECT names AS old,
CASE
WHEN comma > 0
THEN SUBSTRING(spaced_names,0,comma + 1) --before the comma
+ SUBSTRING(spaced_names,comma + 2,1000) --after the comma
ELSE spaced_names
END AS new
FROM yourTable
CROSS APPLY(SELECT CHARINDEX(',',names,CHARINDEX(',',names) + 1),REPLACE(names,',',', ')) AS CA(comma,spaced_names)

Trouble extracting First and LastName from Full Name column

I have a FullName column, and I am extracting the first name and last name using the following query:
select SUBSTRING(FULL_NAME, 1, CHARINDEX(' ', FULL_NAME) - 1) AS FirstName,
SUBSTRING(FULL_NAME, CHARINDEX(' ', FULL_NAME) + 1, 500) AS LastName
from [dbo].[TABLE]
But in the Full Name column there are just First names, some 10 digit phone numbers, 4 digit extensions and some text like 'this is a special case'.
How should I modify my query to accommodate these exceptions? And also when there are only single words in the Full Name column I am getting this following error message:
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Parsing good names from free form fields is not an easy task...
I would suggest a dual approach.
Identify common patterns, e.g. you might find phone numbers with something like this
Where IsNumeric(Replace(Field,'-',''))=1
and you might identify single names with
Where charindex(' ',trim(field))=0
etc.
Once you've identified them, then write code to attempt to split them...
So you might use the code you have above with the following WHERE clause
select SUBSTRING(FULL_NAME, 1, CHARINDEX(' ', FULL_NAME) - 1) AS FirstName,
SUBSTRING(FULL_NAME, CHARINDEX(' ', FULL_NAME) + 1, 500)
AS LastName
from [dbo].[TABLE]
where charindex(' ',trim(FULL_NAME))>0 and IsNumeric(Replace(FULL_NAME,'-',''))=0
Use the WHERE clauses to (a) make sure you only get records you can parse and (b) help identify the oddball cases you'll likely need to do by hand...
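As a rough sketch of that dual approach, the query below filters out numeric-looking rows and guards the split with CASE, so single-word names no longer raise the "Invalid length parameter" error. The @t table variable, its rows, and the sp(pos) alias are made up for illustration.

```sql
-- Hypothetical sample data covering the cases mentioned in the question.
DECLARE @t TABLE (FULL_NAME VARCHAR(100));
INSERT @t VALUES ('john smith'), ('evan'), ('555-123-4567'), ('1234');

SELECT t.FULL_NAME,
       -- Only split when a space exists; otherwise treat the whole value as the first name.
       CASE WHEN sp.pos > 0 THEN LEFT(t.FULL_NAME, sp.pos - 1)
            ELSE t.FULL_NAME END AS FirstName,
       CASE WHEN sp.pos > 0 THEN SUBSTRING(t.FULL_NAME, sp.pos + 1, 500)
            ELSE '' END AS LastName
FROM @t AS t
CROSS APPLY (VALUES (CHARINDEX(' ', t.FULL_NAME))) AS sp(pos)   -- position of the first space, 0 if none
WHERE ISNUMERIC(REPLACE(t.FULL_NAME, '-', '')) = 0;              -- skip phone numbers and extensions
```

Free-text rows such as 'this is a special case' would still be split on the first space, so they need their own pattern check or manual review.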
Good luck
You could go with a function; this allows you to put any logic you need into the transform and keeps things a bit more readable:
create function dbo.namepart( @fullname varchar(50), @part varchar(5))
returns varchar(50) -- wide enough to hold a full first or last name without truncation
as
begin
declare @first varchar(50)
declare @last varchar(50)
declare @sp int
if @fullname like '%special value%' return ''
if @fullname like '% %'
begin
set @sp = CHARINDEX(' ', @fullname)
set @first = left(@fullname, @sp - 1)
set @last = substring(@fullname,@sp + 1 ,50)
if isnumeric(@last) <> 0 set @last = ''
end
else
begin
set @first = @fullname
set @last = ''
end
if @part like 'f%'
return @first
else
return @last
return ''
end
Sample data
create table blah(
full_name varchar(50)
)
insert into blah values ( 'john smith' ), ('paul 12345'),('evan'),('special value')
And see if it works
select
dbo.namepart(full_name,'first') first,
dbo.namepart(full_name,'last') last,
full_name
from blah
http://sqlfiddle.com/#!6/eb28f/2

Combination of 'LIKE' and 'IN' using t-sql

How can I do this kind of selection:
SELECT *
FROM Street
WHERE StreetName LIKE IN ('% Main Street', 'foo %')
Please don't tell me that I can use OR because these actually come from a query.
There is no combined LIKE and IN syntax but you can use LIKE to JOIN onto your query as below.
;WITH Query(Result) As
(
SELECT '% Main Street' UNION ALL
SELECT 'foo %'
)
SELECT DISTINCT s.*
FROM Street s
JOIN Query q ON StreetName LIKE q.Result
Or to use your example in the comments
SELECT DISTINCT s.*
FROM Street s
JOIN CarStreets cs ON s.StreetName LIKE cs.name + '%'
WHERE cs.Streets = 'offroad'
You don't have a lot of choices here.
SELECT * FROM Street Where StreetName LIKE '% Main Street' OR StreetName LIKE 'foo %'
If this is part of an existing, more complicated query (which is the impression I'm getting), you could create a table value function that does the checking for you.
SELECT * FROM Street Where StreetName IN (dbo.FindStreetNameFunction('% Main Street|foo %'))
I'd recommend using the simplest solution (the first). If this is nested inside a larger, more complicated query, post it and we'll take a look.
I had a similar conundrum but due to only needing to match the start of a string, I changed my 'like' to SUBSTRING as such:
SELECT *
FROM codes
WHERE SUBSTRING(code, 1, 12) IN ('012316963429', '012315667849')
You can resort to dynamic SQL and wrap it all up in a stored procedure.
If you get the LIKE IN param in a string as tokens with a certain separator, like
'% Main Street,foo %,Another%Street'
first you need to create a function that receives a list of LIKE "tokens" and returns a table of them.
CREATE FUNCTION [dbo].[SplitList]
(
@list nvarchar(MAX),
@delim nvarchar(5)
)
RETURNS @splitTable table
(
value nvarchar(50)
)
AS BEGIN
While (Charindex(@delim, @list)>0) Begin
Insert Into @splitTable (value)
Select ltrim(rtrim(Substring(@list, 1, Charindex(@delim, @list)-1)))
Set @list = Substring(@list, Charindex(@delim, @list)+len(@delim), len(@list))
End
Insert Into @splitTable (value) Select ltrim(rtrim(@list))
Return
END
Then in the SP you have the following code
declare
@sql nvarchar(MAX),
@subWhere nvarchar(MAX),
@params nvarchar(MAX)
-- prepare the where sub-clause to cover LIKE IN (...)
-- it will actually generate where sub clause StreetName Like option1 or StreetName Like option2 or ...
set @subWhere = STUFF(
(
--(**)
SELECT ' OR StreetName like ''' + value + '''' FROM dbo.SplitList('% Main Street,foo %,Another%Street', ',')
FOR XML PATH('')
), 1, 4, '')
-- create the dynamic SQL
set @sql ='select * from [Street]
where
(' + @subWhere + ')
-- and any additional query params here, if needed, like
AND StreetMinHouseNumber = @minHouseNumber
AND StreetNumberOfHouses between @minNumberOfHouses and @maxNumberOfHouses'
set @params = ' @minHouseNumber nvarchar(5),
@minNumberOfHouses int,
@maxNumberOfHouses int'
EXECUTE sp_executesql @sql, @params,
@minHouseNumber,
@minNumberOfHouses,
@maxNumberOfHouses
Of course, if you have your LIKE IN parameters in another table or you gather it through a query, you can replace that in line (**)
I believe I can clarify what he is looking for, but I don't know the answer. I'll use my situation to demonstrate. I have a table with a column called "Query" that holds SQL queries. These queries sometimes contain table names from one of my databases. I need to find all Query rows that contain table names from a particular database. So, I can use the following code to get the table names:
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
I'm trying to use a WHERE IN clause to identify the Query rows that contain the table names I'm interested in:
SELECT *
FROM [DatasourceQuery]
WHERE Query IN LIKE
(
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
)
I believe the OP is trying to do something like that.
This is my way:
First create a table function:
create function [splitDelimeter](@str nvarchar(max), @delimeter nvarchar(10)='*')
returns @r table(val nvarchar(max))
as
begin
declare @x nvarchar(max)=@str
set @x='<m>'+replace(@x, @delimeter, '</m><m>')+'</m>'
declare @xx xml=cast(@x as xml)
insert @r(val)
SELECT Tbl.Col.value('.', 'nvarchar(max)') id
FROM @xx.nodes('/m') Tbl(Col)
return
end
Then split the search text with your preferred delimiter. After that you can do your select with a left join as below:
declare @s nvarchar(max)='% Main Street*foo %'
select a.* from street a
left join gen.splitDelimeter(@s, '*') b
on a.streetname like b.val
where val is not null
What I did when solving a similar problem was:
SELECT DISTINCT S.*
FROM Street AS S
JOIN (SELECT value FROM String_Split('% Main Street,foo %', N',')) T
ON S.StreetName LIKE T.value;
Which is functionally similar to Martin's answer but a more direct answer to the question.
Note: DISTINCT is used because you might get multiple matches for a single row.
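If the DISTINCT bothers you, one alternative is an EXISTS test, which returns each street at most once no matter how many patterns it matches. This is just a sketch: the @Street and @Patterns table variables and their rows are made up for illustration, and in practice the patterns would come from your own query.

```sql
-- Hypothetical sample data.
DECLARE @Street TABLE (StreetName VARCHAR(50));
INSERT @Street VALUES ('12 Main Street'), ('foo bar'), ('Elm Avenue');

DECLARE @Patterns TABLE (Pattern VARCHAR(50));
INSERT @Patterns VALUES ('% Main Street'), ('foo %'), ('%Street');

-- '12 Main Street' matches two patterns but is returned only once.
SELECT s.*
FROM @Street AS s
WHERE EXISTS (SELECT 1
              FROM @Patterns AS p
              WHERE s.StreetName LIKE p.Pattern);
```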