SQL: Splitting a column into multiple words to search user input - sql

I want to compare the individual words from the user input to individual words from a column in my table.
For example, consider these rows in my table:
ID Name
1 Jack Nicholson
2 Henry Jack Blueberry
3 Pontiac Riddleson Jack
Consider that the user's input is 'Pontiac Jack'. I want to assign weights/ranks for each match, so I can't use a blanket LIKE (WHERE Name LIKE @SearchString).
If Pontiac is present in any row, I want to award it 10 points. Each match for Jack gets another 10 points, etc. So row 3 would get 20 points, and rows 1 and 2 get 10.
I have split the user input into individual words, and stored them into a temporary table #SearchWords(Word).
But I can't figure out a way to have a SELECT statement that allows me to combine this. Maybe I'm going about this the wrong way?
Cheers,
WT

For SQL Server, try this:
SELECT Word, COUNT(Word) * 10 AS WordCount
FROM SourceTable
INNER JOIN SearchWords ON CHARINDEX(SearchWords.Word, SourceTable.Name) > 0
GROUP BY Word

What about this? (this is MySQL syntax, I think you only have to replace the CONCAT and do it with +)
SELECT names.id, count(searchwords.word) FROM names, searchwords WHERE names.name LIKE CONCAT('%', searchwords.word, '%') GROUP BY names.id
Then you would have a SQL result with the ID of the names-table and count of the words that match to that id.

You could do it via a common table expression that works out the weighting. For example:
--** Set up the example tables and data
DECLARE @Name TABLE (id INT IDENTITY, name VARCHAR(50));
DECLARE @SearchWords TABLE (word VARCHAR(50));
INSERT INTO @Name
(name)
VALUES ('Jack Nicholson')
,('Henry Jack Blueberry')
,('Pontiac Riddleson Jack')
,('Fred Bloggs');
INSERT INTO @SearchWords
(word)
VALUES ('Jack')
,('Pontiac');
--** Example SELECT with @Name selected and ordered by words in @SearchWords
WITH Order_CTE (weighting, id)
AS (
    SELECT COUNT(*) AS weighting
    , id
    FROM @Name AS n
    JOIN @SearchWords AS sw
    ON n.name LIKE '%' + sw.word + '%'
    GROUP BY id
)
SELECT n.name
, cte.weighting
FROM @Name AS n
JOIN Order_CTE AS cte
ON n.id = cte.id
ORDER BY cte.weighting DESC;
Using this technique, you can also assign a value to each search word if you want to, so you could make Jack more valuable than Pontiac. This would look something like this:
--** Set up the example tables and data
DECLARE @Name TABLE (id INT IDENTITY, name VARCHAR(50));
DECLARE @SearchWords TABLE (word VARCHAR(50), value INT);
INSERT INTO @Name
(name)
VALUES ('Jack Nicholson')
,('Henry Jack Blueberry')
,('Pontiac Riddleson Jack')
,('Fred Bloggs');
--** Set up search words with associated value
INSERT INTO @SearchWords
(word, value)
VALUES ('Jack',10)
,('Pontiac',20)
,('Bloggs',40);
--** Example SELECT with @Name selected and ordered by words and values in @SearchWords
WITH Order_CTE (weighting, id)
AS (
    SELECT SUM(sw.value) AS weighting
    , id
    FROM @Name AS n
    JOIN @SearchWords AS sw
    ON n.name LIKE '%' + sw.word + '%'
    GROUP BY id
)
SELECT n.name
, cte.weighting
FROM @Name AS n
JOIN Order_CTE AS cte
ON n.id = cte.id
ORDER BY cte.weighting DESC;

Seems to me that the best thing to do would be to maintain a separate table with all the individual words. Eg:
ID Word FK_ID
1 Jack 1
2 Nicholson 1
3 Henry 2
(etc)
This table would be kept up to date with triggers, and you'd have a non-clustered index on 'Word', 'FK_ID'. Then the SQL to produce your weightings would be simple and efficient.
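A minimal sketch of what that could look like on SQL Server. The names NameWords and IX_NameWords_Word are made up for illustration, the sample rows are typed in by hand, and the triggers that would keep the word table in sync are not shown.

```sql
-- Hypothetical word table: one row per word per source row.
CREATE TABLE NameWords (
    ID    INT IDENTITY(1,1) PRIMARY KEY,
    Word  VARCHAR(50) NOT NULL,
    FK_ID INT NOT NULL              -- ID of the row in the names table
);
CREATE NONCLUSTERED INDEX IX_NameWords_Word ON NameWords (Word, FK_ID);

-- Words for the three rows from the question.
INSERT INTO NameWords (Word, FK_ID)
VALUES ('Jack', 1), ('Nicholson', 1),
       ('Henry', 2), ('Jack', 2), ('Blueberry', 2),
       ('Pontiac', 3), ('Riddleson', 3), ('Jack', 3);

-- Search words, stored in a temp table as in the question.
CREATE TABLE #SearchWords (Word VARCHAR(50));
INSERT INTO #SearchWords (Word) VALUES ('Pontiac'), ('Jack');

-- 10 points per matched word; the equality join can use the index,
-- unlike LIKE '%word%'.
SELECT nw.FK_ID, COUNT(*) * 10 AS Score
FROM NameWords AS nw
JOIN #SearchWords AS sw ON sw.Word = nw.Word
GROUP BY nw.FK_ID
ORDER BY Score DESC;
```

With these sample rows, row 3 scores 20 and rows 1 and 2 score 10, which is the ranking the question asks for. Note that this matches whole words only, whereas the LIKE-based answers also match partial words.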

How about something like this....
Select id, MAX(names.name), count(id)*10 from names
inner join #SearchWords as sw on
names.name like '%'+sw.word+'%'
group by id
This assumes that the table of names is called "names".

Related

SQL-using if exist in matching two columns in table

I'm trying to use keywords like detergent, soap, dish, etc. to match two columns in my SQL table. If a keyword matches in both columns, I want another column that says it is matched. I am planning to use IF EXISTS, but I do not know the proper syntax.
sample column:
Column1              Column2
-----------------------------------------------
detergent powder     all powder detergent
dish washing liquid  dish liquid for washing
hand soap            hand liquid soap
Here is the simplest solution to your question. The trick is in the "virtual" column, aliased as Match, that we create in the select statement. This column is computed using a CASE expression to see if the search term appears in both of the columns. Note that we need to use the LIKE operator with the % wildcard.
create table Example (Column1 varchar(max), Column2 varchar(max));
insert into Example select 'detergent powder', 'all powder detergent';
insert into Example select 'dish washing liquid', 'dish liquid for washing' ;
insert into Example select 'hand soap', 'hand liquid soap';
declare @search varchar(20) = 'detergent';
select Column1,
       Column2,
       case when Column1 like '%' + @search + '%' and
                 Column2 like '%' + @search + '%'
            then 'matched'
            else 'not matched' end as [Match]
from Example;
We could also create the Match column as a "real" column in the table and modify this script slightly to update that column based on the same criteria.
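A rough sketch of that variant, assuming the Example table from the script above already exists. The column size varchar(20) is an arbitrary choice, and GO is the batch separator used by SSMS and sqlcmd, needed here so the new column exists before the UPDATE is compiled.

```sql
-- Add a real column to hold the result (size chosen arbitrarily).
ALTER TABLE Example ADD [Match] varchar(20) NULL;
GO

DECLARE @search varchar(20) = 'detergent';

-- Same criteria as the SELECT version, but stored in the table.
UPDATE Example
SET [Match] = CASE WHEN Column1 LIKE '%' + @search + '%'
                    AND Column2 LIKE '%' + @search + '%'
                   THEN 'matched'
                   ELSE 'not matched'
              END;

SELECT Column1, Column2, [Match] FROM Example;
```

A stored column goes stale when the data or the search term changes, so the UPDATE would need to be rerun in that case.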
Here's an example that checks if any of the 3 words appears in both columns.
Sample data:
CREATE TABLE Test (
Id INT IDENTITY(1,1) PRIMARY KEY,
Col1 VARCHAR(100),
Col2 VARCHAR(100)
);
INSERT INTO Test (Col1, Col2) VALUES
('detergent powder', 'all powder detergent'),
('dish washing liquid', 'dish liquid for washing'),
('hand soap', 'hand liquid soap'),
('soap dish', 'detergent');
Query:
SELECT t.*
, cast(
case
when exists (
select 1
from (values ('soap'),('detergent'),('dish')) s(search)
join (values (Col1),(Col2)) c(col)
on c.col like '%'+s.search+'%'
group by s.search
having count(*) = 2
) then 1 else 0 end as bit) as hasMatch
FROM Test t;
EXISTS checks whether a query returns at least one row.
The HAVING clause requires 2 matches for a search word, i.e. one in each column.
But it can also be done without that GROUP BY & HAVING clause:
SELECT t.*
, cast(case when exists (
select 1
from (values ('soap'),('detergent'),('dish')) s(search)
where Col1 like '%'+s.search+'%'
and Col2 like '%'+s.search+'%'
) then 1 else 0 end as bit) as hasMatch
FROM Test t;

What is the best way to join two tables that have comma-separated columns

Table1
ID Name Tags
----------------------------------
1 Customer1 Tag1,Tag5,Tag4
2 Customer2 Tag2,Tag6,Tag4,Tag11
3 Customer5 Tag6,Tag5,Tag10
and Table2
ID Name Tags
----------------------------------
1 Product1 Tag1,Tag10,Tag6
2 Product2 Tag2,Tag1,Tag5
3 Product5 Tag1,Tag2,Tag3
What is the best way to join Table1 and Table2 on the Tags column?
For each comma-separated tag in the Tags column of Table1, it should look in the comma-separated Tags column of Table2.
Note: Tables are not full-text indexed.
The best way is not to have comma separated values in a column. Just use normalized data and you won't have trouble with querying like this - each column is supposed to only have one value.
Without this, there's really no way to use any indices. Even a full-text index behaves quite differently from what you might think, and they are inherently clunky to use: they're designed for searching text, not meaningful data. In the end, you will not get much better than something like
where (Col like 'txt,%' or Col like '%,txt' or Col like '%,txt,%')
Using an XML column might be another alternative, though it's still a bit silly. It would at least allow you to treat the values as a collection, though.
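To make the normalized alternative concrete, here is a small sketch. The tables CustomerTag and ProductTag and the index name are hypothetical, and only some of the sample tags from the question are loaded.

```sql
-- Hypothetical normalized layout: one row per (owner, tag) pair.
CREATE TABLE CustomerTag (
    CustomerID INT NOT NULL,
    Tag        VARCHAR(50) NOT NULL,
    PRIMARY KEY (CustomerID, Tag)
);
CREATE TABLE ProductTag (
    ProductID INT NOT NULL,
    Tag       VARCHAR(50) NOT NULL,
    PRIMARY KEY (ProductID, Tag)
);
CREATE INDEX IX_ProductTag_Tag ON ProductTag (Tag);

INSERT INTO CustomerTag (CustomerID, Tag)
VALUES (1, 'Tag1'), (1, 'Tag5'), (1, 'Tag4'),
       (3, 'Tag6'), (3, 'Tag5'), (3, 'Tag10');
INSERT INTO ProductTag (ProductID, Tag)
VALUES (1, 'Tag1'), (1, 'Tag10'), (1, 'Tag6'),
       (2, 'Tag2'), (2, 'Tag1'),  (2, 'Tag5');

-- Customer/product pairs that share at least one tag,
-- with the number of tags they have in common.
SELECT ct.CustomerID, pt.ProductID, COUNT(*) AS SharedTags
FROM CustomerTag AS ct
JOIN ProductTag  AS pt ON pt.Tag = ct.Tag
GROUP BY ct.CustomerID, pt.ProductID;
```

Because the join is a plain equality on Tag, the optimizer can use ordinary indexes, which is not possible with LIKE patterns over a delimited string.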
I don't think there will ever be an easy and efficient solution to this. As Luaan pointed out, it is a very bad idea to store data like this: you lose most of the power of SQL when you squeeze what should be individual units of data into a single cell.
But you can manage this at the slight cost of creating two user-defined functions. First, use this brilliant recursive technique to split the strings into individual rows based on your delimiter:
CREATE FUNCTION dbo.TestSplit (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
        SELECT 1, 1, CHARINDEX(@sep, @s)
        UNION ALL
        SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
        FROM Pieces
        WHERE stop > 0
    )
    SELECT pn AS SplitIndex,
           SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS SplitPart
    FROM Pieces
)
Then, make a function that takes two strings and counts the matches:
CREATE FUNCTION dbo.MatchTags (@a varchar(512), @b varchar(512))
RETURNS INT
AS
BEGIN
    RETURN
    (SELECT COUNT(*)
     FROM dbo.TestSplit(',', @a) a
     INNER JOIN dbo.TestSplit(',', @b) b
     ON a.SplitPart = b.SplitPart)
END
And that's it. Here is a test run with table variables:
DECLARE @A TABLE (Name VARCHAR(20), Tags VARCHAR(100))
DECLARE @B TABLE (Name VARCHAR(20), Tags VARCHAR(100))
INSERT INTO @A ( Name, Tags )
VALUES
( 'Customer1','Tag1,Tag5,Tag4'),
( 'Customer2','Tag2,Tag6,Tag4,Tag11'),
( 'Customer5','Tag6,Tag5,Tag10')
INSERT INTO @B ( Name, Tags )
VALUES
( 'Product1','Tag1,Tag10,Tag6'),
( 'Product2','Tag2,Tag1,Tag5'),
( 'Product5','Tag1,Tag2,Tag3')
SELECT * FROM @A a
INNER JOIN @B b ON dbo.MatchTags(a.Tags, b.Tags) > 0
I developed a solution as follows:
CREATE TABLE [dbo].[Table1](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table2](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
Get sample data for Table1 (this will insert 28000 records):
INSERT INTO Table1
SELECT CustomerID,CompanyName, (FirstName + ',' + LastName)
FROM AdventureWorks.SalesLT.Customer
GO 3
Sample data for Table2; I need the same tags in Table2:
declare @tag1 nvarchar(50) = 'Donna,Carreras'
declare @tag2 nvarchar(50) = 'Johnny,Caprio'
Get sample data for Table2 (this will insert 9735 records):
INSERT INTO Table2
SELECT ProductID,Name, (case when(right(ProductID,1)>=5) then @tag1 else @tag2 end)
FROM AdventureWorks.SalesLT.Product
GO 3
My solution:
create TABLE #dt (
Id int IDENTITY(1,1) PRIMARY KEY,
Tag nvarchar(250) NOT NULL
);
I created a temp table and fill it with the distinct Tag values from Table1:
insert into #dt(Tag)
SELECT distinct Tag
FROM Table1
Now I need a vertical table (one tag per row) for the tags:
create TABLE #Tags ( Tag nvarchar(250) NOT NULL );
Now I fill the #Tags table with a WHILE loop (you can use a cursor, but WHILE is faster):
declare @Rows int = 1
declare @Tag nvarchar(1024)
declare @Id int = 0
WHILE @Rows>0
BEGIN
    -- ORDER BY Id makes sure the loop walks #dt in Id order and skips no rows
    Select Top 1 @Tag=Tag,@Id=Id from #dt where Id>@Id order by Id
    set @Rows =@@ROWCOUNT
    if @Rows>0
    begin
        insert into #Tags(Tag) SELECT Data FROM dbo.StringToTable(@Tag, ',')
    end
END
Last step: join Table2 with #Tags:
select distinct t.*
from Table2 t
inner join #Tags on (',' + t.Tag + ',') like ('%,' + #Tags.Tag + ',%')
With 28000 rows in Table1 and 9735 rows in Table2, the SELECT takes less than 2 seconds.
I use this kind of solution with tree paths. First put a comma at the very beginning and at the very end of the string. Then you can call
Where col1 like '%,' || col2 || ',%'
Some databases can also use an index on the column for LIKE (Postgres does it partially), so it can be efficient as well. I don't know about SQL Server.
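For SQL Server, where string concatenation uses + instead of ||, the same comma-wrapping idea might look like the sketch below. The table variable @Tagged and the variable @tag are illustrative names, not part of the original question.

```sql
-- Illustrative sample data in a table variable.
DECLARE @Tagged TABLE (Name VARCHAR(20), Tags VARCHAR(100));
INSERT INTO @Tagged (Name, Tags)
VALUES ('Customer1', 'Tag1,Tag5,Tag4'),
       ('Customer2', 'Tag2,Tag6,Tag4,Tag11');

DECLARE @tag VARCHAR(20) = 'Tag1';

-- Wrapping both sides in commas matches whole list items only:
-- 'Tag1' matches Customer1 but not the 'Tag11' stored for Customer2.
SELECT Name
FROM @Tagged
WHERE ',' + Tags + ',' LIKE '%,' + @tag + ',%';
```

This assumes the tags contain no LIKE wildcard characters (%, _ or [); if they might, the search value would need escaping first.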

find substrings in sql table using sql query

I have a "name" column in my SQL table. In my SQL query I want to fetch all the records where the "name" column is a substring of my input string.
For example, if the user enters "My name is David", then I want to fetch all the records where name is David.
P.S.: The user may also enter something like "Its David here".
If anyone knows, please let me know. Thank you.
A simple view of this would be:
DECLARE @x VARCHAR(255)
SET @x = 'My name is David'
SELECT a FROM tablex WHERE @x LIKE '%' + tablex.name + '%'
This reverses @shortspider's response.
Your question is a bit unclear, but if you have a name column you can find it as part of any string using the CHARINDEX function:
Example:
DECLARE @TABLE TABLE (ID INT IDENTITY(1,1), NAME VARCHAR(100))
INSERT INTO @TABLE(NAME)
SELECT 'DAVID' UNION ALL
SELECT 'GOLIATH' UNION ALL
SELECT 'DAVE' UNION ALL
SELECT 'MARTIN'
SELECT *
FROM @TABLE
WHERE CHARINDEX(NAME,'DID YOU EVER READ THE STORY ABOUT DAVID AND GOLIATH?') > 0
SELECT *
FROM @TABLE
WHERE CHARINDEX(NAME,'MY FAVOURITE MOVIE DIRECTOR IS MARTIN SCORCESE. I LOVE HIS GLASSES.') > 0
Try: SELECT * FROM table_name WHERE name LIKE '%David%'

How to compare two sub queries in one sql statement

I have a table tbl_Country, which contains columns called ID and Name. The Name column has multiple country names separated by comma, I want the id when I pass multiple country names to compare with Name column values. I am splitting the country names using a function - the sample query looks like this:
@country varchar(50)
SELECT *
FROM tbl_Country
WHERE (SELECT *
FROM Function(@Country)) IN (SELECT *
FROM Function(Name))
tbl_country
ID Name
1 'IN,US,UK,SL,NZ'
2 'IN,PK,SA'
3 'CH,JP'
parameter @country = 'IN,SA'
I need to get
ID
1
2
NOTE: The Function will split the string into a datatable
Try this
SELECT * FROM tbl_Country C
LEFT JOIN tbl_Country C1 ON C1.Name=C.Country
Try this:
SELECT *
FROM tbl_Country C
WHERE ',' + @country + ',' LIKE '%,' + C.Name + ',%';
Basically, by specifying multiple values in a single column, you are violating the 1st NF. Therefore, the following might not be a good approach but provides the solution that you are looking for:
declare @country varchar(50)= 'IN,SA'
declare @counterend int
declare @counterstart int =1
declare @singleCountry varchar(10)
set @counterend = (select COUNT(*) from fnSplitStringList(@country))
create table #temp10(
id int
,name varchar(50))
while @counterstart<= @counterend
begin
    ;with cte as (
        select stringliteral country
        , ROW_NUMBER() over (order by stringliteral) countryseq
        from fnSplitStringList(@country))
    select @singleCountry = (select country FROM cte where countryseq=@counterstart)
    insert into #temp10(id, name)
    select * from tbl_country t1
    where not exists (select id from #temp10 t2 where t1.id=t2.id)
    and name like '%' + @singleCountry +'%'
    set @counterstart= @counterstart+1
end
select * from #temp10
begin drop table #temp10 end
How it works: It splits the passed string and ranks it. Afterwards, it loops through all the records for every single Value(country) produced and inserts them into temptable.
try this,
select a.id FROM tbl_Country a inner join
(SELECT country FROM dbo.Function(@Country)) b on a.name=b.country
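As a set-based alternative to the loop shown earlier, the sketch below splits both the parameter and the Name column and keeps rows with at least one country in common. It assumes SQL Server 2016 or later (database compatibility level 130+), where the built-in STRING_SPLIT function is available in place of the custom split function; the table variable @tbl_Country stands in for the real table.

```sql
-- Stand-in for tbl_Country with the sample rows from the question.
DECLARE @tbl_Country TABLE (ID INT, Name VARCHAR(50));
INSERT INTO @tbl_Country (ID, Name)
VALUES (1, 'IN,US,UK,SL,NZ'),
       (2, 'IN,PK,SA'),
       (3, 'CH,JP');

DECLARE @country VARCHAR(50) = 'IN,SA';

-- Split each row's Name list, then keep rows where any listed
-- country also appears in the split parameter.
SELECT DISTINCT c.ID
FROM @tbl_Country AS c
CROSS APPLY STRING_SPLIT(c.Name, ',') AS listed
WHERE listed.value IN (SELECT value FROM STRING_SPLIT(@country, ','));
```

For the sample data this returns IDs 1 and 2, matching the expected output in the question. DISTINCT is needed because row 2 matches on both IN and SA.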

Update specific columns in a table iteratively (Do a bulk update)

My Table Schema is as follows:
Gender: char(1), not null
Last Name: varchar(25), null
First Name: varchar(35), not null
The data in the table looks like:
Gender | Last Name | First Name |
M Doe John
F Marie Jane
M Jones Jameson
F Simpson Alice
I now am trying to update all the names in the table from the names present in the txt file.
My Query is as follows:
-- Sort out the Forenames we'll be using for the data, we make a #Name2 table because I have yet to figure our
-- inserting specific columns using BULK INSERT and without using a format file.
CREATE TABLE #Name (Name VARCHAR(50))
CREATE TABLE #ForeNames (FirstName VARCHAR(50), Gender VARCHAR(1))
-- Move data in the #Name2 table
BULK INSERT #Name FROM "c:\girlsforenames.txt" WITH (ROWTERMINATOR='\n')
-- Now move it to the forename table and add the gender
INSERT INTO #ForeNames SELECT [Name], 'F' FROM #Name
-- Delete the names from temporary table
TRUNCATE TABLE #Name
-- Same for the boys
BULK INSERT #Name FROM "c:\boysforenames.txt" WITH (ROWTERMINATOR='\n')
INSERT INTO #ForeNames SELECT [Name], 'M' FROM #Name
-- Now do the surnames
TRUNCATE TABLE #Name
BULK INSERT #Name FROM "c:\surnames.txt" WITH (ROWTERMINATOR='\n')
DECLARE @Counter BIGINT
SET @Counter = 4
WHILE (@Counter > 0)
BEGIN
    UPDATE TableName
    set
    [last_name]= (SELECT TOP 1 FirstName from #ForeNames),
    [first_name]=(SELECT TOP 1 Name FROM #Name ORDER BY NEWID()),
    [gender]= ( SELECT TOP 1 Gender FROM #ForeNames ORDER BY NEWID());
    SET @Counter=@Counter-1
END
DROP TABLE #Name
DROP TABLE #ForeNames
SELECT * FROM TableName
What happens is that all the rows in the table are updated with the same values, and each time I execute the query they are updated with a new set of values.
What I want is to loop through each row, update it, and then update the next row with a different random name. But here the same random name is applied to every row of the table.
Any help would be appreciated.
Each SELECT statement is only being executed once in your example (and thus returning 1 result), and since your UPDATE isn't being limited, you're applying the same value to every row.
If you want to update each row with different values, you can use a CTE and the ROW_NUMBER() function to pair each row with its own value.
There's no need to loop, you can do it in one fell swoop:
WITH cte AS (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS n1
FROM TableName
)
UPDATE cte
SET FirstName = names.Name
FROM cte
JOIN (SELECT *,ROW_NUMBER() OVER (ORDER BY NEWID()) AS n2
FROM #name
)names
on cte.n1 = names.n2
This example is just for the FirstName.