What is the best way to join two tables that have comma-separated columns? - sql

Table1
ID Name Tags
----------------------------------
1 Customer1 Tag1,Tag5,Tag4
2 Customer2 Tag2,Tag6,Tag4,Tag11
3 Customer5 Tag6,Tag5,Tag10
and Table2
ID Name Tags
----------------------------------
1 Product1 Tag1,Tag10,Tag6
2 Product2 Tag2,Tag1,Tag5
3 Product5 Tag1,Tag2,Tag3
What is the best way to join Table1 and Table2 on the Tags column?
Each comma-separated tag in the Tags column of Table1 should be looked up among the comma-separated tags in the Tags column of Table2.
Note: Tables are not full-text indexed.

The best way is not to have comma-separated values in a column. Use normalized data and you won't have trouble with queries like this: each column is supposed to hold only one value.
Without this, there's really no way to use any indexes. Even a full-text index behaves quite differently from what you might think, and full-text indexes are inherently clunky to use: they're designed for searching text, not meaningful data. In the end, you will not get much better than something like
where (Col = 'txt' or Col like 'txt,%' or Col like '%,txt' or Col like '%,txt,%')
Using an XML column might be another alternative, though it's still a bit silly. It would at least allow you to treat the values as a collection.
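To make the "use normalized data" advice concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names CustomerTags and ProductTags and the index name are assumptions made up for this illustration; they are not part of the question. With one row per tag, the join becomes a plain equality join that an index can support.

```python
import sqlite3

# Hypothetical normalized layout: one row per (name, tag) pair instead of a
# comma-separated Tags column. Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerTags (CustomerName TEXT NOT NULL, Tag TEXT NOT NULL);
    CREATE TABLE ProductTags  (ProductName  TEXT NOT NULL, Tag TEXT NOT NULL);
    CREATE INDEX IX_ProductTags_Tag ON ProductTags (Tag);
""")

# Sample data from the question, split into individual tags on load.
customers = {
    "Customer1": "Tag1,Tag5,Tag4",
    "Customer2": "Tag2,Tag6,Tag4,Tag11",
    "Customer5": "Tag6,Tag5,Tag10",
}
products = {
    "Product1": "Tag1,Tag10,Tag6",
    "Product2": "Tag2,Tag1,Tag5",
    "Product5": "Tag1,Tag2,Tag3",
}
conn.executemany(
    "INSERT INTO CustomerTags VALUES (?, ?)",
    [(name, tag) for name, tags in customers.items() for tag in tags.split(",")],
)
conn.executemany(
    "INSERT INTO ProductTags VALUES (?, ?)",
    [(name, tag) for name, tags in products.items() for tag in tags.split(",")],
)

# Customers and products that share at least one tag: an ordinary equality join.
matches = conn.execute("""
    SELECT DISTINCT c.CustomerName, p.ProductName
    FROM CustomerTags AS c
    JOIN ProductTags AS p ON p.Tag = c.Tag
    ORDER BY c.CustomerName, p.ProductName
""").fetchall()

for customer, product in matches:
    print(customer, product)
```

The same two-table layout carries over to SQL Server; only the setup code around it is specific to this sketch.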

I don't think there will ever be an easy and efficient solution to this. As Luaan pointed out, it is a very bad idea to store data like this: you lose most of the power of SQL when you squeeze what should be individual units of data into a single cell.
But you can manage this at the slight cost of creating two user-defined functions. First, use this brilliant recursive technique to split the strings into individual rows based on your delimiter:
CREATE FUNCTION dbo.TestSplit (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
        SELECT 1, 1, CHARINDEX(@sep, @s)
        UNION ALL
        SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
        FROM Pieces
        WHERE stop > 0
    )
    SELECT pn AS SplitIndex,
           SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS SplitPart
    FROM Pieces
)
Then, make a function that takes two strings and counts the matches:
CREATE FUNCTION dbo.MatchTags (@a varchar(512), @b varchar(512))
RETURNS INT
AS
BEGIN
    RETURN
    (SELECT COUNT(*)
     FROM dbo.TestSplit(',', @a) a
     INNER JOIN dbo.TestSplit(',', @b) b
     ON a.SplitPart = b.SplitPart)
END
And that's it. Here is a test run with table variables:
DECLARE @A TABLE (Name VARCHAR(20), Tags VARCHAR(100))
DECLARE @B TABLE (Name VARCHAR(20), Tags VARCHAR(100))
INSERT INTO @A ( Name, Tags )
VALUES
( 'Customer1','Tag1,Tag5,Tag4'),
( 'Customer2','Tag2,Tag6,Tag4,Tag11'),
( 'Customer5','Tag6,Tag5,Tag10')
INSERT INTO @B ( Name, Tags )
VALUES
( 'Product1','Tag1,Tag10,Tag6'),
( 'Product2','Tag2,Tag1,Tag5'),
( 'Product5','Tag1,Tag2,Tag3')
SELECT * FROM @A a
INNER JOIN @B b ON dbo.MatchTags(a.Tags, b.Tags) > 0
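For readers who want to check the logic outside the database, the split-then-count idea can be sketched in a few lines of Python. The names split_tags and match_tags are hypothetical stand-ins for the two functions above, not an exact port: Python string comparison is case-sensitive, whereas the T-SQL comparison follows the column collation.

```python
def split_tags(tags, sep=","):
    """Return the individual parts of a delimited string (the splitting step)."""
    return tags.split(sep)


def match_tags(a, b):
    """Count pairs of equal parts, mirroring a COUNT(*) over an inner join."""
    return sum(1 for x in split_tags(a) for y in split_tags(b) if x == y)


# Customer1 vs. Product2 share Tag1 and Tag5.
print(match_tags("Tag1,Tag5,Tag4", "Tag2,Tag1,Tag5"))
# Customer5 vs. Product5 share nothing, so the join would drop this pair.
print(match_tags("Tag6,Tag5,Tag10", "Tag1,Tag2,Tag3"))
```

Because the count comes from a join, a tag repeated in either list is counted once per matching pair, which is harmless for the "> 0" test used in the query.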

I developed a solution as follows:
CREATE TABLE [dbo].[Table1](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table2](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
Get sample data for Table1; this inserts 28000 records:
INSERT INTO Table1
SELECT CustomerID,CompanyName, (FirstName + ',' + LastName)
FROM AdventureWorks.SalesLT.Customer
GO 3
Sample data for Table2. I need the same tags in Table2:
declare @tag1 nvarchar(50) = 'Donna,Carreras'
declare @tag2 nvarchar(50) = 'Johnny,Caprio'
Get sample data for Table2; this inserts 9735 records:
INSERT INTO Table2
SELECT ProductID,Name, (case when(right(ProductID,1)>=5) then @tag1 else @tag2 end)
FROM AdventureWorks.SalesLT.Product
GO 3
My Solution
create TABLE #dt (
Id int IDENTITY(1,1) PRIMARY KEY,
Tag nvarchar(250) NOT NULL
);
I've created a temp table, which I fill with the distinct Tag values from Table1:
insert into #dt(Tag)
SELECT distinct Tag
FROM Table1
Now I need a vertical table for the tags (one tag per row):
create TABLE #Tags ( Tag nvarchar(250) NOT NULL );
Now I fill the #Tags table with a WHILE loop. You could use a cursor, but WHILE is faster:
declare @Rows int = 1
declare @Tag nvarchar(1024)
declare @Id int = 0
WHILE @Rows > 0
BEGIN
    -- ORDER BY Id makes sure the loop walks #dt in key order and skips nothing
    SELECT TOP 1 @Tag = Tag, @Id = Id FROM #dt WHERE Id > @Id ORDER BY Id
    SET @Rows = @@ROWCOUNT
    IF @Rows > 0
    BEGIN
        INSERT INTO #Tags (Tag) SELECT Data FROM dbo.StringToTable(@Tag, ',')
    END
END
Last step: join Table2 with #Tags:
select distinct t.*
from Table2 t
inner join #Tags on (',' + t.Tag + ',') like ('%,' + #Tags.Tag + ',%')
Row counts: Table1 = 28000, Table2 = 9735. The select takes less than 2 seconds.

I use this kind of solution for tree paths. First put a comma at the very beginning and at the very end of the string. Then you can use
Where col1 like '%,' || col2 || ',%'
Some databases can also use an index for this kind of LIKE (Postgres does so partially), so it can be efficient too. I don't know about SQL Server.
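The reason for wrapping both sides in commas is easy to miss, so here is a small sketch of it. It runs the same `||` expression through Python's built-in sqlite3 module (an assumption made for convenience; the helper names naive_match and delimited_match are invented for this example). Without the delimiters, 'Tag1' also matches inside 'Tag11'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def naive_match(tags, tag):
    # Plain substring search: 'Tag1' is also found inside 'Tag11'.
    sql = "SELECT ? LIKE ('%' || ? || '%')"
    return conn.execute(sql, (tags, tag)).fetchone()[0] == 1


def delimited_match(tags, tag):
    # Wrap the list and the tag in commas so only whole items can match.
    sql = "SELECT (',' || ? || ',') LIKE ('%,' || ? || ',%')"
    return conn.execute(sql, (tags, tag)).fetchone()[0] == 1


tags = "Tag2,Tag6,Tag4,Tag11"
print(naive_match(tags, "Tag1"))       # false positive
print(delimited_match(tags, "Tag1"))   # no whole item equals Tag1
print(delimited_match(tags, "Tag11"))  # last item still matches
```

One caveat: if a tag itself contains % or _, LIKE treats those characters as wildcards, so such values would need escaping.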

Related

Check if a temp table exists when I only know part of the name?

I have a function for checking if certain tables exist in my database, using part of the table name as a key to match (my table naming conventions include unique table name prefixes). It uses a select statement as below, where @TablePrefix is a parameter to the function and contains the first few characters of the table name:
DECLARE @R bit;
SELECT @R = COUNT(X.X)
FROM (
    SELECT TOP(1) 1 X FROM sys.tables WHERE [name] LIKE @TablePrefix + '%'
) AS X;
RETURN @R;
My question is, how can I extend this function to work for #temp tables too?
I have tried checking the first char of the name for # then using the same logic to select from tempdb.sys.tables, but this seems to have a fatal flaw - it returns a positive result when any temp table exists with a matching name, even if not created by the current session - and even if created by SPs in a different database. There does not seem to be any straightforward way to narrow the selection down to only those temp tables that exist in the context of the current session.
I cannot use the other method that seems universally to be suggested for checking temp tables - IF OBJECT_ID('tempdb..#temp1') IS NOT NULL - because that requires me to know the full name of the table, not just a prefix.
create table #abc(id bit);
create table #abc_(id bit);
create table #def__(id bit);
create table #xyz___________(id bit);
go
select distinct (left(t.name, n.r)) as tblname
from tempdb.sys.tables as t with(nolock)
cross join (select top(116) row_number() over(order by(select null)) as r from sys.all_objects with(nolock)) as n
where t.name like '#%'
and object_id('tempdb..'+left(t.name, n.r)) is not null;
drop table #abc;
drop table #abc_;
drop table #def__;
drop table #xyz___________;
Try something like this:
DECLARE @TablePrefix VARCHAR(50) = '#temp';
DECLARE @R BIT, @pre VARCHAR(50) = @TablePrefix + '%';
SELECT @R = CASE LEFT ( @pre, 1 )
    WHEN '#' THEN (
        SELECT CASE WHEN EXISTS ( SELECT * FROM tempdb.sys.tables WHERE [name] LIKE @pre ) THEN 1
               ELSE 0
        END )
    ELSE (
        SELECT CASE WHEN EXISTS ( SELECT * FROM sys.tables WHERE [name] LIKE @pre ) THEN 1
               ELSE 0
        END )
END;
SELECT @R AS TableExists;

Checking existence of all words of a column of table 1 in another column of table 2

I have a table that contains a product_name field, and another table with models.
===products
product_id, product_name
===models
model_id, model_name
I am looking for a way to do the following.
Model names can have words separated by hyphens, e.g. JVC-600-BLACK.
For each model I need to check that every word of the model name exists in the product name.
I need the result in a form like the one below.
== results
model_id, product_id
If someone can point me in the right direction, that would be a great help.
Notes
These are huge tables with millions of records, and the number of words in model_name is not fixed.
The words of a model may appear in any order in the product name, with other words in between.
Here's a function that splits the first string into parts using - as a delimiter and looks up each part in the second string, returning 1 if all parts were found and 0 otherwise.
CREATE FUNCTION dbo.func(@str1 varchar(max), @str2 varchar(max))
RETURNS BIT
AS
BEGIN
    DECLARE @pos INT, @newPos INT,
            @delimiter NCHAR(1)
    SET @delimiter = '-'
    SET @pos = 1
    SET @newPos = 0
    WHILE (@newPos < LEN(@str1))
    BEGIN
        SET @newPos = CHARINDEX(@delimiter, @str1, @pos)
        IF @newPos = 0
            SET @newPos = LEN(@str1)+1
        DECLARE @data2 NVARCHAR(MAX)
        SET @data2 = SUBSTRING(@str1, @pos, @newPos-@pos)
        IF CHARINDEX(@data2, @str2) = 0
            RETURN 0
        SET @pos = @newPos + 1
        IF @newPos = 0
            BREAK
    END
    RETURN 1
END
You can use the above function for your problem as follows:
SELECT model_id, product_id
FROM models
JOIN products
ON dbo.func(models.model_name, products.product_name) = 1
It's not going to be fast, but I don't think a fast solution exists, since your problem doesn't allow for indexing. It may be possible to change the database structure to allow for this, but how exactly this can be done largely depends on what your data looks like.
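As a quick way to sanity-check the matching rule before running it against millions of rows, the same "every part must appear somewhere in the product name" test can be sketched in Python. The function name all_parts_found is hypothetical, and this is not an exact port: Python's `in` is case-sensitive, while CHARINDEX follows the collation of the data.

```python
def all_parts_found(model_name, product_name, delimiter="-"):
    """Return True when every delimiter-separated part of model_name
    occurs as a substring of product_name, in any order."""
    return all(part in product_name for part in model_name.split(delimiter))


print(all_parts_found("abc1-def1-klm1", "sdfsd def1 abc1klm1 sdljkfd"))
print(all_parts_found("abc1-def1-klm1", "sdfsd def2 abc2klm2 sdljkfd"))
# Substring matching is loose: '600' is also found inside '6000'.
print(all_parts_found("JVC-600-BLACK", "JVC 6000 BLACK EDITION"))
```

The last call shows a limitation shared with the CHARINDEX approach: parts are matched as substrings, not as whole words, so false positives are possible.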
I don't know whether this solution is faster; you would have to test it if that matters to you:
--=======================
-- sample data
-- ======================
declare @Products table
(
    product_id int,
    product_name nvarchar(max)
)
insert into @Products select 1, 'sdfsd def1 abc1klm1 sdljkfd'
insert into @Products select 2, 'sdfsd def2 abc2klm2 sdljkfd'
insert into @Products select 3, 'sdfsd def3 abc3klm3 sdljkfd'
declare @Models table
(
    model_id int,
    model_name nvarchar(max)
)
insert into @Models select 1, 'abc1-def1-klm1'
insert into @Models select 2, 'abc2-def2-klm2'
insert into @Models select 3, 'abc3-def3-klm3'
--=======================
-- solution
-- ======================
select t1.product_id, t2.model_id from @Products t1
cross join (
    select
        t1.model_id, Word = t2.r.value('.', 'nvarchar(max)')
    from (select model_id, x = cast('<e>' + replace(model_name, '-', '</e><e>') + '</e>' as xml) from @Models ) t1
    cross apply x.nodes('e') as t2 (r)
) t2
group by product_id, model_id
having min(charindex(word, product_name)) != 0
You may want to consider using the Full Text Search feature of SQL Server. In a nutshell, it catalogs all of the words in the tables and columns you specify when setting up the Full Text Catalog (ignoring noise words like "and", "or", "a" and "the" by default, although this list of noise words is configurable) and offers a handful of functions that use that catalog to find rows quickly.

Remove a sentence from a paragraph that has a specific pattern with T-SQL

I have a large number of descriptions that can be anywhere from 5 to 20 sentences each. I am trying to put a script together that will locate and remove a sentence that contains a word with numbers before or after it.
before example: Hello world. Todays department has 345 employees. Have a good day.
after example: Hello world. Have a good day.
My main problem right now is identifying the violation.
Here "345 employees" is what causes the sentence to be removed. However, each description will have a different number and possibly a different variation of the word employee.
I would like to avoid having to create a table of all the different variations of employee.
This would make a good SQL Puzzle.
Disclaimer: there are probably TONS of edge cases that would blow this up
This would take a string, split it out into a table with a row for each sentence, then remove the rows that matched a condition, and then finally join them all back into a string.
CREATE FUNCTION dbo.fn_SplitRemoveJoin(@Val VARCHAR(2000), @FilterCond VARCHAR(100))
RETURNS VARCHAR(2000)
AS
BEGIN
    DECLARE @tbl TABLE (rid INT IDENTITY(1,1), val VARCHAR(2000))
    DECLARE @t VARCHAR(2000)
    -- Split into table @tbl
    WHILE CHARINDEX('.',@Val) > 0
    BEGIN
        SET @t = LEFT(@Val, CHARINDEX('.', @Val))
        INSERT @tbl (val) VALUES (@t)
        SET @Val = RIGHT(@Val, LEN(@Val) - LEN(@t))
    END
    IF (LEN(@Val) > 0)
        INSERT @tbl VALUES (@Val)
    -- Filter out condition
    DELETE FROM @tbl WHERE val LIKE @FilterCond
    -- Join back into 1 string
    DECLARE @i INT, @rv VARCHAR(2000)
    SET @i = 1
    WHILE @i <= (SELECT MAX(rid) FROM @tbl)
    BEGIN
        SELECT @rv = IsNull(@rv,'') + IsNull(val,'') FROM @tbl WHERE rid = @i
        SET @i = @i + 1
    END
    RETURN @rv
END
go
CREATE TABLE #TMP (rid INT IDENTITY(1,1), sentence VARCHAR(2000))
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 345 employees. Have a good day.')
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else')
SELECT
rid, sentence, dbo.fn_SplitRemoveJoin(sentence, '%[0-9] Emp%')
FROM #tmp t
returns
rid | sentence | |
1 | Hello world. Todays department has 345 employees. Have a good day. | Hello world. Have a good day.|
2 | Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else | Hello world. Have a good day. |
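The split, filter and join steps above can also be prototyped outside the database to try out filter patterns quickly. The following is a minimal Python sketch under a few assumptions: the function name remove_sentences is invented, sentences end only at a period (as in the UDF), and a regular expression stands in for the LIKE pattern '%[0-9] Emp%'.

```python
import re


def remove_sentences(text, pattern):
    """Split text after each period, drop the sentences that match pattern
    (case-insensitively), and join the remaining ones back together."""
    # Each piece keeps its trailing period; a final piece without one is kept too.
    sentences = re.findall(r"[^.]*\.|[^.]+$", text)
    kept = [s for s in sentences if not re.search(pattern, s, re.IGNORECASE)]
    return "".join(kept)


sample = "Hello world. Todays department has 345 employees. Have a good day."
print(remove_sentences(sample, r"[0-9] emp"))
```

As with the T-SQL version, abbreviations and decimal numbers contain periods and would be split in the wrong place.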
I've used the split/remove/join technique as well.
The main points are:
This uses a pair of recursive CTEs, rather than a UDF.
This will work with all English sentence endings: . or ! or ?
This removes whitespace to make the comparison for "digit then employee" so you don't have to worry about multiple spaces and such.
Here's the SqlFiddle demo, and the code:
-- Split descriptions into sentences (could use period, exclamation point, or question mark)
-- Delete any sentences that, without whitespace, are like '%[0-9]employ%'
-- Join sentences back into descriptions
;with Splitter as (
select ID
, ltrim(rtrim(Data)) as Data
, cast(null as varchar(max)) as Sentence
, 0 as SentenceNumber
from Descriptions -- Your table here
union all
select ID
, case when Data like '%[.!?]%' then right(Data, len(Data) - patindex('%[.!?]%', Data)) else null end
, case when Data like '%[.!?]%' then left(Data, patindex('%[.!?]%', Data)) else Data end
, SentenceNumber + 1
from Splitter
where Data is not null
), Joiner as (
select ID
, cast('' as varchar(max)) as Data
, 0 as SentenceNumber
from Splitter
group by ID
union all
select j.ID
, j.Data +
-- Don't want "digit+employ" sentences, remove whitespace to search
case when replace(replace(replace(replace(s.Sentence, char(9), ''), char(10), ''), char(13), ''), char(32), '') like '%[0-9]employ%' then '' else s.Sentence end
, s.SentenceNumber
from Joiner j
join Splitter s on j.ID = s.ID and s.SentenceNumber = j.SentenceNumber + 1
)
-- Final Select
select a.ID, a.Data
from Joiner a
join (
-- Only get max SentenceNumber
select ID, max(SentenceNumber) as SentenceNumber
from Joiner
group by ID
) b on a.ID = b.ID and a.SentenceNumber = b.SentenceNumber
order by a.ID, a.SentenceNumber
Here is one way to do this. Please note that it only works if there is exactly one number in the whole description.
declare @d VARCHAR(1000) = 'Hello world. Todays department has 345 employees. Have a good day.'
declare @dr VARCHAR(1000)
set @dr = REVERSE(@d)
SELECT REVERSE(RIGHT(@dr,LEN(@dr) - CHARINDEX('.',@dr,PATINDEX('%[0-9]%',@dr))))
    + RIGHT(@d,LEN(@d) - CHARINDEX('.',@d,PATINDEX('%[0-9]%',@d)) + 1)

SQL - Joining tables where one of the columns is a list

I'm trying to join two tables. The problem I'm having is that one of the columns I'm trying to join on is a list.
So is it possible to join two tables using "IN" rather than "="? Along the lines of
SELECT ID
FROM tableA INNER JOIN
tableB ON tableB.misc IN tableA.misc
WHERE tableB.miscTitle = 'help me please'
tableB.misc = 1
tableA.misc = 1,2,3
Thanks in advance
No, what you want is not possible without a major workaround. DO NOT STORE ITEMS YOU WANT TO JOIN TO IN A LIST! In fact, a comma-delimited list should almost never be stored in a database. It is only acceptable if this is note-type information that will never need to be used in a query WHERE clause or join.
If you are stuck with this horrible design, then you will have to parse out the list to a temp table or table variable and then join through that.
Try this:
SELECT ID
FROM tableA INNER JOIN
tableB ON ',' + TableA.misc + ',' like '%,' + cast(tableB.misc as varchar) + ',%'
WHERE tableB.miscTitle = 'help me please'
A string parsing function like the one found here together with a CROSS APPLY should do the trick.
CREATE FUNCTION [dbo].[fnParseStringTSQL] (@string NVARCHAR(MAX),@separator NCHAR(1))
RETURNS @parsedString TABLE (string NVARCHAR(MAX))
AS
BEGIN
    DECLARE @position int
    SET @position = 1
    SET @string = @string + @separator
    WHILE charindex(@separator,@string,@position) <> 0
    BEGIN
        INSERT into @parsedString
        SELECT substring(@string, @position, charindex(@separator,@string,@position) - @position)
        SET @position = charindex(@separator,@string,@position) + 1
    END
    RETURN
END
go
declare @tableA table (
    id int,
    misc char(1)
)
declare @tableB table (
    misc varchar(10),
    miscTitle varchar(20)
)
insert into @tableA
(id, misc)
values
(1, '1')
insert into @tableB
(misc, miscTitle)
values
('1,2,3','help me please')
select id
from @tableB b
cross apply dbo.fnParseStringTSQL(b.misc,',') p
inner join @tableA a
on a.misc = p.string
where b.miscTitle = 'help me please'
drop function dbo.fnParseStringTSQL
Is ID also in tableB? If so, you can reverse the tables, and run the IN backwards, in the WHERE section, like so:
SELECT ID
FROM tableB
WHERE tableB.miscTitle = 'help me please'
AND tableB.misc IN (SELECT tableA.misc FROM tableA)
If it's not, you could use a cross join to get all combinations of rows between the tables, then remove the rows that don't obey the IN. WARNING: This will become a huge join if the tables are large. Example:
SELECT ID
FROM tableA
CROSS JOIN tableB
WHERE tableB.miscTitle = 'help me please'
AND tableB.misc IN tableA.misc
EDIT: didn't realize "in a list" meant a comma-delimited VARCHAR. SQL's IN won't work for that, nor should you ever store joinable data that way in a database.

Comma-separated value insertion In SQL Server 2005

How can I insert values from a comma-separated input parameter with a stored procedure?
For example:
exec StoredProcedureName 17,'127,204,110,198',7,'162,170,163,170'
You can see that I have two comma-separated value lists in the parameter list. Both will have the same number of values: if the first has 5 comma-separated values, then the second one also has 5 comma-separated values.
127 and 162 are related
204 and 170 are related
...and same for the others.
How can I insert these two values?
One comma-separated value is inserted, but how do I insert two?
Have a look at something like this (full example):
DECLARE @Inserts TABLE(
    ID INT,
    Val1 INT,
    Val2 INT,
    Val3 INT
)
DECLARE @Param1 INT,
    @Param2 VARCHAR(100),
    @Param3 INT,
    @Param4 VARCHAR(100)
SELECT @Param1 = 17,
    @Param2 = '127,204,110,198',
    @Param3 = 7,
    @Param4 = '162,170,163,170'
DECLARE @Table1 TABLE(
    ID INT IDENTITY(1,1),
    Val INT
)
DECLARE @Table2 TABLE(
    ID INT IDENTITY(1,1),
    Val INT
)
DECLARE @textXML XML
SELECT @textXML = CAST('<d>' + REPLACE(@Param2, ',', '</d><d>') + '</d>' AS XML)
INSERT INTO @Table1
SELECT T.split.value('.', 'nvarchar(max)') AS data
FROM @textXML.nodes('/d') T(split)
SELECT @textXML = CAST('<d>' + REPLACE(@Param4, ',', '</d><d>') + '</d>' AS XML)
INSERT INTO @Table2
SELECT T.split.value('.', 'nvarchar(max)') AS data
FROM @textXML.nodes('/d') T(split)
INSERT INTO @Inserts
SELECT @Param1,
    t1.Val,
    @Param3,
    t2.Val
FROM @Table1 t1 INNER JOIN
    @Table2 t2 ON t1.ID = t2.ID
SELECT *
FROM @Inserts
You need a way to split and process the string in TSQL, there are many ways to do this. This article covers the PROs and CONs of just about every method:
"Arrays and Lists in SQL Server 2005 and Beyond, When Table Value Parameters Do Not Cut it" by Erland Sommarskog
You need to create a split function. This is how a split function can be used:
SELECT
*
FROM YourTable y
INNER JOIN dbo.yourSplitFunction(#Parameter) s ON y.ID=s.Value
I prefer the number table approach to split a string in TSQL but there are numerous ways to split strings in SQL Server, see the previous link, which explains the PROs and CONs of each.
For the Numbers Table method to work, you need to do this one time table setup, which will create a table Numbers that contains rows from 1 to 10,000:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this split function:
CREATE FUNCTION [dbo].[FN_ListToTableRows]
(
    @SplitOn char(1) --REQUIRED, the character to split the @List string on
    ,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
    ----------------
    --SINGLE QUERY-- --this will return empty rows, and row numbers
    ----------------
    SELECT
        ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
        ,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
    FROM (
        SELECT @SplitOn + @List + @SplitOn AS ListValue
    ) AS InnerQuery
    INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
    WHERE SUBSTRING(ListValue, number, 1) = @SplitOn
);
GO
GO
You can now easily split a CSV string into a table and join on it. To accomplish your task, set up a test table to insert into:
create table YourTable (col1 int, col2 int)
then create your procedure:
CREATE PROCEDURE StoredProcedureName
(
    @Params1 int
    ,@Array1 varchar(8000)
    ,@Params2 int
    ,@Array2 varchar(8000)
)
AS
INSERT INTO YourTable
    (col1, col2)
SELECT
    a1.ListValue, a2.ListValue
FROM dbo.FN_ListToTableRows(',',@Array1) a1
INNER JOIN dbo.FN_ListToTableRows(',',@Array2) a2 ON a1.RowNumber=a2.RowNumber
GO
test it out:
exec StoredProcedureName 17,'127,204,110,198',7,'162,170,163,170'
select * from YourTable
OUTPUT:
(4 row(s) affected)
col1 col2
----------- -----------
127 162
204 170
110 163
198 170
(4 row(s) affected)
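The key idea in the procedure is that both lists are split with a row number and then joined on that position. As a rough illustration of the pairing only, here is a Python sketch (the function name pair_lists is hypothetical); it also makes the "same number of values" assumption from the question explicit by rejecting lists of different lengths, which the inner join would silently truncate.

```python
def pair_lists(list1, list2, sep=","):
    """Split two delimited strings and pair their values by position."""
    first = [int(v) for v in list1.split(sep)]
    second = [int(v) for v in list2.split(sep)]
    if len(first) != len(second):
        raise ValueError("both lists must contain the same number of values")
    return list(zip(first, second))


for col1, col2 in pair_lists("127,204,110,198", "162,170,163,170"):
    print(col1, col2)
```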
This may not be an answer to your question, but there is a better way to pass related values (in table format) to a stored procedure: XML. You can build the XML string in your app (as a regular string) and pass it to the stored procedure as a parameter. You can then use the following syntax to get it into a table. This way you can pass an entire table as a parameter to a stored procedure.
--Parameters
@Param1 int,
@Budgets xml,
@Param3 int
-- @Budgets = '<Values><Row><Val1>127</Val1><Val2>162</Val2></Row> <Row><Val1>204</Val1><Val2>170</Val2></Row></Values>'
SELECT @Param1 as Param1,
    x.query('Val1').value('.','int') as Val1,
    @Param3 as Param3,
    x.query('Val2').value('.','int') as Val2
into #NewTable
FROM @Budgets.nodes('/Values/Row') x1(x)