MS SQL - Joining on two tables with a substringed key in one column - sql

I have a 2 tables I need to join, however on one of the tables I need to extract a key from a varchar field in each row.
Table 1 Description (numeric 18,varchar 4000)
descriptionid description
1 Blah Blah: Queue 1Blah Blah
2 foobar:Queue 2
3 rem:Queue 2 -This is a note
4 Anotherrow: Queue 3
5 Something else
Table 2 Queue - (numeric 18, varchar 100)
queueid queue
123 Queue 1
124 Queue 2
127 Queue 3
129 Queue 4
So I need to produce the output like so
View 3 Queue-Description (numeric 18, numeric 18)
descriptionid queueid
1 123
2 124
3 124
4 127
5 null
So in table 1 row 1, I need to strip out the value Queue1 from the description, verify it is in the queue table, and lookup the queueid.
I am unable to change the structure of tables 1 and 2.
What ways can this be achieved in MSSQL?
What is the most efficient way to do this in SQL - using MSSQL 2005 here.

most efficient way
Well... don't know about that but it is a way.
select T1.descriptionid,
T2.queueid
from Table1 as T1
left outer join Table2 as T2
on T1.description like '%'+T2.queue+'%'
Another way
select T1.descriptionid,
T2.queueid
from Table1 as T1
left outer join Table2 as T2
on charindex(T2.queue, T1.description, 1) > 0
If there are more than one match (see comment by Ed Harper) you can use this to pick the one with the longest match.
select T1.descriptionid,
T2.queueid
from Table1 as T1
outer apply (
select top 1 T3.queueid
from Table2 as T3
where charindex(T3.queue, T1.description, 1) > 0
order by len(T3.queue) desc
) as T2(queueid)

The most efficient way to do this is to add an extra column to your table and insert the extracted the ID from the string. You can do this when rows are added and you can process the existing ones fairly easily. But trying to left join like this will be very slow.

In Sql Server 2005 you can extract your queue string using regex. The Data Extraction section on this page contains an example.
In a stored procedure you can then build an indexed temp table that contains a new column - this allows you to do this without changing the table metadata).
If you can change the table metadata you can:
Trigger the content into another column (on insert).
Or if the information is not needed immediately a daily sql job could extract the information.

Related

How do I compare whether same/intersection data is there in one and another row in same table when I have huge data in the table in sql server?

I have a table with the sample data below. Now, I just want to compare one record with all other records in the same table and we have to give ID if that record colloids with any other records in the remaining records. And column is with comma separated data, So if we have 'A,C' as Name in one record and 'A' in another record(Check the input from text) then it colloid each other because 'A' is common in both.
In the same way one of the record is not having anything in the Name it is NULL. When it is Null it should colloid with remaining other records. Like this Name column I have around 10 columns to verify data.
Input
ID
Name
1
A,C
2
B
3
A
4
NULL
OUTPUT
ID
ColloidID
1
3
1
4
2
4
3
1
3
4
4
1
4
2
4
3
Problem : I have implemented solution like below, and it working fine as expected. But the thing here is it is fine when less data in the table(<100k) but it's taking more time and space when dealing with millions of data(Ex : >20M Data)
SELECT DISTINCT A.ID,B.ID AS ColloidID
FROM #Temp1 A
CROSS APPLY #Temp1 B
WHERE A.ID<>B.ID
AND master.dbo.fIntersection(COALESCE(A.Name,B.Name,''),COALESCE(B.Name,A.Name,'')) = 1
Ideally you should not store multiple pieces of info in a single column.
Be that as it may, you can use a nested EXISTS with STRING_SPLIT to compare the two columns.
SELECT t1.ID, t2.ID
FROM #Temp1 t1
JOIN #Temp1 t2 ON t2.ID <> t1.ID
AND (t1.Name IS NULL OR t2.Name IS NULL
OR EXISTS (SELECT 1
FROM STRING_SPLIT(t1.Name, ',') s1
JOIN STRING_SPLIT(t2.Name, ',') s2 ON s2.value = s1.value
)
)
ORDER BY
t1.ID,
t2.ID;
db<>fiddle
20M isn't a lot of data, provided a good database design is used, with proper indexes. This is definitely not a good design. It violates the most basic design rule - one value per field. As a result, it's impossible to index Name, forcing 4*10^14 comparisons.
The only way to get acceptable performance is to fix the design. To do that Name has to be split into separate rows. The data needs to be stored in a table whose Name column is covered by an index or primary key:
create table #Id_Names (
ID bigint not null,
Name varchar(30) null,
INDEX IX_Id_Names (Name,ID)
);
GO
INSERT INTO #Id_Names (Id,Name)
select ID,value
from #Temp1 t
CROSS APPLY STRING_SPLIT(Name,',');
After that, the query is simplified to :
SELECT
t1.ID,t2.ID as ColloidID
FROM #Id_Names t1
INNER JOIN #Id_Names t2
ON t1.ID<>t2.ID
AND (t1.Name=t2.Name
OR t1.Name IS NULL
OR t2.Name IS NULL)
This can run a lot faster. The only real problem is the logic of treating NULL as a wildcard. This will return the entire table. And since the table joins itself, each null will result in (20M-1)^2 extra rows. The same relations will be repeated twice, eg (1,4) and (4,1)
If #Temp1 was a proper table, an alternative would be to create an indexed view. Creating an index over a VIEW essentially generates, stores and updates its results automatically.
Another option is to create a Clustered Columnstore index. This provides both compression and acceleration. The data is stored per column in buckets of roughly 1M rows. In each bucket, each column value is only stored once.
create table #Id_Names (
ID bigint not null,
Name varchar(30) null,
INDEX CCI_Id_Names CLUSTERED COLUMNSTORE
);

Including multiple columns in NOT IN

I have two tables as below.
Table 1
Book price
A 100
B 200
C 400
D 300
Table 2
Book price
A 100
B 200
C 400
Now I am executing below command as I want only the 4th record to get inserted into table 2. I want to add both the column names before NOT IN. what should I do?
Insert into table2 select * from table1 t1 where t1.book not in (select book from table2);
You can use NOT EXISTS along with matching the presumably primary key columns Book for both of the tables such as
INSERT INTO table2
SELECT *
FROM table1 t1
WHERE NOT EXISTS ( SELECT 0 FROM table2 WHERE Book=t1.Book);
Demo
P.S.: Should be careful about NULL values while using NOT IN operator. Moreover, using NOT IN is mostly less performant than using NOT EXISTS
I tried it and it looks fine to me.
may be you run the script twice so it will show 0 rows created in the second time.
make sure you committed the row.

merge two temp tables and add common columns as new row and add unmatch column using sql

I am using ms sql server, i have two tables below( table 1 and table 2):
table 1 table 2 result
name value ++ name data == name value data
test 10 test1 20 test 10 null
test1 null 20
I want to merge table 1 and table 2 and my expected result would be as result table , can anybody help me here ?
You can combine these using a full join:
select coalesce(t1.name, t2.name) as name, t1.value, t2.data
from t1 full join
t2
on t1.name = t2.name;
If you want to use * to select all columns in the tables, SQL Server does not offer simple way to choose unique columns (without listing all columns). SQL Server doesn't support USING.

Matching records with wild cards from two different tables

I have two tables with the following data (amongst other data).
Table 1
Value 1
'003232339639
'00264644106272
0026461226291#
I need to match the second column in the table below using column 1 as an identifier
Table 2
Value 1 Value 2
00264 1
0026485 2
0026481 3
00322889 4
00323283 5
00323288 6
So the results I need will be as follows:
Result
Table 1, Value 1 Table 2, Value 2
'003232339639......4
'00264644106272....1
0026461226291#.....1
Any help will be appreciated - very stuck here and doing it manually at the moment in excel.
I hope this format makes sense - first time I am using this forum.
Melany, the question is kind of confusing (not written correctly) perhaps that's why no one is responding. I'll make an attempt to explain how similar selects is done
SELECTING DATA FROM TABLE1 WHERE A MATCHING COLUMN (COL1) EXISTS IN BOTH TABLE
SELECT * FROM TABLE1
INNER JOIN TABLE2
ON TABLE1.COL1 = TABLE2.COL1
AND TABLE1.COL1 = 'XYZ'
USING A SUBSELECT FOR THE SAME
SELECT * FROM TABLE1
WHERE COL1 IN(SELECT COL1 FROM TABLE2
WHERE COL1 = 'XYZ')
In SQL, the wildcard for one or more characters is %, and is to be used with the keyword LIKE.
So I suggest the following (if your purpose is really to match rows in Table1 for which Value1 begins like a value in Table2.Value1):
SELECT Table1.Value1, Table2.Value2 WHERE Table1.Value1 LIKE CONCAT(Table2.Value1, '%');
Edit: replace CONCAT(x, y) with x || y for some DBMSs (SQLite for instance).

How to compare string data to table data in SQL Server - I need to know if a value in a string doesn't exist in a column

I have two tables, one an import table, the other a FK constraint on the table the import table will eventually be put into. In the import table a user can provide a list of semicolon separated values that correspond to values in the 2nd table.
So we're looking at something like this:
TABLE 1
ID | Column1
1 | A; B; C; D
TABLE 2
ID | Column2
1 | A
2 | B
3 | D
4 | E
The requirement is:
Rows in TABLE 1 with a value not in TABLE 2 (C in our example) should be marked as invalid for manual cleanup by the user. Rows where all values are valid are handled by another script that already works.
In production we'll be dealing with 6 columns that need to be checked and imports of AT LEAST 100k rows at a time. As a result I'd like to do all the work in the DB, not in another app.
BTW, it's SQL2008.
I'm stuck, anyone have any ideas. Thanks!
Seems to me you could pass ID & Column1 values from Table1 to a Table-Valued function (or a temp table in-line) which would parse the ;-delimited list, returning individual values per record.
Here are a couple options:
T-SQL: Parse a delimited string
Quick T-Sql to parse a delimited string
The result (ID, value) from the function could be used to compare (unmatched query) against values in Table 2.
SELECT tmp.ID
FROM tmp
LEFT JOIN Table2 ON Table2.id = tmp.ID
WHERE Table2.id is null
The ID results of the comparison would then be used to flag records in Table 1.
Perhaps inserting those composite values into 'TABLE 1' may have seemed like the most convenient solution at one time. However, unless your users are using SQL Server Management Studio or something similar to enter the values directly into the table then I assume there must be a software layer between the UI and the database. If so, you're going to save yourself a lot headaches both now and in the long run by investing a little time in altering your code to split the semi-colon delimited inputs into discrete values before inserting them into the database. This will result in 'TABLE 1' looking something like this
TABLE 1
ID | Column1
1 | A
1 | B
1 | C
1 | D
It's then trivial to write the SQL to find those IDs which are invalid.
If it is possible, try putting the values in separate rows when importing (instead of storing it as ; separated).
This might help.
Here is an easy and straightforward solution for the IDs of the invalid rows, despite its lack of performance because of string manipulations.
select T1.ID
from [TABLE 1] T1
left join [TABLE 2] T2
on ('; ' + T1.COLUMN1 + '; ') like ('%; ' + T2.COLUMN2 + '; %')
where T1.COLUMN1 is not null
group by T1.ID
having count(*) < len(T1.COLUMN1) - len(replace(T1.COLUMN1, ';', '')) + 1
There are two assumptions:
The semicolon-separated list does not contain duplicates
TABLE 2 does not contain duplicates in COLUMN2.
The second assumption can easily be fixed by using (select distinct COLUMN2 from [TABLE 2]) rather than [TABLE 2].