STRING_SPLIT skyrockets execution time of SQL query

For a SQL query I need to split an input string into its integer components and select values from a table according to the provided integers.
My problem: if there are a lot of integers (>300), the query gets slower and slower; for around 600 integers it takes more than one minute!
Here is a small example of the executed query:
DECLARE @inputStr VARCHAR(MAX) = '234,2344,12,523,5667,9825,345'
SELECT
surname,
firstname
FROM Addresses
WHERE id IN (SELECT CAST(value AS INTEGER) FROM STRING_SPLIT(@inputStr, ','))
Is there a known problem with this, or any improvements I could make?
I'm grateful for any help!

The problem is that any list of explicit values in an IN operator is translated into multiple ORs in the WHERE clause by the algebrizer, before the query is optimized...
A great number of values in the IN operator will always cause poor performance, whichever way you write it!
By creating a temporary table, you will get a different query execution plan that will boost your performance.
So try this way:
SELECT DISTINCT CAST(value AS INTEGER) AS value
INTO #T
FROM STRING_SPLIT(@inputStr, ',');
SELECT surname,
firstname
FROM Addresses
WHERE id IN (SELECT value
FROM #T);
Optionally, you can add a UNIQUE index to the temp table to increase performance further:
SELECT DISTINCT CAST("value" AS INTEGER) AS "value"
INTO #T
FROM STRING_SPLIT(@inputStr, ',');
CREATE UNIQUE INDEX X123456789 ON #T ("value");
SELECT surname,
firstname
FROM Addresses
WHERE id IN (SELECT value
FROM #T);

Personally, I would move STRING_SPLIT to a JOIN, as it could well be that SQL Server is running STRING_SPLIT once for every row:
SELECT surname,
firstname
FROM dbo.Addresses A
JOIN STRING_SPLIT(@inputStr,',') SS ON A.id = SS.[value];
If that is still slow, I would suggest you are missing an index on id. Considering it is the id, you'll likely want it to be clustered, and probably the primary key:
ALTER TABLE dbo.Addresses ADD CONSTRAINT PK_Addresses PRIMARY KEY (id) CLUSTERED;

Could you try this?
DECLARE @inputStr VARCHAR(MAX) = '234,2344,12,523,5667,9825,345'
DROP TABLE IF EXISTS #TEST;
CREATE TABLE #TEST
(
[value] INT
);
INSERT INTO #TEST ([value])
SELECT value
FROM STRING_SPLIT(@inputStr, ',')
SELECT
surname,
firstname
FROM Addresses A
INNER JOIN #TEST B
ON A.id = B.value

Related

How to join on columns that contain strings that aren't exact matches in SQL Server?

I am trying to create a simple table join on columns from two tables that are equivalent but not exact matches. For example, the row value in table A might be "Georgia Production" and the corresponding row value in table B might be "Georgia Independent Production Co".
I first tried a wild card in the join like this:
select BOLFlatFile.*, customers.City, customers.FEIN_Registration_No, customers.ST
from BOLFlatFile
Left Join Customers on (customers.Name Like '%'+BOLFlatFile.Customer+'%');
and this works great for 90% of the data. However, if the string in table A does not exactly appear in table B, it returns null.
So back to the above example: if the value in table A were "Georgia Independent", it would work, but if it were "Georgia Production", it would not.
This might be a complicated way of still being wrong, but this works with the sample I've mocked up.
The first assumption is that, because you are "wildcard searching" a string from one table against another, all of the words in the first table's column appear in the second table's column, which means the second table's column will always contain a longer string than the first.
The second assumption is that there is a unique id on the first table; if there is not, you can create one with the ROW_NUMBER function, ordering on your string column.
The approach below firstly creates some sample data (I've used tablea and tableb to represent your tables).
Then a dummy table is created to store the unique id from your first table and the string column.
Next, a loop iterates across the string in the dummy table, inserting the unique id and the first section of the string (up to a space) into the handler table, which is what you will use to join the two target tables together.
The final section joins the first table to the handler table on the unique id, then joins the second table to the handler table on the key words longer than 3 letters (avoiding "the", "and", etc.), relying on the assumption that the string in table B is longer than the one in table A (because you are looking for every word of table A's column inside the corresponding column of table B).
declare @tablea table (
id int identity(1,1),
helptext nvarchar(50)
);
declare @tableb table (
id int identity(1,1),
helptext nvarchar(50)
);
insert @tablea (helptext)
values
('Text to find'),
('Georgia Production'),
('More to find');
insert @tableb (helptext)
values
('Georgia Independent Production'),
('More Text to Find'),
('something Completely different'),
('Text to find');
declare @stringtable table (
id int,
string nvarchar(50)
);
declare @stringmatch table (
id int,
stringmatch nvarchar(20)
);
insert @stringtable (id, string)
select id, helptext from @tablea;
update @stringtable set string = string + ' ';
while exists (select 1 from @stringtable)
begin
insert @stringmatch (id, stringmatch)
select id, substring(string,1,charindex(' ',string)) from @stringtable;
update @stringmatch set stringmatch = ltrim(rtrim(stringmatch));
update @stringtable set string=replace(string, stringmatch, '') from @stringtable tb inner join @stringmatch ma
on tb.id=ma.id and charindex(ma.stringmatch,tb.string)>0;
update @stringtable set string=LTRIM(string);
delete from @stringtable where string='' or string is null;
end
select a.*, b.* from @tablea a inner join @stringmatch m on a.id=m.id
inner join @tableb b on CHARINDEX(m.stringmatch,b.helptext)>0 and len(b.helptext)>len(a.helptext);
It all depends on how complex you want to make this matching. There are various ways of matching these strings, and some may work better than others. Below is an example of how you can split the names in your BOLFlatFile and Customers tables into separate words by using STRING_SPLIT.
The example below will match anything where all the words in the BOLFlatFile customer field are contained within the Customers name field (note: it won't take the ordering of the words into account).
The code below will match the first two sample strings as expected, but not the last two.
CREATE TABLE BOLFlatFile
(
[customer] NVARCHAR(500)
)
CREATE TABLE Customers
(
[name] NVARCHAR(500)
)
INSERT INTO Customers VALUES ('Georgia Independent Production Co')
INSERT INTO BOLFlatFile VALUES ('Georgia Production')
INSERT INTO Customers VALUES ('Test String 1')
INSERT INTO BOLFlatFile VALUES ('Test 1')
INSERT INTO Customers VALUES ('Test String 2')
INSERT INTO BOLFlatFile VALUES ('Test 3')
;with BOLFlatFileSplit
as
(
SELECT *,
COUNT(*) OVER(PARTITION BY [customer]) as [WordsInName]
FROM
BOLFlatFile
CROSS APPLY
STRING_SPLIT([customer], ' ')
),
CustomerSplit as
(
SELECT *
FROM
Customers
CROSS APPLY
STRING_SPLIT([name], ' ')
)
SELECT
a.Customer,
b.name
FROM
CustomerSplit b
INNER JOIN
BOLFlatFileSplit a
ON
a.value = b.value
GROUP BY
a.Customer, b.name
HAVING
COUNT(*) = MAX([WordsInName])

Generate ID for duplicate values in SQL Server

I found the following link about assigning identical IDs to duplicates in SQL Server.
My understanding is that there is no SQL Server function to generate this automatically, other than running the insert and update queries from the link. Is that statement true? If yes, then what would the trigger be if, for example, someone inserts data into MyTable and the insert and update queries from the link are then run:
Assign identical ID to duplicates in SQL server
INSERT INTO secondTable (word) SELECT distinct word FROM MyTable;
UPDATE MyTable SET ID = (SELECT id from secondTable where MyTable.word = secondTable.word)
thanks,
S
Is this what you want? I can't think of an "automatic" solution that would just increase the Id for new words.
CREATE TABLE MyTable (
Id INT NOT NULL,
Word NVARCHAR(255) NOT NULL,
PRIMARY KEY (Id, Word)); -- the primary key makes it impossible to have more than one combination of word and id
DECLARE @word NVARCHAR(255) = 'Hello!';
-- Get the existing id or calculate a new one (ISNULL covers the very first insert, when the table is empty)
DECLARE @Id INT = (SELECT Id FROM MyTable WHERE Word = @word);
IF (@Id IS NULL) SET @Id = (SELECT ISNULL(MAX(Id), 0) + 1 FROM MyTable);
INSERT INTO MyTable (Id, Word)
VALUES (@Id, @word);
SELECT * FROM MyTable;
If you can't for some reason have id and word as a combined primary key, you may use a unique index to make sure that there is only one combination.
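For example, a minimal sketch of that unique-index alternative (the index name here is made up; it assumes the table was created without the combined primary key):
-- Hypothetical index name; enforces that each (Id, Word) combination appears only once
CREATE UNIQUE INDEX UX_MyTable_Id_Word ON MyTable (Id, Word);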

Convert a letter into a number

I am building the back end of a web application which processes a significant amount of data, and the front end developers are looking for a stable integer code to use when joining data.
The current integer values they are trying to use are surrogate keys which will change going forward, leading to a number of problems.
Each table has an alphanumeric code, and I am looking for a way to convert this into a stable int.
E.g. convert a code 'AAAA' into 1111, or 'MMMM' into 13131313.
Could anyone tell me if this is at all possible?
Thanks,
McNets' comment seems to be a very good approach...
If you can be sure that you have
plain ASCII characters
Not more than 4 letters
You might cast the string to VARBINARY(4) and cast this to INT:
DECLARE @dummy TABLE(StrangeCode VARCHAR(10));
INSERT INTO @dummy VALUES
('AAAA'),('MMMM'),('ACAC'),('CDEF'),('ABCD');
SELECT CAST(CAST(StrangeCode AS VARBINARY(4)) AS INT)
FROM @dummy;
The result:
1094795585
1296911693
1094926659
1128547654
1094861636
If you need bigger numbers, you might go up to BIGINT.
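A minimal sketch of that BIGINT variant, assuming codes of up to 8 plain-ASCII characters (the sample table below is just illustrative):
DECLARE @dummy2 TABLE(StrangeCode VARCHAR(10));
INSERT INTO @dummy2 VALUES
('AAAAAAAA'),('MMMM');
-- VARBINARY(8) holds up to 8 ASCII bytes, which fit into a BIGINT
SELECT CAST(CAST(StrangeCode AS VARBINARY(8)) AS BIGINT)
FROM @dummy2;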
One way is using a recursive CTE like this:
;with tt(i, c1, c2) as (
select 1, c, replace(c,char(65), 1)
from yourTable
union all
select i+1, c1, c2= replace(c2,char(65+i), i+1)
from tt
where i < 26
)
select c1, cast(c2 as bigint) num
from tt
where i = 26;
As McNets suggests, create a second table:
create table IntCodes (id INT IDENTITY(1,1), UserCode VARCHAR(50) NOT NULL)
insert into IntCodes (UserCode)
select distinct UserCode
from MyTable
You'll need a trigger:
create trigger Trg_UserCode
on MyTable
after insert as
begin
insert into IntCodes (UserCode)
select i1.UserCode
from INSERTED i1
where i1.UserCode not in (select ic.Usercode from IntCodes ic)
end
Now, as part of the query:
select t1.*, t2.id as IntCode
from MyTable t1
inner join IntCodes t2
on t1.UserCode = t2.UserCode
This means that you won't need to worry about updating the existing columns.

How do I return the column name in table where a null value exists?

I have a table of more than 2 million rows and over 100 columns. I need to run a query that checks if there are any null values in any row or column of the table and return an ID number where there is a null. I've thought about doing the following, but I was wondering if there is a more concise way of checking this?
SELECT [ID]
from [TABLE_NAME]
where
[COLUMN_1] is null
or [COLUMN_2] is null
or [COLUMN_3] is null or etc.
Your method is fine. If your challenge is writing out the WHERE clause, then you can run a query like this:
select column_name+' is null or '
from information_schema.columns c
where c.table_name = 'table_name'
Then copy the results into a query window and use them for building the query.
I used SQL Server syntax for the query, because it looks like you are using SQL Server. Most databases support the INFORMATION_SCHEMA tables, but the syntax for string concatenation varies among databases. Remember to remove the final or at the end of the last comparison.
You can also copy the column list into Excel and use Excel formulas to create the list.
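If you would rather let the server assemble the whole predicate in one go, a sketch like the following works (assuming SQL Server 2017+ for STRING_AGG; on older versions the FOR XML PATH trick does the same job):
SELECT 'SELECT [ID] FROM [TABLE_NAME] WHERE '
    + STRING_AGG(CAST('[' + column_name + '] is null' AS NVARCHAR(MAX)), ' or ')
FROM information_schema.columns c
WHERE c.table_name = 'TABLE_NAME';
-- Copy the single-row result into a query window and run it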
You can use something similar to the following:
declare @T table
(
ID int,
Name varchar(10),
Age int,
City varchar(10),
Zip varchar(10)
)
insert into @T values
(1, 'Alex', 32, 'Miami', NULL),
(2, NULL, 24, NULL, NULL)
;with xmlnamespaces('http://www.w3.org/2001/XMLSchema-instance' as ns)
select ID,
(
select *
from @T as T2
where T1.ID = T2.ID
for xml path('row'), elements xsinil, type
).value('count(/row/*[@ns:nil = "true"])', 'int') as NullCount
from @T as T1

How do I do an Upsert Into Table?

I have a view that has a list of jobs in it, with data like who they're assigned to and the stage they are in. I need to write a stored procedure that returns how many jobs each person has at each stage.
So far I have this (simplified):
DECLARE @ResultTable table
(
StaffName nvarchar(100),
Stage1Count int,
Stage2Count int
)
INSERT INTO @ResultTable (StaffName, Stage1Count)
SELECT StaffName, COUNT(*) FROM ViewJob
WHERE InStage1 = 1
GROUP BY StaffName
INSERT INTO @ResultTable (StaffName, Stage2Count)
SELECT StaffName, COUNT(*) FROM ViewJob
WHERE InStage2 = 1
GROUP BY StaffName
The problem with that is that the rows don't combine. So if a staff member has jobs in stage1 and stage2, there are two rows in @ResultTable. What I would really like to do is to update the row if one exists for the staff member and insert a new row if one doesn't exist.
Does anyone know how to do this, or can suggest a different approach?
I would really like to avoid using cursors to iterate on the list of users (but that's my fall back option).
I'm using SQL Server 2005.
Edit: @Lee: Unfortunately the InStage1 = 1 was a simplification. It's really more like WHERE DateStarted IS NOT NULL and DateFinished IS NULL.
Edit: @BCS: I like the idea of doing an insert of all the staff first so I just have to do an update every time. But I'm struggling to get those UPDATE statements correct.
Actually, I think you're making it much harder than it is. Won't this code work for what you're trying to do?
SELECT StaffName, SUM(InStage1) AS 'JobsAtStage1', SUM(InStage2) AS 'JobsAtStage2'
FROM ViewJob
GROUP BY StaffName
You could just check for existence and use the appropriate command. I believe this really does use a cursor behind the scenes, but it's the best you'll likely get:
IF (EXISTS (SELECT * FROM MyTable WHERE StaffName = @StaffName))
begin
UPDATE MyTable SET ... WHERE StaffName = @StaffName
end
else
begin
INSERT MyTable ...
end
SQL2008 has a new MERGE capability which is cool, but it's not in 2005.
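For anyone on 2008 or later, a rough sketch of what that MERGE could look like (the variables and columns here are illustrative, not taken from the question):
-- @StaffName / @Stage1Count are hypothetical parameters for a single upsert
MERGE MyTable AS target
USING (SELECT @StaffName AS StaffName, @Stage1Count AS Stage1Count) AS source
ON target.StaffName = source.StaffName
WHEN MATCHED THEN
UPDATE SET Stage1Count = source.Stage1Count
WHEN NOT MATCHED THEN
INSERT (StaffName, Stage1Count) VALUES (source.StaffName, source.Stage1Count);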
IIRC there is some sort of "On Duplicate" (name might be wrong) syntax that lets you update if a row exists (MySQL)
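For reference, the MySQL syntax being half-remembered there is INSERT ... ON DUPLICATE KEY UPDATE, which SQL Server doesn't support; roughly (table and column names made up):
-- MySQL only, not valid in SQL Server 2005
INSERT INTO ResultTable (StaffName, Stage1Count)
VALUES ('Alice', 3)
ON DUPLICATE KEY UPDATE Stage1Count = VALUES(Stage1Count);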
Alternately some form of:
INSERT INTO @ResultTable (StaffName, Stage1Count, Stage2Count)
SELECT StaffName, 0, 0 FROM ViewJob
GROUP BY StaffName
UPDATE @ResultTable SET Stage1Count = (
SELECT COUNT(*) AS count FROM ViewJob
WHERE InStage1 = 1
AND @ResultTable.StaffName = StaffName)
UPDATE @ResultTable SET Stage2Count = (
SELECT COUNT(*) AS count FROM ViewJob
WHERE InStage2 = 1
AND @ResultTable.StaffName = StaffName)
To get a real "upsert" type of query you need to use an if exists... type of thing, and this unfortunately means using a cursor.
However, you could run two queries, one to do your updates where there is an existing row, then afterwards insert the new one. I'd think this set-based approach would be preferable unless you're dealing exclusively with small numbers of rows.
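A sketch of that set-based update-then-insert against the question's tables (only the stage-2 column is shown; the same pattern repeats for stage 1, and the column semantics are assumed from the question's code):
-- 1) Update counts for staff who already have a row
UPDATE R SET Stage2Count = J.Cnt
FROM @ResultTable R
JOIN (SELECT StaffName, COUNT(*) AS Cnt FROM ViewJob WHERE InStage2 = 1 GROUP BY StaffName) J
ON J.StaffName = R.StaffName;
-- 2) Insert rows for staff not yet in the result table
INSERT INTO @ResultTable (StaffName, Stage2Count)
SELECT J.StaffName, COUNT(*)
FROM ViewJob J
WHERE J.InStage2 = 1
AND NOT EXISTS (SELECT 1 FROM @ResultTable R WHERE R.StaffName = J.StaffName)
GROUP BY J.StaffName;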
The following query on your result table should combine the rows again. This is assuming that InStage1 and InStage2 are never both '1'.
select distinct(rt1.StaffName), rt2.Stage1Count, rt3.Stage2Count
from @ResultTable rt1
left join @ResultTable rt2 on rt1.StaffName=rt2.StaffName and rt2.Stage1Count is not null
left join @ResultTable rt3 on rt1.StaffName=rt3.StaffName and rt3.Stage2Count is not null
I managed to get it working with a variation of BCS's answer. It wouldn't let me use a table variable though, so I had to make a temp table.
CREATE TABLE #ResultTable
(
StaffName nvarchar(100),
Stage1Count int,
Stage2Count int
)
INSERT INTO #ResultTable (StaffName)
SELECT StaffName FROM ViewJob
GROUP BY StaffName
UPDATE #ResultTable SET
Stage1Count= (
SELECT COUNT(*) FROM ViewJob V
WHERE InStage1 = 1 AND
V.StaffName = #ResultTable.StaffName COLLATE Latin1_General_CI_AS
GROUP BY V.StaffName),
Stage2Count= (
SELECT COUNT(*) FROM ViewJob V
WHERE InStage2 = 1 AND
V.StaffName = #ResultTable.StaffName COLLATE Latin1_General_CI_AS
GROUP BY V.StaffName)
SELECT StaffName, Stage1Count, Stage2Count FROM #ResultTable
DROP TABLE #ResultTable