SQL query takes more than an hour to execute for 200k rows - sql

I have two tables each with around 200,000 rows. I have run the query below and it still hasn't completed after running for more than an hour. What could be the explanation for this?
SELECT
dbo.[new].[colom1],
dbo.[new].[colom2],
dbo.[new].[colom3],
dbo.[new].[colom4],
dbo.[new].[Value] as 'nieuwe Value',
dbo.[old].[Value] as 'oude Value'
FROM dbo.[new]
JOIN dbo.[old]
ON dbo.[new].[colom1] = dbo.[old].[colom1]
and dbo.[new].[colom2] = dbo.[old].[colom2]
and dbo.[new].[colom3] = dbo.[old].[colom3]
and dbo.[new].[colom4] = dbo.[old].[colom4]
where dbo.[new].[Value] <> dbo.[old].[Value]
from comment;

It seems that for an equality join on a single column, the rows with NULL value in the join key are being filtered out, but this is not the case for joins on multiple columns.
As a result, the hash join complexity is changed from O(N) to O(N^2).
======================================================================
In that context I would like to recommend a great article written by Paul White on similar issues -
Hash Joins on Nullable Columns
======================================================================
I have generated a small simulation of this use-case and I encourage you to test your solutions.
create table mytab1 (c1 int null,c2 int null)
create table mytab2 (c1 int null,c2 int null)
;with t(n) as (select 1 union all select n+1 from t where n < 10)
insert into mytab1 select null,null from t t0,t t1,t t2,t t3,t t4
insert into mytab2 select null,null from mytab1
insert into mytab1 values (111,222);
insert into mytab2 values (111,222);
select * from mytab1 t1 join mytab2 t2 on t1.c1 = t2.c1 and t1.c2 = t2.c2
For the OP query we should remove rows with NULL values in any of the join key columns.
SELECT
dbo.[new].[colom1],
dbo.[new].[colom2],
dbo.[new].[colom3],
dbo.[new].[colom4],
dbo.[new].[Value] as 'nieuwe Value',
dbo.[old].[Value] as 'oude Value'
FROM dbo.[new]
JOIN dbo.[old]
ON dbo.[new].[colom1] = dbo.[old].[colom1]
and dbo.[new].[colom2] = dbo.[old].[colom2]
and dbo.[new].[colom3] = dbo.[old].[colom3]
and dbo.[new].[colom4] = dbo.[old].[colom4]
where dbo.[new].[Value] <> dbo.[old].[Value]
and dbo.[new].[colom1] is not null
and dbo.[new].[colom2] is not null
and dbo.[new].[colom3] is not null
and dbo.[new].[colom4] is not null
and dbo.[old].[colom1] is not null
and dbo.[old].[colom2] is not null
and dbo.[old].[colom3] is not null
and dbo.[old].[colom4] is not null

Using EXCEPT join, you only have to make the larger HASH join on those values that have changed, so much faster:
/*
create table [new] ( colom1 int, colom2 int, colom3 int, colom4 int, [value] int)
create table [old] ( colom1 int, colom2 int, colom3 int, colom4 int, [value] int)
insert old values (1,2,3,4,10)
insert old values (1,2,3,5,10)
insert old values (1,2,3,6,10)
insert old values (1,2,3,7,10)
insert old values (1,2,3,8,10)
insert old values (1,2,3,9,10)
insert new values (1,2,3,4,11)
insert new values (1,2,3,5,10)
insert new values (1,2,3,6,11)
insert new values (1,2,3,7,10)
insert new values (1,2,3,8,10)
insert new values (1,2,3,9,11)
*/
select n.colom1, n.colom2 , n.colom3, n.colom4, n.[value] as newvalue, o.value as oldvalue
from new n
inner join [old] o on n.colom1=o.colom1 and n.colom2=o.colom2 and n.colom3=o.colom3 and n.colom4=o.colom4
inner join
(
select colom1, colom2 , colom3, colom4, [value] from new
except
select colom1, colom2 , colom3, colom4, [value] from old
) i on n.colom1=i.colom1 and n.colom2=i.colom2 and n.colom3=i.colom3 and n.colom4=i.colom4

Related

How do I join on a column that contains a string that I'm trying to search through using a substring?

I'm trying to join a table onto another table. The gimmick here is that the column from the table contains a long string. Something like this:
PageNumber-190-ChapterTitle-HelloThere
PageNumber-19-ChapterTitle-NotToday
I have another table that has a list of page numbers and whether or not I want to keep those pages, for example:
Page Number
Keep Flag
190
Y
19
N
I want to be able to return a query that contains the long string but only if the page number exists somewhere in the string. The problem I have is that, when using a LIKE statement to join:
JOIN t2 ON t1.string LIKE '%' + t2.page_number + '%' WHERE keep_flag = 'Y'
It will still return both results for whatever reason. The column of "Keep Flag" in the results query will change to "Y" for page 19 even though it shouldn't be in the results.
I obviously don't think LIKE is the best way to JOIN given that '19' is LIKE '190'. What else can I do here?
if Page_number is a number you must cats it as varchar, so that the types fit.
You can read on thehome page ,ot about cast and convert see https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-ver16
CREATE TABLE tab1
([str] varchar(38))
;
INSERT INTO tab1
([str])
VALUES
('PageNumber-190-ChapterTitle-HelloThere'),
('PageNumber-19-ChapterTitle-NotToday')
;
2 rows affected
CREATE TABLE tab2
([Page Number] int, [Keep Flag] varchar(1))
;
INSERT INTO tab2
([Page Number], [Keep Flag])
VALUES
(190, 'Y'),
(19, 'N')
;
2 rows affected
SELECT str
FROM tab1 JOIN tab2 ON tab1.str LIKE '%' + CAST(tab2.[Page Number]AS varchar) + '%' AND tab2.[Keep Flag] = 'Y'
str
PageNumber-190-ChapterTitle-HelloThere
fiddle
Please try the following solution.
It is doing JOIN on the exact match.
SQL
-- DDL and sample data population, start
DECLARE #tbl1 TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(100));
INSERT #tbl1 (tokens) VALUES
('PageNumber-190-ChapterTitle-HelloThere'),
('PageNumber-19-ChapterTitle-NotToday');
DECLARE #tbl2 TABLE (ID INT IDENTITY PRIMARY KEY, Page_Number INT, Keep_Flag CHAR(1));
INSERT #tbl2 (Page_Number, Keep_Flag) VALUES
(190, 'Y'),
(19 , 'N');
-- DDL and sample data population, end
SELECT *
FROM #tbl1 AS t1 INNER JOIN #tbl2 AS t2
ON PARSENAME(REPLACE(t1.tokens,'-','.'),3) = t2.Page_Number
WHERE t2.keep_flag = 'Y';
Output
ID
tokens
ID
Page_Number
Keep_Flag
1
PageNumber-190-ChapterTitle-HelloThere
1
190
Y

Match the codes and copy columns

I am working in SQL Server 2008. I have 2 tables Table1 & Table2.
Table1 has columns
SchoolCode, District, Type, SchoolName
and Table2 has columns
SchoolCode1, District1, Type1, SchoolName1
SchoolCode columns in both tables have the same codes like "1234"; code is the same in both schoolcode columns.
Now I want to copy the District, Type and SchoolName column values from Table1 to Table2 if SchoolCode in both tables is same.
I think the query will use join but I don't know how it works. Any help on how I can do this task?
Maybe use an update statement in join if by copying over you mean updating rows
update t2
set
District1= District,
Type1= Type,
SchoolName1= SchoolName
from Table1 t1
join
Table2 t2
on t1.SchoolCode=t2.SchoolCode1
I could give you a little bit of idea. here is it:
Insert into table2 (District1, Type1, SchoolName1)
SELECT District, Type, SchoolName
FROM table1
where table1.Schoolcode=table2.Schoolcode1
You have to use Inner join to update data from table 1 to table 2, Inner join will join values that are equal. . To learn more about joins, I highly recommend you to read the below article
SQLServer Joins Explained - W3Schools
Please refer the below code, for the convenience I have used the temporary tables..
DECLARE #Table1 TABLE
(
SchoolCode INT,
District VARCHAR(MAX),
Type VARCHAR(MAX),
SchoolName VARCHAR(MAX)
)
DECLARE #Table2 TABLE
(
SchoolCode1 INT,
District1 VARCHAR(MAX),
Type1 VARCHAR(MAX),
SchoolName1 VARCHAR(MAX)
)
INSERT INTO #Table1
( SchoolCode ,District , Type , SchoolName
)
VALUES ( 1 ,'DIS1' ,'X' ,'A'),
( 2 ,'DIS2' ,'Y' ,'B'),
( 3 ,'DIS3' ,'Z' ,'C'),
( 4 ,'DIS4' ,'D' ,'D'),
( 5 ,'DIS5' ,'K' ,'E')
INSERT INTO #Table2
( SchoolCode1 ,District1 , Type1 , SchoolName1
)
VALUES ( 1 ,'DIS1' ,'X' ,'A'),
( 2 ,NULL ,'Z' ,NULL),
( 3 ,'DIS3' ,'Z' ,'C'),
( 4 ,NULL ,'Z' ,'S'),
( 5 ,'DIS5' ,'K' ,'E')
--BEFORE
SELECT * FROM #Table1
SELECT * FROM #Table2
--Logic UPDATE Table 2
UPDATE t2 SET t2.District1 = t1.District,
t2.Type1 = t1.Type,
t2.SchoolName1 = t1.SchoolName
FROM #Table1 t1
INNER JOIN #Table2 t2 ON t1.SchoolCode = t2.SchoolCode1
-- End Logic UPDATE Table 2
--AFTER
SELECT * FROM #Table1
SELECT * FROM #Table2
You can join tables in an UPDATE statement.
Note, I have aliased the tables, table1 and table2 as t1 and t2 respectively.
This is what I did:
create table Table1
(SchoolCode varchar(50),
District varchar(50),[Type] varchar(50),SchoolName varchar(50))
go
create table Table2
(SchoolCode1 varchar(50), District1 varchar(50),[Type1] varchar(50),SchoolName1 varchar(50))
go
insert into table1 values ('1234','District1','High','Cool School')
insert into table1 values ('2222','District2','Lower','Leafy School')
insert into table2 (SchoolCode1) values ('1234')
go
update t2
set District1 = District,
Type1 = [Type],
SchoolName1 = SchoolName
from table1 t1
join table2 t2
on t2.SchoolCode1 = t1.SchoolCode
go
select * from table2
go

Constructing single select statement that returns order depends on the value of a column in SQL Server

Table1
Id bigint primary key identity(1,1)
Status nvarchar(20)
Insert dummy data
Insert into Table1 values ('Open') --1
Insert into Table1 values ('Open') --2
Insert into Table1 values ('Grabbed') --3
Insert into Table1 values ('Closed') --4
Insert into Table1 values ('Closed') --5
Insert into Table1 values ('Open') --6
How would I construct a single select statement which orders the data where records with 'Grabbed' status is first, followed by 'Closed', followed by 'Open' in SQL Server
Output:
Id Status
3 Grabbed
4 Closed
5 Closed
1 Open
2 Open
6 Open
I think you need something like this:
select *
from yourTable
order by case when Status = 'Grabbed' then 1
when Status = 'Closed' then 2
when Status = 'Open' then 3
else 4 end
, Id;
[SQL Fiddle Demo]
Another way is to using CTE like this:
;with cte as (
select 'Grabbed' [Status], 1 [order]
union all select 'Closed', 2
union all select 'Open', 3
)
select t.*
from yourTable t
left join cte
on t.[Status] = cte.[Status]
order by cte.[order], Id;
[SQL Fiddle Demo]
This could be done much better with a properly normalized design:
Do not store your Status as a textual content. Just imagine a typo (a row with Grabed)...
Further more a lookup table allows you to add side data, e.g. a sort order.
CREATE TABLE StatusLookUp(StatusID INT IDENTITY PRIMARY KEY /*you should name your constraints!*/
,StatusName VARCHAR(100) NOT NULL
,SortRank INT NOT NULL)
INSERT INTO StatusLookUp VALUES
('Open',99) --ID=1
,('Closed',50)--ID=2
,('Grabbed',10)--ID=3
CREATE TABLE Table1(Id bigint primary key identity(1,1) /*you should name your constraints!*/
,StatusID INT FOREIGN KEY REFERENCES StatusLookUp(StatusID));
Insert into Table1 values (1) --1
Insert into Table1 values (1) --2
Insert into Table1 values (3) --3
Insert into Table1 values (2) --4
Insert into Table1 values (2) --5
Insert into Table1 values (1) --6
SELECT *
FROM Table1 AS t1
INNER JOIN StatusLookUp AS s ON t1.StatusID=s.StatusID
ORDER BY s.SortRank;
I find that the simplest method uses a string:
order by charindex(status, 'Grabbed,Closed,Open')
or:
order by charindex(',' + status + ',', ',Grabbed,Closed,Open,')
If you are going to put values in the query, I think the easiest way uses values():
select t1.*
from t1 left join
(values ('Grabbed', 1), ('Closed', 2), ('Open', 3)) v(status, priority)
on t1.status = v.status
order by coalesce(v.priority, 4);
Finally. This need suggests that you should have a reference table for statuses. Rather than putting the string name in other tables, put an id. The reference table can have the priority as well as other information.
Try this:
select Id,status from tablename where status='Grabbed'
union
select Id,status from tablename where status='Closed'
union
select Id,status from tablename where status='Open'

Filtering a group of records

Please see the SQL structure below:
CREATE table TestTable (id int not null identity, [type] char(1), groupid int)
INSERT INTO TestTable ([type]) values ('a',1)
INSERT INTO TestTable ([type]) values ('a',1)
INSERT INTO TestTable ([type]) values ('b',1)
INSERT INTO TestTable ([type]) values ('b',1)
INSERT INTO TestTable ([type]) values ('a',2)
INSERT INTO TestTable ([type]) values ('a',2)
The first four records are part of group 1 and the fifth and sixth records are part of group 2.
If there is at least one b in the group then I want the query to only return b's for that group. If there are no b's then the query should return all records for that group.
Here you go
SELECT *
FROM testtable
LEFT JOIN (SELECT distinct groupid FROM TestTable WHERE type = 'b'
) blist ON blist.groupid = testtable.groupid
WHERE (blist.groupid = testtable.groupid and type = 'b') OR
(blist.groupid is null)
How it works
join to a list of items that contain b.
Then in where statement... if we exist in that list just take type b. Otherwise take everything.
As an after-note you could be cute with the where clause like this
WHERE ISNULL(blist.groupid,testtable.groupid) = testtable.groupid
I think this is less clear -- but is often how advanced users will do it.

Select records with order of IN clause

I have
SELECT * FROM Table1 WHERE Col1 IN(4,2,6)
I want to select and return the records with the specified order which i indicate in the IN clause
(first display record with Col1=4, Col1=2, ...)
I can use
SELECT * FROM Table1 WHERE Col1 = 4
UNION ALL
SELECT * FROM Table1 WHERE Col1 = 6 , .....
but I don't want to use that, cause I want to use it as a stored procedure and not auto generated.
I know it's a bit late but the best way would be
SELECT *
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,67')
Or
SELECT CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,67')s_order,
*
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY s_order
You have a couple of options. Simplest may be to put the IN parameters (they are parameters, right) in a separate table in the order you receive them, and ORDER BY that table.
The solution is along this line:
SELECT * FROM Table1
WHERE Col1 IN(4,2,6)
ORDER BY
CASE Col1
WHEN 4 THEN 1
WHEN 2 THEN 2
WHEN 6 THEN 3
END
select top 0 0 'in', 0 'order' into #i
insert into #i values(4,1)
insert into #i values(2,2)
insert into #i values(6,3)
select t.* from Table1 t inner join #i i on t.[in]=t.[col1] order by i.[order]
Replace the IN values with a table, including a column for sort order to used in the query (and be sure to expose the sort order to the calling application):
WITH OtherTable (Col1, sort_seq)
AS
(
SELECT Col1, sort_seq
FROM (
VALUES (4, 1),
(2, 2),
(6, 3)
) AS OtherTable (Col1, sort_seq)
)
SELECT T1.Col1, O1.sort_seq
FROM Table1 AS T1
INNER JOIN OtherTable AS O1
ON T1.Col1 = O1.Col1
ORDER
BY sort_seq;
In your stored proc, rather than a CTE, split the values into table (a scratch base table, temp table, function that returns a table, etc) with the sort column populated as appropriate.
I have found another solution. It's similar to the answer from onedaywhen, but it's a little shorter.
SELECT sort.n, Table1.Col1
FROM (VALUES (4), (2), (6)) AS sort(n)
JOIN Table1
ON Table1.Col1 = sort.n
I am thinking about this problem two different ways because I can't decide if this is a programming problem or a data architecture problem. Check out the code below incorporating "famous" TV animals. Let's say that we are tracking dolphins, horses, bears, dogs and orangutans. We want to return only the horses, bears, and dogs in our query and we want bears to sort ahead of horses to sort ahead of dogs. I have a personal preference to look at this as an architecture problem, but can wrap my head around looking at it as a programming problem. Let me know if you have questions.
CREATE TABLE #AnimalType (
AnimalTypeId INT NOT NULL PRIMARY KEY
, AnimalType VARCHAR(50) NOT NULL
, SortOrder INT NOT NULL)
INSERT INTO #AnimalType VALUES (1,'Dolphin',5)
INSERT INTO #AnimalType VALUES (2,'Horse',2)
INSERT INTO #AnimalType VALUES (3,'Bear',1)
INSERT INTO #AnimalType VALUES (4,'Dog',4)
INSERT INTO #AnimalType VALUES (5,'Orangutan',3)
CREATE TABLE #Actor (
ActorId INT NOT NULL PRIMARY KEY
, ActorName VARCHAR(50) NOT NULL
, AnimalTypeId INT NOT NULL)
INSERT INTO #Actor VALUES (1,'Benji',4)
INSERT INTO #Actor VALUES (2,'Lassie',4)
INSERT INTO #Actor VALUES (3,'Rin Tin Tin',4)
INSERT INTO #Actor VALUES (4,'Gentle Ben',3)
INSERT INTO #Actor VALUES (5,'Trigger',2)
INSERT INTO #Actor VALUES (6,'Flipper',1)
INSERT INTO #Actor VALUES (7,'CJ',5)
INSERT INTO #Actor VALUES (8,'Mr. Ed',2)
INSERT INTO #Actor VALUES (9,'Tiger',4)
/* If you believe this is a programming problem then this code works */
SELECT *
FROM #Actor a
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY case when a.AnimalTypeId = 3 then 1
when a.AnimalTypeId = 2 then 2
when a.AnimalTypeId = 4 then 3 end
/* If you believe that this is a data architecture problem then this code works */
SELECT *
FROM #Actor a
JOIN #AnimalType at ON a.AnimalTypeId = at.AnimalTypeId
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY at.SortOrder
DROP TABLE #Actor
DROP TABLE #AnimalType
ORDER BY CHARINDEX(','+convert(varchar,status)+',' ,
',rejected,active,submitted,approved,')
Just put a comma before and after a string in which you are finding the substring index or you can say that second parameter.
And first parameter of CHARINDEX is also surrounded by , (comma).