Query slows down due to structure of WHERE clause - sql

I have a query that creates a table variable (@SepsisTbl) holding a population of interest. Its structure is like this:
DECLARE @SepsisTbl TABLE (
PK INT IDENTITY(1, 1) PRIMARY KEY
, Name VARCHAR(500)
, MRN INT
, Account INT
, Age INT -- Age at arrival
, Arrival DATETIME
, Triage_StartDT DATETIME
, Left_ED_DT DATETIME
, Disposition VARCHAR(500)
, Mortality CHAR(1)
);
WITH Patients AS (
SELECT UPPER(Patient) AS [Name]
, MR#
, Account
, DATEDIFF(YEAR, AgeDob, Arrival) AS [Age_at_Arrival]
, Arrival
, Triage_Start
, TimeLeftED
, Disposition
, CASE
WHEN Disposition IN (
'Medical Examiner', 'Morgue'
)
THEN 'Y'
ELSE 'N'
END AS [Mortality]
FROM SMSDSS.c_Wellsoft_Rpt_tbl
WHERE Triage_Start IS NOT NULL
AND (
Diagnosis LIKE '%SEPSIS%'
OR
Diagnosis LIKE '%SEPTIC%'
)
)
INSERT INTO @SepsisTbl (Name, MRN, Account, Age, Arrival, Triage_StartDT, Left_ED_DT, Disposition, Mortality)
SELECT * FROM Patients
From this point forward I have 5 more queries of the same sort, each looking for a different type of order, which I then LEFT OUTER JOIN onto this table. My question is: why does performance degrade so much when I change the WHERE clause of those queries from this:
AND A.Account IN (
SELECT Account
FROM SMSDSS.c_Wellsoft_Rpt_tbl
WHERE (
Diagnosis LIKE '%SEPSIS%'
OR
Diagnosis LIKE '%SEPTIC%'
)
)
to this:
AND A.Account IN (
SELECT Account
FROM @SepsisTbl
)
The run time goes from 2.5 minutes to over 10 minutes with still no results. The CTE itself runs as fast as I can press F5.
Thank you,

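I suspect the problem is that the table variable has no index on Account. Its only index is the primary key on the IDENTITY column. Table variables also carry no statistics, so the optimizer typically estimates them at one row (SQL Server 2019's table variable deferred compilation changes this), which can produce a poor plan for the IN (SELECT Account FROM @SepsisTbl) filter. If you add an index on Account, I would expect better performance. On SQL Server 2014 and later the index can be declared inline, e.g. Account INT INDEX IX_Account NONCLUSTERED.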
See the answer to this question for details on how to add an index: Creating an index on a table variable
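A minimal sketch of the suggested pattern, using Python's built-in sqlite3 module as a stand-in for SQL Server: the population is materialized once into a separate table that carries an index on Account, and the order query filters against it. The table and column names (wellsoft, orders, sepsis, order_type) are hypothetical simplifications of the ones in the question, and both filters apply the same population predicate so their results are comparable. SQLite's planner is not SQL Server's, so this shows the shape of the fix and checks that both filters return the same rows; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified versions of the source tables in the question.
cur.execute("CREATE TABLE wellsoft (account INTEGER, diagnosis TEXT, triage_start TEXT)")
cur.execute("CREATE TABLE orders (account INTEGER, order_type TEXT)")
cur.executemany(
    "INSERT INTO wellsoft VALUES (?, ?, ?)",
    [
        (1, "SEPSIS", "2016-01-01 10:00"),
        (2, "SEPTIC SHOCK", "2016-01-02 11:00"),
        (3, "FRACTURE", "2016-01-03 12:00"),
        (4, "SEPSIS", None),  # no triage start, so not part of the population
    ],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "LACTATE"), (2, "BLOOD CULTURE"), (3, "XRAY"), (4, "LACTATE")],
)

# The same predicate that defines the population.
POPULATION_FILTER = (
    "triage_start IS NOT NULL "
    "AND (diagnosis LIKE '%SEPSIS%' OR diagnosis LIKE '%SEPTIC%')"
)

# Materialize the population once and index the lookup column.
cur.execute(f"CREATE TABLE sepsis AS SELECT account FROM wellsoft WHERE {POPULATION_FILTER}")
cur.execute("CREATE INDEX ix_sepsis_account ON sepsis (account)")

# Filter 1: repeat the population predicate inside the IN subquery.
direct = cur.execute(
    "SELECT account, order_type FROM orders "
    f"WHERE account IN (SELECT account FROM wellsoft WHERE {POPULATION_FILTER}) "
    "ORDER BY account"
).fetchall()

# Filter 2: probe the indexed population table instead.
via_table = cur.execute(
    "SELECT account, order_type FROM orders "
    "WHERE account IN (SELECT account FROM sepsis) "
    "ORDER BY account"
).fetchall()

print(direct)
print(via_table)
```

In SQL Server the equivalent would be a #temp table (which gets statistics) or a table variable with an index on Account.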

Related

Create column with values based on join

I have two tables: 1) a Customer table and 2) an Account table. I want to see which accounts are primary and which are secondary.
The Account table has AccountRowId. The Customer table has PrimaryAccountRowId, SecondaryAccountRowId and AccountNumber.
In my output I would like all AccountNumbers in one column, with the AccountRelationship (Primary or Secondary) in another column beside each AccountNumber.
To join the tables for primary accounts I would join AccountRowId on PrimaryAccountRowId; for secondary accounts I would flip it and join on SecondaryAccountRowId instead.
My Account table:
AccountRowId = 256073
AccountRowId = 342300
Customer table:
PrimaryAccountRowId = 256073
SecondaryAccountRowId = 342300
AccountNumber = 8003564
AccountNumber = 2034666
What I want my output to look like:
AccountNumber AccountRelationship
8003564 Primary
2034666 Secondary
Please provide some helpful logic/code of how I would achieve these results.
From the OP's comments, here is the table structure:
Create table Customer
(
AccountNumber Varchar(50)
, PrimaryAccountRowId Varchar(15)
, SecondaryAccountRowId Varchar(15)
);
Create table Account
(
AccountRowId Varchar(15)
);
I am still somewhat guessing here. You need to provide table structure, sample data and desired output to make it easy for people to help you. Something along these lines.
declare @Customer table
(
AccountNumber Varchar(50)
, PrimaryAccountRowId Varchar(15)
, SecondaryAccountRowId Varchar(15)
)
insert @Customer values
('8003564', '256073', null)
, ('2034666', null, '342300')
declare @Account table
(
AccountRowid Varchar(15)
)
INSERT @Account values
('256073'), ('342300')
Now that we have some tables and data to work with, this just becomes a case of conditional aggregation. As I understand the need, this should return the data you are looking for:
select c.AccountNumber
, AccountRelationship = max(case when p.AccountRowId is not null then 'Primary' when s.AccountRowId is not null then 'Secondary' end)
from @Customer c
left join @Account p on p.AccountRowid = c.PrimaryAccountRowId
left join @Account s on s.AccountRowid = c.SecondaryAccountRowId
group by c.AccountNumber
order by AccountRelationship
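A minimal sketch that runs the conditional aggregation (checking the s join for the secondary case) against the same sample data, using Python's sqlite3 module as a stand-in for SQL Server, so the table variables become ordinary tables. It also shows an alternative that follows the question's description more literally: one join per relationship, stacked with UNION ALL. Both return the desired output for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE Customer (AccountNumber TEXT, PrimaryAccountRowId TEXT, SecondaryAccountRowId TEXT)")
cur.execute("CREATE TABLE Account (AccountRowId TEXT)")
cur.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [("8003564", "256073", None), ("2034666", None, "342300")],
)
cur.executemany("INSERT INTO Account VALUES (?)", [("256073",), ("342300",)])

# Conditional aggregation: one row per AccountNumber, label chosen by which join matched.
conditional = cur.execute("""
    SELECT c.AccountNumber,
           MAX(CASE WHEN p.AccountRowId IS NOT NULL THEN 'Primary'
                    WHEN s.AccountRowId IS NOT NULL THEN 'Secondary' END) AS AccountRelationship
    FROM Customer c
    LEFT JOIN Account p ON p.AccountRowId = c.PrimaryAccountRowId
    LEFT JOIN Account s ON s.AccountRowId = c.SecondaryAccountRowId
    GROUP BY c.AccountNumber
    ORDER BY AccountRelationship
""").fetchall()

# Alternative: an inner join per relationship, stacked with UNION ALL.
stacked = cur.execute("""
    SELECT c.AccountNumber, 'Primary' AS AccountRelationship
    FROM Customer c JOIN Account a ON a.AccountRowId = c.PrimaryAccountRowId
    UNION ALL
    SELECT c.AccountNumber, 'Secondary'
    FROM Customer c JOIN Account a ON a.AccountRowId = c.SecondaryAccountRowId
    ORDER BY AccountRelationship
""").fetchall()

print(conditional)  # [('8003564', 'Primary'), ('2034666', 'Secondary')]
print(stacked)      # [('8003564', 'Primary'), ('2034666', 'Secondary')]
```

The UNION ALL form returns two rows for a customer whose row has both IDs populated, whereas the aggregated form collapses them into one.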

Azure SQL DWH Date Partitioning

I'm keen to see a working example of how to partition a large table (150 million rows, 30 columns) by date, and what the best practices are for partitioning a table of that size (sample code please).
I'd also like to know how these partitions are merged, switched out and archived. Any T-SQL implementation example is much appreciated.
Below is how you partition a table by a date field:
CREATE TABLE [dbo].[FactInternetSales]
(
[ProductKey] int NOT NULL
, [OrderDateKey] int NOT NULL
, [CustomerKey] int NOT NULL
, [PromotionKey] int NOT NULL
, [SalesOrderNumber] nvarchar(20) NOT NULL
, [OrderQuantity] smallint NOT NULL
, [UnitPrice] money NOT NULL
, [SalesAmount] money NOT NULL
)
WITH
( CLUSTERED COLUMNSTORE INDEX
, DISTRIBUTION = HASH([ProductKey])
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101,20010101,20020101
,20030101,20040101,20050101
)
)
)
;
Below is a sample partitioned columnstore table containing one row in each partition:
CREATE TABLE [dbo].[FactInternetSales]
(
[ProductKey] int NOT NULL
, [OrderDateKey] int NOT NULL
, [CustomerKey] int NOT NULL
, [PromotionKey] int NOT NULL
, [SalesOrderNumber] nvarchar(20) NOT NULL
, [OrderQuantity] smallint NOT NULL
, [UnitPrice] money NOT NULL
, [SalesAmount] money NOT NULL
)
WITH
( CLUSTERED COLUMNSTORE INDEX
, DISTRIBUTION = HASH([ProductKey])
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101
)
)
)
;
INSERT INTO dbo.FactInternetSales
VALUES (1,19990101,1,1,1,1,1,1);
INSERT INTO dbo.FactInternetSales
VALUES (1,20000101,1,1,1,1,1,1);
CREATE STATISTICS Stat_dbo_FactInternetSales_OrderDateKey ON dbo.FactInternetSales(OrderDateKey);
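RANGE RIGHT means each boundary value belongs to the partition on its right, which is why 20000101 lands in the second partition and 19990101 in the first in the sample above. A small Python sketch of that assignment rule; partition_number is a hypothetical helper, and partitions are numbered from 1, lowest range first. RANGE LEFT is included for contrast.

```python
from bisect import bisect_left, bisect_right


def partition_number(key, boundaries, range_right=True):
    """Return the 1-based partition a key falls into.

    boundaries must be sorted ascending.
    RANGE RIGHT: a value equal to a boundary goes to the partition on its right.
    RANGE LEFT:  a value equal to a boundary goes to the partition on its left.
    """
    if range_right:
        return bisect_right(boundaries, key) + 1
    return bisect_left(boundaries, key) + 1


# The single-boundary sample table: two partitions.
one_boundary = [20000101]
print(partition_number(19990101, one_boundary))  # 1
print(partition_number(20000101, one_boundary))  # 2

# The first example table: six boundaries give seven partitions.
six_boundaries = [20000101, 20010101, 20020101, 20030101, 20040101, 20050101]
print(partition_number(20030615, six_boundaries))  # 5
```

With RANGE RIGHT and boundaries on the first day of each year, every partition holds exactly one calendar year of OrderDateKey values, which is why it is the usual choice for date keys.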
SQL Data Warehouse supports partition splitting, merging, and switching. Each of these operations is executed using the ALTER TABLE statement.
To create a partitioned table on Azure SQL Data Warehouse from the data in another table, you can use CTAS (CREATE TABLE AS SELECT) as shown below:
CREATE TABLE dbo.FactInternetSales_20000101
WITH ( DISTRIBUTION = HASH(ProductKey)
, CLUSTERED COLUMNSTORE INDEX
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101
)
)
)
AS
SELECT *
FROM FactInternetSales
WHERE 1=2
;
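The WHERE 1=2 predicate copies no rows, so this creates an empty table with the same columns and the declared partition boundary, which can serve as a target when switching a partition out. For more information, please visit this documentation.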

Insert where not exists Violation of PRIMARY KEY

I'm having trouble with an INSERT ... WHERE NOT EXISTS, and I'm not sure whether a MERGE statement would be more efficient or what's wrong with my statement.
I have an existing view and need to insert the new records of this view into a table.
The Table looks like:
CREATE TABLE [dbo].[ser_number_all]
(Serialnumber nvarchar(100) PRIMARY KEY,
TypeName nvarchar(max),
Date datetime,
Parent_Serialnumber nvarchar(100),
JobNumber nvarchar(30),
ProductNode hierarchyid,
);
The Insert statement looks like this:
insert into [dbo].[ser_number_all]
( Serialnumber
, TypeName
, Date
, Parent_Serialnumber
, JobNumber
, ProductNode)
select Serialnumber
, TypeName
, Date
, Parent_Serialnumber
, JobNumber
, ProductNode
from dbo.Hierachical_View_with_Jobnumbers as ser_number_all
where not exists (select 1
from Hierachical_View_with_Jobnumbers as hv
where hv. Serialnumber = ser_number_all.Serialnumber
and hv. TypeName = ser_number_all.TypeName
and hv. Date = ser_number_all.Date
and hv. Parent_Serialnumber = ser_number_all.Parent_Serialnumber
and hv. JobNumber = ser_number_all.JobNumber
and hv. ProductNode = ser_number_all.ProductNode);
As long as the view has no new records, it looks OK and I don't get any error; the output is 0 records, as it should be.
When I add a new record to the source table and the view has one more record, I always get this error:
Msg 2627, Level 14, State 1, Line 4
Violation of PRIMARY KEY constraint 'PK__ser_numb__F2753A12C4ABA976'. Cannot insert duplicate key in object 'dbo.ser_number_all'. The duplicate key value is (.x3666AB05).
The statement has been terminated.
I don't understand why it inserts a duplicate value into the primary key column, because I can't see any mistake in my WHERE clause.
I have also tried IS NULL instead of = ser_number_all.TypeName for every column that could contain a NULL value, but the result is the same.
Again, I'm coming from Oracle, and it looks like I have to learn many differences between MS SQL and Oracle.
Appreciate any suggestion :-)
Thx
EDIT:
Here is the code of the view:
CREATE VIEW [dbo].[Hierachical_View_with_Jobnumbers]
AS
WITH ProductList
AS
(
SELECT p.Serialnumber,
p.Type_Id,
p.Parent_Serialnumber,
p.ActiveJob_Jobnumber as JobNumber,
N'/' + CONVERT(NVARCHAR(4000), ROW_NUMBER() OVER (ORDER BY p.Serialnumber)) + N'/' AS ProductNode_AsChar
FROM Products AS p
WHERE p.Parent_Serialnumber IS NULL
UNION ALL
SELECT p.Serialnumber,
p.Type_Id,
p.Parent_Serialnumber,
JobNumber,
pl.ProductNode_AsChar + CONVERT(NVARCHAR(4000), ROW_NUMBER() OVER (ORDER BY p.Serialnumber)) + N'/'
FROM Products AS p
INNER JOIN ProductList AS pl ON p.Parent_Serialnumber = pl.Serialnumber
)
SELECT Serialnumber,
pt.Name as TypeName,
Parent_Serialnumber,
JobNumber,
CONVERT(HIERARCHYID, ProductNode_AsChar) AS ProductNode
FROM ProductList as pl
INNER JOIN ProductTypes as pt on pl.Type_Id = pt.Id;
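@TheGameiswar
Sorry, now I see what you meant ;-) Stupid me...
Here is the solution, which now works because the subquery correlates the view with the target table (the original subquery compared the view with itself, so ser_number_all was never checked):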
insert into [dbo].[ser_number_all]
( Serialnumber
, TypeName
, Date
, Parent_Serialnumber
, JobNumber
, ProductNode)
select Serialnumber
, TypeName
, Date
, Parent_Serialnumber
, JobNumber
, ProductNode
from dbo.Hierachical_View_with_Jobnumbers as hv
where not exists (select 1
from ser_number_all as sna
where hv. Serialnumber = sna.Serialnumber);
Thank you all for your time and guiding me to the right direction :-)
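A minimal sketch contrasting the two NOT EXISTS shapes, with Python's sqlite3 module standing in for SQL Server and hypothetical two-column stand-ins (hv_source for the view, ser_number_all for the target). In this toy data the root row has a NULL Parent_Serialnumber. Because NULL = NULL is not true, the self-correlated subquery finds no match for that row, so NOT EXISTS lets an already-stored key through and the insert fails with a uniqueness error. Meanwhile the genuinely new row C is never selected, because every non-NULL row matches itself. This illustrates one way the original shape can go wrong; it is not necessarily the exact path taken with the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified stand-ins for the view and the target table.
cur.execute("CREATE TABLE hv_source (Serialnumber TEXT, Parent_Serialnumber TEXT)")
cur.execute("CREATE TABLE ser_number_all (Serialnumber TEXT PRIMARY KEY, Parent_Serialnumber TEXT)")
existing = [("A", None), ("B", "A")]
cur.executemany("INSERT INTO hv_source VALUES (?, ?)", existing + [("C", "A")])
cur.executemany("INSERT INTO ser_number_all VALUES (?, ?)", existing)
conn.commit()

# Original shape: the subquery reads the source again, never the target.
try:
    cur.execute("""
        INSERT INTO ser_number_all (Serialnumber, Parent_Serialnumber)
        SELECT s.Serialnumber, s.Parent_Serialnumber
        FROM hv_source AS s
        WHERE NOT EXISTS (SELECT 1 FROM hv_source AS hv
                          WHERE hv.Serialnumber = s.Serialnumber
                            AND hv.Parent_Serialnumber = s.Parent_Serialnumber)
    """)
    self_correlated_error = None
except sqlite3.IntegrityError as exc:
    self_correlated_error = str(exc)  # duplicate key "A"
conn.rollback()

# Corrected shape: correlate the source with the target on the key column.
cur.execute("""
    INSERT INTO ser_number_all (Serialnumber, Parent_Serialnumber)
    SELECT hv.Serialnumber, hv.Parent_Serialnumber
    FROM hv_source AS hv
    WHERE NOT EXISTS (SELECT 1 FROM ser_number_all AS sna
                      WHERE sna.Serialnumber = hv.Serialnumber)
""")
conn.commit()

print(self_correlated_error)
print(cur.execute("SELECT Serialnumber FROM ser_number_all ORDER BY Serialnumber").fetchall())
```

Correlating only on the primary key column also sidesteps the NULL comparison problem, since the key can never be NULL.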

Why doesn't the SQL Server optimizer use CHECK constraint definitions to find which table contains the rows?

I use SQL Server 2012. I have a large table, and I divided it into several tables like below:
Create Table A2013
(
Id int identity(1,1),
CountA int ,
Name varchar(50),
ADate DATETIME NULL
CHECK (DATEPART(yy, ADate) = 2013)
)
Create Table A2014
(
Id int identity(1,1),
CountA int ,
Name varchar(50),
ADate DATETIME NULL
CHECK (DATEPART(yy, ADate) = 2014)
)
Insert Into A2013 Values ( 102 , 'A','20131011' )
Insert Into A2013 Values (15 , 'B' ,'20130211' )
Insert Into A2013 Values ( 54, 'C' ,'20131211' )
Insert Into A2013 Values ( 54, 'D' ,'20130611' )
Insert Into A2013 Values ( 95, 'E' ,'20130711' )
Insert Into A2013 Values (8754 , 'F' ,'20130310' )
Insert Into A2014 Values ( 102 , 'A','20141011' )
Insert Into A2014 Values (15 , 'B' ,'20140911' )
Insert Into A2014 Values ( 54, 'C' ,'20140711' )
Insert Into A2014 Values ( 54, 'D' ,'20141007' )
Insert Into A2014 Values ( 95, 'E' ,'20140411' )
Insert Into A2014 Values (8754 , 'F' ,'20140611' )
I created a partitioned view like below:
Create View A
As
Select * From A2013
Union
Select * From A2014
I hoped the optimizer would choose a good plan and use my CHECK constraint definitions to determine which member table contains the rows, but it scans both tables when I run this query:
Select * From A Where A.ADate = '20140611'
I expected the optimizer not to touch table A2013.
The CHECK constraint expression must be sargable for the optimizer to eliminate the unneeded tables from the execution plan. The constraints below avoid applying a function to the column and are sargable:
CREATE TABLE dbo.A2013
(
Id int IDENTITY(1, 1)
, CountA int
, Name varchar(50)
, ADate datetime NULL
CONSTRAINT CK_A2013_ADate
CHECK ( ADate >= '20130101'
AND ADate < '20140101' )
);
CREATE TABLE dbo.A2014
(
Id int IDENTITY(1, 1)
, CountA int
, Name varchar(50)
, ADate datetime NULL
CONSTRAINT CK_A2014_ADate
CHECK ( ADate >= '20140101'
AND ADate < '20150101' )
);
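The half-open ranges above also behave correctly at the very end of the year: a value late on 31 December 2014 satisfies ADate < '20150101', whereas an upper bound written as BETWEEN ... AND '2014-12-31' rejects it, because that literal means midnight at the start of the 31st. A quick check using Python's sqlite3 module as a stand-in; the table names are hypothetical, and SQLite compares these ISO-8601 strings as text rather than converting them to DATETIME, but the outcome for this value is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dates stored as ISO-8601 text, which SQLite compares lexicographically.
cur.execute(
    "CREATE TABLE a2014_between "
    "(ADate TEXT CHECK (ADate BETWEEN '2014-01-01' AND '2014-12-31'))"
)
cur.execute(
    "CREATE TABLE a2014_halfopen "
    "(ADate TEXT CHECK (ADate >= '2014-01-01' AND ADate < '2015-01-01'))"
)


def accepts(table, value):
    """Return True if the table's CHECK constraint lets the value in."""
    try:
        cur.execute(f"INSERT INTO {table} (ADate) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


late_on_dec_31 = "2014-12-31 10:00:00"
print(accepts("a2014_between", late_on_dec_31))   # False
print(accepts("a2014_halfopen", late_on_dec_31))  # True
```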
The issue is not whether the expression is sargable. As far as I know, the term "sargable" applies to the use of indexes in queries. The question is whether SQL Server recognizes the WHERE clause as matching the check constraint.
The check constraint you have is:
CHECK (DATEPART(yy, ADate) = 2014)
The where clause is:
Where A.ADate = '20140611'
The problem is that the second is not recognized as a subset of the first. You could fix this by adding redundancy:
Where A.ADate = '20140611' and DATEPART(yy, A.ADate) = 2014
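Or, you could fix this by using ranges, but be careful about data types, because data type conversion can definitely confuse the optimizer. I think the following will work (written as a half-open range so that times later in the day on December 31 still pass):
CHECK (ADate >= '2014-01-01' AND ADate < '2015-01-01')
WHERE A.ADate = '2014-06-11'
(The hyphens are optional and can be dropped. For DATETIME the unseparated 'YYYYMMDD' form is the safer choice, because 'YYYY-MM-DD' is interpreted according to SET DATEFORMAT and the session language.)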
The documentation (as far as I can tell) is not really explicit about the cause:
The SQL Server query optimizer recognizes that the search condition in this SELECT statement references only rows in the May1998Sales and Jun1998Sales tables. Therefore, it limits its search to those tables.
. . .
CHECK constraints are not needed for the partitioned view to return the correct results. However, if the CHECK constraints have not been defined, the query optimizer must search all the tables instead of only those that cover the search condition on the partitioning column. Without the CHECK constraints, the view operates like any other view with UNION ALL. The query optimizer cannot make any assumptions about the values stored in different tables and it cannot skip searching the tables that participate in the view definition.

select with count for years

I have a Pickup table which looks like this:
create table Pickup
(
PickupID int IDENTITY,
ClientID int ,
PickupDate date ,
PickupProxy varchar (200) ,
PickupHispanic bit default 0,
EthnCode varchar(5) ,
CategCode varchar (2) ,
AgencyID int,
Primary Key (PickupID),
);
It contains pickups for clients.
I need to create a report based on this table, which should look like this:
I know I need to use CASE, but I really don't know how to lay out the years, calculate the average pickups for each year, and count the pickups for a specific year. So far I have only this:
SELECT
DATEPART(YEAR, PickupDate)as 'Month'
FROM dbo.Pickup
group by DATEPART(YEAR, PickupDate)
WITH ROLLUP
Any ideas?
The best way to structure a query like this is to use an index table (often called a numbers or tally table). Ideally, create it in your master database so it is available to all databases on the server, but it can be created in the local database too.
You can create one like this
create table IndexTable
(
IndexID int NOT NULL,
Primary Key (IndexID),
);
Then fill it with the numbers 1 to n, where n is big enough, say 1,000,000, like this:
INSERT IndexTable
VALUES (1)
WHILE (SELECT MAX(IndexID) FROM IndexTable) < 1000000
INSERT IndexTable
SELECT IndexID + (SELECT MAX(IndexID) FROM IndexTable)
FROM IndexTable
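The WHILE loop above doubles the table on each pass (every existing IndexID gets the current maximum added to it), so it overshoots the target slightly: it stops after 20 passes with 2^20 = 1,048,576 rows, because 2^19 = 524,288 is still below 1,000,000. A small Python simulation of that arithmetic; fill_numbers is a hypothetical helper, not part of the answer.

```python
def fill_numbers(limit):
    """Mimic the WHILE loop: start with [1], then add max + n for every existing n."""
    numbers = [1]
    passes = 0
    while max(numbers) < limit:
        current_max = max(numbers)  # the subquery's MAX(IndexID) before this pass
        numbers.extend([n + current_max for n in numbers])
        passes += 1
    return numbers, passes


numbers, passes = fill_numbers(1_000_000)
print(passes, len(numbers))  # 20 1048576
```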
Your query then uses this table to treat months as integers (DATEDIFF(month, 0, ...) counts months since 1900-01-01, the date that 0 converts to):
SELECT DATEADD(month, i.IndexID, 0) AS Months
, COUNT(p.PickupDate) AS Pickups
, AVG(*Whatever*)
FROM Pickup p
INNER JOIN
IndexTable i ON DATEDIFF(month, 0, p.PickupDate) = i.IndexID
GROUP BY i.IndexID
WITH ROLLUP
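The question itself asks for per-year figures. Here is a minimal sketch with Python's sqlite3 module standing in for SQL Server: strftime('%Y', ...) plays the role of DATEPART(YEAR, ...), only the columns needed are kept, and the sample rows are made up. Since the desired report layout isn't shown, "average pickups" is assumed here to mean pickups per distinct client in that year. The second query shows the CASE-based, one-column-per-year layout the question mentions, with the years hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Pickup (PickupID INTEGER PRIMARY KEY, ClientID INTEGER, PickupDate TEXT)")
cur.executemany(
    "INSERT INTO Pickup (ClientID, PickupDate) VALUES (?, ?)",
    [(1, "2014-01-15"), (1, "2014-03-02"), (2, "2014-07-09"),
     (1, "2015-02-11"), (2, "2015-05-20"), (2, "2015-06-01"), (3, "2015-12-30")],
)

# One row per year: total pickups, distinct clients, average pickups per client (assumed metric).
rows = cur.execute("""
    SELECT strftime('%Y', PickupDate) AS PickupYear,
           COUNT(*) AS Pickups,
           COUNT(DISTINCT ClientID) AS Clients,
           ROUND(1.0 * COUNT(*) / COUNT(DISTINCT ClientID), 2) AS AvgPickupsPerClient
    FROM Pickup
    GROUP BY PickupYear
    ORDER BY PickupYear
""").fetchall()
for row in rows:
    print(row)

# The CASE approach: one column per (hard-coded) year.
pivot = cur.execute("""
    SELECT SUM(CASE WHEN strftime('%Y', PickupDate) = '2014' THEN 1 ELSE 0 END) AS Pickups2014,
           SUM(CASE WHEN strftime('%Y', PickupDate) = '2015' THEN 1 ELSE 0 END) AS Pickups2015
    FROM Pickup
""").fetchone()
print(pivot)  # (3, 4)
```

In T-SQL the same grouping would use DATEPART(YEAR, PickupDate), and WITH ROLLUP can add the grand-total row that SQLite lacks.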