RANK() SQL Server execution plan issue

What is driving SQL Server to use a less optimal execution plan for queries that return 6000+ rows? I need to improve query performance for the scenario where all rows are returned.
I select all fields and add a rank over the same three columns that are in the index. Depending on the number of rows returned, the query gets one of two execution plans, and execution takes 0.2 s or 3 s respectively.
From 1 row up to ca. 5000 rows returned, the query runs fast. From 6000 rows up to all rows, it runs slowly.
Table1 has ca. 38000 rows. The database runs on Azure SQL v12.
Table:
CREATE TABLE [dbo].[Table1](
[ID] [int] IDENTITY(1,1) NOT NULL,
[KOD_ID] [int] NULL,
[SYM] [nvarchar](20) NULL,
[AN] [nvarchar](35) NULL,
[A] [nvarchar](10) NULL,
[B] [nvarchar](2) NULL,
[C] [datetime] NULL,
[D] [datetime] NULL,
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
GO
CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]
(
[KOD_ID] ASC,
[SYM] ASC,
[AN] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Queries:
SELECT TOP 6000 *, RANK() OVER(ORDER BY KOD_ID ASC, SYM ASC, AN ASC) AS Rank#
FROM [dbo].[Table1]
SELECT TOP 7000 *, RANK() OVER(ORDER BY KOD_ID ASC, SYM ASC, AN ASC) AS Rank#
FROM [dbo].[Table1]
Execution plans for both queries

CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]
(
[KOD_ID] ASC,
[SYM] ASC,
[AN] ASC
) INCLUDE ([A], [B], [C], [D])
WITH (DROP_EXISTING = ON); -- IX_Table1 already exists, so replace it in place
Create a covering index like this one. The query should then scan this index, and most likely a sort won't even be needed because the data is already sorted in the index.
The key points in your queries are:
The first plan has a key lookup. Avoid key lookups as much as possible (a key lookup is an additional seek into the clustered index for each row, because the nonclustered index does not contain all the requested columns) by creating covering indexes with INCLUDE columns.
Avoid sort operations too; they're costly for SQL Server.
If you're alright with index rebuilds and favor reads over inserts, these could be alternate DDLs for your table, assuming that KOD_ID, SYM and AN are not nullable:
If ID is needed to ensure uniqueness:
CREATE TABLE [dbo].[Table1] (
[KOD_ID] [int] NOT NULL
, [SYM] [nvarchar](20) NOT NULL
, [AN] [nvarchar](35) NOT NULL
, [ID] [int] IDENTITY(1, 1) NOT NULL
, [A] [nvarchar](10) NULL
, [B] [nvarchar](2) NULL
, [C] [datetime2] NULL
, [D] [datetime2] NULL
, CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([KOD_ID], [SYM], [AN], [ID])
);
GO
If ID is not needed to ensure uniqueness:
CREATE TABLE [dbo].[Table1] (
[KOD_ID] [int] NOT NULL
, [SYM] [nvarchar](20) NOT NULL
, [AN] [nvarchar](35) NOT NULL
, [A] [nvarchar](10) NULL
, [B] [nvarchar](2) NULL
, [C] [datetime2] NULL
, [D] [datetime2] NULL
, CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([KOD_ID], [SYM], [AN])
);
GO
Also, note that I use datetime2 instead of datetime, which is what Microsoft recommends: https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.

Related

Improve performance of slow sub query in SQL Server

I have a table with 1.3 million rows.
Query #1 takes 29 seconds to run in SQL Server 2016 Management Studio.
Query #1:
select
*,
(select Count(*)
from [dbo].[Results] t2
where t2.RaceDate < t1.RaceDate
and t1.HorseName = t2.HorseName
and t2.Position = '1'
and t1.CourseName = t2.CourseName
and t2.CountryCode = 'GB') as [CourseDistanceWinners]
from
[dbo].[Results] t1
But query #2 takes several hours, with the only difference being t1.HorseName = t2.HorseName vs t1.TrainerName = t2.TrainerName. There will be many more matches on TrainerName than on HorseName, but I wasn't expecting several hours.
Query #2:
select
*,
(select Count(*)
from [dbo].[Results] t2
where t2.RaceDate < t1.RaceDate
and t1.TrainerName = t2.TrainerName
and t2.Position = '1'
and t1.CourseName = t2.CourseName
and t2.CountryCode = 'GB') as [CourseDistanceWinners]
from
[dbo].[Results] t1
I've managed to get the query down to 15 minutes using the techniques below but I still think this is a very long time. Is there anything else I can do to improve performance of Query2 or a way to rewrite it for performance?
What I have tried so far
I've changed [TrainerName] [nvarchar](255) NULL, to [TrainerName] [nvarchar](50) NULL,
I've added a composite index and several non clustered indexes
CREATE INDEX idx_HorseName
ON [dbo].[Results] (HorseName);
CREATE INDEX idx_TrainerName
ON [dbo].[Results] (TrainerName);
CREATE INDEX idx_CourseName
ON [dbo].[Results] (CourseName);
CREATE INDEX idx_Position
ON [dbo].[Results] (Position);
CREATE INDEX idx_JockeyName
ON [dbo].[Results] (JockeyName);
CREATE INDEX idx_RaceDate
ON [dbo].[Results] (RaceDate);
CREATE INDEX idx_TrainerComposite
ON [dbo].[Results] (TrainerName, RaceDate, CourseName);
Further info:
Table structure:
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Results]
(
[CountryCode] [NVARCHAR](50) NULL,
[CourseName] [NVARCHAR](50) NULL,
[HorseName] [NVARCHAR](50) NOT NULL,
[HorseSuffix] [NVARCHAR](5) NOT NULL,
[JockeyName] [NVARCHAR](255) NULL,
[OwnerName] [NVARCHAR](255) NULL,
[Position] [NVARCHAR](255) NULL,
[PublishedTime] [NVARCHAR](6) NOT NULL,
[RaceDate] [DATETIME] NOT NULL,
[RaceTitle] [NVARCHAR](255) NULL,
[StallPosition] [NVARCHAR](255) NULL,
[TrainerName] [NVARCHAR](50) NULL,
[Rating] [INT] NULL,
CONSTRAINT [PK_Results_1]
PRIMARY KEY CLUSTERED ([HorseName] ASC,
[HorseSuffix] ASC,
[PublishedTime] ASC,
[RaceDate] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Query #1 execution plan:
Query #2 execution plan:
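Use a window function!
-- Position is NVARCHAR, so compare it with '1'; the column is CountryCode, not country_code.
-- This mirrors Query #1; for Query #2, partition by TrainerName instead of HorseName.
select r.*,
sum(case when Position = '1' and CountryCode = 'GB' then 1 else 0 end) over
(partition by HorseName, CourseName
order by RaceDate
rows between unbounded preceding and 1 preceding
) as CourseDistanceWinners
from [dbo].[Results] r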
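The window-function idea can be adapted to Query #2. This is a sketch under assumptions, not a tested drop-in. It partitions by TrainerName and uses the table and column names from the question. The alias w and the column IsWin are made up for illustration. Two details differ from a plain ROWS frame. A trainer can have several runners on the same RaceDate, so a ROWS frame would count same-day wins that the original `t2.RaceDate < t1.RaceDate` condition excludes. Also, the first row of a partition would get NULL instead of 0. The version below takes the running total up to and including the current date, then subtracts the wins on that date:

```sql
-- Sketch: wins strictly before the current RaceDate, per trainer and course
-- (T-SQL, needs SQL Server 2012 or later for the RANGE frame).
SELECT r.*,
       CASE
           -- the correlated subquery never matches NULL names, so it yields 0 there
           WHEN r.TrainerName IS NULL OR r.CourseName IS NULL THEN 0
           ELSE
               -- wins up to and including the current RaceDate
               -- (RANGE treats rows tied on RaceDate as one group)
               SUM(w.IsWin) OVER (PARTITION BY r.TrainerName, r.CourseName
                                  ORDER BY r.RaceDate
                                  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               -- minus wins that happened on the current RaceDate itself
             - SUM(w.IsWin) OVER (PARTITION BY r.TrainerName, r.CourseName, r.RaceDate)
       END AS CourseDistanceWinners
FROM dbo.Results AS r
CROSS APPLY (SELECT CASE WHEN r.Position = '1' AND r.CountryCode = 'GB'
                         THEN 1 ELSE 0 END AS IsWin) AS w; -- w and IsWin are hypothetical names
```

It is worth comparing the output with the original correlated subquery on a small sample before relying on it.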

Insert into (Select) without auto increment

I have two tables
CREATE TABLE [dbo].[TABELAA]
(
[ID] [bigint] NOT NULL,
[PodatakA] [nvarchar](50) NULL,
[PodatakB] [nvarchar](50) NULL,
CONSTRAINT [PK_TABELAA]
PRIMARY KEY CLUSTERED ([ID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TABELAB]
(
[PodatakX] [nvarchar](50) NULL,
[PodatakY] [nvarchar](50) NULL
) ON [PRIMARY]
GO
I need to insert the values from TABELAB into TABELAA while auto-generating the ID in TABELAA. Something like the query below would be fine if there were only one row, but I'm talking about thousands of rows, and the ID should be generated exactly like an auto-increment with step 1.
A failed attempt, where I think I should use OVER:
INSERT INTO TABELAA
SELECT
(SELECT MAX(id) + 1 FROM TabelaA) AS Id, *
FROM
tabelaB
You are looking for IDENTITY:
CREATE TABLE [dbo].[TABELAA](
[ID] [bigint] IDENTITY(1, 1) PRIMARY KEY, -- NOT NULL is handled by PRIMARY KEY
[PodatakA] [nvarchar](50) NULL,
[PodatakB] [nvarchar](50) NULL
);
INSERT INTO TABELAA (PodatakA, PodatakB)
SELECT PodatakX, PodatakY
FROM TABELAB;
I agree with Rahul's comment and Gordon that if you can modify your schema it would make the most sense to add an Identity Column. However if you cannot you can still accomplish what you want using a couple of methods.
One method is to get the MAX ID of TABELAA and then add a ROW_NUMBER() to it, like so:
INSERT INTO TABELAA (ID, PodatakA, PodatakB)
SELECT
m.CurrentMaxId + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
,b.PodatakX
,b.PodatakY
FROM
TABELAB b
CROSS APPLY (
SELECT ISNULL(MAX(ID),0) as CurrentMaxId
FROM
TABELAA) m
Again, this is a workaround; the ideal solution is to specify IDENTITY.
Also, this is susceptible to problems due to simultaneous writes and other scenarios in a heavy-traffic DB.
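One possible way to reduce the simultaneous-write risk is to take an exclusive table lock while reading the current maximum, so that concurrent runs of the statement queue up behind each other. The following is only a sketch, assuming the table and column names from the question and assuming that blocking other access to TABELAA for the duration of the insert is acceptable:

```sql
-- Sketch: serialize ID generation with an exclusive table lock.
-- TABLOCKX is taken when MAX(ID) is read and is held until the
-- transaction ends, so two concurrent runs cannot read the same maximum.
BEGIN TRANSACTION;

INSERT INTO dbo.TABELAA (ID, PodatakA, PodatakB)
SELECT m.CurrentMaxId + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       b.PodatakX,
       b.PodatakY
FROM dbo.TABELAB AS b
CROSS JOIN (SELECT ISNULL(MAX(ID), 0) AS CurrentMaxId
            FROM dbo.TABELAA WITH (TABLOCKX, HOLDLOCK)) AS m;

COMMIT TRANSACTION;
```

This trades concurrency for safety, which is another reason IDENTITY is usually the better choice when the schema can be changed.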

Relations and joins between tables with primary key clustered index

I created two tables using PRIMARY KEY CLUSTERED on an identity column and a date, partitioned on the date, because a task splits and truncates partitions older than 30 days, keeping only the recent records.
I have a special field to correlate the tables, but the joins are painfully slow even with indexes. Could you suggest how to optimize them?
The tables and the join statement follow:
CREATE TABLE [dbo].[Redeem](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Ticket] [nvarchar](64) NOT NULL,
[CorrelationTicket] [nvarchar](64) NOT NULL,
[CreatedUTC] [datetime] NOT NULL,
[CreatedDate] [date] NOT NULL,
[Redeem fields here...]
CONSTRAINT [PK_Redeem] PRIMARY KEY CLUSTERED
(
[CreatedDate] ASC,
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
ON myPS([CreatedDate]);
CREATE NONCLUSTERED INDEX [IX_Redeem_CorrelationTicket]
ON [dbo].[Redeem]([CreatedDate] ASC, [CorrelationTicket] ASC)
ON [myPS] ([CreatedDate]);
CREATE TABLE [dbo].[Validate](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Ticket] [nvarchar](64) NOT NULL,
[CorrelationTicket] [nvarchar](64) NOT NULL,
[CreatedUTC] [datetime] NOT NULL,
[CreatedDate] [date] NOT NULL,
[Validate fields here...]
CONSTRAINT [PK_Validate] PRIMARY KEY CLUSTERED
(
[CreatedDate] ASC,
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
ON myPS([CreatedDate]);
CREATE NONCLUSTERED INDEX [IX_Validate_CorrelationTicket]
ON [dbo].[Validate]([CreatedDate] ASC, [CorrelationTicket] ASC)
ON [myPS] ([CreatedDate]);
And this is the Join:
SELECT top 100
v.*,
r.*
from
Validate v
LEFT OUTER join Redeem r
on v.CorrelationTicket = r.CorrelationTicket
ORDER BY v.CreatedDate DESC
Thank you Krintner!
The problem was the ORDER BY, as you suggested. It was sorting the entire result set (not sure why).
I do need to sort, but changing the index to be DESC did the trick.
I also followed the recommendation and now use only CorrelationTicket in the index.
CONSTRAINT [PK_Redeem] PRIMARY KEY CLUSTERED
(
[CreatedDate] DESC,
[Id] DESC
)
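For the recommendation to index only CorrelationTicket, a minimal sketch could look like the following. It assumes the index names and the partition scheme myPS from the question are reused. DROP_EXISTING = ON rebuilds the existing index under the same name with the new key. Leaving the index on myPS keeps it partition-aligned, which matters because partition-level truncation expects the table and its indexes to be aligned.

```sql
-- Sketch: CorrelationTicket as the only key column, still partition-aligned.
CREATE NONCLUSTERED INDEX [IX_Redeem_CorrelationTicket]
ON [dbo].[Redeem] ([CorrelationTicket] ASC)
WITH (DROP_EXISTING = ON)
ON [myPS] ([CreatedDate]);

CREATE NONCLUSTERED INDEX [IX_Validate_CorrelationTicket]
ON [dbo].[Validate] ([CorrelationTicket] ASC)
WITH (DROP_EXISTING = ON)
ON [myPS] ([CreatedDate]);
```

With an aligned index, a lookup by CorrelationTicket alone may still have to probe every partition, so it is worth checking the resulting plan.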

Why is my simple SQL statement taking so long to execute and how do i go about finding the issue?

I have a very simple SQL query:
select o.Visit_ID
from Datamart.dbo.ww_Orders o
inner join Datamart.dbo.ww_Order_Details on o.Visit_ID = ww_Order_Details.Visit_ID
where o.runstamp = '20160422'
This query takes under a second to return 11173 rows.
When I add the GROUP BY clause:
select o.Visit_ID
from Datamart.dbo.ww_Orders o
inner join Datamart.dbo.ww_Order_Details on o.Visit_ID = ww_Order_Details.Visit_ID
where o.runstamp = '20160422'
group by o.Visit_ID
The server then takes 6 min 30 sec to retrieve the 3047 rows.
I would expect the GROUP BY query to take not much longer than the original. How do I go about finding what the issues are? Thanks.
Here are the table definitions:
Orders:
CREATE TABLE [dbo].[ww_Orders](
[Visit_ID] [int] NOT NULL,
[Member_ID] [int] NOT NULL,
[Membership_no] [varchar](20) NULL,
[Member_Card_Num_Orig] [varchar](16) NULL,
[SCV_ID] [int] NULL,
[Meeting_No] [int] NULL,
[Location_Name] [varchar](128) NULL,
[Leader_No] [int] NULL,
[CashAmt] [decimal](18, 2) NULL,
[EFTAmt] [decimal](18, 2) NULL,
[VouchAmt] [decimal](18, 2) NULL,
[Meet_Date] [datetime] NULL,
[runstamp] [varchar](50) NULL,
CONSTRAINT [PK_dbo.ww_Orders] PRIMARY KEY CLUSTERED
(
[Visit_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Order Details:
CREATE TABLE [dbo].[ww_Order_Details](
[ord_det_pk] [int] IDENTITY(1,1) NOT NULL,
[Visit_ID] [int] NOT NULL,
[Item_Code] [nvarchar](50) NULL,
[Item_Name] [nvarchar](50) NULL,
[Qty] [int] NULL,
[Amt] [decimal](18, 2) NULL,
[Category_Code] [nvarchar](20) NULL,
CONSTRAINT [PK_ww_Order_Details] PRIMARY KEY CLUSTERED
(
[ord_det_pk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ww_Order_Details] WITH CHECK ADD CONSTRAINT [FK_ww_Order_Details_ww_Orders] FOREIGN KEY([Visit_ID])
REFERENCES [dbo].[ww_Orders] ([Visit_ID])
GO
ALTER TABLE [dbo].[ww_Order_Details] CHECK CONSTRAINT [FK_ww_Order_Details_ww_Orders]
GO
I'd add an index on ww_Order_Details.Visit_ID, and probably make it the clustered index and drop the clustered index on ord_det_pk. Also, it might make more sense as an EXISTS:
select o.Visit_ID
from Datamart.dbo.ww_Orders o
where exists (select 0 from Datamart.dbo.ww_Order_Details where o.Visit_ID = ww_Order_Details.Visit_ID)
and o.runstamp = '20160422'
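The index suggested in this answer could be created roughly as follows. This is a sketch that uses the simpler nonclustered variant and leaves the primary key untouched. The index names are hypothetical, and the second index on runstamp is an extra assumption based on the WHERE clause in the question, not something the answer asked for. Run it in the Datamart database.

```sql
-- Sketch: support the join / EXISTS probe on Visit_ID.
CREATE NONCLUSTERED INDEX IX_ww_Order_Details_Visit_ID
    ON dbo.ww_Order_Details (Visit_ID);

-- Assumption: the filter on runstamp also benefits from an index.
-- Visit_ID is the clustered key of ww_Orders, so it is carried in this
-- index automatically and the query needs no lookup.
CREATE NONCLUSTERED INDEX IX_ww_Orders_runstamp
    ON dbo.ww_Orders (runstamp);
```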
I usually try a common table expression in cases like this, where part of the query is really fast but adding a simple operation makes it really slow. It usually helps.
Try:
WITH CTE AS (
select o.Visit_ID
from Datamart.dbo.ww_Orders o
inner join Datamart.dbo.ww_Order_Details od on o.Visit_ID = od.Visit_ID
where o.runstamp = '20160422'
)
SELECT Visit_ID FROM CTE
group by Visit_ID
If this helps, you can compare the execution plans of your original query and this version to see what's going on.

How can I improve the following query (filtered on XML with LIKE)?

According to the execution plan, 90% of the cost of the following query comes from a clustered index scan on the primary key. The average duration is around 2 seconds per execution. The execution count for this query in our applications is very high, so it results in a large load. Can you help me improve this, either with an index or by restructuring the query?
CREATE TABLE [dbo].[EventLog](
[Id] [uniqueidentifier] NOT NULL,
[StartTime] [datetime] NOT NULL,
[StopTime] [datetime] NULL,
[executionStatus] [smallint] NOT NULL,
[executionType] [smallint] NOT NULL,
[Info] [xml] NULL,
CONSTRAINT [PK_Log] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY];
GO
SELECT TOP(1) execution.Id,
StartTime,
StopTime,
executionStatus,
executionType,
execution.Info
FROM dbo.EventLog execution INNER JOIN
(SELECT Id,
cast(Info as VARCHAR(MAX)) as Info
FROM dbo.EventLog
WHERE (executionType=1 OR executionType=4)
AND executionStatus=1
AND StopTime IS NOT NULL) as SUBQ
on execution.Id=SUBQ.Id
WHERE SUBQ.Info LIKE '%<Name>For Trial</Name>%'
AND SUBQ.Info LIKE '%<Type>2</Type>%'
ORDER BY StartTime DESC;
GO
Thanks in advance!
What's the use of the join at all? Just drop it and move all the conditions into the WHERE clause.
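A sketch of what dropping the join might look like, using the table and columns from the question. The index at the end is an additional assumption with a hypothetical name. The idea behind it is to let the engine walk matching rows in StartTime order and stop at the first one whose XML matches, instead of scanning the whole clustered index.

```sql
-- Sketch: same filters as the original, without the self-join.
SELECT TOP (1) Id,
       StartTime,
       StopTime,
       executionStatus,
       executionType,
       Info
FROM dbo.EventLog
WHERE executionType IN (1, 4)
  AND executionStatus = 1
  AND StopTime IS NOT NULL
  AND CAST(Info AS VARCHAR(MAX)) LIKE '%<Name>For Trial</Name>%'
  AND CAST(Info AS VARCHAR(MAX)) LIKE '%<Type>2</Type>%'
ORDER BY StartTime DESC;

-- Assumption: a supporting index so rows arrive already ordered by StartTime.
CREATE NONCLUSTERED INDEX IX_EventLog_Status_StartTime
    ON dbo.EventLog (executionStatus, StartTime DESC)
    INCLUDE (executionType, StopTime);
```

The LIKE filters on the cast XML still have to be evaluated row by row, so how much this helps depends on how soon a matching row appears in StartTime order.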