Relations and joins between tables with primary key clustered index

Relations and joins between tables with primary key clustered index - sql

Created two tables using PRIMARY KEY CLUSTERED with identity and Date ON partition because a task splits and truncates partitions older that 30 days. keeping only the recent records.
I have a special field to correlate the tables, but the joins are painfully slow even with indexes. Could you suggest how to optimize?
Next the tables and the Join statement:
CREATE TABLE [dbo].[Redeem](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Ticket] [nvarchar](64) NOT NULL,
[CorrelationTicket] [nvarchar](64) NOT NULL,
[CreatedUTC] [datetime] NOT NULL,
[CreatedDate] [date] NOT NULL,
[Redeem fields here...]
CONSTRAINT [PK_Redeem] PRIMARY KEY CLUSTERED
(
[CreatedDate] ASC,
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
ON myPS([CreatedDate]);
CREATE NONCLUSTERED INDEX [IX_Redeem_CorrelationTicket]
ON [dbo].[Redeem]([CreatedDate] ASC, [CorrelationTicket] ASC)
ON [myPS] ([CreatedDate]);
CREATE TABLE [dbo].[Validate](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Ticket] [nvarchar](64) NOT NULL,
[CorrelationTicket] [nvarchar](64) NOT NULL,
[CreatedUTC] [datetime] NOT NULL,
[CreatedDate] [date] NOT NULL,
[Validate fields here...]
CONSTRAINT [PK_Validate] PRIMARY KEY CLUSTERED
(
[CreatedDate] ASC,
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
ON myPS([CreatedDate]);
CREATE NONCLUSTERED INDEX [IX_Validate_CorrelationTicket]
ON [dbo].[Validate]([CreatedDate] ASC, [CorrelationTicket] ASC)
ON [myPS] ([CreatedDate]);
And this is the Join:
SELECT top 100
v.*,
r.*
from
Validate v
LEFT OUTER join Redeem r
on v.CorrelationTicket = r.CorrelationTicket
ORDER BY v.CreatedDate DESC

Thank you Krintner!
The problem was the ORDER BY as you suggested. it was sorting the entire result set (not sure why).
I do need to sort, but changing the Index to be DESC did the trick.
I also follow the recommendation and use only CorrelationTicket in the INDEX.
CONSTRAINT [PK_Redeem] PRIMARY KEY CLUSTERED
(
[CreatedDate] DESC,
[Id] DESC
)

Related

How to properly create table with 3 columns as primary key?

I have been working with SQL for 3 month, I would like to know create a table with 3 columns as primary key, Any help would be great thank you.
This code below is a Scrip to Create table generated my SQL Server Management Studio of the table
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[REP]
(
[id] [UNIQUEIDENTIFIER] ROWGUIDCOL NOT NULL,
[id_client] [NCHAR](30) NULL,
[id_representant] [UNIQUEIDENTIFIER] NULL,
[date_debut] [DATE] NULL,
[date_fin] [DATE] NULL,
CONSTRAINT [PK_REP]
PRIMARY KEY CLUSTERED ([id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[REP]
ADD CONSTRAINT [DF_REP_id] DEFAULT (NEWID()) FOR [id]
GO

First all the columns that are part of the primary key have to be not null.
then defining the key just add all columns as comma separated list.
CREATE TABLE [dbo].[REP](
[id] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[id_client] [nchar](30) NOT NULL,
[id_representant] [uniqueidentifier] NULL,
[date_debut] [date] NULL,
[date_fin] [date] NULL,
CONSTRAINT [PK_REP] PRIMARY KEY CLUSTERED
( [id] ASC,id_client ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
In the code above I changed the column id_client to be not null and added it to the definition of the primary key ... ([id] ASC, id_client ASC).

To add more columns to a primary Key contraint you'd add the additional fields after the first field.
[id] ASC, nextfield ASC, nextfield2 asc

You can have a one composite key that comprise of more than one primary key.
CREATE TABLE xyz (
primary key (id,id_client,id_representer)
);

RANK() SQL Server execution plan issue

What is driving SQL Server to use less optimal execution plan for queries where 6000+ rows are returned? I need to improve query performance for scenario where all rows are returned.
I select all fields and add rank over same three columns included in index. Depending on number of returned rows, query has two different execution plans, hence execution takes 0.2s or 3s respectively.
From 1 row returned up to ca. 5000 query runs fast. From 6000 rows returned up to all, query runs slow.
Table1 has ca. 38000 rows. Database runs on Azure SQL v12.
Table:
CREATE TABLE [dbo].[Table1](
[ID] [int] IDENTITY(1,1) NOT NULL,
[KOD_ID] [int] NULL,
[SYM] [nvarchar](20) NULL,
[AN] [nvarchar](35) NULL,
[A] [nvarchar](10) NULL,
[B] [nvarchar](2) NULL,
[C] [datetime] NULL,
[D] [datetime] NULL,
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
GO
CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]
(
[KOD_ID] ASC,
[SYM] ASC,
[AN] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Queries:
SELECT TOP 6000 *, RANK() OVER(ORDER BY KOD_ID ASC, SYM ASC, AN ASC) AS Rank#
FROM [dbo].[Table1]
SELECT TOP 7000 *, RANK() OVER(ORDER BY KOD_ID ASC, SYM ASC, AN ASC) AS Rank#
FROM [dbo].[Table1]
Execution plans for both queries

CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]
(
[KOD_ID] ASC,
[SYM] ASC,
[AN] ASC
) INCLUDE ([A], [B], [C], [D]);
Create such kind of a covering index and it should scan this index and most likely sort won't even be needed because it's data is already sorted in index.
The key points in your queries are:
First plan has a key lookup, avoid them as much as possible (key lookup is additional scan for each row because index does not have them) create covering indexes with INCLUDED columns
Avoid sort operations too, they're costly to SQL Server
If you're alright with index rebuilds and favor reads over inserts, these could be alternate DDLs for your table considering that and KOD_ID, SYM, AN are not null-able:
If ID is needed to ensure uniqueness:
CREATE TABLE [dbo].[Table1] (
[KOD_ID] [int] NOT NULL
, [SYM] [nvarchar](20) NOT NULL
, [AN] [nvarchar](35) NOT NULL
, [ID] [int] IDENTITY(1, 1) NOT NULL
, [A] [nvarchar](10) NULL
, [B] [nvarchar](2) NULL
, [C] [datetime2] NULL
, [D] [datetime2] NULL
, CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([KOD_ID], [SYM], [AN], [ID])
);
GO
If ID is not needed to ensure uniqueness:
CREATE TABLE [dbo].[Table1] (
[KOD_ID] [int] NOT NULL
, [SYM] [nvarchar](20) NOT NULL
, [AN] [nvarchar](35) NOT NULL
, [A] [nvarchar](10) NULL
, [B] [nvarchar](2) NULL
, [C] [datetime2] NULL
, [D] [datetime2] NULL
, CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([KOD_ID], [SYM], [AN])
);
GO
Also, note that I use datetime2 instead of datetime, that's what Microsoft recommends: https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.

Why is my simple SQL statement taking so long to execute and how do i go about finding the issue?

I have a very simple SQL query:
select o.Visit_ID
from Datamart.dbo.ww_Orders o
inner join Datamart.dbo.ww_Order_Details on o.Visit_ID = ww_Order_Details.Visit_ID
where o.runstamp = '20160422'
this query takes < 0 seconds to return 11173 rows
When I add the GROUP BY statement:
select o.Visit_ID
from Datamart.dbo.ww_Orders o
inner join Datamart.dbo.ww_Order_Details on o.Visit_ID = ww_Order_Details.Visit_ID
where o.runstamp = '20160422'
group by o.Visit_ID
the server takes 6min 30 sec to retrieve the 3047 rows.
I would expect the GROUP BY query to take not that much longer than the original. How do I go about finding what the issues are? thanks
here are the table definitions:
Orders:
CREATE TABLE [dbo].[ww_Orders](
[Visit_ID] [int] NOT NULL,
[Member_ID] [int] NOT NULL,
[Membership_no] [varchar](20) NULL,
[Member_Card_Num_Orig] [varchar](16) NULL,
[SCV_ID] [int] NULL,
[Meeting_No] [int] NULL,
[Location_Name] [varchar](128) NULL,
[Leader_No] [int] NULL,
[CashAmt] [decimal](18, 2) NULL,
[EFTAmt] [decimal](18, 2) NULL,
[VouchAmt] [decimal](18, 2) NULL,
[Meet_Date] [datetime] NULL,
[runstamp] [varchar](50) NULL,
CONSTRAINT [PK_dbo.ww_Orders] PRIMARY KEY CLUSTERED
(
[Visit_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Order Details:
CREATE TABLE [dbo].[ww_Order_Details](
[ord_det_pk] [int] IDENTITY(1,1) NOT NULL,
[Visit_ID] [int] NOT NULL,
[Item_Code] [nvarchar](50) NULL,
[Item_Name] [nvarchar](50) NULL,
[Qty] [int] NULL,
[Amt] [decimal](18, 2) NULL,
[Category_Code] [nvarchar](20) NULL,
CONSTRAINT [PK_ww_Order_Details] PRIMARY KEY CLUSTERED
(
[ord_det_pk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ww_Order_Details] WITH CHECK ADD CONSTRAINT [FK_ww_Order_Details_ww_Orders] FOREIGN KEY([Visit_ID])
REFERENCES [dbo].[ww_Orders] ([Visit_ID])
GO
ALTER TABLE [dbo].[ww_Order_Details] CHECK CONSTRAINT [FK_ww_Order_Details_ww_Orders]
GO

I'd add an index onto ww_Order_Details Visit_ID, probably make it the clustered index and drop the index on ord_det_pk. Also it might make more sense as an exists?
select o.Visit_ID
from Datamart.dbo.ww_Orders o
where exists (select 0 from Datamart.dbo.ww_Order_Details where o.Visit_ID = ww_Order_Details.Visit_ID)
and o.runstamp = '20160422'

I usually try to use a common table expression in cases like these, when a part of the query is really fast but an addition of a simple operation makes it really slow, usually it helps.
Try:
WITH CTE AS (
select o.Visit_ID
from Datamart.dbo.ww_Orders o
inner join Datamart.dbo.ww_Order_Details od on o.Visit_ID = od.Visit_ID
where o.runstamp = '20160422'
)
SELECT Visit_ID FROM CTE
group by Visit_ID
If this helps you can try to compare execution plans for your original query and this version to see whats going on

How can I improve the following query (filtered on XML with LIKE)?

90% of the cost for following query is according to execution plan related to clustered scan against the primary key index. The average duration is around 2 seconds per execution. The execution count for this in our applications is very high and therefore this results in large load. Can you help me improve this either with index or restructure of query?
CREATE TABLE [dbo].[EventLog](
[Id] [uniqueidentifier] NOT NULL,
[StartTime] [datetime] NOT NULL,
[StopTime] [datetime] NULL,
[executionStatus] [smallint] NOT NULL,
[executionType] [smallint] NOT NULL,
[Info] [xml] NULL,
CONSTRAINT [PK_Log] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY];
GO
SELECT TOP(1) execution.Id,
StartTime,
StopTime,
executionStatus,
executionType,
execution.Info
FROM dbo.EventLog execution INNER JOIN
(SELECT Id,
cast(Info as VARCHAR(MAX)) as Info
FROM dbo.EventLog
WHERE (executionType=1 OR executionType=4)
AND executionStatus=1
AND StopTime IS NOT NULL) as SUBQ
on execution.Id=SUBQ.Id
WHERE SUBQ.Info LIKE '%<Name>For Trial</Name>%'
AND SUBQ.Info LIKE '%<Type>2</Type>%'
ORDER BY StartTime DESC;
GO
Thanks in advance!

What's the use of Join at all? Just drop it and move all conditions inside 'where' branch.

Select highest rated, oldest track

I have several tables:
CREATE TABLE [dbo].[Tracks](
[Id] [uniqueidentifier] NOT NULL,
[Artist_Id] [uniqueidentifier] NOT NULL,
[Album_Id] [uniqueidentifier] NOT NULL,
[Title] [nvarchar](255) NOT NULL,
[Length] [int] NOT NULL,
CONSTRAINT [PK_Tracks_1] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[TrackHistory](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Track_Id] [uniqueidentifier] NOT NULL,
[Datetime] [datetime] NOT NULL,
CONSTRAINT [PK_TrackHistory] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [cooltunes].[dbo].[TrackHistory]
([Track_Id]
,[Datetime])
VALUES
("335294B0-735E-4E2C-8389-8326B17CE813"
,GETDATE())
CREATE TABLE [dbo].[Ratings](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Track_Id] [uniqueidentifier] NOT NULL,
[User_Id] [uniqueidentifier] NOT NULL,
[Rating] [tinyint] NOT NULL,
CONSTRAINT [PK_Ratings] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [cooltunes].[dbo].[Ratings]
([Track_Id]
,[User_Id]
,[Rating])
VALUES
("335294B0-735E-4E2C-8389-8326B17CE813"
,"C7D62450-8BE6-40F6-80F1-A539DA301772"
,1)
Users
User_Id|Guid
Other fields
Links between the tables are pretty obvious.
TrackHistory has each track added to it as a row whenever it is played ie. a track will appear in there many times.
Ratings value will either be 1 or -1.
What I'm trying to do is select the Track with the highest rating, that is more than 2 hours old, and if there is a duplicate rating for a track (ie a track receives 6 +1 ratings and 1 - rating, giving that track a total rating of 5, another track also has a total rating of 5), the track that was last played the longest ago should be returned. (If all tracks have been played within the last 2 hours, no rows should be returned)
I'm getting somewhere doing each part individually using the link above, SUM(Value) and GROUP BY Track_Id, but I'm having trouble putting it all together.
Hopefully someone with a bit more (MS)SQL knowledge will be able to help me. Many thanks!

select top 1 t.Id, SUM(r.Rating) as Rating, MAX(Datetime) as LastPlayed
from Tracks t
inner join TrackHistory h on t.Id = h.Track_Id
inner join Ratings r on t.Id = r.Track_Id
where h.Track_Id not in (
select Track_Id
from TrackHistory
where Datetime > DATEADD(HOUR, -2, getdate())
)
group by t.Id
order by Rating desc, LastPlayed

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Relations and joins between tables with primary key clustered index - sql

Related

How to properly create table with 3 columns as primary key?

RANK() SQL Server execution plan issue

Why is my simple SQL statement taking so long to execute and how do i go about finding the issue?

How can I improve the following query (filtered on XML with LIKE)?

Select highest rated, oldest track

Categories

Resources