Weird result set counts with datetime criteria - SQL

I have the following table (the server is SQL Server 2016):
CREATE TABLE [dbo].[MyTable](
[RefID] [uniqueidentifier] NOT NULL,
[LG_ID] [bigint] NOT NULL,
[APP_SERVER_ID] [int] NOT NULL,
[WEB_SERVER_ID] [int] NOT NULL,
[WEB_BROWSER_ID] [int] NULL,
[CLIENT_TYPE_ID] [int] NOT NULL,
[OS_NAME_ID] [int] NULL,
[CLIENT_VERSION] [varchar](10) NULL,
[OS_VERSION] [varchar](10) NULL,
[BROWSER_VERSION] [varchar](10) NULL,
[DETAILS] [nvarchar](255) NULL,
[LG_SOURCE_TABLE] [char](2) NOT NULL,
[LG_TIME] [datetime] NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[RefID] ASC,
[LG_ID] ASC,
[LG_SOURCE_TABLE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_MyTable_LG_TIME] ON [dbo].[MyTable]
(
[LG_TIME] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
And I am querying the data with the following methods (I am using only method 1; the others are tests that I am doing):
METHOD 1:
SELECT
[RefID],
[LG_ID],
[APP_SERVER_ID],
[WEB_SERVER_ID],
[WEB_BROWSER_ID],
[CLIENT_TYPE_ID],
[OS_NAME_ID],
[CLIENT_VERSION],
[OS_VERSION],
[BROWSER_VERSION],
[DETAILS],
[LG_SOURCE_TABLE],
[LG_TIME]
FROM
[MyTable] WITH (NOLOCK)
WHERE
[LG_TIME] >= '20190923'
AND [LG_TIME] < '20190924'
--returns 19,880,317 rows
METHOD 2:
SELECT
[RefID],
[LG_ID],
[APP_SERVER_ID],
[WEB_SERVER_ID],
[WEB_BROWSER_ID],
[CLIENT_TYPE_ID],
[OS_NAME_ID],
[CLIENT_VERSION],
[OS_VERSION],
[BROWSER_VERSION],
[DETAILS],
[LG_SOURCE_TABLE],
[LG_TIME]
FROM
[MyTable] WITH (NOLOCK)
WHERE
CAST([LG_TIME] AS DATE) = '20190923'
--returns 19,879,488 rows
METHOD 3:
SELECT
*
FROM
[MyTable] WITH (NOLOCK)
WHERE
[LG_TIME] >= '20190923'
AND [LG_TIME] < '20190924'
--returns 19,880,186 rows
METHOD 4:
SELECT
*
FROM
[MyTable] WITH (NOLOCK)
WHERE
CAST([LG_TIME] AS DATE) = '20190923'
--returns 19,879,488 rows
As you can see, I am getting a different result set count each time.
I don't understand why this is happening; am I doing something wrong?
What is the correct/safe way to get the data I need based on a datetime range?
I am doing a BCP out/in from the production server (which has 1.2 billion rows but will be truncated soon) to an archive server (which has 10.5 billion rows), and sometimes I lose a record because two records share the same [LG_TIME] (maybe I will create a separate question about that).
EDIT 1: Some extra info:
The query (METHOD 1) is part of a BCP OUT routine that runs daily at about 00:05, and the date range is always the same: two days before the current date.
I have strict guidelines to always use WITH (NOLOCK) in queries on the production servers.
There are millions of rows with LG_TIME = NULL in the table, but these rows are obsolete and will be deleted; I don't know whether they cause any issues with my query.
The table is part of the application's logging mechanism; only INSERTs happen during the day, and DELETEs run in the very early morning (about 02:00-03:00) once the daily archiving succeeds. No one else is reading the table at that time.
EDIT 2 (2019-09-26 10:30):
So I added LG_TIME IS NOT NULL to the WHERE clause of all my queries, and these are my results:
METHOD 1: 19,879,488 rows
METHOD 2: 19,879,488 rows
METHOD 3: 19,880,036 rows
METHOD 4: 19,879,488 rows
Still weird!

Related

OFFSET ... FETCH is slow at high paging values

This is my scenario:
CREATE TABLE [dbo].[tblSMSSendQueueMain](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SendMethod] [int] NOT NULL,
CONSTRAINT [PK_tblSMSSendQueueLog] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblSMSSendQueueMainSendStatus](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[QueueID] [int] NULL,
[SendStatus] [int] NULL,
[StatusDate] [datetime] NULL,
[UserID] [int] NULL,
CONSTRAINT [PK_tblSMSSendQueueMainSendStatus] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
and some indexes:
CREATE NONCLUSTERED INDEX [IX_tblSMSSendQueueMainSendStatus_SendStatus_Single] ON [dbo].[tblSMSSendQueueMainSendStatus]
(
[SendStatus] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_tblSMSSendQueueMain_SendMethod] ON [dbo].[tblSMSSendQueueMain]
(
[SendMethod] ASC,
[ID] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Each table has about 13 million rows.
The QueueID column of tblSMSSendQueueMainSendStatus is a foreign key referencing the ID column of tblSMSSendQueueMain.
The server has an 8-core Xeon CPU and 8 GB of RAM.
I use OFFSET ... FETCH for my paging plan; it works perfectly for offsets under 100k, but when the offset goes up (more than 100k) the query response is slow and takes about 5 or 6 seconds to run.
This is my query:
SELECT q.ID
FROM tblSMSSendQueueMain q
INNER JOIN tblSMSSendQueueMainSendStatus qs
ON q.ID = qs.QueueID
WHERE 1 = 1
AND qs.SendStatus = 5
AND [SendMethod] = 19
ORDER BY q.ID desc OFFSET 10 * (1000000 - 1) ROWS
FETCH NEXT 10 ROWS ONLY
Does anyone have any idea where I am going wrong?
The reason this is so slow is that the only way for the server to get the correct starting row is by reading every single row before it.
You are much better off using Keyset Pagination. Instead of paging by starting row-number, pass in a parameter of the starting key.
For this to work, you must return a unique column or columns, and for this to be performant they should be indexed well.
Pass in @startingRow as the previous batch's highest ID; you can get this any way you like. E.g., I have used an ORDER BY so it will be the last row, or your client app can retrieve it from a variable.
SELECT TOP (10)
q.ID
FROM tblSMSSendQueueMain q
INNER JOIN tblSMSSendQueueMainSendStatus qs
ON q.ID = qs.QueueID
WHERE 1 = 1
AND qs.SendStatus = 5
AND q.[SendMethod] = 19
AND qs.ID > @startingRow -- drop this line for the first query
ORDER BY qs.ID;
I must say, your query is somewhat strange. If the foreign key is q.ID = qs.QueueID, then you will get multiple identical results when you query just q.ID. I suspect you actually want distinct q.ID values, in which case q.ID is your unique key:
SELECT DISTINCT TOP (10)
q.ID
FROM tblSMSSendQueueMain q
INNER JOIN tblSMSSendQueueMainSendStatus qs
ON q.ID = qs.QueueID
WHERE 1 = 1
AND qs.SendStatus = 5
AND q.[SendMethod] = 19
AND q.ID > @startingRow -- drop this line for the first query
ORDER BY q.ID;
Alternatively, I would prefer an EXISTS/IN as it more clearly states the requirement:
SELECT TOP (10)
q.ID
FROM tblSMSSendQueueMain q
WHERE 1 = 1
AND q.[SendMethod] = 19
AND q.ID IN (
SELECT qs.QueueID
FROM tblSMSSendQueueMainSendStatus qs
WHERE qs.SendStatus = 5
)
AND q.ID > @startingRow -- drop this line for the first query
ORDER BY q.ID;
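For completeness, here is a minimal sketch of a caller-side loop that feeds @startingRow from each batch, assuming the IN variant above; the loop and the @page table variable are illustrative, not part of the original query:
DECLARE @startingRow int = 0;            -- 0 = start before the first row
DECLARE @page TABLE (ID int PRIMARY KEY);
WHILE 1 = 1
BEGIN
    DELETE FROM @page;
    INSERT INTO @page (ID)
    SELECT TOP (10) q.ID
    FROM tblSMSSendQueueMain q
    WHERE q.[SendMethod] = 19
      AND q.ID > @startingRow            -- keyset predicate: a seek, not a scan over skipped rows
      AND q.ID IN (SELECT qs.QueueID
                   FROM tblSMSSendQueueMainSendStatus qs
                   WHERE qs.SendStatus = 5)
    ORDER BY q.ID;
    IF @@ROWCOUNT = 0 BREAK;             -- no more pages
    SELECT ID FROM @page ORDER BY ID;    -- hand this page to the caller
    SELECT @startingRow = MAX(ID) FROM @page;  -- the batch's highest key feeds the next pass
END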

SQL Server: slow SELECT from a large table

I have two really big SQL Server database tables for an IoT project.
The first table is Message (7,423,889,085 rows):
CREATE TABLE [aymax].[Message](
[MessageId] [bigint] IDENTITY(1,1) NOT NULL,
[ObjectId] [int] NOT NULL,
[TimeStamp] [datetime] NOT NULL CONSTRAINT [DF__Message__TimeSta__3B75D760] DEFAULT (getdate()),
[GpsTime] [datetime] NOT NULL,
[VisibleSatelites] [int] NOT NULL,
[X] [float] NOT NULL,
[Y] [float] NOT NULL,
CONSTRAINT [Message_PK] PRIMARY KEY NONCLUSTERED
(
[MessageId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
The second table is MessageSensors (26,359,568,037 rows); it holds a value for each sensor in the Message table:
CREATE TABLE [aymax].[MessageSensors](
[MessageId] [bigint] NOT NULL,
[DataSourceId] [int] NOT NULL,
[Value] [float] NOT NULL CONSTRAINT [DF__AnalogDat__Value__5812160E] DEFAULT ((0)),
CONSTRAINT [AnalogData_PK] PRIMARY KEY CLUSTERED
(
[MessageId] ASC,
[DataSourceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
My problem is that seeking by a time interval between two datetimes is really slow, and it becomes even slower if I also select the message sensor data. Also, when I use the sp_BlitzIndex check from brentozar.com, it says that I have
"Indexaphobia: High value missing index"
[aymax].[MessageSensors] (EQUALITY: [DataSourceId], [Value] INCLUDES: [MessageId])
[aymax].[MessageSensors] (EQUALITY: [Value] INCLUDES: [MessageId], [DataSourceId])
I believe that creating these two indexes will increase storage a lot and will take too much time to build; I need your advice on indexing for both tables.
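For reference, the first sp_BlitzIndex suggestion translates to roughly the following index (a sketch; the index name here is invented):
CREATE NONCLUSTERED INDEX [IX_MessageSensors_DataSourceId_Value] ON [aymax].[MessageSensors]
(
[DataSourceId] ASC,
[Value] ASC
)
INCLUDE ([MessageId])
GO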
My current indexes:
1-
CREATE NONCLUSTERED INDEX [IX_gpstime_objectid] ON [aymax].[Message]
(
[GpsTime] ASC
)
INCLUDE ( [MessageId],
[ObjectId]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
2-
alter TABLE [aymax].[Message] ADD CONSTRAINT [Message_PK] PRIMARY KEY NONCLUSTERED
(
[MessageId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
3-
ALTER TABLE [aymax].[MessageSensors] ADD CONSTRAINT [AnalogData_PK] PRIMARY KEY CLUSTERED
(
[MessageId] ASC,
[DataSourceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
4-
CREATE NONCLUSTERED INDEX [MessageData_DataSourceId_IDX] ON [aymax].[MessageSensors]
(
[DataSourceId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Any help please; I need fast retrieval from Message and MessageSensors.
UPDATE:
While doing some investigation, I found that selecting the float value slows the result down a lot, from 1 second to 3 minutes:
SELECT m.messageid,
m.objectid,
m.gpstime,
m.x,
m.y,
-- the slowdown is here: if I replace md.Value with md.MessageId it returns fast; md.Value is a float
md.Value ,
0
FROM aymax.[message] m WITH (nolock)
left JOIN aymax.MessageSensors md WITH (nolock)
ON m.messageid = md.messageid
AND md.datasourceid = 425732
WHERE m.objectid = 14099
AND m.gpstime BETWEEN '2017-04-01 19:46:18.607' AND '2017-04-10 19:05:18.607'
Possible solutions:
Filtered index (filter by date and do not index old data; a sketch follows below):
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes
Clustered index on GpsTime, MessageId (especially if you have no plans for other indexes); this requires rebuilding the table.
Partitions (see @Siyaul's comments)
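A minimal sketch of the filtered-index idea, assuming a fixed cutoff date; the index name and the cutoff are illustrative, and since the predicate must be a constant, the window has to be moved by re-creating the index periodically:
CREATE NONCLUSTERED INDEX [IX_Message_ObjectId_GpsTime_Recent] ON [aymax].[Message]
(
[ObjectId] ASC,
[GpsTime] ASC
)
INCLUDE ([MessageId], [X], [Y])
WHERE [GpsTime] >= '20170101' -- constant cutoff; old rows stay out of the B-tree
GO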

How can I improve the following query (filtered on XML with LIKE)?

According to the execution plan, 90% of the cost of the following query is a clustered index scan against the primary key. The average duration is around 2 seconds per execution. The execution count for this query in our applications is very high, and therefore it creates a large load. Can you help me improve this, either with an index or with a restructuring of the query?
CREATE TABLE [dbo].[EventLog](
[Id] [uniqueidentifier] NOT NULL,
[StartTime] [datetime] NOT NULL,
[StopTime] [datetime] NULL,
[executionStatus] [smallint] NOT NULL,
[executionType] [smallint] NOT NULL,
[Info] [xml] NULL,
CONSTRAINT [PK_Log] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY];
GO
SELECT TOP(1) execution.Id,
StartTime,
StopTime,
executionStatus,
executionType,
execution.Info
FROM dbo.EventLog execution INNER JOIN
(SELECT Id,
cast(Info as VARCHAR(MAX)) as Info
FROM dbo.EventLog
WHERE (executionType=1 OR executionType=4)
AND executionStatus=1
AND StopTime IS NOT NULL) as SUBQ
on execution.Id=SUBQ.Id
WHERE SUBQ.Info LIKE '%<Name>For Trial</Name>%'
AND SUBQ.Info LIKE '%<Type>2</Type>%'
ORDER BY StartTime DESC;
GO
Thanks in advance!
What's the use of the JOIN at all? Just drop it and move all the conditions into the WHERE clause.
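A sketch of what that rewrite could look like (same predicates, no self-join):
SELECT TOP(1) Id,
StartTime,
StopTime,
executionStatus,
executionType,
Info
FROM dbo.EventLog
WHERE (executionType = 1 OR executionType = 4)
AND executionStatus = 1
AND StopTime IS NOT NULL
AND CAST(Info AS VARCHAR(MAX)) LIKE '%<Name>For Trial</Name>%'
AND CAST(Info AS VARCHAR(MAX)) LIKE '%<Type>2</Type>%'
ORDER BY StartTime DESC;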

Optimizing CLUSTERED INDEX for use with JOIN

table optin_channel_1 (for each 'channel' there's a dedicated table)
CREATE TABLE [dbo].[optin_channel_1](
[key_id] [bigint] NOT NULL,
[valid_to] [datetime] NOT NULL,
[valid_from] [datetime] NOT NULL,
[key_type_id] [int] NOT NULL,
[optin_flag] [tinyint] NOT NULL,
[source_proc_id] [int] NOT NULL,
[date_inserted] [datetime] NOT NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [ix_id] ON [dbo].[optin_channel_1]
(
[key_type_id] ASC,
[key_id] ASC,
[valid_to] ASC,
[valid_from] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
table profile_conns
CREATE TABLE [dbo].[profile_conns](
[profile_key_id] [bigint] NOT NULL,
[valid_to] [datetime] NOT NULL,
[valid_from] [datetime] NOT NULL,
[conn_key_id] [bigint] NOT NULL,
[conn_key_type_id] [int] NOT NULL,
[conn_type_id] [int] NOT NULL,
[source_proc_id] [int] NOT NULL,
[date_inserted] [datetime] NOT NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [ix_id] ON [dbo].[profile_conns]
(
[profile_key_id] ASC,
[conn_key_type_id] ASC,
[conn_key_id] ASC,
[valid_to] ASC,
[valid_from] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
table lu_channel_conns
CREATE TABLE [dbo].[lu_channel_conns](
[channel_id] [int] NOT NULL,
[conn_type_id] [int] NOT NULL,
CONSTRAINT [PK_lu_channel_conns] PRIMARY KEY CLUSTERED
(
[channel_id] ASC,
[conn_type_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
table lu_conn_type
CREATE TABLE [dbo].[lu_conn_type](
[conn_type_id] [int] NOT NULL,
[default_key_type_id] [int] NOT NULL,
[master_key_type_id] [int] NOT NULL,
[date_inserted] [datetime] NOT NULL,
CONSTRAINT [PK_lu_conns] PRIMARY KEY CLUSTERED
(
[conn_type_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
view v_source_proc_id_by_group_id
SELECT DISTINCT x.source_proc_id, x.source_proc_group_id
FROM lu_source_proc x INNER JOIN lu_source_proc_group y ON x.source_proc_group_id = y.group_id
The following dynamic SQL statement gets executed:
SET @sql_str='SELECT @ret=MAX(o.optin_flag)
FROM optin_channel_'+CAST(@channel_id AS NVARCHAR(100))+' o
INNER HASH JOIN dbo.v_source_proc_id_by_group_id y ON o.source_proc_id=y.source_proc_id AND y.source_proc_group_id=@source_proc_group_id
INNER HASH JOIN profile_conns z ON z.profile_key_id=cast(@profile_key_id AS NVARCHAR(100)) AND z.conn_key_type_id=o.key_type_id AND z.conn_key_id=o.[key_id] AND z.valid_to=''01.01.3000''
INNER HASH JOIN lu_channel_conns x ON x.channel_id=@channel_id AND z.conn_type_id=x.conn_type_id
INNER HASH JOIN lu_conn_type ct ON ct.conn_type_id=x.conn_type_id AND ct.default_key_type_id=o.key_type_id'
SET @param='@channel_id INT, @profile_key_id INT, @source_proc_group_id INT, @ret NVARCHAR(400) OUTPUT'
EXEC sp_executesql @sql_str,@param,@channel_id,@profile_key_id,@source_proc_group_id,@ret OUTPUT
I.e. this gives:
SELECT @ret=MAX(o.optin_flag)
FROM optin_channel_1 o
INNER HASH JOIN dbo.v_source_proc_id_by_group_id y
ON o.source_proc_id=y.source_proc_id
AND y.source_proc_group_id=5
INNER HASH JOIN profile_conns z
ON z.profile_key_id=1
AND z.conn_key_type_id=o.key_type_id
AND z.conn_key_id=o.[key_id]
AND z.valid_to='01.01.3000'
INNER HASH JOIN lu_channel_conns x
ON x.channel_id=1
AND z.conn_type_id=x.conn_type_id
INNER HASH JOIN lu_conn_type ct
ON ct.conn_type_id=x.conn_type_id
AND ct.default_key_type_id=o.key_type_id
These tables are used for an opt-in database; optin_flag can be 0 or 1. With the last statement I want to get a 1 as optin_flag from optin_channel_1 for the given channel_id=1 and the user with profile_key_id=1, where the opt-in was inserted into the database by a process belonging to source_proc_group_id=5. I hope this is enough to comprehend what's going on.
Is this the best way to use the CLUSTERED INDEXes? Or would it be better to remove profile_key_id from the index on profile_conns and put z.profile_key_id=1 in a WHERE clause?
Maybe there's a much better way to optimize this SELECT (changes to the database schema are not possible; only changes to indexes and modifications to the statement).
Without knowing the size of the tables and the sort of data stored in them, it is difficult to gauge.
Assuming optin_channel_1 has a lot of data and profile_conns has a lot of data, I would try the following:
Clustered index on optin_channel_1 (key_id) or key_type_id, depending on which field has the most distinct values (since you don't have a covering index)
Clustered index on profile_conns (conn_key_id) or conn_key_type_id, depending on what you have chosen for optin_channel_1
etc...
Basically, if your profile_conns table does not have much data, I would put the clustered index on the most selective "filter" field (I suspect profile_key_id). If the table has a lot of data, I would aim for a hash/merge join and match the clustered index with the clustered index of the optin_channel_1 table.
I would also rewrite the query as such:
SELECT @ret = MAX(o.optin_flag)
FROM optin_channel_1 o
JOIN dbo.v_source_proc_id_by_group_id y
ON o.source_proc_id = y.source_proc_id
JOIN profile_conns z
ON z.conn_key_type_id = o.key_type_id
AND z.conn_key_id = o.[key_id]
JOIN lu_channel_conns x
ON z.conn_type_id = x.conn_type_id
JOIN lu_conn_type ct
ON ct.conn_type_id = x.conn_type_id
AND ct.default_key_type_id=o.key_type_id
WHERE y.source_proc_group_id = 5
AND z.profile_key_id = 1
AND x.channel_id = 1
AND z.valid_to = '01.01.3000'
I changed the query this way because:
Putting the filter conditions in the WHERE clause shows you which fields are relevant to aim for a hash/merge join.
Putting in join hints is rarely a good idea. It is very hard to beat the query optimizer at determining the best query plan; a bad plan usually indicates you have an issue with your indexes/statistics.
So, as a summary:
Small table joined to a big table ==> go for nested loops and focus your clustered index on the "filter" field in the small table and the join field in the big table.
Big table joined to a big table ==> go for a hash/merge join and put the clustered index on the matching field on both sides.
Multi-field indexes are usually only a good idea when they are "covering", which means all the fields you query are included in the index (or are added with the INCLUDE() clause); a sketch follows below.
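As a sketch of that last point, a covering nonclustered index for the rewritten query on optin_channel_1 could look like this (the index name is invented; per the question, only index changes are allowed):
CREATE NONCLUSTERED INDEX [ix_optin_channel_1_covering] ON [dbo].[optin_channel_1]
(
[key_type_id] ASC,
[key_id] ASC,
[source_proc_id] ASC
)
INCLUDE ([optin_flag])
GO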

Select highest rated, oldest track

I have several tables:
CREATE TABLE [dbo].[Tracks](
[Id] [uniqueidentifier] NOT NULL,
[Artist_Id] [uniqueidentifier] NOT NULL,
[Album_Id] [uniqueidentifier] NOT NULL,
[Title] [nvarchar](255) NOT NULL,
[Length] [int] NOT NULL,
CONSTRAINT [PK_Tracks_1] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[TrackHistory](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Track_Id] [uniqueidentifier] NOT NULL,
[Datetime] [datetime] NOT NULL,
CONSTRAINT [PK_TrackHistory] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [cooltunes].[dbo].[TrackHistory]
([Track_Id]
,[Datetime])
VALUES
("335294B0-735E-4E2C-8389-8326B17CE813"
,GETDATE())
CREATE TABLE [dbo].[Ratings](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Track_Id] [uniqueidentifier] NOT NULL,
[User_Id] [uniqueidentifier] NOT NULL,
[Rating] [tinyint] NOT NULL,
CONSTRAINT [PK_Ratings] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [cooltunes].[dbo].[Ratings]
([Track_Id]
,[User_Id]
,[Rating])
VALUES
("335294B0-735E-4E2C-8389-8326B17CE813"
,"C7D62450-8BE6-40F6-80F1-A539DA301772"
,1)
There is also a Users table with a User_Id (uniqueidentifier) column plus some other fields.
The links between the tables are pretty obvious.
TrackHistory gets a row added every time a track is played, i.e. a track will appear in there many times.
The Ratings value will be either 1 or -1.
What I'm trying to do is select the track with the highest rating that was last played more than 2 hours ago. If there is a tie in total rating between tracks (e.g. one track receives six +1 ratings and one -1 rating, giving it a total rating of 5, and another track also has a total rating of 5), the track that was last played the longest ago should be returned. (If all tracks have been played within the last 2 hours, no rows should be returned.)
I'm getting somewhere doing each part individually using the link above, SUM(Value) and GROUP BY Track_Id, but I'm having trouble putting it all together.
Hopefully someone with a bit more (MS)SQL knowledge will be able to help me. Many thanks!
select top 1 t.Id, SUM(r.Rating) as Rating, MAX(Datetime) as LastPlayed
from Tracks t
inner join TrackHistory h on t.Id = h.Track_Id
inner join Ratings r on t.Id = r.Track_Id
where h.Track_Id not in (
select Track_Id
from TrackHistory
where Datetime > DATEADD(HOUR, -2, getdate())
)
group by t.Id
order by Rating desc, LastPlayed
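One caveat worth noting: because a track appears in TrackHistory once per play, joining TrackHistory and Ratings in the same FROM multiplies each rating by the play count, which inflates SUM(r.Rating). A sketch that pre-aggregates each side first avoids this:
select top 1 t.Id, r.Rating, h.LastPlayed
from Tracks t
inner join (
    select Track_Id, SUM(Rating) as Rating
    from Ratings
    group by Track_Id
) r on t.Id = r.Track_Id
inner join (
    select Track_Id, MAX([Datetime]) as LastPlayed
    from TrackHistory
    group by Track_Id
) h on t.Id = h.Track_Id
where h.LastPlayed <= DATEADD(HOUR, -2, getdate())  -- last play at least 2 hours ago
order by r.Rating desc, h.LastPlayed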