SQL Server 2005 query optimization with Max subquery

I've got a table that looks like this (I wasn't sure what all might be relevant, so I had Toad dump the whole structure)
CREATE TABLE [dbo].[TScore] (
[CustomerID] int NOT NULL,
[ApplNo] numeric(18, 0) NOT NULL,
[BScore] int NULL,
[OrigAmt] money NULL,
[MaxAmt] money NULL,
[DateCreated] datetime NULL,
[UserCreated] char(8) NULL,
[DateModified] datetime NULL,
[UserModified] char(8) NULL,
CONSTRAINT [PK_TScore]
PRIMARY KEY CLUSTERED ([CustomerID] ASC, [ApplNo] ASC)
);
When I run the following query on a database with 3 million records in the TScore table, it takes about a second to run, even though Select BScore from CustomerDB..TScore WHERE CustomerID = 12345 is instant (and only returns 10 records). It seems like there should be some efficient way to get the Max(ApplNo) effect in a single query, but I'm a relative noob to SQL Server and not sure. I'm thinking I may need a separate key for ApplNo, but I'm not sure how clustered keys work.
SELECT BScore
FROM CustomerDB..TScore (NOLOCK)
WHERE ApplNo = (SELECT Max(ApplNo)
FROM CustomerDB..TScore sc2 (NOLOCK)
WHERE sc2.CustomerID = 12345)
Thanks much for any tips (pointers on where to look for optimization of sql server stuff appreciated as well)

When you filter by ApplNo alone, you are using only part of the key, and not the left-hand side. This means the index has to be scanned (look at all rows) rather than seeked (drill straight to a row) to find the values.
If you are looking for ApplNo values for the same CustomerID, the quick way is to use the full clustered index:
SELECT BScore
FROM CustomerDB..TScore
WHERE ApplNo = (SELECT Max(ApplNo)
FROM CustomerDB..TScore sc2
WHERE sc2.CustomerID = 12345)
AND CustomerID = 12345
This can also be written as a JOIN:
SELECT BScore
FROM
CustomerDB..TScore T1
JOIN
(SELECT Max(ApplNo) AS MaxApplNo, CustomerID
FROM CustomerDB..TScore sc2
WHERE sc2.CustomerID = 12345
) T2 ON T1.CustomerID = T2.CustomerID AND T1.ApplNo= T2.MaxApplNo
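As an aside (my addition, not part of the original answer): because (CustomerID, ApplNo) is the clustered primary key and therefore unique per customer, the same "latest application" lookup can also be written as a single TOP query that does one seek on the clustered index:
SELECT TOP (1) BScore
FROM CustomerDB..TScore
WHERE CustomerID = 12345
ORDER BY ApplNo DESC;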
If you are looking for ApplNo values independently of CustomerID, then I'd look at a separate index. That matches the intent of your current code:
CREATE INDEX IX_ApplNo ON TScore (ApplNo) INCLUDE (BScore);
Reversing the clustered key order won't help, because then your WHERE sc2.CustomerID = 12345 would scan rather than seek.
Note: using NOLOCK everywhere is a bad practice.
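If the reason for all those NOLOCK hints is simply to keep readers from blocking writers, a commonly suggested alternative (my suggestion, not part of the original answer) is row-versioning, which gives readers a consistent snapshot without shared locks:
ALTER DATABASE CustomerDB SET READ_COMMITTED_SNAPSHOT ON;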

How to do cumulative query

Here is the DDL I am trying locally:
Table Inv:
create table inv(
inv_id integer not null primary key,
document_no varchar(150) not null,
grandtotal integer not null);
Table Pay:
create table pay(
pay_id integer not null primary key,
document_no varchar(150) not null,
inv_id integer references inv(inv_id),
payamt integer not null);
Insert into Inv:
insert into inv(inv_id, document_no, grandtotal) values
(1,'ABC18',50000),(2,'ABC19',45000);
Insert into Pay:
insert into pay(pay_id, document_no, inv_id, payamt) values
(1,'DEF18-1',1,20000),(2,'DEF18-2',1,30000);
How do I make a cumulative query? I tried:
select inv.document_no, inv.grandtotal, sum(pay.payamt),
sum(pay.payamt)- inv.grandtotal as total
from inv, pay
where inv.inv_id= pay.inv_id
group by inv.document_no, inv.grandtotal
But it doesn't give me the expected result.
First of all, I advise you not to use that old-style join syntax. You can see the reason why here:
Bad habits to kick: using old-style JOINs
From the DDL you shared and your query, I assume you want to see the history of your transactions with a cumulative total?
This query should work :
SELECT inv.document_no AS doc_inv,
       inv.grandtotal AS total_inv,
       COALESCE(pay.document_no, '-') AS doc_pay,
       COALESCE(pay.payamt, 0) AS total_pay,
       COALESCE(inv.grandtotal - SUM(pay.payamt)
                                 OVER (PARTITION BY inv.inv_id
                                       ORDER BY pay.pay_id),
                inv.grandtotal) AS cumulative
FROM inv
LEFT OUTER JOIN pay
    ON inv.inv_id = pay.inv_id
I am using LEFT OUTER JOIN because there are invoices with no payments in your insert data. And of course this is only a guess without more guidance.
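With the sample data above, the query should return something like this (worked out by hand from the inserts):
doc_inv | total_inv | doc_pay | total_pay | cumulative
--------|-----------|---------|-----------|-----------
ABC18   | 50000     | DEF18-1 | 20000     | 30000
ABC18   | 50000     | DEF18-2 | 30000     | 0
ABC19   | 45000     | -       | 0         | 45000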
What you need is a window function.
Definition:
Performs a calculation across a set of table rows that are somehow
related to the current row.
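For a minimal illustration using the pay table (my own example), here is a running total per row:
SELECT pay_id,
       payamt,
       SUM(payamt) OVER (ORDER BY pay_id) AS running_total
FROM pay;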
And you can read about joining tables here: Join Documentation

SQL Query to search records in multiple tables

I'm trying to implement a search feature. I need to look into multiple tables in a SQL database using a text string. Currently, I'm only looking into 3 tables:
Table Items:
[dbo].[Items]
(
[ItemID] INT IDENTITY (1, 1) NOT NULL,
[CategoryID] INT NOT NULL,
[BrandID] INT NOT NULL,
[ItemName] NVARCHAR(MAX) NOT NULL,
[ItemPrice] DECIMAL(18, 2) NOT NULL,
[imageUrl] NVARCHAR(MAX) NULL,
CONSTRAINT [PK_dbo.Items]
PRIMARY KEY CLUSTERED ([ItemID] ASC),
CONSTRAINT [FK_dbo.Items_dbo.Brands_BrandID]
FOREIGN KEY ([BrandID]) REFERENCES [dbo].[Brands] ([BrandID]),
CONSTRAINT [FK_dbo.Items_dbo.Categories_CategoryID]
FOREIGN KEY ([CategoryID]) REFERENCES [dbo].[Categories] ([CategoryID])
)
Table Categories:
[dbo].[Categories]
(
[CategoryID] INT IDENTITY (1, 1) NOT NULL,
[Name] NVARCHAR (MAX) NULL,
CONSTRAINT [PK_dbo.Categories]
PRIMARY KEY CLUSTERED ([CategoryID] ASC)
)
Table Brands:
[dbo].[Brands]
(
[BrandID] INT IDENTITY (1, 1) NOT NULL,
[Name] NVARCHAR (MAX) NULL,
CONSTRAINT [PK_dbo.Brands]
PRIMARY KEY CLUSTERED ([BrandID] ASC)
)
Any records that contain the supplied text string must be fetched. I'm a newbie in SQL. This is my implementation:
SELECT *
FROM Items
WHERE ItemName LIKE 'cocacola'
SELECT *
FROM Categories
WHERE Name LIKE 'cocacola'
SELECT *
FROM Brands
WHERE Name LIKE 'cocacola'
which is obviously incorrect. Can someone please guide me? Thanks.
If you want to return a substring search, it might be slow depending on how much data you have.
If you are able to pre-specify the tables, and want a single search that searches all and returns matches across all tables, you will want something like this:
SELECT
'Items' as table_name,
ItemID AS record_id,
ItemName AS found
FROM
Items
WHERE
ItemName LIKE '%cocacola%'
UNION
SELECT
'Categories' as table_name,
CategoryID AS record_id,
Name AS found
FROM
Categories
WHERE
Name LIKE '%cocacola%'
UNION
SELECT
'Brands' as table_name,
BrandID AS record_id,
Name AS found
FROM
Brands
WHERE
Name LIKE '%cocacola%'
The UNION will append the results from one query to another. (UNION also removes duplicate rows across the branches; UNION ALL keeps them and skips the de-duplication step.) It will be slow if you have a lot of data.
Your solution is not incorrect. You run three queries, each against a different table. Depending on your use case this is probably fine.
You can join the tables if you want to search all tables with only one query. This is probably slower than running three queries because the database has to match the values together.
SELECT *
FROM Items
FULL OUTER JOIN Categories ON Categories.CategoryID = Items.CategoryID
FULL OUTER JOIN Brands ON Brands.BrandID = Items.BrandID
WHERE Items.ItemName LIKE 'cocacola'
OR Categories.Name LIKE 'cocacola'
OR Brands.Name LIKE 'cocacola'
If you get a hit in the category name with this query, the category will be listed for every item that's associated with this category.
It sounds like you might want to try using a union to join together the results of all three queries.
For example:
SELECT ItemID, ItemName
FROM Items
WHERE ItemName = 'cocacola'
UNION
SELECT CategoryID, Name
FROM Categories
WHERE Name = 'cocacola'
UNION
SELECT BrandID, Name
FROM Brands
WHERE Name = 'cocacola'
One note about UNION: you have to make sure each part of the query returns the same number of columns, with compatible datatypes, in the same order.
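If the key columns ever differ in type across the tables, or you want to tag each row with its source table as in the first answer, explicit CASTs keep the branches compatible. A sketch of my own, reusing the same tables:
SELECT 'Items' AS table_name, CAST(ItemID AS nvarchar(20)) AS record_id, ItemName AS found
FROM Items
WHERE ItemName LIKE '%cocacola%'
UNION ALL
SELECT 'Categories', CAST(CategoryID AS nvarchar(20)), Name
FROM Categories
WHERE Name LIKE '%cocacola%'
UNION ALL
SELECT 'Brands', CAST(BrandID AS nvarchar(20)), Name
FROM Brands
WHERE Name LIKE '%cocacola%'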

Ambiguous column name SQL

I get the following error when I want to execute a SQL query:
"Msg 209, Level 16, State 1, Line 9
Ambiguous column name 'i_id'."
This is the SQL query I want to execute:
SELECT DISTINCT x.*
FROM items x LEFT JOIN items y
ON y.i_id = x.i_id
AND x.last_seen < y.last_seen
WHERE x.last_seen > '4-4-2017 10:54:11'
AND x.spot = 'spot773'
AND (x.technology = 'Bluetooth LE' OR x.technology = 'EPC Gen2')
AND y.id IS NULL
GROUP BY i_id
This is what my table looks like:
CREATE TABLE [dbo].[items] (
[id] INT IDENTITY (1, 1) NOT NULL,
[i_id] VARCHAR (100) NOT NULL,
[last_seen] DATETIME2 (0) NOT NULL,
[location] VARCHAR (200) NOT NULL,
[code_hex] VARCHAR (100) NOT NULL,
[technology] VARCHAR (100) NOT NULL,
[url] VARCHAR (100) NOT NULL,
[spot] VARCHAR (200) NOT NULL,
PRIMARY KEY CLUSTERED ([id] ASC));
I've tried a couple of things, but I'm not an SQL expert. :) Any help would be appreciated.
EDIT:
I do get duplicate rows when I remove the GROUP BY line.
I'm adding another answer in order to show how you'd typically select the latest record per group without getting duplicates. You'd use ROW_NUMBER for this, marking the latest record per i_id with row number 1.
SELECT *
FROM
(
SELECT
i.*,
ROW_NUMBER() over (PARTITION BY i_id ORDER BY last_seen DESC) as rn
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
) ranked
WHERE rn = 1;
(You'd use RANK or DENSE_RANK instead of ROW_NUMBER if you wanted duplicates.)
You forgot the table alias in GROUP BY i_id.
Anyway, why are you writing an anti-join query where you are trying to get rid of duplicates with both DISTINCT and GROUP BY? Did you have issues with a straightforward NOT EXISTS query? You are making things way more complicated than they actually are.
SELECT *
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
AND NOT EXISTS
(
SELECT *
FROM items other
WHERE i.i_id = other.i_id
AND i.last_seen < other.last_seen
);
(There are other techniques of course to get the last seen record per i_id. This is one; another is to compare with MAX(last_seen); another is to use ROW_NUMBER.)
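For completeness, here is a sketch of the MAX(last_seen) variant just mentioned; it returns the same rows as the NOT EXISTS query above:
SELECT *
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
AND last_seen = (SELECT MAX(other.last_seen)
                 FROM items other
                 WHERE other.i_id = i.i_id);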

SQL Query Optimization (After table structure change)

I am just wondering if anyone can see a better solution to this issue.
I previously had a flat (wide) table to work with, that contained multiple columns. This table has now been changed to a dynamic table containing just 2 columns (statistic_name and value).
I have amended my code to use subqueries to return the same results as before; however, I am worried the performance is going to be terrible when using real live data. This is based on the execution plan, which shows a considerable difference between the 2 versions.
See below for a very simplified example of my issue:
CREATE TABLE dbo.TEST_FLAT
(
ID INT,
TEST1 INT,
TEST2 INT,
TEST3 INT,
TEST4 INT,
TEST5 INT,
TEST6 INT,
TEST7 INT,
TEST8 INT,
TEST9 INT,
TEST10 INT,
TEST11 INT,
TEST12 INT
)
CREATE TABLE dbo.TEST_DYNAMIC
(
ID INT,
STAT VARCHAR(6),
VALUE INT
)
CREATE TABLE dbo.TEST_URNS
(
ID INT
)
-- OLD QUERY
SELECT D.[ID], D.TEST1, D.TEST2, D.TEST3, D.TEST4, D.TEST5, D.TEST6, D.TEST7, D.TEST8, D.TEST9, D.TEST10, D.TEST11, D.TEST12
FROM [dbo].[TEST_URNS] U
INNER JOIN [dbo].[TEST_FLAT] D
ON D.ID = U.ID
-- NEW QUERY
SELECT U.[ID],
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST1') AS TEST1,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST2') AS TEST2,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST3') AS TEST3,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST4') AS TEST4,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST5') AS TEST5,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST6') AS TEST6,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST7') AS TEST7,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST8') AS TEST8,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST9') AS TEST9,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST10') AS TEST10,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST11') AS TEST11,
(SELECT VALUE FROM dbo.TEST_DYNAMIC WHERE ID = U.ID AND STAT = 'TEST12') AS TEST12
FROM [dbo].[TEST_URNS] U
Note this is in SQL Server 2008 R2 and will be part of a stored procedure; the flat version of the table contained hundreds of thousands of records (900k or so at last count).
Thanks in advance.
Create an index on the STAT column of TEST_DYNAMIC, for quick lookups.
But first, consider redesigning TEST_DYNAMIC, changing STAT varchar(6) to STAT_ID int (referencing a lookup table). Then create an index on STAT_ID on TEST_DYNAMIC, which will run quite a bit faster than an index on a text field.
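A sketch of such an index for the original varchar design (the index name is mine); it covers the correlated lookups in the new query:
CREATE INDEX IX_TEST_DYNAMIC_ID_STAT
    ON dbo.TEST_DYNAMIC (ID, STAT) INCLUDE (VALUE);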
Create your TEST_DYNAMIC and TEST_URNS tables like this:
CREATE TABLE [dbo].[TEST_DYNAMIC] (
    [ID] [int] NOT NULL,
    [STAT] [varchar](50) NOT NULL,
    [VALUE] [int] NOT NULL,
    CONSTRAINT [PK_TEST_DYNAMIC] PRIMARY KEY CLUSTERED ([ID], [STAT])
)
CREATE TABLE dbo.TEST_URNS (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    CONSTRAINT [PK_TEST_URNS] PRIMARY KEY CLUSTERED ([ID])
)
(Note that ID repeats in TEST_DYNAMIC, once per STAT, so it cannot be an IDENTITY column there and the primary key has to cover both [ID] and [STAT]; likewise VALUE holds data, not an IDENTITY.)
If you notice after a period of time that performance becomes poor, then you can check the index fragmentation:
SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('dbo.TEST_DYNAMIC'),
NULL, NULL, NULL) AS a
JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
GO
Then you can rebuild the index like so:
ALTER INDEX PK_TEST_DYNAMIC ON dbo.TEST_DYNAMIC
REBUILD;
GO
For details please see https://msdn.microsoft.com/en-us/library/ms189858.aspx
Also, I like @Brett Lalonde's suggestion to change STAT to an int.
The only way to really know is to try it out. In general, modern hardware should be able to support either query with little noticeable impact on performance, as long as you are indexing both tables correctly (you'll probably need an index on ID and STAT).
If you have 900K entities and 12 attributes, you have around 10 million rows; that should be fine on a decent server. Eventually, you may run into performance problems if you add many records every month.
The bigger problem is that the example queries you paste are almost certainly not what you'll end up running in your real queries. If you have to filter and/or compare TEST5 with TEST6 on your derived table, you don't benefit from the additional indexing you could do if they were "real" columns.
You could then come full circle and implement your EAV table as an indexed view.
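As a final aside (my addition, not from the answers above): a common way to fold the twelve correlated subqueries into a single pass over TEST_DYNAMIC is conditional aggregation, for example:
SELECT U.ID,
       MAX(CASE WHEN D.STAT = 'TEST1' THEN D.VALUE END) AS TEST1,
       MAX(CASE WHEN D.STAT = 'TEST2' THEN D.VALUE END) AS TEST2,
       MAX(CASE WHEN D.STAT = 'TEST3' THEN D.VALUE END) AS TEST3 -- ...and so on through TEST12
FROM dbo.TEST_URNS U
LEFT JOIN dbo.TEST_DYNAMIC D
    ON D.ID = U.ID
GROUP BY U.ID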

Leaderboard design using SQL Server

I am building a leaderboard for some of my online games. Here is what I need to do with the data:
Get the rank of a player for a given game across multiple time frames (today, last week, all time, etc.)
Get paginated rankings (e.g. top scores for the last 24 hrs., players between rank 25 and 50, the rank of a single user)
I came up with the following table definition and index, and I have a couple of questions.
Considering my scenarios, do I have a good primary key? The reason I have a clustered key across gameId, playerName and score is simply to make sure that all data for a given game is in the same area and that score is already sorted. Most of the time I will display the data in descending order of score (plus updatedDateTime for ties) for a given gameId. Is this the right strategy? In other words, I want to make sure that I can run my queries to get the rank of my players as fast as possible.
CREATE TABLE score (
    [gameId] [smallint] NOT NULL,
    [playerName] [nvarchar](50) NOT NULL,
    [score] [int] NOT NULL,
    [createdDateTime] [datetime2](3) NOT NULL,
    [updatedDateTime] [datetime2](3) NOT NULL,
    PRIMARY KEY CLUSTERED ([gameId] ASC, [playerName] ASC, [score] DESC, [updatedDateTime] ASC)
);
CREATE NONCLUSTERED INDEX [Score_Idx]
    ON score ([gameId] ASC, [score] DESC, [updatedDateTime] ASC)
    INCLUDE ([playerName]);
Below is the first iteration of the query I will be using to get the rank of my players. However, I am a bit disappointed by the execution plan (see below). Why does SQL need to sort? The additional sort seems to come from the RANK function. But isn't my data already sorted in descending order (based on the clustered key of the score table)? I am also wondering if I should normalize my table a bit more and move the PlayerName column out into a Player table. I originally decided to keep everything in the same table to minimize the number of joins.
DECLARE @GameId AS INT = 0
DECLARE @From AS DATETIME2(3) = '2013-10-01'
SELECT DENSE_RANK() OVER (ORDER BY Score DESC), s.PlayerName, s.Score, s.updatedDateTime
FROM [mrgleaderboard].[score] s
WHERE s.GameId = @GameId
AND (s.UpdatedDateTime >= @From OR @From IS NULL)
Thank you for the help!
[Updated]
Your primary key is not good.
Your unique entity is [GameID] + [PlayerName], and you have a composite clustered index > 120 bytes wide with an nvarchar column. See the answer by @marc_s in the related topic SQL Server - Clustered index design for dictionary.
Your table schema does not match your time-period requirements.
For example: I earn a score of 300 on Wednesday and it is stored on the leaderboard. The next day I earn 250, but that score is never recorded, so a query for that day's leaderboard returns nothing for me.
You can get complete information from a historical table of played-game scores, but it can be very expensive:
CREATE TABLE GameLog (
[id] int NOT NULL IDENTITY
CONSTRAINT [PK_GameLog] PRIMARY KEY CLUSTERED,
[gameId] smallint NOT NULL,
[playerId] int NOT NULL,
[score] int NOT NULL,
[createdDateTime] datetime2(3) NOT NULL)
Here are some solutions related to the aggregation that can accelerate it:
Indexed views on the historical table (see the post by @Twinkles).
You need 3 indexed views for the 3 time periods. Potentially huge historical table plus 3 indexed views; you cannot remove the "old" periods from the table; and saving a score takes a performance hit.
Asynchronous leaderboard
Scores are saved in the historical table. A SQL job/"worker" (or several) runs on a schedule (once per minute?), sorts the historical table, and populates the leaderboard table (3 tables for the 3 time periods, or one table with a time-period key) with each user's precalculated rank. This table can also be denormalized (holding score, datetime, PlayerName, and so on). Pros: fast reads (no sorting), fast score saves, any time periods, flexible logic and flexible schedules. Cons: a user who has just finished a game does not immediately find himself on the leaderboard.
Preaggregated leaderboard
Do the pre-treatment while recording the results of the game session. In your case, something like UPDATE [Leaderboard] SET score = @CurrentScore WHERE @CurrentScore > MAX(score) AND ... for the player/game id, but you did that only for the "all time" leaderboard. The schema might look like this:
CREATE TABLE [Leaderboard] (
[id] int NOT NULL IDENTITY
CONSTRAINT [PK_Leaderboard] PRIMARY KEY CLUSTERED,
[gameId] smallint NOT NULL,
[playerId] int NOT NULL,
[timePeriod] tinyint NOT NULL, -- 0 -all time, 1-monthly, 2 -weekly, 3 -daily
[timePeriodFrom] date NOT NULL, -- '1900-01-01' for all time, '2013-11-01' for monthly, etc.
[score] int NOT NULL,
[createdDateTime] datetime2(3) NOT NULL
)
playerId timePeriod timePeriodFrom Score
----------------------------------------------
1 0 1900-01-01 300
...
1 1 2013-10-01 150
1 1 2013-11-01 300
...
1 2 2013-10-07 150
1 2 2013-11-18 300
...
1 3 2013-11-19 300
1 3 2013-11-20 250
...
So, you have to update the score for all 3 time periods. Also, as you can see, the leaderboard will contain "old" periods, such as the monthly one for October; you may want to delete them if you do not need those statistics. Pros: does not need a historical table. Cons: a complicated procedure for storing results, the leaderboard needs maintenance, and the query requires sorting and a JOIN:
CREATE TABLE [Player] (
[id] int NOT NULL IDENTITY CONSTRAINT [PK_Player] PRIMARY KEY CLUSTERED,
[playerName] nvarchar(50) NOT NULL CONSTRAINT [UQ_Player_playerName] UNIQUE NONCLUSTERED)
CREATE TABLE [Leaderboard] (
[id] int NOT NULL IDENTITY CONSTRAINT [PK_Leaderboard] PRIMARY KEY CLUSTERED,
[gameId] smallint NOT NULL,
[playerId] int NOT NULL,
[timePeriod] tinyint NOT NULL, -- 0 -all time, 1-monthly, 2 -weekly, 3 -daily
[timePeriodFrom] date NOT NULL, -- '1900-01-01' for all time, '2013-11-01' for monthly, etc.
[score] int NOT NULL,
[createdDateTime] datetime2(3)
)
CREATE UNIQUE NONCLUSTERED INDEX [UQ_Leaderboard_gameId_playerId_timePeriod_timePeriodFrom] ON [Leaderboard] ([gameId] ASC, [playerId] ASC, [timePeriod] ASC, [timePeriodFrom] ASC)
CREATE NONCLUSTERED INDEX [IX_Leaderboard_gameId_timePeriod_timePeriodFrom_Score] ON [Leaderboard] ([gameId] ASC, [timePeriod] ASC, [timePeriodFrom] ASC, [score] ASC)
GO
-- Generate test data
-- Generate 500K unique players
;WITH digits (d) AS (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 0)
INSERT INTO Player (playerName)
SELECT TOP (500000) LEFT(CAST(NEWID() as nvarchar(50)), 20 + (ABS(CHECKSUM(NEWID())) & 15)) as Name
FROM digits CROSS JOIN digits ii CROSS JOIN digits iii CROSS JOIN digits iv CROSS JOIN digits v CROSS JOIN digits vi
-- Random score 500K players * 4 games = 2M rows
INSERT INTO [Leaderboard] (
[gameId],[playerId],[timePeriod],[timePeriodFrom],[score],[createdDateTime])
SELECT GameID, Player.id,ABS(CHECKSUM(NEWID())) & 3 as [timePeriod], DATEADD(MILLISECOND, CHECKSUM(NEWID()),GETDATE()) as Updated, ABS(CHECKSUM(NEWID())) & 65535 as score
, DATEADD(MILLISECOND, CHECKSUM(NEWID()),GETDATE()) as Created
FROM ( SELECT 1 as GameID UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) as Game
CROSS JOIN Player
ORDER BY NEWID()
UPDATE [Leaderboard] SET [timePeriodFrom]='19000101' WHERE [timePeriod] = 0
GO
DECLARE @From date = '19000101'--'20131108'
,@GameID int = 3
,@timePeriod tinyint = 0
-- Get paginated ranking
;With Lb as (
SELECT
DENSE_RANK() OVER (ORDER BY Score DESC) as Rnk
,Score, createdDateTime, playerId
FROM [Leaderboard]
WHERE GameId = @GameId
AND [timePeriod] = @timePeriod
AND [timePeriodFrom] = @From)
SELECT lb.rnk,lb.Score, lb.createdDateTime, lb.playerId, Player.playerName
FROM Lb INNER JOIN Player ON lb.playerId = Player.id
ORDER BY rnk OFFSET 75 ROWS FETCH NEXT 25 ROWS ONLY;
-- Get rank of a player for a given game
SELECT (SELECT COUNT(DISTINCT rnk.score)
FROM [Leaderboard] AS rnk
WHERE rnk.GameId = @GameId
AND rnk.[timePeriod] = @timePeriod
AND rnk.[timePeriodFrom] = @From
AND rnk.score >= [Leaderboard].score) AS rnk
,[Leaderboard].Score, [Leaderboard].createdDateTime, [Leaderboard].playerId, Player.playerName
FROM [Leaderboard] INNER JOIN Player ON [Leaderboard].playerId = Player.id
WHERE [Leaderboard].GameId = @GameId
AND [Leaderboard].[timePeriod] = @timePeriod
AND [Leaderboard].[timePeriodFrom] = @From
AND Player.playerName = N'785DDBBB-3000-4730-B'
GO
This is only an example to present the ideas; it can be optimized. For example, the GameID, TimePeriod and TimePeriodFrom columns could be combined into one column via a dictionary table, making the index more effective.
P.S. Sorry for my English. Feel free to fix grammatical or spelling errors
You could look into indexed views to create scoreboards for common time ranges (today, this week/month/year, all-time).
To get the rank of a player for a given game across multiple timeframes, you will select the game and rank (i.e. sort) by score over multiple timeframes. For this, your nonclustered index could be changed like this, since this is the way your SELECT seems to query:
CREATE NONCLUSTERED INDEX [Score_Idx]
ON score ([gameId] ASC, [updatedDateTime] ASC, [score] DESC)
INCLUDE ([playerName])
For the paginated ranking:
For the 24h top score, I guess you will want all the top scores of a single user across all games within the last 24h. For this you will be querying [playername] and [updateddatetime] together with [gameid].
For the players between rank 25 and 50, I assume you are talking about a single game with a long ranking that you can page through. The query will then be based upon [gameid], [score] and, a little, on [updateddatetime] for the ties.
The single-user rank, probably per game, is a little more difficult. You will need to query the leaderboards for all games in order to get the player's rank in them, and then filter on the player. You will need [gameid], [score], [updateddatetime] and then filter by player.
Concluding all this, I propose you keep your nonclustered index and change the primary key to:
PRIMARY KEY CLUSTERED ([gameId] ASC, [score] DESC, [updatedDateTime] ASC)
For the 24h top score, I think this might help:
CREATE NONCLUSTERED INDEX [player_Idx]
ON score ([playerName] ASC)
INCLUDE ([gameId], [score])
The DENSE_RANK query sorts because it selects [gameId], [updatedDateTime], [score]; see my comment on the nonclustered index above.
I would also think twice about including [updatedDateTime] in your queries and, subsequently, in your indexes. Maybe sometimes two players get the same rank; why not? [updatedDateTime] will make your index swell significantly.
Also, you might think about partitioning the tables by [gameid].
As a bit of a sidetrack:
Ask yourself how accurate and how up to date the scores in the leaderboard actually need to be.
As a player I don't care if I'm number 142134 in the world or number 142133. I do care if I beat my friends' exact score (but then I only need my score compared to a couple of other scores) and I want to know that my new highscore sends me from somewhere around 142000 to somewhere around 90000. (Yay!)
So if you want really fast leaderboards, you do not actually need all data to be up to date. You could compute a static sorted copy of the leaderboard daily or hourly and, when showing player X's score, show at what rank it would fit in the static copy.
When comparing to friends, last minute updates do matter, but you're dealing with only a couple hundred scores, so you can look up their actual scores in the up to date leaderboards.
Oh, and I care about the top 10 of course. Consider them my "friends" merely based on the fact that they scored so well, and show these values up to date.
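A minimal sketch of that static-copy idea, under names of my own choosing (assume a LeaderboardSnapshot table rebuilt by a scheduled job):
TRUNCATE TABLE dbo.LeaderboardSnapshot;

INSERT INTO dbo.LeaderboardSnapshot (gameId, playerName, score, rnk)
SELECT gameId, playerName, score,
       DENSE_RANK() OVER (PARTITION BY gameId ORDER BY score DESC)
FROM dbo.score;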
Your clustered index is composite, which means the order is defined by more than one column. You request ORDER BY Score, which is not the leading column of the clustered index. For that reason, entries in the index are not necessarily in Score order, e.g. the entries
1, 2, some date
2, 1, some other date
If you select just Score, the order will be
2
1
which needs to be sorted.
I would not put the score column into the clustered index, because it will probably change all the time ... and updates to a column that's part of the clustered index are expensive.