SQL Server 2005 performance issue with DISTINCT

SQL Server 2005 performance issue with DISTINCT - sql

I have a table tblStkMst2 which has 87 columns and 53,000 rows. If I execute the following query it takes 83 to 96 milliseconds (Core2 Duo, 2.8 GHz, 2 GB of RAM). But when I use a distinct keyword it takes 1086 to 1103 milliseconds (more than 1 second). It is really expensive. If I apply duplicate removal algorithm on 53,000 rows of data it does not take 1 seconds.
Is there any other way in SQL Server 2005 to improve execution time?
declare #monthOnly int set #monthOnly = 12
declare #yearOnly int set #yearOnly = 2011
SELECT --(distinct)--
tblSModelMst.SMNo as [ModelID]
,tblSModelMst.Vehicle as [ModelName]
FROM tblStkMst2
INNER JOIN tblDCDetail ON tblStkMst2.DCNo = tblDCDetail.DCNo AND tblDCDetail.Refund=0
INNER JOIN tblSModelMst ON tblStkMst2.SMno = tblSModelMst.SMNo
INNER JOIN tblBuyerMst ON tblDCDetail.BNo = tblBuyerMst.BNo
LEFT OUTER JOIN tblSModelSegment ON tblSModelMst.SMSeg = tblSModelSegment.ID
left outer JOIN dbo.tblProdManager as pd ON pd.PMID = tblBuyerMst.PMId
WHERE (pd.Active = 1) AND ((tblStkMst2.ISSFlg = 1) or (tblStkMst2.IsBooked = 1))
AND (MONTH(tblStkMst2.SIssDate) = #monthOnly) AND (YEAR(tblStkMst2.SIssDate) = #yearOnly)

It is not that DISTINCT is very expensive (this is only 53000 rows, which is tiny). You are seeing a significant performance difference because SQL server is choosing a completely different query plan when you add DISTINCT. Without seeing the query plans it is very difficult to see what is happening.
There are a couple of things in your query though which you could do better which could significantly improve performance.
(1) Avoid where clauses like this where you need to transform a column:
AND (MONTH(tblStkMst2.SIssDate) = #monthOnly) AND (YEAR(tblStkMst2.SIssDate) = #yearOnly)
If you have an index on the SIssDate column SQL Server won't be able to use it (it will likely do a table scan as I suspect it won't be able to use another index).
If you want to take advantage of the SIssDate index, it is better if you try and convert the #monthOnly/#yearonly parameters into a min and max date and use these in the query:
AND (tblStkMst2.SIssDate between #minDate and #maxDate);
If you have a surrogate primary key (which is the clustered index) on the table, it may be useful to do this before you run your query (assuming your surrogate primary key is called tblStkMst2_id)
SELECT #minId=MIN(tblStkMst2_id), #maxId=(tblStkMst2_id)
FROM
tblStkMst2 WHERE tblStkMsg2.SIssDate between #minDate and #maxDate;
This should be very fast as SQL server should not even need to look at the table (just at the SIssDate non-clustered index and the tblStkMst2_id clustered index).
Then you can do this in your main query (instead of the date check):
AND (tblStkMst2.tblStkMst2_id BETWEEN #minId and #maxId);
Using the clustered index is much faster than using a non-clustered index as the DB will be able to sequentially access these records (rather than going through the non-clustered index redirect).
(2) Delay the join to tblStkMst2 until after you do the DISTINCT (or GROUP BY). The fewer entries in the DISTINCT (GROUP BY) the better.

SQL Server optimizes to avoid worst-case execution. This can lead it to prefer a suboptimal algorithm, like preferring a disk sort over a hash sort, just to be on the safe side.
For a limited number of distinct values, a hash sort is the fastest way to execute a distinct operation. A hash sort trades memory for execution speed. But if you have a large number of values, the hash sort breaks down because the hash is too large to store in memory. So you need a way to tell SQL Server that the hash will fit into memory.
One possible way to do that is to use a temporary table:
declare #t (ModelID int, ModelName varchar(50))
insert #t (ModelID, ModelName) select ...your original query here...
select distinct ModelID, ModelName from #t
SQL Server will know the size of the temporary table, allowing it to choose a better algorithm in many cases.

Several ways.
1 - Don't use DISTINCT
2 - Create an index on TblSModelMst(SMNo) INCLUDE (Vehicle), and index your other JOIN keys.
You really should figure out why you get duplicates and take care of that first. It's likely additional matching rows in one or more of your JOINed tables.
DISTINCT has it's place but is heavily overused to obscure data issues, and it's a very expensive operator, especially when you have a large number of rows you are filtering down from.
To get a more complete answer you need to explain your data structure and what you are trying to achieve.

Related

Why changing where statement to a variable cause query to be 4 times slower

I am inserting data from one table "Tags" from "Recovery" database into another table "Tags" in "R3" database
they all live in my laptop similar SQL Server instance
I have built the insert query and because Recovery..Tags table is around 180M records I decided to break it into smaller sebsets. ( 1 million recs at the time)
Here is my query (Let's call Query A)
insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between 13000001 and 14000000
it takes around 2 minutes.
That is ok
To make things a bit easier for me
I put the iiD in the were statement in a variable
so my query looks like this (Let's call Query B)
declare #i int = 12
insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between (1000000 * #i) + 1 and (#i+1)*1000000
but that cause the insert to become so slow (around 10 min)
So what I tried query A again and gave me around 2 min
I tried query B again and gave around 8 min!!
I am attaching exec plan for each one (at a site that shows an analysis of the query plan) - Query A Plan and Query B Plan
Any idea why this is happening?
and how to fix it?

The big difference in time is due to the very different plans that are being created to join Tags and Reps.
Fundamentally, in version A, it knows how much data is being extracted (a million rows) and it can design an efficient query for that. However, because you are using variables in B to define how much data is being imported, it has to define a more generic query - one that would work for 10 rows, a million rows, or a hundred million rows.
In the plans, here are the relevant sections of the query joining Tags and Reps...
... in A
... and B
Note that in A it takes just over a minute to do the join; in B it takes 6 and a half minutes.
The key thing that appears to take the time is that it does a table scan of the Tags table which takes 5:44 to complete. The plan has this as a table scan, as the next time you run the query you may want many more than 1 million rows.
A secondary issue is that the amount of data it reads (or expects to read) from Reps is also way out of whack. In A it expected to read 2 million rows and read 1421; in B it basically read them all (even though technically it probably only needed the same 1421).
I think you have two main approaches to fix
Look at indexing, to remove the table scan on Tags - ensure the indexes match what is needed and allows the query to do a scan on that index (it appears that the index at the top of #MikePetri's answer is what you need, or similar). This way instead of doing a table scan, it can do an index scan which can start 'in the middle' of the data set (a table scan must start at either the start or end of the data set).
Separate this into two processes. The first process gets the relevant million rows from Tags, and saves it in a temporary table. The second process uses the data in the temporary table to join to Reps (also try using option (recompile) in the second query, so that it checks the temporary table's size before creating the plan).
You can even put an index or two (and/or Primary Key) on that temporary table to make it better for the next step.

The reason the first query is so much faster is it went parallel. This means the cardinality estimator knew enough about the data it had to handle, and the query was large enough to tip the threshold for parallel execution. Then, the engine passed chunks of data for different processors to handle individually, then report back and repartition the streams.
With the value as a variable, it effectively becomes a scalar function evaluation, and a query cannot go parallel with a scalar function, because the value has to determined before the cardinality estimator can figure out what to do with it. Therefore, it runs in a single thread, and is slower.
Some sort of looping mechanism might help. Create the included indexes to assist the engine in handling this request. You can probably find a better looping mechanism, since you are familiar with the identity ranges you care about, but this should get you in the right direction. Adjust for your needs.
With a loop like this, it commits the changes with each loop, so you aren't locking the table indefinitely.
USE Recovery;
GO
CREATE INDEX NCI_iID
ON Tags (iID)
INCLUDE (
DT
,RepID
,tag
,xmiID
,iBegin
,iEnd
,Confidence
,Polarity
,Uncertainty
,Conditional
,Generic
,HistoryOf
,CodingScheme
,Code
,CUI
,TUI
,PreferredText
,ValueBegin
,ValueEnd
,value
,Deleted
,sKey
);
GO
CREATE INDEX NCI_RepID ON Reps (RepID) INCLUDE (RepType);
USE R3;
GO
CREATE INDEX NCI_iID ON Tags (iID);
GO
DECLARE #RowsToProcess BIGINT
,#StepIncrement INT = 1000000;
SELECT #RowsToProcess = (
SELECT COUNT(1)
FROM Recovery..tags AS T
WHERE NOT EXISTS (
SELECT 1
FROM R3..Tags AS rt
WHERE T.iID = rt.iID
)
);
WHILE #RowsToProcess > 0
BEGIN
INSERT INTO R3..Tags
(
iID
,DT
,RepID
,Tag
,xmiID
,iBegin
,iEnd
,Confidence
,Polarity
,Uncertainty
,Conditional
,Generic
,HistoryOf
,CodingScheme
,Code
,CUI
,TUI
,PreferredText
,ValueBegin
,ValueEnd
,Value
,Deleted
,sKey
,RepType
)
SELECT TOP (#StepIncrement)
T.iID
,T.DT
,T.RepID
,T.Tag
,T.xmiID
,T.iBegin
,T.iEnd
,T.Confidence
,T.Polarity
,T.Uncertainty
,T.Conditional
,T.Generic
,T.HistoryOf
,T.CodingScheme
,T.Code
,T.CUI
,T.TUI
,T.PreferredText
,T.ValueBegin
,T.ValueEnd
,T.Value
,T.Deleted
,T.sKey
,R.RepType
FROM Recovery..tags AS T
INNER JOIN Recovery..Reps AS R ON T.RepID = R.RepID
WHERE NOT EXISTS (
SELECT 1
FROM R3..Tags AS rt
WHERE T.iID = rt.iID
)
ORDER BY
T.iID;
SET #RowsToProcess = #RowsToProcess - #StepIncrement;
END;

Index scan on SQL update statement

I have the following SQL statement, which I would like to make more efficient. Looking through the execution plan I can see that there is a Clustered Index Scan on #newWebcastEvents. Is there a way I can make this into a seek? Or are there any other ways I can make the below more efficient?
declare #newWebcastEvents table (
webcastEventId int not null primary key clustered (webcastEventId) with (ignore_dup_key=off)
)
insert into #newWebcastEvents
select wel.WebcastEventId
from WebcastChannelWebcastEventLink wel with (nolock)
where wel.WebcastChannelId = 1178
Update WebcastEvent
set WebcastEventTitle = LEFT(WebcastEventTitle, CHARINDEX('(CLONE)',WebcastEventTitle,0) - 2)
where
WebcastEvent.WebcastEventId in (select webcastEventId from #newWebcastEvents)

The #newWebcastEvents table variable only contains the only single column, and you're asking for all rows of that table variable in this where clause:
where
WebcastEvent.WebcastEventId in (select webcastEventId from #newWebcastEvents)
So doing a seek on this clustered index is usually rather pointless - SQL Server query optimizer will need all columns, all rows of that table variable anyway..... so it chooses an index scan.
I don't think this is a performance issue, anyway.
An index seek is useful if you need to pick a very small number of rows (<= 1-2% of the original number of rows) from a large table. Then, going through the clustered index navigation tree and finding those few rows involved makes a lot more sense than scanning the whole table. But here, with a single int column and 15 rows --> it's absolutely pointless to seek, it will be much faster to just read those 15 int values in a single scan and be done with it...
Update: no sure if it makes any difference in terms of performance, but I personally typically prefer to use joins rather than subselects for "connecting" two tables:
UPDATE we
SET we.WebcastEventTitle = LEFT(we.WebcastEventTitle, CHARINDEX('(CLONE)', we.WebcastEventTitle, 0) - 2)
FROM dbo.WebcastEvent we
INNER JOIN #newWebcastEvents nwe ON we.WebcastEventId = nwe.webcastEventId

Why does this SQL query take 8 hours to finish?

There is a simple SQL JOIN statement below:
SELECT
REC.[BarCode]
,REC.[PASSEDPROCESS]
,REC.[PASSEDNODE]
,REC.[ENABLE]
,REC.[ScanTime]
,REC.[ID]
,REC.[Se_Scanner]
,REC.[UserCode]
,REC.[aufnr]
,REC.[dispatcher]
,REC.[matnr]
,REC.[unitcount]
,REC.[maktx]
,REC.[color]
,REC.[machinecode]
,P.PR_NAME
,N.NO_NAME
,I.[inventoryID]
,I.[status]
FROM tbBCScanRec as REC
left join TB_R_INVENTORY_BARCODE as R
ON REC.[BarCode] = R.[barcode]
AND REC.[PASSEDPROCESS] = R.[process]
AND REC.[PASSEDNODE] = R.[node]
left join TB_INVENTORY as I
ON R.[inventid] = I.[id]
INNER JOIN TB_NODE as N
ON N.NO_ID = REC.PASSEDNODE
INNER JOIN TB_PROCESS as P
ON P.PR_CODE = REC.PASSEDPROCESS
The table tbBCScanRec has 556553 records while the table TB_R_INVENTORY_BARCODE has 260513 reccords and the table TB_INVENTORY has 7688. However, the last two tables (TB_NODE and TB_PROCESS) both have fewer than 30 records.
Incredibly, when it runs in SQL Server 2005, it takes 8 hours to return the result set.
Why does it take so much time to execute?
If the two inner joins are removed, it takes just ten seconds to finish running.
What is the matter?
There are at least two UNIQUE NONCLUSTERED INDEXes.
One is IX_INVENTORY_BARCODE_PROCESS_NODE on the table TB_R_INVENTORY_BARCODE, which covers four columns (inventid, barcode, process, and node).
The other is IX_BARCODE_PROCESS_NODE on the table tbBCScanRec, which covers three columns (BarCode, PASSEDPROCESS, and PASSEDNODE).

Well, standard answer to questions like this:
Make sure you have all the necessary indexes in place, i.e. indexes on N.NO_ID, REC.PASSEDNODE, P.PR_CODE, REC.PASSEDPROCESS
Make sure that the types of the columns you join on are the same, so that no implicit conversion is necessary.

You are working with around (556553 *30 *30) 500 millions of rows.
You probably have to add indexes on your tables.
If you are using SQL server, you can watch the plan query to see where you are losing time.
See the documentation here : http://msdn.microsoft.com/en-us/library/ms190623(v=sql.90).aspx
The query plan will help you to create indexes.

When you check the indexing, there should be clustered indexes as well - the nonclustered indexes use the clustered index, so not having one would render the nonclustered useless. Out-dated statistics could also be a problem.
However, why do you need to fetch ALL of the data? What is the purpose of that? You should have WHERE clauses restricting the result set to only what you need.

Slow query with unexpected index scan

I have this query:
SELECT *
FROM sample
INNER JOIN test ON sample.sample_number = test.sample_number
INNER JOIN result ON test.test_number = result.test_number
WHERE sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00'
the biggest table here is RESULT, contains 11.1M records. The left 2 tables about 1M.
this query works slowly (more than 10 minutes) and returns about 800 records. executing plan shows clustered index scan (over it's PRIMARY KEY (result.result_number, which actually doesn't take part in query)) over all 11M records.
RESULT.TEST_NUMBER is a clustered primary key.
if I change 2010-03-17 09:00 to 2010-03-17 10:00 - i get about 40 records. it executes for 300ms. and plan shows index seek (over result.test_number index)
if i replace * in SELECT clause to result.test_number (covered with index) - then all become fast in first case too. this points to hdd IO issues, but doesn't clarifies changing plan.
so, any ideas?
UPDATE:
sampled_date is in table sample and covered by index.
other fields from this query: test.sample_number is covered by index and result.test_number too.
UPDATE 2:
obviously than sql server in any reasons don't want to use index.
i did a small experiment: i remove INNER JOIN with result, select all test.test_number and after that do
SELECT * FROM RESULT WHERE TEST_NUMBER IN (...)
this, of course, works fast. but i cannot get what is the difference and why query optimizer choose such inappropriate way to select data in 1st case.
UPDATE 3:
after backing up database and restoring to database with new name - both requests work fast as expected even on much more ranges...
so - are there any special commands to clean or optimize, whatever, that can be relevant to this? :-(

A couple things to try:
Update statistics
Add hints to the query about what index to use (in SQL Server you might say WITH (INDEX(myindex)) after specifying a table)
EDIT: You noted that copying the database made it work, which tells me that the index statistics were out of date. You can update them with something like UPDATE STATISTICS mytable on a regular basis.
Use EXEC sp_updatestats to update the whole database.

The first thing I would do is specify the exact columns I want, and see if the problems persists. I doubt you would need all the columns from all three tables.
It sounds like it has trouble getting all the rows out of the result table. How big is a row? Look at how big all the data in the table is and divide it by the number of rows. Right click on the table -> properties..., Storage tab.
Try putting where clause into a subquery to force it to do that first?
SELECT *
FROM
(SELECT * FROM sample
WHERE sampled_date
BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00') s
INNER JOIN test ON s.sample_number = test.sample_number
INNER JOIN result ON test.test_number = result.test_number
OR this might work better if you expect a small number of samples
SELECT *
FROM sample
INNER JOIN test ON sample.sample_number = test.sample_number
INNER JOIN result ON test.test_number = result.test_number
WHERE sample.sample_ID in (
SELECT sample_ID
FROM sample
WHERE sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00'
)

If you do a SELECT *, you want all the data from the table. The data for the table is in the clustered index - the leaf nodes of the clustered index are the data pages.
So if you want all of those data pages anyway, and since you're joining 1 mio. rows to 11 mio. rows (1 out of 11 isn't very selective for SQL Server), using an index to find the rows, and then do bookmark lookups into the actual data pages for each of those rows found, might just not be very efficient, and thus SQL Server uses the clustered index scan instead.
So to make a long story short: only select those rows you really need! You thus give SQL Server a chance to use an index, do a seek there, and find the necessary data.
If you only select three, four columns, then the chances that SQL Server will find and use an index that contains those columns are just so much higher than if you ask for all the data from all the tables involved.
Another option would be to try and find a way to express a subquery, using e.g. a Common Table Expression, that would grab data from the two smaller tables, and reduce that number of rows even more, and join the hopefully quite small result against the main table. If you have a small result set of only 40 or 800 results (rather than two tables with 1 mio. rows each), then SQL Server might be more inclined to use a Clustered Index Seek and do bookmark lookups on 40 or 800 rows, rather than doing a full Clustered Index Scan.

SQL Server slow select from large table

I have a table with about 20+ million records.
Structure is like:
EventId UNIQUEIDENTIFIER
SourceUserId UNIQUEIDENTIFIER
DestinationUserId UNIQUEIDENTIFIER
CreatedAt DATETIME
TypeId INT
MetaId INT
Table is receiving about 100k+ records each day.
I have indexes on each column except MetaId, as it is not used in 'where' clauses
The problem is when i want to pick up eg. latest 100 records for desired SourceUserId
Query sometimes takes up to 4 minutes to execute, which is not acceptable.
Eg.
SELECT TOP 100 * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND
(
TypeId IN (2, 3, 4)
OR
(TypeId = 60 AND SrcMemberId != DstMemberId)
)
ORDER BY CreatedAt DESC
I can't do partitioning etc as I am using Standard version of SQL Server and Enterprise is too expensive.
I also think that the table is quite small to be that slow.
I think the problem is with ORDER BY clause as db must go through much bigger set of data.
Any ideas how to make it quicker ?
Perhaps relational database is not a good idea for that kind of data.
Data is always being picked up ordered by CreatedAt DESC
Thank you for reading.
PabloX

You'll likely want to create a composite index for this type of query - when the query runs slowly it is most likely choosing to scan down an index on the CreatedAt column and perform a residual filter on the SourceUserId value, when in reality what you want to happen is to jump directly to all records for a given SourceUserId ordered properly - to achieve this, you'll want to create a composite index primarily on SourceUserId (performing an equality check) and secondarily on CreateAt (to preserve the order within a given SourceUserId value). You may want to try adding the TypeId in as well, depending on the selectivity of this column.
So, the 2 that will most likely give the best repeatable performance (try them out and compare) would be:
Index on (SourceUserId, CreatedAt)
Index on (SourceUserId, TypeId, CreatedAt)
As always, there are also many other considerations to take into account with determining how/what/where to index, as Remus discusses in a separate answer one big consideration is covering the query vs. keeping lookups. Additionally you'll need to consider write volumes, possible fragmentation impact (if any), singleton lookups vs. large sequential scans, etc., etc.

I have indexes on each column except
MetaId
Non-covering indexes will likely hit the 'tipping point' and the query would revert to a table scan. Just adding an index on every column because it is used in a where clause does not equate good index design. To take your query for example, a good 100% covering index would be:
INDEX ON (SourceUserId , CreatedAt) INCLUDE (TypeId, SrcMemberId, DstMemberId)
Following index is also usefull, altough it still going to cause lookups:
INDEX ON (SourceUserId , CreatedAt) INCLUDE (TypeId)
and finaly an index w/o any included column may help, but is just as likely will be ignored (depends on the column statistics and cardinality estimates):
INDEX ON (SourceUserId , CreatedAt)
But a separate index on SourceUSerId and one on CreatedAt is basically useless for your query.
See Index Design Basics.

The fact that the table has indexes built on GUID values, indicates a possible series of problems that would affect performance:
High index fragmentation: since new GUIDs are generated randomly, the index cannot organize them in a sequential order and the nodes are spread unevenly.
High number of page splits: the size of a GUID (16 bytes) causes many page splits in the index, since there's a greater chance than a new value wont't fit in the remaining space available in a page.
Slow value comparison: comparing two GUIDs is a relatively slow operation because all 33 characters must be matched.
Here a couple of resources on how to investigate and resolve these problems:
How to Detect Index Fragmentation in SQL Server 2000 and 2005
Reorganizing and Rebuilding Indexes
How Using GUIDs in SQL Server Affect Index Performance

I would recomend getting the data in 2 sep var tables
INSERT INTO #Table1
SELECT * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND
(
TypeId IN (2, 3, 4)
)
INSERT INTO #Table2
SELECT * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND
(
(TypeId = 60 AND SrcMemberId != DstMemberId)
)
then apply a unoin from the selects, ordered and top. Limit the data from the get go.

I suggest using a UNION:
SELECT TOP 100 x.*
FROM (SELECT a.*
FROM EVENTS a
WHERE a.typeid IN (2, 3, 4)
UNION ALL
SELECT b.*
FROM EVENTS b
WHERE b.typeid = 60
AND b.srcmemberid != b.dstmemberid) x
WHERE x.sourceuserid = '15b534b17-5a5a-415a-9fc0-7565199c3461'

We've realised a minor gain by moving to a BIGINT IDENTITY key for our event table; by using that as a clustered primary key, we can cheat and use that for date ordering.

I would make sure CreatedAt is indexed properly

you could split the query in two with an UNION to avoid the OR (which can cause your index not to be used), something like
SElect * FROM(
SELECT TOP 100 * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND TypeId IN (2, 3, 4)
UNION SELECT TOP 100 * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND TypeId = 60 AND SrcMemberId != DstMemberId
)
ORDER BY CreatedAt DESC
Also, check that the uniqueidentifier indexes are not CLUSTERED.

If there are 100K records added each day, you should check your index fragmentation.
And rebuild or reorganize it accordingly.
More info :
SQLauthority

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas