Adding WHERE adds 20+ seconds to SQL Azure query

I'm looking for some advice on speeding up queries in SQL Azure. Here is an example of the two queries we're running; when we add a WHERE clause, the query grinds to a halt.
Both columns, theTime and orderType, are indexed. Can anyone suggest how to make these run faster, or things to do to the query to make it more efficient?
5.2 seconds:
SELECT sum(cast(theTime AS INT)) as totalTime from Orders
20.2 seconds:
SELECT sum(cast(theTime AS INT)) as totalTime from Orders WHERE orderType='something_in_here'
Here's the relevant information:
CREATE TABLE [dbo].[Orders] (
[ID] int IDENTITY(1,1) NOT NULL,
[orderType] nvarchar(90) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[orderTime] nvarchar(90) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PrimaryKey_fe2bdbea-c65a-0b85-1de9-87324cc29bff] PRIMARY KEY CLUSTERED ([ID])
WITH (IGNORE_DUP_KEY = OFF)
)
GO
CREATE NONCLUSTERED INDEX [orderTime]
ON [dbo].[Orders] ([orderTime] ASC)
WITH (IGNORE_DUP_KEY = OFF,
STATISTICS_NORECOMPUTE = OFF,
ONLINE = OFF)
GO
CREATE NONCLUSTERED INDEX [orderType]
ON [dbo].[Orders] ([orderType] ASC)
WITH (IGNORE_DUP_KEY = OFF,
STATISTICS_NORECOMPUTE = OFF,
ONLINE = OFF)
GO

I suspect your query is not doing what you think. It is taking the first million counts, rather than the count of the first million rows. I think you want:
select sum(cast(theTime AS INT))
from (select top (1000000) theTime
from Orders
) t
versus:
select sum(cast(theTime AS INT))
from (select top (1000000) theTime
from Orders
WHERE orderType='something_in_here'
) t
My suspicion is that using the index actually slows things down, depending on the selectivity of the where clause.
In the original query, you are reading all the data, sequentially. This is fast, because the pages just cycle through the processor.
Going through the index slows things down, because the pages are not read in order. You may still be reading all the pages (if every page has a matching row), but they are no longer being read in "physical" or "logical" order. They are being read in the order of the index -- which is likely to be random.
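A covering index is one way to get the filtered query back toward scan speed: make orderType the index key and carry the summed column in the leaf pages so the query never touches the base table. A minimal sketch, assuming theTime in the queries is the orderTime column shown in the DDL (the index name is illustrative):
-- Filter column as the key; the summed column rides along in the leaf pages
CREATE NONCLUSTERED INDEX [IX_Orders_orderType_covering]
ON [dbo].[Orders] ([orderType] ASC)
INCLUDE ([orderTime]);
Scanning this narrower index is sequential again, so the WHERE query no longer pays for random lookups into the clustered index.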

Related

Query with index incredibly slow on Sort

I have a database table with about 3.5 million rows. The table holds contract data records, with an amount, a date, and some IDs related to other tables (VendorId, AgencyId, StateId), this is the database table:
CREATE TABLE [dbo].[VendorContracts]
(
[Id] [uniqueidentifier] NOT NULL,
[ContractDate] [datetime2](7) NOT NULL,
[ContractAmount] [decimal](19, 4) NULL,
[VendorId] [uniqueidentifier] NOT NULL,
[AgencyId] [uniqueidentifier] NOT NULL,
[StateId] [uniqueidentifier] NOT NULL,
[CreatedBy] [nvarchar](max) NULL,
[CreatedDate] [datetime2](7) NOT NULL,
[LastModifiedBy] [nvarchar](max) NULL,
[LastModifiedDate] [datetime2](7) NULL,
[IsActive] [bit] NOT NULL,
CONSTRAINT [PK_VendorContracts]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I have a page on my site where the user can filter a paged grid by VendorId and ContractDate, and sort by the ContractAmount or ContractDate. This is the query that EF Core produces when sorting by ContractAmount for this particular vendor that has over a million records:
DECLARE @__vendorId_0 uniqueidentifier = 'f39c7198-b05a-477e-b7bc-cb189c5944c0';
DECLARE @__startDate_1 datetime2 = '2017-01-01T07:00:00.0000000';
DECLARE @__endDate_2 datetime2 = '2018-01-02T06:59:59.0000000';
DECLARE @__p_3 int = 0;
DECLARE @__p_4 int = 50;
SELECT [v].[Id], [v].[AdminFee], [v].[ContractAmount], [v].[ContractDate], [v].[PONumber], [v].[PostalCode], [v].[AgencyId], [v].[StateId], [v].[VendorId]
FROM [VendorContracts] AS [v]
WHERE (([v].[VendorId] = @__vendorId_0) AND ([v].[ContractDate] >= @__startDate_1)) AND ([v].[ContractDate] <= @__endDate_2)
ORDER BY [v].[ContractAmount] ASC
OFFSET @__p_3 ROWS FETCH NEXT @__p_4 ROWS ONLY
When I run this, it takes 50s; whether sorting ASC or DESC or offsetting by thousands, it's always 50s.
If I look at my execution plan, I see that it does use my index, but the Sort cost is what's making the query take so long.
This is my index:
CREATE NONCLUSTERED INDEX [IX_VendorContracts_VendorIdAndContractDate] ON [dbo].[VendorContracts]
(
[VendorId] ASC,
[ContractDate] DESC
)
INCLUDE([ContractAmount],[AdminFee],[PONumber],[PostalCode],[AgencyId],[StateId])
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)
The strange thing is that I have a similar index for sorting by ContractDate, and that one returns results in less than a second, even on the vendor that has millions of records.
Is there something wrong with my index? Or is sorting by a decimal data type just incredibly intensive?
You have an index that allows the
VendorId = @__vendorId_0 and ContractDate BETWEEN @__startDate_1 AND @__endDate_2
predicate to be seeked exactly.
SQL Server estimates that 6,657 rows will match this predicate and need to be sorted, so it requests a memory grant suitable for that number of rows.
In reality, for the parameter values where you see the problem, nearly half a million rows are sorted; the memory grant is insufficient and the sort spills to disk.
50 seconds for 10,299 spilled pages still sounds unexpectedly slow, but I assume you may well be on a very low SKU in Azure SQL Database?
Some possible solutions to the issue:
Force it to use an execution plan that is compiled for parameter values with your largest vendor and widest date range (e.g. with an OPTIMIZE FOR hint). This will mean an excessive memory grant for smaller vendors, though, which may mean other queries have to incur memory grant waits.
Use OPTION (RECOMPILE) so every invocation is recompiled for the specific parameter values passed. In theory this means every execution gets an appropriate memory grant, at the cost of more time spent in compilation.
Remove the need for a sort at all. If you have an index on VendorId, ContractAmount INCLUDE (ContractDate), then the VendorId = @__vendorId_0 part can be seeked exactly and the index read in ContractAmount order. Once 50 rows have been found that match the ContractDate BETWEEN @__startDate_1 AND @__endDate_2 predicate, query execution can stop. SQL Server might not choose this execution plan without hints, though (a sketch follows below).
I'm not sure how easy or otherwise it is to apply query hints through EF, but you could look at forcing a plan via Query Store if you manage to get the desired plan to appear there.
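A minimal sketch of the index from the last option, using the column names from the question (the index name is illustrative):
-- Seek on VendorId, read rows already ordered by ContractAmount, no sort needed
CREATE NONCLUSTERED INDEX [IX_VendorContracts_VendorId_ContractAmount]
ON [dbo].[VendorContracts] ([VendorId] ASC, [ContractAmount] ASC)
INCLUDE ([ContractDate]);
With this index the ORDER BY comes for free within a vendor, the date range is applied as a residual predicate as rows stream out, and the FETCH NEXT 50 ROWS can stop the scan early.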

How to best create an index for a large table with no primary key?

First off, I am not a database programmer.
I have built the following table for stock market tick data:
CREATE TABLE [dbo].[Tick]
(
[trade_date] [int] NOT NULL,
[delimiter] [tinyint] NOT NULL,
[time_stamp] [int] NOT NULL,
[exchange] [tinyint] NOT NULL,
[symbol] [varchar](10) NOT NULL,
[price_field] [tinyint] NOT NULL,
[price] [int] NOT NULL,
[size_field] [tinyint] NOT NULL,
[size] [int] NOT NULL,
[exchange2] [tinyint] NOT NULL,
[trade_condition] [tinyint] NOT NULL
) ON [PRIMARY]
GO
The table will store 6 years of data to begin with. At an average of 300 million ticks per day that would be about 450 billion rows.
Common query on this table is to get all the ticks for some symbol(s) over a date range:
SELECT
trade_date, time_stamp, symbol, price, size
FROM
[dbo].[Tick]
WHERE
trade_date > 20160101 and trade_date < 20170101
AND symbol = 'AAPL'
AND price_field = 0
ORDER BY
trade_date, time_stamp
This is my first attempt at an index:
CREATE UNIQUE CLUSTERED INDEX [ClusteredIndex-20180324-183113]
ON [dbo].[Tick]
(
[trade_date] ASC,
[symbol] ASC,
[time_stamp] ASC,
[price_field] ASC,
[delimiter] ASC,
[exchange] ASC,
[price] ASC,
[size_field] ASC,
[size] ASC,
[exchange2] ASC,
[trade_condition] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
First, I put date before symbol because there are fewer days than symbols, so the shorter path is to get to the date first.
I have included all the columns I would potentially need to retrieve. When I tested building it for one day's worth of data, the size of the index was quite large: about 4 GB for a 20 GB table.
Two questions:
Is my not including a primary key to save space a wise choice assuming my query requirements don't change?
Would I save space if I only included trade_date and symbol in the index? How would that affect performance? I've been told I need to include all the columns I need in the index, otherwise retrieval would be very slow because it would have to go back to the primary key to find the values of columns not included in the index. If this is true, how would that even work when my table doesn't have a primary key?
Your unique clustered index should contain the minimum number of columns necessary to uniquely identify a row in your table. If that means almost every column in your table, I would think you should add an artificial primary key. Cutting an artificial primary key to save space is a poor decision IMO; only cut it if you can create a natural primary key out of your data.
The clustered index is essentially where all your data is stored. The leaf nodes of the index contain all the data for that row, the columns that make up the index determine how to reach those leaf nodes.
Including extra columns in your index to speed up queries only applies to NONCLUSTERED indexes, as there the leaf node generally only contains a lookup value. For these indexes, the way to include extra columns is to use the INCLUDE clause, not just list them all as part of the index. For example:
CREATE NONCLUSTERED INDEX [IX_TickSummary] ON [dbo].[Tick]
(
[trade_date] ASC,
[symbol] ASC
)
INCLUDE (
[time_stamp],
[price],
[size],
[price_field]
)
This is a concept known as a covering index, where the index itself contains all the columns needed to process your query, so no additional lookup into the data table is needed. The upside is increased speed. The downside is that the INCLUDEd columns are essentially duplicated, resulting in a larger index and eating more space.
Include columns that are used very frequently, such as those used to generate summary listings. Columns that are queried infrequently, such as those only needed in detailed views, should be left out of the index to save space.
Potentially helpful reading: Using Covering Indexes to Improve Query Performance
Looking at your most common query, you should create a composite index based first on the columns involved in the WHERE clause:
trade_date, symbol, price_field
then on the columns in the SELECT:
time_stamp, symbol, price, size
This way, the index can be used both for the WHERE filter and for retrieving the SELECT columns, avoiding access to the data table:
trade_date, symbol, price_field, time_stamp, price, size
In your sequence you have time_stamp before price_field, i.e. a SELECT column before a WHERE column; this doesn't let the database engine use the full power of the index. A sketch of such an index follows.
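A sketch of the index this answer describes, with the WHERE columns as keys and the remaining SELECT columns carried as included columns (the index name is illustrative):
CREATE NONCLUSTERED INDEX [IX_Tick_trade_date_symbol_price_field]
ON [dbo].[Tick]
(
[trade_date] ASC,
[symbol] ASC,
[price_field] ASC
)
INCLUDE ([time_stamp], [price], [size]);
Note that rows come back in index key order, so whether the ORDER BY trade_date, time_stamp is satisfied without an extra sort depends on where time_stamp sits; making it a fourth key column rather than an included one is worth testing.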

Convincing SQL server to search backwards on clustered index for an insert only schema

Our database has many tables which follow an "insert only" schema. Rows are added to the end, the "current" value can then be found by finding the most recently inserted row for each logical key.
Here's an example:
CREATE TABLE [dbo].[SPOTQUOTE](
[ID] [numeric](19, 0) NOT NULL,
[QUOTETYPE] [varchar](255) NOT NULL,
[QUOTED_UTC_TS] [datetime] NOT NULL,
[QUOTED_UTC_MILLIS] [smallint] NOT NULL,
[RECEIVED_UTC_TS] [datetime] NOT NULL,
[RECEIVED_UTC_MILLIS] [smallint] NOT NULL,
[VALUE_ASK] [float] NULL,
[VALUE_BID] [float] NULL,
[FEEDITEM_ID] [numeric](19, 0) NOT NULL,
[SAMPLING] [int] NOT NULL,
CONSTRAINT [SPOTQUOTE_pk1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
The logical key of this table is "feeditem_id". However, so that we can perform historical queries, instead, rows are only ever inserted into this table at the end, using "ID" as the actual physical key.
Therefore, we know that the max(id) for each distinct feeditem_id is going to be found towards the end of the table, not the beginning.
When querying the table, we want to find the "latest" update for each "feeditem_id", which is the "logical key" for this table.
The following is the query we want:
select feeditem_id, max(id)
from spotquote
group by feeditem_id
having feeditem_id in (827, 815, 806)
so that we have the latest id for each feeditem_id.
Unfortunately, SQL Server 2008 generates a suboptimal query plan for this query.
From my understanding of SQL, the fact that this selects the max id, which is the clustered primary key, implies that the optimal query plan is to:
Start at the end of the table
Walk backwards, keeping track of the max id encountered so far for each feeditem_id
Stop once an Id for each feeditem_id has been found
I would expect this to be extremely fast.
First question: is there some way I can explicitly tell SQL Server to execute the above query plan?
I have tried:
SELECT feeditem_id, max(ID) as latest from SPOTQUOTE with (index(SPOTQUOTE_pk1)) group by feeditem_id
having FEEDITEM_ID in (827, 815, 806)
But in practice, it seems to execute even more slowly.
I am wondering if the "clustered index scan" is walking the table forwards instead of backwards. How can I confirm whether that is what is happening, and how can I convince SQL Server to search the clustered index backwards instead?
Update
The issue is indeed that the clustered index scan is not searching backwards when I perform a group by.
In contrast, the following SQL query produces essentially the correct query plan:
select
FEEDITEM_ID, MAX(id) from
(select top 100 * from
SPOTQUOTE
where FEEDITEM_ID in (827,815,806)
order by ID desc) s
group by feeditem_id
I can see in Management Studio that "Ordered = True" and "Scan Direction = BACKWARD":
and it executes blindingly fast - 2 milliseconds - and it "almost certainly" works.
I just want it to "stop" once it has found an entry for each feed id rather than the first 100 entries.
It's frustrating that there seems to be no way to tell SQL server to execute this obviously more efficient query.
If I do a normal "group by" with appropriate indexes on feeditem_id and id, it's faster - about 300 ms - but that's still 100 times slower than the backwards clustered index scan.
SQL Server is unable to produce such a query plan as of SQL Server 2012. Rewrite the query:
SELECT ids.feeditem_id, MaxID
FROM (VALUES (827), (815), (806)) ids(feeditem_id)
CROSS APPLY (
select TOP 1 ID AS MaxID
from spotquote sq
where sq.feeditem_id = ids.feeditem_id
ORDER BY ID DESC
) x
This results in a plan that does a seek into the spotquote table per ID that you specify. This is the best we can do. SQL Server is unable to abort an aggregation as soon as all groups you are interested in have at least one value.
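For the per-ID seek in this plan to be possible, an index leading on feeditem_id is assumed, something like the following (the name is illustrative):
-- ID is the clustering key, so it travels with every nonclustered index anyway;
-- naming it as the second key column guarantees ORDER BY ID DESC is a backward range scan
CREATE NONCLUSTERED INDEX [IX_SPOTQUOTE_FEEDITEM_ID]
ON [dbo].[SPOTQUOTE] ([FEEDITEM_ID] ASC, [ID] ASC);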

SQL Server 2008 included columns

I recently discovered included columns in SQL Server indexes. Do included columns in an index take up extra memory or are they stored on disk?
Also, can someone point me to the performance implications of including columns of differing data types as included columns in a primary key, which in my case is typically an int?
Thanks.
I don't fully understand the question: "Do included columns in an index take up extra memory or are they stored on disk?" Indexes are both stored on disk (for persistence) and in memory (for performance when being used).
The answer to your question is that the non-key columns are stored in the index and hence are stored both on disk and in memory, along with the rest of the index. Included columns do have a significant performance advantage over key columns in the index. To understand this advantage, you have to understand that key values may be stored more than once in a b-tree index structure. They are used both as "nodes" in the tree and as "leaves" (the latter point to the actual records in the table). Non-key values are stored only in leaves, potentially providing a big savings in storage.
Such a savings means that more of the index can be stored in memory in a memory-limited environment. And that an index takes up less memory, allowing memory to be used for other things.
The use of included columns is to allow the index to be a "covering" index for queries, with a minimum of additional overhead. An index "covers" a query when all the columns needed for the query are in the index, so the index can be used instead of the original data pages. This can be a significant performance savings.
The place to go to learn more about them is the Microsoft documentation.
In SQL Server 2005 and later versions, you can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index.
By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
• They can be data types not allowed as index key columns. (All data types are allowed except text, ntext, and image.)
• They are not considered by the Database Engine when calculating the number of index key columns or index key size. You can include nonkey columns in a nonclustered index to avoid exceeding the current index size limitations of a maximum of 16 key columns and a maximum index key size of 900 bytes.
An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
Example:
Create Table Script
CREATE TABLE [dbo].[Profile](
[EnrollMentId] [int] IDENTITY(1,1) NOT NULL,
[FName] [varchar](50) NULL,
[MName] [varchar](50) NULL,
[LName] [varchar](50) NULL,
[NickName] [varchar](50) NULL,
[DOB] [date] NULL,
[Qualification] [varchar](50) NULL,
[Profession] [varchar](50) NULL,
[MaritalStatus] [int] NULL,
[CurrentCity] [varchar](50) NULL,
[NativePlace] [varchar](50) NULL,
[District] [varchar](50) NULL,
[State] [varchar](50) NULL,
[Country] [varchar](50) NULL,
[UIDNO] [int] NOT NULL,
[Detail1] [varchar](max) NULL,
[Detail2] [varchar](max) NULL,
[Detail3] [varchar](max) NULL,
[Detail4] [varchar](max) NULL,
PRIMARY KEY CLUSTERED
(
[EnrollMentId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Stored procedure script
CREATE Proc [dbo].[InsertIntoProfileTable]
As
BEGIN
SET NOCOUNT ON
Declare @currentRow int
Declare @Details varchar(Max)
Declare @dob Date
set @currentRow = 1;
set @Details = 'Let''s think about the book. Every page in the book has the page number. All information in this book is presented sequentially based on this page number. Speaking in the database terms, page number is the clustered index. Now think about the glossary at the end of the book. This is in alphabetical order and allow you to quickly find the page number specific glossary term belongs to. This represents non-clustered index with glossary term as the key column. Now assuming that every page also shows "chapter" title at the top. If you want to find in what chapter is the glossary term, you have to lookup what page # describes glossary term, next - open corresponding page and see the chapter title on the page. This clearly represents key lookup - when you need to find the data from non-indexed column, you have to find actual data record (clustered index) and look at this column value. Included column helps in terms of performance - think about glossary where each chapter title includes in addition to glossary term. If you need to find out what chapter the glossary term belongs - you don''t need to open actual page - you can get it when you lookup the glossary term. So included column are like those chapter titles. Non clustered Index (glossary) has addition attribute as part of the non-clustered index. Index is not sorted by included columns - it just additional attributes that helps to speed up the lookup (e.g. you don''t need to open actual page because information is already in the glossary index).'
while(@currentRow <= 200000)
BEGIN
insert into dbo.Profile values('FName' + Cast(@currentRow as varchar), 'MName' + Cast(@currentRow as varchar), 'LName' + Cast(@currentRow as varchar), 'NickName' + Cast(@currentRow as varchar), DATEADD(DAY, ROUND(10000*RAND(),0),'01-01-1980'), NULL, NULL, @currentRow % 3, NULL, NULL, NULL, NULL, NULL, 1000 + @currentRow, @Details, @Details, @Details, @Details)
set @currentRow += 1;
END
SET NOCOUNT OFF
END
GO
Using the above SP you can insert 200,000 records in one go.
You can see that there is a clustered index on the column "EnrollMentId".
Now create a nonclustered index on the "UIDNO" column.
Script
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20140216-223309] ON [dbo].[Profile]
(
[UIDNO] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now run the following query:
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile --Takes about 30-50 seconds and returns 200,000 results.
Query 2
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile
where DOB between '01-01-1980' and '01-01-1985'
--Takes about 10-15 seconds and returns 36,479 records.
Now drop the above nonclustered index and re-create it with the following script:
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20140216-231011] ON [dbo].[Profile]
(
[UIDNO] ASC,
[FName] ASC,
[DOB] ASC,
[MaritalStatus] ASC,
[Detail1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
It will throw the following error:
Msg 1919, Level 16, State 1, Line 1
Column 'Detail1' in table 'dbo.Profile' is of a type that is invalid for use as a key column in an index.
This is because we cannot use the varchar(max) data type as a key column.
Now create a nonclustered index with included columns using the following script:
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20140216-231811] ON [dbo].[Profile]
(
[UIDNO] ASC
)
INCLUDE ( [FName],
[DOB],
[MaritalStatus],
[Detail1]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now run the following query:
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile --Takes about 20-30 seconds and returns 200,000 results.
Query 2
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile
where DOB between '01-01-1980' and '01-01-1985'
--Takes about 3-5 seconds and returns 36,479 records.
Included columns provide functionality similar to a clustered index, where the row contents are kept in the leaf node of the primary index. In addition to the key columns in the index, additional attributes are kept in the index leaf nodes.
This permits immediate access to the column values without having to access another page in the database. The trade-off is increased index size and general storage against the improved response from not having to indirect through a page reference in the index. The impact is similar to that of adding multiple indexes to a table.
From the documentation:
An index with nonkey columns can significantly improve query
performance when all columns in the query are included in the index
either as key or nonkey columns. Performance gains are achieved
because the query optimizer can locate all the column values within
the index; table or clustered index data is not accessed resulting in
fewer disk I/O operations.

Slow SQL performance

I have a query as follows;
SELECT COUNT(Id) FROM Table
The table contains 33 million records - it contains a primary key on Id and no other indices.
The query takes 30 seconds.
The actual execution plan shows it uses a clustered index scan.
We have analysed the table and found it isn't fragmented using the first query shown in this link: http://sqlserverpedia.com/wiki/Index_Maintenance.
Any ideas as to why this query is so slow and how to fix it?
The Table Definition:
CREATE TABLE [dbo].[DbConversation](
[ConversationID] [int] IDENTITY(1,1) NOT NULL,
[ConversationGroupID] [int] NOT NULL,
[InsideIP] [uniqueidentifier] NOT NULL,
[OutsideIP] [uniqueidentifier] NOT NULL,
[ServerPort] [int] NOT NULL,
[BytesOutbound] [bigint] NOT NULL,
[BytesInbound] [bigint] NOT NULL,
[ServerOutside] [bit] NOT NULL,
[LastFlowTime] [datetime] NOT NULL,
[LastClientPort] [int] NOT NULL,
[Protocol] [tinyint] NOT NULL,
[TypeOfService] [tinyint] NOT NULL,
CONSTRAINT [PK_Conversation_1] PRIMARY KEY CLUSTERED
(
[ConversationID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
One thing I have noticed is that the database is set to grow in 1MB chunks.
It's a live system, so we're restricted in what we can play with - any ideas?
UPDATE:
OK - we've improved performance in the actual query of interest by adding new non-clustered indices on appropriate columns so it's not a critical issue anymore.
SELECT COUNT is still slow though - tried it with NOLOCK hints - no difference.
We're all thinking it's something to do with the autogrowth being set to 1MB rather than a larger number, but we're surprised it has this effect. Can MDF fragmentation on the disk be a possible cause?
Is this a frequently read/inserted/updated table? Is there update/insert activity concurrent with your select?
My guess is the delay is due to contention.
I'm able to run a count on 189m rows in 17 seconds on my dev server, but there's nothing else hitting that table.
If you aren't too worried about contention or absolute accuracy you can do:
exec sp_spaceused 'MyTableName'
which will give a count based on metadata.
If you want a more exact count but don't necessarily care whether it reflects concurrent DELETE or INSERT activity, you can run your current query with a NOLOCK hint:
SELECT COUNT(id) FROM MyTable WITH (NOLOCK)
which will not take row-level locks for your query and should run faster.
Thoughts:
Use SELECT COUNT(*), which is correct for "how many rows" (as per ANSI SQL). Even if ID is the PK and thus not nullable, SQL Server will count ID, not rows.
If you can live with approximate counts, then use sys.dm_db_partition_stats (a minimal sketch follows this list). See my answer here: Fastest way to count exact number of rows in a very large table?
If you can live with dirty reads use WITH (NOLOCK)
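A minimal sketch of the dm_db_partition_stats approach mentioned above (the table name is taken from the question; the count is approximate but usually very close):
SELECT SUM(ps.row_count) AS approx_rows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.DbConversation')
AND ps.index_id IN (0, 1); -- 0 = heap, 1 = clustered index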
use [DatabaseName]
select tbl.name, dd.rows
from sysindexes dd
inner join sysobjects tbl on dd.id = tbl.id
where dd.indid < 2 and tbl.xtype = 'U'
select sum(dd.rows)
from sysindexes dd
inner join sysobjects tbl on dd.id = tbl.id
where dd.indid < 2 and tbl.xtype = 'U'
Using these queries you can fetch the row counts for all tables within 0-5 seconds.
Add a WHERE clause according to your requirements.
Another idea: when the file grows in 1MB parts, it may become fragmented on the file system. You can't see this from SQL; you can see it using a disk defragmentation tool.
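If the 1MB autogrowth is the culprit, the increment can be raised so the file grows in fewer, larger steps. A hedged sketch; the database and logical file names are placeholders (check sys.database_files for your actual logical name):
-- Grow the data file in 256MB increments instead of 1MB
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_Data', FILEGROWTH = 256MB);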