I have a SQL Server table containing around 50,000,000 rows and I need to run the following two queries on it:
SELECT Count(*) AS Total
FROM TableName WITH(NOLOCK)
WHERE Col1 = 'xxx' AND Col2 = 'yyy'
then
SELECT TOP 1 Col3
FROM TableName WITH(NOLOCK)
WHERE Col1 = 'xxx' AND Col2 = 'yyy'
ORDER BY TableNameId DESC
The table has the following structure:
dbo.TableName
TableNameId int PK
Col1 varchar(12)
Col2 varchar(256)
Col3 int
Timestamp datetime
As well as running queries on it, there are loads of inserts every second going into the table, hence the NOLOCK. I've tried creating the following index:
NONCLUSTERED INDEX (Col1, Col2) INCLUDE (TableNameId, Col3)
I need these queries to return results as quickly as possible (1 second max). At this stage I have the ability to restructure the table, as the data isn't live yet, and I can also get rid of the Timestamp field if I need to.
Firstly - make TableNameId a key column of your index rather than an INCLUDE column; then you can specify descending order for it in the index definition.
That should speed things up for your TOP 1 ... ORDER BY TableNameId DESC.
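A sketch of that index, assuming the schema exactly as posted (the index name is made up):
CREATE NONCLUSTERED INDEX IX_TableName_Col1_Col2_Id
    ON dbo.TableName (Col1, Col2, TableNameId DESC)
    INCLUDE (Col3);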
Secondly - check how much of the time is I/O (SET STATISTICS IO ON) and how much is CPU (SET STATISTICS TIME ON).
If it is mostly I/O, there's not much you can do, because you do have to move through a lot of data.
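For example, a minimal way to capture both measurements for the first query (the figures appear on the Messages tab):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) AS Total
FROM TableName WITH (NOLOCK)
WHERE Col1 = 'xxx' AND Col2 = 'yyy';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;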
If you'd like a very fast estimate of how many rows are in the table, try querying sys.dm_db_partition_stats:
-- Report how many rows are in each table
SELECT o.name as [Table Name], ddps.row_count
FROM sys.indexes AS i WITH (NOLOCK)
INNER JOIN sys.objects AS o WITH (NOLOCK) ON i.object_id = o.object_id
INNER JOIN sys.dm_db_partition_stats AS ddps WITH (NOLOCK) ON i.object_id = ddps.object_id
AND i.index_id = ddps.index_id
WHERE i.index_id < 2
AND o.is_ms_shipped = 0 -- Remove to include system objects
AND ddps.row_count > 0
ORDER BY ddps.row_count DESC
If the table has multiple partitions, this returns one row per partition; you may need to SUM the row_count values per table.
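A sketch of the partition-aware variant, summing row_count per table:
SELECT o.name AS [Table Name], SUM(ddps.row_count) AS [Row Count]
FROM sys.indexes AS i WITH (NOLOCK)
INNER JOIN sys.objects AS o WITH (NOLOCK) ON i.object_id = o.object_id
INNER JOIN sys.dm_db_partition_stats AS ddps WITH (NOLOCK) ON i.object_id = ddps.object_id
AND i.index_id = ddps.index_id
WHERE i.index_id < 2
AND o.is_ms_shipped = 0
GROUP BY o.name
ORDER BY [Row Count] DESC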
However, if you need an accurate count, you will need to count the rows, and this will take a while. Also, you may get the error "Could not continue scan with NOLOCK due to data movement."
You didn't mention how long your indexed query is running. The index and your query look fine to me.
Related
I have a query in SQL Server 2014 that takes a long time to get the results when I execute it.
When I remove the TOP or the ORDER BY instructions, it executes faster, but if I write both of them, it takes a lot of time.
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
How could I make it faster?
You say
When I remove the TOP or the ORDER BY ... it executes faster
Which would indicate that SQL Server has no problem generating the entire result set in the desired order. It just goes pear-shaped when the result is limited to TOP 10. This is a common issue with row goals: when SQL Server knows you need just the first few results, it can choose a different plan that attempts to optimise for this case, and that can backfire.
More recent versions (SQL Server 2016 SP1 and later) include the query hint USE HINT ('DISABLE_OPTIMIZER_ROWGOAL') to disable this behaviour on a per-query basis. On older versions you can use QUERYTRACEON 4138, as below.
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
OPTION (QUERYTRACEON 4138)
You can use this to verify the cause but may find permissions to run QUERYTRACEON are a problem.
In that eventuality you can hide the TOP value in a variable as below
DECLARE @Top INT = 10
SELECT TOP (@Top) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
OPTION (OPTIMIZE FOR (@Top = 1000000))
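For completeness: on SQL Server 2016 SP1 and later, the DISABLE_OPTIMIZER_ROWGOAL hint mentioned above can be applied with USE HINT, which does not require the elevated permissions that QUERYTRACEON does:
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'))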
Create indexes based on the ID columns of both tables:
CREATE INDEX index_nameA
ON TableA (ID, DateValue)
;
CREATE INDEX index_nameB
ON TableB (ID)
This should give the optimizer a better plan at query execution time.
The best way would be to use indexes to improve performance.
Here, in this case, an index can be put on (DateValue).
For more on using indexes, refer to the SQL Server documentation on indexes.
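A sketch, using the table from the question (the INCLUDE list is an assumption, added so the index covers the whole query):
CREATE NONCLUSTERED INDEX IX_TableA_DateValue
    ON TableA (DateValue)
    INCLUDE (ColumnValue, ID);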
This is pretty hopeless, unless most of your data has an earlier date. If the date is special, you could create a persisted computed column to speed up the query in general. However, I doubt that is the case.
I can envision a better execution plan for the query phrased this way:
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA A
WHERE EXISTS (SELECT 1 FROM TableB b WHERE A.ID = B.ID) AND
A.DateValue > '1982-05-02'
ORDER BY ValueA;
with indexes on TableA(ValueA, DateValue, Id, ColumnValue) and TableB(Id). That execution plan would scan the index from the beginning, apply the tests on DateValue and Id, and return ColumnValue for the corresponding matching rows.
However, I don't think SQL Server would generate this plan (although it is worth a try), and I don't know how to force it if it doesn't.
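Spelled out as DDL, the suggested indexes would look roughly like this (a sketch; since ValueA is just an alias for A.ColumnValue, the leading key is ColumnValue):
CREATE INDEX IX_TableA_Covering
    ON TableA (ColumnValue, DateValue, ID);

CREATE INDEX IX_TableB_ID
    ON TableB (ID);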
I have a table that is already ordered by a datetime column, because I store the UTC date when each row is inserted. It's a very populated table, so I am trying to improve the query performance, if possible.
When I use something like WHERE columnDateTime > dateToSearch, it takes too long to return the rows. As my table is already ordered by columnDateTime, what could I do to improve this query's performance? For example, when a table is ordered by a column cod and you search for cod > 40, the optimizer can stop the search when it finds cod = 41 and return the rest of the table, because it knows the table is ordered by that index. Is there a way to tell T-SQL that my table is already ordered by columnDateTime too?
Inserting the data in order doesn't mean it is saved in order. Without getting too technical, for faster performance you can:
Create a CLUSTERED INDEX on that column. This requires that there is no other clustered index on your table and that it doesn't have a PRIMARY KEY (or that the key is NONCLUSTERED, which is not the default). With a clustered index, the engine will do an index scan (not a full table scan) when filtering with > datetimeValue, and it doesn't need to access additional pages for the data, since the leaf level of a clustered index is the data itself.
Create a NONCLUSTERED INDEX on that column. There are no restrictions here (at least for this case), but for each match with your filtered date the engine will need to access another page for the requested columns, unless you INCLUDE them when creating the index. Keep in mind that included columns increase the size of the index and add maintenance overhead, for example whenever an included column is modified.
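A minimal sketch of both options, assuming a hypothetical table dbo.YourTable with the columnDateTime column from the question (the INCLUDE list is a placeholder for whatever columns your query selects):
-- Option 1: clustered index (only possible if the table has no other clustered index)
CREATE CLUSTERED INDEX CIX_YourTable_DateTime
    ON dbo.YourTable (columnDateTime);

-- Option 2: nonclustered index, covering the query via included columns
CREATE NONCLUSTERED INDEX IX_YourTable_DateTime
    ON dbo.YourTable (columnDateTime)
    INCLUDE (SomeColumn1, SomeColumn2);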
That aside, you should check your query plan; if you have joins, function calls, or additional conditions, the SQL engine might not use the indexes even if they exist. There are many things that could make a query run slow; you will have to post the full query execution plan (for a start) to check the details.
You can use this query to check if your table already has indexes:
DECLARE @table_name VARCHAR(200) = 'YourTableName'
SELECT
SchemaName = SCHEMA_NAME(t.schema_id),
TableName = t.name,
IndexName = ind.name,
IndexType = CASE ind.index_id WHEN 0 THEN 'Heap' WHEN 1 THEN 'Clustered' ELSE 'Nonclustered' END,
Disabled = ind.is_disabled,
ColumnOrder = ic.index_column_id,
ColumnName = col.name,
ColumnType = y.name,
ColumnLength = y.max_length,
ColumnIncluded = ic.is_included_column
FROM
sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id and ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id and ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
INNER JOIN sys.types y ON y.user_type_id = col.user_type_id
WHERE
t.is_ms_shipped = 0 AND
t.name = @table_name
ORDER BY
SchemaName,
t.name,
ind.name,
ic.index_column_id
You need to make sure that there is at least one index that has your columnDateTime with ColumnOrder = 1 and is not disabled. If such an index already exists, then your problem lies elsewhere, and it will be hard to help much more without additional detail.
This question already has an answer here:
Is COUNT(*) indexed?
I have a MS SQL table with over 250 million rows. Whenever I execute the following query
SELECT COUNT(*) FROM table_name
it takes over 30 seconds to get the output. Why is it taking so much time? Does it perform a count every time I query? Until now I have assumed that it stores this information somewhere, probably in the table metadata (I'm not sure table metadata even exists).
Also, I would like to know whether this query is I/O-, processor-, or memory-intensive.
Thanks
Every time you execute SELECT COUNT(*) FROM table, SQL Server actually goes through the table and counts all rows. To get an estimated row count for one or more tables, you can run the following query, which reads stored metadata and returns in under a second.
SELECT OBJECT_NAME(OBJECT_ID) TableName, st.row_count
FROM sys.dm_db_partition_stats st
WHERE index_id < 2
ORDER BY st.row_count DESC
Read more about it here http://technet.microsoft.com/en-us/library/ms187737.aspx
No, SQL Server doesn't store this information; it computes it on every query. It can cache the execution plan to improve performance, but the count itself is recomputed each time. So, if you want to get results quickly, you need at least a primary key.
As for what SQL Server is doing and how expensive it is, you can look this up yourself. In SSMS, enable the execution plan button for the query and run a SELECT COUNT(*). You will see that the server actually does an index scan (a full scan of some index). (I would have expected the PK to be used for that, but in my test case it used some other non-clustered index.)
To get a feel for the cost right-click your query editor window, select Query Options... -> Execution -> Advanced and activate the check boxes for SET STATISTICS TIME and SET STATISTICS IO. The messages tab will contain information regarding IO and timing after you re-executed the select statement.
Also note that a SELECT COUNT(*) is quite aggressive in terms of the shared locks it takes. To guarantee the result, the whole table will be locked with a shared lock.
A very fast, lock-free alternative is to use the meta data for the table. The count you get from the meta-data is almost always accurate, but there is no guarantee.
USE <database_name,,>
GO
SELECT ddps.row_count
FROM sys.indexes AS i
INNER JOIN sys.objects AS o
ON i.object_id = o.object_id
AND o.name = '<your_table,,>'
INNER JOIN sys.dm_db_partition_stats AS ddps
ON i.object_id = ddps.object_id
AND i.index_id = ddps.index_id
WHERE i.index_id = 1
This is a SSMS template. Copy this into a query window and hit CTRL+SHIFT+M to get a dialog that asks you for values for database_name and table_name.
If you are looking for approximate counts on tables and your version is SQL Server 2005 or later, you can simply use:
SELECT t.NAME AS 'TableName'
,s.Name AS 'TableSchema'
,p.rows AS 'RowCounts'
FROM sys.tables t
INNER JOIN sys.schemas s
ON t.schema_id = s.schema_id
INNER JOIN sys.indexes i
ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p
ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
WHERE
t.is_ms_shipped = 0
AND i.index_id <= 1 -- heap (0) or clustered index (1) only, so each table is counted once
GROUP BY
t.NAME, s.Name, p.Rows
ORDER BY
s.Name, t.Name
Doing a count(*) would only consume a small amount of memory/processor. It isn't that big of an operation in terms of database functions.
I have the following SQL query, and it is taking a long time to complete. I wanted to check whether a different query could achieve the same result while also helping with performance.
Delete from table1 WHERE fk_id = '1234' AND pk_id NOT IN ('aaaa', 'bbbb', 'cccc');
Note: pk_id is PK column and fk_id is a FK for table1.
I don't have any suggestions about the query itself; it's pretty basic.
But you might want to try adding an index on the fk_id column:
CREATE INDEX IX_table1_fk_id ON table1 (fk_id);
With the data you have provided, there seems to be no problem with your query itself, as it is a simple AND condition. So you need to analyze why the query is taking so much time. You might want to check the following things.
1) select count(1)
from table1
where fk_id = '1234' AND pk_id NOT IN ('aaaa', 'bbbb', 'cccc')
This will give you the count of rows to be deleted.
2) Try something like
SELECT
T.name AS TableName
,O.name TriggerName
FROM sysobjects O
INNER JOIN sys.tables T ON T.object_id = O.parent_obj
WHERE O.type = 'TR' AND T.name = 'table1'
This will tell you the triggers associated with your table.
3) Further investigate the table's properties: which indexes exist and which constraints are present. See the sketch below for a quick way to do this.
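For example, the standard system procedures give a quick overview (table name taken from the question):
-- List the indexes on the table
EXEC sp_helpindex 'table1';

-- Full report: columns, indexes, constraints, and more
EXEC sp_help 'table1';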
I have an SQL Query (For SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there was a better way of doing it?
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90 million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times; the total time it takes the procedure to finish is 27 minutes. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does so.
So if you know of a good way to optimize this, it would be greatly appreciated! If it cannot be optimized, then I guess I'm stuck with 27 minutes...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt the GROUP BY approach suggested below, but that will require a complete rewrite of the stored procedure and might take some time... (as I said before, I'm not the best at SQL, but it is starting to grow on me. ^^)
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default to 0. Then you could create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT @Count = COUNT(Name)
FROM dbo.Table1 WHERE IsExcluded = 0;
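As a sketch of one of those maintenance triggers (hypothetical name; the Table1 side and the delete path would need similar handling):
CREATE TRIGGER trg_ExcludedCodes_Insert
ON dbo.ExcludedCodes
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Newly excluded codes: flag every matching row in Table1
    UPDATE t
    SET t.IsExcluded = 1
    FROM dbo.Table1 AS t
    INNER JOIN inserted AS i
        ON t.Code = i.Code;
END;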
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure or whether you violate any of the restrictions, but it is worth investigating IMHO.
You say this gets called around 40K times. Why? Is it in a cursor? If so, do you really need a cursor? Couldn't you put the values you want for @name in a temp table, index it, and then join to it?
select t.name, count(t.name)
from table t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course, if you need the count of all the names, it's even simpler:
select t.name, count(t.name)
from table t
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query it's tough to supply concrete optimization suggestions (i.e. code suitable for copy/paste). Does it really need to run 40,000 times? Sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc and insert the results in a temp table, which can keep the indexes from Table1, and then join on that instead of running this query.
This particular bit might not even be the bottleneck that makes your query run 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar valued UDFs in your WHERE clauses?
Have you thought about doing the query once and populating the data in a table variable or temp table? Something like
insert into #temp (name, NameCount)
select name, count(name)
from table1
where name not in (select code from excludedcodes)
group by name
And don't forget that you could possibly use a filtered index as long as the excluded codes table is somewhat static.
Start by evaluating the execution plan: which part is the heaviest to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns; the indexes will let the optimizer execute the query efficiently.
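For example, one way to inspect the estimated plan from a query window (a sketch using the count query from the question; SSMS's "Include Actual Execution Plan" button achieves the same interactively):
-- Return the estimated plan as XML instead of executing the statement
SET SHOWPLAN_XML ON;
GO
SELECT COUNT(Name)
FROM Table1 t
WHERE t.Name = 'some name' -- placeholder for the @name parameter
  AND t.Code NOT IN (SELECT Code FROM ExcludedCodes);
GO
SET SHOWPLAN_XML OFF;
GO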