Verify SQL Server statistics

From my ASP.NET app using EF Core, I send the database a query with quite a few joins (13). When I execute it, it works - but it takes 25 seconds to complete.
However, when I enable "Legacy Cardinality Estimation" in the database options, execution is practically instant. As I understand it, cardinality estimation is based on statistics, so I ran exec sp_updatestats. That once helped on the same database (for a different query), but this time it did not.
So the first question that comes to mind is: how do I verify that the statistics are correct? And if they are, why would the cardinality estimator make bad choices?
Or more generally: how do I approach this problem without resorting to the option mentioned above (turning on something "legacy" doesn't sound right)?

A good place to start is checking the index rowcnt/rows values. You can usually spot the issue there, though fixing it means either updating statistics or applying the latest CU (cumulative update) or SP (service pack), and even then the bug may not be patched yet.
I had this problem on SQL Server 2014 SP2 and SP3; the service packs alone did not fix the cardinality estimator. You need the latest cumulative update.
declare
    @table_name  varchar(128) = 'TableName'
    ,@schema_name varchar(128) = 'dbo';

/* 2014 and after, I believe */
select max(p.rows) as rows
from sys.tables t
inner join sys.indexes i on t.object_id = i.object_id
inner join sys.partitions p on t.object_id = p.object_id and p.index_id = i.index_id -- current-ish
where t.name = @table_name and schema_name(t.schema_id) = @schema_name;

/* 2008 to 2012, I believe */
select max(i.rows)
from sys.tables t
inner join sys.sysindexes i on i.id = t.object_id -- the old dbo.sysindexes becomes a view
where t.name = @table_name and schema_name(t.schema_id) = @schema_name;

/* 2000 and possibly earlier */
select max(i.rows)
from sys.tables t
inner join dbo.sysindexes i on i.id = t.object_id -- a real, live table, still selectable
where t.name = @table_name and schema_name(t.schema_id) = @schema_name;
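As for verifying the statistics themselves: here is a hedged sketch (assuming SQL Server 2008 R2 SP2 / 2012 SP1 or later, and a placeholder table name) that shows when each statistic on a table was last updated, how many rows were sampled, and how many modifications have accumulated since:
declare @object_id int = OBJECT_ID(N'dbo.TableName'); -- placeholder name

select s.name as stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,        -- a low sample ratio can mean a skewed histogram
    sp.modification_counter -- changes since the statistic was last updated
from sys.stats as s
cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
where s.object_id = @object_id;

-- And the histogram itself, for one statistic (the name is a placeholder):
DBCC SHOW_STATISTICS (N'dbo.TableName', N'StatisticName');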

Unfortunately, when it comes to the CE, everything else is common ground. You said it yourself: 25 seconds vs. under 1 second, with the same statistics (updated or not), the same indexes, the same query, and so on.
I have worked quite a lot on CE choice, and it turns out that (after statistics, indexes and all the standard recommended actions) one estimator may work very well and the other very badly. This can happen on the very same query with one filter changed, or on two different queries producing exactly equal results.
In most cases both perform well, or at least acceptably. Sometimes, however, one of them screws up totally. Then you must either use the legacy (2012) CE or make your statement "CE 2014 friendly".
Or, if your statements have a clear identity in your code - say this one is 'mystatementA021' - you can keep your own stats and select CE 2012 or 2014 per statement. But that is a longer story.
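On that note, the legacy estimator can be scoped to a single statement instead of the whole database; a sketch (the trivial SELECT is a stand-in for your real 13-join query):
-- SQL Server 2014: trace flag 9481 reverts this statement to the pre-2014 CE.
-- (QUERYTRACEON normally requires sysadmin rights, or use a plan guide.)
select o.name
from sys.objects as o -- stand-in for the real query
option (querytraceon 9481);

-- SQL Server 2016 SP1 and later: the documented query-scoped hint.
select o.name
from sys.objects as o
option (use hint ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));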

Related

Last entries in any table of the database

While testing a website by adding records via the UI, I cannot always tell which tables are being updated. I would like a query - one for MSSQL and a version for PostgreSQL - that returns the last entry/entries added or modified in the database, without knowing the table, so I can figure out which tables are related to the feature I am looking at.
In this case I cannot provide an example, because I cannot tell which table is being updated or how.
If you are just trying to track "what table(s) is this UI writing to" - without using Extended Events or Query Store to see what commands are actually running - and the service hasn't been restarted since the UI did its thing, and nobody else is using the database, you can do something like this:
SELECT TOP (10) -- or some other arbitrary number, or no TOP at all
[Schema] = s.name,
[Table] = t.name,
LastWrite = MAX(ius.last_user_update)
FROM sys.schemas AS s
INNER JOIN sys.objects AS t
ON s.[schema_id] = t.[schema_id]
INNER JOIN sys.dm_db_index_usage_stats AS ius
ON ius.[object_id] = t.[object_id]
GROUP BY s.name, t.name
ORDER BY LastWrite DESC;
But it seems like a narrow use case and can be invalidated by a lot of variables. If you want to know what your UI is doing, look at the code, or use Extended Events to monitor.
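If you do go the Extended Events route, a minimal session could look like this - a sketch only; the session and database names are hypothetical, and you may want more events or a file target:
-- Capture ad hoc batches and RPC calls (e.g. from an ORM) against one database.
-- Session and database names are placeholders.
CREATE EVENT SESSION WhatIsTheUIDoing ON SERVER
ADD EVENT sqlserver.sql_batch_completed(
    ACTION (sqlserver.sql_text, sqlserver.client_app_name)
    WHERE (sqlserver.database_name = N'MyDatabase')),
ADD EVENT sqlserver.rpc_completed(
    ACTION (sqlserver.sql_text, sqlserver.client_app_name)
    WHERE (sqlserver.database_name = N'MyDatabase'))
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION WhatIsTheUIDoing ON SERVER STATE = START;
-- ...exercise the UI, watch the live data in SSMS, then stop the session:
ALTER EVENT SESSION WhatIsTheUIDoing ON SERVER STATE = STOP;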

What makes a count(*) query run for 30 sec? [duplicate]

This question already has an answer here: Is COUNT(*) indexed? (closed as a duplicate)
I have an MS SQL table with over 250 million rows. Whenever I execute the following query
SELECT COUNT(*) FROM table_name
it takes over 30 seconds to return. Why does it take so much time? Does it actually do a count when I query? I had been assuming it stores this information somewhere (probably in the table metadata - I'm not sure table metadata even exists).
Also, I would like to know whether this query is IO-, processor-, or memory-intensive.
Every time you execute SELECT COUNT(*) FROM table, SQL Server actually goes through the table and counts all rows. To get an estimated row count for one or more tables, you can run the following query, which reads stored metadata and returns in under a second:
SELECT OBJECT_NAME(OBJECT_ID) TableName, st.row_count
FROM sys.dm_db_partition_stats st
WHERE index_id < 2
ORDER BY st.row_count DESC
Read more about it here http://technet.microsoft.com/en-us/library/ms187737.aspx
No, SQL Server doesn't store this information; it computes it on every query. It can cache the execution plan to improve performance, though. So if you want results quickly, you need at least a primary key.
As for what SQL Server is doing and how expensive it is, you can look this up yourself. In SSMS, enable the execution-plan button for the query and run a SELECT COUNT(*). You will see that the server actually does an index scan (a full scan). (I would have expected the PK to be used for that, but in my test case it used some other nonclustered index.)
To get a feel for the cost right-click your query editor window, select Query Options... -> Execution -> Advanced and activate the check boxes for SET STATISTICS TIME and SET STATISTICS IO. The messages tab will contain information regarding IO and timing after you re-executed the select statement.
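The same switches can also be set directly in T-SQL; a small sketch (the table name is a placeholder):
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.table_name; -- placeholder table

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
-- The Messages tab then shows scan counts, logical reads and CPU/elapsed times.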
Also note that a SELECT COUNT(*) is quite aggressive in terms of the shared locks it takes: to guarantee the result, the whole table is locked with a shared lock.
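If you want to see those locks for yourself, run the COUNT(*) in one session and, while it runs, inspect the locks from another session (the session id 55 below is a placeholder for the spid running the count):
SELECT resource_type, request_mode, request_status, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = 55 -- placeholder spid
GROUP BY resource_type, request_mode, request_status;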
A very fast, lock-free alternative is to use the meta data for the table. The count you get from the meta-data is almost always accurate, but there is no guarantee.
USE <database_name,,>
GO
SELECT ddps.row_count
FROM sys.indexes AS i
INNER JOIN sys.objects AS o
ON i.object_id = o.object_id
AND o.name = '<your_table,,>'
INNER JOIN sys.dm_db_partition_stats AS ddps
ON i.object_id = ddps.object_id
AND i.index_id = ddps.index_id
WHERE i.index_id = 1
This is an SSMS template. Copy it into a query window and hit CTRL+SHIFT+M to get a dialog that asks you for values for database_name and table_name.
If you are looking for approximation counts on tables and your version is greater than or equal to SQL Server 2005, you can simply use:
SELECT t.NAME AS 'TableName'
,s.Name AS 'TableSchema'
,p.rows AS 'RowCounts'
FROM sys.tables t
INNER JOIN sys.schemas s
ON t.schema_id = s.schema_id
INNER JOIN sys.indexes i
ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p
ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
WHERE
t.is_ms_shipped = 0
GROUP BY
t.NAME, s.Name, p.Rows
ORDER BY
s.Name, t.Name
Doing a count(*) would only consume a small amount of memory/processor. It isn't that big of an operation in terms of database functions.

SQL Server Count is slow

Counting tables with a large amount of data may be very slow, sometimes taking minutes; it may also generate deadlocks on a busy server. I want to display real values, so NOLOCK is not an option.
The servers I use are SQL Server 2005 or 2008, Standard or Enterprise - if it matters.
I can imagine that SQL Server maintains the counts for every table, and that if there is no WHERE clause I could get that number pretty quickly, right?
For example:
SELECT COUNT(*) FROM myTable
should immediately return with the correct value. Do I need to rely on statistics to be updated?
Very close approximate (ignoring any in-flight transactions) would be:
SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
WHERE t.name = N'myTable'
AND s.name = N'dbo'
AND p.index_id IN (0,1);
This will return much, much quicker than COUNT(*), and if your table is changing quickly enough, it's not really any less accurate. If your table changed between when your COUNT started (and locks were taken) and when it returned (when locks were released and all the waiting write transactions were allowed to write to the table), is the exact number that much more valuable? I don't think so.
If you have some subset of the table you want to count (say, WHERE some_column IS NULL), you could create a filtered index on that column, and structure the where clause one way or the other, depending on whether it was the exception or the rule (so create the filtered index on the smaller set). So one of these two indexes:
CREATE INDEX IAmTheException ON dbo.table(some_column)
WHERE some_column IS NULL;
CREATE INDEX IAmTheRule ON dbo.table(some_column)
WHERE some_column IS NOT NULL;
Then you could get the count in a similar way using:
SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
INNER JOIN sys.indexes AS i
ON p.index_id = i.index_id
WHERE t.name = N'myTable'
AND s.name = N'dbo'
AND i.name = N'IAmTheException' -- or N'IAmTheRule'
AND p.index_id IN (0,1);
And if you want to know the opposite, you just subtract from the first query above.
(How large is a "large amount of data"? I should have asked this in a comment first, but maybe the exec below helps you out already.)
If I run a query on a static table (meaning nobody has been busy with reads/writes/updates for quite a while, so contention is not an issue) with 200 million rows, COUNT(*) completes in 15 seconds on my dev machine (Oracle).
Considering the sheer amount of data, that is still quite fast (at least to me).
Since you said NOLOCK is not an option, you could also consider
exec sp_spaceused 'myTable'
as well. But this comes down to nearly the same thing as NOLOCK (it ignores contention plus in-flight deletes/updates, AFAIK).
I've been working with SSMS for well over a decade and only in the past year found out that it can give you this information quickly and easily, thanks to this answer.
Select the "Tables" folder from the database tree (Object Explorer)
Press F7 or select View > Object Explorer Details to open Object Explorer Details view
In this view you can right-click on the column header to select the columns you want to see, including table space used, index space used, and row count.
Note that support for this in Azure SQL databases seems a bit spotty at best - my guess is that the queries from SSMS are timing out, so each refresh only returns a handful of tables; however, the highlighted one always seems to be returned.
COUNT will do either a table scan or an index scan, so for a high number of rows it will be slow. If you do this operation frequently, the best approach is to keep the count in another table.
If you do not want to do that, you can create a dummy index (one that will not be used by your queries) and query its number of items, something like:
select
row_count
from sys.dm_db_partition_stats as p
inner join sys.indexes as i
on p.index_id = i.index_id
and p.object_id = i.object_id
where i.name = 'your index'
I suggest creating a new index because one that is not used by your queries will not get locked during other operations.
As Aaron Bertrand said, maintaining the extra index might cost more than using one that already exists, so the choice is yours.
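For completeness, creating such a dummy index might look like this - a minimal sketch where the table, column, and index names are all hypothetical:
-- A narrow nonclustered index kept only so its partition row count can be queried.
-- Table, column, and index names are placeholders.
CREATE NONCLUSTERED INDEX IX_myTable_RowCountOnly ON dbo.myTable (id);
Its name then plugs into the query above via i.name = 'IX_myTable_RowCountOnly'.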
If you just need a rough count of the number of rows - e.g. to make sure a table loaded properly or that data was not deleted - then (in MySQL) do the following:
MySQL> connect information_schema;
MySQL> select table_name,table_rows from tables;

What is syncobj in SQL Server

When I run this script to search for particular text in sys.columns, I get a lot of rows like "dbo.syncobj_0x3934443438443332".
SELECT c.name, s.name + '.' + o.name
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id
INNER JOIN sys.schemas s ON o.schema_id=s.schema_id
WHERE c.name LIKE '%text%'
If I understand correctly, they are replication objects. Is that so? Can I just exclude them from my query with something like o.name NOT LIKE '%syncobj%', or is there a better way?
I've found a solution. I don't know if it's the best one or not.
SELECT c.name, s.name + '.' + o.name
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id
INNER JOIN sys.schemas s ON o.schema_id=s.schema_id
WHERE c.name LIKE '%text%' AND o.type = 'U'
The result is fine now. As I said, syncobj objects are replication objects; they have no meaning for us and are used for replication purposes only.
http://www.developmentnow.com/g/114_2007_12_0_0_443938/syncobj-views.htm
EDIT:
Forgot to add: syncobj objects are stored in the database as views, so if you need a list of views you will probably have to filter them out, as I did in my query above.
Comparing the syncobj views with my own views, the only difference is the is_ms_shipped column: it is 1 for syncobj views and 0 for the others, which means the syncobj views were created by the system.
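Based on that observation, the same search can also exclude them via is_ms_shipped instead of restricting to tables only (useful if you want user views in the results too):
SELECT c.name, s.name + '.' + o.name
FROM sys.columns c
INNER JOIN sys.objects o ON c.object_id=o.object_id
INNER JOIN sys.schemas s ON o.schema_id=s.schema_id
WHERE c.name LIKE '%text%' AND o.is_ms_shipped = 0 -- excludes system-created objects, including syncobj views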
P.S. I'll wait for some time and if nobody gives another answer, I'll accept mine.
These views get created when you set up replication on an article that does not include all the fields, or that makes other metadata changes relative to the original table. If you generate a script from a publication, it will show you how one is created (see below). The view provides an object used to generate the bcp extracts during the initial snapshot.
Here is an example:
-- Adding the article synchronization object
exec sp_articleview @publication = N'publication_data', @article = N'tablename',
    @view_name = N'syncobj_0x4239373642443436', @filter_clause = N'',
    @force_invalidate_snapshot = 1, @force_reinit_subscription = 1
GO
P.S. I recently hit a problem where, after I dropped replication, it failed to drop these views, and I had to drop the system views manually before a replication script could be reused. Otherwise it gives an error message:
Msg 2714, Level 16, State 3: There is already an object named
'syncobj_0x3437324238353830' in the database.
which caused the bcp to fail during the snapshot.

Easy way to find out how many rows in total are stored within SQL Server Database?

I'm looking for an easy way to count all rows within one SQL Server 2005/2008 database (skipping the system tables, of course). I know I could use
SELECT COUNT (COLUMN) FROM TABLE
and do it for each table and then add up the results, but I would prefer some automated way.
Is there one?
SELECT SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE index_id IN (0,1)
AND OBJECTPROPERTY([object_id], 'IsMsShipped') = 0;
This will be accurate except for, potentially, any rows that are being added or removed within a transaction at the time you run the query. And it won't have the expense of hitting individual tables.
But as I mentioned in another comment, I'm not sure how this helps you determine "how much data" your database holds. How many rows, sure, but if I have 10 glasses, each half full of water, and you have 5 glasses, each completely full, which of us has more water?
This was my answer to a similar question today:
SQL Server 2005 or later gives quite a useful report showing table sizes - including row counts etc. It's in the Standard Reports - "Disk Usage by Table".
Programmatically, there's a nice solution at: http://www.sqlservercentral.com/articles/T-SQL/67624/
Try:
SELECT
[TableName] = so.name,
[RowCount] = MAX(si.rows)
FROM
sysobjects AS so,
sysindexes AS si
WHERE
so.xtype = 'U'
AND
si.id = OBJECT_ID(so.name)
GROUP BY
so.name
ORDER BY
2 DESC
This counts indexed rows. It is probably only an approximation, as databases change constantly and some stuff might not be indexed, but it is fast.
EDIT: Note that so.xtype = 'U' restricts this to user objects, on the assumption that you do not want the system stuff, only "real" data.
EDIT2: no flames note: probably a bad idea to query the sysobjects table :).
EDIT3: to specifically address the requirement, and with no comma joins :)
SELECT sum(mycount) from
(SELECT
MAX(si.rows) AS mycount
FROM
sysobjects AS so
join sysindexes AS si on si.id = OBJECT_ID(so.name)
WHERE
so.xtype = 'U'
GROUP BY
so.name
) as mylist
We know that sp_spaceused, when passed a table name, will return a row count, so we can examine what it does - it queries sys.dm_db_partition_stats - and copy it to get this:
SELECT
SUM(ddps.row_count) TotalRows
FROM
sys.indexes i
INNER JOIN sys.objects o ON i.OBJECT_ID = o.OBJECT_ID
INNER JOIN sys.dm_db_partition_stats ddps ON
o.OBJECT_ID = ddps.OBJECT_ID
AND i.index_id = ddps.index_id
WHERE
i.index_id < 2
AND o.is_ms_shipped = 0 -- to exclude system tables
Curious requirement though, I have to say...
You could query sysindexes and look at the rowcnt value for the clustered index on each table, but I'm not sure exactly how up to date that is.
Alternatively, something like this (briefly tested on a small test db):
CREATE TABLE #TableRowCount (TableName NVARCHAR(128), RowCnt BIGINT)
INSERT #TableRowCount (TableName, RowCnt)
EXECUTE sp_msforeachtable 'SELECT ''?'', COUNT(*) FROM ?'
SELECT SUM(RowCnt) AS TotalRowCount FROM #TableRowCount
DROP TABLE #TableRowCount
Check out the undocumented stored procedure sp_MSForEachTable. Using it you can run a Count(*) on every table in the database. For your specific issue:
EXEC sp_MSforeachtable 'SELECT ''?'', Count(*) as NumberOfRows FROM ?'
I'm not sure if older versions of MS SQL have the information_schema standard data dictionary. (Note that the table_rows column below exists in MySQL's information_schema, not SQL Server's.)
You can do something like:
SELECT SUM(table_rows)
FROM information_schema.tables
WHERE table_schema = 'DATABASENAME'