calculating the size of sql server database - sql

I need to find the size of a sql server 2008 database. I used the following stored procedure
EXEC sp_spaceused
It gave me back the following data: database name, database size, unallocated space, reserved, data, index_size, unused.
Is there any way I can get the size of the database, excluding certain tables?
I am able to get the reserved size of each database table using this query
DECLARE @LOW int
SELECT @LOW = LOW
FROM [master].[dbo].[spt_values] (NOLOCK)
WHERE [number] = 1 AND [type] = 'E'
SELECT TableName,[Row Count],[Size (KB)] FROM
(
SELECT QUOTENAME(USER_NAME(o.uid)) + '.' +
QUOTENAME(OBJECT_NAME(i.id))
AS TableName
,SUM(i.rowcnt) [Row Count]
,CONVERT(numeric(15,2),
(((CONVERT(numeric(15,2),SUM(i.reserved)) * @LOW) / 1024))) AS [Size (KB)]
FROM sysindexes i (NOLOCK)
INNER JOIN sysobjects o (NOLOCK)
ON i.id = o.id AND
o.type IN ('U', 'S')
WHERE indid IN (0, 1, 255)
GROUP BY
QUOTENAME(USER_NAME(o.uid)) + '.' +
QUOTENAME(OBJECT_NAME(i.id))
HAVING SUM(i.rowcnt) > 0
) AS Z
ORDER BY [Size (KB)] DESC
But this confuses me slightly, as this only gives the reserved size per table. What is the reserved size? If I sum the reserved size for each database table, it does not add up to the database size.

There is a lot more taking up space in a database than just tables. Also keep in mind that the size of a table or database is constantly changing: rows are added and removed, and logs are written to track what was done so it can be undone or redone if necessary. As this space is used and released, it doesn't typically get released back to the file system, because SQL Server knows it will likely need it again. SQL Server will release space for its own future use, but according to the file system that space is still being used by the database.
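If the numbers reported by sp_spaceused look stale, you can ask it to recompute usage first via its @updateusage parameter; note this runs DBCC UPDATEUSAGE under the covers, so it can be slow on a large database:
-- Recompute space usage before reporting
EXEC sp_spaceused @updateusage = N'TRUE'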

Please stop using backward compatibility views like sysindexes / sysobjects.
Something like this might be better, though indexes/tables alone do not account for everything in a database.
SELECT Size_In_MB = SUM(reserved_page_count)*8/1024.0
FROM sys.dm_db_partition_stats
WHERE OBJECT_NAME([object_id]) NOT IN (N'table1', N'table2');
Also, why are you ignoring non-clustered indexes? Do you think they don't contribute to size? You can add a similar filter here, but I'm not sure why you would:
AND index_id IN (0,1,255);
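One caveat with filtering on OBJECT_NAME alone: same-named tables in different schemas get excluded together. If that matters, a sketch that filters on the schema-qualified name instead (the table names here are placeholders):
SELECT Size_In_MB = SUM(reserved_page_count)*8/1024.0
FROM sys.dm_db_partition_stats
WHERE OBJECT_SCHEMA_NAME([object_id]) + N'.' + OBJECT_NAME([object_id])
NOT IN (N'dbo.table1', N'dbo.table2');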

Related

Table in which RowCount is stored in Sybase ASE

Is there any system table in Sybase ASE which stores rowcount of all the user tables? I would like to avoid count(*). I know that we get rowcount when we use sp_help. So thought that it must be stored in any of system tables.
I haven't used this feature in Sybase, but Sybase is quite similar to SQL Server. Perusing the documentation, it would seem that the field is systabstats.rowcnt. This would result in a query something like this:
select o.name, s.rowcnt
from systabstats s join
sysobjects o
on s.id = o.id
where s.indid = 0;
I would imagine that this column is an approximation, and might be off in a high transaction environment.
You should not use the systabstats.rowcnt column: this information depends on whether the statistics have been updated. It is much better to use the row_count() function in a query against sysindexes. Unlike systabstats, this information is maintained automatically. Only when insert/delete activity is happening on the table can the result returned by this function temporarily be off by a small number of rows.
As for MS SQL Server still being similar to Sybase ASE: that's true on the outside. Microsoft has made many changes to the internals, and as a result the two databases are very different under the covers. Things like statistics and storage (both of which we're discussing above) fall into this category.
If you are running Sybase ASE pre-15.0, then:
SELECT o.name,
ROWCNT(i.doampg) as ROW_COUNT
FROM sysobjects o,
sysindexes i
WHERE o.id = i.id
AND o.sysstat2 & 1024 = 0 -- not remote
AND o.sysstat2 & 2048 = 0 -- not proxy
AND (i.indid = 0 OR i.indid = 1) -- Heap or ClustIdx only
--AND ROWCNT(i.doampg) > 1000 -- only need for tables having more than 1000 rows
AND o.type = 'U' -- exclude system tables
ORDER BY o.name
If you are running Sybase 15.0 and up, then you can use the ROW_COUNT() function:
SELECT name,
ROW_COUNT(DB_ID(), id)
FROM sysobjects
WHERE type = "U"
AND sysstat2 & 1024 = 0 -- not remote
AND sysstat2 & 2048 = 0 -- not proxy
ORDER BY name
You could use systabstats.rowcnt, rowcnt(i.doampg) from sysindexes, or the ROW_COUNT() function on ASE 15.

SQL Server Count is slow

Counting tables with large amount of data may be very slow, sometimes it takes minutes; it also may generate deadlock on a busy server. I want to display real values, NOLOCK is not an option.
The servers I use is SQL Server 2005 or 2008 Standard or Enterprise - if it matters.
I can imagine that SQL Server maintains the counts for every table and if there is no WHERE clause I could get that number pretty quickly, right?
For example:
SELECT COUNT(*) FROM myTable
should immediately return with the correct value. Do I need to rely on statistics to be updated?
Very close approximate (ignoring any in-flight transactions) would be:
SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
WHERE t.name = N'myTable'
AND s.name = N'dbo'
AND p.index_id IN (0,1);
This will return much, much more quickly than COUNT(*), and if your table is changing quickly enough, it's not really any less accurate. If your table has changed between when you started your COUNT (and locks were taken) and when it returned (when locks were released and all the waiting write transactions were finally allowed to write to the table), is the result that much more valuable? I don't think so.
If you have some subset of the table you want to count (say, WHERE some_column IS NULL), you could create a filtered index on that column, and structure the where clause one way or the other, depending on whether it was the exception or the rule (so create the filtered index on the smaller set). So one of these two indexes:
CREATE INDEX IAmTheException ON dbo.table(some_column)
WHERE some_column IS NULL;
CREATE INDEX IAmTheRule ON dbo.table(some_column)
WHERE some_column IS NOT NULL;
Then you could get the count in a similar way using:
SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas AS s
ON s.[schema_id] = t.[schema_id]
INNER JOIN sys.indexes AS i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE t.name = N'myTable'
AND s.name = N'dbo'
AND i.name = N'IAmTheException'; -- or N'IAmTheRule'
And if you want to know the opposite, you just subtract from the first query above.
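For instance, assuming IAmTheException from above counts the NULL rows, the non-NULL count can be derived by subtracting it from the total (all names are the placeholders used above):
SELECT
(SELECT SUM(p.rows) FROM sys.partitions AS p
WHERE p.[object_id] = OBJECT_ID(N'dbo.myTable')
AND p.index_id IN (0,1))
-
(SELECT SUM(p.rows) FROM sys.partitions AS p
INNER JOIN sys.indexes AS i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE p.[object_id] = OBJECT_ID(N'dbo.myTable')
AND i.name = N'IAmTheException') AS NonNullRows;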
(How large is "large amount of data"? I should have asked this first, but maybe the exec below already helps you out.)
Running COUNT(*) on a static table (no one else reading/writing/updating in quite a while, so contention is not an issue) with 200 million rows takes about 15 seconds on my dev machine (Oracle).
Considering the sheer amount of data, that is still quite fast (at least to me).
As you said NOLOCK is not an option, you could consider
exec sp_spaceused 'myTable'
as well.
But this boils down to nearly the same thing as NOLOCK (it ignores contention plus in-flight deletes/updates, AFAIK).
I've been working with SSMS for well over a decade and only in the past year found out that it can give you this information quickly and easily, thanks to this answer.
Select the "Tables" folder from the database tree (Object Explorer)
Press F7 or select View > Object Explorer Details to open Object Explorer Details view
In this view you can right-click on the column header to select the columns you want to see including table space used, index space used and row count:
Note that the support for this in Azure SQL databases seems a bit spotty at best - my guess is that the queries from SSMS are timing out, so it only returns a handful of tables each refresh, however the highlighted one always seems to be returned.
Count will do either a table scan or an index scan. So for a high number of rows it will be slow. If you do this operation frequently, the best way is to keep the count record in another table.
If, however, you do not want to do that, you can create a dummy index (one that will not be used by your queries) and query its number of items, something like:
select
row_count
from sys.dm_db_partition_stats as p
inner join sys.indexes as i
on p.index_id = i.index_id
and p.object_id = i.object_id
where i.name = 'your index'
I am suggesting creating a new index, because this one (if it will not be used) will not get locked during other operations.
As Aaron Bertrand said, maintaining it might be more costly than using an already existing one. So the choice is yours.
If you just need a rough count of number of rows, ie. to make sure a table loaded properly or to make sure the data was not deleted, do the following:
MySQL> connect information_schema;
MySQL> select table_name,table_rows from tables;

SQL Server, Unused tables for a long time (How to get rid of unnecessary tables)?

Is there a way to find the unused tables which are nothing else but rubbish in database?
The only way I can think of working out if a table is being used is to use sys.dm_db_index_usage_stats. The caveat with this is that it only records the usage of the tables since the SQL Service was last started.
So bearing that in mind, you can use the following query:
SELECT DISTINCT
OBJECT_SCHEMA_NAME(t.[object_id]) AS 'Schema'
, OBJECT_NAME(t.[object_id]) AS 'Table/View Name'
, CASE WHEN rw.last_read > 0 THEN rw.last_read END AS 'Last Read'
, rw.last_write AS 'Last Write'
, t.[object_id]
FROM sys.tables AS t
LEFT JOIN sys.dm_db_index_usage_stats AS us
ON us.[object_id] = t.[object_id]
AND us.database_id = DB_ID()
LEFT JOIN
( SELECT MAX(up.last_user_read) AS 'last_read'
, MAX(up.last_user_update) AS 'last_write'
, up.[object_id]
FROM (SELECT last_user_seek
, last_user_scan
, last_user_lookup
, [object_id]
, database_id
, last_user_update, COALESCE(last_user_seek, last_user_scan, last_user_lookup,0) AS null_indicator
FROM sys.dm_db_index_usage_stats) AS sus
UNPIVOT(last_user_read FOR read_date IN(last_user_seek, last_user_scan, last_user_lookup, null_indicator)) AS up
WHERE database_id = DB_ID()
GROUP BY up.[object_id]
) AS rw
ON rw.[object_id] = us.[object_id]
ORDER BY [Last Read]
, [Last Write]
, [Table/View Name];
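Because sys.dm_db_index_usage_stats is cleared on every restart, it helps to know how far back the numbers actually go; a quick check:
-- Usage stats only cover the period since this start time
SELECT sqlserver_start_time
FROM sys.dm_os_sys_info;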
If you use source control, look at the latest database script. It's the easiest way.
I think you might find the database statistics would be the most profitable place to look.
They should be able to tell you which tables are read from most, and which ones are updated most.
If you find tables which are neither read from nor written to, they're probably not much used.
I'm not sure what database statistics are available in SQL Svr 2000 though.
However, rather than simply looking at which tables are not much used, wouldn't a better approach be to examine what each table holds and what it if for, so you gain a proper understanding of the design? In this case you would then be able to properly judge what is necessary and what is not.
It is a concern that you don't know what source control is, though. (It's a way of managing changes to files - usually source code - so you can keep track of who changed what, when and why.) Anything larger than a one-man project (and even some one-man projects) should use it.
You can use sp_depends to confirm any dependencies for the suspect tables.
Here is an example:
CREATE TABLE Test (ColA INT)
GO
CREATE PROCEDURE usp_Test AS
BEGIN
SELECT * FROM Test
END
GO
CREATE FUNCTION udf_Test()
RETURNS INT
AS
BEGIN
DECLARE @t INT
SELECT TOP 1 @t = ColA FROM Test
RETURN @t
END
GO
EXEC sp_depends 'Test'
/** Results **/
In the current database, the specified object is referenced by the following:
name type
----- ----------------
dbo.udf_Test scalar function
dbo.usp_Test stored procedure
This approach has some caveats. Also this won't help with tables that are accessed directly from an application or other software (i.e. Excel, Access, etc.).
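Note that sp_depends is deprecated and can miss dependencies. On SQL Server 2008 and later, a sketch using the newer catalog function instead (same Test table as above):
-- More reliable dependency lookup (SQL Server 2008+)
SELECT referencing_schema_name, referencing_entity_name
FROM sys.dm_sql_referencing_entities(N'dbo.Test', N'OBJECT');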
To be completely thorough, I would recommend using SQL Profiler in order to monitor your database and see if and when these tables are referenced.

Limiting SQL Temp DB Growth

I am facing a serious issue on my production server, where tempdb grows exponentially. Is there any way we can recover the tempdb space without restarting the SQL service?
Cheers
Kannan.
I would ignore posts advising you to change the recovery model or limit the size of tempDB(!).
You need to track down the actual cause of the growth.
If you have the default trace turned on (it's on by default, out of the box), you can retrospectively find out what caused the growth by running this:
--check if default trace is enabled
if exists (select 1 from sys.configurations where configuration_id = 1568)
BEGIN
declare @defaultTraceFilepath nvarchar(256)
--get the current trace rollover file
select @defaultTraceFilepath = CONVERT(nvarchar(256), value) from ::fn_trace_getinfo(0)
where property = 2
SELECT ntusername, loginname, objectname, e.category_id, textdata, starttime, spid, hostname, eventclass, databasename, e.name
FROM ::fn_trace_gettable(@defaultTraceFilepath, 0)
inner join sys.trace_events e
on eventclass = trace_event_id
INNER JOIN sys.trace_categories AS cat
ON e.category_id = cat.category_id
where
databasename = 'tempDB' and
cat.category_id = 2 and --database category
e.trace_event_id in (92,93) --db file growth
END
Otherwise, you can start a SQL Profiler trace to capture these events. Turn on capturing of Auto Growth events, Sort Warnings and Join Warnings and look for cross joins, hash joins or missing join conditions.
SQL Server exposes a way to identify tempDB space allocations by currently executing queries, using DMVs:
-- This DMV query shows currently executing tasks and tempdb space usage
-- Once you have isolated the task(s) that are generating lots
-- of internal object allocations,
-- you can find out which TSQL statement and its query plan
-- for detailed analysis
select top 10
t1.session_id,
t1.request_id,
t1.task_alloc,
t1.task_dealloc,
(SELECT SUBSTRING(text, t2.statement_start_offset/2 + 1,
(CASE WHEN statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(max),text)) * 2
ELSE statement_end_offset
END - t2.statement_start_offset)/2)
FROM sys.dm_exec_sql_text(sql_handle)) AS query_text,
(SELECT query_plan from sys.dm_exec_query_plan(t2.plan_handle)) as query_plan
from (Select session_id, request_id,
sum(internal_objects_alloc_page_count + user_objects_alloc_page_count) as task_alloc,
sum (internal_objects_dealloc_page_count + user_objects_dealloc_page_count) as task_dealloc
from sys.dm_db_task_space_usage
group by session_id, request_id) as t1,
sys.dm_exec_requests as t2
where t1.session_id = t2.session_id and
(t1.request_id = t2.request_id) and
t1.session_id > 50
order by t1.task_alloc DESC
(Ref.)
You can use DBCC SHRINKFILE to shrink the tempdb files and recover some space.
DBCC SHRINKFILE ('tempdev', 1)
DBCC SHRINKFILE ('templog', 1)
The filenames can be found in the sysfiles table.
You still need to discover the root cause, but this can give you some breathing room until you do. The amount of space you recover will depend on usage and other factors.
Also:
How to shrink the tempdb database in SQL Server
http://support.microsoft.com/kb/307487
In SIMPLE mode, the tempdb database's log is constantly being truncated, and it can never be backed up. So check that tempdb is in SIMPLE mode.
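A quick way to verify the recovery model:
-- tempdb should report SIMPLE here
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'tempdb';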

Easy way to find out how many rows in total are stored within SQL Server Database?

I'm looking for an easy way to count all rows within one SQL Server 2005/2008 database (skipping the system tables, of course). I know I could use
SELECT COUNT (COLUMN) FROM TABLE
and do it for each table and then add it up, but I would prefer some automated way.
Is there one?
SELECT SUM(row_count)
FROM sys.dm_db_partition_stats
WHERE index_id IN (0,1)
AND OBJECTPROPERTY([object_id], 'IsMsShipped') = 0;
This will be accurate except for, potentially, any rows that are being added or removed within a transaction at the time you run the query. And it won't have the expense of hitting individual tables.
But as I mentioned in another comment, I'm not sure how this helps you determine "how much data" your database holds. How many rows, sure, but if I have 10 glasses, each half full of water, and you have 5 glasses, each completely full, which of us has more water?
This was my answer to a similar question today:
SQL Server 2005 or later gives quite a useful report showing table sizes - including row counts etc. It's in Standard Reports - Disk Usage by Table.
Programmatically, there's a nice solution at: http://www.sqlservercentral.com/articles/T-SQL/67624/
Try:
SELECT
[TableName] = so.name,
[RowCount] = MAX(si.rows)
FROM
sysobjects AS so,
sysindexes AS si
WHERE
so.xtype = 'U'
AND
si.id = OBJECT_ID(so.name)
GROUP BY
so.name
ORDER BY
2 DESC
This is the indexed rows. This is probably only an approximation, as databases change a lot and some stuff might not be indexed, but this will be fast.
EDIT: Note that so.xtype is user types, making the assumption you do not want the system stuff and only "real" data stuff.
EDIT2: no flames note: probably a bad idea to query on the sysobjects table :).
EDIT3: to specifically address requirement, and no associative joins :)
SELECT sum(mycount) from
(SELECT
MAX(si.rows) AS mycount
FROM
sysobjects AS so
join sysindexes AS si on si.id = OBJECT_ID(so.name)
WHERE
so.xtype = 'U'
GROUP BY
so.name
) as mylist
We know that sp_spaceused, when passed a table name, will return a row count, so we can examine what it does - it queries sys.dm_db_partition_stats - and copy it to get this:
SELECT
SUM(ddps.row_count) TotalRows
FROM
sys.indexes i
INNER JOIN sys.objects o ON i.OBJECT_ID = o.OBJECT_ID
INNER JOIN sys.dm_db_partition_stats ddps ON
o.OBJECT_ID = ddps.OBJECT_ID
AND i.index_id = ddps.index_id
WHERE
i.index_id < 2
AND o.is_ms_shipped = 0 -- to exclude system tables
Curious requirement though, I have to say...
You could query sysindexes, look at the rowcnt value for the clustered index on each table. But I'm not sure exactly how up to date that is.
Alternatively, something like this (briefly tested on a small test db):
CREATE TABLE #TableRowCount (TableName NVARCHAR(128), RowCnt BIGINT)
INSERT #TableRowCount (TableName, RowCnt)
EXECUTE sp_msforeachtable 'SELECT "?", COUNT(*) FROM ?'
SELECT SUM(RowCnt) AS TotalRowCount FROM #TableRowCount
DROP TABLE #TableRowCount
Check out the undocumented stored procedure sp_MSForEachTable. Using it you can run a Count(*) on every table in the database. For your specific issue:
EXEC sp_MSforeachtable 'SELECT ''?'', Count(*) as NumberOfRows FROM ?'
I'm not sure if older versions of MS SQL have the information_schema SQL standard data dictionary.
You can do something like:
SELECT SUM(table_rows)
FROM information_schema.tables
WHERE table_schema = 'DATABASENAME'