How to find size of index in table-valued function - sql

The documentation for sys.indexes says that this view
Contains a row per index or heap of a tabular object, such as a table,
view, or table-valued function.
I wanted to find the size of such an index, so I created a function with an index:
create function fIndexSize()
returns @res table -- the return table is a table variable, so it is prefixed with @
(
object_id int not null
, name varchar(128) not null
, primary key (object_id)
)
as
begin
insert into @res
select object_id, name
from sys.objects
where object_id > 255
return
end
The primary key index of the function gets a system-generated name (it starts with PK__fIndexSi), and there is also a record for it in sys.indexes.
Usually I get sizes of indexes using this query:
select
o.schema_id
, o.object_id
, o.name
, o.type_desc
, sum (a.total_pages) * 8.00 / 1024 / 1024 as TotalSpaceGB
from sys.objects o
inner join sys.indexes i on o.object_id = i.object_id
inner join sys.partitions p on i.object_id = p.object_id and i.index_id = p.index_id
inner join sys.allocation_units a on p.partition_id = a.container_id
where (o.name = 'fIndexSize' or i.name like 'PK__fIndexSi%')
group by o.schema_id, o.object_id, o.name, o.type_desc
But this time nothing was returned.
Can anyone advise me how to find the size of such an index?

Yes, you can find the size of this index, but keep in mind that it lives only for the duration of a batch, and that you have to look for it in tempdb (because the returned table is a table variable):
create function fIndexSize()
returns @res table -- table variable, stored in tempdb
(
object_id_xxxx int not null
, name varchar(128) not null
, primary key (object_id_xxxx)
)
as
begin
insert into @res
select object_id, name
from sys.objects
where object_id > 255
return
end;
go
select i.name,
c.name,
8 * SUM(au.used_pages) as size_kb
from tempdb.sys.indexes i
join tempdb.sys.columns c
on i.object_id = c.object_id
join tempdb.sys.partitions as p
on p.object_id = i.object_id and p.index_id = i.index_id
join tempdb.sys.allocation_units as au
on au.container_id = p.partition_id
where c.name = 'object_id_xxxx'
group by i.name,
c.name
I included the column name in the output only to show that the index found is the one we are looking for, and I gave the column the xxxx suffix to make it easy to distinguish.

The result of a table-valued function is not stored in a permanent table in the database. It is generated on the fly during the query execution.
Yes, you have a row in sys.indexes which tells you index properties, like type (clustered or not), is_primary_key, is_unique, etc.
But, there are no corresponding rows in sys.partitions and in sys.allocation_units. That's why your query returns nothing. If you replace inner joins with left joins, you'd see one row with NULL as TotalSpaceGB.
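A minimal sketch of that left-join variant, assuming the same function name and index-name prefix as in the question. It is the question's query with only the two joins to sys.partitions and sys.allocation_units changed:

```sql
-- Same aggregation as in the question, but with LEFT JOINs so that the
-- sys.indexes row survives even when no partition or allocation rows exist.
select
    o.schema_id
  , o.object_id
  , o.name
  , o.type_desc
  , sum(a.total_pages) * 8.00 / 1024 / 1024 as TotalSpaceGB -- NULL when nothing is allocated
from sys.objects o
inner join sys.indexes i on o.object_id = i.object_id
left join sys.partitions p on i.object_id = p.object_id and i.index_id = p.index_id
left join sys.allocation_units a on p.partition_id = a.container_id
where (o.name = 'fIndexSize' or i.name like 'PK__fIndexSi%')
group by o.schema_id, o.object_id, o.name, o.type_desc;
```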
So the documentation is correct: it doesn't say that table-valued functions will have rows in sys.allocation_units.
Each invocation of the function may return a different number of rows. This set of rows doesn't exist before the query runs, and it doesn't exist after the query finishes.
Even during the function execution sys.partitions and sys.allocation_units are empty for this index (PK__fIndexSi...).
When I looked at the actual execution plan of the query
select * from fIndexSize()
I could see that the optimiser creates a temp table behind the scenes. It has to store the rows somewhere, and they are stored in tempdb.
So you should run your select from sys.allocation_units against tempdb.
I first used SQL Sentry Plan Explorer to see the name of the temporary table, then ran your query against tempdb.

Related

Trying to create a rolling row limit stored procedure

Platform: SQL Server
Goal: get the current number of rows and subtract from it the amount I want to keep and then delete the remainder. I was planning to turn it into a stored procedure if I could get it working.
For the code I was thinking something like this:
SET N = (EXEC sp_spaceused dbo.Name rows)
SET D = (%N%-30000000)
DELETE TOP (%D%) FROM dbo.Name
I used sp_spaceused to avoid blocking inserts into the table, as would be the case with COUNT.
You can use an updatable CTE or derived table to delete. Assuming you want to delete arbitrary rows from the table (TOP without ORDER BY gives no ordering guarantee), you don't need any further calculations.
Don't use sp_spaceused; get the row count from sys.partitions instead:
-- sys.partitions.rows is bigint, so the variable is bigint as well
DECLARE @N bigint = (
SELECT SUM(p.rows)
FROM sys.partitions p
INNER JOIN sys.tables t ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas s ON s.[schema_id] = t.[schema_id]
WHERE t.name = N'Name'
AND s.name = N'dbo'
AND p.index_id IN (0,1) -- heap or clustered index only, so rows are not double-counted
);
WITH cte AS (
SELECT TOP (@N - 30000000)
*
FROM dbo.Name
)
DELETE FROM cte;

SQL Server - Get approximate size of table

In production, issuing a SELECT COUNT can be a bad idea - it can be a performance hit depending on your database engine. In Oracle, if I want to get an idea of the size of a table without having to resort to a COUNT, I can do the following:
SELECT
table_name,
num_rows,
last_analyzed
FROM all_tables
WHERE table_name = 'MY_TABLE_NAME';
This will retrieve Oracle's table analyses if they're enabled. While the count isn't exact, it can give me an idea of how large a table is in case I need to query it (and the last_analyzed column lets me know how old that approximation is).
How can I do something similar in SQL Server? (Related - is this necessary for SQL Server? Oracle has to count row-by-row, hence the avoidance.)
Thanks!
You can use the management studio also
Right Click on table -> Properties -> Storage
or you can use the query like this:
sp_spaceused 'TableName'
To get it for all the tables you can use it like this:
CREATE TABLE #tmp
(
tableName varchar(100),
numberofRows varchar(100),
reservedSize varchar(50),
dataSize varchar(50),
indexSize varchar(50),
unusedSize varchar(50)
)
insert #tmp
EXEC sp_MSforeachtable @command1 = N'EXEC sp_spaceused ''?'''
select * from #tmp
You can call sp_spaceused 'table_name'.
If you want to do this for all tables, wrap it inside sp_MSforeachtable:
sp_MSforeachtable 'sp_spaceused ''[?]'''
Calling sp_spaceused without any parameter will give you the database size.
Courtesy: @marc_s
SELECT
s.Name AS SchemaName,
t.NAME AS TableName,
p.rows AS RowCounts,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
sys.tables t
INNER JOIN
sys.schemas s ON s.schema_id = t.schema_id
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
WHERE
t.NAME NOT LIKE 'dt%' -- filter out system tables for diagramming
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
GROUP BY
t.Name, s.Name, p.Rows
ORDER BY
s.Name, t.Name
If absolute accuracy isn't vital, this is a very quick route to an approximate row count; any supported version of SQL Server should have this DMV.
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('MY_TABLE_NAME')
AND (index_id=0 or index_id=1);

SQL grouping results

I'm trying to get the last time a table was updated by the users:
Declare @Collect Table (Name Varchar(100), last_user_update datetime)
Insert into @Collect
EXEC sp_MSForEachTable 'SELECT ''?'' as TableName,
last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID(''SP3D_DB_RESEARCH_MDB'') AND OBJECT_ID = OBJECT_ID(''?'')'
SELECT * FROM @Collect ORDER BY last_user_update DESC
The problem is that in the results, some tables appear three times.
All of the duplicated tables appear to have the same last-updated time. Is there any way to group the results by table name?
If the values are indeed the same, you can just add DISTINCT to the query, and have it return unique results
SELECT DISTINCT ''?'' as TableName, last_user_update ...
If you want to group after the fact, and only the last update interests you, you can do
-- the table variable's column is called Name, so alias it for the output
SELECT Name as TableName, max(last_user_update) as last_update
FROM @Collect
GROUP BY Name
ORDER BY last_update DESC
Tables can have multiple indexes. The dynamic management view sys.dm_db_index_usage_stats will have separate entries for each index.
If you want to see the index name for each one, try this:
SELECT
o.name as TableName,
i.name as IndexName,
istats.last_user_update
from sys.dm_db_index_usage_stats istats
inner join sys.objects o
on o.object_id = istats.object_id
inner join sys.indexes i
on i.index_id = istats.index_id
and i.object_id = istats.object_id
where istats.database_id = db_id() -- the DMV is server-wide; keep only the current database
order by
o.name,
i.name
Or, if you don't care about that and just want the last update time, you can group by the table name:
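SELECT
o.name as TableName,
max(istats.last_user_update) as last_user_update
from sys.dm_db_index_usage_stats istats
inner join sys.objects o
on o.object_id = istats.object_id
where istats.database_id = db_id() -- the DMV is server-wide; keep only the current database
group by
o.name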
You can do an insert directly into your table with this query:
declare @Collect table (Name varchar(100), last_user_update datetime)
insert into @Collect
select
o.name as TableName,
istats.last_user_update
from sys.dm_db_index_usage_stats istats
inner join sys.objects o
on o.object_id = istats.object_id
inner join sys.indexes i
on i.index_id = istats.index_id
and i.object_id = istats.object_id
where database_id = db_id('SP3D_DB_RESEARCH_MDB')
Also, I'm not sure what your goal is, but please understand that this view only has entries for indexes that have had activity. If an index is unused, it is not in this view; the first access creates its row. The really interesting information in this view is the seek and scan counts.
See this note from MSDN:
When an index is used, a row is added to sys.dm_db_index_usage_stats
if a row does not already exist for the index. When the row is added,
its counters are initially set to zero.
If your goal is to enumerate all the indexes and then show the last update date for all of them, you'll need to join to sys.indexes and then left join to sys.dm_db_index_usage_stats.
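That enumeration could be sketched as follows. This is an illustration rather than a tested solution: it assumes you run it in the database of interest, and the restriction to user tables (type 'U') is an assumption added here, not part of the original answer.

```sql
-- Start from sys.indexes so every index is listed, then LEFT JOIN the usage
-- stats so indexes with no recorded activity still appear.
select
    o.name as TableName,
    i.name as IndexName,            -- NULL for a heap
    istats.last_user_update         -- NULL when the index has no row in the DMV
from sys.indexes i
inner join sys.objects o
    on o.object_id = i.object_id
left join sys.dm_db_index_usage_stats istats
    on istats.object_id = i.object_id
   and istats.index_id = i.index_id
   and istats.database_id = db_id() -- the DMV is server-wide; match the current database only
where o.type = 'U'                  -- assumption: user tables only
order by o.name, i.name;
```

The database_id condition sits in the join rather than in the where clause, so that unused indexes are not filtered out again.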

How to determine if a specific set of tables in a database are empty

I have database A which contains a table (CoreTables) that stores a list of active tables within database B that the organization's users are sending data to.
I would like to be able to have a set-based query that can output a list of only those tables within CoreTables that are populated with data.
Dynamically, I normally would do something like:
For each row in CoreTables
    Get the table name
    If table is empty
        Do nothing
    Else
        Print table name
Is there a way to do this without a cursor or other dynamic methods? Thanks for any assistance...
Probably the most efficient option is:
SELECT c.name
FROM dbo.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM sys.partitions
WHERE index_id IN (0,1)
AND rows > 0
AND [object_id] = OBJECT_ID(c.name)
);
Just note that the count in sys.sysindexes, sys.partitions and sys.dm_db_partition_stats are not guaranteed to be completely in sync due to in-flight transactions.
While you could just run this query in the context of the database, you could do this for a different database as follows (again assuming that CoreTables does not include schema in the name):
SELECT c.name
FROM DatabaseA.dbo.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.object_id
WHERE t.name = c.name
AND p.rows > 0
);
If you need to do this for multiple databases that all contain the same schema (or at least overlapping schema that you're capturing in aggregate in a central CoreTables table), you might want to construct a view, such as:
CREATE VIEW dbo.CoreTableCounts
AS
SELECT db = 'DatabaseB', t.name, rows = MAX(p.rows)
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
UNION ALL
SELECT db = 'DatabaseC', t.name, rows = MAX(p.rows)
FROM DatabaseC.sys.partitions AS p
INNER JOIN DatabaseC.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
-- ...
GO
Now your query isn't going to be quite as efficient, but it doesn't need to hard-code database names as object prefixes. Instead it can be:
SELECT name
FROM dbo.CoreTableCounts
WHERE db = 'DatabaseB'
AND rows > 0;
If that is painful to execute you could create a view for each database instead.
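A per-database view might look roughly like this sketch. The view name dbo.CoreTableCounts_DatabaseB is hypothetical; the database, schema, and table names are the ones already used in this answer.

```sql
-- One view per target database: the same logic as one branch of the UNION ALL view.
CREATE VIEW dbo.CoreTableCounts_DatabaseB   -- hypothetical name
AS
SELECT t.name, rows = MAX(p.rows)
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
    ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
    ON t.name = ct.name
WHERE p.index_id IN (0,1)                   -- heap or clustered index only
GROUP BY t.name;
GO
-- List the core tables in DatabaseB that contain data
SELECT name
FROM dbo.CoreTableCounts_DatabaseB
WHERE rows > 0;
```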
In SQL Server, you can do something like:
SELECT o.name, st.row_count
FROM sys.dm_db_partition_stats st join
sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2 and st.row_count > 0
By the way, this specifically does not use OBJECT_ID() or OBJECT_NAME() because these are evaluated in the current database. The above code continues to work for another database, using 3-part naming. This version also takes into account multiple partitions:
SELECT o.name, sum(st.row_count)
FROM <dbname>.sys.dm_db_partition_stats st join
<dbname>.sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2
group by o.name
having sum(st.row_count) > 0
something like this?
//
foreach (System.Data.DataTable dt in yourDataSet.Tables)
{
if (dt.Rows.Count != 0) { PrintYourTableName(dt.TableName); }
}
//
This approach relies on sys.sysindexes, a compatibility view, so be aware that it may not work in future versions of SQL Server. With that strong caveat in mind:
select distinct OBJECT_NAME(si.id) as tabName, si.rowcnt
from sys.sysindexes si
join sys.objects so on si.id = so.object_id
where si.indid < 2 and so.type = 'U' -- indid 0 = heap, 1 = clustered index
You would add to the where clause the tables you are interested in and rowcnt < 1.

Where are seed_value and increment_value for IDENTITY columns?

I'm collecting metadata using the sys.* views, and according to the documentation, the sys.identity_columns view will return the seed and increment values like so.
CREATE TABLE ident_test (
test_id int IDENTITY(1000,10),
other int
)
SELECT name, seed_value, increment_value
FROM sys.identity_columns
WHERE object_id = OBJECT_ID( 'ident_test' )
However, the above query just returns one column. Is it just me?
(Note: I've had to change this question somewhat from its earlier version.)
Shouldn't you reverse the from and join, like this:
SELECT c.name, i.seed_value, i.increment_value
from sys.identity_columns i
join sys.columns c
ON i.object_id = c.object_id
AND i.column_id = c.column_id
You are missing the Where clause. Your query is effectively saying 'Give me all of sys.columns and any matching rows from sys.identity_columns you have (but give me null if there are no matching rows)'.
By adding the Where clause below you'll change it to only return where an exact match is returned, which is the same as an inner join in this instance really.
SELECT
c.name, i.seed_value, i.increment_value
FROM
sys.columns c
LEFT OUTER JOIN sys.identity_columns i
ON i.object_id = c.object_id
AND i.column_id = c.column_id
Where I.seed_value is not null
So I think your data is correct, there are no results to view though.
Are you sure you are running this in a database with tables with IDENTITY columns?
SELECT c.name, i.seed_value, i.increment_value
FROM sys.columns c
INNER JOIN sys.identity_columns i
ON i.object_id = c.object_id
AND i.column_id = c.column_id
Returns rows for me in a regular production database with a few identities.
Using a LEFT JOIN returns these rows as well as many which are not IDENTITY
I ran this on another database, and I noticed some NULLs are returned (even in the INNER JOIN case). This is because some of the columns are in VIEWs.
Try adding:
INNER JOIN sys.tables t
ON t.object_id = c.object_id
To filter only to actual IDENTITY columns in tables.
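Put together, the query with that extra join could look like the sketch below. The table_name and column_name aliases are added here for readability and are not in the original answer.

```sql
-- Identity columns that belong to real tables only (views are excluded by
-- the join to sys.tables).
SELECT t.name AS table_name,
       c.name AS column_name,
       i.seed_value,
       i.increment_value
FROM sys.columns c
INNER JOIN sys.identity_columns i
    ON i.object_id = c.object_id
   AND i.column_id = c.column_id
INNER JOIN sys.tables t
    ON t.object_id = c.object_id;
```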
Your query returns what I'd expect (see below): it returns the single metadata row for the single identity column (test_id) in the table (ident_test). The other column (other) has no metadata in sys.identity_columns because it is not an identity column.
SELECT name, seed_value, increment_value
FROM sys.identity_columns
WHERE object_id = OBJECT_ID( 'ident_test' )
select name, is_identity, is_nullable
from sys.columns
WHERE object_id = OBJECT_ID( 'ident_test' )
Which gives
name       seed_value   increment_value
-----------------------------------------
test_id    1000         10
(1 row(s) affected)
name       is_identity  is_nullable
-------------------------------------
test_id    1            0
other      0            1
(2 row(s) affected)