Trying to create a rolling row limit stored procedure - sql

Platform: SQL Server
Goal: get the current number of rows and subtract from it the amount I want to keep and then delete the remainder. I was planning to turn it into a stored procedure if I could get it working.
For the code I was thinking something like this:
SET N = (EXEC sp_spaceused dbo.Name rows)
SET D = (%N%-30000000)
DELETE TOP (%D%) FROM dbo.Name
I used sp_spaceused to avoid locking the table to input statements, as would be the case with count.

You can use an updatable CTE or derived table to delete. Assuming you wanted to delete random rows from the table, you don't need any further calculations.
Don't use sp_spaceused, get the data from sys.partitions instead
DECLARE #N int = (
SELECT SUM(p.rows)
FROM sys.partitions p
INNER JOIN sys.tables t ON p.[object_id] = t.[object_id]
INNER JOIN sys.schemas s ON s.[schema_id] = t.[schema_id]
WHERE t.name = N'Name'
AND s.name = N'dbo'
AND p.index_id IN (0,1)
);
WITH cte AS (
SELECT TOP (#N - 30000000)
*
FROM dbo.Name
)
DELETE FROM cte;
db<>fiddle

Related

Can't assign an alias to result of a join - SQL

I have a DB with lots of tables without use. I'd like to filter out the tables without any data. So I used a snippet How to fetch the row count for all tables by #ismetAlkan.
However, I want to filter out 0, so used something like this and it doesn't work.
USE [my_db]
GO
SELECT * FROM
(
SELECT SCHEMA_NAME(A.schema_id) + '.' +
A.Name, SUM(B.rows) AS 'RowCount'
FROM sys.objects A
INNER JOIN sys.partitions B ON A.object_id = B.object_id
WHERE A.type = 'U'
GROUP BY A.schema_id, A.Name
) AS Result
where Result.RowCount > 0
GO
Any help appreciated!
There are three problems.
SELECT * FROM
(
SELECT ObjectName = SCHEMA_NAME(obj.[schema_id])
+ '.' + obj.name,
[RowCount] = SUM(p.rows)
FROM sys.objects AS obj
INNER JOIN sys.partitions AS p
ON obj.[object_id] = p.[object_id]
WHERE obj.type = 'U'
GROUP BY obj.[schema_id], obj.name
) AS Result
WHERE Result.[RowCount] > 0;
When you move a query into a derived table, subquery, or CTE, all of the columns need to have names.
'Alias' should be [Alias] since the former makes it look like a string and that form is deprecated in some contexts.
As Larnu pointed out, ROWCOUNT needs to be escaped in all spots, not just one.

How to find size of index in table-valued function

In article about sys.indexes there is a phrase that this view
Contains a row per index or heap of a tabular object, such as a table,
view, or table-valued function.
I was interested to find a size of such an index.
So I created function with index:
create function fIndexSize()
returns #res table
(
object_id int not null
, name varchar(128) not null
, primary key (object_id)
)
as
begin
insert into #res
select object_id, name
from sys.objects
where object_id > 255
return
end
Here we can see the name of new index:
There is also a record in sys.indexes:
Usually I get sizes of indexes using this query:
select
o.schema_id
, o.object_id
, o.name
, o.type_desc
, sum (a.total_pages) * 8.00 / 1024 / 1024 as TotalSpaceGB
from sys.objects o
inner join sys.indexes i on o.object_id = i.object_id
inner join sys.partitions p on i.object_id = p.object_id and i.index_id = p.index_id
inner join sys.allocation_units a on p.partition_id = a.container_id
where (o.name = 'fIndexSize' or i.name like 'PK__fIndexSi%')
group by o.schema_id, o.object_id, o.name, o.type_desc
But this time nothing was returned.
Can anyone give me advice how to find size of such an index?
Yes you can find the size of this index, but you should consider it's living only for a time of a batch and you should look for it in tempdb (as it is table variable):
create function fIndexSize()
returns #res table
(
object_id_xxxx int not null
, name varchar(128) not null
, primary key (object_id_xxxx)
)
as
begin
insert into #res
select object_id, name
from sys.objects
where object_id > 255
return
end;
go
select i.name,
c.name,
8 * SUM(au.used_pages) as size_kb
from tempdb.sys.indexes i
join tempdb.sys.columns c
on i.object_id = c.object_id
join tempdb.sys.partitions as p
on p.object_id = i.object_id and p.index_id = i.index_id
join tempdb.sys.allocation_units as au
on au.container_id = p.partition_id
where c.name = 'object_id_xxxx'
group by i.name,
c.name
I left the column name here only to show that the index found is what we are looking for, and I chose the column name with xxxx for distinguish it well
The result of a table-valued function is not stored in a permanent table in the database. It is generated on the fly during the query execution.
Yes, you have a row in sys.indexes which tells you index properties, like type (clustered or not), is_primary_key, is_unique, etc.
But, there are no corresponding rows in sys.partitions and in sys.allocation_units. That's why your query returns nothing. If you replace inner joins with left joins, you'd see one row with NULL as TotalSpaceGB.
So, documentation is correct. Documentation doesn't say that table-valued functions will have rows in sys.allocation_units.
Each invocation of the function may return different number of rows. This set of rows doesn't exist before the query runs and it doesn't exist after the query finishes.
Even during the function execution sys.partitions and sys.allocation_units are empty for this index (PK__fIndexSi...).
When I looked at the actual execution plan of the query
select * from fIndexSize()
I could see that optimiser creates a temp table behind the scenes. Well, it has to store the rows somewhere and they are stored in TempDB.
So, you should run your select from sys.allocation_units using tempdb.
At first I used SQL Sentry Plan Explorer to see the name of the temporary table:
Then I ran your query against TempDB:

SQL grouping results

I'm trying to get the last time a table was updated by the users:
Declare #Collect Table (Name Varchar(100),last_user_update datetime)
Insert into #Collect
EXEC sp_MSForEachTable 'SELECT ''?'' as TableName,
last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID(''SP3D_DB_RESEARCH_MDB'') AND OBJECT_ID = OBJECT_ID(''?'')'
SELECT * FROM #Collect ORDER BY last_user_update DESC
The problem is that in the results, some tables are appearing 3 times (please see the image bellow)
Since it appears that all tables duplicated have the same last updated time. Is there any way to group the results by the table name?
If the values are indeed the same, you can just add DISTINCT to the query, and have it return unique results
SELECT DISTINCT ''?'' as TableName, last_user_update ...
If you want to group after the fact, and only the last update interests you, you can do
SELECT TableName, max(last_user_update) as last_update
FROM #Collect
GROUP BY TableName
ORDER BY 2 DESC
Tables can have multiple indexes. The dynamic management view sys.dm_db_index_usage_stats will have separate entries for each index.
If you want to see the index name for each one, try this:
SELECT
o.name as TableName,
i.name as IndexName,
istats.last_user_update
from sys.dm_db_index_usage_stats istats
inner join sys.objects o
on o.object_id = istats.object_id
inner join sys.indexes i
on i.index_id = istats.index_id
and i.object_id = istats.object_id
order by
o.name,
i.name
Or, if you don't care about that and just want the last update time, you can group by the table name:
SELECT
o.name as TableName,
max(istats.last_user_update)
from sys.dm_db_index_usage_stats istats
inner join sys.objects o
on o.object_id = istats.object_id
group by
o.name
You can do an insert directly into your table with this query:
declare #Collect table (Name varchar(100),last_user_update datetime)
insert into #Collect
select
o.name as TableName,
istats.last_user_update
from sys.dm_db_index_usage_stats istats
inner join sys.objects o
on o.object_id = istats.object_id
inner join sys.indexes i
on i.index_id = istats.index_id
and i.object_id = istats.object_id
where database_id = db_id('SP3D_DB_RESEARCH_MDB')
Also, I'm not sure what your goal is, but please understand that this view only has entries for indexes that have activity on them. If an index is unused, it is not in this view. The first access creates a row in the view. The real interesting stuff on this view is the seek and scan information.
See this note from MSDN:
When an index is used, a row is added to sys.dm_db_index_usage_stats
if a row does not already exist for the index. When the row is added,
its counters are initially set to zero.
If your goal is to enumerate all the indexes and then show the last update date for all of them, you'll need to join to sys.indexes and then left join to sys.dm_db_index_usage_stats.

How to count rows in table join?

I am developing a SQL sproc and I want to return the number of rows for each table. How could I rewrite this statement so that it will list number of rows from each table below?
SELECT COUNT(*)
FROM [test_setup_details_form_view] [tsdf]
JOIN [test_setup_header_form_view]
ON [test_setup_header_form_view].[test_setup_header_id]
= [tsdf].[test_setup_header_id]
JOIN [test_header_rv] [th] with(nolock)
ON [th].[test_setup_header_id]
= [test_setup_header_form_view].[test_setup_header_id]
JOIN [test_details_answers_expanded_view] [tdae]
ON [tdae].[test_setup_details_id] = [tsdf].[test_setup_details_id]
AND [th].[test_header_id] = [tdae].[test_header_id]
JOIN [event_log_rv] [e]
ON [e].[event_log_id] = [tdae].[event_log_id]
When I execute this statement, it just gives me the total rows after all of the joins.
If you are trying to just get counts for each of these tables irrespective of the joins:
SELECT
OBJECT_SCHEMA_NAME([object_id]),
OBJECT_NAME([object_id]),
c
FROM
(
SELECT [object_id],
c = SUM(row_count)
FROM
sys.dm_db_partition_stats -- no NOLOCK necessary
WHERE
index_id IN (0,1)
AND OBJECT_NAME([object_id]) IN
(
N'test_setup_details_from_view',
N'test_setup_header_from_view',
... etc etc. ...
)
GROUP BY [object_id]
) AS x;
Use count (distinct <columnname>) on a unique column for each table that you need to count.
From each table? Why not use the metadata tables then?
You are trying to do something in code that already exists in the metadata tables:
Select
schema_name(schema_id) + '.' + t.name as TableName
, i.rows
from sys.tables t (nolock)
join sys.sysindexes i (nolock) on t.object_id = i.id
and i.indid < 2

How to determine if a specific set of tables in a database are empty

I have database A which contains a table (CoreTables) that stores a list of active tables within database B that the organization's users are sending data to.
I would like to be able to have a set-based query that can output a list of only those tables within CoreTables that are populated with data.
Dynamically, I normally would do something like:
For each row in CoreTables
Get the table name
If table is empty
Do nothing
Else
Print table name
Is there a way to do this without a cursor or other dynamic methods? Thanks for any assistance...
Probably the most efficient option is:
SELECT c.name
FROM dbo.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM sys.partitions
WHERE index_id IN (0,1)
AND rows > 0
AND [object_id] = OBJECT_ID(c.name)
);
Just note that the count in sys.sysindexes, sys.partitions and sys.dm_db_partition_stats are not guaranteed to be completely in sync due to in-flight transactions.
While you could just run this query in the context of the database, you could do this for a different database as follows (again assuming that CoreTables does not include schema in the name):
SELECT c.name
FROM DatabaseA.CoreTables AS c
WHERE EXISTS
(
SELECT 1
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.object_id
WHERE t.name = c.name
AND p.rows > 0
);
If you need to do this for multiple databases that all contain the same schema (or at least overlapping schema that you're capturing in aggregate in a central CoreTables table), you might want to construct a view, such as:
CREATE VIEW dbo.CoreTableCounts
AS
SELECT db = 'DatabaseB', t.name, MAX(p.rows)
FROM DatabaseB.sys.partitions AS p
INNER JOIN DatabaseB.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
UNION ALL
SELECT db = 'DatabaseC', t.name, rows = MAX(p.rows)
FROM DatabaseC.sys.partitions AS p
INNER JOIN DatabaseC.sys.tables AS t
ON p.[object_id] = t.[object_id]
INNER JOIN DatabaseA.dbo.CoreTables AS ct
ON t.name = ct.name
WHERE p.index_id IN (0,1)
GROUP BY t.name
-- ...
GO
Now your query isn't going to be quite as efficient, but doesn't need to hard-code database names as object prefixes, instead it can be:
SELECT name
FROM dbo.CoreTableCounts
WHERE db = 'DatabaseB'
AND rows > 0;
If that is painful to execute you could create a view for each database instead.
In SQL Server, you can do something like:
SELECT o.name, st.row_count
FROM sys.dm_db_partition_stats st join
sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2 and st.row_count > 0
By the way, this specifically does not use OBJECT_ID() or OBJECT_NAME() because these are evaluated in the current database. The above code continues to work for another database, using 3-part naming. This version also takes into account multiple partitions:
SELECT o.name, sum(st.row_count)
FROM <dbname>.sys.dm_db_partition_stats st join
<dbname>.sys.objects o
on st.object_id = o.object_id
WHERE index_id < 2
group by o.name
having sum(st.row_count) > 0
something like this?
//
foreach (System.Data.DataTable dt in yourDataSet.Tables)
{
if (dt.Rows.Count != 0) { PrintYourTableName(dt.TableName); }
}
//
This is a way you can do it, that relies on system tables, so be AWARE it may not always work in future versions of SQL. With that strong caveat in mind.
select distinct OBJECT_NAME(id) as tabName,rowcnt
from sys.sysindexes si
join sys.objects so on si.id=si.id
where indid=1 and so.type='U'
You would add to the where clause the tables you are interested in and rowcnt <1