Track the number of rows in a #table while its population is in progress - sql

I am working in SQL Server 2012 Management studio.
In a SQL query window, an insert into a #table is happening. It is expected to insert somewhere around 80 million rows with 3 INT columns each.
The query execution is going on.
Is there a way that I can track the number of rows in the #table?

Since you cannot run two queries in the same window simultaneously and temp tables are not accessible in other sessions if they are declared with a single #, you should try defining it with a double # in your insert query.
Then you could try querying it using WITH(NOLOCK).
Open a new query window on the same db and try
SELECT COUNT(*)
FROM ##YourTableName WITH(NOLOCK)
This will get dirty reads, but I do not think that would be a problem in your case, as you only want a rough measure of how far your INSERT has got.

One method is to query the DMVs using the temp table object id. You can get the local temp table object id from the session that created it using this query:
SELECT OBJECT_ID(N'tempdb..#table', 'U');
Then run the script below in another window, supplying the object_id value from the above query (-1180342868 in this example):
DECLARE @object_id int = -1180342868;
SELECT SUM(rows)
FROM tempdb.sys.partitions
WHERE
object_id = @object_id
AND index_id IN(0,1);
Of course, this method assumes you had the foresight to get the temp table object id before running the insert. If the query is currently running, you could run the script below and make an educated guess as to which object might be the temp table being loaded.
USE tempdb;
SELECT OBJECT_NAME(object_id), SUM(rows)
FROM tempdb.sys.partitions
WHERE
index_id IN(0,1)
AND OBJECTPROPERTYEX(object_id, 'IsUserTable') = 1
GROUP BY
OBJECT_NAME(object_id);
Be aware that this might not be a reliable way to track the load progress. Much depends on the query plan particulars. It could be that the costly operators are earlier in the plan and the actual insert won't occur until the last minute.
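If you want a running progress readout rather than a one-off count, the DMV query above can be wrapped in a polling loop. This is only a sketch: the object id is the example value from above, and the 10-second interval and 60-iteration cap are arbitrary assumptions. Run it in a second window while the insert is running and cancel it when you have seen enough.

```sql
-- Sketch: poll tempdb.sys.partitions for the row count of the temp table.
-- @object_id is the example value from above; substitute your own.
DECLARE @object_id int = -1180342868;
DECLARE @rows bigint;
DECLARE @msg varchar(200);
DECLARE @i int = 0;

WHILE @i < 60  -- arbitrary cap (assumption): about 10 minutes of polling
BEGIN
    SELECT @rows = SUM(rows)
    FROM tempdb.sys.partitions
    WHERE object_id = @object_id
      AND index_id IN (0, 1);

    SET @msg = CONCAT(CONVERT(varchar(19), GETDATE(), 120), ' rows so far: ', @rows);

    -- WITH NOWAIT sends the message to the Messages tab immediately
    RAISERROR('%s', 0, 1, @msg) WITH NOWAIT;

    WAITFOR DELAY '00:00:10';  -- arbitrary interval (assumption)
    SET @i += 1;
END
```

The same caveat applies: if the plan does most of its work before the insert operator, the count may sit at zero for a long time and then jump.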

If you wish to run the query to count rows in another window or outside the scope where the table was declared, please use a global temp table.
For Example,
CREATE TABLE ##table(
a int,
b int,
c int)
Then, in another window, you can run the following:
SELECT COUNT(*) FROM ##table WITH (NOLOCK)

Related

Nested Loop in Where Statement killing performance

I am having serious performance issues when using a nested loop in a WHERE clause.
When I run the below code as is, it takes several minutes. The trick is I'm using the WHERE clause to pull ALL data if the report_id is NULL, but only certain report_id's if I set them in the parameter string.
The function [fn_Parse_List] turns a VARCHAR string such as '123,456,789' into a table where each row is each number in integer form, which is then used in the IN clause.
When I run the code below with report_id = '456' (the dashed out portion), the code takes seconds, but passing the temporary table and using the SELECT statement in the WHERE clause kills it.
alter procedure dbo.p_revenue
(@report_id varchar(max) = NULL)
as
select cast(value as int) Report_ID
into #report_ID_Temp
from [fn_Parse_List] (@report_id)
SELECT *
FROM BIGTABLE a
where @report_id is null
or a.report_id in (select Report_ID from #report_ID_Temp)
--Where @report_id is null or a.report_id in (456)
go
exec p_revenue @report_id = '456'
Is there a way to optimize this? I tried a JOIN with the table #report_ID_Temp, but it still takes just as long and doesn't work when the report_id is NULL.
You're breaking three different rules.
If you want two query plans, you need two queries: OR does not give you two query plans. IF does.
If you have a temporary table, make sure it has a primary key and any appropriate indexes. In your case, you need an ALTER TABLE statement to add the primary key clustered index. Or you can CREATE TABLE to declare the structure in the first place.
If you think fn_Parse_List is a good idea, you haven't read enough Sommarskog.
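A rough sketch of the first two rules applied to the procedure from the question, keeping the string parameter for now. It assumes, as in the question, that fn_Parse_List returns a column named value; the DISTINCT is an added guard so that a repeated id in the list does not violate the primary key.

```sql
-- Sketch only: IF gives each branch its own plan; the temp table gets a clustered PK.
ALTER PROCEDURE dbo.p_revenue
(
    @report_id varchar(max) = NULL
)
AS
BEGIN
    SET NOCOUNT ON;

    IF @report_id IS NULL
    BEGIN
        -- No filter requested: return everything
        SELECT *
        FROM BIGTABLE;
    END
    ELSE
    BEGIN
        CREATE TABLE #report_ID_Temp
        (
            Report_ID int NOT NULL PRIMARY KEY CLUSTERED
        );

        INSERT INTO #report_ID_Temp (Report_ID)
        SELECT DISTINCT CAST(value AS int)
        FROM fn_Parse_List(@report_id);

        SELECT a.*
        FROM BIGTABLE AS a
        WHERE a.report_id IN (SELECT Report_ID FROM #report_ID_Temp);
    END
END
GO
```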
If I were to write the Stored Procedure for your case, I would use a Table Valued Parameter (TVP) instead of passing multiple values as a comma-separated string.
Something like the following:
-- Create a type for the TVP
CREATE TYPE REPORT_IDS_PAR AS TABLE(
report_id INT
);
GO
-- Use the TVP type instead of VARCHAR
CREATE PROCEDURE dbo.revenue
@report_ids REPORT_IDS_PAR READONLY
AS
BEGIN
SET NOCOUNT ON;
IF NOT EXISTS(SELECT 1 FROM @report_ids)
SELECT
*
FROM
BIGTABLE;
ELSE
SELECT
*
FROM
@report_ids AS ids
INNER JOIN BIGTABLE AS bt ON
bt.report_id=ids.report_id;
-- OPTION(RECOMPILE) -- see remark below
END
GO
-- Execute the Stored Procedure
DECLARE @ids REPORT_IDS_PAR;
-- Empty table for all rows:
EXEC dbo.revenue @ids;
-- Specific report_id's for specific rows:
INSERT INTO @ids(report_id)VALUES(123),(456),(789);
EXEC dbo.revenue @ids;
GO
If you run this procedure with a TVP with a lot of rows or a wildly varying number of rows, I suggest you add the option OPTION(RECOMPILE) to the query.
I see 2 possible things that could help improve performance. Depends on which part is taking the longest. First off, SELECT INTO is a single threaded operation until SQL Server 2014. If this is taking a long time, create an explicitly defined temp table with CREATE TABLE. Secondly, depending on the number of records inserted into the temp table, you probably need an index on the Report_ID column. That can all be done in the body of the stored procedure. If you do end up using an explicitly defined temp table, I would create the index after the data is loaded.
If that doesn't help, first check that the report_id column on the BIGTABLE is indexed. Then try splitting the select into 2 and combining with a UNION ALL like this:
ALTER PROCEDURE dbo.p_revenue
(
@report_id VARCHAR(MAX) = NULL
)
AS
SELECT CAST(value AS INT) Report_ID
INTO #report_ID_Temp
FROM fn_Parse_List(@report_id);
SELECT *
FROM BIGTABLE
WHERE @report_id IS NULL
UNION ALL
SELECT *
FROM BIGTABLE a
WHERE a.report_id IN ( SELECT Report_ID
FROM #report_ID_Temp );
GO
EXEC p_revenue @report_id = '456';
Are you saying I should have two queries, one where it pulls if the report_id doesn't exists and one where there is a list of report_ids?
Yes, yes, yes. The fact that it somehow works when you enter the numbers directly distracts you from the core problem. You need a table scan when @report_id is null and an index seek when it is not, and you cannot have both in one execution plan. Performance would inevitably suffer, one way or another.
I would prefer not to, as the table I'm pulling from is actually a
view with 800 lines with an additional parameter not shown above.
I do not see where the problem is; SELECT * FROM BIGTABLE and SELECT * FROM BIGVIEW seem the same. If you need parameters, you can use an inline table-valued function. If you have more parameters with variable selectivity like @report_id, I guess you would end up with dynamic SQL anyway, sooner or later.
UNION ALL, as proposed by @db_brad, would help, but one of those subqueries is executed even when there is no need for it.
As a quick patch you can append OPTION(RECOMPILE) to the SELECT and get a table scan one time and an index seek the other time, but recompiling every time would induce nontrivial overhead.
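For reference, the quick patch might look roughly like this when applied to the procedure from the question. It is a sketch, not a tuned solution: the statement is compiled on every call, so the optimizer can see the actual value of @report_id each time, at the cost of that compile.

```sql
-- Sketch: same single query as in the question, recompiled on every execution.
ALTER PROCEDURE dbo.p_revenue
(
    @report_id varchar(max) = NULL
)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT CAST(value AS int) AS Report_ID
    INTO #report_ID_Temp
    FROM fn_Parse_List(@report_id);

    SELECT a.*
    FROM BIGTABLE AS a
    WHERE @report_id IS NULL
       OR a.report_id IN (SELECT Report_ID FROM #report_ID_Temp)
    OPTION (RECOMPILE);  -- plan is built for the current parameter value
END
GO
```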

Unable to access one of the table in SQL Server database

I have a production database which has 200 tables. Since last week I am unable to access one of the tables. When I just select top 100 rows it keeps on running.
How can I find out why the table is not accessible? How can I find out whether there is a lock on the table? All the other tables are running fine.
From what I understood, you are not able to get any results when you query it.
There may be a lot of reasons for that.
1) It could be locked.
To do a dirty read, try querying with NOLOCK hint.
SELECT Column1 FROM TableName WITH (NOLOCK)
To check if there are locks on the table, use the script below:
declare @a table (
spid int,
[dbid] int,
objid int,
indid int,
[type] varchar(10),
resource varchar(100),
mode varchar(10), -- modes such as Sch-S or RangeS-S are longer than two characters
[status] varchar(20)
);
insert into @a
exec sp_lock
select object_name(objid) tablename, * from @a where object_name(objid) = 'TableName'
2) Queries might be slow when statistics are outdated. Try updating them.
UPDATE STATISTICS dbo.TableName;
3) The TOP operator itself. The top operator basically takes the entire set of data and sorts it and gives you the first 100. You can add query hints to get some data before it is sorted.
SELECT TOP 10 Column1 FROM TableName OPTION (FAST 1)
--Have avoided doing a `SELECT * FROM....`
SELECT 1 FROM TableName OPTION (FAST 1)
--Without `TOP`
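For point 1, the lock check can also be done with the dynamic management views instead of sp_lock. The sketch below assumes the table is dbo.TableName (a placeholder, as above), that you run it in the database that holds the table, and that your login has VIEW SERVER STATE permission.

```sql
-- Locks currently held or requested on the table (placeholder name dbo.TableName)
SELECT l.request_session_id,
       l.resource_type,
       l.request_mode,
       l.request_status
FROM sys.dm_tran_locks AS l
WHERE l.resource_type = 'OBJECT'
  AND l.resource_database_id = DB_ID()
  AND l.resource_associated_entity_id = OBJECT_ID(N'dbo.TableName');

-- Requests that are currently blocked, who is blocking them, and what they are running
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

If your stuck SELECT shows up in the second result set, the blocking_session_id column points at the session holding the conflicting lock.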
Check Permissions on the table. Right click on the table and select properties. Click on the permissions tab and make sure you have access to that table.

SQL Server Agent Job running stored proc VERY slowly

I have a stored procedure that essentially rebuilds a pivot table. It builds the new data in a temp table, then truncates the permanent table and inserts the new data, and finally drops the temp table.
When I execute the stored proc (or the T-SQL code directly) in Management Studio, it takes about a minute. While I know this isn't the most efficient of processes, I'm OK with it.
My problem comes in when I try to schedule this task to run every 20 minutes or so. When I set up a SQL Server Agent Job to execute the stored proc, it's now taking almost an hour and a half... that's right, 90 TIMES SLOWER!
I found this post: SQL Server Agent Job Running Slow, which seems to be a similar issue, but set nocount on doesn't seem to have any effect whether I call it at the beginning of the stored proc or before the exec command in the SQL Agent Job. My query doesn't use any cursors, though I am doing a cross apply on a table valued function (which also doesn't use any cursors).
I'm obviously missing something, but I don't even know where to start on this. I thought by creating the stored proc I would have avoided these types of issues.
For reference, the stored proc looks something like the following:
create table #temp
(
ID int,
data1 float,
data2 float
)
insert into #temp(ID, data1, data2)
select t.ID, d.data1, d.data2
from tbl1 t
cross apply dbo.getInterestingData(t.ID, t.param1) d
where d.useMe = 1
truncate table dataPivot
insert into dataPivot(ID, data1, data2)
select ID, data1, data2
from #temp
drop table #temp

C# SqlParameter - provide SQL (Microsoft SQL)

I am currently tasked with a project on a database whose schema cannot be changed. I need to insert a new row into a table that requires an ID to be unique, but the original creators of the structure did not set this value to autoincrement. To go around this, I have been using code akin to:
(SELECT TOP 1 [ID] from [Table] ORDER BY [ID] DESC) + 1
when giving the value of the ID field, basically having an inner query of sorts. Problem is that a few lines down, I need that ID I just inputted. If I could set a SQLParameter to output for this column, I could get the value it was set to, problem is I'm using SQL, and not a hard value like I do with other SQLParameters. Can't I use SQL in place of just a value?
This is a potential high volume exchange, so I'd rather not do 2 different queries (one to get id, then one to insert).
You say you cannot change the schema, but can you add an additional table to the project that does an autoincrement column? Then you could use that table to (safely) create your new IDs and return them to your code.
This is similar to how Oracle does IDs, and sometimes vendor applications for sql server that also run on Oracle will use that approach just to help minimize the differences between the two databases.
Update:
Ah, I just spotted your comment to the other answer here. In that case, the only other thing I can think that might work is to put your two statements (insert a new ID, and then read back the new ID) inside a transaction with the SERIALIZABLE isolation level. And that just kinda sucks, because it leaves you open to performance and locking gotchas.
Is it possible for you to create a stored procedure in the database to do this and the return value of the stored procedure will then return the ID that you need?
I'm a bit confused about where you need to use this ID. If it is inside the same stored proc, just use this method:
DECLARE @NewId int
SELECT TOP 1 @NewId = [ID] + 1 from [Table] ORDER BY [ID] DESC
SELECT @NewId
You can put more than one SQL statement in a single SqlCommand. So you could easily do something along the lines of what Abe suggested:
DECLARE @NewId int
SELECT TOP 1 @NewId = [ID] + 1 from [Table] ORDER BY [ID] DESC
INSERT INTO [Table] (ID, ...) VALUES (@NewId, ...)
SELECT @NewId
Then you just call ExecuteScalar on your SqlCommand, and it will do the INSERT and then return the ID it used.
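Note that two sessions running that batch at the same moment could read the same maximum ID. Along the lines of the SERIALIZABLE suggestion above, one way to guard against that is to hold a lock on the read until the insert commits. The following is a self-contained sketch against a hypothetical temp table #Items (the table, its Name column and the sample value are all made up for illustration), not the real schema:

```sql
-- Hypothetical stand-in for the real table
CREATE TABLE #Items
(
    ID   int         NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
);

DECLARE @NewId int;

BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK keep the range locked until COMMIT, so a concurrent
-- caller running the same batch waits instead of computing the same ID
SELECT @NewId = ISNULL(MAX(ID), 0) + 1
FROM #Items WITH (UPDLOCK, HOLDLOCK);

INSERT INTO #Items (ID, Name)
VALUES (@NewId, 'example');

COMMIT TRANSACTION;

-- This is the value ExecuteScalar would return to the caller
SELECT @NewId AS NewId;

DROP TABLE #Items;
```

The whole batch can still be sent as a single SqlCommand; the cost is that concurrent inserts are serialized on that lock.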

Using with vs declare a temporary table: performance / difference?

I have created a SQL function in SQL Server 2008 that declares a temporary table and uses it to compute a moving average on the values inside:
declare @tempTable table
(
GeogType nvarchar(5),
GeogValue nvarchar(7),
dtAdmission date,
timeInterval int,
fromTime nvarchar(5),
toTime nvarchar(5),
EDSyndromeID tinyint,
nVisits int
)
insert @tempTable select * from aces.dbo.fEDVisitCounts(@geogType, @hospID,DATEADD(DD,-@windowDays + 1,@fromDate),
@toDate,@minAge,@maxAge,@gender,@nIntervalsPerDay, @nSyndromeID)
INSERT @table (dtAdmission,EDSyndromeID, MovingAvg)
SELECT list.dtadmission
, @nSyndromeID
, AVG(data.nVisits) as MovingAvg
from @tempTable as list
inner join @tempTable as data
ON list.dtAdmission between data.dtAdmission and DATEADD(DD,@windowDays - 1,data.dtAdmission)
where list.dtAdmission >= @fromDate
GROUP BY list.dtAdmission
but I also found out that you can declare the tempTable like this:
with tempTable as
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
)
Question: Is there a major difference in these two approaches? Is one faster than the other or more common / standard? I would think the declare is faster since you define what the columns you are looking for are.. Would it also be even faster if I were to omit the columns that were not used in the calculations of moving average?(not sure about this one since it has to get all of the rows anyways, though selecting less columns makes intuitive sense that it would be faster/less to do)
I also have found a create temporary table #table from here How to declare Internal table in MySQL? but I don't want the table to persist outside of the function (I am not sure if the create temporary table does this or not.)
The @table syntax creates a table variable (an actual table in tempdb) and materialises the results to it.
The WITH syntax defines a Common Table Expression which is not materialised and is just an inline View.
Most of the time you would be better off using the second option. You mention that this is inside a function. If this is a TVF then most of the time you want these to be inline rather than multi statement so they can be expanded out by the optimiser - this would instantly disallow the use of table variables.
Sometimes however (say the underlying query is expensive and you want to avoid it being executed multiple times) you might determine that materializing the intermediate results improves performance in some specific cases. There is currently no way of forcing this for CTEs (without forcing a plan guide at least)
In that eventuality you (in general) have 3 options: a @tablevariable, a #localtemp table and a ##globaltemp table. However, only the first of these is permitted for use inside a function.
For further information regarding the differences between table variables and #temp tables see here.
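To make the inline-view point concrete, here is a small self-contained sketch of the moving-average query written against a CTE. The sample data, the 3-day window and the @visits table variable (standing in for the output of fEDVisitCounts) are all made up for illustration. Because the CTE is referenced twice, its defining query is expanded into the plan at both places rather than being computed once and reused.

```sql
DECLARE @windowDays int = 3;  -- illustrative window size (assumption)

-- Hypothetical sample data standing in for the function's result
DECLARE @visits TABLE
(
    dtAdmission date NOT NULL,
    nVisits     int  NOT NULL
);

INSERT INTO @visits (dtAdmission, nVisits)
VALUES ('2010-09-01', 10),
       ('2010-09-02', 14),
       ('2010-09-03', 9),
       ('2010-09-04', 12);

-- The CTE is just a named inline view; it is referenced as both list and data
WITH tempTable AS
(
    SELECT dtAdmission, nVisits
    FROM @visits
)
SELECT list.dtAdmission,
       AVG(CAST(data.nVisits AS float)) AS MovingAvg  -- cast avoids integer averaging
FROM tempTable AS list
INNER JOIN tempTable AS data
    ON list.dtAdmission BETWEEN data.dtAdmission
                            AND DATEADD(DD, @windowDays - 1, data.dtAdmission)
GROUP BY list.dtAdmission
ORDER BY list.dtAdmission;
```

If the defining query were an expensive function call, this double reference is exactly the case where materialising it once into a table variable might pay off.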
In addition to what Martin answered
;with tempTable as
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
)
SELECT * FROM tempTable
can also be written like this
SELECT * FROM
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
) AS tempTable --now you can join here with other tables
In addition to, and correcting, Martin:
The @table syntax creates a table variable IN MEMORY
The #Temp syntax creates a temporary table in Tempdb
That's why @tables are faster than #temp tables