Alternatives to sql cursor - sql

What are the alternatives to using cursors in SQL Server?
I already know a trick that uses the ROW_NUMBER() function to number the rows so that I can loop over them one by one. Any other ideas?

When I don't want to complicate things with SQL cursors I often populate temporary tables or table variables, then do a while loop to go through them.
For example:
declare @someresults table (
    id int,
    somevalue varchar(10)
)
insert into @someresults
select
    id,
    somevalue
from
    whatevertable
declare @currentid int
declare @currentvalue varchar(10)
while exists(select 1 from @someresults)
begin
    select top 1 @currentid = id, @currentvalue = somevalue from @someresults
    --work with those values here
    delete from @someresults where id = @currentid
end

Several options:
First, and best, re-analyze the problem from a mathematical, set-based perspective. If this can be done, it will most likely provide the best solution in both clarity and performance.
Second, use a table variable to store only the keys. Insert the keys into this table variable using a recursive common table expression if possible or, failing that, a T-SQL loop (a WHILE loop or some other constructed iterative loop). Once the table variable holds all the key values, join it to the real tables in the appropriate way to carry out whatever your real SQL design goal happens to be. Store only the keys as you recursively or iteratively build the table variable, to keep it as narrow as possible during the expensive construction phase.
Third, use a temporary table (on disk) in a similar way. This is a better choice when the table needs to contain more than a few columns and/or a very large (> 1M) number of rows, or if it needs more than a primary key index.
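As a rough illustration of the first option, here is a minimal sketch of replacing a per-row loop with one set-based statement. The #Orders table, its columns and the discount rule are all hypothetical and exist only for this example.

```sql
-- Hypothetical table; names and the discount rule are illustrative only.
CREATE TABLE #Orders (
    OrderID  int PRIMARY KEY,
    Amount   decimal(10,2) NOT NULL,
    Discount decimal(10,2) NULL
);
INSERT INTO #Orders (OrderID, Amount)
VALUES (1, 50.00), (2, 150.00), (3, 500.00);

-- One set-based statement does the work that a cursor or WHILE loop
-- would otherwise do by visiting each row individually.
UPDATE #Orders
SET Discount = CASE WHEN Amount >= 100 THEN Amount * 0.10 ELSE 0 END;

SELECT OrderID, Amount, Discount FROM #Orders ORDER BY OrderID;

DROP TABLE #Orders;
```

When the per-row work can be expressed as a CASE expression, a join or an aggregate like this, the loop usually disappears entirely.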

Related

Stored procedure using too many selects?

I recently started doing some performance tuning on a client's stored procedures, and I bumped into this chunk of code and couldn't find a way to make it work more efficiently.
declare @StationListCount int;
select @StationListCount = count(*) from @StationList;
declare @FleetsCnt int;
select @FleetsCnt=COUNT(*) from @FleetIds;
declare @StationCnt int;
select @StationCnt=COUNT(*) from @StationIds;
declare @VehiclesCnt int;
select @VehiclesCnt=COUNT(*) from @VehicleIds;
declare @TrIds table(VehicleId bigint,TrId bigint,InRange bit);
insert into @TrIds(VehicleId,TrId,InRange)
select t.VehicleID,t.FuelTransactionId,1
from dbo.FuelTransaction t
join dbo.Fleet f on f.FleetID = t.FleetID and f.CompanyID=@ActorCompanyID
where t.TransactionTime>=@From and (@To is null or t.TransactionTime<@To)
    and (@StationListCount=0 or exists (select id from @StationList where t.FuelStationID = ID))
    and (@FleetsCnt=0 or exists (select ID from @FleetIds where ID = t.FleetID))
    and (@StationCnt=0 or exists (select ID from @StationIds where ID = t.FuelStationID))
    and (@VehiclesCnt=0 or exists (select ID from @VehicleIds where ID = t.VehicleID))
    and t.VehicleID is not null
The insert command slows the whole procedure and takes 99% of the resources.
I am not sure, but I think these nested loops refer to the queries inside the where clause.
I would very much appreciate any help I can get on this.
Thank you!
There are a couple of things you should go over, comparing the performance of each. First of all, as the previous answer suggests, you should omit the count(*)-like aggregates as much as possible. If the table is big, the cost of these functions grows with the number of rows. You could even store those counts in a separate table with proper index constraints.
I also suggest splitting the select statement into multiple statements, because when you combine so many NULL checks, ORs and ANDs, your indexes may be bypassed and the query cost increases a lot. Sometimes UNIONs perform far better than such conditions.
You should try all of these and see what fits your needs.
Hope it helps.
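A minimal sketch of the "split into multiple statements" idea, reduced to a single optional filter. The #Tx table is a hypothetical stand-in for dbo.FuelTransaction and its columns are assumptions. The real procedure has four such filters, so treat this as the shape of the technique and not a drop-in replacement.

```sql
-- Hypothetical stand-in for dbo.FuelTransaction; column names are illustrative.
CREATE TABLE #Tx (TxId bigint PRIMARY KEY, FleetID int NOT NULL);
INSERT INTO #Tx (TxId, FleetID) VALUES (1, 100), (2, 100), (3, 200);

DECLARE @FleetIds TABLE (ID int NOT NULL PRIMARY KEY);
INSERT INTO @FleetIds (ID) VALUES (200);

IF NOT EXISTS (SELECT 1 FROM @FleetIds)
    -- No filter supplied: a plain query with no OR logic at all.
    SELECT t.TxId
    FROM #Tx AS t;
ELSE
    -- Filter supplied: a straightforward join instead of "count = 0 OR EXISTS (...)".
    SELECT t.TxId
    FROM #Tx AS t
    JOIN @FleetIds AS f ON f.ID = t.FleetID;

DROP TABLE #Tx;
```

Each branch gets its own simple statement and plan. The trade-off is that the number of branches grows quickly as more optional filters are added.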
The insert only uses one table for the vehicle ID, so joining the other tables is not required.
I don't see the declaration of the @table variables, but (assuming the IDs in them are unique) consider communicating this information to the optimizer; in other words, add primary key constraints to them.
Also, add the option(recompile) to the end of the query.

Difference between CTE, Temp Table and Table Variable in MSSQL

All are used to store data temporarily.
Is there any performance difference (time complexity and space complexity) between these three kinds of temporary storage?
Performance should depend on whether the result is saved on disk or in memory.
I have searched a lot but did not find a satisfactory answer.
CTE - Common Table Expressions
CTE stands for common table expression. It was introduced with SQL Server 2005. It is a temporary result set, typically the result of a complex sub-query. Unlike a temporary table, its life is limited to the current query. It is defined with the WITH keyword. A CTE improves the readability and maintainability of complex queries and sub-queries. When WITH is not the first statement in the batch, the preceding statement must be terminated with a semicolon, which is why many people habitually begin a CTE with one.
With CTE1(Address, Name, Age)--Column names for CTE, which are optional
AS
(
SELECT Addr.Address, Emp.Name, Emp.Age from Address Addr
INNER JOIN EMP Emp ON Emp.EID = Addr.EID
)
SELECT * FROM CTE1 --Using CTE
WHERE CTE1.Age > 50
ORDER BY CTE1.NAME
When to use CTE?
This is used to store result of a complex sub query for further use.
This is also used to create a recursive query.
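As an illustration of the recursive use, here is a minimal sketch that generates the numbers 1 to 10. The CTE name Numbers and its column n are arbitrary choices for this example.

```sql
WITH Numbers (n) AS
(
    SELECT 1            -- anchor member: the starting row
    UNION ALL
    SELECT n + 1        -- recursive member: refers back to the CTE itself
    FROM Numbers
    WHERE n < 10        -- termination condition
)
SELECT n
FROM Numbers;
```

Recursion depth is limited to 100 levels by default. A deeper recursion needs the MAXRECURSION query hint.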
Temporary Tables
In SQL Server, temporary tables are created at run-time and you can do all the operations which you can do on a normal table. These tables are created inside Tempdb database. Based on the scope and behavior temporary tables are of two types as given below-
Local Temp Table
Local temp tables are only available to the SQL Server session or connection (meaning a single user) that created them. They are automatically deleted when the session that created them is closed. A local temporary table name starts with a single hash ("#") sign.
CREATE TABLE #LocalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into #LocalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from #LocalTemp
The scope of a local temp table is the current session of the current user, that is, the current query window. If you close the current query window, or open a new query window and try to query the temp table created above, you will get an error.
Global Temp Table
Global temp tables are available to all SQL Server sessions or connections (meaning all users). They can be created by any SQL Server connection and are automatically deleted when the session that created them ends and no other session is still referencing them. A global temporary table name starts with a double hash ("##") sign.
CREATE TABLE ##GlobalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into ##GlobalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from ##GlobalTemp
Global temporary tables are visible to all SQL Server connections while Local temporary tables are visible to only current SQL Server connection.
Table Variable
This acts like a variable and exists for a particular batch of query execution. It is dropped once execution leaves the batch. It is also created in the tempdb database, not purely in memory. You can declare a primary key and an identity column in the table variable declaration; before SQL Server 2014 you could not declare a non-clustered index other than through a PRIMARY KEY or UNIQUE constraint.
GO
DECLARE @TProduct TABLE
(
    SNo INT IDENTITY(1,1),
    ProductID INT,
    Qty INT
)
--Insert data to Table variable @TProduct
INSERT INTO @TProduct(ProductID,Qty)
SELECT DISTINCT ProductID, Qty FROM ProductsSales ORDER BY ProductID ASC
--Select data
Select * from @TProduct
--Next batch
GO
Select * from @TProduct --gives error in next batch
Note
Temp tables are physically created in the tempdb database. They act like normal tables and can have constraints and indexes like normal tables.
A CTE is a named temporary result set used to simplify complex sub-queries. It exists only for the scope of the statement. It is not created as an object in tempdb; it is expanded into the query that references it, much like a sub-query. You cannot create an index on a CTE.
A table variable acts like a variable and exists for a particular batch of query execution. It is dropped once execution leaves the batch. It is also created in the tempdb database, not purely in memory.
Fairly broad topic to cover all the ins and outs. Here are a few high level differences which would give you more ideas for researching this.
CTEs are part of the same query and should be thought of as being very similar to a sub-query. A CTE allows for better readability and code-reuse (same CTE can be reused in different parts of the overall query).
Table variables and temporary tables should be thought of as being similar to real tables, but with optimizations that enable SQL Server to make operations against them fast, especially when used with relatively small data sets. Note that although these operate against tempdb, that doesn't automatically mean data stored there is actually persisted to disk. With each new version of SQL Server there have been additional optimizations (memory-optimized tables, for example) to make these constructs faster, especially for their mainline use case of simplifying complex queries.
See this for more information on this topic:
https://www.brentozar.com/archive/2014/06/temp-tables-table-variables-memory-optimized-table-variables/

Poor performance of SQL query with Table Variable or User Defined Type

I have a SELECT query on a view that contains 500,000+ rows. Let's keep it simple:
SELECT * FROM dbo.Document WHERE MemberID = 578310
The query runs fast, ~0s
Let's rewrite it to work with a set of values, which reflects my needs better:
SELECT * FROM dbo.Document WHERE MemberID IN (578310)
This is just as fast, ~0s
But now the set of IDs needs to be variable; let's define it as:
DECLARE @AuthorizedMembers TABLE
(
    MemberID BIGINT NOT NULL PRIMARY KEY, --primary key
    UNIQUE NONCLUSTERED (MemberID) -- and index, as if it could help...
);
INSERT INTO @AuthorizedMembers SELECT 578310
The set contains the same single value but is a table variable now. The performance of such a query drops to 2s, and in more complicated ones it goes as high as 25s or more, while with a fixed ID it stays around ~0s.
SELECT *
FROM dbo.Document
WHERE MemberID IN (SELECT MemberID FROM @AuthorizedMembers)
is just as bad as:
SELECT *
FROM dbo.Document
WHERE EXISTS (SELECT MemberID
              FROM @AuthorizedMembers
              WHERE [@AuthorizedMembers].MemberID = Document.MemberID)
or as bad as this:
SELECT *
FROM dbo.Document
INNER JOIN @AuthorizedMembers AS AM ON AM.MemberID = Document.MemberID
The performance is the same for all of the above and always much worse than with a fixed value.
Dynamic SQL helps easily: building an nvarchar like (id1,id2,id3) and a fixed query from it keeps my query times at ~0s. But I would like to avoid dynamic SQL as much as possible, and if I do use it, I would like to keep the string always the same regardless of the values (using parameters, which the above method does not allow).
Any ideas on how to get table variable performance close to that of a fixed array of values, or how to avoid building different dynamic SQL for each run?
P.S. I have tried the above with a user-defined type, with the same results.
Edit:
The results with a temporary table, defined as:
CREATE TABLE #AuthorizedMembers
(
MemberID BIGINT NOT NULL PRIMARY KEY
);
INSERT INTO #AuthorizedMembers SELECT 578310
have improved the execution time by up to 3 times (13s -> 4s), which is still significantly slower than dynamic SQL (<1s).
Your options:
Use a temporary table instead of a TABLE variable
If you insist on using a TABLE variable, add OPTION(RECOMPILE) at the end of your query
Explanation:
When the compiler compiles your statement, the TABLE variable has no rows in it and therefore doesn't have the proper cardinalities. This results in an inefficient execution plan. OPTION(RECOMPILE) forces the statement to be recompiled when it is run. At that point the TABLE variable has rows in it and the compiler has better cardinalities to produce an execution plan.
The general rule of thumb is to use temporary tables when operating on large datasets and table variables for small datasets with frequent updates. Personally I only very rarely use TABLE variables because they generally perform poorly.
I can recommend this answer on the question "What's the difference between temporary tables and table variables in SQL Server?" if you want an in-depth analysis on the differences.
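The second option can be sketched as follows. #Document is a hypothetical stand-in for the dbo.Document view from the question, and its DocumentID column is an assumption. The only change to the original query is the hint on the last line.

```sql
-- Hypothetical stand-in for the dbo.Document view from the question.
CREATE TABLE #Document (DocumentID int PRIMARY KEY, MemberID bigint NOT NULL);
INSERT INTO #Document (DocumentID, MemberID)
VALUES (1, 578310), (2, 578310), (3, 111111);

DECLARE @AuthorizedMembers TABLE (MemberID bigint NOT NULL PRIMARY KEY);
INSERT INTO @AuthorizedMembers (MemberID) VALUES (578310);

SELECT d.DocumentID, d.MemberID
FROM #Document AS d
WHERE d.MemberID IN (SELECT am.MemberID FROM @AuthorizedMembers AS am)
OPTION (RECOMPILE);  -- compile at execution time, when @AuthorizedMembers already holds its rows

DROP TABLE #Document;
```

The hint costs a compilation on every execution, so it tends to pay off for queries whose runtime dwarfs the compile time.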

Why use table valued function instead of a temp table in SQL?

I am trying to speed up my monster of a stored procedure that works on millions of records across many tables.
I've stumbled on this:
Is it possible to use a Stored Procedure as a subquery in SQL Server 2008?
My question is why using a table-valued function would be better than using a temp table.
Suppose my stored procedure #SP1
declare @temp table(a int)
insert into @temp
select a from BigTable
where someRecords like 'blue%'
update t
set someRecords = 'were blue'
from AnotherBigTable t
inner join
@temp tmp
on t.RecordID = tmp.a
After reading the above link, it seems the consensus is that instead of using my @temp as a temp table, I should create a table-valued function that does that select
(and inline it if it's a simple select like the one in this example). But my actual selects are multiple and often not simple (i.e. with subqueries, etc.)
What is the benefit?
Thanks
Generally, you would use a temporary table (#) instead of a table variable. Table variables are really only useful for
functions, which cannot create temporary objects
passing table-valued data (sets) as read-only parameters
gaming statistics for certain query edge-cases
execution plan stability (related to statistics and also the fact that INSERT INTO table variables cannot use a parallel plan)
prior to SQL Server 2012, #temp tables inherit collation from tempdb whereas @table variables use the current database collation
Other than those, a #temporary table will work as well as if not better than a variable.
Further reading: What's the difference between a temp table and table variable in SQL Server?
Probably no longer relevant... but two things I might suggest that take two different approaches.
Simple approach 1:
Try a primary key on your table valued variable:
declare @temp table(a int, primary key(a))
Simple approach 2:
In this particular case try a common table expression (CTE)...
;with
temp as (
    SELECT a as Id
    FROM BigTable
    WHERE someRecords like 'blue%'
)
UPDATE AnotherBigTable
SET someRecords = 'were Blue'
FROM AnotherBigTable
JOIN temp
ON temp.Id = AnotherBigTable.RecordId
CTEs are really great and help to isolate the specific data sets you actually want to work on from the myriad of records contained in larger tables. If you find yourself using the same CTE declaration repeatedly, consider formalizing that expression into a view. Views are an often overlooked and very valuable tool for DBAs and DB programmers to manage large, complex data sets with lots of records and relationships.

Is it possible to add an index to a temp table? And what's the difference between create #t and declare @t

I need to do a very complex query.
At one point, this query must have a join to a view that cannot be indexed unfortunately.
This view is also a complex view joining big tables.
View's output can be simplified as this:
PID (int), Kind (int), Date (date), D1,D2..DN
where PID and Date and Kind fields are not unique (there may be more than one row having same combination of pid,kind,date), but are those that will be used in join like this
left join ComplexView mkcs on mkcs.PID=q4.PersonID and mkcs.Date=q4.date and mkcs.Kind=1
left join ComplexView mkcl on mkcl.PID=q4.PersonID and mkcl.Date=q4.date and mkcl.Kind=2
left join ComplexView mkco on mkco.PID=q4.PersonID and mkco.Date=q4.date and mkco.Kind=3
Now, if I just do it like this, the query takes significant time to execute, because I assume the complex view is run three times, and out of its huge number of rows only some are actually used (say, out of 40000 only 2000 are used).
What I did is declare @temptable and insert into @temptable select * from ComplexView where Date... That way I select only the rows I am going to use from my ComplexView once per query, and then I join to this @temptable.
This reduced execution time significantly.
However, I noticed that if I make a table in my database, add a clustered index on PID, Kind, Date (non-unique clustered), and take the data from this table, then doing a delete from this table and an insert into it from the complex view takes a few seconds (3 or 4), and using this table in my query (left joining it three times) cuts the query time in half, from 1 minute to 30 seconds!
So, my question is, first of all: is it possible to create indexes on declared @temptables?
And then: I've seen people talk about "create #temptable" syntax. Maybe this is what I need? Where can I read about the difference between declare @temptable and create #temptable? What should I use for a query like mine? (This query is for an MS Reporting Services report, if it matters.)
#tablename is a physical table, stored in tempdb, that the server will drop automatically when the connection that created it is closed. @tablename is a table variable that lives for the lifetime of the batch/procedure that created it, just like a local variable (it is also backed by tempdb rather than held purely in memory).
You can only add a (non PK) index to a #temp table.
create table #blah (fld int)
create nonclustered index idx on #blah (fld)
It's not a complete answer, but #table will create a temporary table that persists in tempdb until you drop it or your session ends. @table is a table variable that will not persist longer than your script.
Also, I think this post will answer the other part of your question.
Creating an index on a table variable
Yes, you can create indexes on temp tables or table variables. http://sqlserverplanet.com/sql/create-index-on-table-variable/
The @tableName syntax is a table variable. They are rather limited. The syntax is described in the documentation for DECLARE @local_variable. You can kind of have indexes on table variables, but only indirectly by specifying PRIMARY KEY and UNIQUE constraints on columns. So, if your data in the columns that you need an index on happens to be unique, you can do this. See this answer. This may be “enough” for many use cases, but only for small numbers of rows. If you don’t have indexes on your table variable, the optimizer will generally treat table variables as if they contain one row (regardless of how many rows there actually are) which can result in terrible query plans if you have hundreds or thousands of rows in them instead.
The #tableName syntax is a locally-scoped temporary table. You can create these either using SELECT…INTO #tableName or CREATE TABLE #tableName syntax. The scope of these tables is a little bit more complex than that of variables. If you have CREATE TABLE #tableName in a stored procedure, all references to #tableName in that stored procedure will refer to that table. If you simply reference #tableName in the stored procedure (without creating it), it will look into the caller’s scope. So you can create #tableName in one procedure, call another procedure, and in that other procedure read/update #tableName. However, once the procedure that created #tableName runs to completion, that table will be automatically unreferenced and cleaned up by SQL Server. So, there is no reason to manually clean up these tables unless you have a procedure which is meant to loop/run indefinitely or for long periods of time.
You can define complex indexes on temporary tables, just as if they are permanent tables, for the most part. So if you need to index columns but have duplicate values which prevents you from using UNIQUE, this is the way to go. You do not even have to worry about name collisions on indexes. If you run something like CREATE INDEX my_index ON #tableName(MyColumn) in multiple sessions which have each created their own table called #tableName, SQL Server will do some magic so that the reuse of the global-looking identifier my_index does not explode.
Additionally, temporary tables will automatically build statistics, etc., like normal tables. The query optimizer will recognize that temporary tables can have more than just 1 row in them, which can in itself result in great performance gains over table variables. Of course, this also is a tiny amount of overhead. Though this overhead is likely worth it and not noticeable if your query’s runtime is longer than one second.
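Applied to the question, the approach might look like the sketch below. #ViewRows is a hypothetical temp table holding the filtered ComplexView rows, the D1 column and the inline q4 row are placeholders, and the sample values are invented purely so the example runs on its own.

```sql
-- Hypothetical stand-in for the filtered rows of ComplexView.
CREATE TABLE #ViewRows (
    PID    int  NOT NULL,
    Kind   int  NOT NULL,
    [Date] date NOT NULL,
    D1     int  NULL
);

INSERT INTO #ViewRows (PID, Kind, [Date], D1)
VALUES (1, 1, '2020-01-01', 10),
       (1, 1, '2020-01-01', 11),   -- a duplicate key combination is allowed
       (1, 2, '2020-01-01', 20);

-- Non-unique clustered index on the join columns.
CREATE CLUSTERED INDEX IX_ViewRows ON #ViewRows (PID, [Date], Kind);

-- q4 is faked here with a single inline row.
SELECT mkcs.D1 AS D1Kind1, mkcl.D1 AS D1Kind2
FROM (VALUES (1, CAST('2020-01-01' AS date))) AS q4 (PersonID, [date])
LEFT JOIN #ViewRows AS mkcs
    ON mkcs.PID = q4.PersonID AND mkcs.[Date] = q4.[date] AND mkcs.Kind = 1
LEFT JOIN #ViewRows AS mkcl
    ON mkcl.PID = q4.PersonID AND mkcl.[Date] = q4.[date] AND mkcl.Kind = 2;

DROP TABLE #ViewRows;
```

Because the index is not declared UNIQUE, repeated (PID, Date, Kind) combinations load without error, which is the case the table variable constraints cannot cover.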
To extend Alex K.'s answer, you can create the PRIMARY KEY on a temp table
IF OBJECT_ID('tempdb..#tempTable') IS NOT NULL
DROP TABLE #tempTable
CREATE TABLE #tempTable
(
Id INT PRIMARY KEY
,Value NVARCHAR(128)
)
INSERT INTO #tempTable
VALUES
(1, 'first value')
,(3, 'second value')
-- will cause Violation of PRIMARY KEY constraint 'PK__#tempTab__3214EC071AE8C88D'. Cannot insert duplicate key in object 'dbo.#tempTable'. The duplicate key value is (1).
--,(1, 'first value one more time')
SELECT * FROM #tempTable