Using User Defined Functions and performance? - sql

I'm using stored procedure to fetch data and i needed to filter dynamically. For example if i dont want to fetch some data which's id is 5, 10 or 12 im sending it as string to procedure and im converting it to table via user defined function. But i must consider performance so here is a example:
Solution 1:
SELECT *
FROM Customers
WHERE CustomerID NOT IN (SELECT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',','));
Solution 2:
CREATE TABLE #tempTable (Value NVARCHAR(4000));
INSERT INTO #tempTable
SELECT Value FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
SELECT *
FROM BusinessAds
WHERE AdID NOT IN (SELECT Value FROM #tempTable)
DROP TABLE #tempTable
Which solution is better for performance?

You would probably be better off creating the #temp table with a clustered index and appropriate datatype
CREATE TABLE #tempTable (Value int primary key);
INSERT INTO #tempTable
SELECT DISTINCT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
You can also put a clustered index on the table returned by the TVF.
As for which is better SQL Server will always assume that the TVF will return 1 row rather than recompiling after the #temp table is populated, so you would need to consider whether this assumption might cause sub optimal query plans for the case that the list is large.

Related

SQL Server - Select INTO statement stored in sys.tables

I know how to find the CREATE statement for a table in SQL Server but is there any place that stores the actual SQL code if I use SELECT INTO ... to create a table and if so how do I access it?
I see two ways of creating tables with SELECT INTO.
First: You know the Schema, then you can declare a #Table Variable and perform the Select INSERT
Second: You can create a temp table:
SELECT * INTO #TempTable FROM Customer
There are some limitations on the second choice:
- You need to drop the temp table afterwards.
- If there is a VARCHAR Column and the maximum number of characters of that given SELECT is 123 characters (example), and then you try to insert into the TEMP table afterwards with a greater number of characters, it will throw an error.
My recommendation is always declare a table in order to use, it makes it clear what is the intentions and increases readability.

SQL - Split query data stream into 2 separate tables [Theoretical Optimisation]

I am writing some SQL code to be run in MapBasic (MapInfo's Programming language). The best way to describe the question is with an example:
I want to select all records where ShipType="Barge" into a query named Barges and I want all the remaining records to be put in a query OtherShips.
I could simply use the following SQL commands:
select * from ShipsTable where ShipType = "Barge" into Barges
select * from ShipsTable where ShipType <> "Barge" into OtherShips
That's fine and all but I can't help but feel that this is inefficient. Won't SQL be searching through the database twice? Won't it find the rows of data that fit the 2nd Query during the processing of the 1st?
Instead, it would be faster if there was a command like:
select * from ShipsTable where ShipType = "Barge" into Barges ELSE into OtherShips
My question is, can you do this? Is there a command that fits this spec?
Thanks,
You could do this quite easily in SSIS with a conditional split and two different destinations.
But not really in TSQL.
However for "fun" some possibilities are looked at below.
You could create a partitioned view but the requirements that you need to meet for this are quite arduous and the execution plan just loads it all into a spool and then reads the spool twice with two different filters anyway.
CREATE TABLE Barges
(
Id INT,
ShipType VARCHAR(50) NOT NULL CHECK (ShipType = 'Barge'),
PRIMARY KEY (Id, ShipType)
)
CREATE TABLE OtherShips
(
Id INT,
ShipType VARCHAR(50) NOT NULL CHECK (ShipType <> 'Barge'),
PRIMARY KEY (Id, ShipType)
)
CREATE TABLE ShipsTable
(
ShipType VARCHAR(50) NOT NULL
)
go
CREATE VIEW ShipsView
AS
SELECT *
FROM Barges
UNION ALL
SELECT *
FROM OtherShips
GO
INSERT INTO ShipsView(Id, ShipType)
SELECT ROW_NUMBER() OVER(ORDER BY ##SPID), ShipType
FROM ShipsTable
Or you could use the OUTPUT clause and composable DML but that would require inserting both sets of rows into the first table and then cleaning out the unwanted rows afterwards (the second table would only get the correct rows and not need any clean up).
CREATE TABLE Barges2
(
ShipType VARCHAR(50) NOT NULL
)
CREATE TABLE OtherShips2
(
ShipType VARCHAR(50) NOT NULL
)
CREATE TABLE ShipsTable2
(
ShipType VARCHAR(50) NOT NULL
)
INSERT INTO Barges2
SELECT *
FROM
(
INSERT INTO OtherShips2
OUTPUT INSERTED.*
SELECT *
FROM ShipsTable2
) D
WHERE D.ShipType = 'Barge';
DELETE FROM OtherShips2 WHERE ShipType = 'Barge';
MapBasic does provide you access to MapInfo's 'Invert Selection' which would give you anything that wasn't selected from your first query (assuming your first query does return results). You can call it by using it's menu ID (found in Menu.def) which is 311 or if you include menu.def at the top of the file you can reference it through the constant M_QUERY_INVERTSELECT.
eg.
Select * from ShipsTable where ShipType = "Barge" into Barges
Run Menu Command 311
or
Run Menu Command M_QUERY_INVERTSELECT if you have included the menu definitions file.
I believe this would give you better performance than doing a second selection as per your example but you wouldn't be able to then name the results table with an alias without doing another selection. Depends on your use case whether this is worth using or not, for a large query that takes quite a while it could well save on some processing time.

Nested Loop in Where Statement killing performance

I am having serious performance issues when using a nested loop in a WHERE clause.
When I run the below code as is, it takes several minutes. The trick is I'm using the WHERE clause to pull ALL data if the report_id is NULL, but only certain report_id's if I set them in the parameter string.
The function [fn_Parse_List] turns a VARCHAR string such as '123,456,789' into a table where each row is each number in integer form, which is then used in the IN clause.
When I run the code below with report_id = '456' (the dashed out portion), the code takes seconds, but passing the temporary table and using the SELECT statement in the WHERE clause kills it.
alter procedure dbo.p_revenue
(#report_id varchar(max) = NULL)
as
select cast(value as int) Report_ID
into #report_ID_Temp
from [fn_Parse_List] (#report_id)
SELECT *
FROM BIGTABLE
where #report_id is null
or a.report_id in (select Report_ID from #report_ID_Temp)
--Where #report_id is null or a.report_id in (456)
exec p_revenue #report_id = '456'
Is there a way to optimize this? I tried a JOIN with the table #report_ID_Temp, but it still takes just as long and doesn't work when the report_id is NULL.
You're breaking three different rules.
If you want two query plans, you need two queries: OR does not give you two query plans. IF does.
If you have a temporary table, make sure it has a primary key and any appropriate indexes. In your case, you need an ALTER TABLE statement to add the primary key clustered index. Or you can CREATE TABLE to declare the structure in the first place.
If you think fn_Parse_List is a good idea, you haven't read enough Sommarskog
If I were to write the Stored Procedure for your case, I would use a Table Valued Parameter (TVP) instead of passing multiple values as a comma-seperated string.
Something like the following:
-- Create a type for the TVP
CREATE TYPE REPORT_IDS_PAR AS TABLE(
report_id INT
);
GO
-- Use the TVP type instead of VARCHAR
CREATE PROCEDURE dbo.revenue
#report_ids REPORT_IDS_PAR READONLY
AS
BEGIN
SET NOCOUNT ON;
IF NOT EXISTS(SELECT 1 FROM #report_ids)
SELECT
*
FROM
BIGTABLE;
ELSE
SELECT
*
FROM
#report_ids AS ids
INNER JOIN BIGTABLE AS bt ON
bt.report_id=ids.report_id;
-- OPTION(RECOMPILE) -- see remark below
END
GO
-- Execute the Stored Procedure
DECLARE #ids REPORT_IDS_PAR;
-- Empty table for all rows:
EXEC dbo.revenue #ids;
-- Specific report_id's for specific rows:
INSERT INTO #ids(report_id)VALUES(123),(456),(789);
EXEC dbo.revenue #ids;
GO
If you run this procedure with a TVP with a lot of rows or a wildly varying number of rows, I suggest you add the option OPTION(RECOMPILE) to the query.
I see 2 possible things that could help improve performance. Depends on which part is taking the longest. First off, SELECT INTO is a single threaded operation until SQL Server 2014. If this is taking a long time, create an explicitly defined temp table with CREATE TABLE. Secondly, depending on the number of records inserted into the temp table, you probably need an index on the Report_ID column. That can all be done in the body of the stored procedure. If you do end up using an explicitly defined temp table, I would create the index after the data is loaded.
If that doesn't help, first check that the report_id column on the BIGTABLE is indexed. Then try splitting the select into 2 and combining with a UNION ALL like this:
ALTER PROCEDURE dbo.p_revenue
(
#report_id VARCHAR(MAX) = NULL
)
AS
SELECT CAST(value AS INT) Report_ID
INTO #report_ID_Temp
FROM fn_Parse_List(#report_id);
SELECT *
FROM BIGTABLE
WHERE #report_id IS NULL
UNION ALL
SELECT *
FROM BIGTABLE
WHERE a.report_id IN ( SELECT Report_ID
FROM #report_ID_Temp );
GO
EXEC p_revenue #report_id = '456';
Are you saying I should have two queries, one where it pulls if the report_id doesn't exists and one where there is a list of report_ids?
Yes, yes, yes. The fact, that it somehow works when You enter the numbers directly, distracts You from the core problem. You need table scan when #report_id is null and index seek when it is not and You can not have both in one execution plan. The performance would inevitably have to suffer, one way or another.
I would prefer not to, as the table i'm pulling from is actually a
view with 800 lines with an additional parameter not shown above.
I do not see where is the problem, SELECT * FROM BIGTABLE and SELECT * FROM BIGVIEW seems the same. If You need parameters You can use inline table valued function. If You have more parameters with variable selectivity like #report_id, I guess You would end up with dynamic sql anyway, sooner or later.
UNION ALL as proposed by #db_brad would help, but one of those subquery is executed even when there is no need for it.
As a quick patch You can append OPTION(RECOMPILE) to the SELECT and have table scan one time and index seek the other time, but recompiling every time would induce nontrivial overhead.

Using with vs declare a temporary table: performance / difference?

I have created a sql function in SQLServer 2008 that declared a temporary table and uses it to compute a moving average on the values inside
declare #tempTable table
(
GeogType nvarchar(5),
GeogValue nvarchar(7),
dtAdmission date,
timeInterval int,
fromTime nvarchar(5),
toTime nvarchar(5),
EDSyndromeID tinyint,
nVisits int
)
insert #tempTable select * from aces.dbo.fEDVisitCounts(#geogType, #hospID,DATEADD(DD,-#windowDays + 1,#fromDate),
#toDate,#minAge,#maxAge,#gender,#nIntervalsPerDay, #nSyndromeID)
INSERT #table (dtAdmission,EDSyndromeID, MovingAvg)
SELECT list.dtadmission
, #nSyndromeID
, AVG(data.nVisits) as MovingAvg
from #tempTable as list
inner join #tempTable as data
ON list.dtAdmission between data.dtAdmission and DATEADD(DD,#windowDays - 1,data.dtAdmission)
where list.dtAdmission >= #fromDate
GROUP BY list.dtAdmission
but I also found out that you can declare the tempTable like this:
with tempTable as
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
)
Question: Is there a major difference in these two approaches? Is one faster than the other or more common / standard? I would think the declare is faster since you define what the columns you are looking for are.. Would it also be even faster if I were to omit the columns that were not used in the calculations of moving average?(not sure about this one since it has to get all of the rows anyways, though selecting less columns makes intuitive sense that it would be faster/less to do)
I also have found a create temporary table #table from here How to declare Internal table in MySQL? but I don't want the table to persist outside of the function (I am not sure if the create temporary table does this or not.)
The #table syntax creates a table variable (an actual table in tempdb) and materialises the results to it.
The WITH syntax defines a Common Table Expression which is not materialised and is just an inline View.
Most of the time you would be better off using the second option. You mention that this is inside a function. If this is a TVF then most of the time you want these to be inline rather than multi statement so they can be expanded out by the optimiser - this would instantly disallow the use of table variables.
Sometimes however (say the underlying query is expensive and you want to avoid it being executed multiple times) you might determine that materializing the intermediate results improves performance in some specific cases. There is currently no way of forcing this for CTEs (without forcing a plan guide at least)
In that eventuality you (in general) have 3 options. A #tablevariable, #localtemp table and a ##globaltemp table. However only the first of these is permitted for use inside a function.
For further information regarding the differences between table variables and #temp tables see here.
In addition to what Martin answered
;with tempTable as
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
)
SELECT * FROM tempTable
can also be written like this
SELECT * FROM
(
select * from aces.dbo.fEDVisitCounts('ALL', null,DATEADD(DD,-7,'01-09-2010'),
'04-09-2010',0,130,null,1, 0)
) AS tempTable --now you can join here with other tables
In addition,and correcting to Martin
The #table syntax creates a table variable IN MEMORY
The #Temp syntax creates a table variable in Tempdb
Thats why #tables are faster than #temp tables

Insert into ... Select *, how to ignore identity?

I have a temp table with the exact structure of a concrete table T. It was created like this:
select top 0 * into #tmp from T
After processing and filling in content into #tmp, I want to copy the content back to T like this:
insert into T select * from #tmp
This is okay as long as T doesn't have identity column, but in my case it does. Is there any way I can ignore the auto-increment identity column from #tmp when I copy to T? My motivation is to avoid having to spell out every column name in the Insert Into list.
EDIT: toggling identity_insert wouldn't work because the pkeys in #tmp may collide with those in T if rows were inserted into T outside of my script, that's if #tmp has auto-incremented the pkey to sync with T's in the first place.
SET IDENTITY_INSERT ON
INSERT command
SET IDENTITY_INSERT OFF
As identity will be generated during insert anyway, could you simply remove this column from #tmp before inserting the data back to T?
alter table #tmp drop column id
UPD: Here's an example I've tested in SQL Server 2008:
create table T(ID int identity(1,1) not null, Value nvarchar(50))
insert into T (Value) values (N'Hello T!')
select top 0 * into #tmp from T
alter table #tmp drop column ID
insert into #tmp (Value) values (N'Hello #tmp')
insert into T select * from #tmp
drop table #tmp
select * from T
drop table T
See answers here and here:
select * into without_id from with_id
union all
select * from with_id where 1 = 0
Reason:
When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true:
The SELECT statement contains a join, GROUP BY clause, or aggregate function.
Multiple SELECT statements are joined by using UNION.
The identity column is listed more than one time in the select list.
The identity column is part of an expression.
The identity column is from a remote data source.
If any one of these conditions is true, the column is created NOT NULL instead of inheriting the IDENTITY property. If an identity column is required in the new table but such a column is not available, or you want a seed or increment value that is different than the source identity column, define the column in the select list using the IDENTITY function. See "Creating an identity column using the IDENTITY function" in the Examples section below.
All credit goes to Eric Humphrey and bernd_k
Not with SELECT * - if you selected every column but the identity, it will be fine. The only way I can see is that you could do this by dynamically building the INSERT statement.
Just list the colums you want to re-insert, you should never use select * anyway. If you don't want to type them ,just drag them from the object browser (If you expand the table and drag the word, columns, you will get all of them, just delete the id column)
INSERT INTO #Table
SELECT MAX(Id) + ROW_NUMBER() OVER(ORDER BY Id)
set identity_insert on
Use this.
Might an "update where T.ID = #tmp.ID" work?
it gives me a chance to preview the data before I do the insert
I have joins between temp tables as part of my calculation; temp tables allows me to focus on the exact set data that I am working with. I think that was it. Any suggestions/comments?
For part 1, as mentioned by Kolten in one of the comments, encapsulating your statements in a transaction and adding a parameter to toggle between display and commit will meet your needs. For Part 2, I would needs to see what "calculations" you are attempting. Limiting your data to a temp table may be over complicating the situation.