SQL Server Table-Value Function and Except Combination performance - sql

I have a table (Resources, with about 18,000 records) and a table-valued function with this body:
ALTER FUNCTION [dbo].[tfn_GetPackageResources]
(
@packageId int=null,
@resourceTypeId int=null,
@resourceCategoryId int=null,
@resourceGroupId int=null,
@resourceSubGroupId int=null
)
RETURNS TABLE
AS
RETURN
(
SELECT Resources.*
FROM Resources
INNER JOIN ResourceSubGroups ON Resources.ResourceSubGroupId=ResourceSubGroups.Id
INNER JOIN ResourceGroups ON ResourceSubGroups.ResourceGroupId=ResourceGroups.Id
INNER JOIN ResourceCategories ON ResourceGroups.ResourceCategoryId=ResourceCategories.Id
INNER JOIN ResourceTypes ON ResourceCategories.ResourceTypeId=ResourceTypes.Id
WHERE
(@resourceSubGroupId IS NULL OR ResourceSubGroupId=@resourceSubGroupId) AND
(@resourceGroupId IS NULL OR ResourceGroupId=@resourceGroupId) AND
(@resourceCategoryId IS NULL OR ResourceCategoryId=@resourceCategoryId) AND
(@resourceTypeId IS NULL OR ResourceTypeId=@resourceTypeId) AND
(@packageId IS NULL OR PackageId=@packageId)
)
Now I run a query like this:
SELECT id
FROM dbo.tfn_GetPackageResources(@sourcePackageId,null,null,null,null)
WHERE id not in(
SELECT a.Id
FROM dbo.tfn_GetPackageResources(@sourcePackageId,null,null,null,null) a INNER JOIN
dbo.tfn_GetPackageResources(@comparePackageId,null,null,null,null) b
ON a.No = b.No AND
a.UnitCode=b.UnitCode AND
a.IsCompound=b.IsCompound AND
a.Title=b.Title
)
This query takes about 10 seconds! (Each part runs extremely fast on its own, but the whole query is slow.) I also tried LEFT JOIN and NOT EXISTS, but the result was the same.
If I run the query on the Resources table directly, it takes one second or less! The fast query is:
select * from resources where id not in (select id from resources)
How can I solve this?

Your UDF is expanded like a macro.
So your complete query has
9 INNER JOINs in the IN clause
4 INNER JOINs in the main SELECT.
After expansion, (... IS NULL OR ...) is applied 15 times in total across your WHERE clauses.
Your idea of clever code reuse fails because of this expansion. SQL does not usually lend itself to this kind of reuse.
Keep it simple:
SELECT
R.id
FROM
Resources R
WHERE
R.PackageId = @sourcePackageId
AND
R.id not in (
SELECT a.Id
FROM Resources a
INNER JOIN
Resources b
ON a.No = b.No AND
a.UnitCode=b.UnitCode AND
a.IsCompound=b.IsCompound AND
a.Title=b.Title
WHERE
a.PackageId = @sourcePackageId
AND
b.PackageId = @comparePackageId
)
For more, see my other answers here:
Why is a UDF so much slower than a subquery?
Profiling statements inside a User-Defined Function
Does query plan optimizer works well with joined/filtered table-valued functions?
Table Valued Function where did my query plan go?
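To check that the simplified query still answers the original question (source-package resources with no counterpart in the compare package), here is a small sketch that runs it against SQLite from Python. The sample rows are made up, the column No is renamed ItemNo only to stay clear of SQLite keywords, and the NOT EXISTS variant is an equivalent form added for comparison, not something from the answer above. It says nothing about SQL Server timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resources (
    Id INTEGER PRIMARY KEY, PackageId INTEGER, ItemNo TEXT,
    UnitCode TEXT, IsCompound INTEGER, Title TEXT
);
-- Hypothetical data: package 1 is the source, package 2 the comparison.
INSERT INTO Resources VALUES
    (1, 1, 'A1', 'kg', 0, 'Cement'),
    (2, 1, 'A2', 'm',  0, 'Pipe'),
    (3, 1, 'A3', 'pc', 1, 'Pump set'),
    (4, 2, 'A1', 'kg', 0, 'Cement'),
    (5, 2, 'A3', 'pc', 0, 'Pump set');
""")

# Same shape as the query in the answer, written directly against Resources.
not_in = """
SELECT R.Id FROM Resources R
WHERE R.PackageId = :src
  AND R.Id NOT IN (
      SELECT a.Id FROM Resources a
      INNER JOIN Resources b
          ON a.ItemNo = b.ItemNo AND a.UnitCode = b.UnitCode
         AND a.IsCompound = b.IsCompound AND a.Title = b.Title
      WHERE a.PackageId = :src AND b.PackageId = :cmp)
ORDER BY R.Id
"""

# Equivalent anti-join with NOT EXISTS, which avoids the self-join on "a".
not_exists = """
SELECT a.Id FROM Resources a
WHERE a.PackageId = :src
  AND NOT EXISTS (
      SELECT 1 FROM Resources b
      WHERE b.PackageId = :cmp
        AND a.ItemNo = b.ItemNo AND a.UnitCode = b.UnitCode
        AND a.IsCompound = b.IsCompound AND a.Title = b.Title)
ORDER BY a.Id
"""

params = {"src": 1, "cmp": 2}
ids_not_in = [row[0] for row in conn.execute(not_in, params)]
ids_not_exists = [row[0] for row in conn.execute(not_exists, params)]
print(ids_not_in, ids_not_exists)
```

Both queries return resources 2 and 3: resource 2 has no counterpart at all, and resource 3 differs from its counterpart in IsCompound.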

In your function, declare the type of the table it returns, and include a primary key. This way, the ID filter will be able to look up the IDs more efficiently.
See http://msdn.microsoft.com/en-us/library/ms191165(v=sql.105).aspx for the syntax.

One thing you should try is breaking the one complicated query into multiple simple ones that store their results in temporary tables. This way, one complicated execution plan is replaced by several simple plans whose total execution time might be shorter than that of the complicated plan:
SELECT *
INTO #temp1
FROM dbo.tfn_GetPackageResources(@sourcePackageId,null,null,null,null)
SELECT *
INTO #temp2
FROM dbo.tfn_GetPackageResources(@comparePackageId,null,null,null,null)
SELECT a.Id
INTO #ids
FROM #temp1 a
INNER JOIN
#temp2 b ON
a.No = b.No
AND a.UnitCode=b.UnitCode
AND a.IsCompound=b.IsCompound
AND a.Title=b.Title
SELECT id
FROM #temp1
WHERE id not in(
SELECT Id
FROM #ids
)
-- you can also try replacing the above query with this one if it performs faster
SELECT id
FROM #temp1 t
WHERE NOT EXISTS
(
SELECT Id FROM #ids i WHERE i.Id = t.id
)

Related

Use inner join only if temporary table have values

I have a dynamic query where I first create a temporary table and then fill it:
CREATE TABLE [#SearchKeys] ([DesignKey] INT);
INSERT INTO [#SearchKeys]
SELECT
[pd].[DesignKey]
FROM
[Project] AS [p]
....
Once it has data, I use it in the INNER JOIN section of my dynamic query like this:
INNER JOIN
(SELECT DesignKey FROM #SearchKeys) AS [S] ON [S].[DesignKey] = [PD].[DesignKey]
The problem is that I only want to add this INNER JOIN if the temporary table has values; if not, it should simply be left out. How can I achieve that? Regards
I don't see the dynamic query in your question, so here is just pseudocode:
-- Check whether #SearchKeys contains any rows
IF EXISTS (SELECT 1 FROM #SearchKeys)
BEGIN
-- query with the INNER JOIN
INNER JOIN (SELECT DesignKey FROM #SearchKeys) AS [S] ON [S].[DesignKey] = [PD].[DesignKey]
END
ELSE
BEGIN
-- query without the INNER JOIN
END
My suggestion, which is fully inlined, was this:
DECLARE @mockupTable TABLE(ID INT IDENTITY,Content INT);
INSERT INTO @mockupTable VALUES(10),(20),(30);
DECLARE @SearchKeys TABLE(DesignKey INT);
--Keep it empty in the first run, then uncomment the insert to see the difference
--INSERT INTO @SearchKeys VALUES(20)
SELECT *
FROM @mockupTable t
LEFT JOIN @SearchKeys k ON t.Content=k.DesignKey
WHERE ((SELECT COUNT(*) FROM @SearchKeys)=0 OR DesignKey IS NOT NULL);
The LEFT JOIN returns all rows in any case. The WHERE then decides: if the SearchKeys table is empty, every row passes; otherwise only rows with a corresponding key are returned.
Hint: If needed, you can easily turn your keys into an exclusion list by using IS NULL instead of IS NOT NULL. In that case you'd introduce a variable and use something like OR ((@antipattern=0 AND ...) OR (@antipattern=1 AND ...))
The other answer by Developer_29 will be better optimized and thus faster, but in many cases we don't want multi-statement approaches.
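The behaviour of this "optional filter" pattern is easy to try out. Below is a minimal sketch using SQLite from Python; the table names Mockup and SearchKeys stand in for the table variables and are my own placeholders. The same statement is run twice, once with an empty key table and once with a key inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Mockup (ID INTEGER PRIMARY KEY, Content INTEGER);
INSERT INTO Mockup (Content) VALUES (10), (20), (30);
CREATE TABLE SearchKeys (DesignKey INTEGER);  -- starts out empty
""")

# One statement covers both cases: no keys means no filtering,
# otherwise only rows that found a partner in SearchKeys survive.
query = """
SELECT t.Content
FROM Mockup t
LEFT JOIN SearchKeys k ON t.Content = k.DesignKey
WHERE (SELECT COUNT(*) FROM SearchKeys) = 0 OR k.DesignKey IS NOT NULL
ORDER BY t.Content
"""

unfiltered = [row[0] for row in conn.execute(query)]

conn.execute("INSERT INTO SearchKeys VALUES (20)")
filtered = [row[0] for row in conn.execute(query)]

print(unfiltered, filtered)
```

With the empty key table all three rows come back; after inserting key 20 only the row with Content 20 remains.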

can I use a variable for the integer expression in a left sql function

I have the following query:
SELECT top 2500 *
FROM table a
LEFT JOIN table b
ON a.employee_id = b.employee_id
WHERE left(a.employee_rc,6) IN
(
SELECT employeeID, access
FROM accesslist
WHERE employeeID = '#client.id#'
)
The sub select in the where clause can return one or several access values, ex:
js1234 BLKHSA
js1234 HDF48R7
js1234 BLN6
In the primary WHERE clause I need to be able to change the integer expression from 6 to 5, 4, or 7, depending on the length of the values returned by the subselect. I am at a loss as to whether this is the right way to go about it. I have tried using OR conditions, but that really slows down the query.
Try using exists instead:
SELECT top 2500 *
FROM table a LEFT JOIN
table b
ON a.employee_id = b.employee_id
WHERE EXISTS (Select 1
FROM accesslist
WHERE employeeID = '#client.id#' and
a.employee_rc like concat(employeeID, '%')
) ;
I don't see how your original query worked. The subquery is returning two columns and that normally isn't allowed in SQL for an in.
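The idea of replacing LEFT(..., n) with a prefix LIKE can be sketched as follows, using SQLite from Python. I am assuming here that the access column holds the prefixes to match against employee_rc (the question's sample data suggests so); the table employees and all rows are invented for the illustration, and || is SQLite's concatenation operator where SQL Server would use concat or +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, employee_rc TEXT);
INSERT INTO employees VALUES
    (1, 'BLKHSA001'), (2, 'HDF48R7XYZ'), (3, 'BLN6-22'), (4, 'ZZZ999');
CREATE TABLE accesslist (employeeID TEXT, access TEXT);
INSERT INTO accesslist VALUES
    ('js1234', 'BLKHSA'), ('js1234', 'HDF48R7'), ('js1234', 'BLN6'),
    ('other', 'ZZZ');
""")

# Each access value acts as a prefix of whatever length it happens to have,
# so no hard-coded LEFT(..., 6) is needed.
query = """
SELECT e.employee_id
FROM employees e
WHERE EXISTS (
    SELECT 1 FROM accesslist al
    WHERE al.employeeID = ?
      AND e.employee_rc LIKE al.access || '%')
ORDER BY e.employee_id
"""

matched = [row[0] for row in conn.execute(query, ("js1234",))]
print(matched)
```

Employees 1, 2 and 3 match prefixes of length 6, 7 and 4 respectively; employee 4 only matches a prefix belonging to another user. Note that a % or _ inside an access value would itself act as a wildcard.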
Move the subquery to a JOIN:
SELECT TOP 2500 *
FROM table a
LEFT JOIN table b ON a.employee_id = b.employee_id
LEFT JOIN accesslist al ON al.access LIKE concat('%', a.employee_id)
WHERE al.employeeID = '#client.id#'
Like Gordon, I don't quite see how your query worked, so I'm not quite sure if it should be access or employeeID which is matched.
This construct will let you do what you said you want to do: have an integer value depend on something from a subquery. It's the general idea only; the details are up to you.
select field1, field2
, case when subqueryField1 = 'fred' then 1
when subqueryField1 = 'barney' then 2
else 3 end integerValue
from table1 t1 join (
select idField subqueryField1, etc
from wherever ) t2 on t1.idField = t2.idField
where whatever
Also, a couple of things in your query are questionable. First, a top n query without an order by clause doesn't tell the database what records to return. Second, 2500 rows is a lot of data to return to ColdFusion. Are you sure you need it all? Third, selecting * instead of just the fields you need slows down performance. If you think you need every field, think again. Since the employee ids will always match, you don't need both of them.

Select count query or computed column

I have two tables:
Categories table
Analit int
PositionInMenu int
Item table
Analit int
CategoryAn int
I have about 50 categories and 2000 items. For each category, I need the count of items it contains. I see two options:
1)SELECT *, (SELECT COUNT(Analit) FROM Item_table t2 WHERE t2.CategoryAn = t1.Analit) as tCount FROM Categories_table t1 ORDER BY PositionInMenu
2) Add into Categories table computed column with function call:
([dbo].[Categories_GetItemsCountInCategory]([Analit]))
and function code:
CREATE FUNCTION dbo.Categories_GetItemsCountInCategory
(
@categoryId int = null
)
RETURNS int
AS
BEGIN
RETURN (SELECT COUNT(Analit)
FROM Items
WHERE CategoryAn = @categoryId)
END
Then I can simply read the value of the added column in my query:
SELECT *
FROM Categories_table
ORDER BY PositionInMenu
So, the question. What's better for me?
You are pretty much always better off having data access inline in the query rather than in scalar UDFs.
The query optimiser does not expand out scalar (or multi statement) UDFs so you always enforce a nested loops join plan rather than allowing it to consider alternatives.
You could also consider an OUTER JOIN ... GROUP BY rather than a correlated subquery.
SELECT t1.*,
ISNULL(tCount, 0) AS tCount
FROM Categories_table t1
LEFT JOIN (SELECT COUNT(Analit) AS tCount,
CategoryAn
FROM Item_table
GROUP BY CategoryAn) t2
ON t2.CategoryAn = t1.Analit
ORDER BY PositionInMenu
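To see that the correlated subquery and the OUTER JOIN ... GROUP BY form return the same counts, including zero for empty categories, here is a small sketch with SQLite from Python. The rows are invented, and IFNULL is SQLite's counterpart to the ISNULL used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories_table (Analit INTEGER PRIMARY KEY, PositionInMenu INTEGER);
CREATE TABLE Item_table (Analit INTEGER PRIMARY KEY, CategoryAn INTEGER);
INSERT INTO Categories_table VALUES (1, 2), (2, 1), (3, 3);
-- Category 3 deliberately has no items.
INSERT INTO Item_table VALUES (10, 1), (11, 1), (12, 2);
""")

# Option 1 from the question: correlated subquery per category row.
correlated = """
SELECT t1.Analit,
       (SELECT COUNT(Analit) FROM Item_table t2
        WHERE t2.CategoryAn = t1.Analit) AS tCount
FROM Categories_table t1
ORDER BY PositionInMenu
"""

# Alternative from the answer: aggregate once, then outer-join the result.
grouped = """
SELECT t1.Analit, IFNULL(t2.tCount, 0) AS tCount
FROM Categories_table t1
LEFT JOIN (SELECT COUNT(Analit) AS tCount, CategoryAn
           FROM Item_table
           GROUP BY CategoryAn) t2
    ON t2.CategoryAn = t1.Analit
ORDER BY PositionInMenu
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_grouped = conn.execute(grouped).fetchall()
print(rows_correlated, rows_grouped)
```

Both statements list category 2 with one item, category 1 with two, and category 3 with zero, in menu order.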

Converting a nested sql where-in pattern to joins

I have a query that is returning the correct data to me, but being a developer rather than a DBA I'm wondering if there is any reason to convert it to joins rather than nested selects and if so, what it would look like.
My code currently is
select * from adjustments where store_id in (
select id from stores where original_id = (
select original_id from stores where name ='abcd'))
Any references to the better use of joins would be appreciated too.
Besides any likely performance improvements, I find the following much easier to read.
SELECT *
FROM adjustments a
INNER JOIN stores s ON s.id = a.store_id
INNER JOIN stores s2 ON s2.original_id = s.original_id
WHERE s2.name = 'abcd'
Test script showing my original fault in omitting original_id
DECLARE @Adjustments TABLE (store_id INTEGER)
DECLARE @Stores TABLE (id INTEGER, name VARCHAR(32), original_id INTEGER)
INSERT INTO @Adjustments VALUES (1), (2), (3)
INSERT INTO @Stores VALUES (1, 'abcd', 1), (2, '2', 1), (3, '3', 1)
/*
OP's Original statement returns store_id's 1, 2 & 3
due to original_id being all the same
*/
SELECT * FROM @Adjustments WHERE store_id IN (
SELECT id FROM @Stores WHERE original_id = (
SELECT original_id FROM @Stores WHERE name ='abcd'))
/*
Faulty first attempt with removing original_id from the equation
only returns store_id 1
*/
SELECT a.store_id
FROM @Adjustments a
INNER JOIN @Stores s ON s.id = a.store_id
WHERE s.name = 'abcd'
If you would use joins, it would look like this:
select *
from adjustments
inner join stores on stores.id = adjustments.store_id
inner join stores as stores2 on stores2.original_id = stores.original_id
where stores2.name = 'abcd'
(Apparently you can omit the second SELECT on the stores table (I left it out of my query) because if I'm interpreting your table structure correctly,
select id from stores where original_id = (select original_id from stores where name ='abcd')
is the same as
select * from stores where name ='abcd'.)
--> edited my query back to the original form, thanks to Lieven for pointing out my mistake in his answer!
I prefer using joins, but for simple queries like that, there is normally no performance difference. SQL Server treats both queries the same internally.
If you want to be sure, you can look at the execution plan.
If you run both queries together, SQL Server will also tell you which query took more resources than the other (in percent).
A slightly different approach:
select * from adjustments a where exists
(select null from stores s1, stores s2
where a.store_id = s1.id and s1.original_id = s2.original_id and s2.name ='abcd')
As Microsoft says here:
Many Transact-SQL statements that include subqueries can be
alternatively formulated as joins. Other questions can be posed only
with subqueries. In Transact-SQL, there is usually no performance
difference between a statement that includes a subquery and a
semantically equivalent version that does not. However, in some cases
where existence must be checked, a join yields better performance.
Otherwise, the nested query must be processed for each result of the
outer query to ensure elimination of duplicates. In such cases, a join
approach would yield better results.
Your case is exactly one where the join and the subquery give the same performance.
An example where a subquery cannot be converted to a "simple" JOIN:
select Country,TR_Country.Name as Country_Translated_Name,TR_Country.Language_Code
from Country
JOIN TR_Country ON Country.Country=Tr_Country.Country
where country =
(select top 1 country
from Northwind.dbo.Customers C
join
Northwind.dbo.Orders O
on C.CustomerId = O.CustomerID
group by country
order by count(*))
As you can see, every country can have a different number of name translations, so we cannot just join and count records (in that case, countries with more translations would have inflated record counts).
Of course, you can transform this example to:
JOIN with derived table
CTE
but that is another story :-)

INNER JOIN vs IN

SELECT C.* FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = @StockID
VS
SELECT * FROM Category
WHERE CategoryID IN
(SELECT CategoryID FROM StockToCategory WHERE StockID = @StockID)
Which is considered the correct (syntactically) and most performant approach and why?
The syntax in the latter example seems more logical to me but my assumption is the JOIN will be faster.
I have looked at the query plans and haven't been able to decipher anything from them.
Query Plan 1
Query Plan 2
The two syntaxes serve different purposes. Using the Join syntax presumes you want something from both the StockToCategory and Category table. If there are multiple entries in the StockToCategory table for each category, the Category table values will be repeated.
Using the IN function presumes that you want only items from the Category whose ID meets some criteria. If a given CategoryId (assuming it is the PK of the Category table) exists multiple times in the StockToCategory table, it will only be returned once.
In your exact example they will produce the same output; however, IMO the latter syntax makes your intent (only wanting categories) clearer.
Btw, there is a third syntax, similar to using the IN function:
Select ...
From Category
Where Exists (
Select 1
From StockToCategory
Where StockToCategory.CategoryId = Category.CategoryId
And StockToCategory.StockID = @StockID
)
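The difference in row multiplicity between the three forms can be shown with a small sketch, here using SQLite from Python. The data is invented, and the duplicated (7, 1) mapping row is deliberate; without such duplicates all three queries return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE StockToCategory (StockID INTEGER, CategoryID INTEGER);
INSERT INTO Category VALUES (1, 'Tools'), (2, 'Paint');
-- Stock 7 is mapped to category 1 twice on purpose.
INSERT INTO StockToCategory VALUES (7, 1), (7, 1), (7, 2), (8, 2);
""")

join_q = """
SELECT C.CategoryID FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = ?
ORDER BY C.CategoryID
"""
in_q = """
SELECT CategoryID FROM Category
WHERE CategoryID IN
    (SELECT CategoryID FROM StockToCategory WHERE StockID = ?)
ORDER BY CategoryID
"""
exists_q = """
SELECT CategoryID FROM Category
WHERE EXISTS (
    SELECT 1 FROM StockToCategory
    WHERE StockToCategory.CategoryID = Category.CategoryID
      AND StockToCategory.StockID = ?)
ORDER BY CategoryID
"""

via_join = [row[0] for row in conn.execute(join_q, (7,))]
via_in = [row[0] for row in conn.execute(in_q, (7,))]
via_exists = [row[0] for row in conn.execute(exists_q, (7,))]
print(via_join, via_in, via_exists)
```

The JOIN repeats category 1 once per matching mapping row, while IN and EXISTS return each category once.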
Syntactically (and semantically too) these are both correct. In terms of performance they are effectively equivalent; in fact, I would expect SQL Server to generate the exact same physical plans for these two queries.
I think these are just two ways to specify the same desired result.
For SQLite:
table device_group_folders contains 10 records
table device_groups contains ~100000 records
INNER JOIN: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs INNER JOIN device_groups ON device_groups.parent = select_childs.uuid;
WHERE: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs, device_groups WHERE device_groups.parent = select_childs.uuid;
IN: <1 ms
SELECT device_groups.uuid FROM device_groups WHERE device_groups.parent IN (WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT * FROM select_childs);