I have two tables:
Categories table
Analit int
PositionInMenu int
Item table
Analit int
CategoryAn int
I have about 50 categories and 2000 items. For each category, I need the number of items it contains. I see two options:
1) Use a correlated subquery:
SELECT *, (SELECT COUNT(Analit) FROM Item_table t2 WHERE t2.CategoryAn = t1.Analit) AS tCount FROM Categories_table t1 ORDER BY PositionInMenu
2) Add a computed column to the Categories table that calls a function:
([dbo].[Categories_GetItemsCountInCategory]([Analit]))
with this function code:
CREATE FUNCTION dbo.Categories_GetItemsCountInCategory
(
@categoryId int = null
)
RETURNS int
AS
BEGIN
RETURN (SELECT COUNT(Analit)
FROM Items
WHERE CategoryAn = @categoryId)
END
Then I can simply read the added column in my query:
SELECT *
FROM Categories_table
ORDER BY PositionInMenu
So the question is: which approach is better?
You are pretty much always better off having data access inline in the query rather than in scalar UDFs.
The query optimiser does not expand scalar (or multi-statement) UDFs, so you always force a nested loops join plan rather than letting it consider alternatives.
You could also consider an OUTER JOIN ... GROUP BY rather than a correlated subquery.
SELECT t1.*,
ISNULL(tCount, 0) AS tCount
FROM Categories_table t1
LEFT JOIN (SELECT COUNT(Analit) AS tCount,
CategoryAn
FROM Item_table
GROUP BY CategoryAn) t2
ON t2.CategoryAn = t1.Analit
ORDER BY PositionInMenu
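As a quick sanity check that the correlated subquery and the OUTER JOIN ... GROUP BY form return the same counts, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) rather than SQL Server, so IFNULL stands in for ISNULL, and the sample rows are made up; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories_table (Analit INTEGER PRIMARY KEY, PositionInMenu INTEGER);
    CREATE TABLE Item_table (Analit INTEGER PRIMARY KEY, CategoryAn INTEGER);
    -- Made-up sample data: category 3 has no items.
    INSERT INTO Categories_table VALUES (1, 20), (2, 10), (3, 30);
    INSERT INTO Item_table VALUES (1, 1), (2, 1), (3, 2);
""")

# Option 1 from the question: correlated subquery.
correlated = conn.execute("""
    SELECT t1.Analit,
           (SELECT COUNT(Analit) FROM Item_table t2
             WHERE t2.CategoryAn = t1.Analit) AS tCount
      FROM Categories_table t1
     ORDER BY PositionInMenu
""").fetchall()

# Alternative from the answer: pre-aggregate, then outer join.
joined = conn.execute("""
    SELECT t1.Analit, IFNULL(t2.tCount, 0) AS tCount
      FROM Categories_table t1
      LEFT JOIN (SELECT COUNT(Analit) AS tCount, CategoryAn
                   FROM Item_table
                  GROUP BY CategoryAn) t2
        ON t2.CategoryAn = t1.Analit
     ORDER BY PositionInMenu
""").fetchall()

print(correlated)
print(joined)
```

Both queries list every category, including the empty one with a count of 0. This only shows the results agree; which plan is faster on SQL Server still has to be measured there.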
Related
I have a table (Resources, with about 18000 records) and a table-valued function with this body:
ALTER FUNCTION [dbo].[tfn_GetPackageResources]
(
@packageId int=null,
@resourceTypeId int=null,
@resourceCategoryId int=null,
@resourceGroupId int=null,
@resourceSubGroupId int=null
)
RETURNS TABLE
AS
RETURN
(
SELECT Resources.*
FROM Resources
INNER JOIN ResourceSubGroups ON Resources.ResourceSubGroupId=ResourceSubGroups.Id
INNER JOIN ResourceGroups ON ResourceSubGroups.ResourceGroupId=ResourceGroups.Id
INNER JOIN ResourceCategories ON ResourceGroups.ResourceCategoryId=ResourceCategories.Id
INNER JOIN ResourceTypes ON ResourceCategories.ResourceTypeId=ResourceTypes.Id
WHERE
(@resourceSubGroupId IS NULL OR ResourceSubGroupId=@resourceSubGroupId) AND
(@resourceGroupId IS NULL OR ResourceGroupId=@resourceGroupId) AND
(@resourceCategoryId IS NULL OR ResourceCategoryId=@resourceCategoryId) AND
(@resourceTypeId IS NULL OR ResourceTypeId=@resourceTypeId) AND
(@packageId IS NULL OR PackageId=@packageId)
)
Now I run a query like this:
SELECT id
FROM dbo.tfn_GetPackageResources(@sourcePackageId,null,null,null,null)
WHERE id not in(
SELECT a.Id
FROM dbo.tfn_GetPackageResources(@sourcePackageId,null,null,null,null) a INNER JOIN
dbo.tfn_GetPackageResources(@comparePackageId,null,null,null,null) b
ON a.No = b.No AND
a.UnitCode=b.UnitCode AND
a.IsCompound=b.IsCompound AND
a.Title=b.Title
)
This query takes about 10 seconds! (Each part of the query runs extremely fast on its own, but the whole query is slow.) I also tried LEFT JOIN and NOT EXISTS, but the result was the same.
If I run the query against the Resources table directly, it takes one second or less. The fast query is:
select * from resources where id not in (select id from resources)
How can I solve this?
Your UDF is expanded like a macro.
So your complete query has
9 INNER JOINs in the IN clause
4 INNER JOINs in the main SELECT.
You apply (... IS NULL OR ...) 15 times in total across your WHERE clauses.
Your idea of clever code reuse fails because of this expansion. SQL does not usually lend itself to this kind of reuse.
Keep it simple:
SELECT
R.id
FROM
Resources R
WHERE
R.PackageId = @sourcePackageId
AND
R.id not in (
SELECT a.Id
FROM Resources a
INNER JOIN
Resources b
ON a.No = b.No AND
a.UnitCode=b.UnitCode AND
a.IsCompound=b.IsCompound AND
a.Title=b.Title
WHERE
a.PackageId = @sourcePackageId
AND
b.PackageId = @comparePackageId
)
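To see what the flattened query returns, here is a small sketch run against SQLite through Python's sqlite3 module (an assumption; the question is about SQL Server). The Resources columns are cut down to the ones the query touches, the rows are invented, and the column No is quoted only as a precaution because it is also a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Resources (
        Id INTEGER PRIMARY KEY, PackageId INTEGER,
        "No" TEXT, UnitCode TEXT, IsCompound INTEGER, Title TEXT);
    -- Made-up rows: package 1 is the source, package 2 the one to compare with.
    INSERT INTO Resources VALUES
        (1, 1, 'A1', 'kg', 0, 'Cement'),
        (2, 1, 'A2', 'm',  0, 'Pipe'),
        (3, 2, 'A1', 'kg', 0, 'Cement'),
        (4, 2, 'A3', 'm',  1, 'Cable');
""")

# Source-package resources that have no matching row in the compare package.
only_in_source = conn.execute("""
    SELECT R.Id
      FROM Resources R
     WHERE R.PackageId = :src
       AND R.Id NOT IN (
           SELECT a.Id
             FROM Resources a
            INNER JOIN Resources b
               ON a."No" = b."No"
              AND a.UnitCode = b.UnitCode
              AND a.IsCompound = b.IsCompound
              AND a.Title = b.Title
            WHERE a.PackageId = :src
              AND b.PackageId = :cmp)
""", {"src": 1, "cmp": 2}).fetchall()

print(only_in_source)
```

Resource 1 has a twin in package 2, so only resource 2 comes back. The four lookup-table joins from the function are gone because the package filter does not need them.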
For more, see my other answers here:
Why is a UDF so much slower than a subquery?
Profiling statements inside a User-Defined Function
Does query plan optimizer works well with joined/filtered table-valued functions?
Table Valued Function where did my query plan go?
In your function, declare the type of the table it returns, and include a primary key. This way, the ID filter will be able to look up the IDs more efficiently.
See http://msdn.microsoft.com/en-us/library/ms191165(v=sql.105).aspx for the syntax.
One thing to try is breaking the complicated query into several simple ones that store their results in temporary tables. That way one complicated execution plan is replaced by several simple plans whose total execution time might be shorter than that of the complicated plan:
SELECT *
INTO #temp1
FROM dbo.tfn_GetPackageResources(@sourcePackageId,null,null,null,null)
SELECT *
INTO #temp2
FROM dbo.tfn_GetPackageResources(@comparePackageId,null,null,null,null)
SELECT a.Id
INTO #ids
FROM #temp1 a
INNER JOIN
#temp2 b ON
a.No = b.No
AND a.UnitCode=b.UnitCode
AND a.IsCompound=b.IsCompound
AND a.Title=b.Title
SELECT id
FROM #temp1
WHERE id not in(
SELECT Id
FROM #ids
)
-- you can also try replacing the above query with this one if it performs faster
SELECT id
FROM #temp1 t
WHERE NOT EXISTS
(
SELECT Id FROM #ids i WHERE i.Id = t.id
)
I have a SQL view that I'm using to retrieve data. Let's say it's a large list of products, which are linked to the customers who have bought them. The view should return only one row per product, no matter how many customers it is linked to. I'm using the row_number function to achieve this. (This example is simplified; the general situation is a query that should return only one row for each unique value of some column X. Which row is returned is not important.)
CREATE VIEW productView AS
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE temp.product_numbering = 1
Now let's say that the total number of rows in this view is ~1 million, and running select * from productView takes 10 seconds. A query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query is evaluated as this:
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE product_numbering = 1 and products.Id = 10
I think this is causing the inner subquery to be evaluated in full each time. Ideally I'd like to use something along the following lines
SELECT
Row_number() OVER(PARTITION BY products.productID ORDER BY products.productID) AS product_numbering,
customer.id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
WHERE product_numbering = 1
But this doesn't seem to be allowed. Is there any way to do something similar?
EDIT -
After much experimentation, the actual problem I believe I am having is how to force a join to return exactly 1 row. I tried to use outer apply, as suggested below. Some sample code.
CREATE TABLE Products (id int not null PRIMARY KEY)
CREATE TABLE Customers (
id int not null PRIMARY KEY,
productId int not null,
value varchar(20) NOT NULL)
declare @count int = 1
while @count <= 150000
begin
insert into Customers (id, productID, value)
values (@count,@count/2, 'Value ' + cast(@count/2 as varchar))
insert into Products (id)
values (@count)
SET @count = @count + 1
end
CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC)
With the above sample set, the 'get everything' query below
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
takes ~1000ms to run. Adding an explicit condition:
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
where Customers.value = 'Value 45872'
takes the same amount of time. 1000ms for a fairly simple query is already too much, and it scales the wrong way (upwards) when more joins of this kind are added.
Try the following approach, using a Common Table Expression (CTE). With the test data you provided, it returns specific ProductIds in less than a second.
create view ProductTest as
with cte as (
select
row_number() over (partition by p.id order by p.id) as RN,
c.*
from
Products p
inner join Customers c
on p.id = c.productid
)
select *
from cte
where RN = 1
go
select * from ProductTest where ProductId = 25
What if you did something like:
SELECT ...
FROM products
OUTER APPLY (SELECT TOP 1 * from customer where customerid = products.buyerid) as customer
...
Then the filter on productId should help. It might be worse without filtering, though.
The problem is that your data model is flawed. You should have three tables:
Customers (customerId, ...)
Products (productId,...)
ProductSales (customerId, productId)
Furthermore, the sales table should probably be split into a 1-to-many pair (Sales and SalesDetails). Unless you fix your data model, you're just going to run in circles chasing red-herring problems. If the system is not your design, fix it. If the boss doesn't let you fix it, then fix it. If you cannot fix it, then fix it. There isn't an easy way out of the bad data model you're proposing.
This will probably be fast enough if you really don't care which customer you bring back:
select p1.*, c1.*
FROM products p1
Left Join (
select p2.id, max(c2.id) max_customer_id
From products p2
Join customer c2 on
c2.productID = p2.id
group by p2.id
) product_max_customer on
product_max_customer.id = p1.id
Left join customer c1 on
c1.id = product_max_customer.max_customer_id
;
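Here is a minimal sketch of the "pick one customer per product with MAX" idea, run against SQLite through Python's sqlite3 module instead of SQL Server. It uses the Products and Customers tables from the question's sample script with a few invented rows, and it simplifies the derived table to group Customers directly, which is my variation rather than the query exactly as posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER PRIMARY KEY);
    CREATE TABLE Customers (
        id INTEGER PRIMARY KEY,
        productId INTEGER NOT NULL,
        value TEXT NOT NULL);
    INSERT INTO Products VALUES (1), (2), (3);
    -- Made-up data: product 1 has two customers, product 3 has none.
    INSERT INTO Customers VALUES (10, 1, 'a'), (11, 1, 'b'), (12, 2, 'c');
""")

# One row per product: join to the customer with the highest id, if any.
rows = conn.execute("""
    SELECT p.id, c.id, c.value
      FROM Products p
      LEFT JOIN (SELECT productId, MAX(id) AS max_customer_id
                   FROM Customers
                  GROUP BY productId) pick
        ON pick.productId = p.id
      LEFT JOIN Customers c
        ON c.id = pick.max_customer_id
     ORDER BY p.id
""").fetchall()

print(rows)
```

Each product appears exactly once, and a product without customers still shows up with NULLs. Choosing MAX(id) is arbitrary; any deterministic pick would do since the question says the chosen row does not matter.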
SELECT C.* FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = @StockID
VS
SELECT * FROM Category
WHERE CategoryID IN
(SELECT CategoryID FROM StockToCategory WHERE StockID = @StockID)
Which is considered the correct (syntactically) and most performant approach and why?
The syntax in the latter example seems more logical to me but my assumption is the JOIN will be faster.
I have looked at the query plans and haven't been able to decipher anything from them.
Query Plan 1
Query Plan 2
The two syntaxes serve different purposes. Using the JOIN syntax presumes you want something from both the StockToCategory and Category tables. If there are multiple matching entries in the StockToCategory table for a category, the Category table values will be repeated.
Using the IN function presumes that you want only rows from Category whose ID meets some criterion. If a given CategoryId (assuming it is the PK of the Category table) exists multiple times in the StockToCategory table, it will only be returned once.
In your exact example, they will produce the same output. However, IMO the latter syntax makes your intent (only wanting categories) clearer.
Btw, yet a third syntax which is similar to using the IN function:
Select ...
From Category
Where Exists (
Select 1
From StockToCategory
Where StockToCategory.CategoryId = Category.CategoryId
And StockToCategory.StockID = @StockID
)
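The difference between the three forms only shows when the link table holds the same pair more than once. Here is a small sketch, assuming SQLite via Python's sqlite3 module and invented rows (including a deliberately duplicated pair, which a unique key on the link table would normally prevent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE StockToCategory (StockID INTEGER, CategoryID INTEGER);
    INSERT INTO Category VALUES (1, 'Tools'), (2, 'Paint');
    -- Made-up data: the pair (StockID 7, CategoryID 1) appears twice on purpose.
    INSERT INTO StockToCategory VALUES (7, 1), (7, 1), (7, 2), (8, 2);
""")
stock_id = 7

via_join = conn.execute("""
    SELECT C.CategoryID
      FROM StockToCategory STC
     INNER JOIN Category C ON STC.CategoryID = C.CategoryID
     WHERE STC.StockID = ?
     ORDER BY C.CategoryID
""", (stock_id,)).fetchall()

via_in = conn.execute("""
    SELECT CategoryID FROM Category
     WHERE CategoryID IN
           (SELECT CategoryID FROM StockToCategory WHERE StockID = ?)
     ORDER BY CategoryID
""", (stock_id,)).fetchall()

via_exists = conn.execute("""
    SELECT CategoryID FROM Category
     WHERE EXISTS (SELECT 1 FROM StockToCategory
                    WHERE StockToCategory.CategoryID = Category.CategoryID
                      AND StockToCategory.StockID = ?)
     ORDER BY CategoryID
""", (stock_id,)).fetchall()

print(via_join, via_in, via_exists)
```

The JOIN repeats category 1 once per matching link row, while IN and EXISTS return each category once. With a unique (StockID, CategoryID) key all three would return the same rows.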
Syntactically (and semantically too) these are both correct. In terms of performance they are effectively equivalent; in fact, I would expect SQL Server to generate the exact same physical plans for these two queries.
I think these are just two ways to specify the same desired result.
For SQLite:
Table device_group_folders contains 10 records.
Table device_groups contains ~100000 records.
INNER JOIN: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs INNER JOIN device_groups ON device_groups.parent = select_childs.uuid;
WHERE: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs, device_groups WHERE device_groups.parent = select_childs.uuid;
IN: <1 ms
SELECT device_groups.uuid FROM device_groups WHERE device_groups.parent IN (WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT * FROM select_childs);
Is it allowed to reference an outer field from a nested select?
E.g.
SELECT *
FROM ext1
LEFT JOIN (SELECT * FROM int2 WHERE int2.id = ext1.some_id ) as x ON 1=1
In this case, the nested select references ext1.some_id.
I am getting an error saying that the field ext1.some_id is unknown.
Is it possible? Is there some other way?
UPDATE:
Unfortunately, I have to use a nested select, since I am going to add more conditions to it, such as LIMIT 0,1,
and then I need a second join on the same table with LIMIT 1,1 (to join another row).
The ultimate goal is to join 2 rows from the same table as if they were two tables,
so I am effectively going to "spread" a few related rows into one long row.
The answer to your initial question is: No, remove your sub-query and put the condition into the ON-clause:
SELECT *
FROM ext1
LEFT JOIN int2 ON ( int2.id = ext1.some_id )
One solution could be to use variables to find the first (or second) row, but this solution would not work efficiently with indexes, so you might end up with performance problems.
SELECT ext1.some_id, int2x.order_col, int2x.something_else
FROM ext1
LEFT JOIN (SELECT `int2`.*, @i:=IF(@id=(@id:=id), @i+1, 0) As rank
FROM `int2`,
( SELECT @i:=0, @id:=-1 ) v
ORDER BY id, order_col ) AS int2x ON ( int2x.id = ext1.some_id
AND int2x.rank = 0 )
;
This assumes that you have a column to order by (order_col) and left joins the first row per some_id.
Do you mean this?
SELECT ...
FROM ext1
LEFT JOIN int2 ON int2.id=ext1.some_id
That's what the ON clause is for:
SELECT *
FROM ext1
LEFT JOIN int2 AS x ON x.id = ext1.some_id
I have 2 tables in SQL Server 2005 db with structures represented as such:
CAR:
CarID bigint,
CarField bigint,
CarFieldValue varchar(50);
TEMP: CarField bigint, CarFieldValue varchar(50);
The TEMP table is actually a table variable containing data collected through a search facility. Based on the data in TEMP, I want to get all DISTINCT CarIDs from the CAR table that exactly match the rows in TEMP. A simple inner join works, but I only want the CarIDs that match ALL the rows in TEMP. Basically, each row in TEMP is supposed to denote an AND filter, whereas with the current inner join query the rows act more like OR filters. The more rows in TEMP, the fewer rows I expect in my result set for CAR. I hope I'm making sense; if not, please let me know and I'll try to clarify.
Any ideas on how I can make this work?
Thank you!
You use COUNT, GROUP BY and HAVING to find the cars that have exactly as many matching rows as you expect:
select CarID
from CAR c
join TEMP t on c.CarField = t.CarField and c.CarFieldValue = t.CarFieldValue
group by CarID
having COUNT(*) = <the number you expect>;
You can even make <the number you expect> be a scalar subquery like select COUNT(*) from TEMP.
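Here is a minimal runnable sketch of the COUNT / GROUP BY / HAVING approach with the scalar subquery filled in. It assumes SQLite via Python's sqlite3 module, a regular table named TempFilter standing in for the TEMP table variable (a hypothetical name), and invented sample rows. It also assumes neither table contains duplicate rows, since duplicates would inflate COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CAR (CarID INTEGER, CarField INTEGER, CarFieldValue TEXT);
    -- TempFilter is a hypothetical stand-in for the TEMP table variable.
    CREATE TABLE TempFilter (CarField INTEGER, CarFieldValue TEXT);
    -- Made-up data: field 1 is colour, field 2 is fuel.
    INSERT INTO CAR VALUES
        (100, 1, 'red'),  (100, 2, 'diesel'),
        (200, 1, 'red'),  (200, 2, 'petrol'),
        (300, 1, 'blue'), (300, 2, 'diesel');
    INSERT INTO TempFilter VALUES (1, 'red'), (2, 'diesel');
""")

# Keep only cars whose number of matched filter rows equals the number of filters.
matches = conn.execute("""
    SELECT c.CarID
      FROM CAR c
      JOIN TempFilter t
        ON c.CarField = t.CarField
       AND c.CarFieldValue = t.CarFieldValue
     GROUP BY c.CarID
    HAVING COUNT(*) = (SELECT COUNT(*) FROM TempFilter)
""").fetchall()

print(matches)
```

Cars 200 and 300 each match only one of the two filter rows, so only car 100 survives the HAVING clause.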
SELECT *
FROM (
SELECT CarID,
COUNT(CarID) NumberMatches
FROM CAR c INNER JOIN
TEMP t ON c.CarField = t.CarField
AND c.CarFieldValue = t.CarFieldValue
GROUP BY CarID
) CarNums
WHERE NumberMatches = (SELECT COUNT(1) FROM TEMP)
Haven't tested this, but I don't think you need a count to do what you want. This query ought to be substantially faster because it avoids a potentially huge number of counts. This query finds all the cars which are missing a value and then filters them out.
select distinct carid from car where carid not in
(
select
carid
from
car c
left outer join temp t on
c.carfield = t.carfield
and c.carfieldvalue = t.carfieldvalue
where
t.carfield is null
)
Hrm...
;WITH FilteredCars
AS
(
SELECT C.CarId
FROM Car C
INNER JOIN Temp Criteria
ON C.CarField = Criteria.CarField
AND C.CarFieldValue = Criteria.CarFieldValue
GROUP BY C.CarId
HAVING COUNT(*) = (SELECT COUNT(*) FROM Temp)
)
SELECT *
FROM FilteredCars F
INNER JOIN Car C ON F.CarId = C.CarId
The basic premise is that for ALL criteria to match, an INNER JOIN against your temp table must produce as many records as there are in that table. The HAVING clause at the end of the FilteredCars query should whittle the results down to those that match all criteria.