Query on Many to Many tables with multiple OR/AND - sql

I have a basic many-to-many table that is:
tbFilter
filterId | filterName
tbProduct
productId | productName
tbProductFilter
filterId | productId
So, I have many products with many filters (colors, sizes, etc). Now, I need to create a procedure to finde products with some filter combination, like:
All products that is(blue OR green) and (large OR xlarge) and (forMen)
The only way that I found to create this query is with multiple joins of same table, each join for a "group" filter or with multiples subqueries, each one for a group. The biggest problem is that the many-to-many table have more them 100k records, so this approaches give a poor performance.
How is the best way to do this query? I'm using sql 2012.
Thanks
This is how I get working now:
select [produtos].* FROM [dbo].[tbProdutos] AS [produtos] JOIN [dbo].tbJuncaoProdutoCategoria] AS [juncaoProdutoCategoria] ON [produtos].[produtoId] = juncaoProdutoCategoria].[produtoId] JOIN [dbo].[tbJuncaoProdutoCategoria] AS juncaoProdutoCategoria2] ON [produtos].[produtoId] = [juncaoProdutoCategoria2].[produtoId] JOIN [dbo].[tbProdutoCategoria] AS [produtoCategoria] ON [produtoCategoria].[categoriaId] = [juncaoProdutoCategoria].[categoriaId] where [juncaoProdutoCategoria].categoriaId = 1 AND ([juncaoProdutoCategoria2].categoriaId = 300 OR [juncaoProdutoCategoria2].categoriaId = 301)

Put the filters into a table (or a table-valued parameter), join that to the ProductFilter table, group by product, and count the unique filters that get joined.
This method can handle an arbitrary number of filters and perform fuzzy matching, i.e. "show me the products that match three of these four filters"
DECLARE #filterCount int = 3
DECLARE #filterSet TABLE ( filterNum int, filterName varchar(max) )
INSERT #filterSet VALUES
(1,'blue'),(1,'green'),
(2,'large'),(2,'xlarge'),
(3,'forMen')
SELECT pf.ProductId
FROM tbProductFilter pf
INNER JOIN tbFilter f ON f.filterId = pf.filterId
INNER JOIN #filterSet s ON s.filterName = f.filterName
GROUP BY pf.productId
HAVING COUNT(DISTINCT s.filterNum) = #filterCount

First of all, you most certainly want to check your indexing - you'll want indexes on all foreign key fields, as well as on the fitlerName.
Assuming your indexing is in good shape, here's one way you might do this:
SELECT p.* -- preferably just select the fields you need here...
FROM products p
WHERE p.productId IN (
SELECT pf.product_id
FROM tbProductFilter pf
WHERE EXISTS (SELECT 1 FROM tbFilter f
WHERE pf.filterId = f.filterId AND f.filterName IN ('blue', 'green'))
AND EXISTS (SELECT 1 FROM tbFilter f
WHERE pf.filterId = f.filterId AND f.filterName IN ('large', 'xlarge'))
AND EXISTS (SELECT 1 FROM tbFilter f
WHERE pf.filterId = f.filterId AND f.filterName = 'forMen')
)

Related

SQL Server Search Filters Slow With Big Table Joins

I have a catalog of products available to the user (close to 1000 products). These products are in categories and subcategories, have colors, prices, etc. I want the user to be able to filter the results of this catalog based on some of this criteria. Some of this filter criteria is keywords (i.e. find me a blue pen). However, I'm running in to performance issues when joining a large table to search over in this query, specifically my colors table. Here is my query:
DECLARE #keyword nvarchar(100)
SET #keyword = 'bag'
SELECT DISTINCT
SP.pk_storeProductID,
dbo.getMaxPrice(SP.pk_storeProductID) AS maxPrice
FROM
tblProduct P
INNER JOIN
tblStoreProduct SP ON P.pk_productID = SP.fk_productID
INNER JOIN
tblProductDescription PD ON P.pk_productID = PD.fk_productID
INNER JOIN
tblProductSubCategory PSC ON SP.pk_storeProductID = PSC.fk_productID
INNER JOIN
tblSubCategory SC ON PSC.fk_subCategoryID = SC.pk_subCategoryID
INNER JOIN
tblCategory C ON SC.fk_categoryID = C.pk_categoryID
LEFT OUTER JOIN
tblProductImprint PI ON P.pk_productID = PI.fk_productID
LEFT OUTER JOIN
tblProductColor PC ON P.pk_productID = PC.fk_productID
LEFT OUTER JOIN
tblColor COL ON PC.fk_colorID = COL.pk_colorID
WHERE
P.blnActive = 1
AND SP.blnStoreActive = 1
AND SC.blnActive = 1
AND C.blnActive = 1
AND C.blnLocked = 0
AND SP.fk_storeID = 74
AND 1 = CASE WHEN CONTAINS(P.productNumber, #keyword) THEN 1
WHEN CONTAINS(P.productName, #keyword) THEN 1
WHEN CONTAINS(PD.productDesc, #keyword) THEN 1
WHEN CONTAINS(PD.keywords, #keyword) THEN 1
ELSE 0
END
The query seems to get significantly bogged down when I join tblColor. The way my system handles product colors is via three tables: a product table, a product color table (which is just two foreign keys for product ID and color ID), and a table for just colors (has a primary key for color and the name of the color). This table has 13,144 rows in it. I need to join this table in order to search the color names for a keyword that is like it based on the user input (this is not shown in the query above). Also, a product may not have colors associated with it, but I don't want to leave it out of the results, so I'm doing a left outer join on the color tables.
Is there a more efficient way to do this? As it is now, a simple search can take up to 4 minutes to return results, and the culprit always seems to be that color table. I have proper indexes on all of these tables, and full text index on the product, product description, and color tables.
Any help would be greatly appreciated!

SQL Query between three tables, get data only from one table

I have these 3 tables:
product table
id
siteId
optionsSet table
id
productId
...
option table
id
optionsSetId
code
...
Question:
How can I make a SQL query to select all from option table by knowing these two: option.code and product.siteId ?
I know how to do a query with JOIN on two tables, but I am struggling with joining these three tables.
Something like
SELECT
*
FROM
option
WHERE
code = #code
AND optionsSetId IN
(SELECT
os.id
FROM
optionsSet os
JOIN product p ON os.productId = p.Id
WHERE
p.siteId = #siteId)
where #code is your code parameter and #siteId is your siteid parameter
to use inner joins you would have to join all 3 tables together and that would like
SELECT
DISTINCT o.*
FROM
option o
JOIN optionsSet os ON o.optionsSetId = os.Id
JOIN product p ON os.productId = p.Id
WHERE
o.code = #code
AND p.siteId = #site
if you notice that requires a DISTINCT to only get the data from option. It may be simpler and easier to understand but not very efficient.
another option that someone will probably say is way more awesome is using EXISTS
SELECT
o.*
FROM
OPTION o
WHERE
o.code = #code
AND EXISTS(
SELECT
1
FROM
optionsSet os
JOIN product p ON os.productId = p.Id
WHERE
o.optionSetId = os.Id
AND p.siteId = #siteId
)
I used EXISTS exclusively for a few years and the started working on databases with tables that had +100million records and IN was faster than EXISTS in some cases and identical in the others. Plus IN is less code.
SELECT * FROM option
LEFT JOIN product
ON option.code = product.siteId (+)
--(+) is a left outter join. This should include all of the values in option.code and all of the values in product that have the same siteId as values in option.
I'm unsure on how you want OptionSet to relate to the other 2 databases though?
if you want to include the third tables result you can just add another join on that table for the condition you want.

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
OR mt.id IN (select id from filter_table_1)
AND (0 = (select count(*) from filter_table_2)
OR mt.id IN (select id from filter_table_2)
AND (0 = (select count(*) from filter_table_3)
OR mt.id IN (select id from filter_table_3)
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
Do inner join to get results for (A) only and do left join to get results for (B) only (you will have to put something like this in the where clause: filterN.column is null) combine results from inner join and left join with UNION.
Left Outer Join - gives you the MISSING entries in master table ....
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If FOREIGN has keys that is not a part of MASTER you will have "null columns" where the slots are missing
I think that is what you looking for ...
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_1
)
Intersect
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_2
)
...
Intersect
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 )
Where Not Exists (
Select 1
From filter_N
)
)
I have previously posted a - now deleted - answer based on wrong assumptions on you problems.
But I think you could go for a solution where you split your initial search problem into a matter of constructing the set of ids from the master table, and then select the data joining on that set of ids. Here I naturally assume you have a kind of ID on your master table. The filter tables contains the filter values only. This could then be combined into the statement below, where each SELECT in the eligble subset provides a set of master ids, these are unioned to avoid duplicates and that set of ids are joined to the table with data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.

SQL query look for exact match using inner join

I have 2 tables in SQL Server 2005 db with structures represented as such:
CAR:
CarID bigint,
CarField bigint,
CarFieldValue varchar(50);
TEMP: CarField bigint, CarFieldValue varchar(50);
Now the TEMP table is actually a table variable containing data collected through a search facility. Based on the data contained in TEMP, I wish to filter out and get all DISTINCT CarID's from the CAR table exactly matching those rows in the TEMP table. A simple Inner Join works well, but I want to only get back the CarID's that match ALL the rows in TEMP exactly. Basically, each row in TEMP is supposed to be denote an AND filter, whereas, with the current inner join query, they are acting more like OR filters. The more rows in TEMP, the less rows I expect showing in my result-set for CAR. I hope Im making sense with this...if not please let me know and I'll try to clarify.
Any ideas on how I can make this work?
Thank u!
You use COUNT, GROUP BY and HAVING to find the cars that have exactly that many mathicng rows as you expect:
select CarID
from CAR c
join TEMP t on c.CarField = t.CarField and c.CarFieldValue = t.CarFieldValue
group by CarID
having COUNT(*) = <the number you expect>;
You can even make <the number you expect> be a scalar subquery like select COUNT(*) from TEMP.
SELECT *
FROM (
SELECT CarID,
COUNT(CarID) NumberMatches
FROM CAR c INNER JOIN
TEMP t ON c.CarField = t.CarField
AND c.CarFieldValue = t.CarFieldValue
GROUP BY CarID
) CarNums
WHERE NumberMatches = (SELECT COUNT(1) FROM TEMP)
Haven't tested this, but I don't think you need a count to do what you want. This query ought to be substantially faster because it avoids a potentially huge number of counts. This query finds all the cars which are missing a value and then filters them out.
select distinct carid from car where carid not in
(
select
carid
from
car c
left outer join temp t on
c.carfield = t.carfield
and c.carfieldvalue = t.carfieldvalue
where
t.carfield is null
)
Hrm...
;WITH FilteredCars
AS
(
SELECT C.CarId
FROM Car C
INNER JOIN Temp Criteria
ON C.CarField = Criteria.CarField
AND C.CarFieldValue = Critera.CarFieldValue
GROUP BY C.CarId
HAVING COUNT(*) = (SELECT COUNT(*) FROM Temp)
)
SELECT *
FROM FilteredCars F
INNER JOIN Car C ON F.CarId = C.CarId
The basic premise is that for ALL criteria to match an INNER JOIN against your temp table must produce as many records as there are within that table. The HAVING clause at the end of the FilteredCars query should widdle the results down to those that match all criteria.

A messy SQL statement

I have a case where I wanna choose any database entry that have an invalid Country, Region, or Area ID, by invalid, I mean an ID for a country or region or area that no longer exists in my tables, I have four tables: Properties, Countries, Regions, Areas.
I was thinking to do it like this:
SELECT * FROM Properties WHERE
Country_ID NOT IN
(
SELECT CountryID FROM Countries
)
OR
RegionID NOT IN
(
SELECT RegionID FROM Regions
)
OR
AreaID NOT IN
(
SELECT AreaID FROM Areas
)
Now, is my query right? and what do you suggest that i can do and achieve the same result with better performance?!
Your query in fact is optimal.
LEFT JOIN's proposed by others are worse, as they select ALL values and then filter them out.
Most probably your subquery will be optimized to this:
SELECT *
FROM Properties p
WHERE NOT EXISTS
(
SELECT 1
FROM Countries i
WHERE i.CountryID = p.CountryID
)
OR
NOT EXISTS
(
SELECT 1
FROM Regions i
WHERE i.RegionID = p.RegionID
)
OR
NOT EXISTS
(
SELECT 1
FROM Areas i
WHERE i.AreaID = p.AreaID
)
, which you should use.
This query selects at most 1 row from each table, and jumps to the next iteration right as it finds this row (i. e. if it does not find a Country for a given Property, it will not even bother checking for a Region).
Again, SQL Server is smart enough to build the same plan for this query and your original one.
Update:
Tested on 512K rows in each table.
All corresponding ID's in dimension tables are CLUSTERED PRIMARY KEY's, all measure fields in Properties are indexed.
For each row in Property, PropertyID = CountryID = RegionID = AreaID, no actual missing rows (worst case in terms of execution time).
NOT EXISTS 00:11 (11 seconds)
LEFT JOIN 01:08 (68 seconds)
You could rewrite it differently as follows:
SELECT p.*
FROM Properties p
LEFT JOIN Countries c ON p.Country_ID = c.CountryID
LEFT JOIN Regions r on p.RegionID = r.RegionID
LEFT JOIN Areas a on p.AreaID = a.AreaID
WHERE c.CountryID IS NULL
OR r.RegionID IS NULL
OR a.AreaID IS NULL
Test the performance difference (if there is any - there should be as NOT IN is a nasty search, especially over a lot of items as it HAS to test every single one).
You can also make this faster by indexing the IDS being searched - in each master table (Country, Region, Area) they should be clustered primary keys.
Since this seems to be cleanup sql, this should be ok. But how about using foreign keys so that it does not bother you next time around?
Well, you could try things like UNION (instead of OR) - but I expect that the optimizer is already doing the best it can given the information available:
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
Subqueries in the conditions can be quite inefficient. Instead you can do left joins against the related tables. Where there are no matching record you get a null value. You can use this in the condition to select only the records where there is a matching record missing:
select p.*
from Properties p
left join Countries c on c.CountryID = p.Country_ID
left join Regions r on r.RegionID = p.RegionID
left join Areas a on a.AreaID = p.AreaID
where c.CountryID is null or r.RegionID is null or a.AreaID is null
If you're not grabbing the row data from countries/regions/areas you can try using "exists":
SELECT Properties.*
FROM Properties
WHERE Properties.CountryID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
OR Properties.RegionID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
OR Properties.AreaID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
This will typically hint to use the pkey indices of countries et al for the existence check... but whether that is an improvement depends on your data stats, you simply have to plug it into query analyzer and try it.