A messy SQL statement - sql

I need to select every row in Properties that has an invalid Country, Region, or Area ID. By invalid, I mean an ID for a country, region, or area that no longer exists in my tables. I have four tables: Properties, Countries, Regions, and Areas.
I was thinking of doing it like this:
SELECT * FROM Properties WHERE
Country_ID NOT IN
(
SELECT CountryID FROM Countries
)
OR
RegionID NOT IN
(
SELECT RegionID FROM Regions
)
OR
AreaID NOT IN
(
SELECT AreaID FROM Areas
)
Is my query right? And what would you suggest to get the same result with better performance?

Your query in fact is optimal.
The LEFT JOINs proposed by others are worse, as they select ALL values and then filter them out.
Most probably your subquery will be optimized to this:
SELECT *
FROM Properties p
WHERE NOT EXISTS
(
SELECT 1
FROM Countries i
WHERE i.CountryID = p.CountryID
)
OR
NOT EXISTS
(
SELECT 1
FROM Regions i
WHERE i.RegionID = p.RegionID
)
OR
NOT EXISTS
(
SELECT 1
FROM Areas i
WHERE i.AreaID = p.AreaID
)
, which you should use.
This query selects at most 1 row from each table and moves on to the next Properties row as soon as it has an answer (i.e. if it does not find a Country for a given Property, it will not even bother checking for a Region).
Again, SQL Server is smart enough to build the same plan for this query and your original one.
Update:
Tested on 512K rows in each table.
All corresponding IDs in the dimension tables are CLUSTERED PRIMARY KEYs; all measure fields in Properties are indexed.
For each row in Properties, PropertyID = CountryID = RegionID = AreaID, so there are no actual missing rows (the worst case in terms of execution time).
NOT EXISTS 00:11 (11 seconds)
LEFT JOIN 01:08 (68 seconds)
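To check that the NOT IN and NOT EXISTS forms really return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite, the sample rows, and the Country_ID column name (taken from the question) are assumptions made only to keep the example runnable; it says nothing about SQL Server's plans or timings.

```python
import sqlite3

# In-memory SQLite stand-in for the four tables (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY);
    CREATE TABLE Regions   (RegionID  INTEGER PRIMARY KEY);
    CREATE TABLE Areas     (AreaID    INTEGER PRIMARY KEY);
    CREATE TABLE Properties (
        PropertyID INTEGER PRIMARY KEY,
        Country_ID INTEGER, RegionID INTEGER, AreaID INTEGER
    );
    INSERT INTO Countries VALUES (1), (2);
    INSERT INTO Regions   VALUES (10), (20);
    INSERT INTO Areas     VALUES (100), (200);
    INSERT INTO Properties VALUES
        (1, 1, 10, 100),   -- fully valid
        (2, 9, 10, 100),   -- country 9 does not exist
        (3, 1, 99, 100),   -- region 99 does not exist
        (4, 2, 20, 999);   -- area 999 does not exist
""")

not_in_sql = """
    SELECT PropertyID FROM Properties
    WHERE Country_ID NOT IN (SELECT CountryID FROM Countries)
       OR RegionID   NOT IN (SELECT RegionID  FROM Regions)
       OR AreaID     NOT IN (SELECT AreaID    FROM Areas)
    ORDER BY PropertyID
"""
not_exists_sql = """
    SELECT p.PropertyID FROM Properties p
    WHERE NOT EXISTS (SELECT 1 FROM Countries c WHERE c.CountryID = p.Country_ID)
       OR NOT EXISTS (SELECT 1 FROM Regions   r WHERE r.RegionID  = p.RegionID)
       OR NOT EXISTS (SELECT 1 FROM Areas     a WHERE a.AreaID    = p.AreaID)
    ORDER BY PropertyID
"""
not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
not_exists_ids = [row[0] for row in conn.execute(not_exists_sql)]
print(not_in_ids, not_exists_ids)
```

Both lists contain the three properties with a dangling reference. Which form is faster still has to be measured on the real server and data.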

You could rewrite it differently as follows:
SELECT p.*
FROM Properties p
LEFT JOIN Countries c ON p.Country_ID = c.CountryID
LEFT JOIN Regions r on p.RegionID = r.RegionID
LEFT JOIN Areas a on p.AreaID = a.AreaID
WHERE c.CountryID IS NULL
OR r.RegionID IS NULL
OR a.AreaID IS NULL
Test the performance difference (if there is any; there should be, as NOT IN is a nasty search, especially over a lot of items, since it HAS to test every single one).
You can also make this faster by indexing the IDs being searched: in each master table (Countries, Regions, Areas) they should be clustered primary keys.

Since this seems to be cleanup SQL, this should be OK. But how about adding foreign keys so that it does not bother you next time around?
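As a rough illustration of the foreign key suggestion, the sketch below uses Python's sqlite3 module. SQLite and its PRAGMA are assumptions made for the sake of a runnable example; on SQL Server you would declare the constraint with the usual FOREIGN KEY ... REFERENCES syntax instead. Only the Countries reference is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY);
    CREATE TABLE Properties (
        PropertyID INTEGER PRIMARY KEY,
        Country_ID INTEGER REFERENCES Countries (CountryID)
    );
    INSERT INTO Countries VALUES (1);
    INSERT INTO Properties VALUES (1, 1);
""")

def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A property pointing at a country that does not exist.
bad_insert_rejected = is_rejected("INSERT INTO Properties VALUES (2, 9)")
# Deleting a country that a property still points at.
bad_delete_rejected = is_rejected("DELETE FROM Countries WHERE CountryID = 1")
print(bad_insert_rejected, bad_delete_rejected)
```

With the constraint in place, the invalid rows can never be created, so the cleanup query is only needed once.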

Well, you could try things like UNION (instead of OR) - but I expect that the optimizer is already doing the best it can given the information available:
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)

Subqueries in the conditions can be quite inefficient. Instead, you can do left joins against the related tables. Where there is no matching record, you get a null value. You can use this in the condition to select only the records where a matching record is missing:
select p.*
from Properties p
left join Countries c on c.CountryID = p.Country_ID
left join Regions r on r.RegionID = p.RegionID
left join Areas a on a.AreaID = p.AreaID
where c.CountryID is null or r.RegionID is null or a.AreaID is null

If you're not grabbing the row data from countries/regions/areas you can try using "exists":
SELECT Properties.*
FROM Properties
WHERE Properties.CountryID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
OR Properties.RegionID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
OR Properties.AreaID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
This will typically hint to use the pkey indices of countries et al for the existence check... but whether that is an improvement depends on your data stats, you simply have to plug it into query analyzer and try it.

Related

Query on Many to Many tables with multiple OR/AND

I have a basic many-to-many table that is:
tbFilter
filterId | filterName
tbProduct
productId | productName
tbProductFilter
filterId | productId
So, I have many products with many filters (colors, sizes, etc.). Now I need to create a procedure to find products with some filter combination, like:
All products that are (blue OR green) AND (large OR xlarge) AND (forMen)
The only way I have found to write this query is with multiple joins of the same table, one join per filter "group", or with multiple subqueries, one per group. The biggest problem is that the many-to-many table has more than 100k records, so these approaches perform poorly.
What is the best way to write this query? I'm using SQL Server 2012.
Thanks
This is how I have it working now:
SELECT [produtos].*
FROM [dbo].[tbProdutos] AS [produtos]
JOIN [dbo].[tbJuncaoProdutoCategoria] AS [juncaoProdutoCategoria]
ON [produtos].[produtoId] = [juncaoProdutoCategoria].[produtoId]
JOIN [dbo].[tbJuncaoProdutoCategoria] AS [juncaoProdutoCategoria2]
ON [produtos].[produtoId] = [juncaoProdutoCategoria2].[produtoId]
JOIN [dbo].[tbProdutoCategoria] AS [produtoCategoria]
ON [produtoCategoria].[categoriaId] = [juncaoProdutoCategoria].[categoriaId]
WHERE [juncaoProdutoCategoria].categoriaId = 1
AND ([juncaoProdutoCategoria2].categoriaId = 300 OR [juncaoProdutoCategoria2].categoriaId = 301)
Put the filters into a table (or a table-valued parameter), join that to the ProductFilter table, group by product, and count the unique filters that get joined.
This method can handle an arbitrary number of filters and perform fuzzy matching, i.e. "show me the products that match three of these four filters"
DECLARE @filterCount int = 3
DECLARE @filterSet TABLE ( filterNum int, filterName varchar(max) )
INSERT @filterSet VALUES
(1,'blue'),(1,'green'),
(2,'large'),(2,'xlarge'),
(3,'forMen')
SELECT pf.productId
FROM tbProductFilter pf
INNER JOIN tbFilter f ON f.filterId = pf.filterId
INNER JOIN @filterSet s ON s.filterName = f.filterName
GROUP BY pf.productId
HAVING COUNT(DISTINCT s.filterNum) = @filterCount
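The group-and-count approach above can be tried out with a small sketch using Python's sqlite3 module. SQLite, the ordinary filterSet table (standing in for the table variable), and the sample products are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbFilter (filterId INTEGER PRIMARY KEY, filterName TEXT);
    CREATE TABLE tbProductFilter (filterId INTEGER, productId INTEGER);
    -- Plain table standing in for the table variable / table-valued parameter.
    CREATE TABLE filterSet (filterNum INTEGER, filterName TEXT);

    INSERT INTO tbFilter VALUES
        (1, 'blue'), (2, 'green'), (3, 'large'),
        (4, 'xlarge'), (5, 'forMen'), (6, 'red');
    INSERT INTO tbProductFilter VALUES
        (1, 101), (3, 101), (5, 101),   -- 101: blue, large, forMen
        (1, 102), (2, 102), (5, 102),   -- 102: blue, green, forMen (no size)
        (6, 103), (4, 103), (5, 103);   -- 103: red, xlarge, forMen
    INSERT INTO filterSet VALUES
        (1, 'blue'), (1, 'green'),
        (2, 'large'), (2, 'xlarge'),
        (3, 'forMen');
""")

sql = """
    SELECT pf.productId
    FROM tbProductFilter pf
    JOIN tbFilter f  ON f.filterId = pf.filterId
    JOIN filterSet s ON s.filterName = f.filterName
    GROUP BY pf.productId
    HAVING COUNT(DISTINCT s.filterNum) = ?
    ORDER BY pf.productId
"""
# Products that satisfy all three filter groups.
match_all_groups = [row[0] for row in conn.execute(sql, (3,))]
# Products that satisfy exactly two of the three groups.
match_two_groups = [row[0] for row in conn.execute(sql, (2,))]
print(match_all_groups, match_two_groups)
```

Counting distinct filterNum values rather than rows is what makes "blue" and "green" count once for the colour group.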
First of all, you most certainly want to check your indexing: you'll want indexes on all foreign key fields, as well as on filterName.
Assuming your indexing is in good shape, here's one way you might do this:
SELECT p.* -- preferably just select the fields you need here...
FROM tbProduct p
WHERE EXISTS (SELECT 1 FROM tbProductFilter pf
JOIN tbFilter f ON f.filterId = pf.filterId
WHERE pf.productId = p.productId AND f.filterName IN ('blue', 'green'))
AND EXISTS (SELECT 1 FROM tbProductFilter pf
JOIN tbFilter f ON f.filterId = pf.filterId
WHERE pf.productId = p.productId AND f.filterName IN ('large', 'xlarge'))
AND EXISTS (SELECT 1 FROM tbProductFilter pf
JOIN tbFilter f ON f.filterId = pf.filterId
WHERE pf.productId = p.productId AND f.filterName = 'forMen')

How to return rows matched in a table without multiple EXISTS clauses?

I want to pull back results from one table that match ALL specified values where the specified values are in another table. I can do it like this:
SELECT * FROM Contacts
WHERE
EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = '8C62E5DE-00FC-4994-8127-000B02E10DA5')
AND EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = 'D2E90AA0-AC93-4406-AF93-0020009A34BA')
AND EXISTS etc...
However that falls over when I get up to about 40 EXISTS clauses. The error message is "The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query."
The gist of this is to:
Select all contacts with any GUID from the IN list.
Use COUNT(DISTINCT ...) to count the matching GUIDs for each contact ID.
Use HAVING to retain only those contacts whose count equals the number of GUIDs you put into the IN list.
SQL Statement
SELECT *
FROM dbo.Contacts c
INNER JOIN (
SELECT c.ID
FROM dbo.Contacts c
INNER JOIN dbo.ContactClassifications cc ON c.ID = cc.ContactID
WHERE cc.ClassificationID IN ('..', '..', 38 other GUIDS)
GROUP BY
c.ID
HAVING COUNT(DISTINCT cc.ClassificationID) = 40
) cc ON cc.ID = c.ID
Test script at data.stackexchange
One solution is to demand that no classification exists that the contact lacks. That's a double negation:
select *
from contacts c
where not exists
(
select *
from ContactClassifications cc
where not exists
(
select *
from ContactClassifications cc2
where cc2.ContactID = c.ID
and cc2.ClassificationID = cc.ClassificationID
)
)
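To see the double negation at work, here is a minimal sketch using Python's sqlite3 module. SQLite, the sample rows, and the Wanted table are assumptions: Wanted is a hypothetical helper holding a specific list of required classifications, which is closer to what the question asks for than dividing by every classification in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts (ID INTEGER PRIMARY KEY);
    CREATE TABLE ContactClassifications (ContactID INTEGER, ClassificationID TEXT);
    CREATE TABLE Wanted (ClassificationID TEXT);  -- hypothetical helper table

    INSERT INTO Contacts VALUES (1), (2), (3);
    INSERT INTO ContactClassifications VALUES
        (1, 'A'), (1, 'B'), (1, 'C'),
        (2, 'A'), (2, 'B'),
        (3, 'A');
    INSERT INTO Wanted VALUES ('A'), ('B');
""")

def contacts_having_all(divisor_sql):
    """Contacts for which no classification in the divisor set is missing."""
    sql = f"""
        SELECT c.ID FROM Contacts c
        WHERE NOT EXISTS (
            SELECT * FROM ({divisor_sql}) d
            WHERE NOT EXISTS (
                SELECT * FROM ContactClassifications cc2
                WHERE cc2.ContactID = c.ID
                  AND cc2.ClassificationID = d.ClassificationID))
        ORDER BY c.ID
    """
    return [row[0] for row in conn.execute(sql)]

# Divide by every classification that appears in the table (as in the query above).
has_every_classification = contacts_having_all(
    "SELECT DISTINCT ClassificationID FROM ContactClassifications")
# Divide by a specific required list instead.
has_wanted_classifications = contacts_having_all(
    "SELECT ClassificationID FROM Wanted")
print(has_every_classification, has_wanted_classifications)
```

Only the divisor set changes between the two calls; the nested NOT EXISTS stays the same, so the statement does not grow with the number of required values.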
This type of problem is known as relational division.
SELECT c.*
FROM Contacts c
INNER JOIN
(SELECT cc.ContactID, COUNT(DISTINCT cc.ClassificationID) AS num_class
FROM ContactClassifications cc
WHERE cc.ClassificationID IN (....)
GROUP BY cc.ContactID
) b ON c.ID = b.ContactID
WHERE b.num_class = [number of distinct values - how many different values you put in "IN"]
If you run SQL Server 2005 or higher, you can do much the same with CROSS APPLY, supposedly more efficiently.

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
OR mt.id IN (select id from filter_table_1))
AND (0 = (select count(*) from filter_table_2)
OR mt.id IN (select id from filter_table_2))
AND (0 = (select count(*) from filter_table_3)
OR mt.id IN (select id from filter_table_3))
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working, solution, I would keep it.
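The "empty filter table means no filter" rule in that query can be checked with a small sketch using Python's sqlite3 module. SQLite and the sample ids are assumptions; the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_table   (id INTEGER PRIMARY KEY);
    CREATE TABLE filter_table_1 (id INTEGER);
    CREATE TABLE filter_table_2 (id INTEGER);
    CREATE TABLE filter_table_3 (id INTEGER);
    INSERT INTO master_table   VALUES (1), (2), (3), (4);
    INSERT INTO filter_table_1 VALUES (1), (2), (3);
    -- filter_table_2 is left empty on purpose
    INSERT INTO filter_table_3 VALUES (2), (3), (4);
""")

FILTERED = """
    SELECT mt.id
    FROM master_table mt
    WHERE (0 = (SELECT COUNT(*) FROM filter_table_1)
           OR mt.id IN (SELECT id FROM filter_table_1))
      AND (0 = (SELECT COUNT(*) FROM filter_table_2)
           OR mt.id IN (SELECT id FROM filter_table_2))
      AND (0 = (SELECT COUNT(*) FROM filter_table_3)
           OR mt.id IN (SELECT id FROM filter_table_3))
    ORDER BY mt.id
"""

def filtered_ids():
    return [row[0] for row in conn.execute(FILTERED)]

with_two_filters = filtered_ids()          # filters 1 and 3 apply, 2 is ignored
conn.execute("DELETE FROM filter_table_1")
with_one_filter = filtered_ids()           # only filter 3 applies
conn.execute("DELETE FROM filter_table_3")
with_no_filters = filtered_ids()           # every filter is empty
print(with_two_filters, with_one_filter, with_no_filters)
```

The statement itself never changes; only the contents of the filter tables decide which conditions take effect.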
Use an inner join to get the results for (A) only, and a left join to get the results for (B) only (you will have to put something like filterN.column IS NULL in the WHERE clause). Then combine the results of the inner join and the left join with UNION.
A LEFT OUTER JOIN keeps every row of the master table and shows you which ones are MISSING from the joined table:
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If MASTER has keys that are not part of FOREIGN, you will get NULL columns where the matches are missing.
I think that is what you are looking for.
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
(
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 As x ) As dummy
Where Not Exists (
Select 1
From filter_1
)
)
Intersect
(
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 As x ) As dummy
Where Not Exists (
Select 1
From filter_2
)
)
...
Intersect
(
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 As x ) As dummy
Where Not Exists (
Select 1
From filter_N
)
)
)
I previously posted an answer, now deleted, that was based on wrong assumptions about your problem.
But I think you could go for a solution where you split your initial search problem into constructing the set of ids from the master table and then selecting the data by joining on that set of ids. Here I naturally assume you have some kind of ID on your master table. The filter tables contain the filter values only. This can then be combined into the statement below, where each SELECT in the eligible subset provides a set of master ids; these are unioned to avoid duplicates, and that set of ids is joined to the table with the data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables and values shown below. These are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code NULLs the JOIN condition if the column value is a blank record in the filter table. However, the gist in this is that you will have to actively populate the column value to blank in case you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, the JOIN condition NULLs in those cases and instead joins on itself with the same column from the master data table. This should work for all the cases you mentioned in your question.
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.

What type of mysql join do I need in the following query

I have the following tables:
platforms(id,platformname)
games(id,title,platformid)
gameslists(id,memberid,gameid)
I would like to select all records from the games table, but exclude records where games.gameid is present in gameslists for a specific member. So in plain English: select all records from the games table except for those where the gameid is present in gameslists table where the memberid equals 999 (memberid will come from the session). I also need the platform name, but I think that's a simple inner join.
I tried this, and other variations but to no avail!
SELECT DISTINCT games.id, platforms.platformname, gameslists.gameid
FROM games
INNER JOIN platforms ON games.platformid = platforms.id
INNER JOIN gameslists ON games.id = gameslists.gameid
WHERE platformid = 1 and gameslists.memberid <> 999
ORDER BY games.releasedate DESC
LIMIT 8
Using LEFT JOIN/IS NULL
SELECT g.*
FROM GAMES g
LEFT JOIN GAMESLISTS gl ON gl.gameid = g.id
AND gl.memberid = ?
WHERE gl.id IS NULL
Using NOT IN
SELECT g.*
FROM GAMES g
WHERE g.id NOT IN (SELECT gl.gameid
FROM GAMESLISTS gl
WHERE gl.memberid = ?)
Using NOT EXISTS
SELECT g.*
FROM GAMES g
WHERE NOT EXISTS(SELECT NULL
FROM GAMESLISTS gl
WHERE gl.gameid = g.id
AND gl.memberid = ?)
Summary
In MySQL, the LEFT JOIN/IS NULL is the most efficient means of getting a list based on what to exclude, but only if the columns being compared are not nullable (the values in the join criteria can't be null). Otherwise, the NOT IN and NOT EXISTS are more efficient.
Conclusion
Because of the foreign keys, it's unlikely the columns compared will be NULL so use the LEFT JOIN/IS NULL. But be aware that you don't need to use JOINs specifically for excluding data.
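The three exclusion forms above can be compared side by side with a short sketch using Python's sqlite3 module. SQLite (rather than MySQL) and the sample rows are assumptions, so this only shows that the result sets agree, not which form MySQL executes fastest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, title TEXT, platformid INTEGER);
    CREATE TABLE gameslists (id INTEGER PRIMARY KEY, memberid INTEGER, gameid INTEGER);
    INSERT INTO games VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 1), (4, 'D', 1);
    -- member 999 owns games 1 and 3; member 5 owns game 2
    INSERT INTO gameslists VALUES (1, 999, 1), (2, 999, 3), (3, 5, 2);
""")

queries = {
    "left_join": """
        SELECT g.id FROM games g
        LEFT JOIN gameslists gl ON gl.gameid = g.id AND gl.memberid = ?
        WHERE gl.id IS NULL
        ORDER BY g.id
    """,
    "not_in": """
        SELECT g.id FROM games g
        WHERE g.id NOT IN (SELECT gl.gameid FROM gameslists gl
                           WHERE gl.memberid = ?)
        ORDER BY g.id
    """,
    "not_exists": """
        SELECT g.id FROM games g
        WHERE NOT EXISTS (SELECT NULL FROM gameslists gl
                          WHERE gl.gameid = g.id AND gl.memberid = ?)
        ORDER BY g.id
    """,
}

# Games that member 999 does not have, computed three ways.
results = {name: [row[0] for row in conn.execute(sql, (999,))]
           for name, sql in queries.items()}
print(results)
```

Note that the member filter sits in the ON clause of the LEFT JOIN; moving it to the WHERE clause would discard the unmatched rows the query is looking for.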
Check out this guide from Stack Overflow's very own Jeff Atwood

SQL Logical AND operator for bit fields

I have 2 tables that have a many to many relationship; An Individual can belong to many Groups. A Group can have many Individuals.
Individuals basically just have their Primary Key ID
Groups have a Primary Key ID, IndividualID (same as the ID in the Individual Table), and a bit flag for if that group is the primary group for the individual
In theory, all but one of the entries for any given individual in the group table should have that bit flag set to false, because every individual must have exactly 1 primary group.
I know that for my current dataset, this assumption doesn't hold true, and I have some individuals that have the primary flag for ALL their groups set to false.
I'm having trouble generating a query that will return those individuals to me.
The closest I've gotten is:
SELECT * FROM Individual i
LEFT JOIN Group g ON g.IndividualID = i.ID
WHERE g.IsPrimaryGroup = 0
but going further than that with SUM or MAX doesn't work, because the field is a bit field, and not a numeric.
Any suggestions?
I don't know your data, but because the WHERE clause filters on a column of g, that LEFT JOIN behaves like an INNER JOIN.
What happens when you change the WHERE to AND?
SELECT * FROM Individual i
LEFT JOIN [Group] g ON g.IndividualID = i.ID
AND g.IsPrimaryGroup = 0
Here, try running this (untested, of course, since you didn't provide any sample data):
SELECT SUM(convert(int,g.IsPrimaryGroup)), i.ID
FROM Individual i
LEFT JOIN [Group] g ON g.IndividualID = i.ID
AND g.IsPrimaryGroup = 0
GROUP BY i.ID
HAVING COUNT(*) > 1
Try not to use a bit field if you need to do SUM and MAX; use a TINYINT instead. In addition, from what I remember, bit fields cannot be indexed, so you will lose some performance in your joins.
Update: Got it working with a subselect. Select IndividualID from Group where the primary group is false, and individualID NOT IN (select IndividualID from Group where primary group is true)
You need to include the IsPrimaryGroup condition into the JOIN clause. This query finds all individuals with no PrimaryGroup set:
SELECT * FROM Individual i
LEFT OUTER JOIN [Group] g ON g.IndividualID = i.ID AND g.IsPrimaryGroup = 1
WHERE g.ID IS NULL
However, the ideal way to solve your problem (in terms of relational db) is to have a PrimaryGroupID in the Individual table.
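Here is a small sketch of the anti-join above next to the subselect from the asker's update, using Python's sqlite3 module. SQLite, an INTEGER column standing in for the bit flag, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Individual (ID INTEGER PRIMARY KEY);
    CREATE TABLE [Group] (
        ID INTEGER PRIMARY KEY,
        IndividualID INTEGER,
        IsPrimaryGroup INTEGER  -- 0/1, standing in for the bit column
    );
    INSERT INTO Individual VALUES (1), (2), (3);
    INSERT INTO [Group] VALUES
        (1, 1, 1), (2, 1, 0),   -- individual 1 has a primary group
        (3, 2, 0), (4, 2, 0);   -- individual 2 has groups, none primary
    -- individual 3 belongs to no group at all
""")

# Anti-join: individuals with no row flagged as primary.
anti_join_sql = """
    SELECT i.ID FROM Individual i
    LEFT OUTER JOIN [Group] g
        ON g.IndividualID = i.ID AND g.IsPrimaryGroup = 1
    WHERE g.ID IS NULL
    ORDER BY i.ID
"""
# Subselect: individuals that have groups, none of them primary.
subselect_sql = """
    SELECT DISTINCT IndividualID FROM [Group]
    WHERE IsPrimaryGroup = 0
      AND IndividualID NOT IN (SELECT IndividualID FROM [Group]
                               WHERE IsPrimaryGroup = 1)
    ORDER BY IndividualID
"""
no_primary = [row[0] for row in conn.execute(anti_join_sql)]
groups_but_no_primary = [row[0] for row in conn.execute(subselect_sql)]
print(no_primary, groups_but_no_primary)
```

The anti-join also returns individuals that have no groups at all, while the subselect only returns those that have at least one non-primary group. Which one you want depends on whether group-less individuals count as broken data.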
SELECT individualId, SUM(CAST(bitflag AS int)) AS primaryGroupCount
FROM Groups
GROUP BY individualId
HAVING SUM(CAST(bitflag AS int)) <> 1
ORDER BY primaryGroupCount
That will give you each individual whose number of primary groups is not exactly one, along with how many primary groups they have.
I don't know if this is optimal from a performance standpoint, but I believe something along these lines should work. I'm using OrgIndividual as the name of the resolution table between the Individual and the Group.
SELECT DISTINCT i.IndividualID
FROM
Individual i INNER JOIN OrgIndividual oi
ON i.IndividualID = oi.IndividualID AND oi.PrimaryOrg = 0
LEFT JOIN OrgIndividual oip
ON oi.IndividualID = oip.IndividualID AND oip.PrimaryOrg = 1
WHERE
oip.IndividualID IS NULL
SELECT IndividualID
FROM [Group] g
WHERE NOT EXISTS (
SELECT NULL FROM [Group] g2
WHERE g2.PrimaryOrg = 1
AND g2.IndividualID = g.IndividualID)
GROUP BY IndividualID