Inner Join Producing cartesian product - sql

Looking at the two queries below, I assumed they would return the same result set, but they're way off. Why is the second query, with the inner join, producing so many records? What am I doing wrong? I've been staring at this a little too long and need a fresh pair of eyes to look at it.
SELECT COUNT(*)
FROM ZCQ Z
WHERE Z.QUOTE_CUSTOMER_ID IN (SELECT CUSTOMER_ID FROM CUST_ORDER)
-- returned 6,646 RECS
SELECT COUNT(*)
FROM ZCQ Z
INNER JOIN CUST_ORDER CO ON Z.QUOTE_CUSTOMER_ID = CO.CUSTOMER_ID
-- returned 4,232,473 RECS
Please note these are Oracle 10g tables, but the DBA has not set up any FK or PK on them.

No, these will not generally return the same result.
The first counts the number of rows in ZCQ that match a customer in CUST_ORDER.
The second counts the total number of rows that match. If there are duplicate customers in CUST_ORDER, then all duplicates will be counted.
You could count distinct customer IDs with a join instead. This matches the first query only if QUOTE_CUSTOMER_ID is unique in ZCQ:
SELECT COUNT(DISTINCT Z.QUOTE_CUSTOMER_ID)
FROM ZCQ Z JOIN
CUST_ORDER CO
ON Z.QUOTE_CUSTOMER_ID = CO.CUSTOMER_ID;
But IN or EXISTS is probably more efficient than removing the duplicates after doing the match.
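To see why the counts diverge, here is a minimal sketch that runs the same queries against an in-memory SQLite database from Python (SQLite stands in for Oracle here, and the sample rows are invented for illustration). Customer 1 has two quotes and three orders, so the join counts that customer six times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ZCQ (QUOTE_CUSTOMER_ID INTEGER);
    CREATE TABLE CUST_ORDER (CUSTOMER_ID INTEGER);
    -- hypothetical data: two quotes for customer 1, one each for 2 and 3
    INSERT INTO ZCQ VALUES (1), (1), (2), (3);
    -- customer 1 has three orders, customer 2 has one, customer 3 has none
    INSERT INTO CUST_ORDER VALUES (1), (1), (1), (2);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Semi-join: each ZCQ row is counted at most once.
in_count = scalar("""
    SELECT COUNT(*) FROM ZCQ Z
    WHERE Z.QUOTE_CUSTOMER_ID IN (SELECT CUSTOMER_ID FROM CUST_ORDER)
""")

# Same semi-join written with EXISTS.
exists_count = scalar("""
    SELECT COUNT(*) FROM ZCQ Z
    WHERE EXISTS (SELECT 1 FROM CUST_ORDER CO
                  WHERE CO.CUSTOMER_ID = Z.QUOTE_CUSTOMER_ID)
""")

# Inner join: one output row per matching (quote, order) pair.
join_count = scalar("""
    SELECT COUNT(*) FROM ZCQ Z
    INNER JOIN CUST_ORDER CO ON Z.QUOTE_CUSTOMER_ID = CO.CUSTOMER_ID
""")

# Distinct customer IDs: differs from the IN count when ZCQ repeats a customer.
distinct_count = scalar("""
    SELECT COUNT(DISTINCT Z.QUOTE_CUSTOMER_ID) FROM ZCQ Z
    INNER JOIN CUST_ORDER CO ON Z.QUOTE_CUSTOMER_ID = CO.CUSTOMER_ID
""")

print(in_count, exists_count, join_count, distinct_count)  # 3 3 7 2
```

With real data, the gap between the two original counts is a hint that CUSTOMER_ID repeats heavily in CUST_ORDER.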

Related

SQLite - How to select records from one table that are not in another table

I have a database with 3 tables; tblCustomers, tblBookings, tblFlights.
I want to find the customer's last name (LName), from the Customers table where the customers do not appear in the bookings table. It should return just three names, but it returns the names 10 times each. There are 10 records in the bookings table, so I think the command is returning the correct names, but not once...
I have tried:
SELECT tblCustomers.LName
FROM tblCustomers, tblBookings
WHERE tblCustomers.CustID
NOT IN (SELECT CustID FROM tblBookings)
How do I return just one instance of the name, not the name repeated 10 times?
You are doing a CROSS JOIN of the 2 tables.
Use only NOT IN:
SELECT LName
FROM tblCustomers
WHERE CustID NOT IN (SELECT CustID FROM tblBookings)
The (implicit) cross join on the bookings table in the outer query makes no sense, and it multiplies the customer rows.
Also, I would recommend NOT EXISTS for filtering instead of NOT IN: it usually performs better with the right index in place, and it is null-safe:
SELECT c.LName
FROM tblCustomers c
WHERE NOT EXISTS (SELECT 1 FROM tblBookings b WHERE b.CustID = c.CustID)
For performance, make sure to have an index on tblBookings(CustID). Note that SQLite does not create an index on the child column when you declare a foreign key, so create it explicitly.
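Both points can be checked with a small sketch using Python's sqlite3 module. The customer names and booking rows are made up for illustration. It shows the row multiplication caused by the extra table in FROM, and then what happens to NOT IN once a booking has a NULL CustID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCustomers (CustID INTEGER PRIMARY KEY, LName TEXT);
    CREATE TABLE tblBookings (BookingID INTEGER PRIMARY KEY, CustID INTEGER);
    -- hypothetical rows: customers 3 and 4 have no bookings
    INSERT INTO tblCustomers VALUES (1,'Adams'), (2,'Brown'), (3,'Clark'), (4,'Davis');
    INSERT INTO tblBookings VALUES (10, 1), (11, 1), (12, 2);
""")

def names(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

CROSS_JOIN = """
    SELECT c.LName FROM tblCustomers c, tblBookings b
    WHERE c.CustID NOT IN (SELECT CustID FROM tblBookings)
    ORDER BY c.LName
"""
NOT_IN = """
    SELECT LName FROM tblCustomers
    WHERE CustID NOT IN (SELECT CustID FROM tblBookings)
    ORDER BY LName
"""
NOT_EXISTS = """
    SELECT c.LName FROM tblCustomers c
    WHERE NOT EXISTS (SELECT 1 FROM tblBookings b WHERE b.CustID = c.CustID)
    ORDER BY c.LName
"""

# Each unbooked customer is repeated once per booking row (2 x 3 = 6 rows).
cross = names(CROSS_JOIN)
not_in = names(NOT_IN)
not_exists = names(NOT_EXISTS)

# A single NULL in the subquery makes every NOT IN comparison unknown.
conn.execute("INSERT INTO tblBookings VALUES (13, NULL)")
not_in_with_null = names(NOT_IN)
not_exists_with_null = names(NOT_EXISTS)

print(len(cross))             # 6
print(not_in, not_exists)     # ['Clark', 'Davis'] ['Clark', 'Davis']
print(not_in_with_null)       # []
print(not_exists_with_null)   # ['Clark', 'Davis']
```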

SQL grouping. How to select row with the highest column value when joined. No CTEs please

I've been banging my head against the wall over something that I think should be simple but just can't get to work.
I'm trying to retrieve the row with the highest multi_flag value when I join table A and table B but I can't seem to get the SQL right because it returns all the rows rather than the one with the highest multi_flag value.
Here are my tables...
Table A
Table B
This is almost my desired result, but only if I leave out the value_id column:
SELECT CATALOG, VENDOR_CODE, INVLINK, NAME_ID, MAX(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE AS A
INNER JOIN TBLATTRIBUTE_VALUE AS B
ON A.VALUE_ID = B.VALUE_ID
GROUP BY CATALOG, VENDOR_CODE, INVLINK, NAME_ID
ORDER BY CATALOG DESC
This is close to what I want to retrieve, but not quite. Notice how it returns each unique name_id with its highest multi_flag, but I also need the value_id that belongs to that multi_flag / name_id grouping...
If I include value_id in my SQL statement, it returns all rows and is no longer grouped.
Notice in the results below how it no longer returns only the row with the highest multi_flag, and how all the different values for each name_id (e.g. name_id 1) are also returned.
You can use a subquery, a derived table, or a CTE to solve this problem. Performance will depend on the amount of data you are querying. To get the max multi_flag, you must first compute the maximum for the grouping you want, using either a CTE or a subquery. The CTE below gives the max multi_flag per value_id, which you can then join back to your other tables. I have three joins in this example, but this can be reduced. As far as performance goes, a subquery may be better, but you won't know until you compare the actual execution plans side by side.
;with highest_multi_flag as
(
select value_id, max(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE
group by value_id
)
select A.CATALOG, a.VENDOR_CODE, a.INVLINK, b.NAME_ID,m.multiflag
from highest_multi_flag m
inner join TBLINVENT_ATTRIBUTE AS A on a.VALUE_ID = m.VALUE_ID
INNER JOIN TBLATTRIBUTE_VALUE AS B ON m.VALUE_ID = B.VALUE_ID
You can also use a lateral join; it's another solution:
SELECT
A.CATALOG, A.VENDOR_CODE, A.INVLINK, B.NAME_ID, M.maxmultiflag
FROM TBLINVENT_ATTRIBUTE AS A
inner join lateral
(
select max(C.multi_flag) as maxmultiflag from TBLINVENT_ATTRIBUTE C
where A.VALUE_ID = C.VALUE_ID
) M on 1=1
INNER JOIN TBLATTRIBUTE_VALUE AS B ON M.maxmultiflag = B.VALUE
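Since the question asks for no CTEs, here is a minimal sketch of the derived-table variant: compute the maximum per group, then join back to pick up the value_id of the winning row. The table layout is an assumption, because the original table images are missing: multi_flag is placed in TBLINVENT_ATTRIBUTE and name_id in TBLATTRIBUTE_VALUE, and the rows are invented. It runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout; the real column placement is not shown in the thread.
    CREATE TABLE TBLINVENT_ATTRIBUTE (
        CATALOG TEXT, VENDOR_CODE TEXT, INVLINK INTEGER,
        VALUE_ID INTEGER, MULTI_FLAG INTEGER);
    CREATE TABLE TBLATTRIBUTE_VALUE (VALUE_ID INTEGER, NAME_ID INTEGER);
    INSERT INTO TBLINVENT_ATTRIBUTE VALUES
        ('C1', 'V1', 100, 11, 1),
        ('C1', 'V1', 100, 12, 3),
        ('C1', 'V1', 100, 21, 2);
    INSERT INTO TBLATTRIBUTE_VALUE VALUES (11, 1), (12, 1), (21, 2);
""")

QUERY = """
    SELECT A.CATALOG, A.VENDOR_CODE, A.INVLINK, B.NAME_ID, A.VALUE_ID, A.MULTI_FLAG
    FROM TBLINVENT_ATTRIBUTE AS A
    INNER JOIN TBLATTRIBUTE_VALUE AS B ON A.VALUE_ID = B.VALUE_ID
    INNER JOIN (
        -- highest multi_flag for each (catalog, vendor, invlink, name_id) group
        SELECT A2.CATALOG, A2.VENDOR_CODE, A2.INVLINK, B2.NAME_ID,
               MAX(A2.MULTI_FLAG) AS MAX_FLAG
        FROM TBLINVENT_ATTRIBUTE AS A2
        INNER JOIN TBLATTRIBUTE_VALUE AS B2 ON A2.VALUE_ID = B2.VALUE_ID
        GROUP BY A2.CATALOG, A2.VENDOR_CODE, A2.INVLINK, B2.NAME_ID
    ) AS M
        ON  M.CATALOG = A.CATALOG
        AND M.VENDOR_CODE = A.VENDOR_CODE
        AND M.INVLINK = A.INVLINK
        AND M.NAME_ID = B.NAME_ID
        AND M.MAX_FLAG = A.MULTI_FLAG   -- keep only the row holding the maximum
    ORDER BY A.CATALOG DESC, B.NAME_ID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('C1', 'V1', 100, 1, 12, 3)
# ('C1', 'V1', 100, 2, 21, 2)
```

If two rows in a group tie for the highest multi_flag, this pattern returns both of them; a tie-breaker column would be needed to force a single row.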

Difference between Two Queries - Join vs IN

I have the following two queries. Query1 returns a row count of 1000, whereas Query2 returns 4000. Can someone please explain the difference between the two queries? I was hoping both would return the same count.
Query1:
SELECT COUNT(*)
FROM TableA A
WHERE A.VIN IN (
SELECT VIN
FROM TableB B, TableC C
WHERE B.MODEL_YEAR = '2014' AND B.VIN_NBR = C.VIN
)
Query2:
SELECT COUNT(*)
FROM TABLEA A, TableB B, TableC C
WHERE B.MODEL_YEAR = '2014' AND B.VIN_NBR = C.VIN AND A.VIN = C.VIN
In many cases, they will return the same answer, but not necessarily. The first counts the number of rows in A that match the conditions -- each row is counted only once, regardless of the number of matches. The second does a join, which can multiply the number of rows.
The second query would be equivalent in results if it used count(distinct A.id), where id is unique or a primary key.
That said, although they are similar in functionality, how they are executed can be quite different. Different SQL engines might do a better job of optimizing one version or the other.
By the way, you should avoid the archaic join syntax that you are using. Since 1992, explicit joins have been part of SQL syntax.
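As an illustration, here is a sketch of Query2 rewritten with explicit JOIN syntax, next to Query1 and the COUNT(DISTINCT ...) variant. It uses SQLite via Python's sqlite3 module. The ID column on TableA and all sample rows are assumptions made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- TableA.ID is a hypothetical unique key; the thread does not show one.
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, VIN TEXT);
    CREATE TABLE TableB (VIN_NBR TEXT, MODEL_YEAR TEXT);
    CREATE TABLE TableC (VIN TEXT);
    INSERT INTO TableA VALUES (1, 'X'), (2, 'Y');
    -- VIN 'X' appears twice in both TableB and TableC
    INSERT INTO TableB VALUES ('X', '2014'), ('X', '2014'), ('Y', '2013');
    INSERT INTO TableC VALUES ('X'), ('X'), ('Y');
""")

def scalar(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

# Query1: each TableA row counts once, however many matches exist.
query1 = scalar("""
    SELECT COUNT(*)
    FROM TableA A
    WHERE A.VIN IN (
        SELECT C.VIN
        FROM TableB B
        INNER JOIN TableC C ON B.VIN_NBR = C.VIN
        WHERE B.MODEL_YEAR = '2014'
    )
""")

# Query2 with explicit joins: one row per matching (A, B, C) combination.
query2 = scalar("""
    SELECT COUNT(*)
    FROM TableA A
    INNER JOIN TableC C ON A.VIN = C.VIN
    INNER JOIN TableB B ON B.VIN_NBR = C.VIN
    WHERE B.MODEL_YEAR = '2014'
""")

# Counting a unique key of TableA removes the multiplication again.
query2_distinct = scalar("""
    SELECT COUNT(DISTINCT A.ID)
    FROM TableA A
    INNER JOIN TableC C ON A.VIN = C.VIN
    INNER JOIN TableB B ON B.VIN_NBR = C.VIN
    WHERE B.MODEL_YEAR = '2014'
""")

print(query1, query2, query2_distinct)  # 1 4 1
```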

SQL, only if matching all foreign key values to return the record?

I have two tables
Table A
type_uid, allowed_type_uid
9,1
9,2
9,4
1,1
1,2
24,1
25,3
Table B
type_uid
1
2
From table A I need to return
9
1
Using a WHERE IN clause I can return
9
1
24
SELECT
TableA.type_uid
FROM
TableA
INNER JOIN
TableB
ON TableA.allowed_type_uid = TableB.type_uid
GROUP BY
TableA.type_uid
HAVING
COUNT(distinct TableB.type_uid) = (SELECT COUNT(distinct type_uid) FROM TableB)
Join the two tables together, so that you only have the records matching the types you are interested in.
Group the result set by TableA.type_uid.
Check that each group has the same number of allowed_type_uid values as exist in TableB.type_uid.
DISTINCT is required only if there can be duplicate records in either table. If both tables are known to have only unique values, the DISTINCT can be removed.
It should also be noted that as TableA grows in size, this type of query will quickly degrade in performance. This is because indexes are not actually much help here.
It can still be a useful structure, but not one where I'd recommend running the queries in real-time. Rather use it to create another persisted/cached result set, and use this only to refresh those results as/when needed.
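The GROUP BY / HAVING approach can be tried against the sample data from the question. This is a small sketch on SQLite through Python's sqlite3 module; the plain IN query at the end is my reading of the "WHERE IN" attempt mentioned in the question, not the asker's exact statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (type_uid INTEGER, allowed_type_uid INTEGER);
    CREATE TABLE TableB (type_uid INTEGER);
""")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [(9, 1), (9, 2), (9, 4), (1, 1), (1, 2), (24, 1), (25, 3)],
)
conn.executemany("INSERT INTO TableB VALUES (?)", [(1,), (2,)])

# Keep a type_uid only if it matched every row of TableB.
matches_all = [row[0] for row in conn.execute("""
    SELECT TableA.type_uid
    FROM TableA
    INNER JOIN TableB ON TableA.allowed_type_uid = TableB.type_uid
    GROUP BY TableA.type_uid
    HAVING COUNT(DISTINCT TableB.type_uid) =
           (SELECT COUNT(DISTINCT type_uid) FROM TableB)
    ORDER BY TableA.type_uid
""")]

# For comparison: a plain IN filter keeps any type_uid with at least one match.
matches_any = [row[0] for row in conn.execute("""
    SELECT DISTINCT type_uid
    FROM TableA
    WHERE allowed_type_uid IN (SELECT type_uid FROM TableB)
    ORDER BY type_uid
""")]

print(matches_all)  # [1, 9]
print(matches_any)  # [1, 9, 24]
```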
Or a slightly cheaper version (resource wise):
SELECT
Data.type_uid
FROM
A AS Data
CROSS JOIN
B
LEFT JOIN
A
ON Data.type_uid = A.type_uid AND B.type_uid = A.allowed_type_uid
GROUP BY
Data.type_uid
HAVING
MIN(ISNULL(A.allowed_type_uid,-999)) != -999
Your explanation is not very clear. I think you want to get those type_uid's from table A where for all records in table B there is a matching A.Allowed_type_uid.
SELECT T2.type_uid
FROM (SELECT COUNT(*) as AllAllowedTypes FROM #B) as T1,
(SELECT #A.type_uid, COUNT(*) as AllowedTypes
FROM #A
INNER JOIN #B ON
#A.allowed_type_uid = #B.type_uid
GROUP BY #A.type_uid
) as T2
WHERE T1.AllAllowedTypes = T2.AllowedTypes
(Dems, you were faster than me :) )

SQL Table A Left Join Table B And top of table B

I'm working myself into an SQL frenzy; hopefully someone out there can help!
I've got 2 tables which are basically Records and Outcomes, I want to join the 2 tables together, count the number of outcomes per record (0 or more) which I've got quite easily with:
Select records.Id, (IsNull(Count(outcomes.Id),0)) as outcomes
from records
Left Join
outcomes
on records.Id = outcomes.Id
group by
records.Id
The outcomes table also has a timestamp in it. What I want to do is include the last outcome in my result set, but if I add that to my query it generates a row for every combination of record and outcome.
Can any SQL expert point me in the right direction?
Cheers,
try:
SELECT
dt.Id, dt.outcomes,MAX(o.YourTimestampColumn) AS LastOne
FROM (SELECT --basically your original query, just indented differently
records.Id, (ISNULL(COUNT(outcomes.Id),0)) AS outcomes
from records
LEFT JOIN outcomes ON records.Id = outcomes.Id
GROUP BY records.Id
) dt
LEFT JOIN outcomes o ON dt.Id = o.Id -- LEFT JOIN keeps records that have no outcomes
GROUP BY dt.Id, dt.outcomes
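If only the latest timestamp is needed, the count and the maximum can probably be computed in one grouped LEFT JOIN without the derived table. Here is a minimal sketch on SQLite via Python's sqlite3 module. The RecordId and CreatedAt column names are hypothetical stand-ins for the real join and timestamp columns, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (Id INTEGER PRIMARY KEY);
    -- RecordId and CreatedAt are assumed names, not taken from the question
    CREATE TABLE outcomes (RecordId INTEGER, CreatedAt TEXT);
    INSERT INTO records VALUES (1), (2), (3);
    INSERT INTO outcomes VALUES
        (1, '2020-01-01'),
        (1, '2020-03-01'),
        (2, '2020-02-01');
""")

rows = conn.execute("""
    SELECT r.Id,
           COUNT(o.RecordId) AS outcomes,  -- counts only matched rows, so 0 when none
           MAX(o.CreatedAt)  AS LastOne    -- NULL when the record has no outcomes
    FROM records r
    LEFT JOIN outcomes o ON o.RecordId = r.Id
    GROUP BY r.Id
    ORDER BY r.Id
""").fetchall()

for row in rows:
    print(row)
# (1, 2, '2020-03-01')
# (2, 1, '2020-02-01')
# (3, 0, None)
```

COUNT over a column from the outer-joined table already yields 0 for unmatched records, so wrapping it in ISNULL is not required in this form.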