SQL grouping. How to select row with the highest column value when joined. No CTEs please - sql

I've been banging my head against the wall for something that I think should be simple but just cant get to work.
I'm trying to retrieve the row with the highest multi_flag value when I join table A and table B but I can't seem to get the SQL right because it returns all the rows rather than the one with the highest multi_flag value.
Here are my tables...
Table A
Table B
This is almost my desired result but only if I leave out the value_id row
SELECT CATALOG, VENDOR_CODE, INVLINK, NAME_ID, MAX(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE AS A
INNER JOIN TBLATTRIBUTE_VALUE AS B
ON A.VALUE_ID = B.VALUE_ID
GROUP BY CATALOG, VENDOR_CODE, INVLINK, NAME_ID
ORDER BY CATALOG DESC
This is close to what I want to retreive but not quite notice how it returns unique name_id and the highest multi_flag but I also need the value_id that belongs to such multi_flag / name_id grouping...
If I include the value_id in my SQL statement then it returns all rows and is no longer grouped
Notic ein the results below how it no longer returns the row for the highest multi_flag and how all the different values for name_id (Ex. name_id 1) are also returned

You can choose to use a sub-query, derived table or CTE to solve this problem. Performance will be depending on the amount of data you are querying. To achieve your goal of getting the max multiflag you must first get the max value based on the grouping you want to achieve this you can use a CTE or sub query. The below CTE will give the max multi_flag by value that you can use to get the max multi_flag and then you can use that to join back to your other tables. I have three joins in this example but this can be reduce and as far a performance it may be better to use a subquery but you want know until you get the se the actual execution plans side by side.
;with highest_multi_flag as
(
select value_id, max(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE
group by value_id
)
select A.CATALOG, a.VENDOR_CODE, a.INVLINK, b.NAME_ID,m.multiflag
from highest_multi_flag m
inner join TBLINVENT_ATTRIBUTE AS A on a.VALUE_ID =b. m.VALUE_ID
INNER JOIN TBLATTRIBUTE_VALUE AS B ON m.VALUE_ID = B.VALUE

You can use Lateral too, its an other solution
SELECT
A.CATALOG, A.VENDOR_CODE, A.INVLINK, B.NAME_ID, M.maxmultiflag
FROM TBLINVENT_ATTRIBUTE AS A
inner join lateral
(
select max(B.multi_flag) as maxmultiflag from TBLINVENT_ATTRIBUTE C
where A.VALUE_ID = C.VALUE_ID
) M on 1=1
INNER JOIN TBLATTRIBUTE_VALUE AS B ON M.maxmultiflag = B.VALUE

Related

Inner Join Producing cartesian product

Looking at the 2 queries below, I assumed they would return the same result set but they're way off. Why is the 2 query with the inner join producing so many records? What am I doing wrong? I've been staring at this a little too long and need a fresh pair of eyes to look at it.
SELECT COUNT(*)
FROM ZCQ Z
WHERE Z.QUOTE_CUSTOMER_ID IN (SELECT CUSTOMER_ID FROM CUST_ORDER)
-- returned 6,646 RECS
SELECT COUNT(*)
FROM ZCQ Z
INNER JOIN CUST_ORDER CO ON zquote_customer_id = co.customer_id
-- returned 4,232,473 RECS
Please note these are Oracle 10g tables but have no FK or PK setup by the DBA.
No, these will not generally return the same result.
The first counts the number of rows in ZCQ that match a customer in CUST_ORDER.
The second counts the total number of rows that match. If there are duplicate customers in CUST_ORDER, then all duplicates will be counted.
You could get the same result using:
SELECT COUNT(DISTINCT z.zquote_customer_id)
FROM ZCQ Z JOIN
CUST_ORDER CO
ON zquote_customer_id = co.customer_id;
But IN or EXISTS is probably more efficient than removing the duplicates after doing the match.

I expect these 2 sql statements to return same number of rows

In my mind these 2 sql statements are equivalent.
My understanding is:
the first one i am pulling all rows from tmpPerson and filtering where they do not have an equivalent person id. This query returns 211 records.
The second one says give me all tmpPersons whose id isnt in person. this returns null.
Obviously they are not equivalent or theyd have the same results. so what am i missing? thanks
select p.id, bp.id
From person p
right join(
select distinct id
from tmpPerson
) bp
on p.id= bp.id
where p.id is null
select id
from tmpPerson
where id not in (select id from person)
I pulled some ids from the first result set and found no matching records for them in Person so im guessing the first one is accurate but im still surprised they're different
I much prefer left joins to right joins, so let's write the first query as:
select p.id, bp.id
From (select distinct id
from tmpPerson
) bp left join
person p
on p.id = bp.id
where p.id is null;
(The preference is because the result set keeps all the rows in the first table rather than the last table. When reading the from clause, I immediately know what the first table is.)
The second is:
select id
from tmpPerson
where id not in (select id from person);
These are not equivalent for two reasons. The most likely reason in your case is that you have duplicate ids in tmpPerson. The first version removes the duplicates. The second doesn't. This is easily fixed by putting distincts in the right place.
The more subtle reason has to do with the semantics of not in. If any person.id has a NULL value, then all rows will be filtered out. I don't think that is the case with your query, but it is a difference.
I strongly recommend using not exists instead of not in for the reason just described:
select tp.id
from tmpPerson tp
where not exists (select 1 from person p where p.id = tp.id);
select id
from tmpPerson
where id not in (select id from person)
If there is a null id in tmp person then they will not be captured in this query. But in your first query they will be captured. So using an isnull will be resolve the issue
where isnull(id, 'N') not in (select id from person)

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused on how to pull in the EmailAddress column from the Clients table, as in order to join it I have to bring in other tables that aren't being used. I am assuming there is an easier way to do this using sub Queries as that is what we are working on at the time.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
)
This will return all the columns from the two columns provided in the inner set.
So, your inner query, which I won't write for you since you appear to be a student, and that wouldn't be right for me to give you the entire answer, would be a query consisting of a GROUP BY clause and a SUM of the order value field. But the key thing you need to understand is this set thinking: you can just wrap the ENTIRE query inside brackets and treat it as a table the way I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Client Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItem SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by C1.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) -- Close subquery
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is usefull, certainly when you select fields from the joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY is mostly meaningless inside a subquery if you don't use TOP.

How to find the most frequent value in a select statement as a subquery?

I am trying to get the most frequent Zip_Code for the Location ID from table B. Table A(transaction) has one A.zip_code per Transaction but table B(Location) has multiple Zip_code for one area or City. I am trying to get the most frequent B.Zip_Code for the Account using Location_D that is present in both table.I have simplified my code and changed the names of the columns for easy understanding but this is the logic for my query I have so far.Any help would be appreciated. Thanks in advance.
Select
A.Account_Number,
A.Utility_Type,
A.Sum(usage),
A.Sum(Cost),
A.Zip_Code,
( select B.zip_Code from B where A.Location_ID= B.Location_ID having count(*)= max(count(B.Zip_Code)) as Location_Zip_Code,
A.Transaction_Date
From
Transaction_Table as A Left Join
Location Table as B On A.Location_ID= B.Location_ID
Group By
A.Account_Number,
A.Utility_Type,
A.Zip_Code,
A.Transaction_Date
This is what I come up with:
Select tt.Account_Number, tt.Utility_Type, Sum(tt.usage), Sum(tt.Cost),
tt.Zip_Code,
(select TOP 1 l.zip_Code
Location_Table l
where tt.Location_ID = l.Location_ID
group by l.zip_code
order by count(*) desc
) as Location_Zip_Code,
tt.Transaction_Date
From Transaction_Table tt
Group By tt.Account_Number, tt.Utility_Type, tt.Zip_Code, tt.Transaction_Date;
Notes:
Table aliases are a good thing. However, they should be abbreviations for the tables referenced, rather than arbitrary letters.
The table alias qualifies the column name, not the function. Hence sum(tt.usage) rather than tt.sum(usage).
There is no need for a join in the outer query. You are doing all the work in the subquery.
An order by with top seems the way to go to get the most common zip code (which, incidentally, is called the mode in statistics).

SQL, only if matching all foreign key values to return the record?

I have two tables
Table A
type_uid, allowed_type_uid
9,1
9,2
9,4
1,1
1,2
24,1
25,3
Table B
type_uid
1
2
From table A I need to return
9
1
Using a WHERE IN clause I can return
9
1
24
SELECT
TableA.type_uid
FROM
TableA
INNER JOIN
TableB
ON TableA.allowed_type_uid = TableB.type_uid
GROUP BY
TableA.type_uid
HAVING
COUNT(distinct TableB.type_uid) = (SELECT COUNT(distinct type_uid) FROM TableB)
Join the two tables togeter, so that you only have the records matching the types you are interested in.
Group the result set by TableA.type_uid.
Check that each group has the same number of allowed_type_uid values as exist in TableB.type_uid.
distinct is required only if there can be duplicate records in either table. If both tables are know to only have unique values, the distinct can be removed.
It should also be noted that as TableA grows in size, this type of query will quickly degrade in performance. This is because indexes are not actually much help here.
It can still be a useful structure, but not one where I'd recommend running the queries in real-time. Rather use it to create another persisted/cached result set, and use this only to refresh those results as/when needed.
Or a slightly cheaper version (resource wise):
SELECT
Data.type_uid
FROM
A AS Data
CROSS JOIN
B
LEFT JOIN
A
ON Data.type_uid = A.type_uid AND B.type_uid = A.allowed_type_uid
GROUP BY
Data.type_uid
HAVING
MIN(ISNULL(A.allowed_type_uid,-999)) != -999
Your explanation is not very clear. I think you want to get those type_uid's from table A where for all records in table B there is a matching A.Allowed_type_uid.
SELECT T2.type_uid
FROM (SELECT COUNT(*) as AllAllowedTypes FROM #B) as T1,
(SELECT #A.type_uid, COUNT(*) as AllowedTypes
FROM #A
INNER JOIN #B ON
#A.allowed_type_uid = #B.type_uid
GROUP BY #A.type_uid
) as T2
WHERE T1.AllAllowedTypes = T2.AllowedTypes
(Dems, you were faster than me :) )