SQL - Remove duplicates after using a GROUP BY clause [duplicate] - sql

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 10 months ago.
Let's say I had two tables that looked like this:
Prod_SerialNo
Prod_TestOnAt
Prod_AccountNo
SN0001
2021-04-08
045678
SN0001
2021-01-14
067891
SN0001
2021-11-29
091234
SN0002
2022-01-19
045678
SN0002
2020-07-30
045678
SN0002
2022-03-30
012345
SN0003
2022-04-01
078912
SN0003
2022-02-19
089123
SN0003
2022-03-18
023456
S_AccountNo
S_AccountType
S_AccountName
012345
Homeowner
Adam Smith
023456
Homeowner
Jeremy Chan
034567
Manufacturer
Anne Hudson
045678
Distributor
Barney Jones
056789
Distributor
Jasmine Coleman
067891
Homeowner
Christian Lewis
078912
Distributor
Heather Ogden
089123
Homeowner
Stephen Gray
091234
Distributor
Antony Newman
The Prod Table tabulates specific product tests by what serial number was used, when the product was tested, and who tested it. (There are other things in the table, including a primary key not shown here)
The S Table is a list of subscribers with a variety of information about them. S_AccountNo is the parent to Prod_AccountNo.
I want to query when the last test was performed for each Serial Number and what account name it was that performed the test, but I don't want multiple results (duplicates) for the same serial number. I have tried the following code:
SELECT
Prod_SerialNo,
MAX(Prod_TestOnAt) AS "Last Time Tested",
S_AccountName
FROM Prod
INNER JOIN S ON S.S_AccountNo = Prod.Prod_AccountNo
GROUP BY Prod_SerialNo, S_AccountName
ORDER BY Prod_SerialNo
However, the query ends up outputting the same serial number on multiple rows even though I ask for the max TestOnAt date and I group by serial number. What am I getting wrong?

I think there is no need to use Group by you can get result with Row_Number like this:
SELECT
t.Prod_SerialNo,
t.Prod_TestOnAt AS "Last Time Tested",
t.S_AccountName
FROM (
SELECT
Prod_SerialNo,
Prod_TestOnAt,
S_AccountName,
ROW_NUMBER() OVER (PARTITION BY Prod_SerialNo ORDER BY Prod_TestOnAt DESC) rw
FROM Prod
INNER JOIN S ON S.S_AccountNo = Prod.Prod_AccountNo
) t
WHERE t.rw=1
ORDER BY t.Prod_SerialNo

You are grouping by Prod_SerialNo, S_AccountName so you will get duplicate Prod_SerialNo if multiple rows exist with that Prod_SerialNo and different S_AccountNames. You could do a MAX on Prod_TestOnAt and get that value with it's Prod_SerialNo, then join the result on the table to get your desired info using a subquery like so:
SELECT
p.[Prod_SerialNo],
max.[LastTimeTested],
s.[S_AccountName]
FROM PROD as p
INNER JOIN
(
SELECT
Prod_SerialNo,
MAX(Prod_TestOnAt) as [LastTimeTested]
FROM Prod
GROUP BY [Prod_SerialNo]
) as max
on max.[Prod_SerialNo] = p.[Prod_SerialNo] and max.[LastTimeTested] = p.[Prod_TestOnAt]
INNER JOIN S as s
ON s.[S_AccountNo] = p.[Prod_AccountNo]
ORDER BY p.[Prod_SerialNo]

If you don't like the solution using ROW_NUMBER an alternative is to use CROSS APPLY to identify the last Prod_TestOnAt and the associated Prod_AccountNo.
SELECT DISTINCT p.Prod_SerialNo, ca.Prod_TestOnAt, s.S_AccountName
FROM Prod p
CROSS APPLY (SELECT TOP 1 Prod_TestOnAt, Prod_AccountNo
FROM Prod
WHERE Prod_SerialNo = p.Prod_SerialNo
ORDER BY Prod_TestOnAt DESC) ca
INNER JOIN S ON S.S_AccountNo = ca.Prod_AccountNo

Related

Get max of sum during joining two tables

I want to get the subscriber that has maximum value of Bill (Total Bill).
I tried using the following script but SQL did not execute successflly.
Please help me on what I did wrong on this.
I have 2 tables:
Subscriber
FirstName
MIN
Ben
258999542
Reed
458524896
Steve
586692155
Clint
1007772121
Frank
1287548752
Jane
2345824215
Total Bill
Total
MIN
131.5
258999542
139.4
458524896
164
586692155
101
1007772121
224.12
1287548752
97.52
2345824215
And here's the code I tried:
SELECT MAX(B.Total), S.FirstName
FROM Subscriber AS S
JOIN Bill AS B ON S.MIN = B.MIN
It seems you just need TOP + ORDER BY:
SELECT TOP 1 B.Total, S.FirstName
FROM Subscriber AS S
JOIN Bill AS B ON S.MIN = B.MIN
ORDER BY B.Total DESC;
That's based on the fact that your sample data isn't showing multiple Bill records per Subscriber therefore you don't need a sum.

How to mix sql consults to make conditions to another one

I've the following tables
series_trailers:
ID EPISODEID CONTENT AUTHOR
-----------------------------
1 122383 url1 Peter
2 9999 url2 Ana
3 923822 stuff Jhon
4 122384 url3 Drake
series_episodes:
ID TITLE SERIESID
--------------------------------
122383 Episode 1 23
9999 Somethingweird 87
923822 Randomtitle 52
122384 Episode 2 23
series:
ID TITLE
-------------------
23 Stranger Things
87 Seriesname
512 Sometrashseries
As you can see there are three tables: one with the series info, one with the series' episodes and another one which contains urls that redirect to the episode's trailers. I'd like to get the lastest rows from series_trailers but without repeating the series where they're from.
I've tried with SELECT DISTINCT EPISODEID FROM series_trailers ORDER BY id DESCbut there are two rows with the same episodes' series so I'll get the seriies Stranger things twice. Summing up I'd like to display the lastest series with new urls but I don't want to get duplicated series (that's what i'd get with the sql above)
EDIT: What I'm supposed to get:
Last updated series:
Stranger Things
Seriesname
Sometrashseries
What I'd get with my sql code:
Stranger Things
Seriesname
Sometrashseries
Stranger Things (again)
If I understood correctly, here is the latest trailer for the latest episodes (latest as in the highest series ID / series_trailer ID, so most likely added lastest).
WITH MostRecentTrailers
AS (
SELECT MAX(st.ID) "TRAILERID"
,s.ID "SERIESID"
,s.TITLE "SERIESTITLE"
FROM series_trailers st
JOIN series_episodes se ON se.ID = st.EPISODEID
JOIN series s ON s.ID = se.SERIESID
GROUP BY s.ID
,s.TITLE
ORDER BY s.ID DESC
)
SELECT *
FROM MostRecentTrailers mrt
JOIN series_trailers st ON st.ID = mrt.TRAILERID
Let me know if that does it for ya.
Edit: Fixed some typo mistakes.
This gives you the trailer with the highest ID for each episode. This answer is based on the assumption that the episode with the highest ID is the latest one.
select id, content from series_trailer where episode_id in
(select max(id)
from series_episodes
group by seriesid)

finding duplicate rows with different IDs based on multiple columns

please forgive me if my jargon is off. I'm still learning!
I just started using Teradata, and to be honest has been a lot of fun. however, I have hit a road block that has stumped me for a while.
I successfully selected a table from a database that looks like:
ID service date name
1 service1 1/5/15 john
2 service2 1/7/15 steve
3 service3 1/8/15 lola
4 service4 1/3/15 joan
5 service5 1/5/15 fred
6 service3 1/3/15 joan
7 service5 1/8/15 oscar
Now I want to search the data base again to find any duplicate IDs (example: to see if service service1 with date 1/5/15 with name john exists on another row with a different ID.)
At first, I did something like this:
SELECT ID, service, date, name
FROM table
WHERE table.service = ANY(service1, service2, service3, service4, service5, service3, service5)
AND table.date = ANY('1/5/15', '1/7/15, '1/8/15', '1/3/15', '1/5/15', '1/3/15', '1/8/15')
AND table.name = ANY('john', 'steve', 'lola', 'joan', 'fred', 'joan', 'oscar');
But this is giving me more rows than I wanted.
example:
ID service date name
92 service3 1/8/15 steve
is of no use to me since I am looking for IDs that have the same combination of service, date, and name as of any of the other IDs in the above table.
something like this would be favorable:
ID service date name
609 service3 1/8/15 lola
since it matches than of ID 3.
I was curious to see if it were possible to treat the three columns (service, date, name) as a vector and maybe select the rows that match it that way?
ex
......
WHERE (table.service, table.date, table.name) = ANY((service3,1/8/15,lola), (service1, 1/5/15, john), ...etc)
My Teradata is down right now, So I have yet to try the above example. Nevertheless, any thoughts/feedback is greatly appreciated!
The following query may be what you are trying to achieve. This selects IDs for which the combination of service, date, and name appears more than once.
SELECT t1.ID
FROM yourTable t1
INNER JOIN
(
SELECT service, date, name
FROM yourTable
GROUP BY service, date, name
HAVING COUNT(*) > 1
) t2
ON t1.service = t2.service AND
t1.date = t2.date AND
t1.name = t2.name
This is a simple task for a Windowed Aggregate:
SELECT *
FROM tab
QUALIFY
COUNT(*) OVER (PARTITION BY service, date, name) > 1
This counts the number of rows with the same combination of values (like Tim Biegeleisen's Derived Table) but unlike a Standard Aggregate it keeps all rows. The QUALIFY is a nice Teradata syntax extension to avoid a Derived Table.
Don't hardcode values in your query unless you absolutely have to. Instead, take the query you already wrote and join to that.
SELECT dupes.*
FROM (your query) yourquery
JOIN table dupes
ON yourquery.service = dupes.service
AND yourquery.date = dupes.date
AND yourquery.name = dupes.name

DB2 Select from two tables when one table requires sum

In a DB2 Database, I want to do the following simple mathematics using a SQL query:
AvailableStock = SupplyStock - DemandStock
SupplyStock is stored in 1 table in 1 row, let's call this table the Supply table.
So the Supply table has this data:
ProductID | SupplyStock
---------------------
109 10
244 7 edit: exclude this product from the search
DemandStock is stored in a separate table Demand, where demand is logged as each customer logs demand during a customer order journey. Example data from the Demand table:
ProductID | DemandStock
------------------------
109 1
244 4 edit: exclude this product
109 6
109 2
So in our heads, if I want to calculate the AvailableStock for product '109', Supply is 10, Demand for product 109 totals to 9, and so Available stock is 1.
How do I do this in one select query in DB2 SQL?
The knowledge I have so far of some of the imagined steps in PseudoCode:
I select SupplyStock where product ID = '109'
I select sum(DemandStock) where product ID = '109'
I subtract SupplyStock from DemandStock
I present this as a resulting AvailableStock
The results will look like this:
Product ID | AvailableStock
109 9
I'd love to get this selected in one SQL select query.
Edit: I've since received an answer (that was almost perfect) and realised the question missed out some information.
This information:
We need to exclude data from products we don't want to select data for, and we also need to specifically select product 109.
My apologies, this was omitted from the original question.
I've since added a 'where' to select the product and this works for me. But for future sake, perhaps the answer should include this information too.
You do this using a join to bring the tables together and group by to aggregate the results of the join:
select s.ProductId, s.SupplyStock, sum(d.DemandStock),
(s.SupplyStock - sum(d.DemandStock)) as Available
from Supply s left join
Demand d
on s.ProductId = d.ProductId
where s.ProductId = 109
group by s.ProductId, s.SupplyStock;

GROUP BY and aggregate function query

I am looking at making a simple leader board for a time trial. A member may perform many time trials, but I only want for their fastest result to be displayed. My table columns are as follows:
Members { ID (PK), Forename, Surname }
TimeTrials { ID (PK), MemberID, Date, Time, Distance }
An example dataset would be:
Forename | Surname | Date | Time | Distance
Bill Smith 01-01-11 1.14 100
Dave Jones 04-09-11 2.33 100
Bill Smith 02-03-11 1.1 100
My resulting answer from the example above would be:
Forename | Surname | Date | Time | Distance
Bill Smith 02-03-11 1.1 100
Dave Jones 04-09-11 2.33 100
I have this so far, but access complains that I am not using Date as part of an aggregate function:
SELECT Members.Forename, Members.Surname, Min(TimeTrials.Time) AS MinOfTime, TimeTrials.Date
FROM Members
INNER JOIN TimeTrials ON Members.ID = TimeTrials.Member
GROUP BY Members.Forename, Members.Surname, TimeTrials.Distance
HAVING TimeTrials.Distance = 100
ORDER BY MIN(TimeTrials.Time);
IF I remove the Date from the SELECT the query works (without the date). I have tried using FIRST upon the TimeTrials.Date, but that will return the first date which is normally incorrect.
Obviously putting the Date as part of the GROUP BY would not return the result set that I am after.
Make this task easier on yourself by starting with a smaller piece of the problem. First get the minimum Time from TimeTrials for each combination of MemberID and Distance.
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance;
Assuming that SQL is correct, use it in a subquery which you join back to TimeTrials again.
SELECT tt2.*
FROM
TimeTrials AS tt2
INNER JOIN
(
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance
) AS sub
ON
tt2.MemberID = sub.MemberID
AND tt2.Distance = sub.Distance
AND tt2.Time = sub.MinOfTime
WHERE tt2.Distance = 100
ORDER BY tt2.Time;
Finally, you can join that query to Members to get Forename and Surname. Your question shows you already know how to do that, so I'll leave it for you. :-)