SQL query MAX(SUM(..)) [closed] - sql

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
Improve this question
Table Structure:
Article(
model int(key),
year int(key),
author varchar(key),
num int)
num: number of articles wrote during the year
Find all the authors that each one of them in one year atleast wrote maximal number of articles (relative to all the other authors)
I tried:
SELECT author FROM Article,
(SELECT year,max(sumnum) s FROM
(SELECT year,author,SUM(num) sumnum FROM Article GROUP BY year,author)
GROUP BY year) AS B WHERE Article.year=B.year and Article.num=B.s;
Is this the right answer?
Thanks.

You might want to try a self-JOIN to get what you are looking for:
SELECT Main.author
FROM Article AS Main
INNER JOIN (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS SumMain
ON SumMain.year = Main.year
AND SumMain.author = Main.author
GROUP BY Main.author
HAVING SUM(Main.num) = MAX(SumMain.sumnum)
;
This would guarantee (as it is ANSI) you are getting the MAX of the SUMmed nums and only bringing back results for what you need. Keep in mind I only JOINed on those two fields because of the information provided ... if you have a unique ID you can JOIN on, or you require more specificity to get a 1-to-1 match, adjust accordingly.
Depending on what DBMS you are using, it can be simplified one of two ways:
SELECT author
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
HAVING SUM(num) = MAX(sumnum)
) AS Main
;
Some DBMSes allow you to do multiple aggregate functions, and this could work there.
If your DBMS allows you to do OLAP functions, you can do something like this:
SELECT author
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS Main
QUALIFY (
ROW_NUMBER() OVER (
PARTITION BY author
,year
ORDER BY sumnum DESC
) = 1
)
;
Which would limit the result set to only the highest sumnum, although you may need more parameters to handle things if you wanted the year to be involved (you are GROUPing by it, only reason I bring it up).
Hope this helps!

You mention for homework and a valid attempt, however incorrect.
This is under a premise (unclear since no sample data) that the model column is like an auto-increment, and there is only going to be one entry per author per year and never multiple records for the same author within the same year. Ex:
model year author num
===== ==== ====== ===
1 2013 A 15
2 2013 C 18
3 2013 X 17
4 2014 A 16
5 2014 B 12
6 2014 C 16
7 2014 X 18
8 2014 Y 18
So the result expected is highest article count in 2013 = 18 and would only return author "C". In 2014, highest article count is 18 and would return authors "X" and "Y"
First, get a query of what was the maximum number of articles written...
select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year
This would give you one record per year, and the maximum number of articles published... so if you had data for years 2010-2014, you would at MOST have 5 records returned. Now, it is as simple as joining this to the original table that had the matching year and articles
select
A2.*
from
( select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year ) PreQuery
JOIN Article A2
on PreQuery.Year = A2.Year
AND PreQuery.ArticlesPerYear = A2.num

I suggest a CTE
WITH maxyear AS
(SELECT year, max(num) AS max_articles
FROM article
GROUP BY year)
SELECT DISTINCT author
FROM article a
JOIN maxyear m
ON a.year=m.year AND a.num=m.max_articles;
and compare that in performance to a partition, which is another way
SELECT DISTINCT author FROM
(SELECT author, rank() AS r
OVER (PARTITION BY year ORDER BY num DESC)
FROM article) AS subq
WHERE r = 1;
I think some RDBMS will let you put HAVING rank()=1 on the subquery and then you don't need to nest queries.

Related

SQL Question : Technical question that requires you to answer How many members worked at IBM before working at Google? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 months ago.
Improve this question
Assume the following dataset:
member_id
company
Year_started
1
Apple
2001
1
IBM
2002
1
Oracle
2005
1
Microsoft
2010
2
IBM
2002
2
Microsoft
2004
2
Oracle
2008
Member 1, began work at IBM in 2002, moved to Oracle in 2005 and moved to Microsoft in 2010. Member 2, began workin gat IBM in 2002, moved to Microsoft in 2004 and then Moved to oracle in 2008. Assume that for each member in each year, there is only one company (cannot work at 2 different companies in the same year).
**Question: How many members ever worked at IBM prior to working at Oracle? **
How would you go about solving this? I tried a combination of CASE when's but am lost as to where else to go. Thanks.
...
..
I would phrase this using EXISTS:
SELECT DISTINCT member_id
FROM yourTable t1
WHERE company = 'Google' AND
EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.member_id = t1.member_id AND
t2.company = 'IBM' AND
t2.Year_started < t1.Year_started);
In plain English, the above query says to report every employee who worked at Google in some year, for which there is a record from an earlier year for the same employee who worked at IBM at that time.
I would go for querying the table twice and then join like so:
select count(*)
from (
select member_id, min(Year_started) y
from my_table
where company = 'IBM'
group by member_id
) i
join (
select member_id, max(Year_started) y
from my_table
where company = 'Oracle'
group by member_id
) o on i.member_id = o.member_id and i.y < o.y
Note the difference of min/max functions on the year column. This would yield a match for the use case Oracle-IBM-Oracle as well as IBM-Oracle-IBM.
Simply do a self join:
select count(distinct m1.member_id)
from members m1
join members m2
on m1.member_id = m2.member_id and m1.Year_started < m2.Year_started
where m1.company = 'IBM' and m2.company = 'Oracle';
More verbose, but using CTEs:
For each member_id, get the first year they worked at IBM (if any), and get the first year they worked at Oracle (again, if any).
"For each member_id" translates to GROUP BY member_id
"Get the first year..." translates to MIN( CASE WHEN "Company" = 'etc' THEN "Year_Started" END )
Filter those rows to rows where the first-year-at-IBM is less-than their first-year-at-Oracle.
Then simply get the COUNT(*) of those rows.
WITH ibmOracleYears AS (
SELECT
member_id,
MIN( CASE WHEN "Company" = 'IBM' THEN "Year_Started" END ) AS JoinedIbm,
MIN( CASE WHEN "Company" = 'Oracle' THEN "Year_Started" END ) AS JoinedOracle
FROM
yourTable
GROUP BY
member_id
),
workedAtIbmBeforeOracle AS (
SELECT
y.*
FROM
ibmOracleYears AS y
WHERE
y.JoinedIbm IS NOT NULL /* <-- This `IS NOT NULL` check isn't absolutely necessary, but I'm including it for clarity. */
AND
y.JoinedOracle IS NOT NULL
AND
y.JoinedIbm < y.JoinedOracle
)
SELECT
COUNT(*) AS "Number of members that worked at IBM before Oracle"
FROM
workedAtIbmBeforeOracle
But that query can be reduced down to this (if you don't mind anonymous expressions in HAVING clauses):
SELECT
COUNT(*) AS "Number of members that worked at IBM before Oracle"
FROM
(
SELECT
member_id
FROM
yourTable
GROUP BY
member_id
HAVING
MIN( CASE WHEN "Company" = 'IBM' THEN "Year_Started" END ) < MIN( CASE WHEN "Company" = 'Oracle' THEN "Year_Started" END )
) AS q
SQLFiddle of both examples.

Getting top 10 most popular within an array column using SQL UNNEST

I am working with a sample data set which gives the following result:
Continuing to work, I am now trying to get the top 10 Production Companies (based on "production_companies" field) that made the most number of movies in the most popular genre for a year.
The output
Rank | Production Company | Popular Genre | Movie Count
I thought breaking this down to getting the most popular genre for the year would be the 1st step with the following query:
select
genres.name AS _genre,
FROM
commons.movies m,
UNNEST(m.genres) as genres
WHERE
SUBSTR(m.release_date, 1, 4) = '2008'
GROUP BY
genres.name
ORDER BY
COUNT(genres.name) DESC
LIMIT
1
I have now go the output as 'Drama' being the most popular genre for the year 2008.
Answering the question to get the most popular prod company and their count has been a bit challenging and failing several times.
I have after several tries got to:
select
o_prd_cmp.name,
o_mov.title
from
commons.movies o_mov,
unnest(o_mov.genres) as o_gnr,
UNNEST(o_mov.production_companies) AS o_prd_cmp
where
SUBSTR(o_mov.release_date, 1, 4) = '2008'
AND o_gnr.name = (
select
genres.name AS _genre,
FROM
commons.movies m,
UNNEST(m.genres) as genres
WHERE
SUBSTR(m.release_date, 1, 4) = '2008'
GROUP BY
genres.name
ORDER BY
COUNT(genres.name) DESC
LIMIT
1
)
Any help with this is greatly appreciated.

SQL MAx function with multiple columns showing in apex?

image1 image2I am trying to write an sql function to show the year, playername, and ppg of the player with the highest ppg from each year in our database.
We have a Players table with all the stats, and a team table with stats linked to each season as a team total.
What I want to do is get the highest scorer from each season so:
2010: Jake 10ppg
2011: Jake 12 ppg
2012 Carl 13 ppq
Etc.
here is my current query
SELECT Year, PlayerName, MAX(PPG) AS PPG
FROM PLAYERS_T, TEAM_T
GROUP BY Year
ORDER BY PPG;
However this is not working, what do I need to do to make this work?
This should work, but will show duplicated record if same PPG. Dont know what is the use of Team table there
SQL DEMO
WITH PLAYERS_T as (
SELECT 2010 "Year", 'Jake' "PlayerName", 10 ppg
UNION
SELECT 2011 "Year", 'Jake' "PlayerName", 12 ppg
UNION
SELECT 2012 "Year", 'Carl' "PlayerName", 13 ppg
)
SELECT T1."Year", T1."PlayerName", T1.PPG
FROM PLAYERS_T T1
LEFT JOIN PLAYERS_T T2
ON T1."Year" = T2."Year"
AND T1.PPG < T2.PPG
WHERE T2."Year" IS NULL
OUTPUT
Try this one:
SELECT players_T.playername, players_T.ppg, players_T.year
FROM
(SELECT year, MAX(PPG) AS mx
FROM players_T
GROUP BY year) sub
INNER JOIN players_T ON sub.mx = players_T.ppg
WHERE sub.year = players_T.year
ORDER BY players_T.year
In the subquery, this finds the max ppg per year. Then we join with the players table on the ppg to find the player name. The result should be the player name, ppg and year together. Let me know what you find!
Edit: Need to include a WHERE clause for year

SQL (Subqueries)

Struggling to go the extra step with a SQL query I'd like to run.
I have a customer database with a Customer table with the date/time detail of when the customer joined and a transaction table with details of their transactions of the years
What I'd like to do is to Group by the Join Date (as Year) and count the number that joined in each year then in the next column I'd like to then count the number who have transacted in a specific year E.g. 2016 the current year. This way I can show customer retention over the years.
Both tables are linked by a customer URN, but I am struggling to get my head around the the most efficient way to show this. I can easily count and group the members by joined year and I can display the max dated transaction but I am struggling to bring the two together. I think I need to use sub queries and a left join but it's alluding me.
Example output column headers with data
Year_Joined = 2009
Joiner_Count = 10
Transact_in_2016 = 5
Where I am syntax-wise. I know this is no where near complete. As I need to group by DateJoined and then sub query the count of customers of have transacted in 2016?
SELECT Customer.URNCustomer,
MAX(YEAR(Customer.DateJoined)),
MAX(YEAR(Tran.TranDate)) As Latest_Tran,
FROM Mydatabase.dbo.Customer
LEFT JOIN Mydatabase.dbo.Tran
ON Tran.URNCustomer = Customer.URNCustomer
GROUP BY Customer.URNCustomer
ORDER BY Customer.URNCustomer
The best approach is to do the aggregation before doing the joins. You want to count two different things, so count them individually and them combine them.
The following uses full outer join. This handles the case where there are years with no new customers and years with no transactions:
select coalesce(c.yyyy, t.yyyy) as yyyy,
coalesce(c.numcustomers, 0) as numcustomers,
coalesce(t.numtransactions, 0) as numtransactions
from (select year(c.datejoined) as yyyy, count(*) as numcustomers
from Mydatabase.dbo.Customer c
group by year(c.datejoined)
) c full outer join
(select year(t.trandate) as yyyy, count(*) as numtransactions
from database.dbo.Tran t
group by year(t.trandate)
) t
on c.yyyy = t.yyyy;
You may want to try something like this:
SELECT YEAR(Customer.DateJoined),
COUNT( Customer.URNCustomer ),
COUNT( DISTINCT Tran.URNCustomer ) AS NO_ACTIVE_IN_2016
FROM Mydatabase.dbo.Customer
LEFT Mydatabase.dbo.Tran
ON Tran.URNCustomer = Customer.URNCustomer
AND YEAR(Tran.TranDate) = 2016
GROUP BY YEAR(Customer.DateJoined)

Oracle - group by of joined tables

I tried to look for an answer and I found more advices, but not anyone of them was helpful, so I'm trying to ask now.
I have two tables, one with distributors (columns: distributorid, name) and the second one with delivered products (columns: distributorid, productid, corruptcount, date) - the column corruptcount contains the number of corrupted deliveries. I need to select the first five distributors with the most corrupted deliveries in last two months. I need to select distributorid, name and sum of corruptcount, here is my query:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
GROUP BY del.distributorid
But Oracle returns error message: "not a GROUP BY expression".And when I edit my query to this:
SELECT del.distributorid, d.name, del.corruptcount-- , SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
--GROUP BY del.distributorid
It's working as you expect and returns correct data:
1 IBM 10
2 DELL 0
2 DELL 1
2 DELL 6
3 HP 3
8 ACER 2
9 ASUS 1
I'd like to group this data. Where and why is my query wrong? Can you help please? Thank you very, very much.
I think the problem is just the d.name in the select list; you need to include it in the group by clause as well. Try this:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d join
delivery del
on d.distributorid = del.distributorid
WHERE d.distributorid IN
(SELECT distributorid
FROM delivery
WHERE storeid = 1 AND
"date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE AND
ROWNUM <= 5
GROUP BY distributorid
ORDER BY SUM(corruptcount) DESC
)
GROUP BY del.distributorid, d.name;
I also switched the query to using explicit join syntax with an on clause, instead of the outdated implicit join syntax using a condition in the where.
I also removed the additional layer of subquery. It is not really necessary.
EDIT:
"Why does d.name have to be included in the group by?" The easy answer is that SQL requires it because it does not know which value to include from the group. You could instead use min(d.name) in the select, for instance, and there would be no need to change the group by clause.
The real answer is a wee bit more complicated. The ANSI standard does actually permit the query as you wrote it. This is because id is (presumably) declared as a primary key on the table. When you group by a primary key (or unique key), then you can use other columns from the same table just as you did. Although ANSI supports this, most databases do not yet. So, the real reason is that Oracle doesn't support the ANSI standard functionality that would allow your query to work.