Find the most frequent data every year in SQL - sql

TableA :
Years Data
2000 A
2000 B
2000 C
2000 C
2000 D
2001 A
2001 B
2001 B
2002 B
2002 D
2002 D
I want to output:
Years Data
2000 C
2001 B
2002 D
My solution:
SELECT DISTINCT Years, Data
FROM
(
SELECT Years, Data, COUNT(*) AS _count
FROM TableA
GROUP BY Years, Data
) a1
ORDER BY Years, _count DESC
But it has a problem:
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
How do I correct my SQL code?

Assuming your database supports row_number(), you can do it like this:
SELECT Years, Data
FROM
(
SELECT Years,
Data,
ROW_NUMBER() OVER(PARTITION BY Years ORDER BY count(*) DESC) rn
FROM TableA
GROUP BY Years, Data
) x
WHERE rn = 1
ORDER BY Years, Data
See a live demo on rextester.
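To try the ROW_NUMBER() approach without a database server, here is a minimal sketch that loads the sample rows into an in-memory SQLite database from Python. The function name `most_frequent_per_year` and the use of SQLite (3.25 or newer, for window functions) are assumptions made for illustration. The counts are computed in a CTE first, and ties are broken by `Data`, which the query above leaves unspecified.

```python
import sqlite3

# Sample rows from the question: (Years, Data)
ROWS = [
    (2000, "A"), (2000, "B"), (2000, "C"), (2000, "C"), (2000, "D"),
    (2001, "A"), (2001, "B"), (2001, "B"),
    (2002, "B"), (2002, "D"), (2002, "D"),
]

QUERY = """
WITH counts AS (
    SELECT Years, Data, COUNT(*) AS cnt
    FROM TableA
    GROUP BY Years, Data
),
ranked AS (
    SELECT Years, Data,
           ROW_NUMBER() OVER (PARTITION BY Years ORDER BY cnt DESC, Data) AS rn
    FROM counts
)
SELECT Years, Data FROM ranked WHERE rn = 1 ORDER BY Years
"""

def most_frequent_per_year(rows):
    """Return one (Years, Data) row per year: the most frequent value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE TableA (Years INTEGER, Data TEXT)")
        conn.executemany("INSERT INTO TableA VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

print(most_frequent_per_year(ROWS))
```

On the sample data this gives one row per year: C for 2000, B for 2001 and D for 2002.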

Try this:
select t.Years, t.[Data]
from (
select *, count(*) cnt
from TableA
group by years, [Data]
) t
left join (
select Years, max(cnt) maxCnt
from (
select *, count(*) cnt
from TableA
group by years, [Data]
) t
group by Years
) tt on t.Years = tt.Years -- tt is a derived table holding the max count for each year
where t.cnt = tt.maxCnt -- keep only the `years`, `[Data]` pairs whose count equals that max
order by years;
Another way is to use rank() in a DBMS that supports it:
;with t as (
select *, count(*) cnt
from TableA
group by years, [Data]
), tt as (
select *, rank() over (partition by years order by cnt desc) rn
from t
)
select years, [Data]
from tt
where rn = 1
order by years;
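The practical difference between rank() and row_number() only shows up when two values tie. The sketch below runs the rank() query against an in-memory SQLite database (3.25 or newer) with one extra, hypothetical row added so that A and B tie in 2001. The function name `most_frequent_with_ties` is an assumption for illustration.

```python
import sqlite3

ROWS = [
    (2000, "A"), (2000, "B"), (2000, "C"), (2000, "C"), (2000, "D"),
    (2001, "A"), (2001, "B"), (2001, "B"),
    (2002, "B"), (2002, "D"), (2002, "D"),
    (2001, "A"),  # hypothetical extra row: A now ties with B in 2001
]

QUERY = """
WITH t AS (
    SELECT Years, Data, COUNT(*) AS cnt
    FROM TableA
    GROUP BY Years, Data
),
tt AS (
    SELECT Years, Data,
           RANK() OVER (PARTITION BY Years ORDER BY cnt DESC) AS rn
    FROM t
)
SELECT Years, Data FROM tt WHERE rn = 1 ORDER BY Years, Data
"""

def most_frequent_with_ties(rows):
    """Return every (Years, Data) pair whose count is the highest in its year."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE TableA (Years INTEGER, Data TEXT)")
        conn.executemany("INSERT INTO TableA VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

print(most_frequent_with_ties(ROWS))
```

With the tie in place, 2001 contributes two rows (A and B), whereas row_number() would keep only one of them.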

If you're using Oracle, you can use the STATS_MODE function:
select years, stats_mode(data)
from TableA
group by years;
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions154.htm

Your error is "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."
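This means you have put something in the ORDER BY that is not in the SELECT list. In this case, _count is not in the SELECT list. Adding it removes the error, although the query then returns every (Years, Data) pair with its count rather than one row per year:
SELECT DISTINCT Years, Data, _count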
FROM
(
SELECT Years, Data, COUNT(*) AS _count
FROM TableA
GROUP BY Years, Data
) a1
ORDER BY Years, _count DESC
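To see what a DISTINCT query with _count in the select list returns, here is a small sketch that runs it on the sample rows in an in-memory SQLite database. SQLite is only a stand-in for the asker's database, and the function name `distinct_with_count` is an assumption for illustration.

```python
import sqlite3

ROWS = [
    (2000, "A"), (2000, "B"), (2000, "C"), (2000, "C"), (2000, "D"),
    (2001, "A"), (2001, "B"), (2001, "B"),
    (2002, "B"), (2002, "D"), (2002, "D"),
]

QUERY = """
SELECT DISTINCT Years, Data, _count
FROM (
    SELECT Years, Data, COUNT(*) AS _count
    FROM TableA
    GROUP BY Years, Data
) a1
ORDER BY Years, _count DESC
"""

def distinct_with_count(rows):
    """Run the DISTINCT query with _count included in the select list."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE TableA (Years INTEGER, Data TEXT)")
        conn.executemany("INSERT INTO TableA VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

result = distinct_with_count(ROWS)
print(len(result))  # number of (Years, Data, _count) rows returned
print(result[0])    # the first row is the most frequent value of the first year
```

The most frequent value sorts first within each year, but every other pair is still present, so a per-year filter (for example on a row number) is still needed to get exactly one row per year.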

Related

select value based on max of other column

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id area count function
1 100 20 living
1 200 30 industry
2 400 10 living
2 400 10 industry
2 400 20 education
3 150 1 industry
3 150 1 education
I want to group by id, summing area and count, and pick the dominant function for each id: the function of the row with the largest area. When areas are equal, the row with the largest count wins; when area and count are both equal, a fixed priority between functions decides (I still have to decide whether education comes before industry or vice versa). So the result should be:
id area count function
1 300 50 industry
2 1200 40 education
3 300 2 industry
I tried a lot of things, and maybe it's easy, but I don't get it. Can someone help me get the right SQL?
One method uses row_number() and conditional aggregation:
select id, sum(area) as area, sum(count) as count,
max(function) filter (where seqnum = 1) as function
from (select t.*,
row_number() over (partition by id order by area desc, count desc) as seqnum
from t
) t
group by id;
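Another method uses distinct on:
select distinct on (id) id,
sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
function
from t
-- t.area / t.count: the row's own values, not the summed output aliases
order by id, t.area desc, t.count desc;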
Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
(
select "function"
from the_table
where id = t.id
order by area desc, count desc, "function" desc
limit 1
) as "function"
from the_table as t
group by t.id order by t.id;
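The scalar sub-query approach can be checked against the sample rows from the question. The sketch below uses an in-memory SQLite database from Python as a stand-in for Postgres; the function name `dominant_function_per_id` is an assumption, and sorting "function" in descending order is just one possible priority rule (it happens to put industry before education).

```python
import sqlite3

# Sample rows from the question: (id, area, count, function)
ROWS = [
    (1, 100, 20, "living"),
    (1, 200, 30, "industry"),
    (2, 400, 10, "living"),
    (2, 400, 10, "industry"),
    (2, 400, 20, "education"),
    (3, 150, 1, "industry"),
    (3, 150, 1, "education"),
]

QUERY = """
SELECT t.id,
       SUM(t.area)    AS area,
       SUM(t."count") AS "count",
       (SELECT s."function"
        FROM the_table AS s
        WHERE s.id = t.id
        -- largest area first, then largest count, then the priority rule
        ORDER BY s.area DESC, s."count" DESC, s."function" DESC
        LIMIT 1) AS "function"
FROM the_table AS t
GROUP BY t.id
ORDER BY t.id
"""

def dominant_function_per_id(rows):
    """Sum area and count per id and pick the dominant function."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE the_table '
            '(id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT)'
        )
        conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

for row in dominant_function_per_id(ROWS):
    print(row)
```

For id 2 all areas are equal, so the count decides (education); for id 3 both area and count are equal, so only the priority rule decides.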
You can use sum as a window function:
select distinct on (t.id)
id,
sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
( select function from tbl_test where tbl_test.id = t.id order by area desc, count desc limit 1 ) as function
from tbl_test t;
This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(We ensure that no yt2 exists with the same id but a higher area.)
This works, but several rows may share the maximum area while having different functions. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id
) t
on yourtable.id = t.id
group by yourtable.id, t.function;

Sql query to find the row having max sum for a column

I have the following 'Scores' table, which holds the players' scores for specific years:
Sid Name Score Year
1 John 500 2016
2 Kim 900 2015
3 Ren 300 2016
4 John 600 2015
5 Kim 200 2016
6 Ren 200 2016
Find the player who scored the most runs in 2016.
I can find this using the query below:
Select Name
from
( select Name
, sum(Score) as sumScore
from Scores
where year=2016
group
by Name
) sub
order
by sumScore desc
limit 1;
Output:
Ren
How can I find the same result without using ORDER BY?
I tried the query below, but it doesn't work: the second WHERE clause can't refer to sub, and the database complains that relation sub doesn't exist.
select Name from(select Name,sum(Score) as sumScore from Scores
where year=2016 group by Name)sub where sumScore=(select max(sumScore) from sub)
One simple method uses window functions over the per-player totals:
select s.name, s.sum_score
from (select name, sum(score) as sum_score,
max(sum(score)) over () as max_score
from scores
where year = 2016
group by name
) s
where sum_score = max_score;
You can try using a correlated subquery:
select * from tablename a where score in
(select max(score) from tablename b where a.year=b.year and b.year=2016)
and a.year=2016
Or you can use the window function row_number(), like below:
select * from
(
select *,row_number() over(partition by yr order by score desc) as rn from cte1
)a where rn=1 and yr=2016
OUTPUT:
id name score yr
1 John 500 2016
SELECT Name, SUM(Score) AS sumScore
FROM Scores
WHERE Year = 2016
GROUP BY Name
HAVING SUM(Score) = (
SELECT MAX(sumScore)
FROM (
SELECT SUM(Score) AS sumScore
FROM Scores
WHERE Year = 2016
GROUP BY Name
) sub
)
You could also use a common table expression in combination with dense_rank():
with cte as (
select *,
DENSE_RANK() OVER(ORDER BY score desc, year) rank
from demo
where year = 2016
)
select *
from cte
where rank = 1
Edit: to get the players with the max total score of 2016, you can tweak the above query as follows:
with cte as (
select name,year ,
DENSE_RANK() OVER(ORDER BY sum(score) desc, year) rank
from demo
where year = 2016
group by name,year
)
select *
from cte
where rank = 1
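The asker's second attempt fails because a derived-table alias such as sub cannot be referenced as a relation from another subquery, whereas a CTE can be referenced more than once. The sketch below illustrates that with the sample Scores rows in an in-memory SQLite database; the CTE name `totals` and the function name `top_scorers` are assumptions. No ORDER BY is used in the SQL; the rows are only sorted in Python to make the printed output stable.

```python
import sqlite3

# Sample rows from the question: (Sid, Name, Score, Year)
ROWS = [
    (1, "John", 500, 2016),
    (2, "Kim", 900, 2015),
    (3, "Ren", 300, 2016),
    (4, "John", 600, 2015),
    (5, "Kim", 200, 2016),
    (6, "Ren", 200, 2016),
]

QUERY = """
WITH totals AS (
    SELECT Name, SUM(Score) AS sumScore
    FROM Scores
    WHERE Year = ?
    GROUP BY Name
)
SELECT Name, sumScore
FROM totals
WHERE sumScore = (SELECT MAX(sumScore) FROM totals)
"""

def top_scorers(rows, year):
    """Return every player whose total score in `year` equals the maximum total."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Scores (Sid INTEGER, Name TEXT, Score INTEGER, Year INTEGER)"
        )
        conn.executemany("INSERT INTO Scores VALUES (?, ?, ?, ?)", rows)
        return sorted(conn.execute(QUERY, (year,)).fetchall())
    finally:
        conn.close()

print(top_scorers(ROWS, 2016))
```

In the sample data John (500) and Ren (300 + 200) both total 500 in 2016, so this form returns both players, while an ORDER BY ... LIMIT 1 query returns only one of them.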

Selecting the max of a count column with a group by in spark sql

I have the following data:
yr char cnt
1 a 27
1 g 20
3 b 50
3 z 70
I'd like to get, for each year, only the row with the maximum cnt, i.e.:
yr char count
1 a 27
3 z 70
I tried to use a SQL like below:
SELECT yr, char, max(count(cnt)) as count
FROM view
GROUP BY yr,char
But it resulted in an error saying the max cannot be used with count in SparkSQL. How can I get the result I want as shown above?
This should work:
sql("select a.yr, a.char, a.cnt from view a join (select yr, max(cnt) as cnt from view group by yr) b on a.yr = b.yr and b.cnt = a.cnt").show()
This would often be done using row_number():
select yr, char, cnt
from (select yr, char, count(*) as cnt,
row_number() over (partition by yr order by count(*) desc) as seqnum
from view
group by yr, char
) yc
where seqnum = 1;
Note: In the event of ties, this returns an arbitrary one of them. If you want all of them, use rank() or dense_rank().
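Since the sample data already carries a cnt column, the join against the per-year maximum can be tried directly on it. The sketch below uses an in-memory SQLite database from Python as a stand-in for Spark SQL; the table name `yearly_counts` and the function name `max_cnt_per_year` are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (yr, char, cnt)
ROWS = [
    (1, "a", 27),
    (1, "g", 20),
    (3, "b", 50),
    (3, "z", 70),
]

QUERY = """
SELECT a.yr, a."char", a.cnt
FROM yearly_counts AS a
JOIN (
    SELECT yr, MAX(cnt) AS cnt
    FROM yearly_counts
    GROUP BY yr
) AS b
  ON a.yr = b.yr AND a.cnt = b.cnt
ORDER BY a.yr
"""

def max_cnt_per_year(rows):
    """Return the row(s) holding the largest cnt within each yr."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE yearly_counts (yr INTEGER, "char" TEXT, cnt INTEGER)')
        conn.executemany("INSERT INTO yearly_counts VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

print(max_cnt_per_year(ROWS))
```

Like rank(), the join keeps every row that shares the maximum, so a year with a tie would produce more than one row.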

SQL select rows given they were included in TOP in a given year

I'm trying to select previous years' data for the top-selling products from this year to see how they've done over time.
I currently have something like:
SELECT TOP 10 *
WHERE Table.Year = 2016
ORDER BY Table.Transactions DESC
which gets me the data for the current year, but I'm not sure how to pull data for that same top 10 in previous years, given that this year's top 10 likely differs from previous years' top 10.
I was wondering if there was a way to do something along the lines of:
SELECT *
WHERE Table.Year = 2015
AND (order ID was in Top 10 in 2016)
ORDER BY Table.Transactions DESC
Except clearly I have no idea what to put in place of the bracketed condition.
Any suggestions would be greatly appreciated. Thanks!
Do you want this?
SELECT *
From Table
WHERE Table.Year = 2015
AND ID IN (
SELECT TOP 10 ID
FROM Table
WHERE Table.Year = 2016
ORDER BY Table.Transactions DESC
)
ORDER BY Table.Transactions DESC
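The IN (top N of one year) pattern can be sketched on a tiny data set. Everything in the example below is hypothetical: the `sales` table, its rows, and the function name `prior_year_rows`. It uses an in-memory SQLite database, so LIMIT takes the place of SQL Server's TOP.

```python
import sqlite3

# Hypothetical sample data: (ID, Year, Transactions)
ROWS = [
    (1, 2016, 90), (2, 2016, 80), (3, 2016, 10),
    (1, 2015, 5), (2, 2015, 50), (3, 2015, 70),
]

QUERY = """
SELECT ID, Year, Transactions
FROM sales
WHERE Year = :prior_year
  AND ID IN (
      SELECT ID
      FROM sales
      WHERE Year = :top_year
      ORDER BY Transactions DESC
      LIMIT :top_n          -- SQLite spelling of SQL Server's TOP n
  )
ORDER BY Transactions DESC
"""

def prior_year_rows(rows, top_n, top_year, prior_year):
    """Rows from `prior_year` for the IDs that were in the top N of `top_year`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (ID INTEGER, Year INTEGER, Transactions INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        params = {"top_n": top_n, "top_year": top_year, "prior_year": prior_year}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()

print(prior_year_rows(ROWS, top_n=2, top_year=2016, prior_year=2015))
```

ID 3 had the most transactions in 2015 but is left out, because the filter is driven by the 2016 ranking only.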
Use window functions:
select t.*
from (select t.*,
row_number() over (partition by year order by transactions desc) as seqnum
from t
) t
where seqnum <= 10;
Another fun way is to use cross apply:
select t.*
from (select 2015 as yyyy union all select 2016) y cross apply
(select top 10 t.*
from t
where t.year = y.yyyy
order by transactions desc
) t;
Instead of listing out the years explicitly, you can use a subquery (select distinct year from t) y.
Note: ID should be a unique value, such as a primary key.
SELECT *
FROM YourTable
WHERE YourTable.Year = 2015
AND ID IN
(SELECT TOP 10 ID
FROM YourTable
WHERE YourTable.Year = 2016
ORDER BY YourTable.Transactions DESC)
select *
from (select t.*
,dense_rank () over (partition by t.year order by t.transactions desc) as dr
from t
) t
where t.dr <= 10
order by t.year
,t.dr
;

SQL Server Group By Complex Query

In SQL Server, suppose we have a SALES_HISTORY table as below.
CustomerNo PurchaseDate ProductId
1 20120411 12
1 20120330 13
2 20120312 14
3 20120222 16
3 20120109 16
... and many records for each purchase of each customer...
How can I write the appropriate query for finding:
For each customer,
find the product he bought the most,
find the percentage of this product over all products he bought.
The result table must have columns like:
CustomerNo,
MostPurchasedProductId,
MostPurchasedProductPercentage
Assuming SQL Server 2005+, you can do the following:
;WITH CTE AS
(
SELECT *,
COUNT(*) OVER(PARTITION BY CustomerNo, ProductId) TotalProduct,
COUNT(*) OVER(PARTITION BY CustomerNo) Total
FROM YourTable
), CTE2 AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY CustomerNo
ORDER BY TotalProduct DESC)
FROM CTE
)
SELECT CustomerNo,
ProductId MostPurchasedProductId,
CAST(TotalProduct AS NUMERIC(16,2))/Total*100 MostPurchasedProductPercent
FROM CTE2
WHERE RN = 1
You still need to decide how to handle ties, where more than one product is the most purchased one. Here is a sqlfiddle with a demo for you to try.
Could do a lot prettier, but it works:
with cte as(
select CustomerNo, ProductId, count(1) as c
from SALES_HISTORY
group by CustomerNo, ProductId)
select CustomerNo, ProductId as MostPurchasedProductId, (t.c * 1.0)/(select sum(c) from cte t2 where t.CustomerNo = t2.CustomerNo) as MostPurchasedProductPercentage
from cte t
where c = (select max(c) from cte t2 where t.CustomerNo = t2.CustomerNo)
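As a way to experiment with the per-customer calculation, here is a minimal sketch on the five sample rows from the question, using an in-memory SQLite database (3.25 or newer) from Python rather than SQL Server. The function name `most_purchased` is an assumption, and ties are broken by the lowest ProductId, which is only one possible rule.

```python
import sqlite3

# Sample rows from the question: (CustomerNo, PurchaseDate, ProductId)
ROWS = [
    (1, "20120411", 12),
    (1, "20120330", 13),
    (2, "20120312", 14),
    (3, "20120222", 16),
    (3, "20120109", 16),
]

QUERY = """
WITH per_product AS (
    SELECT CustomerNo, ProductId, COUNT(*) AS c
    FROM SALES_HISTORY
    GROUP BY CustomerNo, ProductId
),
ranked AS (
    SELECT CustomerNo, ProductId, c,
           SUM(c) OVER (PARTITION BY CustomerNo) AS total,
           ROW_NUMBER() OVER (PARTITION BY CustomerNo
                              ORDER BY c DESC, ProductId) AS rn
    FROM per_product
)
SELECT CustomerNo,
       ProductId         AS MostPurchasedProductId,
       100.0 * c / total AS MostPurchasedProductPercentage
FROM ranked
WHERE rn = 1
ORDER BY CustomerNo
"""

def most_purchased(rows):
    """One row per customer: most purchased product and its share in percent."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE SALES_HISTORY "
            "(CustomerNo INTEGER, PurchaseDate TEXT, ProductId INTEGER)"
        )
        conn.executemany("INSERT INTO SALES_HISTORY VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

for row in most_purchased(ROWS):
    print(row)
```

Customer 1 bought products 12 and 13 once each, so the tie-break picks 12 at 50 percent; customers 2 and 3 each bought a single product, giving 100 percent.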