I have a table Tabl1 with the columns id, name, country, year, medal.
How can I find the top 10 countries by number of medals for each year in a single query?
Thanks :)
You haven't told us anything about your table schema or the data, so this is a guess!
I'm going to assume your medal column contains the quantity of medals for each id/name, so you just need to rank by the sum of medals. Something along the lines of:
select [year], country, [Rank] from (
select [year], country, Rank() over(partition by [year] order by Sum(medal) desc ) [Rank]
from Tabl1
group by [year],country
)x
where [Rank]<=10
order by [year], [Rank]
Here is how you can get the top 10 countries in each year:
select * from
(
select country, year, count(*) as medal_count, row_number() over (partition by year order by count(*) desc) as rn
from Tabl1
group by country, year
) tt
where tt.rn < 11
The subquery groups the data by country and year and computes count(*) for each group. At the same time, the row_number() window function sorts the groups by count(*) descending and assigns each one a row number, restarting at 1 for every year as long as the window is partitioned by year. The country with the most medals in each year therefore gets row number 1, and since you need the top 10, the outer query filters on tt.rn < 11.
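To make the per-year numbering concrete, here is a small sketch that runs the same idea against an in-memory SQLite database from Python. It assumes one row per medal (so count(*) is the medal count), a SQLite build with window function support (3.25 or later), and made-up sample data; the function name and the extra tie-breaker on country are my own additions, not part of the answer above.

```python
import sqlite3

def top_countries_per_year(rows, n):
    """Return (year, country, medal_count, rn) for the n best countries of each year."""
    conn = sqlite3.connect(":memory:")
    # Assumption: one row per medal won, so COUNT(*) is the medal count.
    conn.execute("CREATE TABLE Tabl1 (country TEXT, year INTEGER)")
    conn.executemany("INSERT INTO Tabl1 VALUES (?, ?)", rows)
    query = """
        SELECT year, country, medal_count, rn
        FROM (
            SELECT year, country, medal_count,
                   -- PARTITION BY restarts the numbering at 1 for every year
                   ROW_NUMBER() OVER (
                       PARTITION BY year
                       ORDER BY medal_count DESC, country
                   ) AS rn
            FROM (
                SELECT year, country, COUNT(*) AS medal_count
                FROM Tabl1
                GROUP BY year, country
            ) AS counted
        ) AS numbered
        WHERE rn <= ?
        ORDER BY year, rn
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result

# Hypothetical sample data: (country, year), one tuple per medal.
SAMPLE = (
    [("USA", 2000)] * 3 + [("CHN", 2000)] * 2 + [("FRA", 2000)]
    + [("CHN", 2004)] * 2 + [("USA", 2004)]
)

for row in top_countries_per_year(SAMPLE, 2):
    print(row)
```

With n = 2, FRA drops out of 2000 because it is numbered 3 within that year, while 2004 starts again at 1.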
If you want 10 countries per year:
with data as (
select country, "year" as yr,
rank() over (partition by "year" order by count(*) desc) as rnk
from T
group by country, "year"
)
select yr as "year", country from data
where rnk <= 10
order by yr, rnk;
Note that if ties are possible this could return more than ten rows for any given year.
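The tie behaviour is easy to see side by side. The sketch below (Python with an in-memory SQLite database, assuming SQLite 3.25 or later) ranks a made-up list of pre-aggregated counts with rank(), dense_rank() and row_number(); the table name medal_counts and the function name are hypothetical.

```python
import sqlite3

def compare_rank_functions(counts):
    """Return (country, cnt, rank, dense_rank, row_number) ordered by row_number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE medal_counts (country TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO medal_counts VALUES (?, ?)", counts)
    rows = conn.execute("""
        SELECT country, cnt,
               RANK()       OVER (ORDER BY cnt DESC)          AS rnk,
               DENSE_RANK() OVER (ORDER BY cnt DESC)          AS dense_rnk,
               -- the extra sort key makes the numbering deterministic
               ROW_NUMBER() OVER (ORDER BY cnt DESC, country) AS rn
        FROM medal_counts
        ORDER BY rn
    """).fetchall()
    conn.close()
    return rows

rows = compare_rank_functions([("A", 5), ("B", 3), ("C", 3), ("D", 1)])
top2_by_rank = [r[0] for r in rows if r[2] <= 2]
top2_by_row_number = [r[0] for r in rows if r[4] <= 2]
print(top2_by_rank, top2_by_row_number)
```

B and C tie on 3, so a filter of rank <= 2 keeps three countries while row_number <= 2 keeps exactly two. Which one you want depends on whether tied countries should all be listed.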
I have a database (crimes) and I want to list, for each year, the top 3 districts with the most crimes, using SQL. I have tried the following code, but it just counts the crimes per year:
SELECT
year,
district,
CrimeID,
COUNT(*) OVER (PARTITION BY year)
FROM Crimes
You could do it like this in Oracle, if that helps (editing to add, it looks like you might be using SQL Server so I have added an alias to the derived table to make it work for that too):
SELECT
v.year,
v.district,
v.count
FROM (
SELECT
year,
district,
COUNT(*) AS count,
ROW_NUMBER() OVER (PARTITION BY year ORDER BY COUNT(*) DESC) AS rono
FROM crimes
GROUP BY year, district
) v
WHERE v.rono <= 3
ORDER BY v.year ASC, v.rono ASC
I have a sample dataframe below that is over 500k rows:
|year|name|text|id|
|2001|foog|ltgn|01|
|2001|goof|ltg4|02|
|2002|tggr|ltg5|03|
|2002|wwwe|ltg6|04|
|2004|frgr|ltg7|05|
|2004|ggtg|ltg8|06|
|2003|hhyy|lt9n|07|
|2003|jjuu|l2gn|08|
|2005|fotg|l3gn|09|
I want to use SQL to select the most popular name for each year, i.e. to get back a dataframe that contains only the most popular name per year, for every year present in the 500k rows.
I can do this via 2 separate statements:
-- sql query that gives me the names
select count(1), name from table_name group by name order by count(1) desc limit 1;
-- If I add a year filter, I can get the result for that particular year
select count(1), name from table_name where year = '2001' group by name order by count(1) desc limit 1;
However, how do I merge these into one SQL query that returns just the most popular name for each year?
You can use aggregation and window functions:
select yn.*
from (select yn.*,
row_number() over (partition by year order by cnt desc) as seqnum
from (select year, name, count(*) as cnt
from table_name
group by year, name
) yn
) yn
where seqnum = 1;
The innermost subquery calculates the count for each name in each year. The middle subquery numbers the names within each year by count, with the highest count getting 1. The outer query then filters to keep only the name (per year) that has the highest count.
In most databases, you can simplify this to:
select yn.*
from (select year, name, count(*) as cnt,
row_number() over (partition by year order by count(*) desc) as seqnum
from table_name
group by year, name
) yn
where seqnum = 1;
I have a vague recollection that Spark SQL doesn't allow this syntax.
I've a table that has this information:
And need to get the following information:
If the country of the same person name (in this case Artur) is different, then I need to sum the two values of quantity from the max date (in this case 04/10) and return both person (Artur) and the qty (15k)
If the country of the same person name (in this case Joseph) is the same, then I need only the first row of the max date available.
I'm really struggling, as I'm not sure how to implement this logic in my code:
Select
table.person,
table.quantity
From
(
Select
table.date,
table.person,
table.country,
table.quantity,
ROW_NUMBER () over (
PARTITION by table.code, table.person
ORDER by table.date DESC
) AS rn
FROM
table
WHERE table.date >= DATE '{2020-04-10}' -5
) a
WHERE a.RN IN (1,2)
Is it possible to create a rule to sum rows 1 and 2 when country is different (Artur case) and only return row number 1 when the country is the same for a name (Joseph case)?
Use dense_rank() or max() as a window function:
select person, sum(quantity)
from (select t.*,
max(date) over (partition by person) as max_date
from t
) t
where date = max_date
group by person;
EDIT:
Hmmm . . . I think you might want one row per country per person on the max date. If so:
select person, sum(quantity)
from (select t.*,
row_number() over (partition by person, country order by date desc) as seqnum_pc,
rank() over (partition by person order by date desc) as seqnum_p
from t
) t
where seqnum_p = 1 and seqnum_pc = 1
group by person;
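Here is a quick way to try the second query, sketched in Python with an in-memory SQLite database (3.25 or later for window functions). The rows, the ISO date strings and the function name are invented for illustration, since the original table was not posted as text.

```python
import sqlite3

def latest_quantity_per_person(rows):
    """Sum, per person, the newest row of each country, restricted to the person's max date."""
    conn = sqlite3.connect(":memory:")
    # Dates are stored as ISO strings (assumption) so that they sort chronologically.
    conn.execute("CREATE TABLE t (date TEXT, person TEXT, country TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT person, SUM(quantity) AS total
        FROM (
            SELECT t.*,
                   -- newest row per (person, country)
                   ROW_NUMBER() OVER (PARTITION BY person, country
                                      ORDER BY date DESC) AS seqnum_pc,
                   -- 1 for every row on the person's latest date
                   RANK() OVER (PARTITION BY person
                                ORDER BY date DESC) AS seqnum_p
            FROM t
        ) AS ranked
        WHERE seqnum_p = 1 AND seqnum_pc = 1
        GROUP BY person
        ORDER BY person
    """).fetchall()
    conn.close()
    return result

# Hypothetical data modelled on the Artur / Joseph description.
SAMPLE = [
    ("2020-04-10", "Artur", "PT", 10000),
    ("2020-04-10", "Artur", "ES", 5000),
    ("2020-04-09", "Artur", "PT", 7000),
    ("2020-04-10", "Joseph", "US", 8000),
    ("2020-04-09", "Joseph", "US", 3000),
]

print(latest_quantity_per_person(SAMPLE))
```

Artur has two countries on his latest date, so both rows are summed; Joseph has a single country, so only his newest row survives the filter.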
Customers have ordered from different cities, so there can be multiple cities for the same customer_id. For each customer id I want to display the city that occurs the maximum number of times; if a customer has placed the same number of orders from several cities, the city of their most recent order should be selected. I have tried something like
SELECT customer_id,delivery_city,COUNT(DISTINCT delivery_city)
FROM analytics.f_order
GROUP BY customer_id,delivery_city
HAVING COUNT(DISTINCT delivery_city) > 1
WITH cte as (
SELECT customer_id,
delivery_city,
COUNT(delivery_city) as city_count,
MAX(order_date) as last_order
FROM analytics.f_order
GROUP BY customer_id, delivery_city
), ranking as (
SELECT *, row_number() over (partition by customer_id
order by city_count DESC, last_order DESC) as rn
FROM cte
)
SELECT *
FROM ranking
WHERE rn = 1
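The tie-break on the last order can be checked with a small sketch like the one below, written in Python against an in-memory SQLite database (3.25 or later). The schema prefix is dropped because SQLite has no analytics schema here, and the order rows and the function name are made up.

```python
import sqlite3

def favourite_city(orders):
    """Return (customer_id, delivery_city, city_count) for each customer's top city."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE f_order (customer_id INTEGER, delivery_city TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO f_order VALUES (?, ?, ?)", orders)
    rows = conn.execute("""
        WITH cte AS (
            SELECT customer_id, delivery_city,
                   COUNT(*) AS city_count,
                   MAX(order_date) AS last_order
            FROM f_order
            GROUP BY customer_id, delivery_city
        ), ranking AS (
            SELECT cte.*,
                   -- most orders first; ties go to the most recent order
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id
                       ORDER BY city_count DESC, last_order DESC
                   ) AS rn
            FROM cte
        )
        SELECT customer_id, delivery_city, city_count
        FROM ranking
        WHERE rn = 1
        ORDER BY customer_id
    """).fetchall()
    conn.close()
    return rows

# Hypothetical orders: customer 1 has a 2-2 tie, customer 2 has a clear winner.
SAMPLE = [
    (1, "Delhi", "2021-01-01"), (1, "Delhi", "2021-01-05"),
    (1, "Mumbai", "2021-01-02"), (1, "Mumbai", "2021-01-09"),
    (2, "Pune", "2021-02-01"), (2, "Pune", "2021-02-03"),
    (2, "Pune", "2021-02-07"), (2, "Goa", "2021-02-09"),
]

print(favourite_city(SAMPLE))
```

Customer 1 is tied between Delhi and Mumbai, and Mumbai wins because its last order is later. Customer 2 gets Pune even though Goa has the most recent order, since the count is compared first.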
select customer_id,
delivery_city,
amount
from
(
select t.*,
-- rank 1 = city with the most orders; tied cities all get rank 1
rank() over (partition by customer_id order by amount desc) as rnk
from(
SELECT customer_id,
delivery_city,
COUNT(*) as amount
FROM analytics.f_order
GROUP BY customer_id,delivery_city
) t
) r
where rnk = 1
I have a column for group name and a column for amount spent.
I need to sum the amounts, group them by group name, and then grab the highest five. After that, I need to combine the rest into its own group with a total of their amount spent. This is what I have right now:
SELECT groupName, SUM(amount) AS theAmountSpent
FROM purchases
GROUP BY groupName
ORDER BY theAmountSpent DESC
This groups and orders them, but I don't know how to then grab the remaining groups and combine them. Any help would be appreciated.
Alternate CTE-approach using row_number() (SQL Server 2005+):
WITH cte AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SUM(amount)) DESC) AS num,
groupName, SUM(amount) AS theAmountSpent
FROM purchases
GROUP BY groupName
)
SELECT groupName, theAmountSpent FROM cte WHERE num BETWEEN 1 AND 5 --top 5
UNION ALL
SELECT 'Sum rest', SUM(theAmountSpent) FROM cte WHERE num > 5 -- sum of rest
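For anyone who wants to try the "top N plus rest" pattern outside SQL Server, here is a rough sketch in Python with an in-memory SQLite database (3.25 or later). It splits the CTE in two so the window function only sees pre-aggregated totals, adds an explicit sort column, and uses invented sample data; the function name and the :n parameter are my own.

```python
import sqlite3

def top_n_plus_rest(purchases, n):
    """Return (num, groupName, total) for the n largest groups plus one 'Sum rest' row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (groupName TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", purchases)
    rows = conn.execute("""
        WITH totals AS (
            SELECT groupName, SUM(amount) AS theAmountSpent
            FROM purchases
            GROUP BY groupName
        ), ranked AS (
            SELECT groupName, theAmountSpent,
                   ROW_NUMBER() OVER (ORDER BY theAmountSpent DESC, groupName) AS num
            FROM totals
        )
        SELECT num, groupName, theAmountSpent FROM ranked WHERE num <= :n
        UNION ALL
        -- everything past the top n collapses into a single row
        SELECT :n + 1, 'Sum rest', SUM(theAmountSpent) FROM ranked WHERE num > :n
        ORDER BY 1
    """, {"n": n}).fetchall()
    conn.close()
    return rows

# Hypothetical purchases: (groupName, amount).
SAMPLE = [("a", 10), ("a", 5), ("b", 7), ("c", 3), ("d", 1)]

print(top_n_plus_rest(SAMPLE, 2))
```

Note that if there are no groups beyond the top n, the second SELECT still returns one 'Sum rest' row with a NULL total, so you may want to filter that out.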
If I'm understanding you correctly, this should do it:
SELECT top 5 groupName, SUM(amount) AS theAmountSpent
into #tempSpent FROM purchases
GROUP BY groupName
ORDER BY theAmountSpent DESC
Select * from #tempSpent -- get the top 5
--get sum for the rest
SELECT SUM(amount) AS theAmountSpent
FROM purchases
where groupName not in (select groupName from #tempSpent)
Drop table #tempSpent
Another idea from Larsts code:
WITH cte
AS
(
SELECT case
when ROW_NUMBER() OVER (ORDER BY (SUM(amount)) DESC) <=5
then ROW_NUMBER() OVER (ORDER BY (SUM(amount)) DESC)
else 6 end AS num
, groupName
, SUM(amount) AS theAmountSpent
FROM purchases
GROUP BY groupName
)
SELECT num
, case when num <= 5 then max(groupName) else 'Sum rest' end as groupName
, sum(theAmountSpent )
FROM cte
group by num