Top N results grouped Oracle SQL - sql

I want to write a query that returns only the specific rows I need and nothing more.
Take TVs as an example. I have three brands of TVs and I want to see the ten best-selling models of each brand, so the query should return only 30 rows. One solution is a UNION per brand, but that gets messy fast. Ideally there would be something like a WHERE ROWNUM filter that applies per group.
SELECT
A.Brand
, A.Model
, A.Sales
FROM
( SELECT
TV.Brand
, TV.Model
, SUM(TV.SALES) AS SALES
FROM TV_TABLE TV
GROUP BY
TV.Brand
, TV.Model
ORDER BY
TV.Brand
, SALES DESC
) A
WHERE ROWNUM <= 10
The code above returns the top 10 rows overall from the inner query, not 10 from each group.
What I want to see is something like this:
Brand: Model: Sales
Sony: x10: 20
Sony: X20: 18
Sony: X30: 10
VISIO: A40: 40
VISIO: A20: 10
This is an oversimplified example. In practice I'll have 20-50 groupings and would like to avoid downloading all of the data and using a Pivot feature.

select Brand, Model, SALES
from (
select Brand, Model, SALES, row_number() over (partition by Brand order by SALES desc) rn
from (
SELECT TV.Brand, TV.Model, SUM(TV.SALES) AS SALES
FROM TV_TABLE TV
group BY TV.Brand, TV.Model
) a
) b
where rn <= 10
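As a quick way to check the ROW_NUMBER() approach above, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions) stands in for Oracle here, the sample rows are made up, and the limit is 2 per brand instead of 10 to keep the data small. The function name top_n_per_brand is hypothetical.

```python
import sqlite3


def top_n_per_brand(rows, n):
    """Return the n best-selling models per brand as (brand, model, sales)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tv_table (brand TEXT, model TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO tv_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT brand, model, sales
        FROM (
            SELECT brand, model, sales,
                   ROW_NUMBER() OVER (PARTITION BY brand ORDER BY sales DESC) AS rn
            FROM (
                -- total sales per model first, then rank within each brand
                SELECT brand, model, SUM(sales) AS sales
                FROM tv_table
                GROUP BY brand, model
            ) a
        ) b
        WHERE rn <= ?
        ORDER BY brand, sales DESC
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


# Made-up sample rows: the Sony X10 appears twice so that SUM() has work to do.
sample = [
    ("Sony", "X10", 12), ("Sony", "X10", 8),
    ("Sony", "X20", 18), ("Sony", "X30", 10),
    ("VISIO", "A40", 40), ("VISIO", "A20", 10), ("VISIO", "A10", 3),
]
top2 = top_n_per_brand(sample, 2)
```

With these rows, top2 holds two rows per brand, ordered by sales within each brand. The third-ranked model of each brand is dropped by the rn filter rather than by a global row limit.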

SELECT TV.Brand, TV.Model, SUM(TV.SALES) AS SALES
FROM TV_TABLE TV
group by TV.Brand, TV.Model
order by SUM(TV.SALES) desc, TV.Brand
limit 30

Related

How to get a fraction of counters of subquery from different subqueries in one select?

I have a table with reviews for products. I want to sort the product_ids that have more than 100 verified reviews (a verified review is a review with verified_purchase=True) by the fraction of 5-star reviews among all reviews. I tried to implement this in one select, but after numerous attempts I ended up needing to create views. I managed to write a query that counts the number of 5-star reviews, but can't do better. Can anybody give me a hint?
My best query:
select *,count(*)
from (
select *
from reviews
where star_rating = 5
) low_reviews
left join (
select distinct filtered_reviews.product_id
from (
select *
from (
select verified_reviews.product_id, count(*) as verified_reviews_number
from (
select *
from reviews
where verified_purchase=True
) as verified_reviews
) as counted_verified_reviews
where counted_verified_reviews.verified_reviews_number > 100
) as filtered_reviews
) filtered_product_ids on low_reviews.product_id = filtered_product_ids.product_id;
Data example:
review_id customer_id product_id star_rating helpful_votes total_votes vine verified_purchase review_headline review_body review_date
14830128 R158AS05ZMH7VQ 0615349439 5 2 2 N false Planting a Church ... Witnessing To Dracula... 2011-02-14
I want to sort product_ids that have more than 100 verified reviews (a verified review is a review with verified_purchase=True) by the fraction of 5-star reviews among all reviews.
You don't provide sample data, but I would expect a query like this:
select product_id
from reviews
where verified_purchase
group by product_id
having count(*) > 100
order by avg( (star_rating = 5)::int ) desc;
The expression avg( (star_rating = 5)::int ) is a shorthand way of saying count(*) filter (where star_rating = 5) * 1.0 / count(star_rating). It works because it converts the expression star_rating = 5 to an int, which is 1 for true and 0 for false. The average of those values is the proportion of rows for which the expression is true.
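To make the equivalence concrete, here is a small sketch that computes the share both ways on made-up rows, using the star_rating column from the question's table. It runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL: SQLite already evaluates a comparison to 0 or 1, so no ::int cast is needed there, and the threshold is lowered from 100 to 2 so that a handful of rows is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (product_id TEXT, star_rating INTEGER, verified_purchase INTEGER)"
)
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", [
    ("A", 5, 1), ("A", 5, 1), ("A", 3, 1), ("A", 1, 1),
    ("B", 5, 1), ("B", 5, 1), ("B", 5, 1), ("B", 2, 1),
    ("C", 5, 1),   # only one verified review, removed by HAVING
    ("B", 1, 0),   # unverified, removed by WHERE
])
ranked = conn.execute("""
    SELECT product_id,
           AVG(star_rating = 5) AS five_star_share,
           SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS same_share
    FROM reviews
    WHERE verified_purchase
    GROUP BY product_id
    HAVING COUNT(*) > 2
    ORDER BY five_star_share DESC
""").fetchall()
conn.close()
```

Both share columns agree for every product: B has three 5-star reviews out of four verified ones and A has two out of four, so B sorts first.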
Actually, the above assumes that you only care about star ratings for verified purchases. If you want to include all reviews (even non-verified ones) in the ordering:
select product_id
from reviews
group by product_id
having count(*) filter (where verified_purchase) > 100
order by avg( (star_rating = 5)::int ) desc;

How to select 1000 customers who were the first to gain 1000 bonus points for purchases in categories "Taxi" and "Books"? (SQLite)

The BONUS table has the attributes client_id, bonus_date, the number of accrued bonuses (bonus_cnt), and the MCC code of the transaction that earned the bonuses (mcc_code). The MCC_CATEGORIES table is an MCC code reference.
Attributes:
MCC code (mcc_code), category (mcc_category; for example supermarkets, transport, pharmacies, etc.)
How to select 1000 customers who were the first to gain 1000 bonus points for purchases in
categories "Taxi" and "Books"?
BONUS table looks like:
CLIENT_ID BONUS_DATE BONUS_CNT MCC_CODE
1121 2020-01-02 23 5432
3421 2020-04-15 7 654
...
MCC_CATEGORIES table looks like:
MCC_CODE MCC_CATEGORY
5432 Taxi
3532 Music
...
I would use window functions and aggregation: first join the tables and compute the running sum of bonus per user and category. Then aggregate by user and category, and get the date when they reached a bonus of 1000. Finally, compute the date when each user reached the target on both categories, order by that, and limit:
select client_id, max(bonus_date) bonus_date
from (
select client_id, mcc_category, min(bonus_date) bonus_date
from (
select b.client_id, b.bonus_date, c.mcc_category,
sum(bonus_cnt) over(partition by b.client_id, c.mcc_category order by b.bonus_date) sum_bonus
from bonus b
inner join mcc_categories c on c.mcc_code = b.mcc_code
where mcc_category in ('Taxi', 'Books')
) t
where sum_bonus >= 1000
group by client_id, mcc_category
) t
group by client_id
having count(*) = 2
order by bonus_date
limit 1000
Window functions are available in SQLite starting with version 3.25.
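Here is a minimal sketch of the per-category query above, run on made-up rows through Python's sqlite3 module (the bundled SQLite must be 3.25 or later). The MCC code 5942 for Books and all bonus rows are assumptions of this sketch, and the aliases reached_date and both_reached_date are renamed for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bonus (client_id INTEGER, bonus_date TEXT, bonus_cnt INTEGER, mcc_code INTEGER);
    CREATE TABLE mcc_categories (mcc_code INTEGER, mcc_category TEXT);
    -- Made-up reference data; 5942 for Books is an assumption of this sketch.
    INSERT INTO mcc_categories VALUES (5432, 'Taxi'), (5942, 'Books'), (3532, 'Music');
    INSERT INTO bonus VALUES
        (1, '2020-01-01',  600, 5432),
        (1, '2020-01-03',  500, 5432),  -- client 1 reaches 1000 in Taxi
        (1, '2020-01-10', 1000, 5942),  -- client 1 reaches 1000 in Books
        (2, '2020-01-02', 1200, 5432),  -- client 2 reaches 1000 in Taxi
        (2, '2020-01-04',  700, 5942),
        (2, '2020-01-06',  400, 5942),  -- client 2 reaches 1000 in Books
        (3, '2020-01-01', 2000, 5432);  -- client 3 has Taxi bonuses only
""")
per_category_first = conn.execute("""
    SELECT client_id, MAX(reached_date) AS both_reached_date
    FROM (
        SELECT client_id, mcc_category, MIN(bonus_date) AS reached_date
        FROM (
            SELECT b.client_id, b.bonus_date, c.mcc_category,
                   SUM(b.bonus_cnt) OVER (
                       PARTITION BY b.client_id, c.mcc_category
                       ORDER BY b.bonus_date
                   ) AS sum_bonus
            FROM bonus b
            INNER JOIN mcc_categories c ON c.mcc_code = b.mcc_code
            WHERE c.mcc_category IN ('Taxi', 'Books')
        ) running
        WHERE sum_bonus >= 1000
        GROUP BY client_id, mcc_category
    ) reached
    GROUP BY client_id
    HAVING COUNT(*) = 2          -- the target was reached in both categories
    ORDER BY both_reached_date
    LIMIT 1000
""").fetchall()
conn.close()
```

Client 2 comes first because its later category (Books) crosses 1000 on 2020-01-06, before client 1's Books bonus on 2020-01-10. Client 3 is dropped by the HAVING clause.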
How to select 1000 customers who were the first to gain 1000 bonus points for purchases in categories "Taxi" and "Books"?
I am guessing you want to combine the bonuses for the two categories together. If so:
select client_id, min(bonus_date) as min_bonus_date
from (select b.client_id, b.bonus_date, b.bonus_cnt,
sum(b.bonus_cnt) over (partition by b.client_id order by b.bonus_date) as running_bonus_cnt
from bonus b join
mcc_categories c
on c.mcc_code = b.mcc_code
where mcc_category in ('Taxi', 'Books')
) bc
where running_bonus_cnt >= 1000 and
running_bonus_cnt - bonus_cnt < 1000
group by client_id
order by min_bonus_date
limit 1000;
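Note how this works. The subquery calculates the running bonus amount. The where clause then keeps the one row per client where the running total first reaches 1000: the total is at least 1000 on that row, but was still below 1000 before that row's bonus was added.
The rest is just aggregation.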
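The crossing-row filter can be tried out with the following sketch, which runs the combined-category query on made-up rows through Python's sqlite3 module (SQLite 3.25 or later). The MCC code 5942 for Books and the bonus rows are assumptions. Each client has at most one row per date here, since rows that tie on bonus_date share one running total under the default window frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bonus (client_id INTEGER, bonus_date TEXT, bonus_cnt INTEGER, mcc_code INTEGER);
    CREATE TABLE mcc_categories (mcc_code INTEGER, mcc_category TEXT);
    -- Made-up reference data; 5942 for Books is an assumption of this sketch.
    INSERT INTO mcc_categories VALUES (5432, 'Taxi'), (5942, 'Books'), (3532, 'Music');
    INSERT INTO bonus VALUES
        (1, '2020-01-01', 600, 5432),
        (1, '2020-01-05', 500, 5942),   -- client 1 crosses 1000 here (running total 1100)
        (1, '2020-01-09', 300, 5432),   -- already past 1000, must not be picked
        (2, '2020-01-02', 900, 5942),
        (2, '2020-01-03', 900, 3532),   -- Music, not counted
        (2, '2020-01-04', 100, 5432),   -- client 2 reaches exactly 1000 here
        (3, '2020-01-01', 400, 5432);   -- client 3 never reaches 1000
""")
combined_first = conn.execute("""
    SELECT client_id, MIN(bonus_date) AS min_bonus_date
    FROM (
        SELECT b.client_id, b.bonus_date, b.bonus_cnt,
               SUM(b.bonus_cnt) OVER (
                   PARTITION BY b.client_id ORDER BY b.bonus_date
               ) AS running_bonus_cnt
        FROM bonus b
        JOIN mcc_categories c ON c.mcc_code = b.mcc_code
        WHERE c.mcc_category IN ('Taxi', 'Books')
    ) bc
    WHERE running_bonus_cnt >= 1000
      AND running_bonus_cnt - bonus_cnt < 1000   -- total was below 1000 before this row
    GROUP BY client_id
    ORDER BY min_bonus_date
    LIMIT 1000
""").fetchall()
conn.close()
```

Client 1's later row on 2020-01-09 is rejected by the second condition, because the running total was already 1100 before it.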

Using Count case

I've been re-familiarizing myself with SQL after some time away from it, and I am using the Mode Analytics sample data warehouse, which has a dataset for SF police calls in 2014.
For reference, it's set up as this:
incident_num, category, descript, day_of_week, date, time, pd_district, Resolution, address, ID
What I am trying to do is figure out the total number of incidents for each category, plus a new column counting the incidents that ended in an arrest. Ideally it would look something like this:
Category, Total_Incidents, Arrested
-------------------------------------
Battery 10 4
Murder 200 5
Something like that..
So far I've been trying this out:
SELECT category, COUNT (Resolution) AS Total_Incidents, (
Select COUNT (resolution)
from tutorial.sf_crime_incidents_2014_01
where Resolution like '%ARREST%') AS Arrested
from tutorial.sf_crime_incidents_2014_01
group by 1
order by 2 desc
That returns the total number of incidents correctly, but the Arrested column shows 9014 on every row.
Any idea what I am doing wrong?
The subquery is not correlated. It just counts all rows with an arrest, regardless of category. Add a condition that checks that the category equals that of the outer query.
SELECT o.category,
count(o.resolution) total_incidents,
(SELECT count(i.resolution)
FROM tutorial.sf_crime_incidents_2014_01 i
WHERE i.resolution LIKE '%ARREST%'
AND i.category = o.category) arrested
FROM tutorial.sf_crime_incidents_2014_01 o
GROUP BY 1
You could use this:
SELECT category,
COUNT(Resolution) AS Total_Incidents,
SUM(CASE WHEN Resolution LIKE '%ARREST%' THEN 1 ELSE 0 END) AS Arrested
FROM tutorial.sf_crime_incidents_2014_01
GROUP BY category
ORDER BY 2 DESC;
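The two answers above should give the same numbers. The following sketch checks that on a few made-up rows, using an in-memory SQLite table named incidents as a hypothetical stand-in for tutorial.sf_crime_incidents_2014_01. A second sort key on the category is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incidents (category TEXT, resolution TEXT);
    INSERT INTO incidents VALUES
        ('Battery', 'ARREST, BOOKED'), ('Battery', 'NONE'), ('Battery', 'ARREST, CITED'),
        ('Fraud', 'NONE'), ('Fraud', 'NONE');
""")

# Correlated subquery: the inner count is restricted to the outer row's category.
correlated = conn.execute("""
    SELECT o.category,
           COUNT(o.resolution) AS total_incidents,
           (SELECT COUNT(i.resolution)
            FROM incidents i
            WHERE i.resolution LIKE '%ARREST%'
              AND i.category = o.category) AS arrested
    FROM incidents o
    GROUP BY o.category
    ORDER BY 2 DESC, 1
""").fetchall()

# Conditional aggregation: one pass, counting a row only when it is an arrest.
conditional = conn.execute("""
    SELECT category,
           COUNT(resolution) AS total_incidents,
           SUM(CASE WHEN resolution LIKE '%ARREST%' THEN 1 ELSE 0 END) AS arrested
    FROM incidents
    GROUP BY category
    ORDER BY 2 DESC, 1
""").fetchall()
conn.close()
```

The ELSE 0 branch matters for categories without any arrest: without it, SUM() over only NULLs returns NULL instead of 0.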

SELECT TOP 10 rows

I have built a SQL query that returns the top 10 customers with the highest outstanding amount. The outstanding amount is at product level (each product has its own outstanding amount).
So far everything works fine. My only problem is that if a customer has more than one product, the second and any further products should be listed under the same customer_id, as in the second picture (the product with the highest outstanding amount should pull in the customer's other products, even if their outstanding amounts are lower than those of the other nine customers in the top 10).
How can I modify my query in order to do that? Is it possible in SQL Server 2012?
My query is:
select top 10 CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
order by S90T01_GROSS_EXPOSURE_THSD_EUR desc;
You need to calculate the top Customers first, then pull out all their products. You can do this with a Common Table Expression.
As you haven't provided any test data this is untested, but I think it will work for you:
with top10 as
(
select top 10 CUSTOMER_ID
,sum(S90T01_GROSS_EXPOSURE_THSD_EUR) as TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
order by TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
)
select m.CUSTOMER_ID
,m.S90T01_GROSS_EXPOSURE_THSD_EUR
,m.S90T01_COGNOS_PROD_NAME
,m.S90T01_DPD_C
,m.PREVIOUS_BUCKET_DPD_REP
,m.S90T01_BUCKET_DPD_REP
from [dbo].[DM_07MONTHLY_DATA] m
join top10 t
on m.CUSTOMER_ID = t.CUSTOMER_ID
where m.S90T01_CLIENT_SEGMENT = 'PI'
and m.YYYY_MM = '2017_01'
order by t.TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
,m.CUSTOMER_ID
,m.S90T01_GROSS_EXPOSURE_THSD_EUR desc;
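The idea of ranking customers first and then pulling all of their products can be tried on a toy table. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module, so LIMIT replaces SQL Server's TOP, the table monthly_data and its columns are hypothetical simplifications of DM_07MONTHLY_DATA, and it keeps the top 2 customers instead of 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_data (customer_id INTEGER, product TEXT, exposure REAL);
    INSERT INTO monthly_data VALUES
        (1, 'loan', 100), (1, 'card', 5),
        (2, 'loan', 60),
        (3, 'loan', 50), (3, 'card', 40);
""")
top_customer_products = conn.execute("""
    WITH top_customers AS (
        -- Step 1: rank customers by their total exposure across all products.
        SELECT customer_id, SUM(exposure) AS total_exposure
        FROM monthly_data
        GROUP BY customer_id
        ORDER BY total_exposure DESC
        LIMIT 2
    )
    -- Step 2: return every product row of those customers.
    SELECT m.customer_id, m.product, m.exposure
    FROM monthly_data m
    JOIN top_customers t ON t.customer_id = m.customer_id
    ORDER BY t.total_exposure DESC, m.customer_id, m.exposure DESC
""").fetchall()
conn.close()
```

Customer 3 (total 90) beats customer 2 (total 60) even though customer 2 has the second-largest single product, and the small card product of customer 1 is still listed under that customer.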

SQL to produce Top 10 and Other

Imagine I have a table showing the sales of Acme Widgets, and where they were sold. It's fairly easy to produce a report grouping sales by country. It's fairly easy to find the top 10. But what I'd like is to show the top 10, and then have a final row saying Other. E.g.,
Ctry | Sales
=============
GB | 100
US | 80
ES | 60
...
IT | 10
Other | 50
I've been searching for ages but can't seem to find any help which takes me beyond the standard top 10.
TIA
I tried some of the other solutions here, however they seem to be either slightly off, or the ordering wasn't quite right.
My attempt at a Microsoft SQL Server solution appears to work correctly:
SELECT Ctry, Sales FROM
(
SELECT TOP 2
Ctry,
SUM(Sales) AS Sales
FROM
Table1
GROUP BY
Ctry
ORDER BY
Sales DESC
) AS Q1
UNION ALL
SELECT
'Other' AS Ctry,
SUM(Sales) AS Sales
FROM
Table1
WHERE
Ctry NOT IN (SELECT TOP 2
Ctry
FROM
Table1
GROUP BY
Ctry
ORDER BY
SUM(Sales) DESC)
Note that in my example, I'm only using TOP 2 rather than TOP 10. This is simply due to my test data being rather more limited. You can easily substitute the 2 for a 10 in your own data.
Here's the SQL Script to create the table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Table1](
[Ctry] [varchar](50) NOT NULL,
[Sales] [float] NOT NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
And my data looks like this:
GB 10
GB 21.2
GB 34
GB 16.75
US 10
US 11
US 56.43
FR 18.54
FR 98.58
WE 44.33
WE 11.54
WE 89.21
KR 10
PO 10
DE 10
Note that the query result is correctly ordered by the Sales value aggregate and not the alphabetic country code, and that the "Other" category is always last, even if its Sales value aggregate would ordinarily push it to the top of the list.
I'm not saying this is the best (read: most optimal) solution, however, for the dataset that I provided it seems to work pretty well.
SELECT Ctry, sum(Sales) Sales
FROM (SELECT COALESCE(T2.Ctry, 'OTHER') Ctry, T1.Sales
FROM (SELECT Ctry, sum(Sales) Sales
FROM Table1
GROUP BY Ctry) T1
LEFT JOIN
(SELECT TOP 10 Ctry, sum(sales) Sales
FROM Table1
GROUP BY Ctry
ORDER BY sum(sales) DESC) T2
on T1.Ctry = T2.Ctry
) T
GROUP BY Ctry
The other pure SQL solutions here make more than one pass over the individual records. The following solution queries the data only once and uses a SQL ranking function, ROW_NUMBER(), to determine whether a result belongs in the "Other" category. ROW_NUMBER() has been available in SQL Server since SQL Server 2005. In my database, this seems to have resulted in a more efficient query. Please note that the "Other" row will appear above some rows if the total of the "Other" sales exceeds that of any top 10 country. If this is not desired, the ORDER BY of this query needs adjusting:
SELECT CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END AS Ctry,
SUM(Sales) as Sales FROM
(
SELECT Ctry, SUM(Sales) as Sales,
ROW_NUMBER() OVER(ORDER BY SUM(Sales) DESC) AS RowNumber
FROM Table1 GROUP BY Ctry
) as AggregateQuery
GROUP BY CASE WHEN RowNumber > 10 THEN 'Other' ELSE Ctry END
ORDER BY SUM(Sales) DESC
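One possible adjustment for keeping "Other" last is to sort on whether the label is 'Other' before sorting on the total. The sketch below shows that variant on the Table1 sample data from the earlier answer, with a top 2 instead of a top 10. It runs on SQLite (3.25 or later) through Python's sqlite3 module rather than SQL Server, nests the aggregation one level deeper than the query above, and the function name top_n_with_other is hypothetical.

```python
import sqlite3


def top_n_with_other(rows, n):
    """Return the top n countries by sales, then one 'Other' row for the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (ctry TEXT NOT NULL, sales REAL NOT NULL)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    query = """
        SELECT label, total_sales
        FROM (
            SELECT CASE WHEN rn > ? THEN 'Other' ELSE ctry END AS label,
                   SUM(ctry_sales) AS total_sales
            FROM (
                SELECT ctry, ctry_sales,
                       ROW_NUMBER() OVER (ORDER BY ctry_sales DESC) AS rn
                FROM (
                    SELECT ctry, SUM(sales) AS ctry_sales
                    FROM table1
                    GROUP BY ctry
                ) per_country
            ) ranked
            GROUP BY label
        ) grouped
        -- The comparison is 0 for real countries and 1 for 'Other', so 'Other' sorts last.
        ORDER BY label = 'Other', total_sales DESC
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    # Round to two decimals to hide floating-point noise from the REAL sums.
    return [(label, round(total, 2)) for label, total in result]


table1_rows = [
    ("GB", 10), ("GB", 21.2), ("GB", 34), ("GB", 16.75),
    ("US", 10), ("US", 11), ("US", 56.43),
    ("FR", 18.54), ("FR", 98.58),
    ("WE", 44.33), ("WE", 11.54), ("WE", 89.21),
    ("KR", 10), ("PO", 10), ("DE", 10),
]
top2_with_other = top_n_with_other(table1_rows, 2)
```

Here the "Other" total is larger than both top countries, yet it still comes out as the final row because of the first sort key.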
Using an analytics SQL engine such as Apache Spark, you can use a Common Table Expression (WITH) to do this:
with t as (
select rank() over (order by sales desc) as r, sales,city
from DB
order by sales desc
)
select sales, city, r
from t where r <= 10
union
select sum(sales) as sales, "Other" as city, 11 as r
from t where r > 10
In pseudo SQL:
select top 10 order by sales
UNION
select 'Other',SUM(sales) where Ctry not in (select top 10 like above)
Union the top ten with an outer join of the top ten against the table itself to aggregate the rest.
I don't have access to SQL here but I'll hazard a guess:
select Ctry, sales
from (select top (10) Ctry, sum(sales) as sales
from table1
group by Ctry
order by sum(sales) desc) as top10
union all
select 'other', sum(table1.sales)
from table1
left outer join (select top (10) Ctry
from table1
group by Ctry
order by sum(sales) desc) as table2
on table1.Ctry = table2.Ctry
where table2.Ctry is null
Of course if this is a rapidly changing top(10) then you either lock or maintain a copy of the top(10) for the duration of the query.
Keep in mind that, depending on your use case (and database volume or restrictions), you can achieve the same result in application code (Python, Node, C#, Java, etc.).
I ended up doing this in C# for instance:
// Mock-up class that has a category and its volume
class YourModel { public string category; public double volume; }
// wholeList is assumed to be a List<YourModel> already sorted by volume, descending
List<YourModel> groupedList = wholeList.Take(5).ToList();
groupedList.Add(new YourModel()
{
category = "Others",
volume = wholeList.Skip(5).Sum(t => t.volume)
});
Disclaimer
I understand that this is a "SQL only" tagged question, but there might be other people like me out there who can make use of the application layer instead of relying only on SQL to make it happen. I am just trying to show people other ways of doing the same thing that might be helpful. Even if this gets downvoted to oblivion, I know that someone will be happy to read this because they were taught to use each tool to its best and to think "outside the box".