Querying revenue by percentile in googlesql - sql

I'm trying to group companies and their revenues by percentile (>90% as Top, 50-90% as Middle, <50% as Bottom) in GoogleSQL.
Table format for revenue_table:
|company | product | revenue |
------------------------------
I'm trying a cross join to split these companies up:
SELECT
company,
SUM(revenue) as revenue,
CASE
WHEN SUM(revenue) > Percentile90_Max THEN "Top"
WHEN SUM(revenue) >= Percentile50_Max THEN "Middle"
ELSE "Bottom"
END as percentile_section,
Percentile50_Max,
Percentile90_Max,
FROM revenue_table
CROSS JOIN
(SELECT
APPROX_QUANTILES(revenue,100)[offset(50)] As Percentile50_Max,
APPROX_QUANTILES(revenue,100)[offset(90)] As Percentile90_Max
FROM
(SELECT
company,
SUM(revenue) as revenue
FROM revenue_table
GROUP BY 1
)
)
GROUP BY 1,4,5
Order by 2 desc
The query above works, but it breaks once I change my main SELECT statement to:
SELECT
company,
--SUM(revenue) as revenue,
CASE
WHEN SUM(revenue) > Percentile90_Max THEN "Top"
WHEN SUM(revenue) >= Percentile50_Max THEN "Middle"
ELSE "Bottom"
END as percentile_section,
--Percentile50_Max,
--Percentile90_Max,
... same code here
GROUP BY 1
Ideally my end result should just be Company + percentile_section.
How should I do this without adding another subquery? Or is a subquery really the only way to go in terms of query efficiency?
Thank you!

You should be able to calculate the quantiles as part of the aggregation, so no subquery should be necessary:
SELECT company, SUM(revenue) as revenue,
(CASE WHEN SUM(revenue) > (APPROX_QUANTILES(SUM(revenue), 100) OVER ())[offset(90)] THEN 'Top'
WHEN SUM(revenue) >= (APPROX_QUANTILES(SUM(revenue), 100) OVER ())[offset(50)] THEN 'Middle'
ELSE 'Bottom'
END) as percentile_section
FROM revenue_table
GROUP BY 1
Order by 2 desc

Related

SQL get top 3 values / bottom 3 values with group by and sum

I am working on a restaurant management system. There I have two tables
order_details(orderId,dishId,createdAt)
dishes(id,name,imageUrl)
My customer wants to see a report top 3 selling items / least selling 3 items by the month
For the moment I did something like this
SELECT
*
FROM
(SELECT
SUM(qty) AS qty,
order_details.dishId,
MONTHNAME(order_details.createdAt) AS mon,
dishes.name,
dishes.imageUrl
FROM
rms.order_details
INNER JOIN dishes ON order_details.dishId = dishes.id
GROUP BY order_details.dishId , MONTHNAME(order_details.createdAt)) t
ORDER BY t.qty
This gives me all the dishes sold count order by qty.
I have to manually filter max 3 records and reject the rest. There should be a SQL way of doing this. How do I do this in SQL?
You would use row_number() for this purpose. You don't specify the database you are using, so I am guessing at the appropriate date functions. I also assume that you mean a month within a year, so you need to take the year into account as well:
SELECT ym.*
FROM (SELECT YEAR(od.CreatedAt) as yyyy,
MONTH(od.createdAt) as mm,
SUM(qty) AS qty,
od.dishId, d.name, d.imageUrl,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) DESC) as seqnum_desc,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) ASC) as seqnum_asc
FROM rms.order_details od INNER JOIN
dishes d
ON od.dishId = d.id
GROUP BY YEAR(od.CreatedAt), MONTH(od.CreatedAt), od.dishId
) ym
WHERE seqnum_asc <= 3 OR
seqnum_desc <= 3;
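The ROW_NUMBER() approach above can be tried on a small in-memory SQLite database. This is only a sketch under assumptions: the sample rows, the `qty` column on `order_details`, the `TOP_N` constant, and the use of `strftime` in place of YEAR()/MONTH() are all made up for illustration, and the aggregation is moved into a CTE before the window functions are applied rather than mixed into one SELECT.

```python
import sqlite3

TOP_N = 2  # the question asks for 3; 2 keeps the hypothetical sample small

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dishes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_details (orderId INTEGER, dishId INTEGER, qty INTEGER, createdAt TEXT);
    INSERT INTO dishes VALUES (1,'Soup'),(2,'Pasta'),(3,'Salad'),(4,'Steak'),(5,'Cake');
    INSERT INTO order_details VALUES
        (1, 1, 5, '2019-01-03'), (2, 2, 9, '2019-01-04'), (3, 3, 2, '2019-01-09'),
        (4, 4, 7, '2019-01-15'), (5, 5, 1, '2019-01-20'), (6, 1, 3, '2019-01-21');
""")

query = """
WITH monthly AS (
    -- one row per (month, dish) with the total quantity sold
    SELECT strftime('%Y-%m', od.createdAt) AS ym, d.name AS name, SUM(od.qty) AS qty
    FROM order_details od
    JOIN dishes d ON od.dishId = d.id
    GROUP BY strftime('%Y-%m', od.createdAt), d.id, d.name
),
ranked AS (
    -- number the dishes within each month from both ends
    SELECT ym, name, qty,
           ROW_NUMBER() OVER (PARTITION BY ym ORDER BY qty DESC) AS seqnum_desc,
           ROW_NUMBER() OVER (PARTITION BY ym ORDER BY qty ASC)  AS seqnum_asc
    FROM monthly
)
SELECT ym, name, qty
FROM ranked
WHERE seqnum_desc <= :n OR seqnum_asc <= :n
ORDER BY ym, qty DESC
"""
rows = conn.execute(query, {"n": TOP_N}).fetchall()
for row in rows:
    print(row)
```

Note that the two sequence numbers must sort in opposite directions; if both use DESC, the "bottom" filter simply repeats the top rows.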
Using the above info, I used a combination of GROUP BY, ORDER BY and LIMIT,
as shown below. I hope this is what you are looking for:
SELECT
t.qty,
t.dishId,
t.month,
d.name,
d.imageUrl
from
(
SELECT
od.dishId,
count(od.dishId) AS 'qty',
date_format(od.createdAt,'%Y-%m') as 'month'
FROM
rms.order_details od
group by date_format(od.createdAt,'%Y-%m'),od.dishId
order by qty desc
limit 3) t
join rms.dishes d on (t.dishId = d.id)

Combine and fuse the results of two SQL queries with UNION

I am writing two separate SQL queries to get data for two different dates like so:
SELECT number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin
FROM table_a
WHERE day = '2019-08-09'
GROUP BY number
SELECT number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin
FROM table_a
WHERE day = '2018-08-10'
GROUP BY number
I tried fusing them like so to get the results for the same number in one row from two different dates:
SELECT number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin, 0 as sales_n1, 0 as discount_n1, 0 as margin_n1
FROM table_a
WHERE day = '2019-08-09'
GROUP BY number
UNION
SELECT number, 0 as sales, 0 as discount, 0 as margin, sum(sales_n1) as sales_n1, sum(discount_n1) as discount_n1, sum(margin_n1) as margin_n1
FROM table_a
WHERE day = '2018-08-10'
GROUP BY number
But it didn't work as I get the rows for the first query with zeroes for the columns defined as zero followed by the columns of the second query in the same fashion.
How can I correct this to have the desired output ?
Use conditional aggregation:
SELECT number,
sum(case when day = '2019-08-09' then sales end) as sales_20190809,
sum(case when day = '2019-08-09' then discount end) as discount_20190809,
sum(case when day = '2019-08-09' then margin end) as margin_20190809,
sum(case when day = '2019-08-10' then sales end) as sales_20190810,
sum(case when day = '2019-08-10' then discount end) as discount_20190810,
sum(case when day = '2019-08-10' then margin end) as margin_20190810
FROM table_a
WHERE day IN ('2019-08-09', '2019-08-10')
GROUP BY number;
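To see how conditional aggregation puts both dates on one row per number, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented, and only the sales and margin columns are pivoted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (number INTEGER, day TEXT, sales INTEGER, discount INTEGER, margin INTEGER);
    INSERT INTO table_a VALUES
        (1, '2019-08-09', 100, 10, 30),
        (1, '2019-08-09',  50,  0, 15),
        (1, '2019-08-10',  80,  5, 20),
        (2, '2019-08-09',  40,  4,  8);
""")

query = """
SELECT number,
       -- each CASE yields NULL for rows of the other day, and SUM ignores NULLs
       SUM(CASE WHEN day = '2019-08-09' THEN sales  END) AS sales_20190809,
       SUM(CASE WHEN day = '2019-08-09' THEN margin END) AS margin_20190809,
       SUM(CASE WHEN day = '2019-08-10' THEN sales  END) AS sales_20190810,
       SUM(CASE WHEN day = '2019-08-10' THEN margin END) AS margin_20190810
FROM table_a
WHERE day IN ('2019-08-09', '2019-08-10')
GROUP BY number
ORDER BY number
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Number 2 has no rows for the second date, so its second-date columns come back as NULL (None in Python). Adding `ELSE 0` inside each CASE would turn those into zeros if that is preferred.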
If you want the numbers in different rows (which you don't seem to), then use aggregation:
SELECT day, number, sum(sales) as sales, sum(discount) as discount, sum(margin) as margin
FROM table_a
WHERE day IN ('2019-08-09', '2019-08-10')
GROUP BY day, number

SQL - Rank monthly dataset high, medium, low

I have a table which includes the month, accountID and a set of application scores. I want to create a new column which either gives a 'high', 'medium' or 'low' for the top, middle and bottom 33% of the results each month.
If I use rank() I can order the application scores for a single month or the whole dataset but I'm unsure how to order it per month. Also, on my version of sql server percent_rank() does not work.
select
AccountID
, ApplicationScore
, rank() over (order by applicationscore asc) as Rank
from Table
I then know I need to put the rank() statement in a subquery and then use a case statement to apply the 'high', 'medium' or 'low'.
select
AccountID
, case when rank <= total/3 then 'low'
when rank > total/3 and rank <= (total/3)*2 then 'medium'
when rank > (total/3)*2 then 'high' end ApplicationScore
from (subquery) a
Ntile(3) worked very well
select
AccountID
, Monthstart
, ApplicationScore
, ntile(3) over (partition by monthstart order by applicationscore) Rank
from table
SQL Server may have something built in to handle your problem. But we can easily use a ratio of counts to find the three segments of your scores, for each month. The ratio we can use is the count, partitioned by month and ordered by score, divided by the count for the entire month.
WITH cte AS (
SELECT *,
1.0 * COUNT(*) OVER (PARTITION BY Month ORDER BY ApplicationScore) /
COUNT(*) OVER (PARTITION BY Month) AS cnt
FROM yourTable
)
SELECT
AccountID,
Month,
ApplicationScore,
CASE WHEN cnt < 0.34 THEN 'low'
WHEN cnt < 0.67 THEN 'medium'
ELSE 'high' END AS rank
FROM cte
ORDER BY
Month,
ApplicationScore DESC;
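The count-ratio idea above can be checked on a tiny data set. The sketch below runs it against an in-memory SQLite database rather than SQL Server; the table name `scores`, the sample rows, and the `band` alias are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (AccountID INTEGER, Month TEXT, ApplicationScore INTEGER);
    INSERT INTO scores VALUES
        (1, '2020-01', 10), (2, '2020-01', 20), (3, '2020-01', 30),
        (4, '2020-01', 40), (5, '2020-01', 50), (6, '2020-01', 60);
""")

query = """
WITH cte AS (
    SELECT *,
           -- running count up to the current score, divided by the month's total
           1.0 * COUNT(*) OVER (PARTITION BY Month ORDER BY ApplicationScore)
               / COUNT(*) OVER (PARTITION BY Month) AS cnt
    FROM scores
)
SELECT AccountID, Month, ApplicationScore,
       CASE WHEN cnt < 0.34 THEN 'low'
            WHEN cnt < 0.67 THEN 'medium'
            ELSE 'high' END AS band
FROM cte
ORDER BY Month, ApplicationScore
"""
rows = conn.execute(query).fetchall()
bands = [band for (_, _, _, band) in rows]
print(bands)
```

With six distinct scores in a month the ratios are 1/6, 2/6, and so on up to 1, so each band receives two accounts. Tied scores share the same running count and therefore always land in the same band, which is one way this differs from NTILE(3).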

SQL - CASE WHEN query

I am trying to get from my table all invoices raised for those customers who spent more than 1000 over period of last 12 months. Below is my table just for two customers as example:
And my query:
SELECT
t.Customer, t.Invoice
FROM
(SELECT
CI.Customer, CI.Invoice, CI.Date,
SUM(CASE
WHEN CI.Date > DATEADD(month, -12, getdate())
THEN CI.Valuee
ELSE 0
END) as Net
FROM
CustomerInvoice CI
GROUP BY
CI.Customer, CI.Invoice, CI.Date) AS t
GROUP BY
t.Customer, t.Invoice
HAVING
SUM (t.Net) > 1000
As result I will get only invoice INV-341453 but I would like to show also invoices INV-346218 and INV-349065.
What am I doing wrong?
Use ANSI standard window functions:
select ci.*
from (select ci.*,
sum(ci.value) over (partition by ci.customer) as total_value
from CustomerInvoice CI
where CI.Date > DATEADD(month, -12, getdate())
) ci
where total_value > 1000;
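A windowed SUM keeps every invoice row while still exposing the per-customer total, which is why it avoids the problem of grouping by invoice. Here is a small sketch on SQLite; the column names `InvoiceDate` and `Amount`, the sample invoices, and the fixed reference date (used instead of getdate() so the result is reproducible) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerInvoice (Customer TEXT, Invoice TEXT, InvoiceDate TEXT, Amount INTEGER);
    INSERT INTO CustomerInvoice VALUES
        ('A', 'INV-1',   '2020-03-01',  600),
        ('A', 'INV-2',   '2020-05-01',  500),
        ('A', 'INV-OLD', '2018-01-01', 5000),
        ('B', 'INV-3',   '2020-04-01',  300);
""")

query = """
SELECT Customer, Invoice
FROM (
    SELECT ci.*,
           -- total per customer, repeated on each of that customer's rows
           SUM(ci.Amount) OVER (PARTITION BY ci.Customer) AS total_value
    FROM CustomerInvoice ci
    WHERE ci.InvoiceDate > date(:today, '-12 months')
) t
WHERE total_value > 1000
ORDER BY Customer, Invoice
"""
rows = conn.execute(query, {"today": "2020-06-01"}).fetchall()
print(rows)
```

Customer A spent 1100 inside the twelve-month window, so both recent invoices are returned; the 2018 invoice is filtered out before the total is computed, and customer B stays below the threshold.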
By "all invoices", I assume you mean the ones in the past twelve months.
You could do this by using a grouped query to identify the customers who have gone over the 1000 threshold, and then display all invoices for those customers:
SELECT Customer, Invoice
FROM CustomerInvoice
WHERE Customer IN
(SELECT Customer
FROM CustomerInvoice CI
GROUP BY Customer
HAVING SUM(CASE WHEN CI.Date>DATEADD(month,-12,getdate()) THEN CI.Valuee ELSE 0 END) > 1000)

Query to get top product gainers by sales over previous week

I have a database table with three columns.
WeekNumber, ProductName, SalesCount
Sample data is shown in the table below. I want the top 10 gainers (by %) for week 26 over the previous week, i.e. week 25. The only condition is that the product should have a sales count greater than 0 in both weeks.
In the sample data B,C,D are the common products and C has the highest % gain.
Similarly, I will need top 10 losers also.
What I have tried so far is an inner join to get the common products between the two weeks. However, I am not able to work out the top gainers logic.
The output should be like
Product PercentGain
C 400%
D 12.5%
B 10%
This will give you a generic answer, not just for any particular week:
select top 10 product , gain [gain%]
from
(
SELECT curr.product, 100.0 * (curr.salescount - prev.salescount) / prev.salescount gain
from
(select weeknumber, product, salescount from tbl) prev
JOIN
(select weeknumber, product, salescount from tbl) curr
on prev.weeknumber = curr.weeknumber - 1
AND prev.product = curr.product
where prev.salescount > 0 and curr.salescount > 0
)A
order by gain desc
If you are interested in weeks 25 and 26, then just add the condition below in the WHERE clause:
and prev.weeknumber = 25
If you are using SQL Server 2012 (or newer), you could use the LAG function to match this week's sales with the previous week's. From there on, it's just some math. Note that weeknumber has to be selected in the subquery so the outer WHERE can filter on it, the 1.0 factor avoids integer division, and the sort is descending so the biggest gainers come first:
SELECT TOP 10 product, 1.0 * sales / prev_sales - 1 AS gain
FROM (SELECT weeknumber,
product,
sales,
LAG(sales) OVER (PARTITION BY product
ORDER BY weeknumber) AS prev_sales
FROM mytable) t
WHERE weeknumber = 26 AND
sales > 0 AND
prev_sales > 0 AND
sales > prev_sales
ORDER BY gain DESC
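The LAG approach can be sketched on SQLite as follows. The sample rows are hypothetical, chosen only so that B, C and D reproduce the gains quoted in the question (10%, 400% and 12.5%); `LIMIT` stands in for `TOP`, and the extra `prev_week` column is an added guard so that a product missing from week 25 is not compared against an older week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (weeknumber INTEGER, product TEXT, sales INTEGER);
    INSERT INTO mytable VALUES
        (25, 'A',  50),
        (25, 'B', 100), (26, 'B', 110),
        (25, 'C',  10), (26, 'C',  50),
        (25, 'D',  80), (26, 'D',  90),
        (26, 'E',  30);
""")

query = """
SELECT product, 100.0 * (sales - prev_sales) / prev_sales AS gain_pct
FROM (
    SELECT weeknumber, product, sales,
           LAG(sales)      OVER (PARTITION BY product ORDER BY weeknumber) AS prev_sales,
           LAG(weeknumber) OVER (PARTITION BY product ORDER BY weeknumber) AS prev_week
    FROM mytable
) t
WHERE weeknumber = 26
  AND prev_week = 25      -- the previous row really is the previous week
  AND sales > 0
  AND prev_sales > 0
ORDER BY gain_pct DESC
LIMIT 10
"""
rows = conn.execute(query).fetchall()
for product, gain_pct in rows:
    print(product, gain_pct)
```

Products A and E appear in only one of the two weeks and drop out. Sorting ascending with an added `sales < prev_sales` condition would give the top losers instead.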
This is the query:
select top 10 product , gain [gain%]
from
(
SELECT curr.Product, ( (curr.Sales - prev.Sales ) *100)/prev.Sales gain
from
(select weeknumber, product, sales from ProductInfo where weeknumber = 25 ) prev
JOIN
(select weeknumber, product, sales from ProductInfo where weeknumber = 26 ) curr
on prev.product = curr.product
where prev.Sales > 0 and curr.Sales > 0
)A
order by gain desc