Totalling Values and Finding the Top x Values - sql

Situation:
I have a database that contains sales records; each record has an ID (PK), ProductCode, Year, Month, and SalesVolume.
As a user I will specify the year, so if I specify 1980, I will query the database for all the records that correspond to that year.
I am trying to create a query that will total the monthly SalesVolume values of each product for the specified year, then pick the top 5 totals.
What I have gathered so far is that I need to compute the totals, put them in descending order, and select the top 5, but that's as far as I have got.
Note that in a year there will be SalesVolumes for multiple products:
ID, Product Code, Year, Month, SalesVolume
23041 121 1980 1 21
23042 121 1980 2 960
23043 121 1980 3 939
23044 121 1980 4 927
23045 121 1980 5 931
23046 121 1980 6 950
23047 121 1980 7 975
23048 121 1980 8 994
23049 121 1980 9 994
23050 121 1980 10 968
23051 121 1980 11 918
23052 121 1980 12 854
23425 122 1980 1 1002
23426 122 1980 2 1032
23427 122 1980 3 1090
23428 122 1980 4 1062
23429 122 1980 5 1010
23430 122 1980 6 1103
23431 122 1980 7 1214
23432 122 1980 8 1122
23433 122 1980 9 1019
23434 122 1980 10 1181
23435 122 1980 11 1343
23436 122 1980 12 1180
Expected Output:
For 1980
Product Code, SalesVolume
121 , (total SalesVolume of the 12 months)
122 , (total SalesVolume of the 12 months)

Here's what you can do: total up the sales for each item for a given year, rank them, and then select the top 5 ranking items.
SELECT *
FROM (
SELECT "Product Code",
SUM(SalesVolume) as total,
RANK() OVER (ORDER BY SUM(SalesVolume) DESC) as rnk
FROM YourTable
WHERE Year = 1980
GROUP BY "Product Code"
) A
WHERE rnk <= 5
If you want to select the top 5 per year for all years, you can do that too:
SELECT *
FROM (
SELECT Year,
"Product Code",
SUM(SalesVolume) as total,
RANK() OVER (PARTITION BY Year ORDER BY SUM(SalesVolume) DESC) as rnk
FROM YourTable
GROUP BY Year, "Product Code"
) A
WHERE rnk <= 5
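The total-then-rank idea can be tried end to end with a minimal sketch like the one below. It assumes SQLite (3.25 or later, for window functions) driven from Python's sqlite3 module rather than the asker's actual database, and the table name Sales and its rows are made up for illustration. Aggregation and ranking are split into two steps here purely for readability.

```python
import sqlite3

# Hypothetical sample data: (ProductCode, Year, Month, SalesVolume).
sales = [
    (101, 1980, 1, 10), (101, 1980, 2, 20),
    (102, 1980, 1, 50), (102, 1980, 2, 5),
    (103, 1980, 1, 7),
    (104, 1980, 1, 40),
    (105, 1980, 1, 3),
    (106, 1980, 1, 25),
    (101, 1981, 1, 99),  # different year, must be ignored for 1980
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (ProductCode INT, Year INT, Month INT, SalesVolume INT)"
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", sales)

query = """
WITH totals AS (
    -- Step 1: one row per product with its yearly total.
    SELECT ProductCode, SUM(SalesVolume) AS total
    FROM Sales
    WHERE Year = ?
    GROUP BY ProductCode
),
ranked AS (
    -- Step 2: rank the totals, highest first.
    SELECT ProductCode, total, RANK() OVER (ORDER BY total DESC) AS rnk
    FROM totals
)
SELECT ProductCode, total
FROM ranked
WHERE rnk <= 5
ORDER BY total DESC
"""
top5 = conn.execute(query, (1980,)).fetchall()
print(top5)
```

Note that RANK() gives tied totals the same rank, so a tie at the cutoff can return more than 5 rows; ROW_NUMBER() would cap the result at exactly 5.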

Related

Sales amounts of the top n selling vendors by month with other fields in bigquery

I have a table in BigQuery like this (260000 rows):
vendor date item_price discount_price
x 2021-07-08 23:41:10 451,5 0
y 2021-06-14 10:22:10 41,7 0
z 2020-01-03 13:41:12 74 4
s 2020-04-12 01:14:58 88 12
....
What I want is to group this data by month and find the sum of the sales of only the top 20 vendors in that month. Expected output:
month vendor_name(top20) sum_of_vendor's_sales sum_of_vendor's_discount item_count(sold)
2020-01 x1 10857 250 150
2020-01 x2 9685 410 50
2020-01 x3 3574 140 45
....
2020-01 x20 700 15 20
2020-02 y1 7421 280 120
2020-02 y2 6500 250 40
2020-02 y3 4500 200 70
.....
2020-02 y20 900 70 30
I tried this (source here), but it did not produce the desired output.
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors,count(item_price) as count_of_items,sum(discount_price)
from my_table
group by month
) t
Consider below approach
select
format_datetime('%Y%m', date) month,
vendor as vendor_name_top20,
sum(item_price) as sum_of_vendor_sales,
sum(discount_price) as sum_of_vendor_discount,
count(*) as item_count_sold
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sum_of_vendor_sales desc) <= 20
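QUALIFY is not available in every engine, so here is a rough sketch of the same per-month top-N filter written with a plain subquery filter instead. It assumes SQLite via Python's sqlite3 module, uses strftime in place of BigQuery's format_datetime, and runs on made-up rows with a hypothetical TOP_N of 2 instead of 20.

```python
import sqlite3

# Hypothetical rows: (vendor, date, item_price, discount_price).
rows = [
    ("a", "2020-01-03 13:41:12", 74.0, 4.0),
    ("a", "2020-01-09 08:00:00", 26.0, 0.0),
    ("b", "2020-01-05 10:00:00", 88.0, 12.0),
    ("c", "2020-01-07 11:30:00", 10.0, 0.0),
    ("b", "2020-02-01 09:15:00", 41.5, 0.0),
    ("c", "2020-02-02 17:45:00", 60.0, 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (vendor TEXT, date TEXT, item_price REAL, discount_price REAL)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

TOP_N = 2  # the question asks for 20; 2 keeps the demo small

query = """
WITH per_vendor AS (
    -- One row per (month, vendor) with the requested aggregates.
    SELECT strftime('%Y%m', date) AS month,
           vendor,
           SUM(item_price) AS sales,
           SUM(discount_price) AS discount,
           COUNT(*) AS item_count
    FROM my_table
    GROUP BY strftime('%Y%m', date), vendor
),
ranked AS (
    -- Number the vendors within each month by sales, highest first.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY month ORDER BY sales DESC) AS rn
    FROM per_vendor
)
SELECT month, vendor, sales, discount, item_count
FROM ranked
WHERE rn <= ?
ORDER BY month, sales DESC
"""
top_vendors = conn.execute(query, (TOP_N,)).fetchall()
for row in top_vendors:
    print(row)
```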

How to return the top 3 spending customers per country?

I'm trying to return the top 3 spending customers per country for a table like this:
customer_id country spend
159 China 45
152 China 8
159 China 21
160 China 6
161 China 9
162 China 93
152 China 3
168 Germany 91
169 Germany 101
170 Germany 38
171 Germany 17
154 Germany 11
154 Germany 50
167 Germany 63
168 Germany 1
153 Japan 7
163 Japan 58
164 Japan 44
153 Japan 19
164 Japan 10
165 Japan 15
166 Japan 24
153 Japan 105
I've tried the below code but it's not returning the correct results.
SELECT customer_id, country, spend FROM (SELECT customer_id, country, spend,
@country_rank := IF(@current_country = country, @country_rank + 1, 1)
AS country_rank,
@current_country := country
FROM table1
ORDER BY country ASC, spend DESC) ranked_rows
WHERE country_rank<=3;
Since some customers are also repeat customers, I want to make sure that it's the sum of spend per customer that's being taken into account.
You appear to be using MySQL. If you're running version 8 or later, sum the spend per customer first, then use ROW_NUMBER() on the totals:
WITH cte AS (
SELECT customer_id, country, SUM(spend) AS total_spend,
ROW_NUMBER() OVER (PARTITION BY country ORDER BY SUM(spend) DESC) rn
FROM table1
GROUP BY customer_id, country
)
SELECT customer_id, country, total_spend
FROM cte
WHERE rn <= 3;
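Because some customers appear more than once, the spend has to be summed per customer before the ranking is applied. The sketch below checks that on the question's sample rows; it assumes SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for MySQL 8, and the CTE names totals and ranked are illustrative.

```python
import sqlite3

# Sample rows from the question: (customer_id, country, spend).
data = [
    (159, "China", 45), (152, "China", 8), (159, "China", 21),
    (160, "China", 6), (161, "China", 9), (162, "China", 93),
    (152, "China", 3),
    (168, "Germany", 91), (169, "Germany", 101), (170, "Germany", 38),
    (171, "Germany", 17), (154, "Germany", 11), (154, "Germany", 50),
    (167, "Germany", 63), (168, "Germany", 1),
    (153, "Japan", 7), (163, "Japan", 58), (164, "Japan", 44),
    (153, "Japan", 19), (164, "Japan", 10), (165, "Japan", 15),
    (166, "Japan", 24), (153, "Japan", 105),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (customer_id INT, country TEXT, spend INT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", data)

query = """
WITH totals AS (
    -- Collapse repeat customers into one row each.
    SELECT customer_id, country, SUM(spend) AS total_spend
    FROM table1
    GROUP BY customer_id, country
),
ranked AS (
    -- Number customers within each country by total spend.
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY country ORDER BY total_spend DESC
    ) AS rn
    FROM totals
)
SELECT country, customer_id, total_spend
FROM ranked
WHERE rn <= 3
ORDER BY country, total_spend DESC
"""
top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
```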

Redshift SQL query help needed

Table order_details:
order_id dish_id category_id
----------------------------------
601 22 123
601 23 234
603 32 456
603 54 456
603 11 543
603 19 456
From the sample table provided above: how can I group the order_id,dish_id and category_id on the basis of distinct group with respect to each order_id?
The result should look like
order_id dish_id category_id count
---------------------------------------------
601 22 123 1
601 23 234 1
603 32 456 3
603 54 456 3
603 11 543 3
603 19 456 3
Note:
For example, dish_id 22 in order_id 601 went along with 1 other distinct category_id (234), and similarly, in order_id 603, dish_id 32 went along with 2 distinct category_id values (456 and 543).
If I assume that the triplets are unique, you seem to want 1 less than the count of the group. That would be:
select t.*,
(count(*) over (partition by order_id) - 1) as cnt
from t;
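To see what the window count produces on the sample table, here is a small sketch. It assumes SQLite (3.25 or later) via Python's sqlite3 module as a stand-in for Redshift, and, like the answer, it assumes the (order_id, dish_id, category_id) triplets are unique.

```python
import sqlite3

# Sample rows from the question: (order_id, dish_id, category_id).
order_rows = [
    (601, 22, 123), (601, 23, 234),
    (603, 32, 456), (603, 54, 456), (603, 11, 543), (603, 19, 456),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_details (order_id INT, dish_id INT, category_id INT)"
)
conn.executemany("INSERT INTO order_details VALUES (?, ?, ?)", order_rows)

query = """
SELECT order_id, dish_id, category_id,
       -- Rows in the same order, minus the current row itself.
       COUNT(*) OVER (PARTITION BY order_id) - 1 AS cnt
FROM order_details
ORDER BY order_id, dish_id
"""
counted = conn.execute(query).fetchall()
for row in counted:
    print(row)
```

Every row keeps its own columns and gains the size of its order minus one, which matches the counts of 1 and 3 in the expected result.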

Sqlite subtract sums (with group by) with JOIN and duplicates

I previously found the solution to my problem but unfortunately I lost files on my harddrive and I can't find the statement I managed to produce.
I have 2 tables, T2REQ and T2STOCK. Both have 2 columns (typeID and quantity), and my problem lies in the fact that I can have multiple occurrences of the SAME typeID in BOTH tables.
What I'm trying to do is SUM(quantity) grouped by typeID and subtract the values of T2STOCK from T2REQ, but since I have multiple occurrences of the same typeID in both tables, the SUM I get is multiplied by the number of occurrences of typeID.
Here's a sample of T2REQ (take typeID 11399 for example):
typeID quantity
---------- ----------
34 102900
35 10500
36 3220
37 840
11399 700
563 140
9848 140
11486 28
11688 700
11399 390
4393 130
9840 390
9842 390
11399 390
11483 19.5
11541 780
And this is a sample of T2STOCK table :
typeID quantity
---------- ----------
9842 1921
9848 2400
11399 1700
11475 165
11476 27
11478 28
11481 34
11483 122
11476 2
This is where I'm at for now. I know that SUM(t2stock.quantity) is inflated (multiplied) because of the JOIN, but whatever I try, I'm not doing things in the right order:
SELECT
t2req.typeID, sum(t2req.quantity), sum(t2stock.quantity),
sum(t2req.quantity) - sum(t2stock.quantity) as diff
FROM t2req JOIN t2stock ON t2req.typeID = t2stock.typeID
GROUP BY t2req.typeID
ORDER BY diff DESC;
typeID sum(t2req.quantity) sum(t2stock.quantity) diff
---------- ------------------- --------------------- ----------
563 140 30 110
11541 780 780 0
11486 28 40 -12
11483 19.5 122 -102.5
9840 390 1000 -610
40 260 940 -680
9842 390 1921 -1531
9848 140 2400 -2260
11399 1480 5100 -3620
39 650 7650 -7000
37 1230 116336 -115106
36 28570 967098 -938528
35 33770 2477820 -2444050
34 102900 2798355 -2695455
You can see that SUM(t2req.quantity) for typeID 11399 is correct: 1480.
You can also see that SUM(t2stock.quantity) for typeID 11399 is not correct: 5100 instead of 1700 (5100 divided by 3, the number of occurrences in t2req).
What would be the best way to avoid this multiplication, caused by multiple occurrences of a typeID in both tables, when subtracting the sums after the JOIN?
Sorry for the wall of text, just trying to explain as best as I can since English is not my mother tongue.
Thanks a lot for your help.
You can aggregate before join:
SELECT
t2req.typeID,
t2req.quantity,
t2stock.quantity,
t2req.quantity - t2stock.quantity as diff
FROM
(SELECT TypeID, SUM(Quantity) Quantity FROM t2req GROUP BY TypeID) t2req JOIN
(SELECT TypeID, SUM(Quantity) Quantity FROM t2stock GROUP BY TypeID) t2stock
ON t2req.typeID = t2stock.typeID
ORDER BY diff DESC;
Fiddle sample: http://sqlfiddle.com/#!7/06711/5
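You can't do this in a single aggregation; aggregate each table separately, then join the results. The FULL OUTER JOIN below also keeps typeIDs that appear in only one table (it needs SQLite 3.39 or later; on older versions a LEFT JOIN at least keeps every typeID from t2req):
SELECT
COALESCE(r.typeID, s.typeID) AS typeID,
COALESCE(r.quantity, 0) AS req_quantity,
COALESCE(s.quantity, 0) AS stock_quantity,
COALESCE(r.quantity, 0) - COALESCE(s.quantity, 0) AS diff
FROM (
SELECT rr.typeID, SUM(rr.quantity) AS quantity
FROM t2req rr
GROUP BY rr.typeID
) r
FULL OUTER JOIN (
SELECT ss.typeID, SUM(ss.quantity) AS quantity
FROM t2stock ss
GROUP BY ss.typeID
) s ON r.typeID = s.typeID
ORDER BY 4 DESC;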
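The multiplication effect and the aggregate-before-join fix can be reproduced with a short sketch using Python's sqlite3 module. The rows are a subset of the question's samples. The fixed query uses LEFT JOIN so that it runs on any SQLite version, which means typeIDs present only in t2stock (11476 here) are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2req (typeID INT, quantity INT)")
conn.execute("CREATE TABLE t2stock (typeID INT, quantity INT)")
conn.executemany(
    "INSERT INTO t2req VALUES (?, ?)",
    [(11399, 700), (11399, 390), (11399, 390), (9842, 390), (34, 102900)],
)
conn.executemany(
    "INSERT INTO t2stock VALUES (?, ?)",
    [(11399, 1700), (9842, 1921), (11476, 27), (11476, 2)],
)

# Naive version: the single stock row for 11399 is repeated once per
# matching t2req row, so its sum is tripled.
naive = conn.execute("""
    SELECT r.typeID, SUM(r.quantity), SUM(s.quantity)
    FROM t2req r JOIN t2stock s ON r.typeID = s.typeID
    WHERE r.typeID = 11399
    GROUP BY r.typeID
""").fetchone()

# Fixed version: collapse each table to one row per typeID, then join.
fixed = conn.execute("""
    SELECT r.typeID,
           r.quantity AS req_quantity,
           COALESCE(s.quantity, 0) AS stock_quantity,
           r.quantity - COALESCE(s.quantity, 0) AS diff
    FROM (SELECT typeID, SUM(quantity) AS quantity
          FROM t2req GROUP BY typeID) r
    LEFT JOIN (SELECT typeID, SUM(quantity) AS quantity
               FROM t2stock GROUP BY typeID) s
      ON r.typeID = s.typeID
    ORDER BY diff DESC
""").fetchall()

print(naive)
print(fixed)
```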

SQL query self join

I am working on a query for a report in Oracle 10g.
I need to generate a short list of each course along with the number of times it was offered in the past year (including courses that weren't actually offered).
I created one query
SELECT coursenumber, count(datestart) AS Offered
FROM class
WHERE datestart BETWEEN (sysdate-365) AND sysdate
GROUP BY coursenumber;
Which produces
COURSENUMBER OFFERED
---- ----------
ST03 2
PD01 1
AY03 2
TB01 4
These results are correct. Ideally, however, I want the query to also list COURSENUMBER HY and CS in the left column, with 0 or null as the OFFERED value. I have a feeling this involves a join of sorts, but so far what I have tried doesn't produce the classes with nothing offered.
The table normally looks like
REFERENCE_NO DATESTART TIME TIME EID ROOMID COURSENUMBER
------------ --------- ---- ---- ---------- ---------- ----
256 03-MAR-11 0930 1100 2 2 PD01
257 03-MAY-11 0930 1100 12 7 PD01
258 18-MAY-11 1230 0100 12 7 PD01
259 24-OCT-11 1930 2015 6 2 CS01
260 17-JUN-11 1130 1300 6 4 CS01
261 25-MAY-11 1900 2000 13 6 HY01
262 25-MAY-11 1900 2000 13 6 HY01
263 04-APR-11 0930 1100 13 5 ST03
264 13-SEP-11 1930 2100 6 4 ST03
265 05-NOV-11 1930 2100 6 5 ST03
266 04-FEB-11 1430 1600 6 5 ST03
267 02-JAN-11 0630 0700 13 1 TB01
268 01-FEB-11 0630 0700 13 1 TB01
269 01-MAR-11 0630 0700 13 1 TB01
270 01-APR-11 0630 0700 13 1 TB01
271 01-MAY-11 0630 0700 13 1 TB01
272 14-MAR-11 0830 0915 4 3 AY03
273 19-APR-11 0930 1015 4 3 AY03
274 17-JUN-11 0830 0915 14 3 AY03
275 14-AUG-09 0930 1015 14 3 AY03
276 03-MAY-09 0830 0915 14 3 AY03
SELECT
coursenumber,
COUNT(CASE WHEN datestart BETWEEN (sysdate-365) AND sysdate THEN 1 END) AS Offered
FROM class
GROUP BY coursenumber;
So, as you can see, this particular problem doesn't need a join.
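The reason this works is that the date condition moves out of WHERE, which would discard whole courses, and into the counted expression, so every course still forms a group. Below is a minimal sketch of that behavior, assuming SQLite via Python's sqlite3 module instead of Oracle; the rows, the ISO date strings, and the fixed reference date standing in for sysdate are all made up for illustration.

```python
import sqlite3

# Hypothetical rows: (coursenumber, datestart as ISO text).
classes = [
    ("PD01", "2011-03-03"), ("PD01", "2011-05-03"),
    ("ST03", "2011-04-04"),
    ("CS01", "2009-10-24"),  # older than one year before the reference date
    ("HY01", "2010-05-25"),  # older than one year before the reference date
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (coursenumber TEXT, datestart TEXT)")
conn.executemany("INSERT INTO class VALUES (?, ?)", classes)

# Fixed stand-in for Oracle's sysdate so the result is reproducible.
params = {"ref": "2011-12-01"}

query = """
SELECT coursenumber,
       -- CASE yields NULL outside the window, and COUNT skips NULLs.
       COUNT(CASE WHEN datestart BETWEEN date(:ref, '-365 days') AND :ref
                  THEN 1 END) AS offered
FROM class
GROUP BY coursenumber
ORDER BY coursenumber
"""
offered = conn.execute(query, params).fetchall()
print(offered)
```

Courses with no class in the window still appear, with a count of 0.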
I think something like this should work for you, by just doing it as a subquery.
SELECT distinct c.coursenumber,
(SELECT COUNT(*)
FROM class
WHERE class.coursenumber = c.coursenumber
AND datestart BETWEEN (sysdate-365) AND sysdate
) AS Offered
FROM class c
I like jschoen's answer better for this particular case (when you want one and only one row and column out of the subquery for each row of the main query), but just to demonstrate another way to do it (DISTINCT keeps one row per course, since class has one row per offering):
select distinct t1.coursenumber, nvl(t2.cnt,0) as offered
from class t1 left outer join (
select coursenumber, count(*) cnt
from class
where datestart between (sysdate-365) AND sysdate
group by coursenumber
) t2 on t1.coursenumber = t2.coursenumber