Sqlite subtract sums (with group by) with JOIN and duplicates

I previously found the solution to my problem, but unfortunately I lost files on my hard drive and I can't find the statement I managed to produce.
I have 2 tables, T2REQ and T2STOCK. Both have 2 columns (typeID and quantity), and my problem lies in the fact that I can have multiple occurrences of the SAME typeID in BOTH tables.
What I'm trying to do is SUM(quantity) grouped by typeID and subtract the values of T2STOCK from T2REQ, but since I have multiple occurrences of the same typeID in both tables, the SUM I get is multiplied by the number of occurrences of the typeID.
Here's a sample of T2REQ (take typeID 11399 for example):
typeID quantity
---------- ----------
34 102900
35 10500
36 3220
37 840
11399 700
563 140
9848 140
11486 28
11688 700
11399 390
4393 130
9840 390
9842 390
11399 390
11483 19.5
11541 780
And this is a sample of the T2STOCK table:
typeID quantity
---------- ----------
9842 1921
9848 2400
11399 1700
11475 165
11476 27
11478 28
11481 34
11483 122
11476 2
And this is where I'm at for now. I know that SUM(t2stock.quantity) is inflated (multiplied) by the JOIN, but whatever I try, I can't get the operations in the right order:
SELECT
t2req.typeID, sum(t2req.quantity), sum(t2stock.quantity),
sum(t2req.quantity) - sum(t2stock.quantity) as diff
FROM t2req JOIN t2stock ON t2req.typeID = t2stock.typeID
GROUP BY t2req.typeID
ORDER BY diff DESC;
typeID sum(t2req.quantity) sum(t2stock.quantity) diff
---------- ------------------- --------------------- ----------
563 140 30 110
11541 780 780 0
11486 28 40 -12
11483 19.5 122 -102.5
9840 390 1000 -610
40 260 940 -680
9842 390 1921 -1531
9848 140 2400 -2260
11399 1480 5100 -3620
39 650 7650 -7000
37 1230 116336 -115106
36 28570 967098 -938528
35 33770 2477820 -2444050
34 102900 2798355 -2695455
You can see that SUM(t2req.quantity) for typeID 11399 is correct: 1480.
And you can see that SUM(t2stock.quantity) for typeID 11399 is not correct: 5100 instead of 1700 (5100 is 1700 multiplied by 3, the number of occurrences in t2req).
What would be the best way to avoid these multiplications caused by duplicate typeIDs (in both tables) when I JOIN to subtract the sums?
Sorry for the wall of text, I'm just trying to explain as best I can since English is not my mother tongue.
Thanks a lot for your help.

You can aggregate before the join:
SELECT
t2req.typeID,
t2req.quantity,
t2stock.quantity,
t2req.quantity - t2stock.quantity as diff
FROM
(SELECT TypeID, SUM(Quantity) Quantity FROM t2req GROUP BY TypeID) t2req JOIN
(SELECT TypeID, SUM(Quantity) Quantity FROM t2stock GROUP BY TypeID) t2stock
ON t2req.typeID = t2stock.typeID
ORDER BY diff DESC;
Fiddle sample: http://sqlfiddle.com/#!7/06711/5
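To make the effect concrete, here is a minimal sketch using Python's built-in sqlite3 module and a few rows taken from the question's samples. It runs the naive join next to the pre-aggregated join; the in-memory database and the variable names `naive` and `fixed` are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2req (typeID INTEGER, quantity REAL);
    CREATE TABLE t2stock (typeID INTEGER, quantity REAL);
    -- a few rows copied from the question's samples
    INSERT INTO t2req VALUES (11399, 700), (11399, 390), (11399, 390), (9848, 140);
    INSERT INTO t2stock VALUES (11399, 1700), (9848, 2400);
""")

# Naive version: each stock row is repeated once per matching t2req row,
# so SUM(s.quantity) is inflated for typeID 11399.
naive = conn.execute("""
    SELECT r.typeID, SUM(r.quantity), SUM(s.quantity)
    FROM t2req r JOIN t2stock s ON r.typeID = s.typeID
    GROUP BY r.typeID
    ORDER BY r.typeID
""").fetchall()

# Pre-aggregated version: each side has one row per typeID before the join.
fixed = conn.execute("""
    SELECT r.typeID, r.quantity, s.quantity, r.quantity - s.quantity AS diff
    FROM (SELECT typeID, SUM(quantity) AS quantity FROM t2req GROUP BY typeID) r
    JOIN (SELECT typeID, SUM(quantity) AS quantity FROM t2stock GROUP BY typeID) s
      ON r.typeID = s.typeID
    ORDER BY r.typeID
""").fetchall()

print(naive)  # stock sum for 11399 comes out as 5100
print(fixed)  # stock sum for 11399 comes out as 1700
```

Because both derived tables are already unique on typeID, the join can no longer multiply rows.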

You can't do this in a single aggregation. Aggregate each table separately, then join the results:
SELECT
COALESCE(r.typeID, s.typeID) AS typeID,
COALESCE(r.quantity, 0) AS req_quantity,
COALESCE(s.quantity, 0) AS stock_quantity,
COALESCE(r.quantity, 0) - COALESCE(s.quantity, 0) AS diff
FROM (
SELECT rr.typeID, SUM(rr.quantity) AS quantity
FROM t2req rr
GROUP BY rr.typeID
) r
LEFT JOIN (
SELECT ss.typeID, SUM(ss.quantity) AS quantity
FROM t2stock ss
GROUP BY ss.typeID
) s ON r.typeID = s.typeID
ORDER BY 4 DESC;
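The COALESCE calls only matter when one side of the join can be missing. As a hedged sketch (again with Python's sqlite3 and a handful of rows from the question's samples), the following joins the two pre-aggregated tables with a LEFT JOIN, so a required typeID with no stock at all still shows up with a stock quantity of 0. The variable name `rows` is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2req (typeID INTEGER, quantity REAL);
    CREATE TABLE t2stock (typeID INTEGER, quantity REAL);
    -- typeID 34 is required but not stocked; 11476 is stocked but not required
    INSERT INTO t2req VALUES (34, 102900), (11399, 700), (11399, 390), (11399, 390);
    INSERT INTO t2stock VALUES (11399, 1700), (11476, 27), (11476, 2);
""")

rows = conn.execute("""
    SELECT r.typeID,
           r.quantity AS req_quantity,
           COALESCE(s.quantity, 0) AS stock_quantity,
           r.quantity - COALESCE(s.quantity, 0) AS diff
    FROM (SELECT typeID, SUM(quantity) AS quantity FROM t2req GROUP BY typeID) r
    LEFT JOIN (SELECT typeID, SUM(quantity) AS quantity FROM t2stock GROUP BY typeID) s
      ON r.typeID = s.typeID
    ORDER BY diff DESC
""").fetchall()

for row in rows:
    print(row)
```

Note that a LEFT JOIN from the requirements side drops typeIDs that exist only in stock (11476 here). Keeping those as well would need a full outer join or a UNION of both directions.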

Related

Sales amounts of the top n selling vendors by month with other fields in bigquery

I have a table in BigQuery like this (260000 rows):
vendor date item_price discount_price
x 2021-07-08 23:41:10 451,5 0
y 2021-06-14 10:22:10 41,7 0
z 2020-01-03 13:41:12 74 4
s 2020-04-12 01:14:58 88 12
....
What I want is to group this data by month and find the sum of the sales of only the top 20 vendors in each month. Expected output:
month vendor_name(top20) sum_of_vendor's_sales sum_of_vendor's_discount item_count(sold)
2020-01 x1 10857 250 150
2020-01 x2 9685 410 50
2020-01 x3 3574 140 45
....
2020-01 x20 700 15 20
2020-02 y1 7421 280 120
2020-02 y2 6500 250 40
2020-02 y3 4500 200 70
.....
2020-02 y20 900 70 30
I tried this (source here), but it did not produce the desired output:
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors,count(item_price) as count_of_items,sum(discount_price)
from my_table
group by month
) t

Consider below approach
select
format_datetime('%Y%m', date) month,
vendor as vendor_name_top20,
sum(item_price) as sum_of_vendor_sales,
sum(discount_price) as sum_of_vendor_discount,
count(*) as item_count_sold
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sum_of_vendor_sales desc) <= 20
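The same "aggregate per month and vendor, then keep the top N per month" idea can be tried locally. The sketch below is not BigQuery: it uses Python's sqlite3 module, which has no QUALIFY clause, so the row-number filter moves into an outer query. The table name `sales`, the sample rows, and `TOP_N = 2` are assumptions for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

TOP_N = 2  # the question asks for 20; 2 keeps the demo readable

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (vendor TEXT, date TEXT, item_price REAL, discount_price REAL);
    -- made-up rows in the shape of the question's table
    INSERT INTO sales VALUES
        ('x', '2020-01-03 13:41:12', 74, 4),
        ('x', '2020-01-09 08:00:00', 26, 0),
        ('y', '2020-01-10 10:22:10', 90, 12),
        ('z', '2020-01-11 23:41:10', 10, 0),
        ('y', '2020-02-01 01:14:58', 88, 12),
        ('z', '2020-02-02 09:30:00', 41, 0);
""")

top_vendors = conn.execute("""
    WITH monthly AS (
        -- one row per (month, vendor) with its totals
        SELECT strftime('%Y-%m', date) AS month,
               vendor,
               SUM(item_price) AS sales,
               SUM(discount_price) AS discount,
               COUNT(*) AS item_count
        FROM sales
        GROUP BY strftime('%Y-%m', date), vendor
    ),
    ranked AS (
        -- rank vendors inside each month by total sales
        SELECT monthly.*,
               ROW_NUMBER() OVER (PARTITION BY month ORDER BY sales DESC) AS rn
        FROM monthly
    )
    SELECT month, vendor, sales, discount, item_count
    FROM ranked
    WHERE rn <= ?
    ORDER BY month, rn
""", (TOP_N,)).fetchall()

for row in top_vendors:
    print(row)
```

ROW_NUMBER breaks ties arbitrarily; RANK or DENSE_RANK would keep tied vendors together if that matters.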

Cumulated Cohorts in SQL

I have the following table:
cohort | month cohort | orders | cumulated orders
2021-01 | 0 | 126 | 126
2021-01 | 1 | 5 | 131
2021-01 | 2 | 4 | 135
2021-02 | 0 | 131 | 131
2021-02 | 1 | 9 | 140
2021-02 | 2 | 8 | 148
And now I want to have the following table, where I divide the cumulated orders by the number of orders at month 0:
cohort | month cohort | orders | cumulated orders | cumulated in %
2021-01 | 0 | 126 | 126 | 100%
2021-01 | 1 | 5 | 131 | 104%
2021-01 | 2 | 4 | 135 | 107%
2021-02 | 0 | 131 | 131 | 100%
2021-02 | 1 | 9 | 140 | 107%
2021-02 | 2 | 8 | 148 | 114%
My only idea is a CASE expression, but I don't want to update the query every month by adding a line such as
WHEN cohort="2021-08" THEN cumulated orders / 143
where 143 is the number of orders of cohort 2021-08 at month cohort = 0.
Does anyone have an idea how to get this table?

A case expression isn't needed. You can use first_value():
select t.*,
( cumulated_order /
first_value(orders) over (partition by cohort order by month_cohort)
) as ratio
from t;
If you really wanted a case, you could use:
select t.*,
( cumulated_order /
max(case when month_cohort = 0 then orders end) over (partition by cohort)
) as ratio
from t;
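As a rough check of the first_value() approach, the sketch below loads the question's six rows into an in-memory SQLite database through Python's sqlite3 module. The column names `month_cohort` and `cumulated_orders` are assumed spellings of the question's headers. One detail worth noting: in SQLite (and several other engines) dividing two integers truncates, so the sketch multiplies by 100.0 before dividing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (cohort TEXT, month_cohort INTEGER,
                    orders INTEGER, cumulated_orders INTEGER);
    INSERT INTO t VALUES
        ('2021-01', 0, 126, 126), ('2021-01', 1, 5, 131), ('2021-01', 2, 4, 135),
        ('2021-02', 0, 131, 131), ('2021-02', 1, 9, 140), ('2021-02', 2, 8, 148);
""")

percentages = conn.execute("""
    SELECT cohort,
           month_cohort,
           -- 100.0 forces floating-point division; the first row of each
           -- cohort (month_cohort = 0) supplies the denominator
           CAST(ROUND(100.0 * cumulated_orders /
                FIRST_VALUE(orders) OVER (PARTITION BY cohort ORDER BY month_cohort))
                AS INTEGER) AS pct
    FROM t
    ORDER BY cohort, month_cohort
""").fetchall()

for row in percentages:
    print(row)
```

With these inputs the last row evaluates to 113 (148 / 131 is about 1.13), slightly below the 114% written in the question's expected table.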

Consider below
select *,
round(100 * cumulated_orders /
sum(if(month_cohort = 0, orders, 0)) over(partition by cohort)
) as cumulated_in_percent
from `project.dataset.table`

Oracle Cumulative Calculation using Windows Analytical functions

I am new to window functions (LAG, LEAD, etc.) and I am using Oracle. I did some research and tried a few solutions, but I couldn't get the desired result.
I have an inventory table in which I want to find the items used and the items remaining on a given day.
The data comes in as follows:
Dates | Items | Total_Inv_Items | Damaged_Items | Sellable_Items | Sold_Items | Remaining_Items
11/13/2020 Pen 999 15 984 109 875
11/13/2020 Book 401 6 386 109 277
11/14/2020 Pen 0 0 0 121 0
11/14/2020 Book 0 0 0 121 0
11/15/2020 Pen 0 0 0 31 0
11/15/2020 Book 0 0 0 31 0
11/16/2020 Pen 201 3 198 33 165
11/16/2020 Book 301 5 296 33 263
Sellable_Items and Remaining_Items are calculated columns.
Sellable_Items = Total_Inv_Items - Damaged_Items
Remaining_Items = Sellable_Items - Sold_Items
Total_Inv_Items is the number of manufactured or new items available in the store.
The desired output:
Dates | Items | Total_Inv_Items | Damaged_Items | Sellable_Items | Sold_Items | Remaining_Items
11/13/2020 Pen 999 15 984 109 875
11/13/2020 Book 401 6 386 109 277
11/14/2020 Pen 0 0 875 121 754
11/14/2020 Book 0 0 277 121 156
11/15/2020 Pen 0 0 754 31 721
11/15/2020 Book 0 0 156 31 125
11/16/2020 Pen 201 3 919 33 886
11/16/2020 Book 301 5 421 33 388
Remaining_Items is cumulative. Sellable_Items changes whenever new items are added to the inventory; otherwise it equals the previous day's Remaining_Items.
This is the query that gives me the first data set:
SELECT
Date,
Items,
NVL(Total_Inv_Items, 0) as Total_Inv_Items,
NVL(Round(Total_Inv_Items * DamagePercentage/100), 0) as Damaged_Item,
Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100)) as Sellable_Items,
Sold_Items,
Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100))- Sold_Items as Remaining_Items
FROM
Table
Order By
Date
Note:
On Nov 16th new items became available in the store, so they are added to the previous day's remaining items (e.g. Pen: 201 - 3 + 721).

You can use the lag function as follows:
SELECT Date,
Items,
Total_Inv_Items,
Damaged_Item,
-- FOLLOWING IS CALCULATED EXPRESSION BASED ON YOUR FORMULA: Pen 201-3+721
Total_Inv_Items
- Damaged_Item
+ COALESCE(LAG(Remaining_Items) OVER (PARTITION BY ITEMS ORDER BY DATE),0) AS Sellable_Items,
-- EXPRESSION ENDS HERE
Sold_Items,
Remaining_Items
FROM
(SELECT DISTINCT
Date,
Items,
NVL(Total_Inv_Items, 0) as Total_Inv_Items,
NVL(Round(Total_Inv_Items * DamagePercentage/100), 0) as Damaged_Item,
Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100)) as Sellable_Items,
Sold_Items,
Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100))- Sold_Items as Remaining_Items
FROM Table)
Order By Date
This expression should be placed in the outer query. The inner query should be your query which is giving the first data.

I think you just want cumulative sums and arithmetic:
select t.*,
sum(total_inv_items - damaged_items) over (partition by item order by date) as sellable_items,
sum(total_inv_items - damaged_items - sold_items) over (partition by item order by date) as remaining_items
from t
order by date;
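A small sketch of the running-sum idea, using Python's sqlite3 module and only the Pen rows from the question. It assumes the rule stated in the question (each day's sellable stock is the previous remaining stock plus new undamaged items), which means sellable can be derived as remaining + sold rather than as its own cumulative sum. The table name `inv` and the ISO date strings (so that ORDER BY sorts chronologically) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inv (dates TEXT, items TEXT, total_inv_items INTEGER,
                      damaged_items INTEGER, sold_items INTEGER);
    INSERT INTO inv VALUES
        ('2020-11-13', 'Pen', 999, 15, 109),
        ('2020-11-14', 'Pen',   0,  0, 121),
        ('2020-11-15', 'Pen',   0,  0,  31),
        ('2020-11-16', 'Pen', 201,  3,  33);
""")

running = conn.execute("""
    SELECT dates,
           items,
           remaining_items + sold_items AS sellable_items,  -- stock before today's sales
           remaining_items
    FROM (
        SELECT dates, items, sold_items,
               -- running total of net stock movement per item
               SUM(total_inv_items - damaged_items - sold_items)
                   OVER (PARTITION BY items ORDER BY dates) AS remaining_items
        FROM inv
    )
    ORDER BY dates
""").fetchall()

for row in running:
    print(row)
```

The figures for Nov 15 and Nov 16 come out two higher than in the question's desired table, because 754 - 31 is 723 rather than the 721 shown there.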

Thanks @Gordon Linoff and @Tejash, your comments helped me resolve this issue. The following code gave me the desired result.
NOTE: 1.5 is the percentage of damaged items, which I have hard-coded in the query here; most of the time it comes from the DB.
SELECT
Dates,Items,
Total_Inv_Items,
Damaged_Item,
CASE
WHEN Total_Inv_Items = 0 THEN LAG(Remaining_Items) OVER (partition by items order by dates)
ELSE Sellable_Items+NVL(LAG(Remaining_Items) OVER (partition by items order by dates),0)
END as Sellable_Items,
Sold_Items,
Remaining_Items
FROM
(
SELECT
Dates,Items,
Total_Inv_Items,
Damaged_Item,
Sellable_Items,
Sold_Items,
SUM(Sellable_Items - Sold_Items) OVER (partition by items order by dates) as Remaining_Items
FROM
(
SELECT
Dates,
Items,
NVL(Total_Inv_Items, 0) as Total_Inv_Items,
NVL(Round(Total_Inv_Items * 1.5/100), 0) as Damaged_Item,
Round(Total_Inv_Items - (Total_Inv_Items * 1.5/100)) as Sellable_Items,
Sold_Items,
Round(Total_Inv_Items - (Total_Inv_Items * 1.5/100))- Sold_Items as Remaining_Items
FROM
Table T
) A
)B
Order By
Dates

T-SQL Group by day date but i want show query full date

I want to show the full date field, but I cannot get the grouping to work.
My query:
SELECT DAY(T1.UI_CreateDate) AS DATEDAY, SUM(1) AS TOTALCOUNT
FROM mydb.dbo.LP_UseImpression T1 WHERE T1.UI_BR_BO_ID = 45
GROUP BY DAY(T1.UI_CreateDate)
Result:
DATEDAY TOTALCOUNT
----------- -----------
15 186
9 1
3 2
26 481
21 297
27 342
18 18
30 14
4 183
25 553
13 8
22 469
16 1
17 28
20 331
28 90
14 33
8 1
But I want to show the full date...
Example result:
DATEDAY TOTALCOUNT
----------- -----------
15/06/2015 186
9/06/2015 1
3/06/2015 2
26/06/2015 481
21/06/2015 297
27/06/2015 342
18/06/2015 18
30/06/2015 14
4/06/2015 183
25/06/2015 553
13/06/2015 8
22/06/2015 469
16/06/2015 1
17/06/2015 28
20/06/2015 331
28/06/2015 90
14/06/2015 33
8/06/2015 1
That is the result I want to see, but I could not get it.
How can I do this?
Thanks!

How about just casting to date to remove any time component:
SELECT CAST(T1.UI_CreateDate as DATE) AS DATEDAY, COUNT(*) AS TOTALCOUNT
FROM mydb.dbo.LP_UseImpression T1
WHERE T1.UI_BR_BO_ID = 45
GROUP BY CAST(T1.UI_CreateDate as DATE)
ORDER BY DATEDAY;
SUM(1) for calculating the count does work. However, because SQL has the COUNT(*) function, it seems a bit awkward.
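The difference between grouping on the day of the month and grouping on the calendar date can be seen in a short sketch. This is not T-SQL: it uses Python's sqlite3 module, where `strftime('%d', ...)` stands in for DAY() and `date(...)` stands in for CAST(... AS DATE). The table name `impressions` and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE impressions (created TEXT);
    INSERT INTO impressions VALUES
        ('2015-04-15 09:00:00'), ('2015-06-15 10:30:00'),
        ('2015-06-15 18:45:00'), ('2015-06-26 12:00:00');
""")

# Grouping on the day of the month merges the 15th of April and of June.
by_day_of_month = conn.execute("""
    SELECT CAST(strftime('%d', created) AS INTEGER) AS dateday, COUNT(*) AS totalcount
    FROM impressions
    GROUP BY dateday
    ORDER BY dateday
""").fetchall()

# Grouping on the date (time stripped) keeps each calendar day separate.
by_date = conn.execute("""
    SELECT date(created) AS dateday, COUNT(*) AS totalcount
    FROM impressions
    GROUP BY dateday
    ORDER BY dateday
""").fetchall()

print(by_day_of_month)
print(by_date)
```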

You can group by DAY(T1.UI_CreateDate) or by the full date, but these are different: the dates '2015-04-15' and '2015-12-15' both give the same DAY value of 15.
Assuming you want to group on DAY rather than on the date, please try the version of the query below:
SELECT DISTINCT
T1.UI_CreateDate as DATEDAY,
count(1) over (PARTITION BY DAY(T1.UI_CreateDate) ) AS TOTALCOUNT
FROM mydb.dbo.LP_UseImpression T1 WHERE T1.UI_BR_BO_ID = 45
sql fiddle for demo: http://sqlfiddle.com/#!6/c3337/1

Totalling Values and Finding the Top x Values

Situation:
I have a database that contains sales records; each record has an ID (PK), ProductCode, Year, Month, and SalesVolume.
As a user I will specify the year, so if I specify 1980, I will query the database for all the records that correspond to that year.
I am trying to create a query that will total each product's monthly SalesVolume for the specified year, then pick the top 5 totals.
What I have gathered so far is that I need to compute the totals, put them in descending order, and select the top 5, but that's as far as I have got.
Note that in a year there will be SalesVolumes for multiple products:
ID, Product Code, Year, Month, SalesVolume
23041 121 1980 1 21
23042 121 1980 2 960
23043 121 1980 3 939
23044 121 1980 4 927
23045 121 1980 5 931
23046 121 1980 6 950
23047 121 1980 7 975
23048 121 1980 8 994
23049 121 1980 9 994
23050 121 1980 10 968
23051 121 1980 11 918
23052 121 1980 12 854
23425 122 1980 1 1002
23426 122 1980 2 1032
23427 122 1980 3 1090
23428 122 1980 4 1062
23429 122 1980 5 1010
23430 122 1980 6 1103
23431 122 1980 7 1214
23432 122 1980 8 1122
23433 122 1980 9 1019
23434 122 1980 10 1181
23435 122 1980 11 1343
23436 122 1980 12 1180
Expected Output:
For 1980
Product Code, SalesVolume
121 , (total SalesVolume of the 12 months)
122 , (total SalesVolume of the 12 months)

Here's what you can do: total up the sales for each item for a given year, rank them, and then select the top 5 ranking items.
SELECT *
FROM (
SELECT "Product Code",
SUM(SalesVolume) as total,
RANK() OVER (ORDER BY SUM(SalesVolume) DESC) as rnk
FROM YourTable
WHERE Year = 1980
GROUP BY "Product Code"
) A
WHERE rnk <= 5
If you want to select the top 5 per year for all years, you can do that too:
SELECT *
FROM (
SELECT Year,
"Product Code",
SUM(SalesVolume) as total,
RANK() OVER (PARTITION BY Year ORDER BY SUM(SalesVolume) DESC) as rnk
FROM YourTable
GROUP BY Year, "Product Code"
) A
WHERE rnk <= 5
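For a quick local check of the per-year ranking, here is a sketch using Python's sqlite3 module (SQLite 3.25 or newer for window functions). The 1980 rows are the first three months of each product from the question; the 1981 rows, the snake_case column names, and `TOP_N = 1` are assumptions added so that the filter visibly removes something.

```python
import sqlite3

TOP_N = 1  # the question asks for 5; 1 makes the filtering visible on tiny data

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product_code INTEGER, year INTEGER,
                        month INTEGER, sales_volume INTEGER);
    INSERT INTO sales VALUES
        (121, 1980, 1, 21),   (121, 1980, 2, 960),  (121, 1980, 3, 939),
        (122, 1980, 1, 1002), (122, 1980, 2, 1032), (122, 1980, 3, 1090),
        (121, 1981, 1, 500),  (122, 1981, 1, 400);  -- hypothetical 1981 rows
""")

top_products = conn.execute("""
    SELECT year, product_code, total
    FROM (
        SELECT year, product_code, total,
               RANK() OVER (PARTITION BY year ORDER BY total DESC) AS rnk
        FROM (
            -- one row per (year, product) with its yearly total
            SELECT year, product_code, SUM(sales_volume) AS total
            FROM sales
            GROUP BY year, product_code
        )
    )
    WHERE rnk <= ?
    ORDER BY year, rnk
""", (TOP_N,)).fetchall()

for row in top_products:
    print(row)
```

RANK gives tied totals the same rank, so a tie at the cutoff can return more than TOP_N rows for a year; ROW_NUMBER would cap the count instead.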
