Aggregating data in a table up to the date of each row in the table - sql

If I run the following query, it results in 1.6M rows:
SELECT
customer_id,
credit_id,
date
FROM
wallet
WHERE
credit_title = 'Topups'
AND credit_type = 'Credit Card Topups'
AND day >= DATE '2017-11-06'
AND day <= DATE '2020-04-03'
I am now trying to, for each row in the above query, get the count and total amount of all credit transactions for that customer up to the date of the row. I have tried the below query (which is a join with itself), but this results in 1.3M rows. Why are rows being dropped in the join? credit_id is a unique identifier in this table.
SELECT
customer_id,
credit_id,
COALESCE(COUNT(wallet_agg.credit_id), 0) AS topup_count_to_date,
COALESCE(SUM(wallet_agg.credit_amt_usd), 0) AS topup_amount_to_date
FROM
wallet
LEFT JOIN
wallet AS wallet_agg
ON
wallet.customer_id = wallet_agg.customer_id
AND wallet_agg.date < wallet.date
WHERE
wallet.credit_title = 'Topups'
AND wallet.credit_type = 'Credit Card Topups'
AND wallet.day >= DATE '2017-11-06'
AND wallet.day <= DATE '2020-04-03'
AND wallet_agg.credit_title = 'Topups'
AND wallet_agg.credit_type = 'Credit Card Topups'
Here is a simple demo of what I am trying, which gets the result I am expecting. How is the logic of my more complex query above different?

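Your LEFT JOIN is being turned into an inner join by the WHERE conditions on wallet_agg. For a row with no earlier match, every wallet_agg column is NULL, so wallet_agg.credit_title = 'Topups' is not true and the row is filtered out. You need to move the conditions on the second table into the ON clause: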
FROM wallet LEFT JOIN
wallet AS wallet_agg
ON wallet.customer_id = wallet_agg.customer_id AND
wallet_agg.date < wallet.date AND
wallet_agg.credit_title = 'Topups' AND
wallet_agg.credit_type = 'Credit Card Topups'
WHERE wallet.credit_title = 'Topups' AND
wallet.credit_type = 'Credit Card Topups' AND
wallet.day >= DATE '2017-11-06' AND
wallet.day <= DATE '2020-04-03'
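To see the effect on a small scale, here is a minimal sketch run through Python's sqlite3 module. The three-row wallet table, the aliases w and a, and the reduced column list are made up for illustration; only the placement of the filter on the joined copy differs between the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wallet (
        credit_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        credit_title TEXT,
        date TEXT
    );
    INSERT INTO wallet VALUES
        (1, 100, 'Topups', '2020-01-01'),  -- first topup of customer 100: nothing earlier
        (2, 100, 'Topups', '2020-01-05'),
        (3, 200, 'Topups', '2020-01-03');  -- only topup of customer 200
""")

# Same self-join twice; the filter on the joined copy goes either in ON or in WHERE.
BASE = """
    SELECT w.credit_id, COUNT(a.credit_id) AS topup_count_to_date
    FROM wallet AS w
    LEFT JOIN wallet AS a
      ON w.customer_id = a.customer_id
     AND a.date < w.date
     {on_extra}
    WHERE w.credit_title = 'Topups'
    {where_extra}
    GROUP BY w.credit_id
    ORDER BY w.credit_id
"""

filter_in_where = conn.execute(
    BASE.format(on_extra="", where_extra="AND a.credit_title = 'Topups'")
).fetchall()
filter_in_on = conn.execute(
    BASE.format(on_extra="AND a.credit_title = 'Topups'", where_extra="")
).fetchall()

print(filter_in_where)  # rows without an earlier topup are gone
print(filter_in_on)     # every topup is kept, with a count of 0 where nothing matched
```

With the filter in WHERE, only the row that has an earlier topup survives. With the filter in ON, all three rows come back.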
Of course, a self-join with aggregation is overkill for this problem. You can just use window functions:
SELECT w.*
FROM (SELECT w.customer_id, w.credit_id, w.date, w.day,
COUNT(*) OVER (PARTITION BY w.customer_id, w.credit_title, w.credit_type ORDER BY w.date) AS topup_count_to_date,
SUM(w.credit_amt_usd) OVER (PARTITION BY w.customer_id, w.credit_title, w.credit_type ORDER BY w.date) AS topup_amount_to_date
FROM wallet w
WHERE w.credit_title = 'Topups' AND
w.credit_type = 'Credit Card Topups'
) w
WHERE w.day >= DATE '2017-11-06' AND
w.day <= DATE '2020-04-03';
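One thing to check with the window-function version: with ORDER BY and no explicit frame, a running COUNT or SUM includes the current row (and any rows sharing its date), whereas the self-join used wallet_agg.date < wallet.date and so counted only earlier rows. The sketch below compares the two on a made-up four-row table, run through Python's sqlite3 (window functions need SQLite 3.25 or newer). The frame ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING is my assumption about what "up to the date of the row" should mean, and it only matches date < exactly when a customer has no two rows on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wallet (
        credit_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        date TEXT,
        credit_amt_usd REAL
    );
    INSERT INTO wallet VALUES
        (1, 100, '2020-01-01', 10.0),
        (2, 100, '2020-01-05', 20.0),
        (3, 100, '2020-01-09', 30.0),
        (4, 200, '2020-01-03', 5.0);
""")

rows = conn.execute("""
    SELECT credit_id,
           -- default frame: everything up to and including the current row
           COUNT(*) OVER (PARTITION BY customer_id ORDER BY date) AS count_incl,
           -- explicit frame: strictly earlier rows only
           COUNT(*) OVER prior AS count_before,
           -- SUM over an empty frame is NULL, hence the COALESCE
           COALESCE(SUM(credit_amt_usd) OVER prior, 0) AS amount_before
    FROM wallet
    WINDOW prior AS (
        PARTITION BY customer_id ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    )
    ORDER BY credit_id
""").fetchall()

for row in rows:
    print(row)
```

For each customer's first row, count_incl is 1 while count_before is 0, which is the difference between the two definitions.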

Related

Translating my query from using aggregate functions to using window functions

I have a table (wallet_credit_details) where each row is a single transaction made by a single customer. There is another table (wallet_usage_details) which shows how each user paid for these transactions. What I would like to do is - for each row (transaction) in the first table, aggregate (count and sum) values by the customer from the second table up to the date of the transaction in the first table.
An example of this would be, for each transaction, counting the number of previous transactions by the same customer which were paid for using "Topups".
SELECT
wallet.customer_id,
wallet.credit_id AS transaction_id,
COALESCE(COUNT(usage.debit_id), 0) AS transactions_paid_count_to_date,
COALESCE(SUM(usage.used_amt_usd), 0) AS transactions_paid_amount_to_date,
COALESCE(COUNT(CASE WHEN usage.credit_title = 'Topups' AND usage.credit_type = 'Credit Card Topups' THEN 1 END), 0) AS topups_used_count_to_date,
COALESCE(SUM(CASE WHEN usage.credit_title = 'Topups' AND usage.credit_type = 'Credit Card Topups' THEN usage.used_amt_usd END), 0) AS topups_used_amount_to_date
FROM
prod_dwh.wallet_credit_details AS wallet
LEFT JOIN
prod_dwh.wallet_usage_details AS usage
ON
wallet.customer_id = usage.user_id
AND usage.transaction_date < wallet.day
WHERE
wallet.credit_title = 'Topups'
AND wallet.credit_type = 'Credit Card Topups'
AND wallet.day >= DATE '{min_date}'
AND wallet.day <= DATE '{max_date}'
GROUP BY
wallet.customer_id,
wallet.credit_id
How can I reproduce the same logic using a window function?

SUM and Grouping by date and material

I'm still learning SQL, so forgive me.
I have 3 tables: a material table, a material_req table and a material_trans table. I want to group by material and then have one column per year.
So the columns would be [material, 2019, 2018, 2017, 2016, total], with total being the total qty used for each material.
I have tried placing the date in the SELECT statement and grouping by the date as well, but then the result contains the same material many times, once per date. I only need the year. Should I try the same thing but return just the year?
SELECT material_req.Material
-- , Material_Trans_Date
, SUM(-1 * material_trans.Quantity) AS 'TOTAL'
,Standard_Cost
FROM
Material_Req inner join Material_Trans
ON
Material_Req.Material_Req = Material_Trans.Material_Req
LEFt JOIN Material
ON
Material.Material = Material_Req.Material
WHERE
material_trans.Material_Trans_Date between '20180101' AND GETDATE()
-- Material_Trans_Date between '20180101' AND '20181231'
-- Material_Trans_Date between '20170101' AND '20171231'
-- Material_Trans_Date between '20160101' AND '20161231'
GROUP BY
material_req.Material ,Standard_Cost
ORDER BY
Material_Req.Material, Standard_Cost
The expected results should be grouped by material, with the columns 2019, 2018, 2017, 2016 and Standard_Cost. Each year column should hold the sum of qty for that material in that year.
The results currently look like this: current_results
If you are using SQL Server then you might try this:
SELECT material_req.Material
, SUM(CASE WHEN DATEPART(YEAR, Material_Trans_Date) = 2019 THEN -1 * material_trans.Quantity ELSE 0 END) [2019 TOTAL]
, SUM(CASE WHEN DATEPART(YEAR, Material_Trans_Date) = 2018 THEN -1 * material_trans.Quantity ELSE 0 END) [2018 TOTAL]
,Standard_Cost
FROM
Material_Req inner join Material_Trans
ON
Material_Req.Material_Req = Material_Trans.Material_Req
LEFt JOIN Material
ON
Material.Material = Material_Req.Material
WHERE
material_trans.Material_Trans_Date between '20180101' AND GETDATE()
GROUP BY
material_req.Material ,Standard_Cost
ORDER BY
Material_Req.Material, Standard_Cost
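As a small illustration of the same pivot, here is a sketch run through Python's sqlite3. It assumes a single simplified material_trans table with made-up rows, uses SQLite's strftime('%Y', ...) where SQL Server would use DATEPART(YEAR, ...) or YEAR(...), and adds the overall total column the question asked for. Quantities are stored as negatives, matching the -1 * Quantity in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE material_trans (material TEXT, trans_date TEXT, quantity INTEGER);
    INSERT INTO material_trans VALUES
        ('STEEL', '2018-03-01', -5),
        ('STEEL', '2019-06-15', -2),
        ('STEEL', '2019-07-01', -3),
        ('BRASS', '2018-11-20', -4);
""")

rows = conn.execute("""
    SELECT material,
           -- one SUM(CASE ...) per year turns rows into columns
           SUM(CASE WHEN strftime('%Y', trans_date) = '2019' THEN -1 * quantity ELSE 0 END) AS total_2019,
           SUM(CASE WHEN strftime('%Y', trans_date) = '2018' THEN -1 * quantity ELSE 0 END) AS total_2018,
           SUM(-1 * quantity) AS total
    FROM material_trans
    GROUP BY material
    ORDER BY material
""").fetchall()

for row in rows:
    print(row)
```

The date column never appears in GROUP BY, so each material comes back as exactly one row.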

Sum of 2 selects statement in sql with different where Clause

I am trying to get the sum of 2 columns in one table, each with a different WHERE condition. The only difference is that the amount for one department is calculated based on a 17% margin.
The result should be the total revenue grouped by Event Name and Event ID.
For a SQL report, I have written 2 SQL statements with different conditions and got the correct value for each column separately. I managed to sum both in one query, but it only worked for one event.
SELECT EVT_ID, Event_Desc, Sum(Order_Total) as Total + (Select SUm(Order_Total *0.17) as Total from Orders Join Events EM On OrD.EVT_ID = EV.EVENTS_ID
where EVT_START_DATE between '2019-01-01' and '2019-01-31' Order_Department = 'FAB' )
From Orders Join Events EM On OrD.EVT_ID = EV.EVENTS_ID
where EVT_START_DATE between '2019-01-01' and '2019-01-31' Order_Department <> 'FAB'
Group by EVT_ID, Event_Desc
select EVT_ID, Event_Desc, sum(Total) as Total
from
(
SELECT EVT_ID, Event_Desc, Sum(Order_Total) as Total
From Orders OrD
Join Events EV On OrD.EVT_ID = EV.EVENTS_ID
where EVT_START_DATE between '2019-01-01' and '2019-01-31' and Order_Department <> 'FAB'
Group by EVT_ID, Event_Desc
union all
Select EVT_ID, Event_Desc, Sum(Order_Total * 0.17) as Total
from Orders OrD
Join Events EV On OrD.EVT_ID = EV.EVENTS_ID
where EVT_START_DATE between '2019-01-01' and '2019-01-31' and Order_Department = 'FAB'
Group by EVT_ID, Event_Desc ) tbl
Group by EVT_ID, Event_Desc
OR
SELECT EVT_ID, Event_Desc, Sum(case when Order_Department = 'FAB' then Order_Total * 0.17 else Order_Total end) as Total
From Orders OrD
Join Events EV On OrD.EVT_ID = EV.EVENTS_ID
where EVT_START_DATE between '2019-01-01' and '2019-01-31'
Group by EVT_ID, Event_Desc
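A detail worth checking in the subquery approach is the set operator. Plain UNION removes duplicate rows, so if the FAB branch and the non-FAB branch happen to produce the same total for an event, one of them silently disappears; UNION ALL keeps both. Below is a minimal sketch through Python's sqlite3, with a made-up single-table orders schema and two rows deliberately constructed so the branch totals collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (evt_id INTEGER, order_department TEXT, order_total REAL);
    INSERT INTO orders VALUES (1, 'FAB', 1000.0);
    -- a non-FAB order whose total equals the FAB order's 17% share
    INSERT INTO orders
        SELECT 1, 'AV', order_total * 0.17 FROM orders WHERE order_department = 'FAB';
""")

TEMPLATE = """
    SELECT evt_id, SUM(total) FROM (
        SELECT evt_id, SUM(order_total) AS total
        FROM orders WHERE order_department <> 'FAB' GROUP BY evt_id
        {set_op}
        SELECT evt_id, SUM(order_total * 0.17) AS total
        FROM orders WHERE order_department = 'FAB' GROUP BY evt_id
    ) AS tbl
    GROUP BY evt_id
"""

with_union = conn.execute(TEMPLATE.format(set_op="UNION")).fetchone()[1]
with_union_all = conn.execute(TEMPLATE.format(set_op="UNION ALL")).fetchone()[1]

print(with_union)      # one of the two identical branch rows was dropped
print(with_union_all)  # both branch rows are summed
```

The single-query CASE version avoids the issue entirely, since it never builds intermediate rows that could be deduplicated.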
If I followed you correctly, you could approach this with conditional aggregation. You can use a CASE expression within the SUM aggregate function to check which department the current record belongs to and do the computation accordingly.
SELECT
o.evt_id,
event_desc,
SUM(CASE
WHEN order_department = 'FAB' THEN order_total * 0.17
ELSE order_total END
) AS Total
FROM orders o
INNER JOIN events e On o.evt_id = e.events_id
WHERE evt_start_date BETWEEN '2019-01-01' and '2019-01-31'
GROUP BY
o.evt_id,
event_desc
NB: most columns in your query are not prefixed with a table alias, making it unclear which table they come from. I added them when it was possible to make an educated guess from your SQL code, and I would highly recommend that you add prefixes to all of the remaining ones.

Count of record, per day, by user comparative

Looking to see if there is a way to compare the number of orders entered per day, split between a specific user (EDI) and everyone else.
I can return results for per day (but only the days where a value exists) but can't figure out a way to combine all three together (Total - by EDI - by everyone else).
Any assistance greatly appreciated.
select Date, count(Order_ID)
from orders
WHERE Date >=dateadd(day,datediff(day,0,GetDate())- 7,0) and [user] = 'EDI'
and customer = '9686'
GROUP BY Date, [user];
select Date, count(Order_ID)
from orders
WHERE Date >=dateadd(day,datediff(day,0,GetDate())- 7,0) and [user] <> 'EDI'
and customer = '9686'
GROUP BY Date, [user];
select Date, count(Order_ID)
from orders
WHERE Date >=dateadd(day,datediff(day,0,GetDate())- 7,0)
and customer = '9686'
GROUP BY Date, [user];
Use conditional aggregation:
select Date, sum(case when [user] = 'EDI' then 1 else 0 end) as cnt_edi,
sum(case when [user] <> 'EDI' then 1 else 0 end) as cnt_non_edi,
count(*) as total
from orders
where Date >= dateadd(day, datediff(day, 0, GetDate()) - 7, 0) and customer = '9686'
group by Date;
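A caveat, in case the [user] column can be NULL: a NULL never satisfies [user] <> 'EDI', so cnt_edi plus cnt_non_edi can come out lower than total. The sketch below shows this on made-up data through Python's sqlite3 (simplified orders table, no customer filter); the extra column cnt_non_edi_incl_null is my own addition, flipping the CASE so that anything that is not EDI, including NULL, is counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, date TEXT, user TEXT);
    INSERT INTO orders (date, user) VALUES
        ('2020-04-01', 'EDI'),
        ('2020-04-01', 'alice'),
        ('2020-04-01', NULL),      -- order with no user recorded
        ('2020-04-02', 'EDI');
""")

rows = conn.execute("""
    SELECT date,
           SUM(CASE WHEN user = 'EDI' THEN 1 ELSE 0 END) AS cnt_edi,
           SUM(CASE WHEN user <> 'EDI' THEN 1 ELSE 0 END) AS cnt_non_edi,
           SUM(CASE WHEN user = 'EDI' THEN 0 ELSE 1 END) AS cnt_non_edi_incl_null,
           COUNT(*) AS total
    FROM orders
    GROUP BY date
    ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```

On the first day the two original counts add up to 2 while total is 3; the flipped CASE accounts for the missing row.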

Summing Two Columns from Two Tables with Two Dates

I work for a CPG company and need to create a report that compares the previous month's delivered units to the next month's forecast. (Simply, our forecasting tool screws up occasionally and this will help identify when the forecast is off.)
My issue is my SQL query is summing forecast sales correctly, but the sum of total delivered is not respecting the dates I have in my WHERE clause -- it's summing total delivered for as far back as the query can reach.
Here is my query:
SELECT
DelUnits.Customer, DelUnits.ObsText01,
FinalFcst.SKU, FinalFcst.Customer,
SUM(DelUnits.Value) AS TotalDelivered,
SUM(FinalFcst.FinalFcst) AS ForecastSales
FROM
DelUnits
LEFT JOIN
FinalFcst ON DelUnits.Customer = FinalFcst.Customer
WHERE
(FinalFcst.DT >= '2018-01-01' and FinalFcst.DT <= '2018-01-31')
AND (DelUnits.Date >= '2017-12-01' and DelUnits.Date <= '2017-12-31')
AND DelUnits.ObsText01 = '10_LB'
AND FinalFcst.SKU = '10_LB'
GROUP BY
DelUnits.Customer, DelUnits.ObsText01, FinalFcst.SKU, FinalFcst.Customer
Again, the query seems to work correctly for the final forecast (summing the forecast between 1/1/18 - 1/31/18) but sums the entire delivery history for a customer. I don't understand why it won't sum the delivery history for just 12/1/17 - 12/31/17.
Thank you for your help!
Presumably, there is only one row for FinalFcst. So, either include it in the GROUP BY clause or use MAX() instead of SUM():
max(FinalFcst.FinalFcst) as ForecastSales
One way to achieve this is to calculate TotalDelivered and ForecastSales in 2 different queries and then join them together.
Try this:
SELECT DelUnits.customer,
DelUnits.obstext01,
FinalFcst.sku,
FinalFcst.customer,
totaldelivered,
forecastsales
FROM (SELECT customer,
obstext01,
Sum(value) AS TotalDelivered
FROM delunits
WHERE date >= '2017-12-01'
AND date <= '2017-12-31'
AND obstext01 = '10_LB'
GROUP BY customer,
obstext01) DelUnits
LEFT JOIN (SELECT customer,
sku,
Sum(finalfcst) AS ForecastSales
FROM finalfcst
WHERE dt >= '2018-01-01'
AND dt <= '2018-01-31'
AND sku = '10_LB'
GROUP BY customer,
sku) FinalFcst ON DelUnits.customer = FinalFcst.customer
You have a many to many relationship between the tables. Ultimately you need to SUM() one table before joining to the other to create a one to many relationship, or you end up duplicating records.
My favorite approach is a derived table:
SELECT C.Customer,
C.ObsText01,
FC.SKU,
C.TotalDelivered,
SUM(FC.FinalFcst) ForecastSales
FROM (SELECT SUM(Value) TotalDelivered, Customer, ObsText01
FROM DelUnits
WHERE Date >= '2017-12-01' AND Date <= '2017-12-31'
AND ObsText01 = '10_LB'
GROUP BY Customer, ObsText01) C
LEFT JOIN FinalFcst FC ON C.Customer = FC.Customer
AND FC.DT >= '2018-01-01'
AND FC.DT <= '2018-01-31'
AND FC.SKU = '10_LB'
GROUP BY C.Customer, C.ObsText01, FC.SKU, C.TotalDelivered
A couple things: Added your forecast table filters to the join predicate, since having those in the WHERE will create an INNER JOIN out of your LEFT JOIN. Also removed FC.Customer from the select and the group since it is redundant with C.Customer.
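To make the duplication concrete, here is a minimal sketch through Python's sqlite3 with made-up numbers: one customer with two delivery rows and three forecast rows, and no date or SKU filters. Joining first and summing afterwards multiplies each side by the row count of the other; aggregating each table before the join gives the true totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DelUnits (Customer TEXT, Value INTEGER);
    CREATE TABLE FinalFcst (Customer TEXT, FinalFcst INTEGER);
    INSERT INTO DelUnits VALUES ('ACME', 10), ('ACME', 20);             -- true total: 30
    INSERT INTO FinalFcst VALUES ('ACME', 5), ('ACME', 6), ('ACME', 7); -- true total: 18
""")

# Join first, aggregate second: 2 x 3 = 6 joined rows, so both sums are inflated.
naive = conn.execute("""
    SELECT SUM(d.Value), SUM(f.FinalFcst)
    FROM DelUnits d
    LEFT JOIN FinalFcst f ON d.Customer = f.Customer
""").fetchone()

# Aggregate each table to one row per customer, then join one-to-one.
pre_aggregated = conn.execute("""
    SELECT d.TotalDelivered, f.ForecastSales
    FROM (SELECT Customer, SUM(Value) AS TotalDelivered
          FROM DelUnits GROUP BY Customer) d
    LEFT JOIN (SELECT Customer, SUM(FinalFcst) AS ForecastSales
               FROM FinalFcst GROUP BY Customer) f
      ON d.Customer = f.Customer
""").fetchone()

print(naive)           # inflated by the fan-out
print(pre_aggregated)  # matches the per-table totals
```

The naive delivered sum is three times too large (once per forecast row) and the forecast sum is twice too large (once per delivery row).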
Maybe you could use a common table expression (CTE) to calculate the delivery history first. I am not sure of the exact SQL Server syntax, but something like this:
WITH DEL_HIST AS
(SELECT DelUnits.Customer,
DelUnits.ObsText01,
sum(DelUnits.Value) as TotalDelivered
FROM DelUnits
Where (DelUnits.Date >= '2017-12-01' and DelUnits.Date <= '2017-12-31')
and DelUnits.ObsText01 = '10_LB'
Group By DelUnits.Customer, DelUnits.ObsText01)
SELECT
DEL_HIST.Customer,
DEL_HIST.ObsText01,
FinalFcst.SKU,
FinalFcst.Customer,
DEL_HIST.TotalDelivered,
sum(FinalFcst.FinalFcst) as ForecastSales
FROM DEL_HIST
left join FinalFcst ON DEL_HIST.Customer = FinalFcst.Customer
and (FinalFcst.DT >= '2018-01-01' and FinalFcst.DT <= '2018-01-31')
and FinalFcst.SKU = '10_LB'
Group By DEL_HIST.Customer, DEL_HIST.ObsText01, FinalFcst.SKU, FinalFcst.Customer, DEL_HIST.TotalDelivered