SQL sum over(partition) not subtracting negative values in SUM

I have the following query which outputs a list of transactions per user - units spent and units earned - column 'Amount'.
I have managed to group this per user and do a running total - column 'Running_Total_Spend'.
However, it is ADDING the negative 'Amount' values rather than subtracting them, so I'm pretty sure it is the SUM part of the query that is not working.
WITH cohort AS(
SELECT DISTINCT userID FROM events_live WHERE startDate = '2018-07-26' LIMIT 50),
my_events AS (
SELECT events_live.* FROM events_live WHERE eventDate >= '2018-07-26')
SELECT cohort.userID,
my_events.eventDate,
my_events.eventTimestamp,
CASE
--spent resource outputs a negative value ---working
WHEN transactionVector = 'SPENT' THEN -abs(my_events.productAmount)
--earned resource outputs a positive value ---working
WHEN transactionVector = 'RECEIVED' THEN my_events.productAmount END AS Amount,
ROW_NUMBER() OVER (PARTITION BY cohort.userID ORDER BY cohort.userID, eventTimestamp asc) AS row,
--sum the values in column 'Amount' for this partition
--should sum positive and negative values ---NOT WORKING--converting negatives into positive
--------------------------------------------------
SUM(CASE WHEN my_events.productAmount >= 0 THEN my_events.productAmount
WHEN my_events.productAmount <0 THEN -abs(my_events.productAmount) end) OVER(PARTITION BY cohort.userID ORDER BY cohort.userID, eventTimestamp asc) AS Running_Total_Spend
---------------------------------------------------
FROM cohort
INNER JOIN my_events ON cohort.userID=my_events.userID
WHERE productName = 'COINS' AND transactionVector IN ('SPENT','RECEIVED')

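I suspect you want the same transactionVector logic inside the SUM too, as my_events.productAmount seems to be always positive. If so, the >= 0 branch of your CASE is taken for every row, and spent amounts are added instead of subtracted.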
...
sum(CASE
WHEN transactionvector = 'SPENT' THEN
-my_events.productamount
WHEN transactionvector = 'RECEIVED' THEN
my_events.productamount
END) OVER (PARTITION BY cohort.userid
ORDER BY cohort.userid,
eventTimestamp) running_total_spend
...
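To see the effect in isolation, here is a minimal sketch with made-up rows. The inline table `e` and its values are hypothetical, and it assumes that productAmount is stored unsigned and that the engine accepts a `VALUES` derived table (PostgreSQL and SQL Server do; other dialects may need a different way to build sample rows).

```sql
-- Hypothetical sample rows: productAmount is always positive,
-- the direction lives in transactionVector.
SELECT userID,
       eventTimestamp,
       transactionVector,
       productAmount,
       SUM(CASE WHEN transactionVector = 'SPENT'    THEN -productAmount
                WHEN transactionVector = 'RECEIVED' THEN  productAmount
           END) OVER (PARTITION BY userID ORDER BY eventTimestamp) AS running_total
FROM (VALUES
        (1, 1, 'RECEIVED', 100),
        (1, 2, 'SPENT',     30),
        (1, 3, 'SPENT',     50),
        (2, 1, 'RECEIVED',  20)
     ) AS e(userID, eventTimestamp, transactionVector, productAmount)
ORDER BY userID, eventTimestamp;
-- running_total: 100, 70, 20 for user 1; 20 for user 2
```

The sign is decided by transactionVector rather than by the stored amount, which is why the total goes down on the 'SPENT' rows.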

Update your SUM to the following (this only subtracts if productAmount is already stored with its sign):
SUM(my_events.productAmount) OVER(PARTITION BY cohort.userID ORDER BY cohort.userID, eventTimestamp asc) AS Running_Total_Spend

Related

Use last value when current row is null, for PostgreSQL timeseries table

I came across a problem for which I could not find an optimal solution. The idea is to get the price at each given time for a list of products from a list of shops. Because the prices are registered at different times, I get some nulls when grouping by time, and also an array of values, so it takes a couple of steps to obtain what I need. I am wondering if someone knows a better, faster way to achieve this. Below is my initial PostgreSQL table; of course, this is just a snippet of it to give the idea:
Initial Table
Desired results (intermediate table and final one)
And below is the PostgreSQL code that gives the result I want, but it seems very costly:
SELECT times,
first_value(price_yami_egg) OVER (PARTITION BY partition_price_yami_egg order by times) as price_yami_egg,
first_value(price_yami_salt) OVER (PARTITION BY partition_price_yami_salt order by times) as price_yami_salt,
first_value(price_dobl_egg) OVER (PARTITION BY partition_price_dobl_egg order by times) as price_dobl_egg,
first_value(price_dobl_salt) OVER (PARTITION BY partition_price_dobl_salt order by times) as price_dobl_salt
FROM(
SELECT times,
min(price_yami_egg) as price_yami_egg,
sum(case when min(price_yami_egg) is not null then 1 end) over (order by times) as partition_price_yami_egg,
min(price_yami_salt) as price_yami_salt,
sum(case when min(price_yami_salt) is not null then 1 end) over (order by times) as partition_price_yami_salt,
min(price_dobl_egg) as price_dobl_egg,
sum(case when min(price_dobl_egg) is not null then 1 end) over (order by times) as partition_price_dobl_egg,
min(price_dobl_salt) as price_dobl_salt,
sum(case when min(price_dobl_salt) is not null then 1 end) over (order by times) as partition_price_dobl_salt
FROM (
SELECT "time" AS times,
CASE WHEN shop_name::text = 'yami'::text AND product_name::text = 'egg'::text THEN price END AS price_yami_egg,
CASE WHEN shop_name::text = 'yami'::text AND product_name::text = 'salt'::text THEN price END AS price_yami_salt,
CASE WHEN shop_name::text = 'dobl'::text AND product_name::text = 'egg'::text THEN price END AS price_dobl_egg,
CASE WHEN shop_name::text = 'dobl'::text AND product_name::text = 'salt'::text THEN price END AS price_dobl_salt
FROM shop sh
) S
GROUP BY times
ORDER BY times) SS
Do you just want aggregation?
select time,
min(price) filter (where shop_name = 'Yami' and product_name = 'EGG'),
min(price) filter (where shop_name = 'Yami' and product_name = 'SALT'),
min(price) filter (where shop_name = 'Dobl' and product_name = 'EGG'),
min(price) filter (where shop_name = 'Dobl' and product_name = 'SALT')
from shop s
group by time;
If your concern is NULL values in the result, then you can fill them in. This is a little tricky, but the idea is:
with t as (
select time,
min(price) filter (where shop_name = 'Yami' and product_name = 'EGG') as yami_egg,
min(price) filter (where shop_name = 'Yami' and product_name = 'SALT') as yami_salt,
min(price) filter (where shop_name = 'Dobl' and product_name = 'EGG') as dobl_egg,
min(price) filter (where shop_name = 'Dobl' and product_name = 'SALT') as dobl_salt
from shop s
group by time
)
select s.*,
max(yami_egg) over (partition by yami_egg_grp) as imputed_yami_egg,
max(yami_salt) over (partition by yami_salt_grp) as imputed_yami_salt,
max(dobl_egg) over (partition by dobl_egg_grp) as imputed_dobl_egg,
max(dobl_salt) over (partition by dobl_salt_grp) as imputed_dobl_salt
from (select t.*,
count(yami_egg) over (order by time) as yami_egg_grp,
count(yami_salt) over (order by time) as yami_salt_grp,
count(dobl_egg) over (order by time) as dobl_egg_grp,
count(dobl_salt) over (order by time) as dobl_salt_grp
from t
) s
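The grouping trick is easier to follow on a single column. The sketch below uses a hypothetical inline table with columns `ts` and `price` (not the real `shop` table): a running `count(price)` only increases on non-NULL rows, so every NULL row lands in the same group as the last known price, and `max()` over that group carries the value forward.

```sql
-- Hypothetical sample data: one nullable price column sampled over time.
SELECT ts,
       price,
       grp,
       max(price) OVER (PARTITION BY grp) AS imputed_price
FROM (
    SELECT ts,
           price,
           -- count() ignores NULLs, so grp stays flat across gaps
           count(price) OVER (ORDER BY ts) AS grp
    FROM (VALUES (1, 10), (2, NULL), (3, NULL), (4, 12), (5, NULL)) AS t(ts, price)
) AS g
ORDER BY ts;
-- grp:           1, 1, 1, 2, 2
-- imputed_price: 10, 10, 10, 12, 12
```

Rows that come before the first non-NULL price fall into group 0 and stay NULL, since there is nothing to carry forward yet.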

How to get the difference between (multiple) two different rows?

I have a set of data containing some fields: month, customer_id, row_num (RANK), and verified_date.
The rank field indicates the first (1) and second (2) purchase of each customer. I would like to know the time difference between the first and second purchase for each customer, and show only the month of the first purchase (the month where row_num = 1).
https://i.ibb.co/PjJk5Y0/Capture.png
So my expected result is like below image:
https://i.ibb.co/y5Mww7k/Capture-2.png
I'm using Standard SQL in Google BigQuery.
row_num, verified_date
from table
GROUP BY 1, 2
We can try using a pivot query here, aggregating by the customer_id:
SELECT
MAX(CASE WHEN row_num = 1 THEN month END) AS month,
customer_id,
1 AS row_num,
DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
customer_id;

SQL - Rank monthly dataset high, medium, low

I have a table which includes the month, accountID and a set of application scores. I want to create a new column which either gives a 'high', 'medium' or 'low' for the top, middle and bottom 33% of the results each month.
If I use rank() I can order the application scores for a single month or for the whole dataset, but I'm unsure how to order them per month. Also, on my version of SQL Server, percent_rank() does not work.
select
AccountID
, ApplicationScore
, rank() over (order by applicationscore asc) as Rank
from Table
I then know I need to put the rank() statement in a subquery and then use a case statement to apply the 'high', 'medium' or 'low'.
select
AccountID
, case when rank <= total/3 then 'low'
when rank > total/3 and rank <= (total/3)*2 then 'medium'
when rank > (total/3)*2 then 'high' end ApplicationScore
from (subquery) a
Ntile(3) worked very well
select
AccountID
, Monthstart
, ApplicationScore
, ntile(3) over (partition by monthstart order by applicationscore) Rank
from table
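The NTILE(3) bucket numbers can then be mapped onto the labels from the question. This is a minimal sketch on made-up rows; the inline table `s`, its values, and the `ScoreBand` alias are hypothetical.

```sql
-- Hypothetical sample data: six accounts in one month, three in the next.
SELECT Monthstart,
       AccountID,
       ApplicationScore,
       CASE NTILE(3) OVER (PARTITION BY Monthstart ORDER BY ApplicationScore)
            WHEN 1 THEN 'low'
            WHEN 2 THEN 'medium'
            ELSE 'high'
       END AS ScoreBand
FROM (VALUES
        ('2019-01', 101, 310),
        ('2019-01', 102, 450),
        ('2019-01', 103, 520),
        ('2019-01', 104, 610),
        ('2019-01', 105, 700),
        ('2019-01', 106, 780),
        ('2019-02', 101, 400),
        ('2019-02', 102, 500),
        ('2019-02', 103, 600)
     ) AS s(Monthstart, AccountID, ApplicationScore)
ORDER BY Monthstart, ApplicationScore;
-- 2019-01: low, low, medium, medium, high, high
-- 2019-02: low, medium, high
```

When a month's row count is not divisible by three, NTILE puts the extra rows into the lower-numbered buckets, so 'low' may end up slightly larger than 'high'.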
SQL Server may have something built in to handle your problem. But we can easily use a ratio of counts to find the three segments of your scores, for each month. The ratio we can use is the count, partitioned by month and ordered by score, divided by the count for the entire month.
WITH cte AS (
SELECT *,
1.0 * COUNT(*) OVER (PARTITION BY Month ORDER BY ApplicationScore) /
COUNT(*) OVER (PARTITION BY Month) AS cnt
FROM yourTable
)
SELECT
AccountID,
Month,
ApplicationScore,
CASE WHEN cnt < 0.34 THEN 'low'
WHEN cnt < 0.67 THEN 'medium'
ELSE 'high' END AS rank
FROM cte
ORDER BY
Month,
ApplicationScore DESC;

How can I add cumulative sum column?

I use SQL Server Express.
The following query produces the attached result.
SELECT ReceiptId, Date, Amount, Fine, [Transaction]
FROM (
SELECT ReceiptId, Date, Amount, 'DR' AS [Transaction]
FROM ReceiptCRDR
WHERE (Amount > 0)
UNION ALL
SELECT ReceiptId, Date, Amount, 'CR' AS [Transaction]
FROM ReceiptCR
WHERE (Amount > 0)
UNION ALL
SELECT strInvoiceNo AS ReceiptId, CONVERT(datetime, dtInvoiceDt, 103) AS Date, floatTotal AS Amount, 'DR' AS [Transaction]
FROM tblSellDetails
) AS t
ORDER BY Date
Result
I want a new column that shows the balance amount.
For example, the 1st row should show -2500, the 2nd -3900, the 3rd -700, and so on.
Basically, it needs the previous row's value and a calculation based on the transaction type.
Sample Result
Well, that looks like SQL Server. If you are using 2012 or later, use SUM() OVER():
SELECT t.*,
SUM(CASE WHEN t.[Transaction] = 'DR'
THEN t.Amount * -1
ELSE t.Amount END)
OVER(PARTITION BY t.Date ORDER BY t.ReceiptId, t.[Transaction] DESC) AS Cumulative_Col
FROM (YourQuery Here) t
This sums the value as-is when it is CR and the value * -1 when it is DR.
Right now I partitioned by date, meaning the column restarts each day. If you want it over all time, replace the OVER() with this:
OVER(ORDER BY t.Date, t.ReceiptId, t.[Transaction] DESC) AS Cumulative_Col
Also, I didn't understand why, on the same date and for the same ReceiptId, DR is calculated before CR. I've added it to the ORDER BY, but if that's not what you want, please explain the logic better.
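As a rough check of the all-time variant, here is a sketch on three made-up rows chosen to match the balances mentioned in the question (-2500, -3900, -700). The inline table and the column names `TxnDate` and `TxnType` are hypothetical stand-ins for the real query's columns.

```sql
-- Hypothetical sample rows: two debits followed by one credit.
SELECT ReceiptId,
       TxnDate,
       Amount,
       TxnType,
       SUM(CASE WHEN TxnType = 'DR' THEN -Amount ELSE Amount END)
           OVER (ORDER BY TxnDate, ReceiptId
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Balance
FROM (VALUES
        (1, '2016-01-01', 2500, 'DR'),
        (2, '2016-01-02', 1400, 'DR'),
        (3, '2016-01-03', 3200, 'CR')
     ) AS t(ReceiptId, TxnDate, Amount, TxnType)
ORDER BY TxnDate, ReceiptId;
-- Balance: -2500, -3900, -700
```

The explicit ROWS frame makes each row add exactly one transaction, even when several rows share the same date.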

Sum Until Value Reached - Teradata

In Teradata, I need a query to first identify all members in the MEM TABLE that currently have a negative balance, let's call that CUR_BAL. Then, for all of those members only, sum all transactions from the TRAN TABLE in order by date until the sum of those transactions is equal to the CUR_BAL.
Editing to add a third ADJ table that contains MEM_NBR, ADJ_DT and ADJ_AMT that need to be included in the running total in order to capture all of the records.
I would like the outcome to include the MEM.MEM_NBR, MEM.CUR_BAL, TRAN.TRAN_DATE OR ADJ.ADJ_DT (date associated with the transaction that resulted in the running total to equal CUR_BAL), MEM.LST_UPD_DT. I don't need to know if the balance is negative as a result of a transaction or adjustment, just the date that it went negative.
Thank you!
select
mem_nbr,
cur_bal,
tran_date,
tran_type
from (
select
a.mem_nbr,
a.cur_bal,
b.tran_date,
b.tran_type,
a.lst_upd_dt,
sum(b.tran_amt) over (partition by b.mem_nbr order by b.tran_date rows between unbounded preceding and current row) as cumulative_bal
from mem a
inner join (
select
mem_nbr,
tran_date,
tran_amt,
'Tran' as tran_type
from tran
union all
select
mem_nbr,
adj_dt,
adj_amt,
'Adj' as tran_type
from adj
) b
on a.mem_nbr = b.mem_nbr
where a.cur_bal < 0
qualify cumulative_bal < 0
) z
qualify rank() over (partition by mem_nbr order by tran_date) = 1
The subquery picks up all instances where the cumulative balance is negative, then the outer query picks up the earliest instance of it. If you want the latest, add desc after tran_date in the final qualify line.
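QUALIFY is a Teradata extension. For readers on an engine without it, the same "first date the running total goes negative" filter can be sketched with nested derived tables. This swaps in ROW_NUMBER() plus WHERE for the two QUALIFY clauses, and the inline rows and aliases are hypothetical.

```sql
-- Hypothetical sample rows: member 1 goes negative on the second transaction,
-- member 2 never does.
SELECT mem_nbr, tran_date, cumulative_bal
FROM (
    SELECT mem_nbr, tran_date, cumulative_bal,
           ROW_NUMBER() OVER (PARTITION BY mem_nbr ORDER BY tran_date) AS rn
    FROM (
        SELECT mem_nbr, tran_date,
               SUM(tran_amt) OVER (PARTITION BY mem_nbr ORDER BY tran_date
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_bal
        FROM (VALUES
                (1, '2020-01-01',  50),
                (1, '2020-01-05', -80),
                (1, '2020-01-09', -10),
                (2, '2020-01-02',  20)
             ) AS b(mem_nbr, tran_date, tran_amt)
    ) running
    WHERE cumulative_bal < 0      -- stands in for: qualify cumulative_bal < 0
) firsts
WHERE rn = 1;                     -- stands in for: qualify rank() ... = 1
-- returns one row: mem_nbr 1, tran_date 2020-01-05, cumulative_bal -30
```

The window sum has to be computed in an inner query first, because a plain WHERE cannot reference a window function directly.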