I have the following table:
/**
| NAME | DELTA (PAID - EXPECTED) | PERIOD |
|-------|-------------------------|--------|
| SMITH | -50| 1|
| SMITH | 0| 2|
| SMITH | 150| 3|
| SMITH | -200| 4|
| DOE | 300| 1|
| DOE | 0| 2|
| DOE | -200| 3|
| DOE | -200| 4|
**/
DROP TABLE delete_me;
CREATE TABLE delete_me (
"NAME" varchar(255),
"DELTA (PAID - EXPECTED)" numeric(15,2),
"PERIOD" Int
);
INSERT INTO delete_me("NAME", "DELTA (PAID - EXPECTED)", "PERIOD")
VALUES
('SMITH', -50, 1),
('SMITH', 0, 2),
('SMITH', 150, 3),
('SMITH', -200, 4),
('DOE', 300, 1),
('DOE', 0, 2),
('DOE', -200, 3),
('DOE', -200, 4);
Here PERIOD represents time, with 1 being the newest period and 4 the oldest. In each period the person was charged an amount, and they could pay that amount or more. A negative delta means they still owe money for that period. A positive delta means they paid more than the expected amount and have a credit for that period that can be applied to other periods. If there is a credit, we want to pay off the oldest period first. I want to find how much unpaid debt remains for each period.
So in the example above I'd want to see:
| NAME | DELTA (PAID - EXPECTED) | PERIOD | PERIOD BALANCE |
|-------|-------------------------|--------|----------------|
| SMITH | -50| 1| -50|
| SMITH | 0| 2| 0|
| SMITH | 150| 3| 0|
| SMITH | -200| 4| -50|
| DOE | 300| 1| 0|
| DOE | 0| 2| 0|
| DOE | -200| 3| -100|
| DOE | -200| 4| 0|
How can I use PostgreSQL to show the unpaid debt within periods?
Additional description: for DOE, 200 was initially owed in the oldest period (4). In the next period (3) they owed the original 200 plus another 200 (400 owed in total). In period 2 the monthly charge was paid, but not the past balances. In the most recent period (1), 300 over the monthly amount was paid: 200 of this was applied to the oldest debt in period 4, paying it off, and the remaining 100 was applied to period 3's debt, leaving 100 still owed.
For the Smith family, they initially underpaid by 200 in period 4. In the next period they overpaid by 150, which was applied to the oldest debt of 200, leaving 50 still to be paid. In period 2 the monthly bill was paid exactly, so they still owed the 50 from period 4. Then in period 1 they underpaid by 50. They owe 100 in total: 50 for period 1 and 50 for period 4.
According to what I understood, you want to distribute the sum of positive DELTA values (credit) among the negative DELTA values starting from the oldest period.
-- Assumes the table's columns are named name_, delta and period_
-- rather than the quoted names used in the question.
with cte as (
    select name_, delta, period_,
           -- running total of debts, oldest period first ...
           sum(case when delta < 0 then delta else 0 end)
               over (partition by name_ order by period_ desc)
           -- ... plus the total credit for that name
           + sum(case when delta > 0 then delta else 0 end)
               over (partition by name_) as positive_credit_to_negativ_delta
    from delete_me
)
select name_, delta, period_, positive_credit_to_negativ_delta,
       case
           when delta >= 0 then 0
           else
               case
                   when positive_credit_to_negativ_delta >= 0 then 0
                   else greatest(delta, positive_credit_to_negativ_delta)
               end
       end as period_balance
from cte
order by name_, period_
See a demo from db-fiddle.
The idea in this query is to find the sum of all positive DELTA values for each user, then add that sum to the cumulative sum of the negative values starting from the oldest period. The result of this addition is stored in positive_credit_to_negativ_delta in the query.
For DELTA values >= 0, the result is 0, since there is no debt for that period.
For negative DELTA values:
If positive_credit_to_negativ_delta is >= 0, the result is 0, meaning the period's delta is fully covered by the positive credit.
If positive_credit_to_negativ_delta is < 0, the result is the greater of positive_credit_to_negativ_delta and DELTA.
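To make the rules above easier to check by hand, here is a minimal Python sketch of the same logic, run on the sample rows from the question. The function and variable names (period_balances, running_debt, credit) are made up for illustration and do not come from the answer.

```python
from collections import defaultdict

# Sample rows from the question: (name, delta, period); period 4 is the oldest.
ROWS = [
    ("SMITH", -50, 1), ("SMITH", 0, 2), ("SMITH", 150, 3), ("SMITH", -200, 4),
    ("DOE", 300, 1), ("DOE", 0, 2), ("DOE", -200, 3), ("DOE", -200, 4),
]


def period_balances(rows):
    """Return {(name, period): balance}, mirroring the CASE logic of the query."""
    # Total credit per name: the SUM(...) OVER (PARTITION BY name) term.
    credit = defaultdict(int)
    for name, delta, _ in rows:
        if delta > 0:
            credit[name] += delta

    balances = {}
    running_debt = defaultdict(int)  # cumulative negative deltas, oldest first
    # Visit each name's rows from the oldest period (highest number) to the newest.
    for name, delta, period in sorted(rows, key=lambda r: (r[0], -r[2])):
        if delta < 0:
            running_debt[name] += delta
        remaining = running_debt[name] + credit[name]
        if delta >= 0 or remaining >= 0:
            balances[(name, period)] = 0
        else:
            balances[(name, period)] = max(delta, remaining)
    return balances


balances = period_balances(ROWS)
```

Note that, like the query, this applies the total credit to the debts regardless of when the credit was paid.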
The query below uses a recursive CTE with several JSON structures. The JSON structures allow accurate tracking of the intervals in which the balance goes from negative to positive, including cases with more than one positive balance after a negative one:
with recursive cte(n, l, p, js, js1) as (
select d1.name, d5.delta, d1.m, jsonb_build_object(d1.m, d5.delta), jsonb_build_array(d1.m)
from (select d.name, max(d.period) m from delete_me d where d.delta < 0 group by d.name) d1
join delete_me d5 on d5.name = d1.name and d5.period = d1.m
union all
select c.n, d.delta, d.period,
case when d.delta < 0 then c.js||jsonb_build_object(d.period, d.delta)
when d.delta = 0 then c.js
else (select jsonb_object_agg(k.v3, least((c.js -> k.v3::text)::int +
greatest(d.delta + coalesce((select sum((c.js -> v2.value::text)::int)
from jsonb_array_elements(c.js1) v2 where v2.value::int > k.v3),0),0),0))
from (select v.value::int v3 from jsonb_array_elements(c.js1) v
order by v.value::int desc) k)||jsonb_build_object(d.period, greatest(d.delta + (select sum((c.js -> v2.value::text)::int)
from jsonb_array_elements(c.js1) v2),0)) end,
case when d.delta < 0 then (case when c.l <= 0 then c.js1 else '[]'::jsonb end) || ('['||d.period||']')::jsonb
else c.js1 end
from cte c join delete_me d on d.period = c.p - 1 and d.name = c.n
)
select d.*, coalesce((c.js -> d.period::text)::int, 0) from delete_me d
join cte c on c.n = d.name where c.p = 1
order by d.name desc, d.period asc
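For comparison, the process described in the question (walk through the periods in time order and apply each credit to the oldest open debt first) can also be simulated directly. The Python sketch below is not a translation of the recursive query above. It assumes that a credit only pays debts from older periods and that any credit left over after clearing them is dropped rather than carried forward; the name fifo_balances is hypothetical.

```python
from collections import defaultdict, deque

# Sample rows from the question: (name, delta, period); period 4 is the oldest.
ROWS = [
    ("SMITH", -50, 1), ("SMITH", 0, 2), ("SMITH", 150, 3), ("SMITH", -200, 4),
    ("DOE", 300, 1), ("DOE", 0, 2), ("DOE", -200, 3), ("DOE", -200, 4),
]


def fifo_balances(rows):
    """Return {(name, period): balance} after paying the oldest debts first."""
    by_name = defaultdict(list)
    for name, delta, period in rows:
        by_name[name].append((period, delta))

    balances = {}
    for name, entries in by_name.items():
        open_debts = deque()  # [period, amount_still_owed], oldest first
        # reverse=True puts the highest period number (the oldest) first.
        for period, delta in sorted(entries, reverse=True):
            balances[(name, period)] = 0
            if delta < 0:
                open_debts.append([period, -delta])
                continue
            credit = delta
            while credit > 0 and open_debts:
                debt = open_debts[0]
                paid = min(credit, debt[1])
                debt[1] -= paid
                credit -= paid
                if debt[1] == 0:
                    open_debts.popleft()
            # Assumption: credit still left here is dropped, not carried forward.
        for period, owed in open_debts:
            balances[(name, period)] = -owed
    return balances


fifo = fifo_balances(ROWS)
```

On the sample data this gives the same balances as the expected output in the question.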
Related
Hi, my company wants to better track how many users are active on our platform. We are using Microsoft SQL Server 2019 as the database, connected to Azure Data Studio.
Below are the DDLs of two tables from our DB:
CALENDAR TABLE
| COLUMN | DATA TYPE | DETAILS |
|-----------------------|------------------|----------------------------|
| CALENDAR_DATE | DATE NOT NULL | Base date (YYYY-MM-DD) |
| CALENDAR_YEAR | INTEGER NOT NULL | 2010, 2011 etc |
| CALENDAR_MONTH_NUMBER | INTEGER NOT NULL | 1-12 |
| CALENDAR_MONTH_NAME | VARCHAR(100) | January, February etc |
| CALENDAR_DAY_OF_MONTH | INTEGER NOT NULL | 1-31 |
| CALENDAR_DAY_OF_WEEK | INTEGER NOT NULL | 1-7 |
| CALENDAR_DAY_NAME | INTEGER NOT NULL | Monday, Tuesday etc |
| CALENDAR_YEAR_MONTH | INTEGER NOT NULL | 201011, 201012, 201101 etc |
REVENUE ANALYSIS
| COLUMN | DATA TYPE | DETAILS |
|---------------------|-----------------------------|----------------------------------|
| ACTIVITY_DATE | DATE NOT NULL | Date wager was made |
| MEMBER_ID | INTEGER NOT NULL | Unique player identifier |
| GAME_ID | SMALLINT NOT NULL | Unique game identifier |
| WAGER_AMOUNT | REAL NOT NULL | Total amount wagered on the game |
| NUMBER_OF_WAGERS | INTEGER NOT NULL | Number of wagers on the game |
| WIN_AMOUNT | REAL NOT NULL | Total amount won on the game |
| ACTIVITY_YEAR_MONTH | INTEGER NOT NULL | YYYYMM |
| BANK_TYPE_ID | SMALLINT DEFAULT 0 NOT NULL | 0=Real money, 1=Bonus money |
Long story short, "active" means that the member has made at least one real money wager in the month.
Every month a member has a certain lifecycle type. This status changes on a monthly basis, based on their activity in the previous and current months. The statuses are the following:
NEW: First time they placed a real money wager
RETAINED: Active in the prior calendar month and the current calendar month
UNRETAINED: Active in the prior calendar month but not active in the current calendar month
REACTIVATED: Not active in the prior calendar month, but active in the current calendar month
LAPSED: Not active in the prior calendar month or the current calendar month
We would like initially to get to a view with the columns below:
MEMBER_ID | CALENDAR_YEAR_MONTH | MEMBER_LIFECYCLE_STATUS | LAPSED_MONTHS
Also the view should display one row per member per month, starting from the month in which they first placed a real money wager. This view should give their lifecycle status for that month, and if the member has lapsed, it should show a rolling count of the number of months since they were last active.
So far I have come up with the following CTE to give me a basis for the view. However I am not sure about the UNRETAINED and REACTIVATED columns. Any ideas anyone?
with all_activities as (
select a.member_id, activity_date, calendar_month_number as month_activity, calendar_year as year_activity,
datepart(month,CURRENT_TIMESTAMP) as current_month, datepart(year,CURRENT_TIMESTAMP) as current_year,
datepart(month,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE()))) as previous_month, datepart(year,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE()))) as year_last_month,
a.NUMBER_OF_WAGERS, (case when datepart(month,CURRENT_TIMESTAMP) = calendar_month_number and datepart(year,CURRENT_TIMESTAMP) = calendar_year then 'active' else 'inactive' end) as status,
case when (case when datepart(month,CURRENT_TIMESTAMP) = calendar_month_number and datepart(year,CURRENT_TIMESTAMP) = calendar_year then 'active' else 'inactive' end) = 'active' and number_of_wagers = 1 then 'New'
when (LAG((case when datepart(month,CURRENT_TIMESTAMP) = calendar_month_number and datepart(year,CURRENT_TIMESTAMP) = calendar_year then 'active' else 'inactive' end) ,1,0) OVER(PARTITION BY member_id ORDER BY calendar_month_number desc) = 'active' and calendar_month_number = datepart(month,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE())))) then 'Retained'
when (calendar_month_number = datepart(month,CURRENT_TIMESTAMP) and year_activity = datepart(year,CURRENT_TIMESTAMP) and calendar_month_number = datepart(month,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE())))) then 'Unretained'
from [dbo].[REVENUE_ANALYSIS] a
join CALENDAR b on a.ACTIVITY_DATE= b.CALENDAR_DATE
)
select * from all_activities
This is a customer lifecycle status analysis, which requires a couple of things:
Customer acquisition date. It would be nice to have this stored, because some customers may go back years or even decades. For this question, we assume revenue_analysis has everything we need to calculate the user acquisition month.
Lapsed vs. churned. A churned customer is usually defined as one with no activity for a certain period of time. Since the question doesn't give that definition, a user will be reported as lapsed indefinitely.
For the lifecycle status calculation, we gather (member_id, calendar_month, acquisition_month, activity_month, prior_activity_month) for each member and month, and derive the final result from those.
with cte_new_user_monthly as (
select member_id,
min(activity_year_month) as acquisition_month
from revenue_analysis
group by 1),
cte_user_monthly as (
select u.member_id,
u.acquisition_month,
m.yyyymm as calendar_month
from cte_new_user_monthly u,
calendar_month m
where u.acquisition_month <= m.yyyymm),
cte_user_activity_monthly as (
select f.member_id,
f.activity_year_month as activity_month
from revenue_analysis f
group by 1,2),
cte_user_lifecycle as (
select u.member_id,
u.calendar_month,
u.acquisition_month,
m.activity_month
from cte_user_monthly u
left
join cte_user_activity_monthly m
on u.member_id = m.member_id
and u.calendar_month = m.activity_month),
cte_user_status as (
select member_id,
calendar_month,
acquisition_month,
activity_month,
lag(activity_month,1) over (partition by member_id order by calendar_month) as prior_activity_month
from cte_user_lifecycle),
user_status_monthly as (
select member_id,
calendar_month,
activity_month,
case
when calendar_month = acquisition_month then 'NEW'
when prior_activity_month is not null and activity_month is not null then 'RETAINED'
when prior_activity_month is not null and activity_month is null then 'UNRETAINED'
when prior_activity_month is null and activity_month is not null then 'REACTIVATED'
when prior_activity_month is null and activity_month is null then 'LAPSED'
else null
end as user_status
from cte_user_status)
select member_id,
calendar_month,
activity_month,
user_status,
row_number() over (partition by member_id, user_status order by calendar_month) as months
from user_status_monthly
order by 1,2;
Result (include activity_month for easy understanding):
member_id|calendar_month|activity_month|user_status|months|
---------+--------------+--------------+-----------+------+
1001| 201701| 201701|NEW | 1|
1001| 201702| |UNRETAINED | 1|
1001| 201703| |LAPSED | 1|
1001| 201704| |LAPSED | 2|
1001| 201705| 201705|REACTIVATED| 1|
1001| 201706| 201706|RETAINED | 1|
1001| 201707| |UNRETAINED | 2|
1001| 201708| |LAPSED | 3|
1001| 201709| 201709|REACTIVATED| 2|
1001| 201710| |UNRETAINED | 3|
1001| 201711| |LAPSED | 4|
1001| 201712| 201712|REACTIVATED| 3|
1002| 201703| 201703|NEW | 1|
1002| 201704| |UNRETAINED | 1|
1002| 201705| |LAPSED | 1|
1002| 201706| |LAPSED | 2|
1002| 201707| |LAPSED | 3|
1002| 201708| |LAPSED | 4|
1002| 201709| |LAPSED | 5|
1002| 201710| |LAPSED | 6|
1002| 201711| |LAPSED | 7|
1002| 201712| |LAPSED | 8|
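As a cross-check of the status rules, here is a small Python sketch that classifies one member's months using the same NEW / RETAINED / UNRETAINED / REACTIVATED / LAPSED definitions. The names lifecycle and next_month are hypothetical. One assumption differs from the query above: the counter here is the number of consecutive inactive months (it includes the UNRETAINED month and resets to 0 whenever the member is active), which is one possible reading of "months since they were last active".

```python
def next_month(yyyymm):
    """Return the YYYYMM integer that follows yyyymm."""
    year, month = divmod(yyyymm, 100)
    return (year + 1) * 100 + 1 if month == 12 else yyyymm + 1


def lifecycle(active_months, last_month):
    """Return [(month, status, inactive_months)] from the first active month on."""
    active = set(active_months)
    first = min(active)
    month = first
    prior_active = False
    inactive = 0  # consecutive months without activity (assumed definition)
    rows = []
    while month <= last_month:
        is_active = month in active
        if month == first:
            status = "NEW"
        elif is_active:
            status = "RETAINED" if prior_active else "REACTIVATED"
        else:
            status = "UNRETAINED" if prior_active else "LAPSED"
        inactive = 0 if is_active else inactive + 1
        rows.append((month, status, inactive))
        prior_active = is_active
        month = next_month(month)
    return rows


# Member 1001 from the result above: active in these months of 2017.
rows_1001 = lifecycle([201701, 201705, 201706, 201709, 201712], 201712)
```

The statuses match the user_status column shown for member 1001; only the month counter is defined differently.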
EDIT:
The code was tested in MySQL because I didn't notice that the 'mysql' tag had been removed.
calendar_month in the code can be derived from the calendar dimension.
I want to query the table so that the running total keeps carrying over to the latest period, as long as the value does not fall to 0.
Assuming I have a table with values as such below:
| Name | Period | Value |
|------|---------|-------|
| A | 02/2022 | 2 |
| A | 03/2022 | 5 |
| A | 04/2022 | 3 |
| A | 05/2022 | 7 |
| B | 02/2022 | 9 |
| B | 04/2022 | 6 |
I want my result to be:
| Name | Period | Value|
| A| 02/2022| 2 |
| A| 03/2022| 7 |
| A| 04/2022| 10|
| A| 05/2022| 17|
| B| 02/2022| 9 |
| B| 03/2022| 9 |
| B| 04/2022| 15|
| B| 05/2022| 15|
My current query is:
SELECT
PERIOD
,NAME
,SUM(SUM(Value)) OVER (PARTITION BY NAME ORDER BY PERIOD) AS balance
FROM
table
WHERE Period < CURRENT_DATE()
GROUP BY
1
,2
This results in the value stopping at the latest period the activity occurred as such:
| Name | Period | Value|
| A | 02/2022| 2 |
| A | 03/2022| 7 |
| A | 04/2022| 10 |
| A | 05/2022| 17 |
| B | 02/2022| 9 |
| B | 04/2022| 15 |
OK, you haven't had an answer in a full day so even though I work in TSQL I'll try a solution that's ANSI SQL compatible. Work with me if my syntax is off a bit.
Before we start, check your "Desired Output", you're currently showing a running total for A on 3/22 of 5, but you had a 2 for A on 2/22 so it should be a running total of 7, right?
Anyway, assuming that's just a typo, I'd approach this by making a few CTEs that build a list of all {PERIOD, NAME} pairs you want reported, then JOIN your actual data to that. There are a number of ways to generate the dates, the easiest is to use DISTINCT if your actual data is fairly robust, but I can describe other methods if that assumption does not hold for your data.
So with all that in mind, here is my solution. I put your sample data in a CTE for portability, just replace my "cteTabA" with whatever your data table is really named
--Code sample data as a CTE for portability
;with cteTabA as (
SELECT *
FROM ( VALUES
('A', '02/2022', '2')
, ('A', '03/2022', '5')
, ('A', '04/2022', '3')
, ('A', '05/2022', '7')
, ('B', '02/2022', '9')
, ('B', '04/2022', '6')
) as TabA(Name, Period, Value)
) --END of sample data, actual query below
--First, build a list of periods to use. If your data set is full, just select DISTINCT
, cteDates as ( --but there are other ways if this doesn't work for you - let me know!
SELECT DISTINCT Period FROM cteTabA
) --Next, build a list of names to report on
, cteNames as (
SELECT DISTINCT Name FROM cteTabA
) --Now build your table that has all periods for all names
, cteRepOn as (
SELECT * FROM cteNames CROSS JOIN cteDates
)--Now assemble a table that has entries for each period for each name,
--but fill in zeroes for those you don't actually have data for
, cteFullList as (
SELECT L.*, COALESCE(D.Value, 0) as Value
FROM cteRepOn as L
LEFT OUTER JOIN cteTabA as D on L.Name = D.Name AND L.Period = D.Period
)--Now your query works as expected with the gaps filled in
SELECT PERIOD, NAME, Value
,SUM(Value) OVER (PARTITION BY NAME ORDER BY PERIOD) AS balance
FROM cteFullList
WHERE Period < '06/2022'--CURRENT_DATE()
ORDER BY NAME, PERIOD
This produces an output as follows
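| PERIOD | NAME | Value | balance |
|---------|------|-------|---------|
| 02/2022 | A | 2 | 2 |
| 03/2022 | A | 5 | 7 |
| 04/2022 | A | 3 | 10 |
| 05/2022 | A | 7 | 17 |
| 02/2022 | B | 9 | 9 |
| 03/2022 | B | 0 | 9 |
| 04/2022 | B | 6 | 15 |
| 05/2022 | B | 0 | 15 |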
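The same gap-filling idea (build every name/period pair, fill missing values with 0, then take a running total) can be sketched in Python as a way to sanity-check the numbers. The function name running_totals is made up for this example, and periods are sorted by (year, month) rather than as plain MM/YYYY strings, which is an addition on my part.

```python
from collections import defaultdict
from itertools import accumulate

# Sample data from the question: (name, period as MM/YYYY, value).
DATA = [
    ("A", "02/2022", 2), ("A", "03/2022", 5), ("A", "04/2022", 3),
    ("A", "05/2022", 7), ("B", "02/2022", 9), ("B", "04/2022", 6),
]


def running_totals(rows):
    """Return [(name, period, balance)] with missing periods filled in as 0."""
    values = defaultdict(int)
    for name, period, value in rows:
        values[(name, period)] += value  # sum duplicates, like SUM(Value)

    names = sorted({name for name, _, _ in rows})
    # Sort MM/YYYY periods by (year, month) instead of as plain strings.
    periods = sorted({p for _, p, _ in rows}, key=lambda p: (p[3:], p[:2]))

    result = []
    for name in names:
        # The "cross join" step: one slot per period for every name.
        filled = [values.get((name, p), 0) for p in periods]
        for period, balance in zip(periods, accumulate(filled)):
            result.append((name, period, balance))
    return result


totals = running_totals(DATA)
```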
I'm encountering a bug in my SQL code that calculates the day-over-day (DoD) count difference. This table (curr_day) summarizes the count of trades on any business day (i.e. excluding weekends and government-mandated holidays) and is joined to a similar table (prev_day) that is lagged by one day (previous day). The join is based on the day's rank; for example, the first day in the curr_day table is Jan-01 and its rank is 1, while the first day (rank 1) in the prev_day table is Dec-31.
My issue is that the trade count does not seem to calculate positive changes (see table below), only negative or no changes. This problem does not affect other fields that calculate the value of a trade, simply the amount of trades on a given day.
Sample of query
with curr_day as (select GROUP, COUNT from DB where DATE is not HOLIDAY),
prev_day as (select rank()over(partition by GROUP order by DATE) as RANK, GROUP, DATE, COUNT
from curr_day where DATE is not HOLIDAY)
select ID, DATE, curr_day.COUNT-prev_day.COUNT
from (select rank()over(partition by curr_day.GROUP order by curr_day.DATE) as RANK
from curr_day
where curr_day.DATE >= (select min(curr_day.DATE)+1) from curr_day)
left join prev_day on curr_day.RANK = prev_day.RANK and curr_day.GROUP = prev_day.GROUP)
;
Output table
Date | Group | Count | DoD_Cnt_Diff
2020-12-31 | A | 1 | 0
2021-01-01 | A | 1 | 0
2021-01-02 | A | 0 | -1
2021-01-03 | A | 1 | (null)
2021-01-04 | A | 0 | -1
2021-01-05 | A | 0 | 0
2021-12-31 | B | 0 | 0
I have the following Table charges
charges:
|Quantity|Timestamp| Charge|
|--------+---------+-------|
| 1|8/01/2020| Yearly|
| 2|8/01/2020|Monthly|
| 1|8/01/2020|Monthly|
| 2|8/02/2020| Yearly|
| 1|8/02/2020|Monthly|
| 1|8/02/2020|Monthly|
Using the following query gets me the counts by Charge and Date
Query:
select SUM(Quantity), Timestamp, Charge
from charges
group by Timestamp, Charge
Result:
|Sum|Timestamp| Charge|
|---+---------+-------|
| 1|8/01/2020| Yearly|
| 3|8/01/2020|Monthly|
| 2|8/02/2020| Yearly|
| 2|8/02/2020|Monthly|
Is there a way to transpose this to get the following?
Expected:
|Timestamp|Yearly|Monthly|
|---------+------+-------|
|8/01/2020| 1| 3|
|8/02/2020| 2| 2|
Thank you.
Use conditional aggregation:
select
    timestamp,
    sum(case when charge = 'Yearly' then quantity else 0 end) yearly,
    sum(case when charge = 'Monthly' then quantity else 0 end) monthly
from charges
group by timestamp
Depending on your RDBMS, you can use the PIVOT operator.
SELECT *
FROM (SELECT QUANTITY, TIMESTAMP, CHARGE
FROM CHARGES C) C PIVOT (SUM (QUANTITY)
FOR CHARGE
IN ('Yearly', 'Monthly'))
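Both answers compute the same thing: one sum of Quantity per charge type for each timestamp. As a rough illustration, the Python sketch below does that on the sample rows from the question. The name pivot_charges is hypothetical, and only the two charge types in the sample are handled.

```python
from collections import defaultdict

# Sample rows from the question: (quantity, timestamp, charge).
CHARGES = [
    (1, "8/01/2020", "Yearly"), (2, "8/01/2020", "Monthly"),
    (1, "8/01/2020", "Monthly"), (2, "8/02/2020", "Yearly"),
    (1, "8/02/2020", "Monthly"), (1, "8/02/2020", "Monthly"),
]


def pivot_charges(rows):
    """Return {timestamp: {"Yearly": total, "Monthly": total}}."""
    table = defaultdict(lambda: {"Yearly": 0, "Monthly": 0})
    for quantity, timestamp, charge in rows:
        # Like the CASE expressions: only the known charge types are summed.
        if charge in table[timestamp]:
            table[timestamp][charge] += quantity
    return dict(table)


pivoted = pivot_charges(CHARGES)
```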
I'm stuck on the problem of finding daily profits from a DB (MS Access) table. The difference from other tips I found online is that my table doesn't have separate "Price" and "Cost" fields, but a "Type" field which distinguishes whether a row is a revenue ("S") or a cost ("C").
This is the table "Record":
| Date | Price | Quantity | Type |
-----------------------------------
|01/02 | 20 | 2 | C |
|01/02 | 10 | 1 | S |
|01/02 | 3 | 10 | S |
|01/02 | 5 | 2 | C |
|03/04 | 12 | 3 | C |
|03/03 | 200 | 1 | S |
|03/03 | 120 | 2 | C |
So far I tried different solutions like:
SELECT
(SELECT SUM (RS.Price* RS.Quantity)
FROM Record RS WHERE RS.Type='S' GROUP BY RS.Data
) as totalSales,
(SELECT SUM (RC.Price*RC.Quantity)
FROM Record RC WHERE RC.Type='C' GROUP BY RC.Date
) as totalLosses,
ROUND(totalSales-totaleLosses,2) as NetTotal,
R.Date
FROM RECORD R";
in my mind it could work but obviously it doesn't
and
SELECT RC.Data, ROUND(SUM (RC.Price*RC.QuantitY),2) as DailyLoss
INTO #DailyLosses
FROM Record RC
WHERE RC.Type='C' GROUP BY RC.Date
SELECT RS.Date, ROUND(SUM (RS.Price*RS.Quantity),2) as DailyRevenue
INTO #DailyRevenues
FROM Record RS
WHERE RS.Type='S'GROUP BY RS.Date
SELECT Date, DailyRevenue - DailyLoss as DailyProfit
FROM #DailyLosses dlos, #DailyRevenues drev
WHERE dlos.Date = drev.Date";
My problem beyond the correct syntax is the approach to this kind of problem
You can use grouping and conditional summing. Try this:
SELECT data.Date, data.Income - data.Cost AS Profit
FROM (
    SELECT Record.Date AS Date,
           SUM(IIF(Record.Type = 'S', Record.Price * Record.Quantity, 0)) AS Income,
           SUM(IIF(Record.Type = 'C', Record.Price * Record.Quantity, 0)) AS Cost
    FROM Record
    GROUP BY Record.Date
) data
In this case you first create a sub-query to get separate fields for Income and Cost, and then your outer query uses subtraction to get actual profit.
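To check the arithmetic on the sample table, the grouping and conditional summing can be sketched in a few lines of Python. The name daily_profit is made up for illustration, and the sketch assumes every row's Type is either "S" or "C", as in the question.

```python
from collections import defaultdict

# Sample rows from the "Record" table: (date, price, quantity, type).
RECORDS = [
    ("01/02", 20, 2, "C"), ("01/02", 10, 1, "S"), ("01/02", 3, 10, "S"),
    ("01/02", 5, 2, "C"), ("03/04", 12, 3, "C"), ("03/03", 200, 1, "S"),
    ("03/03", 120, 2, "C"),
]


def daily_profit(records):
    """Return {date: revenue minus cost}."""
    profit = defaultdict(int)
    for date, price, quantity, kind in records:
        amount = price * quantity
        # Sales ("S") add to the day's profit, costs ("C") subtract from it.
        profit[date] += amount if kind == "S" else -amount
    return dict(profit)


profits = daily_profit(RECORDS)
```

For 01/02, for instance, income is 10*1 + 3*10 = 40 and cost is 20*2 + 5*2 = 50, so the profit is -10.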