SQL Group by Transpose - sql

I have the following Table charges
charges:
|Quantity|Timestamp| Charge|
|--------+---------+-------|
| 1|8/01/2020| Yearly|
| 2|8/01/2020|Monthly|
| 1|8/01/2020|Monthly|
| 2|8/02/2020| Yearly|
| 1|8/02/2020|Monthly|
| 1|8/02/2020|Monthly|
Using the following query gets me the sums by Charge and Date
Query:
select SUM(Quantity), Timestamp, Charge
from charges
group by Timestamp, Charge
Result:
|Sum|Timestamp| Charge|
|---+---------+-------|
| 1|8/01/2020| Yearly|
| 3|8/01/2020|Monthly|
| 2|8/02/2020| Yearly|
| 2|8/02/2020|Monthly|
Is there a way to transpose this to get the following?
Expected:
|Timestamp|Yearly|Monthly|
|---------+------+-------|
|8/01/2020| 1| 3|
|8/02/2020| 2| 2|
Thank you.

Use conditional aggregation:
select
timestamp,
sum(case when charge = 'Yearly' then quantity else 0 end) yearly,
sum(case when charge = 'Monthly' then quantity else 0 end) monthly
from charges
group by timestamp
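As a quick check of the conditional aggregation above, here is a minimal sketch that runs the same query shape against the sample rows from the question. It assumes an in-memory SQLite database as a stand-in for whatever RDBMS is actually in use; the `CASE` expression itself is standard SQL.

```python
import sqlite3

# In-memory SQLite database standing in for the real RDBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (Quantity INTEGER, Timestamp TEXT, Charge TEXT)")
conn.executemany(
    "INSERT INTO charges VALUES (?, ?, ?)",
    [
        (1, "8/01/2020", "Yearly"),
        (2, "8/01/2020", "Monthly"),
        (1, "8/01/2020", "Monthly"),
        (2, "8/02/2020", "Yearly"),
        (1, "8/02/2020", "Monthly"),
        (1, "8/02/2020", "Monthly"),
    ],
)

# One output column per Charge value: sum Quantity only where the row matches.
rows = conn.execute(
    """
    SELECT Timestamp,
           SUM(CASE WHEN Charge = 'Yearly'  THEN Quantity ELSE 0 END) AS yearly,
           SUM(CASE WHEN Charge = 'Monthly' THEN Quantity ELSE 0 END) AS monthly
    FROM charges
    GROUP BY Timestamp
    ORDER BY Timestamp
    """
).fetchall()

for row in rows:
    print(row)
# ('8/01/2020', 1, 3)
# ('8/02/2020', 2, 2)
```

Each new Charge value needs its own `CASE` column, so this approach fits best when the set of values is small and known in advance.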

Depending on your RDBMS you can use PIVOT function.
SELECT *
FROM (SELECT QUANTITY, TIMESTAMP, CHARGE
FROM CHARGES C) C PIVOT (SUM (QUANTITY)
FOR CHARGE
IN ('Yearly', 'Monthly'))

Related

Customer life cycle status analysis based on monthly activity

Hi, my company wants to better track how many users are active on our platform. We are using Microsoft SQL Server 2019 as the database, accessed through Azure Data Studio.
Below are two tables DDLs from our DB:
CALENDAR TABLE
|COLUMN|DATA TYPE|DETAILS|
|------+---------+-------|
|CALENDAR_DATE|DATE NOT NULL|Base date (YYYY-MM-DD)|
|CALENDAR_YEAR|INTEGER NOT NULL|2010, 2011 etc|
|CALENDAR_MONTH_NUMBER|INTEGER NOT NULL|1-12|
|CALENDAR_MONTH_NAME|VARCHAR(100)|January, February etc|
|CALENDAR_DAY_OF_MONTH|INTEGER NOT NULL|1-31|
|CALENDAR_DAY_OF_WEEK|INTEGER NOT NULL|1-7|
|CALENDAR_DAY_NAME|INTEGER NOT NULL|Monday, Tuesday etc|
|CALENDAR_YEAR_MONTH|INTEGER NOT NULL|201011, 201012, 201101 etc|
REVENUE ANALYSIS
|COLUMN|DATA TYPE|DETAILS|
|------+---------+-------|
|ACTIVITY_DATE|DATE NOT NULL|Date Wager was made|
|MEMBER_ID|INTEGER NOT NULL|Unique Player identifier|
|GAME_ID|SMALLINT NOT NULL|Unique Game identifier|
|WAGER_AMOUNT|REAL NOT NULL|Total amount wagered on the game|
|NUMBER_OF_WAGERS|INTEGER NOT NULL|Number of wagers on the game|
|WIN_AMOUNT|REAL NOT NULL|Total amount won on the game|
|ACTIVITY_YEAR_MONTH|INTEGER NOT NULL|YYYYMM|
|BANK_TYPE_ID|SMALL INT DEFAULT 0 NOT NULL|0=Real money, 1=Bonus money|
Long story short "active" means that the member has made a minimum of one real money wager in the month.
Every month a member has a certain lifecycle status. This status changes monthly, based on their activity in the previous and current months. The statuses are the following:
NEW: First time they placed a real money wager
RETAINED: Active in the prior calendar month and the current calendar month
UNRETAINED: Active in the prior calendar month but not active in the current calendar month
REACTIVATED: Not active in the prior calendar month, but active in the current calendar month
LAPSED: Not active in the prior calendar month or the current calendar month
We would like initially to get to a view with the columns below:
MEMBER_ID | CALENDAR_YEAR_MONTH | MEMBER_LIFECYCLE_STATUS | LAPSED_MONTHS
Also the view should display one row per member per month, starting from the month in which they first placed a real money wager. This view should give their lifecycle status for that month, and if the member has lapsed, it should show a rolling count of the number of months since they were last active.
So far I have come up with the following CTE to give me a basis for the view. However I am not sure about the UNRETAINED and REACTIVATED columns. Any ideas anyone?
with all_activities as (
select a.member_id, activity_date, calendar_month_number as month_activity, calendar_year as year_activity,
datepart(month,CURRENT_TIMESTAMP) as current_month, datepart(year,CURRENT_TIMESTAMP) as current_year,
datepart(month,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE()))) as previous_month, datepart(year,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE()))) as year_last_month,
a.NUMBER_OF_WAGERS, (case when datepart(month,CURRENT_TIMESTAMP) = calendar_month_number and datepart(year,CURRENT_TIMESTAMP) = calendar_year then 'active' else 'inactive' end) as status,
case when (case when datepart(month,CURRENT_TIMESTAMP) = calendar_month_number and datepart(year,CURRENT_TIMESTAMP) = calendar_year then 'active' else 'inactive' end) = 'active' and number_of_wagers = 1 then 'New'
when (LAG((case when datepart(month,CURRENT_TIMESTAMP) = calendar_month_number and datepart(year,CURRENT_TIMESTAMP) = calendar_year then 'active' else 'inactive' end) ,1,0) OVER(PARTITION BY member_id ORDER BY calendar_month_number desc) = 'active' and calendar_month_number = datepart(month,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE())))) then 'Retained'
when (calendar_month_number = datepart(month,CURRENT_TIMESTAMP) and year_activity = datepart(year,CURRENT_TIMESTAMP) and calendar_month_number = datepart(month,CONVERT(DATE, DATEADD(DAY,-DAY(GETDATE()),GETDATE())))) then 'Unretained'
from [dbo].[REVENUE_ANALYSIS] a
join CALENDAR b on a.ACTIVITY_DATE= b.CALENDAR_DATE
)
select * from all_activities
This is about customer lifecycle status analysis, which requires a couple of things:
1. Customer acquisition date (it would be nice to have this stored, because some customers may go back years or even decades). For this question, we assume revenue_analysis has everything we need to calculate the user acquisition month.
2. Lapsed vs. churned: a churned customer is usually defined as one with no activity for a given period of time. For this question we don't have that definition, so a user will be reported as lapsed forever.
For the lifecycle status calculation, we're going to gather the following (member_id, calendar_month, acquisition_month, activity_month, prior_activity_month), so that we can calculate the final result.
with cte_new_user_monthly as (
select member_id,
min(activity_year_month) as acquisition_month
from revenue_analysis
where bank_type_id = 0 -- real money wagers only
group by 1),
cte_user_monthly as (
select u.member_id,
u.acquisition_month,
m.yyyymm as calendar_month
from cte_new_user_monthly u,
calendar_month m
where u.acquisition_month <= m.yyyymm),
cte_user_activity_monthly as (
select f.member_id,
f.activity_year_month as activity_month
from revenue_analysis f
where f.bank_type_id = 0 -- real money wagers only
group by 1,2),
cte_user_lifecycle as (
select u.member_id,
u.calendar_month,
u.acquisition_month,
m.activity_month
from cte_user_monthly u
left
join cte_user_activity_monthly m
on u.member_id = m.member_id
and u.calendar_month = m.activity_month),
cte_user_status as (
select member_id,
calendar_month,
acquisition_month,
activity_month,
lag(activity_month,1) over (partition by member_id order by calendar_month) as prior_activity_month
from cte_user_lifecycle),
user_status_monthly as (
select member_id,
calendar_month,
activity_month,
case
when calendar_month = acquisition_month then 'NEW'
when prior_activity_month is not null and activity_month is not null then 'RETAINED'
when prior_activity_month is not null and activity_month is null then 'UNRETAINED'
when prior_activity_month is null and activity_month is not null then 'REACTIVATED'
when prior_activity_month is null and activity_month is null then 'LAPSED'
else null
end as user_status
from cte_user_status)
select member_id,
calendar_month,
activity_month,
user_status,
row_number() over (partition by member_id, user_status order by calendar_month) as months
from user_status_monthly
order by 1,2;
Result (activity_month is included to make it easier to follow):
member_id|calendar_month|activity_month|user_status|months|
---------+--------------+--------------+-----------+------+
1001| 201701| 201701|NEW | 1|
1001| 201702| |UNRETAINED | 1|
1001| 201703| |LAPSED | 1|
1001| 201704| |LAPSED | 2|
1001| 201705| 201705|REACTIVATED| 1|
1001| 201706| 201706|RETAINED | 1|
1001| 201707| |UNRETAINED | 2|
1001| 201708| |LAPSED | 3|
1001| 201709| 201709|REACTIVATED| 2|
1001| 201710| |UNRETAINED | 3|
1001| 201711| |LAPSED | 4|
1001| 201712| 201712|REACTIVATED| 3|
1002| 201703| 201703|NEW | 1|
1002| 201704| |UNRETAINED | 1|
1002| 201705| |LAPSED | 1|
1002| 201706| |LAPSED | 2|
1002| 201707| |LAPSED | 3|
1002| 201708| |LAPSED | 4|
1002| 201709| |LAPSED | 5|
1002| 201710| |LAPSED | 6|
1002| 201711| |LAPSED | 7|
1002| 201712| |LAPSED | 8|
EDIT:
The code was tested in MySQL because I didn't notice the 'mysql' tag had been removed.
calendar_month in the code can be derived from the calendar dimension.
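The months column in the result above numbers rows per status, so it keeps counting across separate lapsed spells (for example, member 1001 shows LAPSED 3 in 201708 even though they were active in 201706). If LAPSED_MONTHS is meant to be the number of months since the member was last active, the counter has to reset on every active month. The sketch below illustrates that rule in plain Python rather than SQL. The function name and parameters are hypothetical, and counting the UNRETAINED month as month 1 is an assumption about the requirement.

```python
def lifecycle(active_months, last_month):
    """Return (yyyymm, status, lapsed_months) rows for one member.

    active_months: set of YYYYMM ints with at least one real-money wager.
    last_month:    last YYYYMM to report (hypothetical parameter).
    """
    def next_month(ym):
        year, month = divmod(ym, 100)
        return (year + 1) * 100 + 1 if month == 12 else ym + 1

    rows = []
    month = min(active_months)  # acquisition month
    prev_active = False
    lapsed = 0
    first = True
    while month <= last_month:
        active = month in active_months
        if first:
            status = "NEW"
        elif active and prev_active:
            status = "RETAINED"
        elif active:
            status = "REACTIVATED"
        elif prev_active:
            status = "UNRETAINED"
        else:
            status = "LAPSED"
        # Months since last activity: reset when active, otherwise increment.
        lapsed = 0 if active else lapsed + 1
        rows.append((month, status, lapsed))
        prev_active, first = active, False
        month = next_month(month)
    return rows


# Member 1001 from the result table above.
for row in lifecycle({201701, 201705, 201706, 201709, 201712}, 201712):
    print(row)
```

In SQL the same reset can usually be expressed by grouping consecutive inactive months (a gaps-and-islands pattern) and numbering rows within each group.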

Running total between two dates SQL

I have a problem with building an efficient query in order to get a running total of sales between two dates.
Now I have the query :
select SalesId,
sum(Sales) as number_of_sales,
Sales_DATE as SalesDate,
ADD_MONTHS(Sales_DATE , -12) as SalesDatePrevYear
from DWH.L_SALES
group by SalesId, Sales_DATE
With the result:
| SalesId| number_of_sales| SalesDate|SalesDatePrevYear|
|:---- |:------:| :-----:|-----:|
| 1000| 1| 20200101|20190101|
| 1001| 1| 20220101|20210101|
| 1002| 1| 20220201|20210201|
| 1003| 1| 20220301|20210301|
The preferred result is the following:
| SalesId| number_of_sales| running total of sales | SalesDate|SalesDatePrevYear|
|:---- |:------:| :-----:| :-----:|-----:|
| 1000| 1| 1 | 20200101|20190101|
| 1001| 1| 1 | 20220101|20210101|
| 1002| 1| 2| 20220201|20210201|
| 1003| 1| 3|20220301|20210301|
As you can see, I want the total of Sales between the two dates, but because I also need the lower level (SalesId), it always stays at 1.
How can I get this efficiently?
You have already produced a result that gives you the start and end dates you care about, so you just need to join that result back to the original data with an inequality join and then sum the matching rows. I suggest looking into CTEs (Common Table Expressions), which are helpful for learning and debugging.
For example,
WITH CTE_BASE_RESULT AS
(
-- your query goes here
)
SELECT CTE_BASE_RESULT.SalesId, CTE_BASE_RESULT.SalesDate, SUM(S.Sales) AS Total_Sales_Prior_Year
FROM CTE_BASE_RESULT
INNER JOIN DWH.L_SALES AS S
-- no join on SalesId: the total has to span every sale inside the date window
ON S.Sales_DATE <= CTE_BASE_RESULT.SalesDate
AND S.Sales_DATE > CTE_BASE_RESULT.SalesDatePrevYear
GROUP BY CTE_BASE_RESULT.SalesId, CTE_BASE_RESULT.SalesDate
I also recommend a website like SQL Generator that can help write complex operations, for example this is called Timeseries Aggregate.
This syntax works for Snowflake; I didn't see which system you're on.
Alternatively,
WITH BASIC_OFFSET_1YEAR AS (
SELECT
A.Sales_Id,
A.SalesDate,
SUM(B.Sales) as SUM_SALES_PAST1YEAR
FROM
L_Sales A
INNER JOIN L_Sales B ON A.Sales_Id = B.Sales_Id
WHERE
B.SalesDate >= DATEADD(YEAR, -1, A.SalesDate)
AND B.SalesDate <= A.SalesDate
GROUP BY
A.Sales_Id,
A.SalesDate
)
SELECT
src.*, BASIC_OFFSET_1YEAR.SUM_SALES_PAST1YEAR
FROM
L_Sales src
LEFT OUTER JOIN BASIC_OFFSET_1YEAR
ON BASIC_OFFSET_1YEAR.SalesDate = src.SalesDate
AND BASIC_OFFSET_1YEAR.Sales_Id = src.Sales_Id
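To see what the inequality join produces on the sample data from the question, here is a minimal sketch. It assumes SQLite (via Python's `sqlite3`) instead of the original warehouse, ISO-formatted date strings, and one row per SalesId. The join deliberately does not match on SalesId, since the desired total spans all sales in the trailing one-year window.

```python
import sqlite3

# In-memory SQLite stand-in for DWH.L_SALES (an assumption); dates stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE L_SALES (SalesId INTEGER, Sales INTEGER, Sales_DATE TEXT)")
conn.executemany(
    "INSERT INTO L_SALES VALUES (?, ?, ?)",
    [
        (1000, 1, "2020-01-01"),
        (1001, 1, "2022-01-01"),
        (1002, 1, "2022-02-01"),
        (1003, 1, "2022-03-01"),
    ],
)

# For each sale (a), sum every sale (b) dated after a's date minus one year
# and up to and including a's own date.
rows = conn.execute(
    """
    SELECT a.SalesId, a.Sales_DATE, SUM(b.Sales) AS running_total
    FROM L_SALES a
    JOIN L_SALES b
      ON b.Sales_DATE >  date(a.Sales_DATE, '-1 year')
     AND b.Sales_DATE <= a.Sales_DATE
    GROUP BY a.SalesId, a.Sales_DATE
    ORDER BY a.Sales_DATE
    """
).fetchall()

for row in rows:
    print(row)
# (1000, '2020-01-01', 1)
# (1001, '2022-01-01', 1)
# (1002, '2022-02-01', 2)
# (1003, '2022-03-01', 3)
```

A self-join like this compares every row with every other row, so on large tables an index on the date column, or a range-based window frame where the engine supports one, is worth considering.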

SQL to Determine Unpaid Balance Within Periods

I have the following table:
/**
| NAME | DELTA (PAID - EXPECTED) | PERIOD |
|-------|-------------------------|--------|
| SMITH | -50| 1|
| SMITH | 0| 2|
| SMITH | 150| 3|
| SMITH | -200| 4|
| DOE | 300| 1|
| DOE | 0| 2|
| DOE | -200| 3|
| DOE | -200| 4|
**/
DROP TABLE delete_me;
CREATE TABLE delete_me (
"NAME" varchar(255),
"DELTA (PAID - EXPECTED)" numeric(15,2),
"PERIOD" Int
);
INSERT INTO delete_me("NAME", "DELTA (PAID - EXPECTED)", "PERIOD")
VALUES
('SMITH', -50, 1),
('SMITH', 0, 2),
('SMITH', 150, 3),
('SMITH', -200, 4),
('DOE', 300, 1),
('DOE', 0, 2),
('DOE', -200, 3),
('DOE', -200, 4)
Where period represents time, with 1 being the newest and 4 being the oldest. In each time period the person was charged an amount and could pay off that amount or more. A negative delta means that they owe for that time period. A positive delta means that they paid over the expected amount and have a credit for that time period that can be applied to other time periods. If there's a credit, we'd want to pay off the oldest time period first. I want to get how much unpaid debt is still outstanding for each time period.
So in the example above I'd want to see:
| NAME | DELTA (PAID - EXPECTED) | PERIOD | PERIOD BALANCE |
|-------|-------------------------|--------|----------------|
| SMITH | -50| 1| -50|
| SMITH | 0| 2| 0|
| SMITH | 150| 3| 0|
| SMITH | -200| 4| -50|
| DOE | 300| 1| 0|
| DOE | 0| 2| 0|
| DOE | -200| 3| -100|
| DOE | -200| 4| 0|
How can I use Postgres SQL to show the Unpaid debt within periods?
Additional description: For Doe, 200 was initially owed in the oldest period (4). In the next period they owed the original 200 plus another 200 (400 total owed). In period 2 the monthly charge was paid, but not the past balances. In the most recent period (1), 300 over the monthly amount was paid: 200 of this was applied to the oldest debt in period 4, paying it off, and the remaining 100 was applied to period 3's debt, leaving 100 still owed.
For the Smith family initially in period 4 they underpaid 200. The next period they overpaid 150 for the month and this was applied to the oldest debt of 200, leaving 50 to still be paid. In period 2 the monthly bill was paid exactly, they still owed the 50 dollars from period 4. Then in period 1 they underpaid 50. They owe 100 in total, 50 for period 1 and 50 for period 4.
According to what I understood, you want to distribute the sum of positive DELTA values (credit) among the negative DELTA values starting from the oldest period.
with cte as (
Select Name_, DELTA, PERIOD_,
Sum(case when DELTA<0 then delta else 0 end)
Over (Partition By NAME_ Order By PERIOD_ desc) +
Sum(case when DELTA>0 then delta else 0 end)
Over (Partition By NAME_) as positive_credit_to_negativ_delta
From delete_me
)
Select Name_,DELTA,PERIOD_,positive_credit_to_negativ_delta,
case
when delta >= 0 then 0
else
case
when positive_credit_to_negativ_delta >= 0 then 0
else
greatest(delta , positive_credit_to_negativ_delta)
end
end as PERIOD_BALANCE
from cte
Order By NAME_,PERIOD_
See a demo from db-fiddle.
The idea in this query is to find the sum of all positive DELTA values for each user, then add that sum to the cumulative sum of the negative values starting from the oldest period. The result of this addition is stored in positive_credit_to_negativ_delta in the query.
For DELTA values >= 0, the result will be 0, since there is no debt for that period.
For negative DELTA values:
If the value of positive_credit_to_negativ_delta is >= 0, then the result will be 0, meaning the period's delta is covered by the positive credit.
If the value of positive_credit_to_negativ_delta is < 0, then the result will be the greater of positive_credit_to_negativ_delta and DELTA.
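The arithmetic behind these rules can be traced by hand. The sketch below re-implements the same steps in plain Python (it is not the SQL itself, and the function name is hypothetical) so the intermediate values for SMITH and DOE are easy to follow.

```python
def period_balances(deltas):
    """deltas: {period: delta}, where period 1 is the newest and larger is older."""
    # Total credit: sum of all positive deltas for this person.
    credit = sum(d for d in deltas.values() if d > 0)
    balances = {}
    cumulative_debt = 0
    for period in sorted(deltas, reverse=True):  # oldest period first
        delta = deltas[period]
        if delta < 0:
            cumulative_debt += delta
        # Same quantity as positive_credit_to_negativ_delta in the query.
        remaining = cumulative_debt + credit
        if delta >= 0 or remaining >= 0:
            balances[period] = 0  # no debt, or debt fully covered by credit
        else:
            balances[period] = max(delta, remaining)  # uncovered part of this debt
    return balances


print(period_balances({1: -50, 2: 0, 3: 150, 4: -200}))   # SMITH
print(period_balances({1: 300, 2: 0, 3: -200, 4: -200}))  # DOE
```

For SMITH, the 150 credit reduces period 4's debt of 200 to 50, and period 1's 50 stays unpaid. For DOE, the 300 credit clears period 4 and leaves 100 owed on period 3, matching the expected table in the question.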
The query below utilizes a recursive cte with several JSON structures. The usage of the latter allows accurate tracking of negative to positive balance intervals with possibly more than one potential positive balances after negative:
with recursive cte(n, l, p, js, js1) as (
select d1.name, d5.delta, d1.m, jsonb_build_object(d1.m, d5.delta), jsonb_build_array(d1.m)
from (select d.name, max(d.period) m from delete_me d where d.delta < 0 group by d.name) d1
join delete_me d5 on d5.name = d1.name and d5.period = d1.m
union all
select c.n, d.delta, d.period,
case when d.delta < 0 then c.js||jsonb_build_object(d.period, d.delta)
when d.delta = 0 then c.js
else (select jsonb_object_agg(k.v3, least((c.js -> k.v3::text)::int +
greatest(d.delta + coalesce((select sum((c.js -> v2.value::text)::int)
from jsonb_array_elements(c.js1) v2 where v2.value::int > k.v3),0),0),0))
from (select v.value::int v3 from jsonb_array_elements(c.js1) v
order by v.value::int desc) k)||jsonb_build_object(d.period, greatest(d.delta + (select sum((c.js -> v2.value::text)::int)
from jsonb_array_elements(c.js1) v2),0)) end,
case when d.delta < 0 then (case when c.l <= 0 then c.js1 else '[]'::jsonb end) || ('['||d.period||']')::jsonb
else c.js1 end
from cte c join delete_me d on d.period = c.p - 1 and d.name = c.n
)
select d.*, coalesce((c.js -> d.period::text)::int, 0) from delete_me d
join cte c on c.n = d.name where c.p = 1
order by d.name desc, d.period asc

how to get daily profit from sql table

I'm stuck on the problem of finding daily profits from a database (MS Access) table. The difference from other tips I found online is that my table doesn't have separate "Price" and "Cost" fields, but a "Type" field which distinguishes whether a row is a revenue ("S") or a cost ("C").
this is the table "Record"
| Date | Price | Quantity | Type |
-----------------------------------
|01/02 | 20 | 2 | C |
|01/02 | 10 | 1 | S |
|01/02 | 3 | 10 | S |
|01/02 | 5 | 2 | C |
|03/04 | 12 | 3 | C |
|03/03 | 200 | 1 | S |
|03/03 | 120 | 2 | C |
So far I tried different solutions like:
SELECT
(SELECT SUM (RS.Price* RS.Quantity)
FROM Record RS WHERE RS.Type='S' GROUP BY RS.Data
) as totalSales,
(SELECT SUM (RC.Price*RC.Quantity)
FROM Record RC WHERE RC.Type='C' GROUP BY RC.Date
) as totalLosses,
ROUND(totalSales-totaleLosses,2) as NetTotal,
R.Date
FROM RECORD R";
in my mind it could work but obviously it doesn't
and
SELECT RC.Data, ROUND(SUM (RC.Price*RC.QuantitY),2) as DailyLoss
INTO #DailyLosses
FROM Record RC
WHERE RC.Type='C' GROUP BY RC.Date
SELECT RS.Date, ROUND(SUM (RS.Price*RS.Quantity),2) as DailyRevenue
INTO #DailyRevenues
FROM Record RS
WHERE RS.Type='S'GROUP BY RS.Date
SELECT Date, DailyRevenue - DailyLoss as DailyProfit
FROM #DailyLosses dlos, #DailyRevenues drev
WHERE dlos.Date = drev.Date";
My problem beyond the correct syntax is the approach to this kind of problem
You can use grouping and conditional summing. Try this:
SELECT data.Date, data.Income - data.Cost as Profit
FROM (
SELECT Record.Date as Date,
SUM(IIF(Record.Type = 'S', Record.Price * Record.Quantity, 0)) as Income,
SUM(IIF(Record.Type = 'C', Record.Price * Record.Quantity, 0)) as Cost
FROM Record
GROUP BY Record.Date
) data
In this case you first create a sub-query to get separate fields for Income and Cost, and then your outer query uses subtraction to get actual profit.
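To illustrate the grouping and conditional summing on the sample rows from the question, here is a minimal sketch. It assumes SQLite through Python's `sqlite3` rather than MS Access, so `CASE` is used in place of `IIF`, and the subtraction is done in the same query instead of an outer one.

```python
import sqlite3

# In-memory SQLite stand-in for the Access table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Record ("Date" TEXT, Price INTEGER, Quantity INTEGER, "Type" TEXT)'
)
conn.executemany(
    "INSERT INTO Record VALUES (?, ?, ?, ?)",
    [
        ("01/02", 20, 2, "C"),
        ("01/02", 10, 1, "S"),
        ("01/02", 3, 10, "S"),
        ("01/02", 5, 2, "C"),
        ("03/04", 12, 3, "C"),
        ("03/03", 200, 1, "S"),
        ("03/03", 120, 2, "C"),
    ],
)

# Sales add to the daily total, costs subtract from it.
rows = conn.execute(
    """
    SELECT "Date",
           SUM(CASE WHEN "Type" = 'S' THEN Price * Quantity ELSE 0 END)
         - SUM(CASE WHEN "Type" = 'C' THEN Price * Quantity ELSE 0 END) AS Profit
    FROM Record
    GROUP BY "Date"
    ORDER BY "Date"
    """
).fetchall()

for row in rows:
    print(row)
# ('01/02', -10)
# ('03/03', -40)
# ('03/04', -36)
```

Days with only costs (03/04 here) still appear, with a negative profit, which the two-temp-table attempt in the question would drop because of its inner join on date.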

Calculations over Multiple Rows SQL Server

If I have data in the format;
Account | Period | Values
Revenue | 2013-01-01 | 5432
Revenue | 2013-02-01 | 6471
Revenue | 2013-03-01 | 7231
Costs | 2013-01-01 | 4321
Costs | 2013-02-01 | 5672
Costs | 2013-03-01 | 4562
And I want to get results out like;
Account | Period | Values
Margin | 2013-01-01 | 1111
Margin | 2013-02-01 | 799
Margin | 2013-03-01 | 2669
M% | 2013-01-01 | .20
M% | 2013-02-01 | .13
M% | 2013-03-01 | .37
Where Margin = Revenue - Costs and M% is (Revenue - Costs)/Revenue for each period.
I can see various ways of achieving this but all are quite ugly and I wanted to know if there was elegant general approach for these sorts of multi-row calculations.
Thanks
Edit
Some of these calculations can get really complicated like
Free Cash Flow = Margin - Opex - Capex + Change in Working Capital + Interest Paid
So I am hoping for a general method that doesn't require lots of joins back to itself.
Thanks
OK, then just use MAX over a CASE expression, like this:
with RevAndCost (revenue, costs, period)
as
(
select revenue = Max(Case when account = 'Revenue' then [Values] else null end),
costs = Max(Case when account = 'Costs' then [Values] else null end),
period
from data
group by period
)
select period,
Margin = revenue - costs,
[M%] = 1.0 * (revenue - costs) / nullif(revenue, 0)
from RevAndCost
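Use a full self-join with a Union, filtering each side to one account:
Select 'Margin' Account,
coalesce(r.period, c.period) Period,
r.[Values] - c.[Values] [Values]
From (Select * From myTable Where Account = 'Revenue') r
Full Join (Select * From myTable Where Account = 'Costs') c
On c.period = r.period
Union All
Select 'M%' Account,
coalesce(r.period, c.period) Period,
1.0 * (r.[Values] - c.[Values]) / nullif(r.[Values], 0) [Values]
From (Select * From myTable Where Account = 'Revenue') r
Full Join (Select * From myTable Where Account = 'Costs') c
On c.period = r.period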
Here I use a Common Table Expression to do a full outer join between two instances of your data table to pull in Revenue and Costs into 1 table, then select from that CTE.
with RevAndCost (revenue, costs, period)
as
(
select ISNULL(rev.[Values],0) as revenue,
ISNULL(cost.[Values],0) as costs,
ISNULL(rev.period,cost.period)
from (select * from data where account = 'Revenue') rev
full outer join (select * from data where account = 'Costs') cost
on rev.period=cost.period
)
select period,
Margin = revenue-costs,
[M%] = 1.0 * (revenue-costs)/nullif(revenue,0)
from RevAndCost
I'd do it like this:
SELECT r.PERIOD, r.[VALUES] AS revenue, c.[VALUES] AS cost,
r.[VALUES] - c.[VALUES] AS margin, 1.0 * (r.[VALUES] - c.[VALUES]) / r.[VALUES] AS mPct
FROM
(SELECT PERIOD, [VALUES] FROM t WHERE
ACCOUNT = 'revenue') r INNER JOIN
(SELECT PERIOD, [VALUES] FROM t WHERE
ACCOUNT = 'costs') c ON
r.PERIOD = c.PERIOD
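For the general case raised in the question's edit (formulas with many accounts, without joining the table back to itself once per account), one option is to pivot every account into a column in a single pass and then write the formulas against those columns. The sketch below shows this on the sample data. It assumes SQLite through Python's `sqlite3` rather than SQL Server, and renames the Values column to Amount (a hypothetical name) to avoid the reserved word.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Account TEXT, Period TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("Revenue", "2013-01-01", 5432),
        ("Revenue", "2013-02-01", 6471),
        ("Revenue", "2013-03-01", 7231),
        ("Costs", "2013-01-01", 4321),
        ("Costs", "2013-02-01", 5672),
        ("Costs", "2013-03-01", 4562),
    ],
)

# Pivot once (one column per account), then compute every derived measure
# from the pivoted columns. More accounts only add more CASE columns.
rows = conn.execute(
    """
    WITH pivoted AS (
        SELECT Period,
               MAX(CASE WHEN Account = 'Revenue' THEN Amount END) AS revenue,
               MAX(CASE WHEN Account = 'Costs'   THEN Amount END) AS costs
        FROM data
        GROUP BY Period
    )
    SELECT Period,
           revenue - costs AS margin,
           1.0 * (revenue - costs) / NULLIF(revenue, 0) AS margin_pct
    FROM pivoted
    ORDER BY Period
    """
).fetchall()

for period, margin, margin_pct in rows:
    print(period, margin, round(margin_pct, 2))
```

A formula such as Free Cash Flow would then be one more expression over the pivoted columns, provided each of its input accounts has its own `CASE` column in the CTE.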