Sum for a rolling total - sql

I have the following query:
select b.month_date, total_signups, active_users
from
(
    SELECT date_trunc('month', confirmed_at) as month_date
         , count(distinct id) as total_signups
    FROM follower.users
    WHERE confirmed_at::date >= dateadd(day,-90,getdate())::date
      and (deleted_at is null or deleted_at > date_trunc('month', confirmed_at))
    group by 1
) a ,
(
    SELECT date_trunc('month', inv.created_at) AS month_date
         , COUNT(DISTINCT em.user_id) AS active_users
    FROM follower.invitees inv
    INNER JOIN follower.events em
        ON inv.event_id = em.event_id
    where inv.created_at::date >= dateadd(day,-90,getdate())::date
    GROUP BY 1
) b
where a.month_date = b.month_date
This returns three columns: month date, total signups, and active users. What I need is a fourth column with a rolling total of signups. I've tried OVER and PARTITION BY with no luck. Could someone help? I'd appreciate it very much.

Try adding this column definition to your first Select:
SUM(total_signups)
OVER (ORDER BY b.month_date ASC rows between unbounded preceding and current row)
AS running_total
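To see the window frame on its own, here is a minimal sketch in PostgreSQL-style SQL. The monthly_signups CTE and its numbers are made up for illustration; they stand in for the joined result of the query in the question, so swap the VALUES list for your real subqueries.

```sql
-- Hypothetical sample data standing in for the (month_date, total_signups) result.
WITH monthly_signups (month_date, total_signups) AS (
    VALUES (DATE '2019-01-01', 10),
           (DATE '2019-02-01', 25),
           (DATE '2019-03-01', 5)
)
SELECT month_date,
       total_signups,
       -- Sum every row from the first month up to and including the current one.
       SUM(total_signups) OVER (
           ORDER BY month_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM monthly_signups
ORDER BY month_date;
-- running_total is 10, 35, 40
```

Note that the running total only accumulates over the rows the query returns. With the 90-day filter from the question, it starts from the first month inside that window, not from the first signup ever.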

ETL query needs some changes to get it right

Hello guys, I have a query that works, but when I remove two filters (the two WHERE clauses at the end), it no longer works as expected. The filters still have to be removed from the query.
I have accounts 1000001, 1000002, 1000003, 1000004 and 1000005.
I only get account 1000005. I'm pretty sure the cause is the MAX window function, but I can't see how to fix it.
I want to get the values for all the accounts.
SELECT a12.month_id,
       a12.populate_id AS account_id,
       LAST_VALUE(current_bal IGNORE NULLS) OVER
           (PARTITION BY Populate_id ORDER BY date_id ASC ROWS
            BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_dly_bal
FROM (SELECT TO_CHAR(date_id, 'YYYYMM') AS month_id,
             date_id,
             account_id AS "account_id",
             MAX(account_id) OVER (PARTITION by TO_CHAR(date_id, 'YYYYMM')) as populate_id,
             current_bal
      FROM (SELECT t.date_id, ad.account_id, ad.current_bal
            FROM timedate t
            FULL OUTER JOIN (SELECT src_extract_dt, account_id, current_bal
                             FROM account_dly
                             WHERE account_id = 1000001) ad
                on t.date_id = ad.src_extract_dt
            WHERE TO_CHAR(date_id, 'YYYYMM') = '201908'
            order by t.date_id)) a12;
https://i.stack.imgur.com/xphVh.png

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group the rows by attr, count them, and add columns showing each attr's count and percentage per day, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using GROUP BY, but I can't work out how to separate the counts into multiple columns. I tried to generate the day 1 percentage with:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this does not give me the correct answer either: I get all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm trying to do this in Redshift, which follows PostgreSQL syntax.
Let's nail the logic before presenting:
with CTE1 as
(
    select attr, DATEPART(day, time) as theday, count(*) as thecount
    from MyTable
    group by attr, DATEPART(day, time)
)
, CTE2 as
(
    select theday, sum(thecount) as daytotal
    from CTE1
    group by theday
)
select t1.attr, t1.theday, t1.thecount,
       1.0 * t1.thecount / t2.daytotal as percentofday -- 1.0 * guards against integer division
from CTE1 t1
inner join CTE2 t2
    on t1.theday = t2.theday
From here you can pivot to get a day-by-day layout if you feel the need.
I am trying to enhance @johnHC's query. By the way, if you need this for 7 days, you have to add those days as extra CASE WHEN columns.
with CTE1 as
(
    select attr, time::date as theday, count(*) as thecount
    from t
    group by attr, time::date
)
, CTE2 as
(
    select theday, sum(thecount) as daytotal
    from CTE1
    group by theday
)
, CTE3 as
(
    select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr, t1.theday, t1.thecount,
           t1.thecount/t2.daytotal as percentofday
    from CTE1 t1
    inner join CTE2 t2
        on t1.theday = t2.theday
)
select CTE3.attr,
       max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
       max(case when day_nmbr=0 then percentofday end) as day1,
       max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
       max(case when day_nmbr=1 then percentofday end) as day2
from CTE3
group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs for calculating the values I am using window functions which is a bit shorter and more readable I think. The pivot is done the same way.
SELECT
    attr,
    COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
    COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
    COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
    COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
    /*
        Add more days here
    */
FROM (
    SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
    FROM (
        SELECT DISTINCT
            attr,
            MAX(time::date) OVER () - time::date as day_number, -- B
            count(*) OVER (partition by time::date, attr) as count, -- A
            count(*) OVER (partition by time::date) as count_per_day
        FROM test_table
    ) s
) s
GROUP BY attr
ORDER BY attr
A: count the rows per day and attr, and the rows per day.
B: for readability, convert the date into a number: the difference between the maximum date in the table and the date of the row. This gives a counter from 0 (the most recent day) up to n - 1 (the oldest of n consecutive days).
C: calculate the percentage and round it.
D: pivot by filtering on the day number. COALESCE turns the NULL values into 0. To add more days, duplicate these columns.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
       COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
       COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
       COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
       COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
             DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
             1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
      FROM test_table
     ) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.
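For comparison, the same pivot can be sketched with plain CASE expressions instead of FILTER, which may help on engines that lack the FILTER clause. This is a PostgreSQL-style sketch using the five sample rows from the question. The column name ts (instead of time), the per_day CTE, and the two hard-coded dates are assumptions made for illustration.

```sql
-- Sample rows copied from the question; the timestamp column is called ts here.
WITH my_table (attr, ts) AS (
    VALUES ('abc', TIMESTAMP '2018-08-06 10:17:25.282546'),
           ('def', TIMESTAMP '2018-08-06 10:17:25.325676'),
           ('pqr', TIMESTAMP '2018-08-05 10:17:25.366823'),
           ('abc', TIMESTAMP '2018-08-06 10:17:25.407941'),
           ('def', TIMESTAMP '2018-08-05 10:17:25.449249')
),
per_day AS (
    -- One row per (attr, day), with the total row count of that day attached.
    SELECT attr,
           CAST(ts AS date) AS theday,
           COUNT(*) AS thecount,
           SUM(COUNT(*)) OVER (PARTITION BY CAST(ts AS date)) AS daytotal
    FROM my_table
    GROUP BY attr, CAST(ts AS date)
)
SELECT attr,
       SUM(CASE WHEN theday = DATE '2018-08-06' THEN thecount ELSE 0 END) AS day1_count,
       ROUND(SUM(CASE WHEN theday = DATE '2018-08-06'
                      THEN 100.0 * thecount / daytotal ELSE 0 END), 1) AS day1_pct,
       SUM(CASE WHEN theday = DATE '2018-08-05' THEN thecount ELSE 0 END) AS day2_count,
       ROUND(SUM(CASE WHEN theday = DATE '2018-08-05'
                      THEN 100.0 * thecount / daytotal ELSE 0 END), 1) AS day2_pct
FROM per_day
GROUP BY attr
ORDER BY attr;
```

On the sample data, abc gets 2 rows on day 1 (about 66.7 percent of that day's 3 rows), def gets 1 row on each day, and pqr gets 1 row on day 2 (50 percent of that day's 2 rows). Multiplying by 100.0 keeps the division out of integer arithmetic, which is one likely cause of the all-zero percentages in the question.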

Retrieve records by continuation of days in oracle

I want to retrieve records where there are more than 4 cash deposits totaling 1000000 or more during a day, and this pattern continues for more than 5 days.
I have come up with the query below.
SELECT COUNT(a.txamt) AS "txcount"
, SUM(a.txamt) AS "txsum"
, b.custcd
, a.txdate
FROM tb_transactions a
INNER JOIN tb_accounts b
ON a.acctno = b.acctno
WHERE a.cashflowtype = 'CR'
GROUP BY b.custcd, a.txdate
HAVING COUNT(a.txamt)>4 and SUM(a.txamt)>='1000000'
ORDER BY a.txdate;
But I'm stuck on how to fetch the records if the pattern continues for 5 days.
How to achieve the desired result?
Something like:
SELECT *
FROM (
    SELECT t.*,
           COUNT( txdate ) OVER ( PARTITION BY custcd
                                  ORDER BY txdate
                                  RANGE BETWEEN INTERVAL '0' DAY PRECEDING
                                            AND INTERVAL '4' DAY FOLLOWING ) AS num_days
    FROM (
        select count(a.txamt) as "txcount",
               sum(a.txamt) as "txsum",
               b.custcd,
               a.txdate
        from tb_transactions a inner join tb_accounts b on a.acctno=b.acctno
        where a.cashflowtype='CR'
        group by b.custcd, a.txdate
        having count(a.txamt)>4 and sum(a.txamt)>=1000000
    ) t
)
WHERE num_days = 5
order by txdate;
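A different way to find runs of consecutive days is the "gaps and islands" trick: subtracting a per-customer row number from the date gives the same value for every day in an unbroken run. The sketch below is PostgreSQL-style and is not the query above. The qualifying_days CTE and its rows are hypothetical and stand for the output of the GROUP BY ... HAVING query, one row per customer and qualifying day.

```sql
-- Hypothetical input: days on which a customer met the deposit condition.
WITH qualifying_days (custcd, txdate) AS (
    VALUES ('C1', DATE '2020-01-01'),
           ('C1', DATE '2020-01-02'),
           ('C1', DATE '2020-01-03'),
           ('C1', DATE '2020-01-04'),
           ('C1', DATE '2020-01-05'),
           ('C1', DATE '2020-01-09'),
           ('C2', DATE '2020-01-01'),
           ('C2', DATE '2020-01-03')
),
islands AS (
    -- Consecutive dates minus consecutive row numbers yield a constant key.
    SELECT custcd,
           txdate,
           txdate - CAST(ROW_NUMBER() OVER (PARTITION BY custcd
                                            ORDER BY txdate) AS integer) AS island_key
    FROM qualifying_days
)
SELECT custcd,
       MIN(txdate) AS first_day,
       MAX(txdate) AS last_day,
       COUNT(*)    AS consecutive_days
FROM islands
GROUP BY custcd, island_key
HAVING COUNT(*) >= 5
ORDER BY custcd, first_day;
-- returns one row: C1, 2020-01-01, 2020-01-05, 5
```

Change the HAVING threshold to match the rule you need, for example COUNT(*) > 5 for "more than 5 days".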

Running count shows all values instead of the total number of values

My data is stored in an Amazon Redshift db. I am attempting to get a running count of loans by month. This is my query:
SELECT
    TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM') AS INITIAL_PURCHASE,
    COUNT( LD.LOAN_ID) OVER (ORDER BY TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM') ROWS UNBOUNDED PRECEDING ) AS TOTAL_LOANS
FROM LOANS_DETAILS LD
INNER JOIN LOANS L ON LD.LOAN_ID = L.ID
WHERE L.UNDERWRITING_STATUS IN ('...')
  AND LD.INITIAL_PURCHASE_DATE IS NOT NULL
GROUP BY
    LD.LOAN_ID,
    LD.INITIAL_PURCHASE_DATE;
My expected result is as follows:
INITIAL_PURCHASE|TOTAL_LOANS
...|...
2016-10|369
2016-11|424
But instead I get one record for every day of the month, like so:
INITIAL_PURCHASE|TOTAL_LOANS
...|...
2016-10|366
2016-10|367
2016-10|368
2016-10|369
2016-11|371
I checked the source system and confirmed there were a total of 369 loans in October and 424 in November, so I know the data is correct.
How do I get the total number of loans per month?
SOLUTION:
This is the correct query.
SELECT
    TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM') AS INITIAL_PURCHASE_DATE,
    SUM(COUNT( LD.LOAN_ID )) OVER (ORDER BY TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM') ROWS UNBOUNDED PRECEDING ) AS TOTAL_LOANS
FROM LOANS_DETAILS LD
INNER JOIN LOANS L ON LD.LOAN_ID = L.ID
WHERE L.UNDERWRITING_STATUS IN ('...') AND LD.INITIAL_PURCHASE_DATE IS NOT NULL
GROUP BY TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM')
Your group by needs to be by month, not day, and you need to remove LOAN_ID from the GROUP BY:
SELECT TO_CHAR(LD.INITIAL_PURCHASE_DATE, 'YYYY-MM') AS INITIAL_PURCHASE,
       SUM(COUNT( LD.LOAN_ID)) OVER (ORDER BY TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM') ROWS UNBOUNDED PRECEDING ) AS TOTAL_LOANS
FROM LOANS_DETAILS LD INNER JOIN
     LOANS L
     ON LD.LOAN_ID = L.ID
WHERE L.UNDERWRITING_STATUS IN ('...') AND
      LD.INITIAL_PURCHASE_DATE IS NOT NULL
GROUP BY TO_CHAR(LD.INITIAL_PURCHASE_DATE, 'YYYY-MM')
Notes:
I think Amazon Redshift allows aliases in the GROUP BY, so you could use GROUP BY INITIAL_PURCHASE.
The SUM(COUNT(*)) should give you the running sum.
LOAN_ID should not be in the GROUP BY if you want totals by month.
This is what you were aiming for.
You group by INITIAL_PURCHASE ('YYYY-MM') and do a running total on count(*).
SELECT TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM') AS INITIAL_PURCHASE
      ,sum(count(*)) OVER
           (ORDER BY TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM')
            ROWS UNBOUNDED PRECEDING ) AS TOTAL_LOANS
FROM LOANS_DETAILS LD
INNER JOIN LOANS L
    ON LD.LOAN_ID = L.ID
WHERE L.UNDERWRITING_STATUS IN ('...')
  AND LD.INITIAL_PURCHASE_DATE IS NOT NULL
GROUP BY INITIAL_PURCHASE
P.s.
I think the alias INITIAL_PURCHASE should be recognized in the GROUP BY clause. If I am mistaken, use TO_CHAR(LD.INITIAL_PURCHASE_DATE,'YYYY-MM') instead.
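To make the SUM(COUNT(*)) OVER pattern concrete, here is a small self-contained sketch in PostgreSQL-style SQL. The five loan rows are invented for illustration, and the join to LOANS is left out to keep it short. The inner COUNT(*) is evaluated per GROUP BY group (per month), and the outer SUM(...) OVER then accumulates those monthly counts.

```sql
-- Hypothetical sample rows; only the two columns the running count needs.
WITH loans_details (loan_id, initial_purchase_date) AS (
    VALUES (1, DATE '2016-10-03'),
           (2, DATE '2016-10-03'),
           (3, DATE '2016-10-20'),
           (4, DATE '2016-11-02'),
           (5, DATE '2016-11-15')
)
SELECT TO_CHAR(initial_purchase_date, 'YYYY-MM') AS initial_purchase,
       COUNT(*) AS loans_in_month,
       -- Window function over the grouped rows: one input row per month.
       SUM(COUNT(*)) OVER (
           ORDER BY TO_CHAR(initial_purchase_date, 'YYYY-MM')
           ROWS UNBOUNDED PRECEDING
       ) AS total_loans
FROM loans_details
GROUP BY TO_CHAR(initial_purchase_date, 'YYYY-MM')
ORDER BY initial_purchase;
-- 2016-10: loans_in_month 3, total_loans 3
-- 2016-11: loans_in_month 2, total_loans 5
```

Because the grouping is by month only, the result has exactly one row per month, which is what the question asked for.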

Query for getting previous date in oracle in specific scenario

I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
Table B should show the number of days for which the balance has been negative and
the date on which it went negative.
So, for 6-Jan-2013, this table should show the data below:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, the number of days should count from the most recent time the balance went negative,
not from the older entry.
I need to write a SQL query to get the number of days passed and the start date from which the
balance has been negative.
I tried to formulate a query using the LAG analytic function, but I am not succeeding.
How should I find the first instance of a negative balance by traversing back using the LAG function?
I also tried the FIRST_VALUE function, but I could not work out how to partition on the negative values.
Any help or direction on this would be really helpful.
Here's a way to achieve this using analytic functions.
INSERT INTO tableb
WITH tablea_grouped1
     AS (SELECT account_no,
                balance,
                as_on_date,
                SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
                    OVER (PARTITION BY account_no ORDER BY as_on_date)
                    grp
         FROM tablea),
     tablea_grouped2
     AS (SELECT account_no,
                balance,
                as_on_date,
                grp,
                LAST_VALUE (balance)
                    OVER (PARTITION BY account_no, grp
                          ORDER BY as_on_date
                          ROWS BETWEEN UNBOUNDED PRECEDING
                               AND UNBOUNDED FOLLOWING)
                    closing_balance
         FROM tablea_grouped1
         WHERE balance < 0
           AND grp != 0 --keep this, if starting negative balance is to be ignored
        )
SELECT account_no,
       closing_balance,
       MAX (as_on_date),
       MAX (as_on_date) - MIN (as_on_date) + 1,
       MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as an analytic function to assign a group number to each run of consecutive balances less than 0: the running sum only increases on a row with a non-negative balance.
The LAST_VALUE function is then used to find the last negative balance in each group.
Finally, the result is aggregated per group. MAX(date) gives the last date, MIN(date) gives the starting date, and the difference of the two plus one gives the number of days.
Demo at sqlfiddle.
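To see how the running SUM forms the groups, this sketch runs only the first step on the six rows of Table A from the question. It is written in PostgreSQL-style SQL with the rows inlined as a VALUES list, which is an assumption for illustration; in Oracle you would select from the real tablea.

```sql
-- Table A rows from the question, inlined for illustration.
WITH tablea (account_no, balance, as_on_date) AS (
    VALUES (1001, -100, DATE '2013-01-01'),
           (1001, -150, DATE '2013-01-02'),
           (1001,  200, DATE '2013-01-03'),
           (1001, -250, DATE '2013-01-04'),
           (1001, -300, DATE '2013-01-05'),
           (1001, -310, DATE '2013-01-06')
)
SELECT account_no,
       balance,
       as_on_date,
       -- Counts how many non-negative balances have been seen so far.
       SUM(CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
           OVER (PARTITION BY account_no ORDER BY as_on_date) AS grp
FROM tablea
ORDER BY account_no, as_on_date;
-- grp is 0, 0, 1, 1, 1, 1
```

After filtering on balance < 0 and grp != 0, group 1 keeps 4-Jan to 6-Jan, so MAX - MIN + 1 is 3 days with a start date of 4-Jan-2013, matching the expected row for Table B.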
Try this, and use gone_negative to compute the required column value for the insert into the other table:
select temp.account_no,
       temp.balance,
       temp.prev_balance,
       temp.on_date,
       temp.prev_on_date,
       case
         WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
          1
         else
          0
       end as gone_negative
  from (select account_no,
               balance,
               on_date,
               -- order by date so that "previous" means the previous day's row
               lag(balance, 1, 0) OVER(partition by account_no ORDER BY on_date) prev_balance,
               lag(on_date, 1) OVER(partition by account_no ORDER BY on_date) prev_on_date
          from tblA
         order by account_no) temp;
Hope this helps pal.
Here's one way to do it.
Select all records from my_table where the balance is positive.
Do a self-join to get all the records whose as_on_date is later than the current row's and whose amounts are negative.
Once we have these, we cut off the rows where the difference in as_on_date between the current and the previous row is > 1. We then filter the results in an outer subquery.
The final SELECT just groups the filtered rows and gets their min and max values.
Query:
SELECT
    account_no,
    min(case when row_number = 1 then balance end) as balance,
    max(mt2_date) as As_on_date,
    max(mt2_date) - mt1_date as Days_passed,
    min(mt2_date) as Start_date
FROM
(
    SELECT
        *,
        MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
        ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
    FROM
    (
        SELECT
            mt1.account_no,
            mt2.balance,
            mt1.as_on_date as mt1_date,
            mt2.as_on_date as mt2_date,
            case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
        FROM
            my_table mt1
            JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
        WHERE
            MT1.balance > 0
        order by
            mt1.as_on_date,
            mt2.as_on_date ) sub_query
) T
WHERE
    min_break_date is null
    OR mt2_date < min_break_date
GROUP BY
    mt1_date,
    account_no
SQLFIDDLE
I have added a few more rows in the fiddle, just to test it out.