Retrieve records for consecutive days in Oracle SQL

I want to retrieve records where a customer makes more than 4 cash deposits totaling at least 1000000 in a single day, and this pattern continues for 5 or more consecutive days.
I have come up with the query below.
SELECT COUNT(a.txamt) AS "txcount"
, SUM(a.txamt) AS "txsum"
, b.custcd
, a.txdate
FROM tb_transactions a
INNER JOIN tb_accounts b
ON a.acctno = b.acctno
WHERE a.cashflowtype = 'CR'
GROUP BY b.custcd, a.txdate
HAVING COUNT(a.txamt)>4 and SUM(a.txamt)>=1000000
ORDER BY a.txdate;
But I'm stuck on how to fetch the records only when the pattern continues for 5 consecutive days.
How can I achieve the desired result?

Something like:
SELECT *
FROM (
SELECT t.*,
COUNT( txdate ) OVER ( PARTITION BY custcd
ORDER BY txdate
RANGE BETWEEN INTERVAL '0' DAY PRECEDING
AND INTERVAL '4' DAY FOLLOWING ) AS
num_days
FROM (
select count(a.txamt) as "txcount",
sum(a.txamt) as "txsum",
b.custcd,
a.txdate
from tb_transactions a inner join tb_accounts b on a.acctno=b.acctno
where a.cashflowtype='CR'
group by b.custcd, a.txdate
having count(a.txamt)>4 and sum(a.txamt)>=1000000
) t
)
WHERE num_days = 5
ORDER BY custcd, txdate;
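The windowed count can be tried out with the minimal sketch below. It runs the same idea on SQLite from Python and assumes SQLite 3.28+, which is needed for RANGE offsets. SQLite has no DATE type, so `julianday(txdate)` stands in for Oracle date arithmetic, and `4 FOLLOWING` means four days. The table and column names mirror the question. The sample rows and customer codes `C1`/`C2` are made up. In this query, `num_days = 5` marks the first day of a run of five consecutive qualifying days. In Oracle, the query assumes `txdate` has no time component; if it does, you would need to group on `TRUNC(a.txdate)` instead.

```python
import sqlite3

QUERY = """
WITH daily AS (
    -- customer-days with more than 4 deposits totalling at least 1,000,000
    SELECT b.custcd, a.txdate
    FROM tb_transactions a
    JOIN tb_accounts b ON a.acctno = b.acctno
    WHERE a.cashflowtype = 'CR'
    GROUP BY b.custcd, a.txdate
    HAVING COUNT(a.txamt) > 4 AND SUM(a.txamt) >= 1000000
),
windowed AS (
    -- number of qualifying days in [txdate, txdate + 4 days]
    SELECT custcd, txdate,
           COUNT(*) OVER (PARTITION BY custcd
                          ORDER BY julianday(txdate)
                          RANGE BETWEEN CURRENT ROW AND 4 FOLLOWING) AS num_days
    FROM daily
)
SELECT custcd, txdate
FROM windowed
WHERE num_days = 5          -- txdate starts a run of 5 consecutive days
ORDER BY custcd, txdate
"""

def find_runs():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tb_accounts (acctno INTEGER, custcd TEXT);
        CREATE TABLE tb_transactions (acctno INTEGER, txamt INTEGER,
                                      txdate TEXT, cashflowtype TEXT);
    """)
    conn.executemany("INSERT INTO tb_accounts VALUES (?, ?)",
                     [(1, "C1"), (2, "C2")])
    rows = []
    # C1: five deposits of 200,000 on each of Jan 1-5 (five consecutive days)
    for day in (1, 2, 3, 4, 5):
        rows += [(1, 200000, f"2020-01-0{day}", "CR")] * 5
    # C2: same pattern on Jan 1-4 and Jan 6; the gap on Jan 5 breaks the run
    for day in (1, 2, 3, 4, 6):
        rows += [(2, 200000, f"2020-01-0{day}", "CR")] * 5
    conn.executemany("INSERT INTO tb_transactions VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(find_runs())  # [('C1', '2020-01-01')]
```

A run longer than five days returns one row per possible starting day. For example, six consecutive days return the first two days.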


Getting the number of users for this year and last year in SQL query

My table structure looks like this:
root_tstamp                   | userId
------------------------------|-------
2022-01-26T00:13:24.725+00:00 | d2212
2022-01-26T00:13:24.669+00:00 | ad323
2022-01-26T00:13:24.629+00:00 | adfae
2022-01-26T00:13:24.573+00:00 | adfa3
2022-01-26T00:13:24.552+00:00 | adfef
...                           | ...
2021-01-26T00:12:24.725+00:00 | d2212
2021-01-26T00:15:24.669+00:00 | daddfe
2021-01-26T00:14:24.629+00:00 | adfda
2021-01-26T00:12:24.573+00:00 | 466eff
2021-01-26T00:12:24.552+00:00 | adfafe
Using SQL, I want to get the number of users per day in the current year alongside the number for the same day in the previous year, like below:
Date       | Users | previous_year
-----------|-------|--------------
2022-01-01 | 10    | 5
2022-01-02 | 20    | 15
The query I have used is:
with base as (
select
date(root_tstamp) as current_date
, count(distinct userid) as signup_counts
from table1
group by 1
)
select
t1.current_date
, t1.signup_counts as signups_this_year
, t2.signup_counts as signups_last_year
, t1.signup_counts - t2.signups_counts as difference
from base t1
left join base t2 on t1.current_date = t2.current_date + interval '1 year'
group by t1.current_date
order by t1.current_date Desc
But I am getting this error:
ERROR: column t2.signups_counts does not exist
It's because t2.signup_counts is misspelled as t2.signups_counts.
Also note that your query groups only by current_date. Because the other selected columns are not aggregates, you have to include them in the GROUP BY too. (base already has one row per date, so you could also drop the GROUP BY entirely.)
Here is the modified query:
with base as (
select
date(root_tstamp) as current_date
, count(distinct userid) as signup_counts
from table1
group by 1
)
select
t1.current_date
, t1.signup_counts as signups_this_year
, t2.signup_counts as signups_last_year
, t1.signup_counts - t2.signup_counts as difference
from base t1
left join base t2 on t1.current_date = t2.current_date + interval '1 year'
group by t1.current_date, t2.signup_counts, t1.signup_counts
order by t1.current_date Desc
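The corrected join can be checked with the small SQLite sketch below, run from Python. SQLite's `date(t2.day, '+1 year')` replaces the interval arithmetic. The CTE column is renamed to `day` (a hypothetical name) because `current_date` is a keyword in SQLite. The rows are a subset of the question's sample data.

```python
import sqlite3

QUERY = """
WITH base AS (
    SELECT date(root_tstamp) AS day,
           COUNT(DISTINCT userid) AS signup_counts
    FROM table1
    GROUP BY date(root_tstamp)
)
SELECT t1.day,
       t1.signup_counts AS signups_this_year,
       t2.signup_counts AS signups_last_year,
       t1.signup_counts - t2.signup_counts AS difference
FROM base t1
-- t2 is the same calendar day one year earlier
LEFT JOIN base t2 ON t1.day = date(t2.day, '+1 year')
ORDER BY t1.day DESC
"""

def year_over_year(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (root_tstamp TEXT, userid TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

sample = [
    ("2022-01-26T00:13:24.725+00:00", "d2212"),
    ("2022-01-26T00:13:24.669+00:00", "ad323"),
    ("2022-01-26T00:13:24.629+00:00", "adfae"),
    ("2021-01-26T00:12:24.725+00:00", "d2212"),
    ("2021-01-26T00:15:24.669+00:00", "daddfe"),
]
print(year_over_year(sample))
# [('2022-01-26', 3, 2, 1), ('2021-01-26', 2, None, None)]
```

The earliest year has no matching row from the year before, so the LEFT JOIN leaves its last-year count and difference as NULL.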

Query that tabulates multiple different dates in a given record

I have a table with a single row per customer that holds the dates of that customer's events. For instance:
custid | lead_created_date | signed_contract_date | completed_date
-------|-------------------|----------------------|---------------
9999   | 01-01-1980        | 02-23-1980           | 03-15-1980
2222   | 01-15-1980        | 01-18-1980           | 02-13-1980
Now I need statistics on in-period events: essentially, how many events of each type happened in each month. For the example above:
Month   | Leads Created | Signed Contracts | Completed Projects
--------|---------------|------------------|-------------------
01-1980 | 2             | 1                | 0
02-1980 | 0             | 1                | 1
03-1980 | 0             | 0                | 1
This is different from a cohort view, where I would simply take the lead dates and show the cohorts; I need the in-period activity. The example above is simplified, and I actually have a dozen or so stages. The code below works for two stages. However, when I nest a third copy for the third date column, BigQuery fails with a "resources exceeded" / "too many subqueries or subqueries too complex" error.
There has to be a better way to accomplish this in a single view.
SELECT * FROM (
### Count SRA in period
WITH dates AS(
SELECT format_datetime("%Y %B", day) as YEAR_MONTH
FROM UNNEST(
GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 MONTH)) as day
)
SELECT dates.YEAR_MONTH
, COUNT(CASE WHEN dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B",Lead_Created_Datetime) THEN 1 END) AS Leads_Created
FROM dates
JOIN `Main_Reporting_V` LEADS ON dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B", LEADS.Lead_Created_Datetime)
GROUP BY dates.YEAR_MONTH) L
#### Count Opptys in period
JOIN (
WITH dates AS(
SELECT format_datetime("%Y %B", day) as YEAR_MONTH
FROM UNNEST(
GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 MONTH)) as day
)
SELECT dates.YEAR_MONTH
, COUNT(CASE WHEN dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B",Opptys.Oppty_Created_Datetime) THEN 1 END) AS Opptys_Created
FROM dates
JOIN `Main_Reporting_V` OPPTYS ON dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B", Opptys.Oppty_Created_Datetime)
GROUP BY dates.YEAR_MONTH
) O
ON L.YEAR_MONTH = O.YEAR_MONTH
I believe you mean the following, which still throws the "too many subqueries or subqueries too complex" error:
### Count SRA IN period
WITH
dates AS (SELECT format_datetime("%Y %B",day) AS YEAR_MONTH
FROM UNNEST( GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 MONTH)) AS day ),
LEADS AS (SELECT
dates.YEAR_MONTH,
COUNT(CASE WHEN dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B",Lead_Created_Datetime) THEN 1
END) AS Leads_Created
FROM dates
LEFT JOIN `CCA_Main_Reporting_V` LEADS
ON dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B", LEADS.Lead_Created_Datetime)
GROUP BY dates.YEAR_MONTH)
### Count Opptys IN period
,OPPTYS AS (SELECT
dates.YEAR_MONTH,
COUNT(CASE WHEN dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B",Opptys.Oppty_Created_Datetime) THEN 1
END ) AS Opptys_Created
FROM dates
JOIN `CCA_Main_Reporting_V` OPPTYS
ON dates.YEAR_MONTH = FORMAT_TIMESTAMP("%Y %B", Opptys.Oppty_Created_Datetime)
GROUP BY dates.YEAR_MONTH )
#
SELECT
dates.YEAR_MONTH,
Leads_Created,
Opptys_Created
FROM
dates
LEFT JOIN LEADS ON dates.YEAR_MONTH = LEADS.YEAR_MONTH
LEFT JOIN OPPTYS on dates.YEAR_MONTH = OPPTYS.YEAR_MONTH
#LEFT JOIN
# OPPTYS ON dates.YEAR_MONTH = LEADS.YEAR_MONTH
LIMIT 1000
Simplifying the query might help. Try generating the dates with a single GENERATE_DATE_ARRAY subquery. Then LEFT JOIN the other three subqueries to it, each one like select lead_created_date, count(*) from table group by lead_created_date.
UPDATE:
More like:
WITH dates AS (
SELECT format_datetime("%Y %B", day) AS YEAR_MONTH
FROM UNNEST( GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 MONTH)) AS day
),
LEADS AS (
SELECT
FORMAT_TIMESTAMP("%Y %B", Lead_Created_Datetime) AS YEAR_MONTH,
COUNT(*) AS Leads_Created
FROM `CCA_Main_Reporting_V`
GROUP BY YEAR_MONTH
)
,OPPTYS AS (
SELECT
FORMAT_TIMESTAMP("%Y %B", Oppty_Created_Datetime) AS YEAR_MONTH,
COUNT(*) AS Opptys_Created
FROM `CCA_Main_Reporting_V`
GROUP BY YEAR_MONTH
)
SELECT
dates.YEAR_MONTH,
Leads_Created,
Opptys_Created
FROM
dates
LEFT JOIN LEADS ON dates.YEAR_MONTH = LEADS.YEAR_MONTH
LEFT JOIN OPPTYS on dates.YEAR_MONTH = OPPTYS.YEAR_MONTH
#LEFT JOIN
# OPPTYS ON dates.YEAR_MONTH = LEADS.YEAR_MONTH
LIMIT 1000
After much soul-searching, I realized that the underlying table was a view. My query wasn't too complicated, but the view definition is extremely convoluted (not my code). Once I saved the view as a table, Sergey's answer worked brilliantly!
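The sketch below shows the shape of Sergey's suggestion: generate one month spine, build one small aggregate per stage, then LEFT JOIN each aggregate to the spine. It runs on SQLite from Python, so it approximates the BigQuery query rather than reproducing it:

- a recursive CTE replaces `GENERATE_DATE_ARRAY`;
- `strftime('%Y-%m', ...)` replaces `FORMAT_TIMESTAMP`;
- the table name `events` and the ISO date strings are assumptions.

The rows are the question's example. With a dozen stages, you would add one aggregate CTE and one LEFT JOIN per stage.

```python
import sqlite3

QUERY = """
WITH RECURSIVE months(m) AS (           -- month spine, generated once
    SELECT '1980-01-01'
    UNION ALL
    SELECT date(m, '+1 month') FROM months WHERE m < '1980-03-01'
),
leads AS (                               -- one small aggregate per stage
    SELECT strftime('%Y-%m', lead_created_date) AS ym, COUNT(*) AS n
    FROM events GROUP BY 1
),
signed AS (
    SELECT strftime('%Y-%m', signed_contract_date) AS ym, COUNT(*) AS n
    FROM events GROUP BY 1
),
completed AS (
    SELECT strftime('%Y-%m', completed_date) AS ym, COUNT(*) AS n
    FROM events GROUP BY 1
)
SELECT strftime('%Y-%m', months.m) AS ym,
       COALESCE(leads.n, 0)     AS leads_created,
       COALESCE(signed.n, 0)    AS signed_contracts,
       COALESCE(completed.n, 0) AS completed_projects
FROM months
LEFT JOIN leads     ON leads.ym     = strftime('%Y-%m', months.m)
LEFT JOIN signed    ON signed.ym    = strftime('%Y-%m', months.m)
LEFT JOIN completed ON completed.ym = strftime('%Y-%m', months.m)
ORDER BY 1
"""

EVENTS = [
    (9999, "1980-01-01", "1980-02-23", "1980-03-15"),
    (2222, "1980-01-15", "1980-01-18", "1980-02-13"),
]

def monthly_activity(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (custid INTEGER, lead_created_date TEXT,"
                 " signed_contract_date TEXT, completed_date TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(monthly_activity(EVENTS))
# [('1980-01', 2, 1, 0), ('1980-02', 0, 1, 1), ('1980-03', 0, 0, 1)]
```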

Sum for a rolling total

I have the following query:
select b.month_date,total_signups,active_users from
(
SELECT date_trunc('month',confirmed_at) as month_date
, count(distinct id) as total_signups
FROM follower.users
WHERE confirmed_at::date >= dateadd(day,-90,getdate())::date
and (deleted_at is null or deleted_at > date_trunc('month',confirmed_at))
group by 1
) a ,
(
SELECT date_trunc('month', inv.created_at) AS month_date
,COUNT(DISTINCT em.user_id) AS active_users
FROM follower.invitees inv
INNER JOIN follower.events em
ON inv.event_id = em.event_id
where inv.created_at::date >= dateadd(day,-90,getdate())::date
GROUP BY 1
) b
where a.month_date=b.month_date
This returns three columns: month date, total signups, and active users. What I need is a fourth column with a rolling total of signups. I've tried OVER and PARTITION BY window functions with no luck. Could someone help? I'd appreciate it very much.
Try adding this column definition to your outer SELECT:
SUM(total_signups)
OVER (ORDER BY b.month_date ASC rows between unbounded preceding and current row)
AS running_total
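The sketch below shows what the frame clause does. It runs on SQLite from Python and assumes SQLite 3.25+ for window functions. The hypothetical `monthly` table stands in for the result of the two joined subqueries, and the numbers are made up. Because there is one row per month, the `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame adds up every month up to and including the current one.

```python
import sqlite3

def running_totals(monthly):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly (month_date TEXT, total_signups INTEGER)")
    conn.executemany("INSERT INTO monthly VALUES (?, ?)", monthly)
    rows = conn.execute("""
        SELECT month_date, total_signups,
               -- cumulative sum from the first month through the current row
               SUM(total_signups) OVER (
                   ORDER BY month_date ASC
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_total
        FROM monthly
        ORDER BY month_date
    """).fetchall()
    conn.close()
    return rows

print(running_totals([("2020-01-01", 10), ("2020-02-01", 5), ("2020-03-01", 7)]))
# [('2020-01-01', 10, 10), ('2020-02-01', 5, 15), ('2020-03-01', 7, 22)]
```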

Oracle SQL Hierarchy Summation

I have a table TRANS that contains the following records:
TRANS_ID TRANS_DT QTY
1 01-Aug-2020 5
1 01-Aug-2020 1
1 03-Aug-2020 2
2 02-Aug-2020 1
The expected output:
TRANS_ID TRANS_DT BEGBAL TOTAL END_BAL
1 01-Aug-2020 0 6 6
1 02-Aug-2020 6 0 6
1 03-Aug-2020 6 2 8
2 01-Aug-2020 0 0 0
2 02-Aug-2020 0 1 1
2 03-Aug-2020 1 0 1
Each trans_id starts with a beginning balance of 0 (01-Aug-2020). For succeeding days, the beginning balance is the ending balance of the previous day and so on.
I can write a PL/SQL block to produce the output. Is it possible to get the output with a single SQL statement?
Thanks.
Try the following script using CTEs:
WITH CTE
AS
(
SELECT DISTINCT A.TRANS_ID,B.TRANS_DT
FROM your_table A
CROSS JOIN (SELECT DISTINCT TRANS_DT FROM your_table) B
),
CTE2
AS
(
SELECT C.TRANS_ID,C.TRANS_DT,SUM(D.QTY) QTY
FROM CTE C
LEFT JOIN your_table D
ON C.TRANS_ID = D.TRANS_ID
AND C.TRANS_DT = D.TRANS_DT
GROUP BY C.TRANS_ID,C.TRANS_DT
ORDER BY C.TRANS_ID,C.TRANS_DT
)
SELECT F.TRANS_ID,F.TRANS_DT,
(
SELECT COALESCE (SUM(QTY), 0) FROM CTE2 E
WHERE E.TRANS_ID = F.TRANS_ID AND E.TRANS_DT < F.TRANS_DT
) BEGBAL,
(
SELECT COALESCE (SUM(QTY), 0) FROM CTE2 E
WHERE E.TRANS_ID = F.TRANS_ID AND E.TRANS_DT = F.TRANS_DT
) TOTAL ,
(
SELECT COALESCE (SUM(QTY), 0) FROM CTE2 E
WHERE E.TRANS_ID = F.TRANS_ID AND E.TRANS_DT <= F.TRANS_DT
) END_BAL
FROM CTE2 F
You can also do it like this (I assume it's a bit faster):
with
dt_between as (
select mindt + level - 1 as trans_dt
from (select min(trans_dt) as mindt, max(trans_dt) as maxdt from t)
connect by level <= maxdt - mindt + 1
),
dt_for_trans_id as (
select *
from dt_between, (select distinct trans_id from t)
),
qty_change as (
select distinct trans_id, trans_dt,
sum(qty) over (partition by trans_id, trans_dt) as total,
sum(qty) over (partition by trans_id order by trans_dt) as end_bal
from t
right outer join dt_for_trans_id using (trans_id, trans_dt)
)
select
trans_id,
to_char(trans_dt, 'DD-Mon-YYYY') as trans_dt,
nvl(lag(end_bal) over (partition by trans_id order by trans_dt), 0) as beg_bal,
nvl(total, 0) as total,
nvl(end_bal, 0) as end_bal
from qty_change q
order by trans_id, trans_dt
dt_between returns all the days between min(trans_dt) and max(trans_dt) in your data.
dt_for_trans_id returns all these days for each trans_id in your data.
qty_change computes the change for each day (TOTAL in your example) and the cumulative sum over all days up to and including that day (END_BAL in your example).
The main select takes END_BAL from the previous day as BEG_BAL and also formats the final output.
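The sketch below runs the same steps on SQLite from Python:

1. generate the days;
2. pair every day with every trans_id;
3. compute TOTAL per day;
4. take a cumulative sum for END_BAL;
5. use LAG on END_BAL to get BEGBAL.

It is a translation, not the Oracle query itself. A recursive CTE replaces CONNECT BY, and dates are stored as ISO strings. The data is the question's TRANS sample.

```python
import sqlite3

QUERY = """
WITH RECURSIVE days(d, max_d) AS (         -- every day between min and max TRANS_DT
    SELECT MIN(trans_dt), MAX(trans_dt) FROM trans
    UNION ALL
    SELECT date(d, '+1 day'), max_d FROM days WHERE d < max_d
),
grid AS (                                  -- every day paired with every trans_id
    SELECT ids.trans_id, days.d AS trans_dt
    FROM (SELECT DISTINCT trans_id FROM trans) AS ids
    CROSS JOIN days
),
daily AS (                                 -- TOTAL: net quantity per id per day
    SELECT g.trans_id, g.trans_dt, COALESCE(SUM(t.qty), 0) AS total
    FROM grid AS g
    LEFT JOIN trans AS t
           ON t.trans_id = g.trans_id AND t.trans_dt = g.trans_dt
    GROUP BY g.trans_id, g.trans_dt
),
running AS (                               -- END_BAL: cumulative sum of TOTAL
    SELECT trans_id, trans_dt, total,
           SUM(total) OVER (PARTITION BY trans_id ORDER BY trans_dt) AS end_bal
    FROM daily
)
SELECT trans_id, trans_dt,
       -- BEGBAL: previous day's END_BAL, 0 on the first day
       COALESCE(LAG(end_bal) OVER (PARTITION BY trans_id ORDER BY trans_dt), 0) AS begbal,
       total,
       end_bal
FROM running
ORDER BY trans_id, trans_dt
"""

TRANS = [
    (1, "2020-08-01", 5),
    (1, "2020-08-01", 1),
    (1, "2020-08-03", 2),
    (2, "2020-08-02", 1),
]

def trans_balances(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trans (trans_id INTEGER, trans_dt TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO trans VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in trans_balances(TRANS):
    print(row)
```

The six rows it prints match the expected output in the question, for example `(1, '2020-08-02', 6, 0, 6)` for the day with no transactions.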
First, generate the dates. Next, aggregate your values by TRANS_ID and TRANS_DT. Then left join the aggregated data to the dates. The easiest way to get the required sums is to use analytic (window) functions:
with dates(dt) as ( -- generating dates between min(TRANS_DT) and max(TRANS_DT) from TRANS
select min(trans_dt) from trans
union all
select dt+1 from dates
where dt+1<=(select max(trans_dt) from trans)
)
,trans_agg as ( -- aggregating QTY in TRANS
select TRANS_ID,TRANS_DT,sum(QTY) as QTY
from trans
group by TRANS_ID,TRANS_DT
)
select -- using left join partition by to get data on daily basis for each trans_id:
dt,
trans_id,
nvl(sum(qty) over(partition by trans_id order by dates.dt range between unbounded preceding and 1 preceding),0) as BEGBAL,
nvl(qty,0) as TOTAL,
nvl(sum(qty) over(partition by trans_id order by dates.dt),0) as END_BAL
from dates
left join trans_agg tr
partition by (trans_id)
on tr.trans_dt=dates.dt;
Full example with sample data:
alter session set nls_date_format='dd-mon-yyyy';
with trans(TRANS_ID,TRANS_DT,QTY) as (
select 1,to_date('01-Aug-2020'), 5 from dual union all
select 1,to_date('01-Aug-2020'), 1 from dual union all
select 1,to_date('03-Aug-2020'), 2 from dual union all
select 2,to_date('02-Aug-2020'), 1 from dual
)
,dates(dt) as ( -- generating dates between min(TRANS_DT) and max(TRANS_DT) from TRANS
select min(trans_dt) from trans
union all
select dt+1 from dates
where dt+1<=(select max(trans_dt) from trans)
)
,trans_agg as ( -- aggregating QTY in TRANS
select TRANS_ID,TRANS_DT,sum(QTY) as QTY
from trans
group by TRANS_ID,TRANS_DT
)
select
dt,
trans_id,
nvl(sum(qty) over(partition by trans_id order by dates.dt range between unbounded preceding and 1 preceding),0) as BEGBAL,
nvl(qty,0) as TOTAL,
nvl(sum(qty) over(partition by trans_id order by dates.dt),0) as END_BAL
from dates
left join trans_agg tr
partition by (trans_id)
on tr.trans_dt=dates.dt;
You can use a recursive query to generate the overall date range and cross join it with the list of distinct trans_id values. Then bring in the table with a left join. The last step is aggregation and window functions:
with all_dates (trans_dt, max_dt) as (
select min(trans_dt), max(trans_dt) from trans
union all
select trans_dt + interval '1' day, max_dt from all_dates where trans_dt < max_dt
)
select
i.trans_id,
d.trans_dt,
coalesce(sum(sum(t.qty)) over(partition by i.trans_id order by d.trans_dt), 0) - coalesce(sum(t.qty), 0) begbal,
coalesce(sum(t.qty), 0) total,
coalesce(sum(sum(t.qty)) over(partition by i.trans_id order by d.trans_dt), 0) endbal
from all_dates d
cross join (select distinct trans_id from trans) i
left join trans t on t.trans_id = i.trans_id and t.trans_dt = d.trans_dt
group by i.trans_id, d.trans_dt
order by i.trans_id, d.trans_dt

Group by in columns and rows, counts and percentages per day

I have a table with data like the following:
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group the rows by the attr column and count them. I also want additional columns showing the counts and percentages per day, as shown below:
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count using GROUP BY, but I can't figure out how to split the counts into multiple columns. I tried to generate the day1 percentage with:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this doesn't give the correct answer either: I get all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm doing this in Redshift, which follows PostgreSQL syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount::float/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to a day-by-day layout if you need it.
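The two-CTE idea can be run as the SQLite sketch below, called from Python. CTE1 is assumed to group by `attr` and the calendar day, and SQLite's `date(time)` replaces Redshift's `time::date`. Multiplying by `100.0` first stops SQLite from doing integer division. The rows and the table name `my_table` come from the question.

```python
import sqlite3

QUERY = """
WITH cte1 AS (                 -- rows per attr per calendar day
    SELECT attr, date(time) AS theday, COUNT(*) AS thecount
    FROM my_table
    GROUP BY attr, date(time)
),
cte2 AS (                      -- rows per calendar day
    SELECT theday, SUM(thecount) AS daytotal
    FROM cte1
    GROUP BY theday
)
SELECT t1.attr, t1.theday, t1.thecount,
       ROUND(100.0 * t1.thecount / t2.daytotal, 1) AS percentofday
FROM cte1 t1
JOIN cte2 t2 ON t1.theday = t2.theday
ORDER BY t1.theday DESC, t1.attr
"""

SAMPLE = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

def share_per_day(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in share_per_day(SAMPLE):
    print(row)
```

Each printed row is one (attr, day) pair with its share of that day's rows. These are the same numbers that the pivot then spreads into day1/day2 columns.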
Here is my attempt to extend johnHC's query. By the way, if you need 7 days, you have to add a CASE WHEN expression for each of those days.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount::float/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
If you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already covered in the other answers. Instead of joining the CTEs to calculate the values, I use window functions, which I think is a bit shorter and more readable. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counts the rows per day AND attr, and the rows per day.
B: for readability, converts the date into a number: the difference between the maximum date in the table and the row's date. This gives a counter from 0 (the most recent day, shown first) up to n - 1 (the oldest day).
C: calculates the percentage and rounds it.
D: pivots by filtering on the day numbers. COALESCE replaces the NULL values with 0. To add more days, repeat these columns for each additional day number.
Edit: made the day counter more flexible so it works for more days.
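The sketch below runs steps A to D on SQLite from Python. It assumes SQLite 3.30+, which supports the aggregate FILTER clause. A few details differ from the answer:

- the columns `count` and `percent` are renamed `cnt` and `pct` (hypothetical names);
- `julianday()` replaces date subtraction;
- the percentage is scaled to 0-100 and rounded to one decimal to match the question's layout.

The rows are the question's sample data.

```python
import sqlite3

QUERY = """
SELECT attr,
       COALESCE(MAX(cnt) FILTER (WHERE day_number = 0), 0) AS day1_count,   -- D
       COALESCE(MAX(pct) FILTER (WHERE day_number = 0), 0) AS day1_percent,
       COALESCE(MAX(cnt) FILTER (WHERE day_number = 1), 0) AS day2_count,
       COALESCE(MAX(pct) FILTER (WHERE day_number = 1), 0) AS day2_percent
FROM (
    SELECT *, ROUND(100.0 * cnt / count_per_day, 1) AS pct              -- C
    FROM (
        SELECT DISTINCT
            attr,
            -- B: 0 for the most recent day, 1 for the day before, and so on
            CAST(MAX(julianday(date(time))) OVER () - julianday(date(time))
                 AS INTEGER) AS day_number,
            COUNT(*) OVER (PARTITION BY date(time), attr) AS cnt,      -- A
            COUNT(*) OVER (PARTITION BY date(time)) AS count_per_day
        FROM test_table
    ) AS s1
) AS s2
GROUP BY attr
ORDER BY attr
"""

SAMPLE = [
    ("abc", "2018-08-06 10:17:25.282546"),
    ("def", "2018-08-06 10:17:25.325676"),
    ("pqr", "2018-08-05 10:17:25.366823"),
    ("abc", "2018-08-06 10:17:25.407941"),
    ("def", "2018-08-05 10:17:25.449249"),
]

def pivot_per_day(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (attr TEXT, time TEXT)")
    conn.executemany("INSERT INTO test_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in pivot_per_day(SAMPLE):
    print(row)
```

The output has one row per attr, laid out like the table in the question (abc: 2 and about 66.7 for day 1, 0 and 0 for day 2).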
Basically, I see this as conditional aggregation, but you need an enumerator for the dates to pivot on. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COALESCE(100 * COUNT(*) FILTER (WHERE day_number = 1) / MAX(day_cnt) FILTER (WHERE day_number = 1), 0) as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COALESCE(100 * COUNT(*) FILTER (WHERE day_number = 2) / MAX(day_cnt) FILTER (WHERE day_number = 2), 0) as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY time::date) as day_cnt
FROM test_table
) s
GROUP BY attr
ORDER BY attr;