Oracle SQL query to get Weekly Records

I have an Oracle database table 'my_table' with a few sample rows as follows:
Case_ID  Start_Date  End_Date    STATUS
123      01/10/2018  03/10/2018  Close
124      02/10/2018              Open
125      03/10/2018  05/10/2018  Close
126      04/10/2018              Open
127      05/10/2018  07/10/2018  Close
128      06/10/2018              Open
129      07/10/2018  09/10/2018  Close
130      08/10/2018  10/10/2018  Close
131      09/10/2018              Open
I want the output in the following format:
Week_No  Inflow  Outflow  Total_Backlog
40       7       4        3
41       2       1        4
How can I combine the following three queries into one to get the desired output in the above format?
SELECT to_char(Start_Date,'IW') Week_No, count(CASE_ID) as Inflow
FROM my_table
GROUP BY to_char(Start_Date,'IW');
SELECT to_char(End_Date,'IW') Week_No, count(CASE_ID) as Outflow
FROM my_table
WHERE status='Close'
GROUP BY to_char(End_Date,'IW');
SELECT to_char(Start_Date,'IW') Week_No, count(CASE_ID) as Total_Backlog
FROM my_table
WHERE status <> 'Close'
GROUP BY to_char(Start_Date,'IW');

You might use:
select week_no, sum(nvl(Inflow,0)) as Inflow,
sum(nvl(Outflow,0)) as Outflow,
sum(nvl(Total_Backlog,0)) as Total_Backlog
from
(
select to_char(Start_Date,'IW') Week_No,
count(CASE_ID) as Inflow,
( case when STATUS != 'Close' then count(CASE_ID) end ) as Total_Backlog,
null Outflow
from my_table
group by to_char(Start_Date,'IW'), status
union all
select to_char(End_Date,'IW') Week_No,
null as Inflow, null as Total_Backlog,
( case when STATUS = 'Close' then count(CASE_ID) end ) as Outflow
from my_table
where End_Date is not null
group by to_char(End_Date,'IW'), status
)
group by week_no
order by week_no;
WEEK_NO INFLOW OUTFLOW TOTAL_BACKLOG
40 7 3 3
41 2 2 1
Or you may prefer a slightly different approach:
select week_no, sum(nvl(Inflow,0)) as Inflow,
sum(nvl(Outflow,0)) as Outflow,
sum(nvl(Total_Backlog,0)) as Total_Backlog
from
(
select to_char(Start_Date,'IW') Week_No,
count(CASE_ID) as Inflow,
count( case when STATUS != 'Close' then CASE_ID end ) as Total_Backlog,
null Outflow
from my_table
group by to_char(Start_Date,'IW')
union all
select to_char(End_Date,'IW') Week_No,
null as Inflow, null as Total_Backlog,
count( case when STATUS = 'Close' then CASE_ID end ) as Outflow
from my_table
where End_Date is not null
group by to_char(End_Date,'IW')
)
group by week_no
order by week_no;
WEEK_NO INFLOW OUTFLOW TOTAL_BACKLOG
40 7 3 3
41 2 2 1
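Outside SQL, the same conditional-count logic (every case counts as inflow in its start week, open cases as backlog, closed cases as outflow in their end week) can be sketched in Python; `isocalendar()` stands in for `TO_CHAR(date, 'IW')`, and the row data simply mirrors the sample table.

```python
from collections import defaultdict
from datetime import date

# (case_id, start_date, end_date, status) -- mirrors the sample table
rows = [
    (123, date(2018, 10, 1), date(2018, 10, 3), "Close"),
    (124, date(2018, 10, 2), None, "Open"),
    (125, date(2018, 10, 3), date(2018, 10, 5), "Close"),
    (126, date(2018, 10, 4), None, "Open"),
    (127, date(2018, 10, 5), date(2018, 10, 7), "Close"),
    (128, date(2018, 10, 6), None, "Open"),
    (129, date(2018, 10, 7), date(2018, 10, 9), "Close"),
    (130, date(2018, 10, 8), date(2018, 10, 10), "Close"),
    (131, date(2018, 10, 9), None, "Open"),
]

def iso_week(d):
    return d.isocalendar()[1]          # same idea as TO_CHAR(d, 'IW')

stats = defaultdict(lambda: {"inflow": 0, "outflow": 0, "backlog": 0})
for case_id, start, end, status in rows:
    wk = iso_week(start)
    stats[wk]["inflow"] += 1                      # every case is inflow in its start week
    if status != "Close":
        stats[wk]["backlog"] += 1                 # open cases stay in the backlog
    if end is not None and status == "Close":
        stats[iso_week(end)]["outflow"] += 1      # closed cases are outflow in their end week
```

Running this gives week 40 with inflow 7, outflow 3, backlog 3, and week 41 with inflow 2, outflow 2, backlog 1, matching the query output above.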

Categorising transactions in past 90 days

I have a table with the following columns:
Transaction_id (T_id): distinct id generated for each transaction
Date (Dt): date of the transaction
account_id (Ac_id): the id from which the transaction is done
Org_id (O_id): the id given to the organization. One organization can have multiple accounts, so different account ids can share the same org_id.
Sample table:
T_id  Dt       Ac_id  O_id
101   23/4/22  1      A
102   06/7/22  3      C
103   01/8/22  2      A
104   13/3/22  6      B
The question is to mark the o_id where transactions were done in the past 90 days as 1, and others as 0.
Output:
T_id  Dt       Ac_id  O_id  Mark
101   23/4/22  1      A     0
102   06/7/22  3      C     1
103   01/8/22  2      A     1
104   13/3/22  6      B     0
The query I am using is:
Select *,
Case when datediff('day', Dt, current_date()) between 0 and 90 then '1'
Else '0'
End as Mark
From Table1
Desired Output:
T_id  Dt       Ac_id  O_id  Mark
101   23/4/22  1      A     1
102   06/7/22  3      C     1
103   01/8/22  2      A     1
104   13/3/22  6      B     0
For o_id 'A', I want Mark to be 1 in every row, since at least one of its transactions was done within the past 90 days, irrespective of other transactions done before that window.
I have to join this output to another table, so I need every o_id with any transaction in the past 90 days marked as '1'.
Please help me with it.
The easiest approach is to compare the date difference between the current date and a windowed MAX partitioned by o_id:
SELECT *,
CASE
WHEN DATEDIFF('day', (MAX(Dt) OVER(PARTITION BY o_id)), CURRENT_DATE()) <= 90
THEN 1
ELSE 0
END AS Mark
FROM Tab;
Sample data:
ALTER SESSION SET DATE_INPUT_FORMAT = 'DD/MM/YYYY';
CREATE OR REPLACE TABLE tab(t_id INT,
Dt Date,
Ac_id INT,
O_id TEXT)
AS
SELECT 101, '23/04/2022' ,1 ,'A' UNION
SELECT 102, '06/07/2022' ,3 ,'C' UNION
SELECT 103, '01/08/2022' ,2 ,'A' UNION
SELECT 104, '13/03/2022' ,6 ,'B';
Snowflake supports natively BOOLEAN data types so entire query could be just:
SELECT *,
DATEDIFF('day', (MAX(Dt) OVER(PARTITION BY o_id)), CURRENT_DATE()) <= 90 AS Mark
FROM tab
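For intuition, the effect of `MAX(Dt) OVER (PARTITION BY o_id)` can be emulated in plain Python; a sketch only, where the fixed `today` value is my assumption so the result stays deterministic.

```python
from datetime import date

today = date(2022, 8, 10)   # stand-in for CURRENT_DATE(), fixed for the example

# (t_id, dt, ac_id, o_id) -- mirrors the sample rows
rows = [
    (101, date(2022, 4, 23), 1, "A"),
    (102, date(2022, 7, 6),  3, "C"),
    (103, date(2022, 8, 1),  2, "A"),
    (104, date(2022, 3, 13), 6, "B"),
]

# MAX(Dt) OVER (PARTITION BY o_id): latest transaction date per organisation
latest = {}
for _, dt, _, o_id in rows:
    if o_id not in latest or dt > latest[o_id]:
        latest[o_id] = dt

# Mark = 1 when the org's most recent transaction is within 90 days
marked = [(t_id, dt, ac_id, o_id, int((today - latest[o_id]).days <= 90))
          for t_id, dt, ac_id, o_id in rows]
```

Both rows of o_id 'A' get Mark = 1 here, because the per-organisation maximum (not the row's own date) drives the comparison.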
Create a subquery that identifies all the distinct o_id values with a recent transaction, and use that to build the main result.
The subquery would be:
select o_id, max(Dt) from table1
group by o_id
having datediff('day', max(Dt), current_date()) between 0 and 90;
Then your main query becomes:
Select *,'1' as Mark
From Tab
where o_id in
(select x.o_id from (select o_id, max(Dt)
from tab
group by o_id
having datediff('day', max(Dt), current_date()) between 0 and 90) x)
union all
select *,'0' as Mark
from Tab
where o_id not in
(select x.o_id from (select o_id, max(Dt)
from tab
group by o_id
having datediff('day', max(Dt), current_date()) between 0 and 90) x);

Get cumulative distinct count of active ids (ids where deleted date is null as of/before the modified date)

I am facing a problem getting the cumulative distinct count of resource ids as of different modified dates in Vertica. In the table below I have a resource id, a modified date, and a deleted date, and I want the count of distinct active resources as of each unique modified date. A resource is considered active when its deleted date is null as of (or before) that modified date.
I was able to get the count when, for a particular resource (say resource id 1), the active rows (deleted date null) and inactive rows (deleted date not null) don't occur consecutively.
But when they occur consecutively, I have to count the resource as 1 until it becomes inactive, then as 0 for that resource id while it stays inactive (across all consecutive inactive rows) until it becomes active again; likewise for all the distinct resource ids, with a cumulative sum over those.
sa_resource_id  modified_date            deleted_Date
1               2022-01-22 15:46:06.758
2               2022-01-22 15:46:06.758
16              2022-04-22 15:46:06.758
17              2022-04-22 15:46:06.758
18              2022-04-22 15:46:06.758
16              2022-04-29 15:46:06.758  2022-04-29 15:46:06.758
17              2022-04-29 15:46:06.758  2022-04-29 15:46:06.758
1               2022-05-22 15:46:06.758  2022-05-22 15:46:06.758
2               2022-05-22 15:46:06.758  2022-05-22 15:46:06.758
1               2022-05-23 22:16:06.758
1               2022-05-24 22:16:06.758  2022-05-24 22:16:06.758
1               2022-05-25 22:16:06.758
1               2022-05-27 22:16:06.758
This is the partition-and-sum query I have tried, where I partition the table by resource id and sum over the modified dates.
SELECT md,
dca_agent_count
FROM
(
SELECT modified_date AS md,
SUM(SUM(CASE WHEN deleted_Date IS NULL THEN 1
WHEN deleted_Date IS NOT NULL THEN -1 ELSE 0
END)) OVER (ORDER BY modified_date) AS dca_agent_count
FROM
(
SELECT sa_resource_id,
modified_date,
deleted_Date,
ROW_NUMBER() OVER (
PARTITION BY sa_Resource_id, deleted_Date
ORDER BY modified_date desc
) row_num
FROM mf_Shared_provider_Default.dca_entity_resource_raw
WHERE sa_ResourcE_id IS NOT NULL
AND sa_resource_id IN ('1','2','34','16','17','18')
) t
GROUP BY modified_date
ORDER BY modified_Date
) b
Current Output:
md                       dca_agent_count
2022-01-22 15:46:06.758  2
2022-04-22 15:46:06.758  5
2022-04-29 15:46:06.758  3
2022-05-22 15:46:06.758  1
2022-05-23 22:16:06.758  2
2022-05-24 22:16:06.758  1
2022-05-25 22:16:06.758  2
2022-05-27 22:16:06.758  3
All the values in the output above are correct except for the last row, 27-05-2022, where I need count 2 instead of 3.
How do I get the cumulative distinct count of sa_resource_ids as of the modified dates, based on the deleted-date condition (null/not null), such that the count does not change when the same deleted-date state occurs consecutively?
To me, a DATE has no hours, minutes, seconds, let alone second fractions, so I renamed the time containing attributes to %_ts, as they are TIMESTAMPs.
I had to completely start from scratch to solve it.
I think this is the first problem I had to solve with as many as 5 common table expressions:
1. Add a Boolean is_active that is never NULL.
2. Add the previous is_active using LAG(). NULL here means there is no predecessor for the same resource id.
3. Remove the rows whose previous is_active equals the current is_active.
4. UNION SELECT the positive COUNT DISTINCTs of the active rows and the negative COUNT DISTINCTs of the inactive rows. This also removes the last timestamp.
5. Get the distinct timestamps from the original input for the final query.
The final query takes CTE 5 and LEFT JOINs it with CTE 4, making a running sum of the obtained distinct counts.
Here goes:
WITH
-- not part of the final query: this is your input data
indata(sa_resource_id,modified_ts,deleted_ts) AS (
SELECT 1,TIMESTAMP '2022-01-22 15:46:06.758',NULL
UNION ALL SELECT 2,TIMESTAMP '2022-01-22 15:46:06.758',NULL
UNION ALL SELECT 16,TIMESTAMP '2022-04-22 15:46:06.758',NULL
UNION ALL SELECT 17,TIMESTAMP '2022-04-22 15:46:06.758',NULL
UNION ALL SELECT 18,TIMESTAMP '2022-04-22 15:46:06.758',NULL
UNION ALL SELECT 16,TIMESTAMP '2022-04-29 15:46:06.758',TIMESTAMP '2022-04-29 15:46:06.758'
UNION ALL SELECT 17,TIMESTAMP '2022-04-29 15:46:06.758',TIMESTAMP '2022-04-29 15:46:06.758'
UNION ALL SELECT 1,TIMESTAMP '2022-05-22 15:46:06.758',TIMESTAMP '2022-05-22 15:46:06.758'
UNION ALL SELECT 2,TIMESTAMP '2022-05-22 15:46:06.758',TIMESTAMP '2022-05-22 15:46:06.758'
UNION ALL SELECT 1,TIMESTAMP '2022-05-23 22:16:06.758',NULL
UNION ALL SELECT 1,TIMESTAMP '2022-05-24 22:16:06.758',TIMESTAMP '2022-05-24 22:16:06.758'
UNION ALL SELECT 1,TIMESTAMP '2022-05-25 22:16:06.758',NULL
UNION ALL SELECT 1,TIMESTAMP '2022-05-27 22:16:06.758',NULL
)
-- real query starts here, replace the following comma with "WITH" ...
,
-- need a "active flag" that is never null
w_active_flag AS (
SELECT
*
, (deleted_ts IS NULL) AS is_active
FROM indata
)
,
-- need current and previous is_active to filter ..
w_prev_flag AS (
SELECT
*
, LAG(is_active) OVER w AS prev_flag
FROM w_active_flag
WINDOW w AS(PARTITION BY sa_resource_id ORDER BY modified_ts)
)
,
-- use obtained filter arguments to filter out two consecutive
-- active or non-active rows for same sa_resource_id
-- this can remove timestamps from the final result
de_duped AS (
SELECT
sa_resource_id
, modified_ts
, is_active
FROM w_prev_flag
WHERE prev_flag IS NULL OR prev_flag <> is_active
)
-- get count distinct "sa_resource_id" only now
,
grp AS (
SELECT
modified_ts
, COUNT(DISTINCT sa_resource_id) AS dca_agent_count
FROM de_duped
WHERE is_active
GROUP BY modified_ts
UNION ALL
SELECT
modified_ts
, COUNT(DISTINCT sa_resource_id) * -1 AS dca_agent_count
FROM de_duped
WHERE NOT is_active
GROUP BY modified_ts
)
,
-- get back all input timestamps in a help table
tslist AS (
SELECT DISTINCT
modified_ts
FROM indata
)
SELECT
tslist.modified_ts
, SUM(NVL(dca_agent_count,0)) OVER w AS dca_agent_count
FROM tslist LEFT JOIN grp USING(modified_ts)
WINDOW w AS (ORDER BY tslist.modified_ts);
-- out modified_ts | dca_agent_count
-- out -------------------------+-----------------
-- out 2022-01-22 15:46:06.758 | 2
-- out 2022-04-22 15:46:06.758 | 5
-- out 2022-04-29 15:46:06.758 | 3
-- out 2022-05-22 15:46:06.758 | 1
-- out 2022-05-23 22:16:06.758 | 2
-- out 2022-05-24 22:16:06.758 | 1
-- out 2022-05-25 22:16:06.758 | 2
-- out 2022-05-27 22:16:06.758 | 2
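The same five steps can be traced outside SQL; this Python sketch (my own variable names, timestamps shortened for readability) mirrors the pipeline: flag activity, drop rows whose flag did not change per resource, turn the survivors into signed per-timestamp deltas, then run a cumulative sum over all distinct timestamps.

```python
# (sa_resource_id, modified_ts, deleted_ts) -- deleted_ts None means active
rows = [
    (1,  "2022-01-22 15:46", None),
    (2,  "2022-01-22 15:46", None),
    (16, "2022-04-22 15:46", None),
    (17, "2022-04-22 15:46", None),
    (18, "2022-04-22 15:46", None),
    (16, "2022-04-29 15:46", "2022-04-29 15:46"),
    (17, "2022-04-29 15:46", "2022-04-29 15:46"),
    (1,  "2022-05-22 15:46", "2022-05-22 15:46"),
    (2,  "2022-05-22 15:46", "2022-05-22 15:46"),
    (1,  "2022-05-23 22:16", None),
    (1,  "2022-05-24 22:16", "2022-05-24 22:16"),
    (1,  "2022-05-25 22:16", None),
    (1,  "2022-05-27 22:16", None),
]

# steps 1-3: per resource (ordered by timestamp), keep only rows whose
# active flag changed, emulating LAG(is_active) and the filter
kept = []
prev = {}                       # resource id -> previous is_active
for rid, ts, deleted in sorted(rows, key=lambda r: (r[0], r[1])):
    is_active = deleted is None
    if prev.get(rid) is None or prev[rid] != is_active:
        kept.append((ts, rid, is_active))
    prev[rid] = is_active

# step 4: signed counts per timestamp (+1 newly active, -1 newly inactive)
delta = {}
for ts, rid, is_active in kept:
    delta[ts] = delta.get(ts, 0) + (1 if is_active else -1)

# step 5 + final query: running sum over all distinct input timestamps
result, running = [], 0
for ts in sorted({r[1] for r in rows}):
    running += delta.get(ts, 0)
    result.append((ts, running))
```

The last timestamp yields count 2, as required, because resource 1's second consecutive active row is filtered out before counting.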

SQL BigQuery. Check if user bought something previous month = old, if he didn't buy anything next month = churn

What we have:
user_id month
---- 2021-08
1414 2021-09
1414 2021-10
1414 2021-11
---- 2021-12
What we need:
user_id month
---- 2021-08
1414 2021-09 new user
1414 2021-10 old
1414 2021-11 churn
---- 2021-12
In the end, I'll aggregate all of the users with COUNT(new_user) and GROUP BY status...
However, I have a problem with the stage where I need to assign the correct values to users within certain months.
Something like this might work.
The first test CTE term is just to provide the test table data.
WITH test (user_id, month) AS (
SELECT 1414, '2021-09' UNION
SELECT 1414, '2021-10' UNION
SELECT 1414, '2021-11' UNION
SELECT null, '2021-08' UNION
SELECT null, '2021-12'
)
, xrows AS (
SELECT *
, LAG(month) OVER (PARTITION BY user_id ORDER BY month) AS lastval
, LEAD(month) OVER (PARTITION BY user_id ORDER BY month) AS nextval
FROM test
)
SELECT user_id, month
, CASE WHEN user_id IS NOT NULL THEN
CASE WHEN nextval IS NULL THEN 'churn'
WHEN lastval IS NULL THEN 'new user'
ELSE 'old'
END
END AS status
FROM xrows
ORDER BY month
;
-- or
WITH test (user_id, month) AS (
SELECT 1414, '2021-09' UNION
SELECT 1414, '2021-10' UNION
SELECT 1414, '2021-11' UNION
SELECT null, '2021-08' UNION
SELECT null, '2021-12'
)
, xrows AS (
SELECT *
, LAG(month) OVER w AS lastval
, LEAD(month) OVER w AS nextval
FROM test
WINDOW w AS (PARTITION BY user_id ORDER BY month)
)
SELECT user_id, month
, CASE WHEN user_id IS NOT NULL THEN
CASE WHEN nextval IS NULL THEN 'churn'
WHEN lastval IS NULL THEN 'new user'
ELSE 'old'
END
END AS status
FROM xrows
ORDER BY month
;
Result:
user_id  month    status
         2021-08
1414     2021-09  new user
1414     2021-10  old
1414     2021-11  churn
         2021-12
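The LAG/LEAD classification can also be sketched in Python (the `classify` helper is hypothetical, not part of the answer's SQL); per user, a month with no predecessor is 'new user', no successor is 'churn', anything else is 'old'.

```python
# month strings like "2021-09" sort correctly, standing in for ORDER BY month
data = [(1414, "2021-09"), (1414, "2021-10"), (1414, "2021-11"),
        (None, "2021-08"), (None, "2021-12")]

def classify(rows):
    """Emulate LAG/LEAD per user: first month -> 'new user',
    last month -> 'churn', in between -> 'old'. NULL users are skipped,
    matching the outer CASE WHEN user_id IS NOT NULL."""
    by_user = {}
    for uid, month in rows:
        if uid is not None:
            by_user.setdefault(uid, []).append(month)
    out = {}
    for uid, months in by_user.items():
        months.sort()
        for i, m in enumerate(months):
            if i == len(months) - 1:
                out[(uid, m)] = "churn"       # LEAD(month) IS NULL
            elif i == 0:
                out[(uid, m)] = "new user"    # LAG(month) IS NULL
            else:
                out[(uid, m)] = "old"
    return out

status = classify(data)
```

Note that, as in the SQL, the churn check comes first, so a user seen in only one month is classified as churn rather than as a new user.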

Pick right Effective Dates with backdated and future dates scenario

Scenario 1:
The table is built as:
IF OBJECT_ID('TEMPDB..#RUN_ID') IS NOT NULL
DROP TABLE #RUN_ID
;WITH RUN_ID as (
SELECT 1 AS RUN_ID,1 AS EMP_ID, '1/1/2018' STARTDT, 'A' AS VALUE
UNION
SELECT 2 AS RUN_ID,1 AS EMP_ID, '2/1/2018' STARTDT, 'A' AS VALUE
UNION
SELECT 3 AS RUN_ID,1 AS EMP_ID, '12/1/2017' STARTDT, 'A' AS VALUE
UNION
SELECT 4 AS RUN_ID,1 AS EMP_ID, '3/1/2018' STARTDT, 'A' AS VALUE
UNION
SELECT 5 AS RUN_ID,1 AS EMP_ID, '2/1/2018' STARTDT, 'A' AS VALUE
)
SELECT * INTO #RUN_ID from RUN_ID
RUN_ID EMP_ID STARTDT VALUE
1 1 1/1/2018 A
2 1 2/1/2018 A
3 1 12/1/2017 A
4 1 3/1/2018 A
5 1 2/1/2018 A
RUN_ID is a value that increments every day in the table. The VALUE column can be the same or different.
I need to derive the result for the STARTDTs as below:
RUN_ID EMP_ID STARTDT VALUE
3 1 12/1/2017 A
5 1 2/1/2018 A
Note: the last record, RUN_ID 5, overwrites all other records from its STARTDT of 2/1/2018 onward, so that record belongs in the target; RUN_ID 3 should also be in the result, since it overwrites the earlier RUN_IDs' STARTDTs.
Scenario2:
RUN_ID EMP_ID STARTDT VALUE
1 1 1/1/2018 A
2 1 11/1/2017 A
3 1 12/1/2017 A
4 1 3/1/2018 A
5 1 2/1/2018 A
In this case, result should be
RUN_ID EMP_ID STARTDT VALUE
2 1 11/1/2017 A
3 1 12/1/2017 A
5 1 2/1/2018 A
IF OBJECT_ID('TEMPDB..#RUN_ID') IS NOT NULL
DROP TABLE #RUN_ID
;WITH RUN_ID as (
SELECT 1 AS RUN_ID,1 AS EMP_ID, CAST('1/1/2018' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 2 AS RUN_ID,1 AS EMP_ID, CAST('11/1/2017' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 3 AS RUN_ID,1 AS EMP_ID, CAST('12/1/2017' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 4 AS RUN_ID,1 AS EMP_ID, CAST('3/1/2018' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 5 AS RUN_ID,1 AS EMP_ID, CAST('2/1/2018' AS DATE) STARTDT, 'A' AS VALUE
)
SELECT * INTO #RUN_ID from RUN_ID
SELECT *
FROM (
SELECT *
,LAG(STARTDT) OVER (PARTITION BY EMP_ID ORDER BY RUN_ID DESC) LAG_DATE
,CASE WHEN LAG(STARTDT) OVER (PARTITION BY EMP_ID ORDER BY RUN_ID DESC) IS NULL THEN 0
WHEN STARTDT < LAG(STARTDT) OVER (PARTITION BY EMP_ID ORDER BY RUN_ID DESC) THEN 0 ELSE 1 END SCD_IND
FROM (
SELECT *
, RANK() OVER (PARTITION BY EMP_ID,STARTDT ORDER BY RUN_ID DESC) RN
FROM #RUN_ID
) A
WHERE A.RN=1
) A WHERE SCD_IND=0
RUN_ID EMP_ID STARTDT VALUE RN LAG_DATE SCD_IND
5 1 2018-02-01 A 1 NULL 0
3 1 2017-12-01 A 1 2018-03-01 0
2 1 2017-11-01 A 1 2017-12-01 0
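The dedupe-then-LAG filter can be sketched in Python for scenario 1 (a single EMP_ID only; with several employees the LAG would have to be partitioned per employee, as in the SQL):

```python
from datetime import date

# (run_id, emp_id, startdt) from scenario 1; the VALUE column is omitted
rows = [
    (1, 1, date(2018, 1, 1)),
    (2, 1, date(2018, 2, 1)),
    (3, 1, date(2017, 12, 1)),
    (4, 1, date(2018, 3, 1)),
    (5, 1, date(2018, 2, 1)),
]

# RANK ... RN = 1: for duplicate (emp_id, startdt), keep the highest run_id
best = {}
for run_id, emp_id, startdt in rows:
    key = (emp_id, startdt)
    if key not in best or run_id > best[key][0]:
        best[key] = (run_id, emp_id, startdt)

# LAG(STARTDT) ORDER BY RUN_ID DESC: keep a row only when it starts
# before the previously scanned (newer) row, i.e. SCD_IND = 0
kept, prev_start = [], None
for run_id, emp_id, startdt in sorted(best.values(), reverse=True):
    if prev_start is None or startdt < prev_start:
        kept.append(run_id)
    prev_start = startdt
```

Walking scenario 1, RUN_ID 4 and RUN_ID 1 are discarded because a later run already overwrote their effective dates, leaving RUN_IDs 5 and 3 as in the expected result.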

Finding Avg of following dataset

Following is the data.
select * from (
select to_date('20140601','YYYYMMDD') log_date, null weight from dual
union
select to_date('20140601','YYYYMMDD')+1 log_date, 0 weight from dual
union
select to_date('20140601','YYYYMMDD')+2 log_date, 4 weight from dual
union
select to_date('20140601','YYYYMMDD')+3 log_date, 4 weight from dual
union
select to_date('20140601','YYYYMMDD')+4 log_date, null weight from dual
union
select to_date('20140601','YYYYMMDD')+5 log_date, 8 weight from dual);
Log_date weight avg_weight
----------------------------------
6/1/2014 NULL 0 (0/1) Since no previous data, I consider it as 0
6/2/2014 0 0 ((0+0)/2)
6/3/2014 4 4/3 ((0+0+4)/3)
6/4/2014 4 2 (0+0+4+4)/4
6/5/2014 NULL 2 (0+0+4+4+2)/5 Since it is NULL I want to take previous day avg = 2
6/6/2014 8 3 (0+0+4+4+2+8)/6 =3
So the average for the above data should be 3.
How can I achieve this in SQL instead of PL/SQL? I appreciate any help with this.
I just learned how to use recursive CTEs today, really excited! Hope this helps...
; WITH RawData (log_Date, Weight) AS (
select cast('2014-06-01' as SMALLDATETIME)+0, null
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+1, 0
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+2, 4
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+3, 4
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+4, null
UNION ALL select cast('2014-06-01' as SMALLDATETIME)+5, 8
)
, IndexedData (Id, log_Date, Weight) AS (
SELECT ROW_NUMBER() OVER (ORDER BY log_Date)
, log_Date
, Weight
FROM RawData
)
, ResultData (Id, log_Date, Weight, total, avg_weight) AS (
SELECT Id
, log_Date
, Weight
, CAST(CASE WHEN Weight IS NULL THEN 0 ELSE Weight END AS FLOAT)
, CAST(CASE WHEN Weight IS NULL THEN 0 ELSE Weight END AS FLOAT)
FROM IndexedData
WHERE Id = 1
UNION ALL
SELECT i.Id
, i.log_Date
, i.Weight
, CAST(r.total + CASE WHEN i.Weight IS NULL THEN r.avg_weight ELSE i.Weight END AS FLOAT)
, CAST(r.total + CASE WHEN i.Weight IS NULL THEN r.avg_weight ELSE i.Weight END AS FLOAT) / i.Id
FROM ResultData r
JOIN IndexedData i ON i.Id = r.Id + 1
)
SELECT Log_Date, Weight, avg_weight FROM ResultData
OPTION (MAXRECURSION 0)
This gives the output:
Log_Date Weight avg_weight
----------------------- ----------- ----------------------
2014-06-01 00:00:00 NULL 0
2014-06-02 00:00:00 0 0
2014-06-03 00:00:00 4 1.33333333333333
2014-06-04 00:00:00 4 2
2014-06-05 00:00:00 NULL 2
2014-06-06 00:00:00 8 3
Note that in my answer, I modified the "Data" section of your question as it didn't compile for me. It's still the same data though, hope it helps.
Edit: By default, MAXRECURSION is set to 100. This means that the query will not work for more than 101 rows of Raw Data. By adding the OPTION (MAXRECURSION 0), I have removed this limit so that the query works for all input data. However, this can be dangerous if the query isn't tested thoroughly because it might lead to infinite recursion.
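The recursion above boils down to a simple running state (total so far, previous average), which is easy to see in a procedural sketch:

```python
weights = [None, 0, 4, 4, None, 8]   # the sample data in date order

avgs, total = [], 0.0
for i, w in enumerate(weights, start=1):
    if w is None:
        # a NULL weight contributes the previous day's average
        # (0 on the first day, since there is no previous data)
        w = avgs[-1] if avgs else 0.0
    total += w
    avgs.append(total / i)
```

Each loop iteration plays the role of one step of the recursive CTE: it substitutes the running average for a NULL weight, accumulates the total, and divides by the row number, ending at an overall average of 3.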