How to select statistics table counting occurrences between two dates

I need to count protocol violations and the durations between two dates from a table, to produce a statistics table.
Explanation:
I need to select 'Country' and 'Site' from the Violations table, plus 'Number', 'Maximum', 'Minimum' and 'Mean' of the protocol violation durations between two dates, all from the same Violations table. So we have to compute:
- the number of protocol violations in the Violations table, by country and site
- the min/max/avg duration of protocol violations, by country and site
under two different conditions:
- from Date Discovered to Date Reported
- from Date Reported to Date Confirmed
Database Structure:
Available on SQL Fiddle.
The code in the attached SQL Fiddle has more tables and a query that are not needed for this problem. Feel free to use it.
I didn't remove the old query because it shows a nice way to produce the '- All -' and '- Unknown -' values.
Violation table:
create table violations (
id long,
country varchar(20),
site varchar(20),
status_id int,
trial_id int,
discovered_date date,
reporded_date date,
confirmed_date date
);
Site table:
create table site (
id long,
site varchar(20)
);
My first try:
Here is my new SQL Fiddle query; the commented lines are the part I need help with:
SELECT v.country as country, v.site as site,
COUNT(*) as N --,
--MAX(list of durations in days between discovered date and reported date on each violation by country and site) as "Maximum",
--MIN(list of durations in days between discovered date and reported date on each violation by country and site) as "Minimum",
--AVG(list of durations in days between discovered date and reported date on each violation by country and site) as "Mean"
FROM violations v
WHERE v.trial_id = 3
GROUP BY ROLLUP (v.country, v.site)
I've managed to write a skeleton query for my idea, but I don't know how to write the MAX, MIN and AVG expressions. Each one has to pick the max/min/avg value from the list of durations, in days, between the discovered date and the reported date of each violation, by country and site.
Could you help me, please?

Please check this query. It is simplified, but it may give you an idea and a direction; let me know if you need more. You can copy and paste it to see the results. The query only selects and aggregates rows whose date falls between the two dates in the WHERE clause, so run the inner query first without the WHERE to see all the dates. It counts violations between two dates. I am not sure what the list of durations in days is; see below for a count by duration. You can add MAX/MIN etc. in the same way.
-- Days between (duration) = end_date - start_date = number of days (number) --
SELECT (to_date('14-MAR-2013', 'DD-MON-YYYY') - to_date('01-MAR-2013', 'DD-MON-YYYY')) days_between
FROM dual
/
SELECT country, site
, Count(*) total_viol
, MAX(susp_viol) max_susp_viol
, MIN(susp_viol) min_susp_viol
FROM
(
SELECT 'GERMANY' country, '12222' site, 1 susp_viol, 2 conf_viol, trunc(Sysdate-30) disc_date, trunc(Sysdate-25) conf_date
FROM dual
UNION
SELECT 'GERMANY', '12222' , 3 , 14, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'GERMANY', '12222' , 6 , 25, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'GERMANY', '12222' , 2 , 1, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'GERMANY', '13333' , 10 , 5, trunc(Sysdate-15) , trunc(Sysdate-10) FROM dual
UNION
SELECT 'GERMANY', '13333' , 15 , 3, trunc(Sysdate-15) , trunc(Sysdate-10) FROM dual
UNION
SELECT 'GERMANY', 'Unknown Site' , 0 , 7, trunc(Sysdate-5) , trunc(Sysdate-2) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 1 , 5, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 2 , 10, trunc(Sysdate-15) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'RUSSIA', 'Unknown Site' , 10 , 10, trunc(Sysdate-3) , trunc(Sysdate-1) FROM dual
)
-- replace sysdate with your_date-default format is to_date('14-MAR-2013') or give format mask
WHERE conf_date BETWEEN trunc(Sysdate-20) AND trunc(Sysdate-10)
GROUP BY ROLLUP (country, site)
ORDER BY country, site
/
Count of duration:
SELECT country, site, (conf_date-disc_date) duration, count(*) total_durations
FROM
(
SELECT 'GERMANY' country, '12222' site, 1 susp_viol, 2 conf_viol, trunc(Sysdate-30) disc_date, trunc(Sysdate-20) conf_date
FROM dual
UNION
SELECT 'GERMANY', '12222' , 3 , 14, trunc(Sysdate-20) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'GERMANY', '12222' , 6 , 25, trunc(Sysdate-20) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'GERMANY', '12222' , 2 , 1, trunc(Sysdate-20) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'GERMANY', '13333' , 10 , 5, trunc(Sysdate-12) , trunc(Sysdate-6) FROM dual
UNION
SELECT 'GERMANY', '13333' , 15 , 3, trunc(Sysdate-17) , trunc(Sysdate-11) FROM dual
UNION
SELECT 'GERMANY', 'Unknown Site' , 0 , 7, trunc(Sysdate-5) , trunc(Sysdate-2) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 1 , 5, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 2 , 10, trunc(Sysdate-15) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'RUSSIA', 'Unknown Site' , 10 , 10, trunc(Sysdate-3) , trunc(Sysdate-1) FROM dual
)
WHERE conf_date BETWEEN trunc(Sysdate-20) AND trunc(Sysdate-10)
GROUP BY ROLLUP (country, site, (conf_date-disc_date))
ORDER BY country, site
/
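For the MAX/MIN/AVG part of the question, the duration can be computed per row and aggregated directly. In Oracle, subtracting one DATE from another gives a number of days, so an expression such as MAX(v.reporded_date - v.discovered_date) should fit into the asker's ROLLUP query. The sketch below shows the same idea in a runnable form using Python's sqlite3 module. The sample rows are made up for illustration, julianday() stands in for Oracle date subtraction, and ROLLUP is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE violations (
    country TEXT, site TEXT, trial_id INTEGER,
    discovered_date TEXT, reporded_date TEXT  -- column name spelled as in the question
);
-- Hypothetical sample rows, not taken from the original fiddle
INSERT INTO violations VALUES
    ('GERMANY', '12222', 3, '2013-03-01', '2013-03-05'),
    ('GERMANY', '12222', 3, '2013-03-02', '2013-03-12'),
    ('GERMANY', '13333', 3, '2013-03-10', '2013-03-11'),
    ('RUSSIA',  '12345', 3, '2013-03-01', '2013-03-08'),
    ('RUSSIA',  '12345', 2, '2013-03-01', '2013-03-30');
""")

# Duration in days per violation, then aggregated by country and site.
rows = conn.execute("""
    SELECT country, site,
           COUNT(*) AS n,
           MAX(julianday(reporded_date) - julianday(discovered_date)) AS maximum,
           MIN(julianday(reporded_date) - julianday(discovered_date)) AS minimum,
           AVG(julianday(reporded_date) - julianday(discovered_date)) AS mean
    FROM violations
    WHERE trial_id = 3
    GROUP BY country, site
    ORDER BY country, site
""").fetchall()

for row in rows:
    print(row)
```

The second condition (Date Reported to Date Confirmed) would use the same three aggregates over the other pair of date columns.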

Related

Using Pivot in Oracle SQL to dynamically show one or two columns in case multiple records are present

I am working in Oracle Fusion HCM and would like to create a query which pulls an employee's base data such as name, location, etc. We also want to include the managers.
Our manager structure is as such so that there's 1 line manager and 1 to n (realistically not more than 3) matrix managers, named 'REVIEWER'.
I have working code that fetches the data, but it gives wrong results when there are not exactly 2 managers. When there is 1, it shows the same name twice, and when there are 3, one of them is not shown.
Can anyone help me fetch the correct manager names without using the MIN/MAX aggregates? My query already fetches the correct data, but my pivot clause is not working correctly.
Select DISTINCT *
from
(
SELECT DISTINCT
emplName.DISPLAY_NAME Worker_Name,
INITCAP(loc.LOCATION_NAME) Location_Name,
gra.NAME Grade_Name,
hou.NAME Department_Name,
ass.MANAGER_TYPE Manager_Type,
mgr.DISPLAY_NAME Manager_Name,
REPLACE(ctr.CONTRACT_END_DATE,'4712-12-31') Contract_End_Date,
aa.ASSIGNMENT_NUMBER
FROM
PER_ALL_ASSIGNMENTS_M aa,
PER_ASSIGNMENT_SUPERVISORS_F ass,
PER_PERSON_NAMES_F emplName,
PER_ALL_PEOPLE_F empl,
PER_PERSON_NAMES_F mgr,
HR_ORGANIZATION_UNITS hou,
HR_LOCATIONS_ALL_F_VL loc,
PER_GRADES_F_TL gra,
PER_CONTRACTS_F ctr
WHERE
aa.ASSIGNMENT_ID (+) = ass.ASSIGNMENT_ID
AND emplName.PERSON_ID = ass.PERSON_ID
AND ass.MANAGER_ID = mgr.PERSON_ID
AND empl.PERSON_ID = ass.PERSON_ID
AND hou.ORGANIZATION_ID = aa.ORGANIZATION_ID
AND loc.LOCATION_ID = aa.LOCATION_ID
AND gra.GRADE_ID = aa.GRADE_ID
AND ctr.CONTRACT_ID = aa.CONTRACT_ID
AND aa.ASSIGNMENT_STATUS_TYPE = 'ACTIVE'
AND to_char(ass.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712'
AND to_char(aa.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712'
AND to_char(ctr.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712'
AND gra.SOURCE_LANG = 'US'
AND gra.NAME in (:p_grade)
AND hou.NAME in (:p_department)
AND INITCAP(loc.LOCATION_NAME) in (:p_location)
AND (ctr.CONTRACT_END_DATE <= (:p_contractenddate)
OR (:p_contractenddate) is null)
) S
Pivot
(
MAX(Manager_Name) Manager1,
MIN(Manager_Name) Manager2
for manager_type in
('LINE_MANAGER' as Line_Manager,
'REVIEWER' as Reviewer
))
Piv
The data regarding managers is recorded in PER_ASSIGNMENT_SUPERVISORS_F ass as follows:
ASSIGNMENT_ID  MANAGER_TYPE  MANAGER_ID
0129312        LINE_MANAGER  2343943
0129312        REVIEWER      456756
0129312        REVIEWER      456334
0129312        REVIEWER      234324
1232232        LINE_MANAGER  232242
1232232        REVIEWER      122312

Use:
Select *
from (
SELECT ass.assignment_id,
ass.person_id,
ass.MANAGER_TYPE Manager_Type,
mgr.DISPLAY_NAME Manager_Name,
ROW_NUMBER() OVER (
PARTITION BY ass.assignment_id, ass.person_id, ass.manager_type
ORDER BY mgr.display_name
) AS rn
FROM PER_ASSIGNMENT_SUPERVISORS_F ass
INNER JOIN PER_PERSON_NAMES_F mgr
ON (ass.MANAGER_ID = mgr.PERSON_ID)
WHERE ass.EFFECTIVE_END_DATE = DATE '4712-12-31'
)
PIVOT (
MAX(Manager_Name)
for (manager_type, rn) in (
('LINE_MANAGER', 1) as Line_Manager,
('REVIEWER', 1) as Reviewer1,
('REVIEWER', 2) as Reviewer2,
('REVIEWER', 3) as Reviewer3
)
)
Then join the rest of the tables to that pivoted query (rather than trying to join first and then pivot).
Which, for the (minimal) sample data:
CREATE TABLE PER_ASSIGNMENT_SUPERVISORS_F (assignment_id, person_id, manager_id, manager_type, effective_end_date) AS
SELECT 1, 1, 2, 'LINE_MANAGER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 1, 1, 3, 'REVIEWER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 1, 1, 4, 'REVIEWER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 1, 1, 5, 'REVIEWER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 2, 2, 3, 'LINE_MANAGER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 2, 2, 4, 'REVIEWER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 2, 2, 5, 'REVIEWER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 3, 3, 4, 'LINE_MANAGER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 3, 3, 5, 'REVIEWER', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 4, 4, 5, 'LINE_MANAGER', DATE '4712-12-31' FROM DUAL;
CREATE TABLE PER_PERSON_NAMES_F (person_id, display_name) AS
SELECT 1, 'Alice' FROM DUAL UNION ALL
SELECT 2, 'Beryl' FROM DUAL UNION ALL
SELECT 3, 'Carol' FROM DUAL UNION ALL
SELECT 4, 'Debra' FROM DUAL UNION ALL
SELECT 5, 'Emily' FROM DUAL;
Outputs:
ASSIGNMENT_ID  PERSON_ID  LINE_MANAGER  REVIEWER1  REVIEWER2  REVIEWER3
1              1          Beryl         Carol      Debra      Emily
2              2          Carol         Debra      Emily      null
3              3          Debra         Emily      null       null
4              4          Emily         null       null       null
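The numbering step is what makes this work: ROW_NUMBER() gives each reviewer of an assignment its own slot, so every (manager_type, rn) pair holds at most one name and the aggregate no longer has to choose between several. A minimal sketch of the same idea, runnable with Python's sqlite3 module, follows. It assumes SQLite 3.25 or later for window functions, uses a simplified hypothetical table named supervisors, and replaces Oracle's PIVOT with conditional aggregation because SQLite has no PIVOT clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-in for the supervisor data joined to names
CREATE TABLE supervisors (assignment_id INTEGER, manager_type TEXT, manager_name TEXT);
INSERT INTO supervisors VALUES
    (1, 'LINE_MANAGER', 'Beryl'),
    (1, 'REVIEWER',     'Carol'),
    (1, 'REVIEWER',     'Debra'),
    (1, 'REVIEWER',     'Emily'),
    (2, 'LINE_MANAGER', 'Carol'),
    (2, 'REVIEWER',     'Debra'),
    (4, 'LINE_MANAGER', 'Emily');
""")

# Number the managers within each (assignment, type), then spread them over columns.
pivoted = conn.execute("""
    SELECT assignment_id,
           MAX(CASE WHEN manager_type = 'LINE_MANAGER' AND rn = 1 THEN manager_name END) AS line_manager,
           MAX(CASE WHEN manager_type = 'REVIEWER' AND rn = 1 THEN manager_name END) AS reviewer1,
           MAX(CASE WHEN manager_type = 'REVIEWER' AND rn = 2 THEN manager_name END) AS reviewer2,
           MAX(CASE WHEN manager_type = 'REVIEWER' AND rn = 3 THEN manager_name END) AS reviewer3
    FROM (
        SELECT assignment_id, manager_type, manager_name,
               ROW_NUMBER() OVER (
                   PARTITION BY assignment_id, manager_type
                   ORDER BY manager_name
               ) AS rn
        FROM supervisors
    )
    GROUP BY assignment_id
    ORDER BY assignment_id
""").fetchall()

for row in pivoted:
    print(row)
```

Assignments with fewer than three reviewers simply get NULL (None in Python) in the unused reviewer columns instead of a repeated name.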

I rewrote the query based on MT0's answer. For anyone interested in the end result:
Select *
from
(
SELECT
emplName.DISPLAY_NAME Worker_Name,
INITCAP(loc.LOCATION_NAME) Location_Name,
gra.NAME Grade_Name,
hou.NAME Department_Name,
ass.MANAGER_TYPE Manager_Type,
mgr.DISPLAY_NAME Manager_Name,
ROW_NUMBER() OVER (
PARTITION BY aa.ASSIGNMENT_NUMBER, ass.assignment_id, ass.person_id, gra.NAME, hou.NAME, ass.manager_type
ORDER BY mgr.display_name
) AS rn,
REPLACE(ctr.CONTRACT_END_DATE,'4712-12-31') Contract_End_Date,
aa.ASSIGNMENT_NUMBER
FROM
PER_ALL_ASSIGNMENTS_F aa
LEFT JOIN PER_ASSIGNMENT_SUPERVISORS_F ass
ON (aa.ASSIGNMENT_ID = ass.ASSIGNMENT_ID
AND to_char(ass.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712')
LEFT JOIN PER_PERSON_NAMES_F_V emplName
ON (ass.PERSON_ID = emplName.PERSON_ID
AND to_char(emplName.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712'
AND emplName.NAME_TYPE = 'GLOBAL')
LEFT JOIN PER_ALL_PEOPLE_F empl
ON (empl.PERSON_ID = ass.PERSON_ID
AND to_char(empl.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712')
LEFT JOIN PER_PERSON_NAMES_F mgr
ON (mgr.PERSON_ID = ass.MANAGER_ID
AND to_char(mgr.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712'
AND mgr.NAME_TYPE = 'GLOBAL')
LEFT JOIN HR_ORGANIZATION_UNITS hou
ON (hou.ORGANIZATION_ID = aa.ORGANIZATION_ID
AND to_char(hou.DATE_TO, 'DD/MM/YYYY') = '31/12/4712')
LEFT JOIN HR_LOCATIONS_ALL_F_VL loc
ON (loc.LOCATION_ID = aa.LOCATION_ID
AND to_char(loc.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712')
LEFT JOIN PER_GRADES_F_TL gra
ON (gra.GRADE_ID = aa.GRADE_ID
AND gra.LANGUAGE = 'US'
AND to_char(gra.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712')
LEFT JOIN PER_CONTRACTS_F ctr
ON (ctr.CONTRACT_ID = aa.CONTRACT_ID
AND to_char(ctr.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712')
WHERE 1=1
AND aa.ASSIGNMENT_STATUS_TYPE = 'ACTIVE'
AND to_char(aa.EFFECTIVE_END_DATE, 'DD/MM/YYYY') = '31/12/4712'
-- PARAMETERS
AND gra.NAME in (:p_grade)
AND hou.NAME in (:p_department)
AND INITCAP(loc.LOCATION_NAME) in (:p_location)
AND (ctr.CONTRACT_END_DATE <= (:p_contractenddate)
OR (:p_contractenddate) is null)
) S
Pivot
(
MAX(Manager_Name)
for (manager_type, rn) in (
('LINE_MANAGER', 1) as Line_Manager,
('REVIEWER', 1) as Reviewer1,
('REVIEWER', 2) as Reviewer2,
('REVIEWER', 3) as Reviewer3
))
Piv

sql: cumulative sum (partition by clients order by date)

Would you please help me compute a cumulative sum in SQL Server 2017? The conditions are: 1) partition by client, 2) order by date_tm. The desired result is in the desirable_result column of the table below.
create table #clients (client nvarchar(1)
, date_tm datetime
,sum_pay int
, desirable_result int)
insert into #clients
(client, date_tm, sum_pay, desirable_result)
select '1', '2020-01-01', 10, 10 union all
select '1', '2020-01-02', 20, 30 union all
select '2', '2020-01-03', 20, 60 union all
select '2', '2020-01-01', 20, 20 union all
select '2', '2020-01-02', 20, 40 union all
select '3', '2020-01-01', 20, 20 union all
select '3', '2020-01-04', 20, 70 union all
select '3', '2020-01-02', 30, 50
select * from #clients
drop table if exists #clients
Thank you very much.

Please find the query below:
select c.*,sum(sum_pay) over(partition by client order by date_tm)
from #clients c

You can use sum()over() window function as below:
select * ,SUM (sum_pay) OVER (partition by client order by date_tm) AS cummulativesum from #clients

SELECT * ,
CASE WHEN desirable_result = cum_sum THEN 'OK' ELSE 'NO' END AS Status
FROM
(
select
*,
SUM (sum_pay) OVER (partition by client order by date_tm) AS cum_sum
from #clients as tbl
) as a
With this code you can compare desirable_result with the cumulative sum.
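One detail worth knowing: when the OVER clause has an ORDER BY but no explicit frame, the default frame is RANGE UNBOUNDED PRECEDING to CURRENT ROW, so rows that tie on date_tm within a client all receive the same total. An explicit ROWS frame accumulates row by row instead. The sketch below replays the question's data through Python's sqlite3 module (assuming SQLite 3.25 or later; a regular table named clients replaces the #clients temp table) and checks the running total against desirable_result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (client TEXT, date_tm TEXT, sum_pay INTEGER, desirable_result INTEGER)"
)
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?, ?)",
    [
        ("1", "2020-01-01", 10, 10),
        ("1", "2020-01-02", 20, 30),
        ("2", "2020-01-03", 20, 60),
        ("2", "2020-01-01", 20, 20),
        ("2", "2020-01-02", 20, 40),
        ("3", "2020-01-01", 20, 20),
        ("3", "2020-01-04", 20, 70),
        ("3", "2020-01-02", 30, 50),
    ],
)

# Running total per client; the explicit ROWS frame accumulates one row at a time.
running = conn.execute("""
    SELECT client, date_tm, desirable_result,
           SUM(sum_pay) OVER (
               PARTITION BY client
               ORDER BY date_tm
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cum_sum
    FROM clients
    ORDER BY client, date_tm
""").fetchall()

# Rows where the computed total differs from the expected column.
mismatches = [r for r in running if r[2] != r[3]]
print(mismatches)
```

The question's data has no duplicate dates per client, so both frame types give the same totals here.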

SQL query to find ids in the same table but different timestamp events (cohorts)

I need to write a query that gives me the counts with the following logic. For example, ACCOUNT_ID 123 signed up on 2020-02-21, so M0 is 1; the same ACCOUNT_ID then had an event in the following month, so M1 is 1.
M0 is the signup date
M1 is the signup date + 1 month
M2 is the signup date + 2 consecutive months
M3 is the signup date + 3 consecutive months
WITH M_O AS (
SELECT
parsed_data."ACCOUNT_ID" AS "parsed_data.account_id",
MIN(TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD')) AS "SIGNUP",
COUNT(DISTINCT (parsed_data."ACCOUNT_ID") ) AS "COUNT_USERS_O"
FROM "PUBLIC"."PARSED_DATA"
AS parsed_data
WHERE (parsed_data."ACCOUNT_ID") IS NOT NULL
AND (((parsed_data."EVENT") = 'Started'))
AND (
((TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD')) >= '2020-02-21')
AND ((parsed_data."TIMESTAMP"::timestamp_ntz ) < CURRENT_DATE())
)
GROUP BY 1),
M_1 AS (
SELECT
parsed_data."ACCOUNT_ID" AS "parsed_data.account_id",
TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD') AS "parsed_data.timestamp_date",
COUNT(DISTINCT (parsed_data."ACCOUNT_ID") ) AS "COUNT_USERS_1"
FROM "PUBLIC"."PARSED_DATA"
AS parsed_data INNER JOIN M_O ON parsed_data.account_id = M_O."parsed_data.account_id"
WHERE
(parsed_data."ACCOUNT_ID") IS NOT NULL
AND (((parsed_data."EVENT") = 'Started'))
AND (
(TO_CHAR(TO_DATE(parsed_data."TIMESTAMP"::timestamp_ntz ), 'YYYY-MM-DD')) >= DATEADD('MONTH', 1, SIGNUP)
AND ((parsed_data."TIMESTAMP"::timestamp_ntz ) < CURRENT_DATE())
)
GROUP BY 1,2
)

It looks like you want to create cohorts? As in "establish the creation date for each id, and then look how they changed their behavior every month thereafter".
This code should work:
with events as (
select 1 id, '2020-01-01'::date e_date
union all select 1, '2020-02-03'
union all select 2, '2020-03-01'
union all select 2, '2020-05-08'
union all select 3, '2020-08-01'
union all select 3, '2020-09-02'
union all select 3, '2020-09-22'
union all select 3, '2020-09-30'
union all select 3, '2020-10-10'
),
first_per_id as (
select id, min(e_date) first_date
from events
group by id
)
select a.id
, count_if(e_date>=dateadd(month, 0, first_date) and e_date<dateadd(month, 1, first_date)) m0
, count_if(e_date>=dateadd(month, 1, first_date) and e_date<dateadd(month, 2, first_date)) m1
, count_if(e_date>=dateadd(month, 2, first_date) and e_date<dateadd(month, 3, first_date)) m2
from events a
join first_per_id b
on a.id=b.id
group by 1
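To make the bucketing concrete, here is a hedged sketch that runs the same sample events through Python's sqlite3 module. SQLite has no count_if or dateadd, so SUM(CASE ...) and date(x, '+N months') stand in for them; these substitutions are assumptions of this sketch, not part of the Snowflake answer. SQLite also handles month-end dates differently from Snowflake, which does not matter here because every first date falls on the 1st.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, e_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-02-03"),
        (2, "2020-03-01"), (2, "2020-05-08"),
        (3, "2020-08-01"), (3, "2020-09-02"), (3, "2020-09-22"),
        (3, "2020-09-30"), (3, "2020-10-10"),
    ],
)

# Each event is counted in the month bucket it falls into, relative to the id's first date.
cohorts = conn.execute("""
    WITH first_per_id AS (
        SELECT id, MIN(e_date) AS first_date
        FROM events
        GROUP BY id
    )
    SELECT a.id,
           SUM(CASE WHEN a.e_date >= b.first_date
                     AND a.e_date < date(b.first_date, '+1 month') THEN 1 ELSE 0 END) AS m0,
           SUM(CASE WHEN a.e_date >= date(b.first_date, '+1 month')
                     AND a.e_date < date(b.first_date, '+2 months') THEN 1 ELSE 0 END) AS m1,
           SUM(CASE WHEN a.e_date >= date(b.first_date, '+2 months')
                     AND a.e_date < date(b.first_date, '+3 months') THEN 1 ELSE 0 END) AS m2
    FROM events a
    JOIN first_per_id b ON a.id = b.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

for row in cohorts:
    print(row)
```

Each column counts events, not distinct accounts; to get a 0/1 flag per account as in the question, the sums could be capped with MIN(1, ...).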

How to select next value

The task is to show the payment dates of a loan. If a payment falls on a day that does not exist in that month, no row is shown, but it should show the first date of the next month instead.
The SQL works well if I enter the date 15.01.2019.
But if I enter 31.01.2019, I have a problem: the query does not return the correct result.
With days as (
Select rownum As Day from All_Objects where Rownum<=31
),
a as (Select 'WHWWHHWWWWWHHWWWWWHHWWWWWHHWWWW' as hl ,1 as Mnth,2019 as Yr from Dual
Union All
Select 'WHHWWWWWHHWWWWWHHWWWWWHHWWWW' as hl ,2 as Mnth,2019 as Yr from Dual
Union All
Select 'WHHWWWWHHHWWWWWHHWWHHHHHHHWWWHH' as hl ,3 as Mnth,2019 as Yr from Dual
Union All
Select 'WWWWWHHWWWWWHHWWWWWHHWWWWWHHWW' as hl ,4 as Mnth,2019 as Yr from Dual
Union All
Select 'WWWHHWWWHWHHWWWWWHHWWWWWHHWHWWW' as hl ,5 as Mnth,2019 as Yr from Dual
Union All
Select 'HHWWHHWHHWWWWWHHHWWWWHHWWHWWHH' as hl ,6 as Mnth,2019 as Yr from Dual
Union All
Select 'WWWWWHHWWWWWHHWWWWWHHWWWWWHHWWW' as hl ,7 as Mnth,2019 as Yr from Dual
)
,
Alll as
(Select TO_Date(Yr|| substr('0'||Mnth,-2,2)||substr('0'||Day,-2,2),'YYYYMMDD') as Dt,a.Yr,a.Mnth,Days.Day,substr(a.Hl,Days.Day,1) as Daytype from Days,a Where Days.Day<=Length(a.Hl)
),
Taksit as
(
Select To_Date('31.01.2019') as TDate, 1000 as Amount ,3 as Tcount from Dual
),
PD as (
Select
A.Dt,A.DayType , Case when A.DayType='H' then Min(W.Dt) else A.Dt end As PayableDate
From Alll A inner Join Alll W on W.DT>=A.DT and W.DayType='W'
Group by A.Dt, A.Daytype
Order by 1
),
PreResult as
(
Select PD.PayableDate,Amount,TCount,Max(PD.PayableDate) over (Partition by 'Contract') as MPD
From PD inner join Taksit T on PD.DT between add_months(T.TDate,1) and Add_Months(T.TDate,TCount)
and TO_Char(PD.DT,'DD')=TO_Char(T.TDate,'DD')
)
Select
PayableDate, Case when PayableDate=MPD then Amount-(Round(Amount/TCount,2)*(TCount-1)) else Round(Amount/TCount,2) end PayAmount
from PreResult

You have used TO_CHAR(PD.DT, 'DD') = TO_CHAR(T.TDATE, 'DD'), but February has no day 31, so no date in February can match it.
Ideally, you should use the ADD_MONTHS function in PRERESULT as follows (I believe you need only 3 months of data):
PRERESULT AS (
SELECT
PD.PAYABLEDATE,
AMOUNT,
TCOUNT,
MAX(PD.PAYABLEDATE) OVER(
PARTITION BY 'Contract'
) AS MPD
FROM
PD
INNER JOIN TAKSIT T ON PD.DT BETWEEN ADD_MONTHS(T.TDATE, 1) AND ADD_MONTHS(T.TDATE, TCOUNT)
AND PD.DT IN (ADD_MONTHS(T.TDATE, 1), ADD_MONTHS(T.TDATE, 2), ADD_MONTHS(T.TDATE, 3))
-- AND TO_CHAR(PD.DT, 'DD') = TO_CHAR(T.TDATE, 'DD')
)
This returns 3 dates for 31.01.2019, and it still works as expected for 15.01.2019.
Please check that the result for 31.01.2019 is what you expect, since you have not stated the expected result.
Cheers!!
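Note that ADD_MONTHS moves a day that does not exist in the target month back to the last day of that month, while the question asks for the first day of the following month. The Python sketch below contrasts the two rules so the expected dates can be checked by hand. Both helper names are hypothetical, and add_months_clamped only mimics the short-month clamping; Oracle's ADD_MONTHS additionally maps the last day of a month to the last day of the target month.

```python
import calendar
from datetime import date


def _shift(year, month, months):
    """Return (year, month) moved forward by a number of months."""
    idx = year * 12 + (month - 1) + months
    return idx // 12, idx % 12 + 1


def add_months_clamped(d, months):
    """Use the last day of the target month when the day does not exist."""
    year, month = _shift(d.year, d.month, months)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def add_months_rollover(d, months):
    """Use the 1st of the following month when the day does not exist."""
    year, month = _shift(d.year, d.month, months)
    last_day = calendar.monthrange(year, month)[1]
    if d.day <= last_day:
        return date(year, month, d.day)
    year, month = _shift(year, month, 1)
    return date(year, month, 1)


start = date(2019, 1, 31)
clamped = [add_months_clamped(start, n) for n in (1, 2, 3)]
rolled = [add_months_rollover(start, n) for n in (1, 2, 3)]
print(clamped)
print(rolled)
```

Whichever rule is chosen, the resulting date still has to go through the PD step so that a holiday ('H') is moved to the next working day.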

Hive windowing query

I have a base Hive table with following schema:
And I want the below output:
So basically, grouping on all columns, and calculating the count distinct Encounters in that month and last 3 months (including that month).
For example, for DischargeMonthYear Jan-2018, num_discharges_last_30_days would be patients discharged in Jan-2018 (3) and num_discharges_last_90_days would be patients discharged in Nov-17, Dec-17 and Jan-18. Since there is no data before Jan-18 in this case, both counts would be the same.
Similarly for Mar-18, num_discharges_last_90_days should include counts for Jan, Feb and Mar-18 months (3+2+2 = 7).
For Jun-18, since we have no data for Apr and May-18, it should include counts only for Jun-18 and NOT go to the previous group/partition.
I have the below query that gives me the correct total for num_discharges_last_90_days till Jun-18 but does not follow the grouping of earlier columns and for Jul-18 it also includes Jun-18 totals which should not be the case since the region is different.
If I add a PARTITION BY region (and others) clause for it, num_discharges_last_90_days is correct for Jul-18 now, but incorrect for Jun-18 since it includes the Feb and Mar-18 totals.
`
DROP TABLE IF EXISTS Encounter;
CREATE TEMPORARY TABLE Encounter
(
Encounter_no int,
Admit_date date,
discharge_date date,
region varchar(50),
Facilityname varchar(50),
Payertype varchar(10),
Payernamme varchar(20),
patient_type varchar(10)
);
INSERT INTO Encounter
select 12345, '2018-01-01', '2018-01-05', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12346, '2018-01-02', '2018-01-06', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12347, '2018-01-03', '2018-01-07', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12348, '2018-02-04', '2018-02-08', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12349, '2018-02-05', '2018-02-09', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12350, '2018-03-06', '2018-03-10', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12351, '2018-03-07', '2018-03-11', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12352, '2018-06-08', '2018-06-12', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12353, '2018-06-09', '2018-06-13', 'Midwest', 'ABC', 'MCR', 'MCR123', 'IP' union all
select 12354, '2018-07-10', '2018-07-14', 'NorthEast', 'ABC', 'MCR', 'MCR123', 'IP'
;
--SELECT from_unixtime(unix_timestamp(e.discharge_date, 'yyyy-MM-dd'),'MM') AS `Discharge_Month` FROM Encounter e
--Below CTE is used to get all month numbers
WITH R AS
(
SELECT '01' AS MonthNum
UNION ALL SELECT '02'
UNION ALL SELECT '03'
UNION ALL SELECT '04'
UNION ALL SELECT '05'
UNION ALL SELECT '06'
UNION ALL SELECT '07'
UNION ALL SELECT '08'
UNION ALL SELECT '09'
UNION ALL SELECT '10'
UNION ALL SELECT '11'
UNION ALL SELECT '12'
)
SELECT * FROM
(
--Perform a left join on CTE with your query to get all months
SELECT
R.MonthNum,
e.region,
e.facilityname,
from_unixtime(unix_timestamp(e.discharge_date, 'yyyy-MM-dd'),'MMM-yyyy') AS Discharge_Month,
e.Payertype,
e.Payernamme,
e.patient_type,
CASE WHEN COALESCE(e.region, '') <> ''
THEN COUNT(1)
ELSE 0
END
as num_discharges_last_30_days,
SUM(
CASE WHEN COALESCE(e.region, '') <> ''
THEN COUNT(1)
ELSE 0
END
)
OVER (ORDER BY R.MonthNum
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as num_discharges_last_90_days
FROM R
LEFT JOIN Encounter e
ON R.MonthNum = from_unixtime(unix_timestamp(e.discharge_date, 'yyyy-MM-dd'),'MM')
GROUP BY
R.MonthNum,
e.region,
e.facilityname,
from_unixtime(unix_timestamp(e.discharge_date, 'yyyy-MM-dd'),'MMM-yyyy'),
e.Payertype,
e.Payernamme,
e.patient_type
) A
WHERE A.region IS NOT NULL
;
`

My colleague cracked the question using the query below. It needed a self-join, with IF and WHERE clauses so that only the last 3 months are included in the sum.
WITH CTE AS (
SELECT a.region,a.facilityname,a.payertype,a.payernamme,a.patient_type, LAST_DAY(a.discharge_date) AS month_year, COUNT(encounter_no) AS measure_1
FROM Encounter AS a
GROUP BY a.region,a.facilityname,a.payertype,a.payernamme,a.patient_type, LAST_DAY(a.discharge_date)
)
-- SELECT * FROM CTE AS a;
SELECT a.region,a.facilityname,a.payertype,a.payernamme,a.patient_type, a.month_year, MAX(a.measure_1) AS measure_1,
SUM(IF(b.month_year IS NULL, a.measure_1, b.measure_1)) AS measure_2
FROM CTE AS a
LEFT JOIN CTE AS b
ON a.region = b.region
AND a.facilityname = b.facilityname
AND a.payertype = b.payertype
AND a.payernamme = b.payernamme
AND a.patient_type = b.patient_type
WHERE ( b.month_year BETWEEN add_months(a.month_year, -2) AND a.month_year
OR b.month_year IS NULL)
GROUP BY a.region,a.facilityname,a.payertype,a.payernamme,a.patient_type, a.month_year;
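A compact way to see why the self-join handles the gaps: each monthly row is joined only to rows of the same group whose month lies within the two preceding calendar months, so Jun-18 never reaches back to Mar-18. The sketch below reproduces this with the question's data using Python's sqlite3 module. It is a simplified illustration under stated assumptions: only region is kept as the grouping column, and the first day of the month (via strftime) replaces Hive's LAST_DAY as the month key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounter (encounter_no INTEGER, discharge_date TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO encounter VALUES (?, ?, ?)",
    [
        (12345, "2018-01-05", "Midwest"), (12346, "2018-01-06", "Midwest"),
        (12347, "2018-01-07", "Midwest"), (12348, "2018-02-08", "Midwest"),
        (12349, "2018-02-09", "Midwest"), (12350, "2018-03-10", "Midwest"),
        (12351, "2018-03-11", "Midwest"), (12352, "2018-06-12", "Midwest"),
        (12353, "2018-06-13", "Midwest"), (12354, "2018-07-14", "NorthEast"),
    ],
)

# Monthly counts per region, then a self-join restricted to the current and two prior months.
rolling = conn.execute("""
    WITH monthly AS (
        SELECT region,
               strftime('%Y-%m-01', discharge_date) AS month_start,
               COUNT(*) AS n
        FROM encounter
        GROUP BY region, strftime('%Y-%m-01', discharge_date)
    )
    SELECT a.region, a.month_start,
           a.n      AS num_discharges_last_30_days,
           SUM(b.n) AS num_discharges_last_90_days
    FROM monthly a
    JOIN monthly b
      ON a.region = b.region
     AND b.month_start BETWEEN date(a.month_start, '-2 months') AND a.month_start
    GROUP BY a.region, a.month_start, a.n
    ORDER BY a.region, a.month_start
""").fetchall()

for row in rolling:
    print(row)
```

Because every row joins at least to itself, an inner join is enough here and the IS NULL fallback from the Hive query is not needed in this simplified form.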
