Oracle SQL: How to fill a NULL value with data from the most recent previous date that is not NULL?

Essentially, the date field is updated every month along with the other fields, but one field (Group) is only updated about 6 times a year. For months where that field is not updated, I want to show the most recent previous value.
Date  Emp_no  Sales  Group
---------------------------
Jan   1234    100    Med
Feb   1234    200    ---
Mar   1234    170    ---
Apr   1234    150    Low
May   1234    180    ---
Jun   1234    90     High
Jul   1234    100    ---
Need it to show:
Date  Emp_no  Sales  Group
---------------------------
Jan   1234    100    Med
Feb   1234    200    Med
Mar   1234    170    Med
Apr   1234    150    Low
May   1234    180    Low
Jun   1234    90     High
Jul   1234    100    High
This field is not updated at set intervals; there could be 1-4 months of NULLs in a row.
I tried something like this to get the value from the previous row, but I am unsure how to deal with the fact that I could need to look back between 1 and 4 months:
LAG(Group)
OVER(PARTITION BY emp_no
ORDER BY date)
Thanks!

This is the traditional "gaps and islands" problem.
There are various ways to solve it; a simple version will work for you.
First, create a new identifier that splits the rows into "groups", where only the first row in each group is NOT NULL.
SUM(CASE WHEN "group" IS NOT NULL THEN 1 ELSE 0 END) OVER (PARTITION BY emp_no ORDER BY "date") AS emp_group_id
Then you can use MAX() in another window function, as all "groups" will only have one NOT NULL value.
WITH
  gaps
AS
(
  SELECT
    t.*,
    SUM(
      CASE WHEN "group" IS NOT NULL
           THEN 1
           ELSE 0
      END
    )
    OVER (
      PARTITION BY emp_no
          ORDER BY "date"
    )
      AS emp_group_id
  FROM
    your_table t
)
SELECT
  "date",
  emp_no,
  sales,
  MAX("group")
    OVER (
      PARTITION BY emp_no, emp_group_id
    )
      AS "group"
FROM
  gaps
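To make the two steps above concrete, here is a minimal Python sketch of the same idea for a single employee. It is only an illustration of the logic, not Oracle code; the sample rows and the function name `fill_by_group_id` are assumptions made for this example, with `None` standing in for NULL.

```python
from itertools import accumulate

# Hypothetical sample rows for one employee, already sorted by date.
# Each row is (month, group); None stands in for NULL.
rows = [("Jan", "Med"), ("Feb", None), ("Mar", None), ("Apr", "Low"),
        ("May", None), ("Jun", "High"), ("Jul", None)]

def fill_by_group_id(rows):
    # Step 1: running count of non-NULL values, mirroring
    # SUM(CASE WHEN ... THEN 1 ELSE 0 END) OVER (ORDER BY date).
    group_ids = list(accumulate(1 if g is not None else 0 for _, g in rows))
    # Step 2: each group id holds at most one non-NULL value; remember it
    # (this plays the role of MAX() over the group).
    value_of = {}
    for gid, (_, g) in zip(group_ids, rows):
        if g is not None:
            value_of[gid] = g
    # Step 3: every row takes the value of its group. Leading rows with
    # no earlier value stay None.
    return [(month, value_of.get(gid)) for gid, (month, _) in zip(group_ids, rows)]

for month, group in fill_by_group_id(rows):
    print(month, group)
```

The running count only increases on rows that carry a value, so each NULL row shares an id with the most recent non-NULL row before it.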
Edit
Ignore all that.
Oracle has IGNORE NULLS.
LAST_VALUE("group" IGNORE NULLS)
OVER (
PARTITION BY emp_no
ORDER BY "date"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
)
AS "group"
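Conceptually, LAST_VALUE with IGNORE NULLS over that frame carries the last non-NULL value forward. A rough Python equivalent for one partition, assuming the values are already in date order (the function name `fill_forward` is made up for this sketch):

```python
def fill_forward(values):
    """Replace each None with the most recent earlier non-None value."""
    last_seen = None
    filled = []
    for value in values:
        if value is not None:
            last_seen = value      # a real value resets what we carry forward
        filled.append(last_seen)   # None only if nothing has been seen yet
    return filled

print(fill_forward(["Med", None, None, "Low", None, "High", None]))
```

This handles any number of consecutive NULLs, which is the part a plain LAG() with a fixed offset cannot do.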

Related

How to select rows where logged in last month and logged min 1 time in one of month preceding August in Oracle SQL?

I have a table in Oracle SQL that holds client IDs and the date and time of each login to the application:
ID | LOGGED
----------------
11 | 2021-07-10 12:55:13.278
11 | 2021-08-10 13:58:13.211
11 | 2021-02-11 12:22:13.364
22 | 2021-01-10 08:34:13.211
33 | 2021-04-02 14:21:13.272
I need to select only those clients (ID) who logged in at least once in the last month (August) and at least once in one of the months preceding August (June or July).
It is currently September, so...
I need clients who logged in at least once in August
and at least once in July or June,
if logged in June -> not logged in July
if logged in July -> not logged in June
As a result I need like below:
ID
----
11
How can I do that in Oracle SQL? Be aware that the column "LOGGED" holds a timestamp such as 2021-01-10 08:34:13.211.
Maybe consider this:
select id
from yourtable
group by id
having count(case
months_between(trunc(sysdate,'MM'),
trunc(logged,'MM')
) when 1 then 1 end
) >= 1
and count
(case when
months_between(trunc(sysdate,'MM') ,
trunc(logged,'MM')
) in (2,3) then 1 end
) = 1
I don't understand one thing:
You wrote :
minimum 1 time in one month preceding August (June or July)
and after then:
if logged in June -> not logged in July
if logged in July -> not logged in June
If you need EXACTLY one logon across June and July, just use my query above.
If you need at least one logon in June or July (either month), then:
select id
from yourtable
group by id
having count(case
months_between(trunc(sysdate,'MM'),
trunc(logged,'MM')
) when 1 then 1 end
) >= 1
and count
(case when
months_between(trunc(sysdate,'MM') ,
trunc(logged,'MM')
) in (2,3) then 1 end
) >= 1
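The month arithmetic in these queries can be sketched in Python as follows. This mirrors the second variant (at least one login last month and at least one login two or three months back). The helper names `months_back` and `qualifying_ids` and the explicit `today` argument are assumptions for the sketch; the SQL uses sysdate instead.

```python
from datetime import date, datetime

def months_back(today, logged):
    # Whole calendar months between the two dates, ignoring the day of
    # month, like MONTHS_BETWEEN on two values truncated to 'MM'.
    return (today.year - logged.year) * 12 + (today.month - logged.month)

def qualifying_ids(logins, today):
    # Collect, per client, the set of month offsets in which they logged in.
    offsets = {}
    for client_id, logged in logins:
        offsets.setdefault(client_id, set()).add(months_back(today, logged))
    # Offset 1 is last month; offsets 2 and 3 are the two months before it.
    return sorted(cid for cid, seen in offsets.items()
                  if 1 in seen and (2 in seen or 3 in seen))

# Sample data from the question.
logins = [
    (11, datetime(2021, 7, 10, 12, 55, 13)),
    (11, datetime(2021, 8, 10, 13, 58, 13)),
    (11, datetime(2021, 2, 11, 12, 22, 13)),
    (22, datetime(2021, 1, 10, 8, 34, 13)),
    (33, datetime(2021, 4, 2, 14, 21, 13)),
]
print(qualifying_ids(logins, date(2021, 9, 15)))
```

With a date in September 2021 as the reference, only client 11 has a login in August plus one in June or July.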
Your question needs some clarification, but based on what you were describing I am seeing a couple of options.
The simplest one is probably a combination of data densification (generating a row for every month for each id) plus an analytical function (enabling inter-row calculations). Here's a simple example of this:
rem create a dummy table with some more data (you do not seem to worry about the exact timestamp)
drop table logs purge;
create table logs (ID number, LOGGED timestamp);
insert into logs values (11, to_timestamp('2021-07-10 12:55:13.278','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-07-11 12:55:13.278','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-08-10 13:58:13.211','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-02-11 12:22:13.364','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-04-11 12:22:13.364','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (22, to_timestamp('2021-01-10 08:34:13.211','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (33, to_timestamp('2021-04-02 14:21:13.272','yyyy-mm-dd HH24:MI:SS.FF'));
commit;
The following SQL densifies your data and lists the total count of logins for a month and for the previous month on the same row, so that you can do a comparative calculation. I have not done that, but I am hoping you get the idea.
with t as
(-- dummy artificial table just to create a time dimension for densification
select distinct to_char(sysdate - rownum,'yyyy-mm') mon
from dual connect by level < 300),
l_sparse as
(-- aggregating your login info per month
select id, to_char(logged,'yyyy-mm') mon, count(*) cnt
from logs group by id, to_char(logged,'yyyy-mm') ),
l_dense as
(-- densification with partition outer join
select t.mon, l.id, cnt from l_sparse l partition by (id)
right outer join t on (l.mon = t.mon)
)
-- final analytical function to list current and previous row info in same record
select mon, id
, cnt
, lag(cnt) over (partition by id order by mon asc) prev_cnt
from l_dense
order by id, mon;
parts of the result:
MON             ID        CNT   PREV_CNT
------- ---------- ---------- ----------
2020-12         11
2021-01         11
2021-02         11          2
2021-03         11                     2
2021-04         11          1
2021-05         11                     1
2021-06         11
2021-07         11          3
2021-08         11          2          3
2021-09         11                     2
2020-12         22
2021-01         22          2
2021-02         22                     2
2021-03         22
2021-04         22
...
You can see for ID 11 that for 2021-08 you have logins for both the current and the previous month, so you can do the math on it. (That would require another subselect/WITH branch.)
Alternatives to this would be:
interrow calculation plus time math between two logged timestamps
pattern matching
Did not drill into those, not enough info about your real requirement.
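For readers who want to see the densification idea outside SQL, here is a small Python sketch for a single id. It fills in every month in a range, counts logins per month, and puts the previous month's count next to it, much like the lag() call does. The names `month_range` and `dense_counts` and the tuple layout are assumptions for the example.

```python
from collections import Counter

def month_range(first, last):
    # first and last are (year, month) tuples; both ends are included.
    months = []
    year, month = first
    while (year, month) <= last:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

def dense_counts(login_months, first, last):
    # login_months holds one (year, month) entry per login of a single id.
    counts = Counter(login_months)
    rows = []
    previous = None
    for ym in month_range(first, last):
        current = counts.get(ym)   # None for months without any login
        rows.append((ym, current, previous))
        previous = current         # becomes PREV_CNT for the next month
    return rows

for row in dense_counts([(2021, 7), (2021, 7), (2021, 8)], (2021, 6), (2021, 9)):
    print(row)
```

A month qualifies for the original question when both the current and the previous count are present on the same row.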

Select based on existence of particular value after given value

I have a table 'Users' as follows.
UserID Status Effective Date
-------------------------------------------
0111 Rehire 5 Apr 2021
0111 Resign 4 Apr 2021
0111 Transfer 10 Mar 2021
0111 Hire 5 Aug 2014
0112 PayrollChange 4 Apr 2021
0112 Resign 3 Apr 2021
0112 Hire 1 Jul 2001
0113 Resign withdraw 3 Apr 2021
0113 Resign 1 Apr 2021
0113 Transfer 1 Nov 2019
0113 Hire 10 Aug 2007
I would like to create a SQL query to identify those users who resigned (need not be the latest record) but were not rehired and did not withdraw their resignation. An employee can resign and withdraw multiple times, so this needs to be evaluated based on effective date. How can I create such a query?
Output expected
UserID Status Effective Date
-------------------------------------------
0112 Resign 3 Apr 2021
Please help on the same.
Note: I have tried multiple times and unable to arrive at a proper query. Hence posting this.
Thanks in advance.
Try with this query:
SELECT *
FROM Users U1
WHERE U1.Status = 'Resign'
AND NOT EXISTS (SELECT 1
FROM Users U2
WHERE U2.UserId = U1.UserId
AND U2.Status IN ('Rehire', 'Resign withdraw'))
I would like to create a SQL query to identify those users who resigned (need not be the latest record) but were not rehired and did not withdraw their resignation.
If I understand correctly, you want to compare the maximum of the resigned date to the maximum of the other dates:
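select userId, 'Resign' as status,
       max(case when status = 'Resign' then EffectiveDate end) as EffectiveDate
from t
group by userId
-- conditions on aggregates belong in HAVING, not WHERE
having max(case when status = 'Resign' then EffectiveDate end) is not null
   and (max(case when status = 'Resign' then EffectiveDate end) > max(case when status in ('Rehire', 'Resign withdraw') then EffectiveDate end)
        or max(case when status in ('Rehire', 'Resign withdraw') then EffectiveDate end) is null);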
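The comparison of maximum dates can be sketched in Python like this. It is a rough illustration of the rule "latest Resign is later than the latest Rehire or Resign withdraw, or there is none", using the sample data from the question; the function name `open_resignations` and the tuple layout are assumptions.

```python
from datetime import date

CANCELLING = {"Rehire", "Resign withdraw"}

def open_resignations(events):
    # events: iterable of (user_id, status, effective_date)
    latest_resign = {}
    latest_cancel = {}
    for user_id, status, effective in events:
        if status == "Resign":
            target = latest_resign
        elif status in CANCELLING:
            target = latest_cancel
        else:
            continue  # Hire, Transfer, etc. do not affect the result
        if user_id not in target or effective > target[user_id]:
            target[user_id] = effective
    # Keep users whose latest resignation is not followed by a cancellation.
    return {user: resigned for user, resigned in latest_resign.items()
            if user not in latest_cancel or resigned > latest_cancel[user]}

events = [
    ("0111", "Rehire", date(2021, 4, 5)),
    ("0111", "Resign", date(2021, 4, 4)),
    ("0111", "Transfer", date(2021, 3, 10)),
    ("0111", "Hire", date(2014, 8, 5)),
    ("0112", "PayrollChange", date(2021, 4, 4)),
    ("0112", "Resign", date(2021, 4, 3)),
    ("0112", "Hire", date(2001, 7, 1)),
    ("0113", "Resign withdraw", date(2021, 4, 3)),
    ("0113", "Resign", date(2021, 4, 1)),
    ("0113", "Transfer", date(2019, 11, 1)),
    ("0113", "Hire", date(2007, 8, 10)),
]
print(open_resignations(events))
```

Unlike a plain NOT EXISTS on status alone, this takes the effective date into account, so a user who was rehired and later resigned again would still be reported.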
So here's one approach that you can try, given you simply want to find any user with a status of "Resign" but exclude any that also have a status of "Rehire" or "Resign withdraw" - you can simply assign a suitable value to each and sum them for each user and filter accordingly; the effective date will always be the most recent date with the required status. This will be the better performing query given a suitable index.
select UserId, 'Resign' Status, EffectiveDate from (
select
Sum(case when status in ('Rehire','Resign withdraw') then 1000 when status='Resign' then 1 else 0 end) Stot,
Max( case when status='Resign' then Effectivedate end) EffectiveDate,
UserId
from T
where Status in ('Rehire', 'Resign withdraw', 'Resign' )
group by UserId
)x
where Stot>0 and Stot<1000

SQL - Create multiple records from 1 record based on days between 2 dates

I have a table that holds an employee's leave. If an employee takes more than 1 day off in a row for example 22-05-2020 to the 26-05-2020 this will be displayed as one record. I am trying to get this displayed as 5 records, one for each day they were off.
My table is called: Emp_Annual_Leave
and has the following fields
emp_no leave_type leave_year half_day start_date end_date days_of_leave
12345 Annual 2020 N 22/05/2020 26/05/2020 5
above is how it is currently displayed and I am trying to display the above record like below:
emp_no leave_type leave_year half_day start_date end_date days_of_leave leave_date
12345 Annual 2020 N 22/05/2020 26/05/2020 1 22/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 23/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 24/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 25/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 26/05/2020
Does anyone know how I would go about doing this? I have a feeling I need to use ROW_NUMBER() OVER(PARTITION BY) but none of my attempts have worked well.
Thanks in advance,
EDIT:
The table I need to create here is a subquery in a bigger query and needs to be joined back to other queries and tables in my DB. I didn't include this in my original question; I have added it now in case it affects the method I need to use.
You could use a recursive query:
with cte as (
select emp_no, leave_type, leave_year, half_day, start_date, end_date, days_of_leave, start_date as leave_date from emp_annual_leave
union all
select emp_no, leave_type, leave_year, half_day, start_date, end_date, days_of_leave, dateadd(day, 1, leave_date)
from cte
where leave_date < end_date
)
select * from cte
If a given leave may span over more than 100 days, you need to add option (maxrecursion 0) at the end of the query.
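What the recursion produces, one row per day from start_date to end_date inclusive, can be illustrated with a short Python sketch. The function name `expand_leave` is an assumption for this example and the dates come from the question.

```python
from datetime import date, timedelta

def expand_leave(start_date, end_date):
    """Return every calendar day from start_date to end_date, inclusive."""
    span = (end_date - start_date).days
    return [start_date + timedelta(days=offset) for offset in range(span + 1)]

for leave_date in expand_leave(date(2020, 5, 22), date(2020, 5, 26)):
    print(leave_date.strftime("%d/%m/%Y"))
```

Each generated date corresponds to one output row; the other columns of the leave record are simply repeated alongside it.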

How to identify and aggregate sequence from start and end dates

I'm trying to identify a consecutive sequence in dates, per person, as well as sum amount for that sequence. My records table looks like this:
person start_date end_date amount
1 2015-09-10 2015-09-11 500
1 2015-09-11 2015-09-12 100
1 2015-09-13 2015-09-14 200
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-05 300
2 2015-10-06 2015-10-06 1000
3 2015-04-23 2015-04-23 900
The resulting query should be this:
person sequence_start_date sequence_end_date amount
1 2015-09-10 2015-09-14 800
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-06 1400
3 2015-04-23 2015-04-23 900
Below, I can use LAG and LEAD to identify the sequence start_date and end_date, but I don't have a way to aggregate the amount. I'm assuming the answer will involve some sort of ROW_NUMBER() window function that will partition by sequence, I just can't figure out how to make the sequence identifiable to the function.
SELECT
person
,COALESCE(sequence_start_date, LAG(sequence_start_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_start_date"
,COALESCE(sequence_end_date, LEAD(sequence_end_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_end_date"
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' = start_date
THEN NULL
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' = end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
) sq
Even your updated (sub)query still isn't quite right for the data you've presented, which is inconsistent about whether the start date of the second and subsequent rows in a sequence should be equal to their previous rows' end date or one day later. The query can be updated pretty easily to accommodate both, if that's needed.
In any case, you cannot use COALESCE as a window function. Aggregate functions may be used as window functions by providing an OVER clause, but not ordinary functions. There are nevertheless ways to apply window function to this task. Here's a way to identify the sequences in your data (as presented):
SELECT
person
,MAX(sequence_start_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "sequence_start_date"
,MIN(sequence_end_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS "sequence_end_date"
,amount
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' >= start_date
THEN date '0001-01-01'
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' <= end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
order by person, start_date
) sq_part
ORDER BY person, sequence_start_date
That relies on MAX() and MIN() instead of COALESCE(), and it applies window framing to get the appropriate scope for each of those within each partition. Results:
person sequence_start_date sequence_end_date amount
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 500
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 100
1 October, 05 2015 00:00:00 October, 07 2015 00:00:00 2000
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 300
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 1000
3 April, 23 2015 00:00:00 April, 23 2015 00:00:00 900
Do note that that does not require an exact match of end date with subsequent start date; all rows for each person that abut or overlap will be assigned to the same sequence. If (person, start_date) cannot be relied upon to be unique, however, then you probably need to order the partitions by end date as well.
And now you have a way to identify the sequences: they are characterized by the triple person, sequence_start_date, sequence_end_date. (Or actually, you need only the person and one of those dates for identification purposes, but read on.) You can wrap the above query as an inline view of an outer aggregate query to produce your desired result:
SELECT
person,
sequence_start_date,
sequence_end_date,
SUM(amount) AS "amount"
FROM ( <above query> ) sq
GROUP BY person, sequence_start_date, sequence_end_date
Of course you need both dates as grouping columns if you're going to select them.
Why not:
select a1.person, a1.sequence_start_date, a1.sequence_end_date,
sum(rx.amount)
as amount
from (EXISTING_QUERY) a1
left join records rx
on rx.person = a1.person
and rx.start_date >= a1.sequence_start_date
and rx.end_date <= a1.sequence_end_date
group by a1.person, a1.sequence_start_date, a1.sequence_end_date

select column values based on other column date

I have a dataset that contains monthly values for different 'Goals.' The goals have unique IDs, and the month/date values are always the same across goals. The difference is that one goal sometimes lacks values for some of the months another goal has, because it might start at a later date. I want to 'consolidate' the results and sum them together based on the 'First' startBalance for each goal. An example dataset would be:
goalID monthDate startBalance
1 1/1/2014 10
1 2/1/2014 15
1 3/1/2014 22
1 4/1/2014 30
2 4/1/2014 13
2 5/1/2014 29
What i want to do is display these consolidated (summed) values in a table based on the 'First' (earliest Month/Year) value for each goal. The result would look like;
Year startBalance
2014 23
This is because the 'First' value for goalID of 1 is 10 and the 'First' value for goalID of 2 is '13'
I am trying to ultimately use this dataset in an SSRS report through Report Builder, but the groupings are not working correctly for me so i figured if i could achieve this through my queries and just display the data that would be a viable solution.
An example of real result data would be
so i'd want the overall resultset to be;
Year startBalance
2014 876266.00
2015 888319.92
2016 ---------
and so on, i understand for 2015 in that result set there is a value of 0.00 for ID 71, but usually that will contain an actual dollar amount, which would automatically adjust.
WITH balances AS (
SELECT ROW_NUMBER() OVER (PARTITION BY goalID ORDER BY monthDate ASC) n, startBalance, DATEPART(year, monthDate) [year]
FROM Goals
)
SELECT [year], SUM(startBalance) startBalance
FROM balances
WHERE n = 1
GROUP BY [year]
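The query above keeps the earliest row per goal (n = 1) and sums those balances by year. A Python sketch of the same steps, mirroring the query as written (first row per goal overall, not per year); the function name `first_balance_by_year` and the tuple layout are assumptions:

```python
from datetime import date

def first_balance_by_year(rows):
    # rows: iterable of (goal_id, month_date, start_balance)
    first = {}
    for goal_id, month_date, balance in rows:
        # Keep only the earliest row per goal, like ROW_NUMBER() ... = 1.
        if goal_id not in first or month_date < first[goal_id][0]:
            first[goal_id] = (month_date, balance)
    totals = {}
    for month_date, balance in first.values():
        # Sum the first balances by the year of that first row.
        totals[month_date.year] = totals.get(month_date.year, 0) + balance
    return totals

rows = [
    (1, date(2014, 1, 1), 10),
    (1, date(2014, 2, 1), 15),
    (1, date(2014, 3, 1), 22),
    (1, date(2014, 4, 1), 30),
    (2, date(2014, 4, 1), 13),
    (2, date(2014, 5, 1), 29),
]
print(first_balance_by_year(rows))
```

For the small example in the question this gives 23 for 2014, that is 10 from goal 1 plus 13 from goal 2.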