Select based on existence of particular value after given value - sql

I have a table 'Users' as follows.
UserID Status Effective Date
-------------------------------------------
0111 Rehire 5 Apr 2021
0111 Resign 4 Apr 2021
0111 Transfer 10 Mar 2021
0111 Hire 5 Aug 2014
0112 PayrollChange 4 Apr 2021
0112 Resign 3 Apr 2021
0112 Hire 1 Jul 2001
0113 Resign withdraw 3 Apr 2021
0113 Resign 1 Apr 2021
0113 Transfer 1 Nov 2019
0113 Hire 10 Aug 2007
I would like to create a SQL query to identify users who have resigned (the resignation need not be the latest record) but have not subsequently been rehired or had the resignation withdrawn. An employee can resign and withdraw multiple times, so the check needs to be based on the effective date. How can I write such a query?
Output expected
UserID Status Effective Date
-------------------------------------------
0112 Resign 3 Apr 2021
Please help on the same.
Note: I have tried multiple times and unable to arrive at a proper query. Hence posting this.
Thanks in advance.

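Try this query:
SELECT *
FROM Users U1
WHERE U1.Status = 'Resign'
AND NOT EXISTS (SELECT 1
FROM Users U2
WHERE U2.UserId = U1.UserId
AND U2.Status IN ('Rehire', 'Resign withdraw')
-- only a rehire or withdrawal dated after this resignation cancels it
AND U2.EffectiveDate > U1.EffectiveDate)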

I would like create a SQL query to identify those users who got resigned (need not be the latest record) but not rehired or resignation withdrawn.
If I understand correctly, you want to compare the maximum of the resigned date to the maximum of the other dates:
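select userId, 'Resign' as Status,
max(case when status = 'Resign' then EffectiveDate end) as EffectiveDate
from t
group by userId
-- conditions on aggregates belong in HAVING, not WHERE
having max(case when status = 'Resign' then EffectiveDate end) > max(case when status in ('Rehire', 'Resign withdraw') then EffectiveDate end) or
(max(case when status = 'Resign' then EffectiveDate end) is not null and
max(case when status in ('Rehire', 'Resign withdraw') then EffectiveDate end) is null);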

Here's one approach you can try, given that you simply want to find any user with a status of "Resign" and exclude any who also have a status of "Rehire" or "Resign withdraw": assign a suitable value to each status, sum the values per user, and filter on the total. The effective date returned is the most recent date with the "Resign" status. Note that this does not compare dates between statuses, so a user who withdrew and later resigned again is still excluded. Given a suitable index, this is likely to be the better-performing query.
select UserId, 'Resign' Status, EffectiveDate from (
select
Sum(case when status in ('Rehire','Resign withdraw') then 1000 when status='Resign' then 1 else 0 end) Stot,
Max( case when status='Resign' then Effectivedate end) EffectiveDate,
UserId
from T
where Status in ('Rehire', 'Resign withdraw', 'Resign' )
group by UserId
)x
where Stot>0 and Stot<1000
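
A date-aware variant of the NOT EXISTS idea can be checked against the question's sample data. This is only a sketch: it uses SQLite through Python's sqlite3 module rather than the asker's database, assumes the column is called EffectiveDate and stores ISO-formatted text dates, and adds a hypothetical user 0114 (not in the question) who resigns, withdraws and resigns again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserID TEXT, Status TEXT, EffectiveDate TEXT)")
rows = [
    ("0111", "Rehire", "2021-04-05"),
    ("0111", "Resign", "2021-04-04"),
    ("0111", "Transfer", "2021-03-10"),
    ("0111", "Hire", "2014-08-05"),
    ("0112", "PayrollChange", "2021-04-04"),
    ("0112", "Resign", "2021-04-03"),
    ("0112", "Hire", "2001-07-01"),
    ("0113", "Resign withdraw", "2021-04-03"),
    ("0113", "Resign", "2021-04-01"),
    ("0113", "Transfer", "2019-11-01"),
    ("0113", "Hire", "2007-08-10"),
    # Hypothetical extra user: resigns, withdraws, then resigns again.
    ("0114", "Resign", "2021-04-10"),
    ("0114", "Resign withdraw", "2021-04-02"),
    ("0114", "Resign", "2021-04-01"),
]
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", rows)

# Keep a Resign row only if no Rehire / Resign withdraw row for the
# same user has a later effective date.
query = """
SELECT U1.UserID, U1.Status, U1.EffectiveDate
FROM Users U1
WHERE U1.Status = 'Resign'
  AND NOT EXISTS (SELECT 1
                  FROM Users U2
                  WHERE U2.UserID = U1.UserID
                    AND U2.Status IN ('Rehire', 'Resign withdraw')
                    AND U2.EffectiveDate > U1.EffectiveDate)
ORDER BY U1.UserID
"""
result = conn.execute(query).fetchall()
print(result)
```

With this data the query returns 0112 (as in the expected output) plus the second resignation of the hypothetical 0114; the first resignation of 0114 is dropped because a withdrawal follows it.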

Related

Oracle SQL: How to fill Null value with data from most recent previous date that is not null?

Essentially, the date field is updated every month along with the other fields, but one field (Group) is only updated about 6 times a year. For months where that field is not updated, I want to show the most recent previous value.
Date  Emp_no  Sales  Group
--------------------------
Jan   1234    100    Med
Feb   1234    200    ---
Mar   1234    170    ---
Apr   1234    150    Low
May   1234    180    ---
Jun   1234    90     High
Jul   1234    100    ---
Need it to show:
Date  Emp_no  Sales  Group
--------------------------
Jan   1234    100    Med
Feb   1234    200    Med
Mar   1234    170    Med
Apr   1234    150    Low
May   1234    180    Low
Jun   1234    90     High
Jul   1234    100    High
This field is not updated at set intervals; there could be 1-4 months of NULLs in a row.
I tried something like this to get the previous month's value, but I am unsure how to deal with the fact that I could need to look back between 1 and 4 months:
LAG(Group)
OVER(PARTITION BY emp_no
ORDER BY date)
Thanks!
This is the traditional "gaps and islands" problem.
There are various ways to solve it, a simple version will work for you.
First, create a new identifier that splits the rows into "groups", where only the first row in each group is NOT NULL.
SUM(CASE WHEN "group" IS NOT NULL THEN 1 ELSE 0 END) OVER (PARTITION BY emp_no ORDER BY "date") AS emp_group_id
Then you can use MAX() in another window function, as all "groups" will only have one NOT NULL value.
WITH
gaps
AS
(
SELECT
t.*,
SUM(
CASE WHEN "group" IS NOT NULL
THEN 1
ELSE 0
END
)
OVER (
PARTITION BY emp_no
ORDER BY "date"
)
AS emp_group_id
FROM
your_table t
)
SELECT
"date",
emp_no,
sales,
MAX("group")
OVER (
PARTITION BY emp_no, emp_group_id
)
AS "group"
FROM
gaps
Edit
Ignore all that.
Oracle has IGNORE NULLS.
LAST_VALUE("group" IGNORE NULLS)
OVER (
PARTITION BY emp_no
ORDER BY "date"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
)
AS "group"
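
For engines that lack IGNORE NULLS, the running-count approach from the first part of this answer still applies. A minimal sketch on the question's data, run in SQLite via Python's sqlite3 (SQLite is used only because it ships with Python; window functions need SQLite 3.25 or later). The table name sales and the numeric month_no column are assumptions made so the rows sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE sales (month_no INTEGER, emp_no INTEGER, sales INTEGER, "group" TEXT)'
)
data = [
    (1, 1234, 100, "Med"),
    (2, 1234, 200, None),
    (3, 1234, 170, None),
    (4, 1234, 150, "Low"),
    (5, 1234, 180, None),
    (6, 1234, 90, "High"),
    (7, 1234, 100, None),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", data)

# Step 1: a running count of non-NULL values gives every row the id of
# the most recent row that carried a value.
# Step 2: each id has exactly one non-NULL value, so MAX() picks it up.
query = """
WITH gaps AS (
    SELECT t.*,
           SUM(CASE WHEN "group" IS NOT NULL THEN 1 ELSE 0 END)
               OVER (PARTITION BY emp_no ORDER BY month_no) AS emp_group_id
    FROM sales t
)
SELECT month_no, emp_no, sales,
       MAX("group") OVER (PARTITION BY emp_no, emp_group_id) AS filled_group
FROM gaps
ORDER BY emp_no, month_no
"""
filled = [row[3] for row in conn.execute(query)]
print(filled)
```

The filled column comes out as Med, Med, Med, Low, Low, High, High, matching the output the question asks for.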

Running SQL function in HUE IMPALA

I have started working on HUE IMPALA and I am stuck at a complex problem which I am not able to get through. So my table looks like this.
Id  Month  Base Rate  Payment  New Payment
------------------------------------------
a   Jan    1          100      NULL
a   Feb    1          100      NULL
a   Mar    1          100      NULL
a   Apr    2          NULL     NULL
a   May    3          NULL     NULL
a   Jun    4          NULL     NULL
a   Jul    5          NULL     NULL
My aim is to fill the values in the New Payment column with this logic:
If Payment IS NULL then New Payment = Previous Payment + (Previous Payment * (current base rate - previous base rate)), else New Payment = Payment.
E.g. for Mar, New Payment = 100
But for Apr, New Payment = 100 + (100 * (1-1)) = 100
For this I have written the following code:
Select id, month,
CASE WHEN payment is NULL then
LAG(payment)
over(Partition BY id order by month) +
((LAG(payment)
over(Partition BY id order by month))*
(base_rate-lag(base_rate)
OVER (Partition by id order by month)))
Else payment end as new_payment
With this I get the following result:
Id  Month  Base Rate  Payment  New Payment
------------------------------------------
a   Jan    1          100      100
a   Feb    1          100      100
a   Mar    1          100      100
a   Apr    2          NULL     100
a   May    3          NULL     NULL
a   Jun    4          NULL     NULL
a   Jul    5          NULL     NULL
The problem is that New Payment stops being filled from May onwards, because the previous month (Apr) has a NULL in the Payment column. What I want is that, once a NULL appears in the Payment column, the calculation starts using the updated value from the New Payment column in the logic above. So the result I want is this:
Id  Month  Base Rate  Payment  New Payment
------------------------------------------
a   Jan    1          100      100
a   Feb    1          100      100
a   Mar    1          100      100
a   Apr    2          NULL     100
a   May    3          NULL     200
a   Jun    4          NULL     400
a   Jul    5          NULL     800
May -- New Payment = 100 + (100*(2-1)) = 200
June -- New Payment = 200 + (200 * (3-2))= 400
It's okay if a new variable needs to be created or if I have to split this code into multiple parts like create a table first then apply the rest of the logic. Entirely new logic which doesn't use the lag function is also welcome.
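
To pin down the recurrence before translating it to SQL, here is a plain-Python sketch that reproduces the desired column. It follows the worked numbers in the question, which take the base-rate change from the two preceding rows (e.g. May uses 2-1); that reading is an assumption, as is the requirement that the first two rows have a Payment.

```python
# (month, base_rate, payment) rows from the question, in calendar order.
rows = [
    ("Jan", 1, 100),
    ("Feb", 1, 100),
    ("Mar", 1, 100),
    ("Apr", 2, None),
    ("May", 3, None),
    ("Jun", 4, None),
    ("Jul", 5, None),
]

new_payments = []
for i, (month, base_rate, payment) in enumerate(rows):
    if payment is not None:
        # A real payment is carried over unchanged.
        new_payments.append(payment)
        continue
    # Otherwise build on the previously computed New Payment.
    prev_new = new_payments[i - 1]
    # Base-rate change as used in the question's worked examples:
    # the rate of the previous row minus the rate of the row before it.
    delta = rows[i - 1][1] - rows[i - 2][1]
    new_payments.append(prev_new + prev_new * delta)

print(new_payments)
```

Because each value depends on the previously computed one, a single LAG() over the original Payment column cannot express this; the SQL needs either recursion or a cumulative product over the rate changes.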

How to select rows where logged in last month and logged min 1 time in one of month preceding August in Oracle SQL?

I have a table in Oracle SQL that holds client IDs and the date and time of each login to the application:
ID | LOGGED
----------------
11 | 2021-07-10 12:55:13.278
11 | 2021-08-10 13:58:13.211
11 | 2021-02-11 12:22:13.364
22 | 2021-01-10 08:34:13.211
33 | 2021-04-02 14:21:13.272
I need to select only those clients (ID) who have logged in at least once in the last month (August) and at least once in one of the months preceding August (June or July).
It is currently September, so:
I need clients who have logged in at least once in August
and at least once in July or June,
if logged in June -> not logged in July
if logged in July -> not logged in June
As a result I need like below:
ID
----
11
How can I do that in Oracle SQL? Be aware that the column "LOGGED" is a timestamp like: 2021-01-10 08:34:13.211
Maybe consider this:
select id
from yourtable
group by id
having count(case
months_between(trunc(sysdate,'MM'),
trunc(logged,'MM')
) when 1 then 1 end
) >= 1
-- count distinct months, so several logins in the same month count once
and count
(distinct case when
months_between(trunc(sysdate,'MM') ,
trunc(logged,'MM')
) in (2,3) then trunc(logged,'MM') end
) = 1
I don't understand one thing:
You wrote :
minimum 1 time in one month preceding August (June or July)
and after then:
if logged in June -> not logg in July
if logged in July -> not logged in June
If you need logins in EXACTLY one of the two months (June or July, but not both),
just use my query above.
If you need at least one login in June or July (or both), then:
select id
from yourtable
group by id
having count(case
months_between(trunc(sysdate,'MM'),
trunc(logged,'MM')
) when 1 then 1 end
) >= 1
and count
(case when
months_between(trunc(sysdate,'MM') ,
trunc(logged,'MM')
) in (2,3) then 1 end
) >= 1
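
The "exactly one of the two preceding months" condition can be tried out on the question's rows. This is a sketch in SQLite via Python's sqlite3, not Oracle: strftime('%Y-%m', ...) stands in for trunc(logged,'MM'), the months are passed in as parameters instead of being derived from sysdate, and one extra July login for ID 11 is added to show that two logins in the same month still count as one month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, logged TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        (11, "2021-07-10 12:55:13.278"),
        (11, "2021-07-11 12:55:13.278"),  # extra row, not in the question
        (11, "2021-08-10 13:58:13.211"),
        (11, "2021-02-11 12:22:13.364"),
        (22, "2021-01-10 08:34:13.211"),
        (33, "2021-04-02 14:21:13.272"),
    ],
)

# First condition: at least one login in the last month.
# Second condition: logins fall in exactly one of the two months before it.
query = """
SELECT id
FROM logs
GROUP BY id
HAVING COUNT(CASE WHEN strftime('%Y-%m', logged) = :last THEN 1 END) >= 1
   AND COUNT(DISTINCT CASE WHEN strftime('%Y-%m', logged) IN (:prev1, :prev2)
                           THEN strftime('%Y-%m', logged) END) = 1
ORDER BY id
"""
params = {"last": "2021-08", "prev1": "2021-07", "prev2": "2021-06"}
ids = [row[0] for row in conn.execute(query, params)]
print(ids)
```

Only ID 11 qualifies. Counting rows instead of distinct months would reject it here, because it has two July logins.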
Your question needs some clarification, but based on what you were describing I am seeing a couple of options.
The simplest one is probably a combination of data densification (generating a row for every month for each id) and an analytic function (enabling inter-row calculations). Here's a simple example of this:
rem create a dummy table with some more data (you do not seem to worry about the exact timestamp)
drop table logs purge;
create table logs (ID number, LOGGED timestamp);
insert into logs values (11, to_timestamp('2021-07-10 12:55:13.278','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-07-11 12:55:13.278','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-08-10 13:58:13.211','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-02-11 12:22:13.364','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (11, to_timestamp('2021-04-11 12:22:13.364','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (22, to_timestamp('2021-01-10 08:34:13.211','yyyy-mm-dd HH24:MI:SS.FF'));
insert into logs values (33, to_timestamp('2021-04-02 14:21:13.272','yyyy-mm-dd HH24:MI:SS.FF'));
commit;
The following SQL densifies your data and lists the total count of logins for a month and for the previous month on the same row, so that you can do a comparative calculation. I have not done that part, but I hope you get the idea.
with t as
(-- dummy artificial table just to create a time dimension for densification
select distinct to_char(sysdate - rownum,'yyyy-mm') mon
from dual connect by level < 300),
l_sparse as
(-- aggregating your login info per month
select id, to_char(logged,'yyyy-mm') mon, count(*) cnt
from logs group by id, to_char(logged,'yyyy-mm') ),
l_dense as
(-- densification with partition outer join
select t.mon, l.id, cnt from l_sparse l partition by (id)
right outer join t on (l.mon = t.mon)
)
-- final analytical function to list current and previous row info in same record
select mon, id
, cnt
, lag(cnt) over (partition by id order by mon asc) prev_cnt
from l_dense
order by id, mon;
Parts of the result:
MON             ID        CNT   PREV_CNT
------- ---------- ---------- ----------
2020-12         11
2021-01         11
2021-02         11          2
2021-03         11                     2
2021-04         11          1
2021-05         11                     1
2021-06         11
2021-07         11          3
2021-08         11          2          3
2021-09         11                     2
2020-12         22
2021-01         22          2
2021-02         22                     2
2021-03         22
2021-04         22
...
You can see for ID 11 that 2021-08 has logins for both the current and the previous month, so you can do math on it. (This would require another subselect or WITH branch.)
Alternatives to this would be:
inter-row calculation plus time math between two logged timestamps
pattern matching
I did not drill into those, as there is not enough information about your real requirement.

oracle: Count the classes start each month

I am trying to count the classes that start in each month.
select
to_char(START_DATE_TIME,'MON'),
count(START_DATE_TIME)
from
SECTION
having
count(START_DATE_TIME) > 1
group by
START_DATE_TIME
It gives me this output:
MAY 4
APR 3
MAY 2
JUN 2
APR 2
JUL 7
JUL 7
JUN 3
APR 4
MAY 2
APR 6
MAY 4
JUN 2
JUN 2
JUN 3
MAY 5
JUN 2
APR 3
MAY 3
JUN 3
MAY 2
APR 2
MAY 3
I need an output similar to this:
Start_Month Count
July 14
June 17
April 21
May 26
Use "to_char(START_DATE_TIME,'MON')" in all of your count, group by, having and order by.
select
to_char(START_DATE_TIME,'MON') as Start_Month ,
count(to_char(START_DATE_TIME,'MON')) as Count
from
SECTION
having
count(to_char(START_DATE_TIME,'MON')) > 1
group by
to_char(START_DATE_TIME,'MON')
order by
count(to_char(START_DATE_TIME,'MON'));
With group by START_DATE_TIME you tell the DBMS you want one result row per START_DATE_TIME. But what you actually want is one result row per month, so group by month instead.
count(START_DATE_TIME) counts the rows for which START_DATE_TIME is not null. As you group by this date, this makes no sense. Count the rows unconditionally instead (COUNT(*)).
having count(START_DATE_TIME) > 1 is evaluated after GROUP BY, of course, and should hence be placed after it. It looks strange to see it in the wrong place. Moreover: what are you trying to achieve with this condition? You get one result row per START_DATE_TIME, because there is at least one record for the date in the table. So of course this condition is true for all dates. (Except for null, if START_DATE_TIME is nullable. But then you'd merely apply WHERE START_DATE_TIME IS NOT NULL.)
The query corrected:
select
to_char(start_date_time, 'Month') as "Start_Month",
count(*) as "Count"
from section
group by to_char(start_date_time, 'Month')
order by "Count";
BTW: I guess you are aware that you are looking at months regardless of the year. If you want to change this, change the TO_CHAR format accordingly.
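
The difference between grouping by the raw timestamp and grouping by the month can be seen on a few rows. This is a sketch in SQLite via Python's sqlite3, not Oracle: strftime('%m', ...) stands in for TO_CHAR(start_date_time, 'Month'), and the dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE section (start_date_time TEXT)")
# Hypothetical sample: three classes in April, one in May, two in June.
conn.executemany(
    "INSERT INTO section VALUES (?)",
    [
        ("2007-04-10 09:30:00",),
        ("2007-04-10 09:30:00",),
        ("2007-04-22 09:30:00",),
        ("2007-05-04 09:30:00",),
        ("2007-06-02 09:30:00",),
        ("2007-06-15 09:30:00",),
    ],
)

# Grouping by the timestamp gives one row per distinct timestamp,
# so the same month shows up several times.
rows_by_timestamp = len(
    conn.execute(
        "SELECT start_date_time, COUNT(*) FROM section GROUP BY start_date_time"
    ).fetchall()
)

# Grouping by the month expression gives one row per month.
by_month = conn.execute(
    """
    SELECT strftime('%m', start_date_time) AS start_month, COUNT(*) AS cnt
    FROM section
    GROUP BY strftime('%m', start_date_time)
    ORDER BY cnt
    """
).fetchall()
print(rows_by_timestamp, by_month)
```

Grouping by the timestamp yields five rows for three months, which is the repeated-month effect seen in the question's output; grouping by the month expression yields exactly one row per month.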

Output number of occurrences of id in a table

PK Date ID
=== =========== ===
1 07/04/2017 22
2 07/05/2017 22
3 07/07/2017 03
4 07/08/2017 04
5 07/09/2017 22
6 07/09/2017 22
7 07/10/2017 05
8 07/11/2017 03
9 07/11/2017 03
10 07/11/2017 03
I want to count the number of days on which each ID occurred in a given week/month, something like this:
ID Count
22 3 --> counted only once for 07/09/2017, even though it occurred twice on that date
03 2 --> same as above: incremented only once regardless of how many times it occurred on the same date
04 1
05 1
I'm trying to implement this in a Perl file and print the output to a CSV file, but I have no idea what query to execute.
Seems like a simple case of count distinct and group by:
SELECT Id, COUNT(DISTINCT [Date]) As [Count]
FROM TableName
WHERE [Date] >= @StartDate
AND [Date] <= @EndDate
GROUP BY Id
ORDER BY [Count] DESC
You can use COUNT with DISTINCT e.g.:
SELECT ID, COUNT(DISTINCT Date)
FROM table
GROUP BY ID;
You can read more about how to get the month from a date in "get month from a date" (it also works for year).
Your query will be:
select ID, DATEPART(mm, [Date]) AS [month], COUNT(DISTINCT [Date]) AS [count] from [table] group by ID, DATEPART(mm, [Date])
Hope that helped you.
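
The COUNT(DISTINCT ...) approach can be verified against the sample rows from the question. A minimal sketch using SQLite via Python's sqlite3; the table name visits, the column names and the ISO date format are assumptions, and the date range stands in for the week or month of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (pk INTEGER PRIMARY KEY, visit_date TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "2017-07-04", "22"),
        (2, "2017-07-05", "22"),
        (3, "2017-07-07", "03"),
        (4, "2017-07-08", "04"),
        (5, "2017-07-09", "22"),
        (6, "2017-07-09", "22"),
        (7, "2017-07-10", "05"),
        (8, "2017-07-11", "03"),
        (9, "2017-07-11", "03"),
        (10, "2017-07-11", "03"),
    ],
)

# COUNT(DISTINCT visit_date) counts each date once per id, no matter
# how many rows that id has on the same date.
query = """
SELECT id, COUNT(DISTINCT visit_date) AS cnt
FROM visits
WHERE visit_date BETWEEN ? AND ?
GROUP BY id
ORDER BY cnt DESC, id
"""
counts = conn.execute(query, ("2017-07-01", "2017-07-31")).fetchall()
print(counts)
```

This gives 3 for ID 22, 2 for ID 03, and 1 each for 04 and 05, as in the expected output.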