fill in a null cell with cell from previous record - sql

Hi, I am using DB2 SQL to fill in some missing data in the following table:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL NULL NULL
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL NULL NULL
Person 2 has lived in 3 houses, but for the middle address neither the house nor the dates are known. I can't do anything about which house they were in, but I would like to use the previous address's To date to replace the NULL From date, and the next address's From date to replace the NULL To date, i.e.:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL 2012-09-27 2004-01-01
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL 2009-08-01 9999-01-01
I understand that if there is no address before a null address, its From date will have to stay null, but if a null address is the last known address I would like to change the To date to 9999-01-01, as for person 3.
This seems to me like the type of problem where a set-based approach is no longer a good solution; however, I am required to find a DB2 solution because that's what my boss uses!
Any pointers/suggestions welcome.
Thanks.

It might look something like this:
select
    person,
    house,
    coalesce(from_date, prev_to_date) from_date,
    case when rn = 1 then coalesce(to_date, '9999-01-01')
         else coalesce(to_date, next_from_date) end to_date
from
    (select person, house, from_date, to_date,
            lag(to_date) over (partition by person order by from_date nulls last) prev_to_date,
            lead(from_date) over (partition by person order by from_date nulls last) next_from_date,
            -- nulls first here, so that rn = 1 is the last row of the ascending (nulls last) order
            row_number() over (partition by person order by from_date desc nulls first) rn
     from temp
    ) t
The above is not tested but it might give you an idea.
I hope in your actual table you have a column other than to_date and from_date that allows you to order rows for each person, otherwise you'll have trouble sorting NULL dates, as you have no way of knowing the actual sequence.
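To illustrate the point about needing a reliable ordering column, here is a minimal sketch that runs the same lag/lead idea against an in-memory SQLite database from Python. SQLite only stands in for DB2 here, and the `seq` column is a hypothetical addition (it is not in the original table) that records the true order of each person's addresses. It assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

# Assumption: a hypothetical "seq" column records the true order of each
# person's addresses; the table in the question has no such column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table temp (person int, seq int, house int, from_date text, to_date text)"
)
conn.executemany(
    "insert into temp values (?, ?, ?, ?, ?)",
    [
        (1, 1, 586, "2000-04-16", "2010-12-03"),
        (2, 1, 123, "2001-01-01", "2012-09-27"),
        (2, 2, None, None, None),
        (2, 3, 104, "2004-01-01", "2012-11-24"),
        (3, 1, 987, "1999-12-31", "2009-08-01"),
        (3, 2, None, None, None),
    ],
)

query = """
select person,
       house,
       coalesce(from_date, prev_to_date) as from_date,
       -- if to_date is null and no next from_date exists, use the sentinel date
       coalesce(to_date, next_from_date, '9999-01-01') as to_date
from (select person, seq, house, from_date, to_date,
             lag(to_date)    over (partition by person order by seq) as prev_to_date,
             lead(from_date) over (partition by person order by seq) as next_from_date
      from temp) t
order by person, seq
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the explicit `seq` ordering, the NULL row for person 2 stays between the two known addresses, which sorting on the NULL dates alone cannot guarantee.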

create table Temp
(
person varchar(2),
house int,
from_date date,
to_date date
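);
insert into temp values
(1,586,'2000-04-16','2010-12-03'),
(2,123,'2001-01-01','2012-09-27'),
(2,NULL,NULL,NULL),
(2,104,'2004-01-01','2012-11-24'),
(3,987,'1999-12-31','2009-08-01'),
(3,NULL,NULL,NULL);
-- Note: ISNULL and ORDER BY (SELECT 0) below are SQL Server idioms; COALESCE is the portable equivalent of ISNULL.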
select A.person,
A.house,
isnull(A.from_date,BF.to_date) From_date,
isnull(A.to_date,isnull(CT.From_date,'9999-01-01')) To_date
from
((select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) A left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) BF
on A.person = BF.person and
A.rownum = BF.rownum + 1)left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) CT
on A.person = CT.person and
A.rownum = CT.rownum - 1

Related

Find last job change date with JOB_TITLE and EVENT_DATE

Hi, I am working in Azure Databricks and I am looking for a SQL query solution.
Assuming that my db has four columns:
ID     EVENT_DATE  JOB_TITLE  PAY
-----  ----------  ---------  -------
12345  2021-01-01  VP1        100,000
12345  2020-01-10  VP1        90,000
12345  2019-01-20  Analyst1   80,000
12346  2021-02-01  VP2        200,000
12346  2020-02-10  Analyst2   150,000
12346  2020-01-20  Analyst2   110,000
Basically I want the EVENT_DATE when JOB_TITLE changed the last time. This is my desired output:
ID     JOB_TITLE  PAY      LAST_JOB_CHANGE_DATE
-----  ---------  -------  --------------------
12345  VP1        90,000   2020-01-10
12346  VP2        200,000  2021-02-01
For the last column LAST_JOB_CHANGE_DATE, we are pulling from the 2nd and 4th row of the table because that's the date when they changed job the last time.
Thank you!
You can just use an INNER JOIN to accomplish that, i.e.:
%sql
SELECT a.*
FROM yourTable a
    INNER JOIN
    (
    SELECT id, MAX(event_date) event_date
    FROM yourTable b
    GROUP BY id
    ) b ON a.id = b.id
       AND a.event_date = b.event_date
The ROW_NUMBER approach would also work well:
%sql
WITH cte AS
(
SELECT
    ROW_NUMBER() OVER( PARTITION BY id ORDER BY event_date DESC ) AS rn,
    *
FROM yourTable a
)
SELECT *
FROM cte
WHERE rn = 1
There's probably a simpler solution for this, but the following should work.
I'm assuming you wanted the most recent job change for each employee. To illustrate this, I added an extra row for an Engineer1. The ROW_NUMBER() window function helps us with this.
ID     EVENT_DATE  JOB_TITLE  PAY
-----  ----------  ---------  -------
12345  2021-01-01  VP1        100,000
12345  2020-01-10  VP1        90,000
12345  2019-01-20  Analyst1   80,000
12345  2018-01-04  Engineer1  75,000
12346  2021-02-01  VP2        200,000
12346  2020-02-10  Analyst2   150,000
12346  2020-01-20  Analyst2   110,000
Here is the query:
SELECT -- (4)
    c.ID,
    c.JOB_TITLE,
    c.PAY,
    c.last_job_change_date
FROM
(
    SELECT -- (3)
        b.ID,
        ROW_NUMBER() OVER (PARTITION BY b.ID ORDER BY b.last_job_change_date DESC) AS row_id,
        b.JOB_TITLE,
        b.PAY,
        b.last_job_change_date
    FROM
    (
        SELECT -- (2)
            a.ID,
            a.JOB_TITLE,
            a.PAY,
            a.EVENT_DATE AS last_job_change_date
        FROM
        (
            SELECT -- (1)
                ID,
                EVENT_DATE,
                PAY,
                JOB_TITLE,
                LEAD(JOB_TITLE, 1) OVER (
                    PARTITION BY ID ORDER BY EVENT_DATE DESC) job_change
            FROM yourtable
        ) a
        WHERE JOB_TITLE <> job_change
    ) b
) c
WHERE row_id = 1
I used a 4-step process and annotated the query with each step:
(1) Returns a table with an extra column holding the job title from the next row of each employee when ordered from most recent to oldest, i.e. the title they held before.
(2) Returns the table from (1) but removes rows where the employee did not change their job.
(3) Adds row numbers so we can get the most recent job change of each employee.
(4) Returns the most recent job change for each employee.
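The four steps can also be sketched as a runnable check. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module in place of Databricks SQL, stores PAY as an integer, and flattens the nested subqueries into two CTEs whose names (`titled`, `changes`) are made up for the example. It assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourtable (id int, event_date text, job_title text, pay int)"
)
conn.executemany(
    "insert into yourtable values (?, ?, ?, ?)",
    [
        (12345, "2021-01-01", "VP1", 100000),
        (12345, "2020-01-10", "VP1", 90000),
        (12345, "2019-01-20", "Analyst1", 80000),
        (12345, "2018-01-04", "Engineer1", 75000),
        (12346, "2021-02-01", "VP2", 200000),
        (12346, "2020-02-10", "Analyst2", 150000),
        (12346, "2020-01-20", "Analyst2", 110000),
    ],
)

query = """
with titled as (   -- step (1): attach the title held on the next-older row
    select id, event_date, job_title, pay,
           lead(job_title) over (partition by id order by event_date desc) as prev_title
    from yourtable
),
changes as (       -- steps (2) and (3): keep title changes, rank newest first
    select id, job_title, pay, event_date,
           row_number() over (partition by id order by event_date desc) as row_id
    from titled
    where job_title <> prev_title
)
select id, job_title, pay, event_date as last_job_change_date   -- step (4)
from changes
where row_id = 1
order by id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that an employee who has only ever held one title has no row where the title differs from the previous one, so they drop out of the result entirely.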

How to select rows where values changed for an ID

I have a table that looks like the following
id effective_date number_of_int_customers
123 10/01/19 0
123 02/01/20 3
456 10/01/19 6
456 02/01/20 6
789 10/01/19 5
789 02/01/20 4
999 10/01/19 0
999 02/01/20 1
I want to write a query that looks at each ID to see if the salespeople have newly started working internationally between October 1st and February 1st.
The result I am looking for is the following:
id effective_date number_of_int_customers
123 02/01/20 3
999 02/01/20 1
The result would return only the salespeople who originally had 0 international customers and now have at least 1.
I have seen similar posts here that use nested queries to pull records where the first date and last have different values. But I only want to pull records where the original value was 0. Is there a way to do this in one query in SQL?
In your case, a simple aggregation would do -- assuming that 0 is the earliest value:
select id, max(number_of_int_customers)
from t
where effective_date in ('2019-10-01', '2020-02-01')
group by id
having min(number_of_int_customers) = 0;
Obviously, this is not correct if the values can decrease to zero. But this having clause fixes that problem:
having min(case when number_of_int_customers = 0 then effective_date end) = min(effective_date)
An alternative is to use window functions, such as first_value():
select distinct id, last_noic
from (select t.*,
             first_value(number_of_int_customers) over (partition by id order by effective_date) as first_noic,
             first_value(number_of_int_customers) over (partition by id order by effective_date desc) as last_noic
      from t
      where effective_date in ('2019-10-01', '2020-02-01')
     ) t
where first_noic = 0;
Hmmm, on second thought, I like lag() better:
select id, number_of_int_customers
from (select t.*,
lag(number_of_int_customers) over (partition by id order by effective_date) as prev_noic
from t
where effective_date in ('2019-10-01', '2020-02-01')
) t
where prev_noic = 0;
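As a quick check of the lag() version, here is a small sketch using Python's sqlite3 module as a stand-in database (SQLite 3.25 or later). Two things are assumptions on my part rather than part of the answer: the dates are stored as ISO strings, and the extra condition `number_of_int_customers > 0` is added so that a salesperson who stays at 0 is not returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id int, effective_date text, number_of_int_customers int)"
)
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (123, "2019-10-01", 0), (123, "2020-02-01", 3),
        (456, "2019-10-01", 6), (456, "2020-02-01", 6),
        (789, "2019-10-01", 5), (789, "2020-02-01", 4),
        (999, "2019-10-01", 0), (999, "2020-02-01", 1),
    ],
)

query = """
select id, effective_date, number_of_int_customers
from (select t.*,
             lag(number_of_int_customers)
                 over (partition by id order by effective_date) as prev_noic
      from t
      where effective_date in ('2019-10-01', '2020-02-01')
     ) x
-- previous value was 0 and the current value is at least 1 (added condition)
where prev_noic = 0 and number_of_int_customers > 0
order by id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```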

Expanding/changing my query to find more entries using (potentially) IFELSE

My question will use this dataset as an example. I have a query setup (I have changed variables to more generic variables for the sake of posting this on the internet so the query may not make perfect sense) that picks the most recent date for a given account. So the query returns values with a reason_type of 1 with the most recent date. This query has effective_date set to is not null.
account date effective_date value reason_type
123456 4/20/2017 5/1/2017 5 1
123456 1/20/2017 2/1/2017 10 1
987654 2/5/2018 3/1/2018 15 1
987654 12/31/2017 2/1/2018 20 1
456789 4/27/2018 5/1/2018 50 1
456789 1/24/2018 2/1/2018 60 1
456123 4/25/2017 null 15 2
789123 5/1/2017 null 16 2
666888 2/1/2018 null 31 2
333222 1/1/2018 null 20 2
What I am looking to do now is to apply that logic to reason_type 1 only if there is an entry for it, and otherwise have it default to reason_type 2.
I think I should be using an IFELSE, but I'm admittedly not knowledgeable about how I would go about that.
Here is the code that I currently have to return the reason_type 1s most recent entry.
I hope my question is clear.
SELECT account, date, effective_date, value, reason_type
from
(
    SELECT account, date, effective_date, value, reason_type,
           ROW_NUMBER() over (partition by account order by date desc) rn
    from mytable
    WHERE value is not null
    AND effective_date is not null
)
WHERE rn = 1
I think you might want something like this (do you really have a column named date by the way? That seems like a bad idea):
SELECT account, date, effective_date, value, reason_type
FROM (
SELECT account, date, effective_date, value, reason_type
, ROW_NUMBER() OVER ( PARTITION BY account ORDER BY date DESC ) AS rn
FROM mytable
WHERE value IS NOT NULL
) WHERE rn = 1
-- effective_date IS NULL or is on or before today's date
AND ( effective_date IS NULL OR effective_date < TRUNC(SYSDATE+1) );
Hope this helps.

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
    select *,
           row_number() over (partition by id order by creation_date desc) rn
    from yourtable
) t
where rn = 1 and not exists (
    select 1
    from yourtable t2
    where t2.creation_date is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. MySQL before 8.0 doesn't support it, but you can achieve the same result there using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
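Here is a minimal sketch of the conditional-aggregation version run against the sample data, using Python's sqlite3 module rather than Teradata. The dates are stored as ISO strings so that max() orders them correctly; that storage format is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourtable "
    "(id int, creation_date text, completion_date text, difference int)"
)
conn.executemany(
    "insert into yourtable values (?, ?, ?, ?)",
    [
        (123, "2016-05-09", "2016-05-16", 7),
        (123, "2016-05-14", "2016-05-16", 2),
        (456, "2016-04-26", "2016-04-30", 4),
        (456, None, "2016-04-30", None),
        (789, "2016-03-25", "2016-03-31", 6),
        (789, "2016-03-01", "2016-03-31", 30),
    ],
)

query = """
select t.id, t.creation_date, t.completion_date, t.difference
from yourtable t
join (select id, max(creation_date) as max_creation_date
      from yourtable
      group by id
      -- count() ignores nulls, so this counts only rows with a missing creation_date
      having count(case when creation_date is null then 1 end) = 0
     ) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
order by t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ID 456 is dropped by the HAVING clause before the join, so neither of its rows can match.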

Create table with distinct values based on date

I have a table which fills up with lots of transactions monthly, like below.
Name ID Date OtherColumn
_________________________________________________
John Smith 11111 2012-11-29 Somevalue
John Smith 11111 2012-11-30 Somevalue
Adam Gray 22222 2012-12-11 Somevalue
Tim Blue 33333 2012-12-15 Somevalue
John NewName 11111 2013-01-01 Somevalue
Adam Gray 22222 2013-01-02 Somevalue
From this table I want to create a dimension table with the unique names and IDs. The problem is that a person can change his/her name, like "John" in the example above. The IDs are otherwise always unique. In those cases I want to use only the newest name (the one with the latest date).
So that I end up with a table like this:
Name ID
______________________
John NewName 11111
Adam Gray 22222
Tim Blue 33333
How do I go about achieving this?
Can I do it in a single query?
Use a CTE for this. It simplifies ranking and window functions.
;WITH CTE as
(SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
ID,
Name
FROM
YourTable)
SELECT
Name,
ID
FROM
CTE
WHERE
RN = 1
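For reference, the same CTE can be tried out with Python's sqlite3 module. This is a sketch, not SQL Server code: SQLite does not accept the `RN = ROW_NUMBER()` alias form, so it is written as `ROW_NUMBER() ... AS rn`, and the column is quoted as "Date" instead of [Date]. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table YourTable (Name text, ID int, "Date" text, OtherColumn text)'
)
conn.executemany(
    "insert into YourTable values (?, ?, ?, ?)",
    [
        ("John Smith", 11111, "2012-11-29", "Somevalue"),
        ("John Smith", 11111, "2012-11-30", "Somevalue"),
        ("Adam Gray", 22222, "2012-12-11", "Somevalue"),
        ("Tim Blue", 33333, "2012-12-15", "Somevalue"),
        ("John NewName", 11111, "2013-01-01", "Somevalue"),
        ("Adam Gray", 22222, "2013-01-02", "Somevalue"),
    ],
)

query = """
with cte as (
    select row_number() over (partition by ID order by "Date" desc) as rn,
           ID,
           Name
    from YourTable
)
select Name, ID
from cte
where rn = 1      -- newest row per ID
order by ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because ROW_NUMBER() assigns exactly one row per ID the value 1, this returns a single name per ID even if two rows share the latest date.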
I think creating a table is a bad idea, but this is how you get the most recent name.
select name
from yourtable yt join
(select id, max(date) maxdate
from yourtable
group by id ) temp on temp.id = yt.id and yt.date = maxdate
JNK's CTE solution is an equivalent of the following.
SELECT
Name,
ID
FROM (
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
Name,
ID
FROM theTable
) AS t
WHERE RN = 1
I'm trying to think of a way to get rid of the partition function without introducing possible duplicates.