Select a distinct ID with the maximum date - sql

I see some questions relating to mine, but they are not exactly the same.
I need to make a SELECT in a DB2 database where I keep only distinct IDs with their data.
For example, I have this data:
ID DATE_BEGIN DATE_END
1111 2014-01-01 2016-01-02
1111 2018-01-05 2018-01-03
1111 1990-01-01 9999-12-31
2222 1998-02-02 2000-12-20
In this case, I want to keep:
1111 1990-01-01 9999-12-31
2222 1998-02-02 2000-12-20
My SELECT statement:
SELECT
ID, DATE_BEGIN, DATE_END
FROM TABLE_NAME T1
WHERE DATE_END = (SELECT
MAX(DATE_END)
FROM TABLE_NAME T2
WHERE T2.DATE_END = T1.DATE_END)
But I keep getting every record.
Thanks for the help !

I asked a similar question previously; please refer to my post here: Get the latest date for each record
SELECT ID, DATE_BEGIN, DATE_END
FROM (
SELECT ID, DATE_BEGIN, DATE_END
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE_END DESC) AS RN
FROM TABLE_NAME
)A
WHERE A.RN = 1
Credit goes to the original answer in my post.
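The query in the question returns every record because the subquery is correlated on DATE_END rather than ID, so each row is only compared with itself. As a minimal sketch (run against an in-memory SQLite database from Python purely for illustration, not DB2; window functions need SQLite 3.25 or later), both the corrected correlated subquery and the ROW_NUMBER version give the expected two rows for the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_NAME (ID INTEGER, DATE_BEGIN TEXT, DATE_END TEXT);
    INSERT INTO TABLE_NAME VALUES
        (1111, '2014-01-01', '2016-01-02'),
        (1111, '2018-01-05', '2018-01-03'),
        (1111, '1990-01-01', '9999-12-31'),
        (2222, '1998-02-02', '2000-12-20');
""")

# The question's query, but with the subquery correlated on ID instead of DATE_END.
correlated = conn.execute("""
    SELECT ID, DATE_BEGIN, DATE_END
    FROM TABLE_NAME T1
    WHERE DATE_END = (SELECT MAX(DATE_END)
                      FROM TABLE_NAME T2
                      WHERE T2.ID = T1.ID)
    ORDER BY ID
""").fetchall()

# The ROW_NUMBER approach from the answer above.
ranked = conn.execute("""
    SELECT ID, DATE_BEGIN, DATE_END
    FROM (
        SELECT ID, DATE_BEGIN, DATE_END,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE_END DESC) AS RN
        FROM TABLE_NAME
    ) A
    WHERE A.RN = 1
    ORDER BY ID
""").fetchall()

print(correlated)
print(ranked)
```

One difference worth keeping in mind: if two rows for the same ID share the maximum DATE_END, the correlated subquery returns both, while ROW_NUMBER keeps exactly one.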

Related

Expanding/changing my query to find more entries using (potentially) IFELSE

My question will use this dataset as an example. I have a query setup (I have changed variables to more generic variables for the sake of posting this on the internet so the query may not make perfect sense) that picks the most recent date for a given account. So the query returns values with a reason_type of 1 with the most recent date. This query has effective_date set to is not null.
account date effective_date value reason_type
123456 4/20/2017 5/1/2017 5 1
123456 1/20/2017 2/1/2017 10 1
987654 2/5/2018 3/1/2018 15 1
987654 12/31/2017 2/1/2018 20 1
456789 4/27/2018 5/1/2018 50 1
456789 1/24/2018 2/1/2018 60 1
456123 4/25/2017 null 15 2
789123 5/1/2017 null 16 2
666888 2/1/2018 null 31 2
333222 1/1/2018 null 20 2
What I am looking to do now is basically to apply that logic to reason_type 1 only if there is an entry for it, and otherwise have it default to reason_type 2.
I think I should be using an IFELSE, but I'm admittedly not knowledgeable about how I would go about that.
Here is the code that I currently have to return the reason_type 1s most recent entry.
I hope my question is clear.
SELECT account, date, effective_date, value, reason_type
from
(
SELECT account, date, effective_date, value, reason_type,
ROW_NUMBER() over (partition by account order by date desc) rn
from mytable
WHERE value is not null
AND effective_date is not null
)
WHERE rn =1
I think you might want something like this (do you really have a column named date by the way? That seems like a bad idea):
SELECT account, date, effective_date, value, reason_type
FROM (
SELECT account, date, effective_date, value, reason_type
, ROW_NUMBER() OVER ( PARTITION BY account ORDER BY date DESC ) AS rn
FROM mytable
WHERE value IS NOT NULL
) WHERE rn = 1
-- effective_date IS NULL or is on or before today's date
AND ( effective_date IS NULL OR effective_date < TRUNC(SYSDATE+1) );
Hope this helps.

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
select *,
row_number() over (partition by id order by creation_date desc) rn
from yourtable
) t
where rn = 1 and not exists (
select 1
from yourtable t2
where t2.creation_date is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. MySQL doesn't support it (before version 8.0), but you can achieve the same result using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
from yourtable
group by id
having count(case when creation_date is null then 1 end) = 0
) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
SQL Fiddle Demo
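As a quick check of the conditional aggregation version, here is a minimal sketch that runs it against the sample data from the question. It uses an in-memory SQLite database from Python rather than Teradata, and the dates are stored as ISO strings so that MAX compares them correctly; both of those are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (id INTEGER, creation_date TEXT, completion_date TEXT);
    INSERT INTO yourtable VALUES
        (123, '2016-05-09', '2016-05-16'),
        (123, '2016-05-14', '2016-05-16'),
        (456, '2016-04-26', '2016-04-30'),
        (456, NULL,         '2016-04-30'),
        (789, '2016-03-25', '2016-03-31'),
        (789, '2016-03-01', '2016-03-31');
""")

# HAVING drops every id that has at least one NULL creation_date;
# the join then keeps only the row with the latest creation_date.
rows = conn.execute("""
    SELECT t.id, t.creation_date, t.completion_date
    FROM yourtable t
    JOIN (SELECT id, MAX(creation_date) AS max_creation_date
          FROM yourtable
          GROUP BY id
          HAVING COUNT(CASE WHEN creation_date IS NULL THEN 1 END) = 0
         ) t2 ON t.id = t2.id AND t.creation_date = t2.max_creation_date
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

ID 456 is filtered out entirely, which matches the output requested in the question.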

fill in a null cell with cell from previous record

Hi, I am using DB2 SQL to fill in some missing data in the following table:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL NULL NULL
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL NULL NULL
Person 2 has lived in three houses, but for the middle address neither the house nor the dates are known. I can't do anything about which house they were in, but I would like to use the previous address's To date to replace the NULL From date, and the next address's From date to replace the NULL To date, i.e.:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL 2012-09-27 2004-01-01
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL 2009-08-01 9999-01-01
I understand that if there is no previous address before a NULL address, the From date will have to stay NULL, but if a NULL address is the last known address I would like to change the To date to 9999-01-01, as for person 3.
This seems to me to be the type of problem where set theory is no longer a good solution; however, I am required to find a DB2 solution because that's what my boss uses!
Any pointers/suggestions welcome.
Thanks.
It might look something like this:
select
person,
house,
coalesce(from_date, prev_to_date) from_date,
case when rn = 1 then coalesce (to_date, '9999-01-01')
else coalesce(to_date, next_from_date) end to_date
from
(select person, house, from_date, to_date,
lag(to_date) over (partition by person order by from_date nulls last) prev_to_date,
lead(from_date) over (partition by person order by from_date nulls last) next_from_date,
row_number() over (partition by person order by from_date desc nulls last) rn
from temp
) t
The above is not tested but it might give you an idea.
I hope in your actual table you have a column other than to_date and from_date that allows you to order rows for each person, otherwise you'll have trouble sorting NULL dates, as you have no way of knowing the actual sequence.
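To illustrate the point about ordering, here is a minimal sketch that assumes such a column exists. The table name residence and the column seq are hypothetical, and the query runs against an in-memory SQLite database from Python (3.25 or later for window functions) rather than DB2, so treat it as an idea rather than a tested DB2 solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "residence" and "seq" are hypothetical: seq gives the order of addresses per person.
    CREATE TABLE residence (person INTEGER, seq INTEGER, house INTEGER,
                            from_date TEXT, to_date TEXT);
    INSERT INTO residence VALUES
        (1, 1, 586,  '2000-04-16', '2010-12-03'),
        (2, 1, 123,  '2001-01-01', '2012-09-27'),
        (2, 2, NULL, NULL,         NULL),
        (2, 3, 104,  '2004-01-01', '2012-11-24'),
        (3, 1, 987,  '1999-12-31', '2009-08-01'),
        (3, 2, NULL, NULL,         NULL);
""")

# LAG/LEAD look at the neighbouring rows of the same person, ordered by seq.
# A NULL from_date takes the previous to_date; a NULL to_date takes the next
# from_date, or 9999-01-01 when there is no next value to take.
filled = conn.execute("""
    SELECT person, house,
           COALESCE(from_date, prev_to_date) AS from_date,
           COALESCE(to_date, next_from_date, '9999-01-01') AS to_date
    FROM (
        SELECT person, seq, house, from_date, to_date,
               LAG(to_date)    OVER (PARTITION BY person ORDER BY seq) AS prev_to_date,
               LEAD(from_date) OVER (PARTITION BY person ORDER BY seq) AS next_from_date
        FROM residence
    ) t
    ORDER BY person, seq
""").fetchall()

for row in filled:
    print(row)
```

Note that this sketch also falls back to 9999-01-01 when the next row has a NULL From date (two unknown addresses in a row), which may or may not be what you want.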
create table Temp
(
person varchar(2),
house int,
from_date date,
to_date date
)
insert into temp values
(1,586,'2000-04-16','2010-12-03'),
(2,123,'2001-01-01','2012-09-27'),
(2,NULL,NULL,NULL),
(2,104,'2004-01-01','2012-11-24'),
(3,987,'1999-12-31','2009-08-01'),
(3,NULL,NULL,NULL)
select A.person,
A.house,
isnull(A.from_date,BF.to_date) From_date,
isnull(A.to_date,isnull(CT.From_date,'9999-01-01')) To_date
from
((select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) A left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) BF
on A.person = BF.person and
A.rownum = BF.rownum + 1)left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) CT
on A.person = CT.person and
A.rownum = CT.rownum - 1

SQL Query to return the difference between records of two most recent dates

I have the following table:
**TABLE1**
RecordID UserID UserName Balance TranDate
---------------------------------------------------------------
100 10001 John Doe 10213.00 2013-02-12 00:00:00.000
101 10001 John Doe 1932.00 2013-04-30 00:00:00.000
102 10001 John Doe 10213.00 2013-03-25 00:00:00.000
103 10001 John Doe 14514.00 2013-04-12 00:00:00.000
104 10001 John Doe 5430.00 2013-02-19 00:00:00.000
105 10001 John Doe 21242.00 2010-02-11 00:00:00.000
106 10001 John Doe 13342.00 2013-05-22 00:00:00.000
Now what I'm trying to do is to query the two most recent transactions and arrive at this data:
RecordID UserID UserName Balance TranDate
---------------------------------------------------------------
106 10001 John Doe 13342.00 2013-05-22 00:00:00.000
101 10001 John Doe 1932.00 2013-04-30 00:00:00.000
Then using the data above I would like to compare the balances to show the difference:
UserID UserName Difference
---------------------------------------------------------------
10001 John Doe -11410.00
This just shows the difference between the two previous balances (the latest and the balance before the latest)
Now I have the following query below. This works okay to show the two most recent transactions.
SELECT
TOP 2 *
FROM Table1
WHERE UserID = 10001
ORDER
BY TranDate DESC
Now my issues are:
Is the SQL above safe to use? I am relying only on the ORDER BY TranDate DESC sort, and I am not sure whether that is reliable.
How do I select the difference between the two Balances (Row 2 - Row 1 )? I was looking for some answers online and I find stuff about self-joining. I tried it but it doesn't show me my desired output.
EDIT:
This is the closest I can get to my desired result. Can someone help me out on this please? Thanks!
DECLARE @SampleTable TABLE
(
UserID INT,
UserName VARCHAR(20),
Balance DECIMAL(9,2) DEFAULT 0
)
INSERT
INTO @SampleTable
(UserID, UserName, Balance)
SELECT
TOP 2 UserID,
UserName,
Balance
FROM Table1
WHERE UserID = 10001
ORDER
BY TranDate DESC
SELECT A.UserID,
A.UserName,
B.Balance - A.Balance AS Difference
FROM @SampleTable A
JOIN @SampleTable B
ON A.UserID = B.UserID
Thanks a lot!
You should be able to use something like the following assuming SQL Server as the RDBMS:
;with cte as
(
select recordid, userid, username, balance, trandate,
row_number() over(partition by userid order by trandate desc) rn
from table1
)
select c1.userid, c1.username,
c2.balance - c1.balance diff -- previous balance minus latest (Row 2 - Row 1)
from cte c1
cross apply cte c2
where c1.rn = 1
and c2.rn = 2
and c1.userid = c2.userid;
See SQL Fiddle with demo.
Or this could be done using an INNER JOIN on the row_number value:
;with cte as
(
select recordid, userid, username, balance, trandate,
row_number() over(partition by userid order by trandate desc) rn
from table1
)
select c1.userid, c1.username,
c2.balance - c1.balance diff -- previous balance minus latest (Row 2 - Row 1)
from cte c1
inner join cte c2
on c1.userid = c2.userid
and c1.rn + 1 = c2.rn
where c1.rn = 1
See SQL Fiddle with Demo
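For anyone who wants to try the row_number join without a SQL Server instance, here is a minimal sketch using an in-memory SQLite database from Python (3.25 or later). It computes the previous balance minus the latest one, which is the sign used in the question's expected output, and joins on userid so that several users can be handled at once. The second user (10002, Jane Roe) is made up purely to show that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (recordid INTEGER, userid INTEGER, username TEXT,
                         balance REAL, trandate TEXT);
    INSERT INTO table1 VALUES
        (100, 10001, 'John Doe', 10213.00, '2013-02-12'),
        (101, 10001, 'John Doe',  1932.00, '2013-04-30'),
        (102, 10001, 'John Doe', 10213.00, '2013-03-25'),
        (103, 10001, 'John Doe', 14514.00, '2013-04-12'),
        (104, 10001, 'John Doe',  5430.00, '2013-02-19'),
        (105, 10001, 'John Doe', 21242.00, '2010-02-11'),
        (106, 10001, 'John Doe', 13342.00, '2013-05-22'),
        -- hypothetical second user, not part of the question's data
        (200, 10002, 'Jane Roe',   500.00, '2013-01-01'),
        (201, 10002, 'Jane Roe',   750.00, '2013-06-01');
""")

# rn = 1 is the latest transaction per user, rn = 2 the one before it.
rows = conn.execute("""
    WITH cte AS (
        SELECT userid, username, balance,
               ROW_NUMBER() OVER (PARTITION BY userid ORDER BY trandate DESC) AS rn
        FROM table1
    )
    SELECT c1.userid, c1.username, c2.balance - c1.balance AS difference
    FROM cte c1
    JOIN cte c2 ON c2.userid = c1.userid AND c2.rn = c1.rn + 1
    WHERE c1.rn = 1
    ORDER BY c1.userid
""").fetchall()

for row in rows:
    print(row)
```

A user with only one transaction has no rn = 2 row and so drops out of the inner join; a LEFT JOIN would keep them with a NULL difference.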

Create table with distinct values based on date

I have a table which fills up with lots of transactions monthly, like below.
Name ID Date OtherColumn
_________________________________________________
John Smith 11111 2012-11-29 Somevalue
John Smith 11111 2012-11-30 Somevalue
Adam Gray 22222 2012-12-11 Somevalue
Tim Blue 33333 2012-12-15 Somevalue
John NewName 11111 2013-01-01 Somevalue
Adam Gray 22222 2013-01-02 Somevalue
From this table I want to create a dimension table with the unique names and IDs. The problem is that a person can change his/her name, like "John" in the example above. The IDs are otherwise always unique. In those cases I want to use only the newest name (the one with the latest date).
So that I end up with a table like this:
Name ID
______________________
John NewName 11111
Adam Gray 22222
Tim Blue 33333
How do I go about achieving this?
Can I do it in a single query?
Use a CTE for this. It simplifies ranking and window functions.
;WITH CTE as
(SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
ID,
Name
FROM
YourTable)
SELECT
Name,
ID
FROM
CTE
WHERE
RN = 1
I think creating a table is a bad idea, but this is how you get the most recent name.
select yt.name, yt.id
from yourtable yt join
(select id, max(date) maxdate
from yourtable
group by id ) temp on temp.id = yt.id and yt.date = maxdate
JNK's CTE solution is an equivalent of the following.
SELECT
Name,
ID
FROM (
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
Name,
ID
FROM theTable
) AS t
WHERE RN = 1
I'm trying to think of a way to get rid of the partition function without introducing possible duplicates.
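One possible direction, offered only as a sketch: keep the MAX(date) join and collapse any ties with an outer GROUP BY. The example below uses an in-memory SQLite database from Python, the table name yourtable from the answer above, and one made-up extra row ('J. NewName' on the same latest date) to force a tie. Picking MAX(name) as the tie-breaker is an arbitrary assumption; any deterministic rule would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (Name TEXT, ID INTEGER, "Date" TEXT);
    INSERT INTO yourtable VALUES
        ('John Smith',   11111, '2012-11-29'),
        ('John Smith',   11111, '2012-11-30'),
        ('Adam Gray',    22222, '2012-12-11'),
        ('Tim Blue',     33333, '2012-12-15'),
        ('John NewName', 11111, '2013-01-01'),
        ('Adam Gray',    22222, '2013-01-02'),
        -- hypothetical row that ties with John NewName on the latest date
        ('J. NewName',   11111, '2013-01-01');
""")

# Plain MAX(date) join: a tie on the latest date yields two rows for ID 11111.
plain = conn.execute("""
    SELECT yt.ID, yt.Name
    FROM yourtable yt
    JOIN (SELECT ID, MAX("Date") AS maxdate FROM yourtable GROUP BY ID) m
      ON m.ID = yt.ID AND yt."Date" = m.maxdate
    ORDER BY yt.ID, yt.Name
""").fetchall()

# Same join with an outer GROUP BY: exactly one row per ID, ties broken by MAX(Name).
deduped = conn.execute("""
    SELECT yt.ID, MAX(yt.Name) AS Name
    FROM yourtable yt
    JOIN (SELECT ID, MAX("Date") AS maxdate FROM yourtable GROUP BY ID) m
      ON m.ID = yt.ID AND yt."Date" = m.maxdate
    GROUP BY yt.ID
    ORDER BY yt.ID
""").fetchall()

print(plain)
print(deduped)
```

This only works cleanly when a single column is carried along; with several columns, separate MAX calls could mix values from different tied rows, which is where ROW_NUMBER is the safer choice.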