Create table with distinct values based on date - sql

I have a table which fills up with lots of transactions monthly, like below.
Name          ID     Date        OtherColumn
_________________________________________________
John Smith    11111  2012-11-29  Somevalue
John Smith    11111  2012-11-30  Somevalue
Adam Gray     22222  2012-12-11  Somevalue
Tim Blue      33333  2012-12-15  Somevalue
John NewName  11111  2013-01-01  Somevalue
Adam Gray     22222  2013-01-02  Somevalue
From this table I want to create a dimension table with the unique names and IDs. The problem is that a person can change his/her name, like "John" in the example above. The IDs are otherwise always unique. In those cases I want to use only the newest name (the one with the latest date).
So that I end up with a table like this:
Name          ID
______________________
John NewName  11111
Adam Gray     22222
Tim Blue      33333
How do I go about achieving this?
Can I do it in a single query?

Use a CTE for this. It simplifies ranking and window functions.
;WITH CTE AS
(
    SELECT
        RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
        ID,
        Name
    FROM YourTable
)
SELECT
    Name,
    ID
FROM CTE
WHERE RN = 1
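For anyone who wants to try the ROW_NUMBER approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table name `transactions` and the helper `latest_names` are made up for illustration. It assumes an SQLite build with window-function support (3.25 or later) and ISO-formatted date strings, so that text ordering matches date ordering.

```python
import sqlite3


def latest_names(rows):
    """Return (name, id) pairs, keeping only the newest name per id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the Name / ID / Date columns of the question.
    conn.execute("CREATE TABLE transactions (name TEXT, id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH ranked AS (
            SELECT name, id,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
            FROM transactions
        )
        SELECT name, id FROM ranked WHERE rn = 1 ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result


rows = [
    ("John Smith", 11111, "2012-11-29"),
    ("John Smith", 11111, "2012-11-30"),
    ("Adam Gray", 22222, "2012-12-11"),
    ("Tim Blue", 33333, "2012-12-15"),
    ("John NewName", 11111, "2013-01-01"),
    ("Adam Gray", 22222, "2013-01-02"),
]
print(latest_names(rows))
# [('John NewName', 11111), ('Adam Gray', 22222), ('Tim Blue', 33333)]
```

To materialise the result as a dimension table in SQL Server, the final SELECT of the CTE query could be written as SELECT ... INTO a new table.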

I think creating a table is a bad idea, but this is how you get the most recent name for each id.
select yt.name, yt.id
from yourtable yt
join (select id, max([date]) as maxdate
      from yourtable
      group by id) temp
  on temp.id = yt.id and yt.[date] = temp.maxdate

JNK's CTE solution is equivalent to the following derived-table query (note that SQL Server requires an alias on the derived table).
SELECT
    Name,
    ID
FROM (
    SELECT
        RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
        Name,
        ID
    FROM theTable
) AS ranked
WHERE RN = 1
I am still trying to think of a way to get rid of the window function without introducing possible duplicates.
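One possible way to avoid the window function, sketched here as an assumption rather than a tested SQL Server solution: join each row to the latest date for its ID, then collapse any ties with an aggregate over the name. Which name wins a tie is arbitrary (here the alphabetically last one), so this only makes sense if ties on the latest date are not meaningful. The sketch runs against SQLite through Python's sqlite3 module. The table name `transactions` and the tied row are hypothetical.

```python
import sqlite3


def newest_name_per_id(rows):
    """One row per id without window functions; ties collapse via MAX(name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (name TEXT, id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT MAX(t.name) AS name, t.id
        FROM transactions AS t
        JOIN (SELECT id, MAX(date) AS maxdate
              FROM transactions
              GROUP BY id) AS m
          ON m.id = t.id AND t.date = m.maxdate
        GROUP BY t.id
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return result


tied_rows = [
    ("John Smith", 11111, "2012-11-30"),
    ("John NewName", 11111, "2013-01-01"),
    ("John OtherName", 11111, "2013-01-01"),  # hypothetical tie on the latest date
    ("Adam Gray", 22222, "2013-01-02"),
]
print(newest_name_per_id(tied_rows))
# [('John OtherName', 11111), ('Adam Gray', 22222)]
```

Without the outer GROUP BY, the join alone would return both tied rows for ID 11111.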

Related

Select records based on Id and most updated record of the same Id

I have a table Applications.
The user can submit more than one application. The user can also update an existing application, but instead of updating the record itself, we insert a new record with the same ApplicationNum.
Id  ApplicationNum  ApplicantId  ApplicantName  CreateDate
1   101             789          John           May-20-2021
2   101             789          John           May-21-2021
3   102             789          John           May-22-2021
4   103             123          Maria          May-31-2021
I want to return the list of applications based on the ApplicantId, but I don't want to display both records of the same ApplicationNum.
If I use this select statement:
Select * from Applications where ApplicantId = 789
this is the result I currently get:
1   101             789          John           May-20-2021
2   101             789          John           May-21-2021
3   102             789          John           May-22-2021
This is the result I want to get:
2   101             789          John           May-21-2021
3   102             789          John           May-22-2021
Notice that the record with Id = 1 is not displayed because it is an old version of the record with Id = 2.
How can I achieve this?
I like using ROW_NUMBER along with a TIES trick here:
SELECT TOP 1 WITH TIES *
FROM Applications
WHERE ApplicantId = 789
ORDER BY ROW_NUMBER() OVER (PARTITION BY ApplicantId, ApplicationNum ORDER BY Id DESC);
Might be easier to just use:
select max(Id) as Id, ApplicationNum, ApplicantId, ApplicantName, max(CreateDate) as CreateDate
from Applications
where ApplicantId = 789
group by ApplicationNum, ApplicantId, ApplicantName
The traditional way, which is usually the most performant, is to use row_number and select the desired row from each group of applications:
select Id, ApplicationNum, ApplicantId, ApplicantName, CreateDate
from (
    select *, row_number() over (partition by ApplicantId, ApplicationNum order by Id desc) rn
    from Applications
    where ApplicantId = 789
) a
where rn = 1
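One difference between the GROUP BY and row_number answers is worth checking: grouping by ApplicantName splits an application into two groups if the name was edited in the newer version, while row_number still keeps exactly one row per ApplicationNum. The sketch below illustrates this in SQLite via Python's sqlite3 module. The name change to "Johnny" is hypothetical data added for the demonstration, dates are rewritten in ISO form, and the alias LatestId is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Applications (Id INTEGER, ApplicationNum INTEGER, "
    "ApplicantId INTEGER, ApplicantName TEXT, CreateDate TEXT)"
)
conn.executemany(
    "INSERT INTO Applications VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, 789, "John", "2021-05-20"),
        (2, 101, 789, "Johnny", "2021-05-21"),  # hypothetical: name edited in the new version
        (3, 102, 789, "John", "2021-05-22"),
        (4, 103, 123, "Maria", "2021-05-31"),
    ],
)

# GROUP BY version: the changed name creates an extra group for application 101.
group_by_ids = [row[0] for row in conn.execute(
    """
    SELECT MAX(Id) AS LatestId, ApplicationNum, ApplicantId, ApplicantName
    FROM Applications
    WHERE ApplicantId = 789
    GROUP BY ApplicationNum, ApplicantId, ApplicantName
    ORDER BY LatestId
    """
)]

# row_number version: one row per (ApplicantId, ApplicationNum), newest Id wins.
row_number_ids = [row[0] for row in conn.execute(
    """
    SELECT Id
    FROM (
        SELECT Id,
               ROW_NUMBER() OVER (
                   PARTITION BY ApplicantId, ApplicationNum ORDER BY Id DESC
               ) AS rn
        FROM Applications
        WHERE ApplicantId = 789
    ) AS a
    WHERE rn = 1
    ORDER BY Id
    """
)]

print(group_by_ids)    # [1, 2, 3]
print(row_number_ids)  # [2, 3]
```

If every column other than Id and CreateDate is guaranteed to stay the same between versions, both queries return the same rows.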

How to select only the most recent

Table A has ID, date and name columns. Each time a record is changed, the first 11 digits of the ID remain the same but the final digit increases by 1. For example:
123456789110  01-01-2020  John smith
119876543210  01-01-2020  Peter Griffin
119876543211  05-01-2020  Peter Griffin
How could I write a statement that shows the ID associated with John smith as well as the most recent ID of Peter Griffin? Thanks
Yet another option is using WITH TIES
Select top 1 with ties *
From YourTable
Order by row_number() over (partition by left(id,11) order by date desc)
Why not just use max()?
select name, max(id)
from t
group by name;
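Grouping by name works only as long as the name never changes between versions of a record. A variant that relies on the 11-digit prefix instead might look like the sketch below, run here in SQLite through Python's sqlite3 module. It assumes the IDs are stored as text, that the final digit never rolls over past 9, and it uses SQLite's `substr(id, 1, 11)` where SQL Server would use `left(id, 11)`. The table name `a` and the helper name are placeholders.

```python
import sqlite3


def latest_per_prefix(rows):
    """Return (id, name) for the highest id within each 11-character prefix."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id TEXT, date TEXT, name TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.id, t.name
        FROM a AS t
        JOIN (SELECT MAX(id) AS id
              FROM a
              GROUP BY substr(id, 1, 11)) AS latest
          ON latest.id = t.id
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return result


records = [
    ("123456789110", "01-01-2020", "John smith"),
    ("119876543210", "01-01-2020", "Peter Griffin"),
    ("119876543211", "05-01-2020", "Peter Griffin"),
]
print(latest_per_prefix(records))
# [('119876543211', 'Peter Griffin'), ('123456789110', 'John smith')]
```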

sql that identifies which account numbers have multiple agents

I don't think a simple count will work here. Can someone help me write SQL that identifies which account numbers have multiple agents, i.e. filters on accounts with more than one agent?
AGENT_NAME          ACCOUNT_NUMBER
Clemons, Tony       123
Cipollo, Michael    123
Jepsen, Sarah       567
Joanos, James       567
McMahon, Brian      890
Novak, Jason        437
Ralph, Melissa      197
Reitwiesner, John   221
Roman, Marlo        123
Rosenzweig, Marcie  890
Results should be something like this.
ACCOUNT_NUMBER  AGENT_NAME
123             Cipollo, Michael
123             Roman, Marlo
123             Clemons, Tony
890             Rosenzweig, Marcie
890             McMahon, Brian
567             Joanos, James
567             Jepsen, Sarah
You can do this using window functions:
select t.account_number, t.agent_name
from (select t.*, min(agent_name) over (partition by account_number) as minan,
max(agent_name) over (partition by account_number) as maxan
from table t
) t
where minan <> maxan;
If you know the agent names are never duplicated, you could just do:
select t.account_number, t.agent_name
from (select t.*, count(*) over (partition by account_number) as cnt
from table t
) t
where cnt > 1;
Assuming your table name is test, this should pull all the records with duplicate ACCOUNT_NUMBER:
select * from test where ACCOUNT_NUMBER in
(select ACCOUNT_NUMBER from test
group by ACCOUNT_NUMBER having
count(ACCOUNT_NUMBER)>1)
order by ACCOUNT_NUMBER
Using the count function you can get the result:
CREATE TABLE #TEMP
(
AGENT_NAME VARCHAR(100),
ACCOUNT_NUMBER INT
)
INSERT INTO #TEMP
VALUES ('CLEMONS, TONY',123),
('CIPOLLO, MICHAEL',123),
('JEPSEN, SARAH',567),
('JOANOS, JAMES',567),
('MCMAHON, BRIAN',890),
('NOVAK, JASON',437),
('RALPH, MELISSA',197),
('REITWIESNER, JOHN',221),
('ROMAN, MARLO',123),
('ROSENZWEIG, MARCIE',890)
SELECT a.ACCOUNT_NUMBER,a.AGENT_NAME
FROM #TEMP A
JOIN(SELECT COUNT(1) CNT,
ACCOUNT_NUMBER
FROM #TEMP
GROUP BY ACCOUNT_NUMBER) B
ON A.ACCOUNT_NUMBER = B.ACCOUNT_NUMBER
WHERE B.CNT != 1
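The plain count-based answers treat a repeated row for the same agent as a second agent. If duplicated rows are possible, counting distinct agent names avoids that. The following is a rough sketch in SQLite via Python's sqlite3 module. The table name `agents` and the duplicated Novak row are hypothetical additions to the sample data.

```python
import sqlite3


def accounts_with_multiple_agents(rows):
    """Return (account_number, agent_name) for accounts with 2+ distinct agents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agents (agent_name TEXT, account_number INTEGER)")
    conn.executemany("INSERT INTO agents VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT account_number, agent_name
        FROM agents
        WHERE account_number IN (
            SELECT account_number
            FROM agents
            GROUP BY account_number
            HAVING COUNT(DISTINCT agent_name) > 1
        )
        ORDER BY account_number, agent_name
        """
    ).fetchall()
    conn.close()
    return result


agent_rows = [
    ("Clemons, Tony", 123),
    ("Cipollo, Michael", 123),
    ("Jepsen, Sarah", 567),
    ("Joanos, James", 567),
    ("McMahon, Brian", 890),
    ("Novak, Jason", 437),
    ("Novak, Jason", 437),  # hypothetical duplicate row: still only one agent
    ("Ralph, Melissa", 197),
    ("Reitwiesner, John", 221),
    ("Roman, Marlo", 123),
    ("Rosenzweig, Marcie", 890),
]
for account, agent in accounts_with_multiple_agents(agent_rows):
    print(account, agent)
```

With `COUNT(*)` in place of `COUNT(DISTINCT agent_name)`, account 437 would be reported as well because of the repeated row.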

fill in a null cell with cell from previous record

Hi, I am using DB2 SQL to fill in some missing data in the following table:
Person  House  From        To
------  -----  ----        --
1       586    2000-04-16  2010-12-03
2       123    2001-01-01  2012-09-27
2       NULL   NULL        NULL
2       104    2004-01-01  2012-11-24
3       987    1999-12-31  2009-08-01
3       NULL   NULL        NULL
Person 2 has lived in 3 houses, but for the middle address it is not known where or when. I can't do anything about which house they were in, but I would like to use the previous row's To date to replace the NULL From date, and the next row's From date to replace the NULL To date, i.e.:
Person  House  From        To
------  -----  ----        --
1       586    2000-04-16  2010-12-03
2       123    2001-01-01  2012-09-27
2       NULL   2012-09-27  2004-01-01
2       104    2004-01-01  2012-11-24
3       987    1999-12-31  2009-08-01
3       NULL   2009-08-01  9999-01-01
I understand that if there is no previous address before a NULL address, the From date will have to stay NULL, but if a NULL address is the last known address I would like to change the To date to 9999-01-01, as for person 3.
This seems to me like the type of problem where set theory is no longer a good solution; however, I am required to find a DB2 solution because that's what my boss uses!
Any pointers/suggestions welcome.
Thanks.
It might look something like this:
select
person,
house,
coalesce(from_date, prev_to_date) from_date,
case when rn = 1 then coalesce (to_date, '9999-01-01')
else coalesce(to_date, next_from_date) end to_date
from
(select person, house, from_date, to_date,
lag(to_date) over (partition by person order by from_date nulls last) prev_to_date,
lead(from_date) over (partition by person order by from_date nulls last) next_from_date,
row_number() over (partition by person order by from_date desc nulls last) rn
from temp
) t
The above is not tested but it might give you an idea.
I hope in your actual table you have a column other than to_date and from_date that allows you to order rows for each person, otherwise you'll have trouble sorting NULL dates, as you have no way of knowing the actual sequence.
create table Temp
(
    person int,
    house int,
    from_date date,
    to_date date
)
insert into Temp values
(1,586,'2000-04-16','2010-12-03'),
(2,123,'2001-01-01','2012-09-27'),
(2,NULL,NULL,NULL),
(2,104,'2004-01-01','2012-11-24'),
(3,987,'1999-12-31','2009-08-01'),
(3,NULL,NULL,NULL)
select A.person,
A.house,
isnull(A.from_date,BF.to_date) From_date,
isnull(A.to_date,isnull(CT.From_date,'9999-01-01')) To_date
from
((select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) A left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) BF
on A.person = BF.person and
A.rownum = BF.rownum + 1)left join
(select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) CT
on A.person = CT.person and
A.rownum = CT.rownum - 1
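Both answers depend on knowing the order of the rows for each person, which the NULL dates alone cannot provide. Assuming the real table has (or can be given) an explicit ordering column, the LAG/LEAD idea could be sketched as follows. This runs in SQLite through Python's sqlite3 module rather than DB2. The table name `residences` and the column `seq` are hypothetical, and the window functions need SQLite 3.25 or later.

```python
import sqlite3


def fill_missing_dates(rows):
    """Fill NULL from/to dates from the neighbouring rows of the same person."""
    conn = sqlite3.connect(":memory:")
    # seq is a hypothetical column giving the known order of addresses per person.
    conn.execute(
        "CREATE TABLE residences (person INTEGER, seq INTEGER, house INTEGER, "
        "from_date TEXT, to_date TEXT)"
    )
    conn.executemany("INSERT INTO residences VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT person,
               house,
               COALESCE(from_date,
                        LAG(to_date) OVER (PARTITION BY person ORDER BY seq)
               ) AS from_date,
               COALESCE(to_date,
                        LEAD(from_date) OVER (PARTITION BY person ORDER BY seq),
                        '9999-01-01'
               ) AS to_date
        FROM residences
        ORDER BY person, seq
        """
    ).fetchall()
    conn.close()
    return result


residence_rows = [
    (1, 1, 586, "2000-04-16", "2010-12-03"),
    (2, 1, 123, "2001-01-01", "2012-09-27"),
    (2, 2, None, None, None),
    (2, 3, 104, "2004-01-01", "2012-11-24"),
    (3, 1, 987, "1999-12-31", "2009-08-01"),
    (3, 2, None, None, None),
]
for row in fill_missing_dates(residence_rows):
    print(row)
```

On the sample data this reproduces the desired table from the question, including the 9999-01-01 end date for person 3. A NULL From date on a person's first row stays NULL because LAG has nothing to return there.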

SQL Query to return the difference between records of two most recent dates

I have the following table:
**TABLE1**
RecordID  UserID  UserName  Balance   TranDate
---------------------------------------------------------------
100       10001   John Doe  10213.00  2013-02-12 00:00:00.000
101       10001   John Doe  1932.00   2013-04-30 00:00:00.000
102       10001   John Doe  10213.00  2013-03-25 00:00:00.000
103       10001   John Doe  14514.00  2013-04-12 00:00:00.000
104       10001   John Doe  5430.00   2013-02-19 00:00:00.000
105       10001   John Doe  21242.00  2010-02-11 00:00:00.000
106       10001   John Doe  13342.00  2013-05-22 00:00:00.000
What I'm trying to do is query the two most recent transactions and arrive at this data:
RecordID  UserID  UserName  Balance   TranDate
---------------------------------------------------------------
106       10001   John Doe  13342.00  2013-05-22 00:00:00.000
101       10001   John Doe  1932.00   2013-04-30 00:00:00.000
Then, using the data above, I would like to compare the balances to show the difference:
UserID  UserName  Difference
---------------------------------------------------------------
10001   John Doe  -11410.00
This just shows the difference between the two most recent balances (the balance before the latest minus the latest).
Now I have the following query, which works okay to show the two most recent transactions:
SELECT TOP 2 *
FROM Table1
WHERE UserID = 10001
ORDER BY TranDate DESC
Now my issues are:
1. Is the SQL above safe to use? I am relying solely on ORDER BY TranDate DESC for the sorting and I am not sure how reliable that is.
2. How do I select the difference between the two balances (Row 2 - Row 1)? I looked for answers online and found material about self-joins. I tried that, but it doesn't give me my desired output.
EDIT:
This is the closest I can get to my desired result. Can someone help me out on this please? Thanks!
DECLARE @SampleTable TABLE
(
    UserID INT,
    UserName VARCHAR(20),
    Balance DECIMAL(9,2) DEFAULT 0
)
INSERT INTO @SampleTable (UserID, UserName, Balance)
SELECT TOP 2 UserID, UserName, Balance
FROM Table1
WHERE UserID = 10001
ORDER BY TranDate DESC
SELECT A.UserID,
       A.UserName,
       B.Balance - A.Balance AS Difference
FROM @SampleTable A
JOIN @SampleTable B
  ON A.UserID = B.UserID
Thanks a lot!
You should be able to use something like the following assuming SQL Server as the RDBMS:
;with cte as
(
    select recordid, userid, username, balance, trandate,
        row_number() over(partition by userid order by trandate desc) rn
    from table1
)
select c1.userid, c1.username,
    c2.balance - c1.balance diff  -- row 2 minus row 1, as asked in the question
from cte c1
cross apply cte c2
where c1.rn = 1
    and c2.rn = 2
    and c2.userid = c1.userid;  -- keep each user's rows paired with their own
Or this could be done using an INNER JOIN on the row_number value:
;with cte as
(
    select recordid, userid, username, balance, trandate,
        row_number() over(partition by userid order by trandate desc) rn
    from table1
)
select c1.userid, c1.username,
    c2.balance - c1.balance diff  -- row 2 minus row 1, as asked in the question
from cte c1
inner join cte c2
    on c1.userid = c2.userid
    and c1.rn + 1 = c2.rn
where c1.rn = 1
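As an alternative to joining the ranked rows to themselves, the second-latest balance can be pulled onto the latest row with LEAD. This is only a sketch, shown in SQLite via Python's sqlite3 module rather than SQL Server. It assumes window-function support (SQLite 3.25 or later), stores TranDate as ISO-formatted text so that text ordering matches date ordering, and computes the difference as row 2 minus row 1, as in the question. The helper name and the column alias `prev_balance` are made up.

```python
import sqlite3


def latest_balance_difference(rows):
    """Per user: balance before the latest minus the latest balance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (recordid INTEGER, userid INTEGER, username TEXT, "
        "balance REAL, trandate TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH ranked AS (
            SELECT userid, username, balance,
                   -- with a descending sort, LEAD looks at the next-older row
                   LEAD(balance) OVER (
                       PARTITION BY userid ORDER BY trandate DESC
                   ) AS prev_balance,
                   ROW_NUMBER() OVER (
                       PARTITION BY userid ORDER BY trandate DESC
                   ) AS rn
            FROM table1
        )
        SELECT userid, username, prev_balance - balance AS difference
        FROM ranked
        WHERE rn = 1
        ORDER BY userid
        """
    ).fetchall()
    conn.close()
    return result


transactions = [
    (100, 10001, "John Doe", 10213.00, "2013-02-12 00:00:00.000"),
    (101, 10001, "John Doe", 1932.00, "2013-04-30 00:00:00.000"),
    (102, 10001, "John Doe", 10213.00, "2013-03-25 00:00:00.000"),
    (103, 10001, "John Doe", 14514.00, "2013-04-12 00:00:00.000"),
    (104, 10001, "John Doe", 5430.00, "2013-02-19 00:00:00.000"),
    (105, 10001, "John Doe", 21242.00, "2010-02-11 00:00:00.000"),
    (106, 10001, "John Doe", 13342.00, "2013-05-22 00:00:00.000"),
]
print(latest_balance_difference(transactions))
# [(10001, 'John Doe', -11410.0)]
```

A user with only one transaction gets a NULL difference here, whereas the join-based queries would drop that user from the result.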