How to get the newest address for each person - sql

I have created a simple Access database an example of table structure and a query. Two tables, person(contains 3 records) & address(contains 5 records), provide the ability to capture multiple addresses for each person. I am good with normal conditional statements, but this one is throwing me for a loop...
I am looking for a query that will return only the newest address for a given person.
Table Relationship
Current sql for the query:
SELECT Person.PersonID_PK, Address.Address, Address.StatusDate
FROM Person INNER JOIN Address ON Person.[PersonID_PK] = Address.PersonID_FK;
My current returns:
EmployeeID_PK Address StatusDate
1 12 Elm St, MN 23569 11/13/2017
1 15 Apple Ln, NY 12345 7/15/2018
2 30 Mulberry, TN 38456 6/11/2018
2 10 Lonesome Pine, KY 15487 12/4/2018
3 100 Plaze Place, LA 14563 6/17/2018
I need to return each person along with the greatest(newest) StatusDate
My expected return should be:
EmployeeID_PK Address StatusDate
1 15 Apple Ln, NY 12345 7/15/2018
2 10 Lonesome Pine, KY 15487 12/4/2018
3 100 Plaze Place, LA 14563 6/17/2018

You can use a correlated subquery:
select a.*
from address as a
where a.statusdate = (select max(a2.statusdate)
from address as a2
where a2.EmployeeID_PK = a.EmployeeID_PK
);

Using a CTE with a ranking function works neatly.
;WITH empaddress AS
(
SELECT person.personid_pk,
address.address,
address.statusdate,
Dense_rank() OVER (partition BY id ORDER BY statusdate DESC) AS d_rank
FROM person
INNER JOIN address
ON person.[PersonID_PK] = address.personid_fk; )
SELECT person.personid_pk,
address.address,
address.statusdate
FROM empaddress
WHERE d_rank = 1;

Thanks for the help. I modified Gordon's code to support my query, the following provided the answer.
SELECT Person.PersonID_PK, Address.Address, Address.StatusDate
FROM Person
INNER JOIN Address
ON Person.[PersonID_PK] = Address.PersonID_FK
WHERE (((Address.StatusDate)=(SELECT MAX(Address.StatusDate)
FROM Address
WHERE Person.PersonID_PK = Address.PersonID_FK)));

Related

Sql Query: How to Base on the row name to display

I have the table data as listed on below:
name | score
andy | 1
leon | 2
aaron | 3
I want to list out as below, even no jacky's data, but list his name and score set to 0
aaron 3
andy 2
jacky 0
leon 2
You didn't specify your DBMS, but the following is 100% standard ANSI SQL:
select v.name, coalesce(t.score, 0) as score
from (
values ('andy'),('leon'),('aaron'),('jacky')
) as v(name)
left join your_table t on t.name = v.name;
The values clause builds up a "virtual table" that contains the names you are interested in. Then this is used in a left join so that all names from the virtual table are returned plus the existing scores from your (unnamed table). For non-existing scores, NULL is returned which is turned to 0 using coalesce()
If you only want to specify the missing names, you can use a UNION in the virtual table:
select v.name, coalesce(t.score, 0) as score
from (
select t1.name
from your_table t1
union
select *
from ( values ('jacky')) as x
) as v(name)
left join your_table t on t.name = v.name;
fixed the query, could list out the data, but still missing jacky, only could list out as shown on below, the DBMS. In SQL is SQL2008.
data
name score scoredate
andy 1 2021-08-10 01:23:16
leon 2 2021-08-10 03:25:16
aaron 3 2021-08-10 06:25:16
andy 4 2021-08-10 11:25:16
leon 5 2021-08-10 13:25:16
result set
name | score
aaron | 1
andy | 5
leon | 7
select v.name as Name,
coalesce(sum(t.score),0) as Score
from (
values ('aaron'), ('andy'), ('jacky'), ('leon')
) as v(name)
left join Score t on t.name=v.name
where scoredate>='2021-08-10 00:00:00'
and scoredate<='2021-08-10 23:59:59'
group by v.name
order by v.name asc
Your question lacks a bunch of information, such as where "Jacky"s name comes from. If you have a list of names that you know are not in the table, just use union all:
select name, score
from t
union all
select 'Jacky', 0;

SQL: Finding duplicate records based on custom criteria

I need to find duplicates based on two tables and based on custom criteria. The following determines whether it's a duplicate, and if so, show only the most recent one:
If Employee Name and all EmployeePolicy CoverageId(s) are an exact match another record, then that's considered a duplicate.
--Employee Table
EmployeeId Name Salary
543 John 54000
785 Alex 63000
435 John 75000
123 Alex 88000
333 John 67000
--EmployeePolicy Table
EmployeePolicyId EmployeeId CoverageId
1 543 8888
2 543 7777
3 785 5555
4 435 8888
5 435 7777
6 123 4444
7 333 8888
8 333 7776
For example, the duplicates in the example above are the following:
EmployeeId Name Salary
543 John 54000
435 John 75000
This is because they are the only ones that have a matching name in the Employee table as well as both have the same exact CoverageIds in the EmployeePolicy table.
Note: EmployeeId 333 also with Name = John is not a match because both of his CoverageIDs are not the same as the other John's CoverageIds.
At first I have been trying to find duplicates the old fashioned way by Grouping records and saying having count(*) > 1, but then quickly realized that it would not work because while in English my criteria defines a duplicate, in SQL the CoverageIDs are different so they are NOT considered duplicates.
By that same accord, I tried something like:
-- Create a TMP table
INSERT INTO #tmp
SELECT *
FROM Employee e join EmployeePolicy ep on e.EmpoyeeId = ep.EmployeeId
SELECT info.*
FROM
(
SELECT
tmp.*,
ROW_NUMBER() OVER(PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
FROM #tmp tmp
) info
WHERE
info.RowNum = 1 AND
Again, this does not work because SQL does not see this as duplicates. Not sure how to translate my English definition of duplicate into SQL definition of duplicate.
Any help is most appreciated.
The easiest way is to concatenate the policies into a string. That, alas, is cumbersome in SQL Server. Here is a set-based approach:
with ep as (
select ep.*, count(*) over (partition by employeeid) as cnt
from employeepolicy ep
)
select ep.employeeid, ep2.employeeid
from ep join
ep ep2
on ep.employeeid < ep2.employeeid and
ep.CoverageId = ep2.CoverageId and
ep.cnt = ep2.cnt
group by ep.employeeid, ep2.employeeid, ep.cnt
having count(*) = cnt -- all match
The idea is to match the coverages for different employees. A simple criteria is that the number of coverages need to match. Then, it checks that the number of matching coverages is the actual count.
Note: This puts the employee id pairs in a single row. You can join back to the employees table to get the additional information.
I have not tested the T-SQL but I believe the following should give you the output you are looking for.
;WITH CTE_Employee
AS
(
SELECT E.[Name]
,E.[EmployeeId]
,P.[CoverageId]
,E.[Salary]
FROM Employee E
INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
)
, CTE_DuplicateCoverage
AS
(
SELECT E.[Name]
,E.[CoverageId]
FROM CTE_Employee E
GROUP BY E.[Name], E.[CoverageId]
HAVING COUNT(*) > 1
)
SELECT E.[EmployeeId]
,E.[Name]
,MAX(E.[Salary]) AS [Salary]
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId]
GROUP BY E.[EmployeeId], E.[Name]
HAVING COUNT(*) > 1
ORDER BY E.[EmployeeId]

SQL - Return unique rows based on 2 columns and a condition

I have a table on HSQLDB with data as
Id Account Opendate Baldate LastName ........ State
1 1234 040111 041217 Jackson AZ
2 1234 040111 051217 James FL
3 2345 050112 061213 Thomas CA
4 2345 050112 061213 Kay DE
How can i write a query that gives me rows that have distinct values in Account and Opendate columns, having the maximum Baldate. If Baldate is also same, then return the first row ordered by Id.
So the resultset should contain
Id Account Opendate Baldate LastName........State
2 1234 040111 051217 James FL
3 2345 050112 061213 Thomas CA
I have gotten this far.
select LastName,...,State, max(BalDate) from ACCOUNTS group by Account, Opendate
But the query fails since I cannot use an aggregate function for columns not in group by (lastname, state etc). How can I resolve this?
HSQLDB supports correlated subqueries, so I think this will work:
select a.*
from accounts a
where a.id = (select a2.id
from accounts a2
where a2.account = a.account and a2.opendate = a.opendate
order by baldate desc, id asc
limit 1
);
I'm not familiar with hslqdb, so this is just ANSI. I see that it doesn't support analytical functions, which would make life easier.
If it works the other answer is a lot cleaner.
SELECT ACC_T1.Id,
ACC_T1.Opendate,
ACC_T1.Baldate
ACC_T1.LastName,
...
ACC_T1.State
FROM Account_Table AS ACC_T1
INNER
JOIN (SELECT account,
OpenDate,
MAX(BalDate) AS m_BalDate
FROM AccountTable
GROUP
BY account,
OpenDate
) AS SB1
ON ACC_T1.Account = SB1.Account
AND ACC_T1.OpenDate = SB1.OpenDate
AND ACC_T1.BalDate = SB1.m_BalDate
INNER
JOIN (SELECT account,
OpenDate,
BalDate,
MIN(id) AS m_id
FROM Account_Table
GROUP
BY account,
OpenDate,
BalDate
) AS SB2
ON ACC_T1.Account = SB2.Account
AND ACC_T1.OpenDate = SB2.OpenDate
AND ACC_T1.BalDate = SB2.BalDate
AND ACC_T1.id = SB2.m_id

Query to select distinct values from different tables and not have them repeat (show them as a flat file)

I'm trying to get all phones, emails, and organizations for a person and show it in a flat file format. There should be n number of rows, where n is the max count of organizations, emails, or phones. NULL values will be shown once all values have been shown in the rows, with NULL being the last values. The emails and phones can only have 1 PreferredInd per person. I want these to be on the same row (1 of them can be NULL). I've tried to do this on a more complex query, but couldn't get it to work, so I've started over using this simpler example.
Example tables and values:
#ContactPerson
Id Name
1 John Doe
#ContactEmail
Id PersonId Email PreferredInd
1 1 johndoe#us.gov 0
2 1 jdoe#us.gov 1
3 1 johndoe#gmail.com 0
#ContactPhone
Id PersonId Phone PreferredInd
1 1 888-867-5309 0
2 1 305-476-5234 1
#ContactOrganization
Id PersonId Organization
1 1 US Government
2 1 US Army
I want a resulting set to look like:
Name Organization PreferredInd Email Phone
John Doe US Government 1 jdoe#us.gov 888-867-5309
John Doe US Army 0 johndoe#us.gov 305-467-5234
John Doe NULL 0 johndoe#gmail.com NULL
The complete sql code that I have for this example is here on pastebin. It also includes code to create the sample tables. It works when the count of emails exceeds the count of organizations or phones, but that won't always be true. I can't seem to figure out how to get the result that I'm looking for. The actual tables I'm working with can have 0 or infinity emails, phones, or organizations per person. There will also be many more values, but I can fix that myself.
Can you help me fix my query or show me a simpler way to do it? If you have any questions, just let me know and I can try to answer them.
something like this?
with cte_e as (
select
*,
row_number() over(order by PreferredInd desc, Id) as rn
from ContactEmail
), cte_p as (
select
*,
row_number() over(order by PreferredInd desc, Id) as rn
from ContactPhone
), cte_o as (
select
*,
row_number() over(order by Organization) as rn
from ContactOrganization
), cte_d as (
select distinct rn, PersonId from cte_e union
select distinct rn, PersonId from cte_p union
select distinct rn, PersonId from cte_o
)
select
pr.Name, o.Organization, e.Email, p.Phone
from cte_d as d
left outer join ContactPerson as pr on pr.Id = d.PersonId
left outer join cte_e as e on e.PersonId = d.PersonId and e.rn = d.rn
left outer join cte_p as p on p.PersonId = d.PersonId and p.rn = d.rn
left outer join cte_o as o on o.PersonId = d.PersonId and o.rn = d.rn
sql fiddle demo
it's a bit clumsy, I can think of couple of other possible ways to do this, but I think this one is most readable one
Step 1
Write a query that does the full join of all the tables, which will end up with lots of duplicate rows for each person (for each email or phone number)
Step 2
Write a second query that uses GroupBy to group the rows, and that uses the Case or Decode keywords (like a c# switch statement) to find the preferred row value and select it as the value to display

SQL - Removing Duplicate without 'hard' coding?

Heres my scenario.
I have a table with 3 rows I want to return within a stored procedure, rows are email, name and id. id must = 3 or 4 and email must only be per user as some have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
Ok fairly simple but there are some users whose have entries that are both 3 and 4 so they appear twice, if they appear twice I want only those with ids of 4 remaining. I'll give another example below as its hard to explain.
Table -
Email Name Id
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
jimmy#domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy with the id of 3, any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email;
Sometimes using Having Count(*) > 1 may be useful to find duplicated records.
select * from table group by Email having count(*) > 1
or
select * from table group by Email having count(*) > 1 and id > 3.
The solution provided before with the select MAX(ID) from table sounds good for this case.
This maybe an alternative solution.
What RDMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name,A.id
FROM SO_Table A
INNER JOIN(
SELECT
email, name,id,RANK() OVER (Partition BY name ORDER BY ID DESC) AS COUNTER
FROM SO_Table B
) X ON X.ID = A.ID AND X.NAME = A.NAME
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3