Selecting the most recent entry on a timestamp cell - sql

I have these two tables:
User:
=======================
id | Name | Email
=======================
1 | User-A| a#mail
2 | User-B| b#mail
=======================
Entry:
=================================================
id | agree | createdOn | userId
=================================================
1 | true | 2020-11-10 19:22:23 | 1
2 | false | 2020-11-10 22:22:23 | 1
3 | true | 2020-11-11 12:22:23 | 1
4 | true | 2020-11-04 22:22:23 | 2
5 | false | 2020-11-12 02:22:23 | 2
================================================
I need to get the following result:
=============================================================
Name | Email | agree | createdOn
=============================================================
User-A | a#mail | true | 2020-11-11 12:22:23
User-B | b#mail | false | 2020-11-12 02:22:23
=============================================================
The Postgres query I'm running is:
select distinct on (e."createdOn", u.id)
u.id , e.id ,u."Name" , u.email, e.agree, e."createdOn" from "user" u
inner join public.entry e on u."id" = e."userId"
order by "createdOn" desc
The problem is that it returns every entry after the join, whereas I only want the most recent entry per user according to the createdOn column.

You want the latest entry per user. For this, you need the user id in the distinct on clause, and no other column. This guarantees one row in the resultset per user.
Then, you need to put that column first in the order by clause, followed by createdOn desc. This breaks the ties and decides which row will be retained in each group:
select distinct on (u.id) u.id , e.id ,u."Name" , u.email, e.agree, e."createdOn"
from "user" u
inner join public.entry e on u."id" = e."userId"
order by u.id, "createdOn" desc

You can also use row_number() to pick the latest row per user and then do the join. Note that "user" has to be quoted in Postgres, as do the mixed-case column names:
SELECT * FROM "user" A
LEFT JOIN (
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "userId" ORDER BY "createdOn" DESC) AS RN
FROM ENTRY
) K WHERE RN = 1
) B
ON A.ID = B."userId"

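Another option is to compute MAX("createdOn") per user (GROUP BY "userId") in a subquery and keep only the entries whose "createdOn" equals that maximum. A bare HAVING "createdOn" = MAX("createdOn") does not work, because "createdOn" is not part of the grouping.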
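The max-per-user idea can be sketched as follows. This is only an illustration under assumptions: it runs the query through Python's sqlite3 module on an in-memory copy of the sample tables (SQLite, not Postgres), and the function name `latest_entry_per_user` is made up for the example.

```python
import sqlite3

def latest_entry_per_user():
    # In-memory SQLite database holding the sample data from the question.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "user" (id INTEGER PRIMARY KEY, "Name" TEXT, "Email" TEXT);
        CREATE TABLE entry (id INTEGER PRIMARY KEY, agree TEXT,
                            "createdOn" TEXT, "userId" INTEGER);
        INSERT INTO "user" VALUES (1, 'User-A', 'a#mail'), (2, 'User-B', 'b#mail');
        INSERT INTO entry VALUES
            (1, 'true',  '2020-11-10 19:22:23', 1),
            (2, 'false', '2020-11-10 22:22:23', 1),
            (3, 'true',  '2020-11-11 12:22:23', 1),
            (4, 'true',  '2020-11-04 22:22:23', 2),
            (5, 'false', '2020-11-12 02:22:23', 2);
    """)
    # The derived table m holds the latest timestamp per user; joining on
    # both the user id and that timestamp keeps only the matching entry.
    rows = conn.execute("""
        SELECT u."Name", u."Email", e.agree, e."createdOn"
        FROM "user" u
        JOIN entry e ON e."userId" = u.id
        JOIN (SELECT "userId", MAX("createdOn") AS latest
              FROM entry
              GROUP BY "userId") m
          ON m."userId" = e."userId" AND m.latest = e."createdOn"
        ORDER BY u.id
    """).fetchall()
    conn.close()
    return rows

print(latest_entry_per_user())
```

Unlike distinct on or row_number(), this form returns more than one row for a user if two entries share the same latest timestamp.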

SUM CASE when DISTINCT?

Joining two tables and grouping, we're trying to get the sum of a user's value but only include a user's value once if that user is represented in a grouping multiple times.
Some sample tables:
user table:
| id | net_worth |
------------------
| 1 | 100 |
| 2 | 1000 |
visit table:
| id | location | user_id |
-----------------------------
| 1 | mcdonalds | 1 |
| 2 | mcdonalds | 1 |
| 3 | mcdonalds | 2 |
| 4 | subway | 1 |
We want to find the total net worth of users visiting each location. User 1 visited McDonalds twice, but we don't want to double count their net worth. Ideally we can use a SUM but only add in the net worth value if that user hasn't already been counted for at that location. Something like this:
-- NOTE: Hypothetical query
SELECT
location,
SUM(CASE WHEN DISTINCT user.id then user.net_worth ELSE 0 END) as total_net_worth
FROM visit
JOIN user on user.id = visit.user_id
GROUP BY 1;
The ideal output being:
| location | total_net_worth |
-------------------------------
| mcdonalds | 1100 |
| subway | 100 |
This particular database is Redshift/PostgreSQL, but it would be interesting if there is a generic SQL solution. Is something like the above possible?
You don't want to count duplicate (location, user_id) pairs from the visit table, so select the distinct pairs first. Note that "user" is a reserved word in Postgres and has to be quoted:
SELECT
v.location,
SUM(u.net_worth) as total_net_worth
FROM (SELECT DISTINCT location, user_id FROM visit) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location
ORDER BY v.location;
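To see the effect of deduplicating before the join, here is a small sketch that runs both the naive sum and the SELECT DISTINCT version on the sample data. It assumes Python's sqlite3 module with an in-memory SQLite database as a stand-in for Redshift/PostgreSQL; the function name `net_worth_by_location` is hypothetical.

```python
import sqlite3

def net_worth_by_location():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "user" (id INTEGER PRIMARY KEY, net_worth INTEGER);
        CREATE TABLE visit (id INTEGER PRIMARY KEY, location TEXT, user_id INTEGER);
        INSERT INTO "user" VALUES (1, 100), (2, 1000);
        INSERT INTO visit VALUES
            (1, 'mcdonalds', 1),
            (2, 'mcdonalds', 1),
            (3, 'mcdonalds', 2),
            (4, 'subway', 1);
    """)
    # Naive join: user 1 is counted once per visit, so mcdonalds is inflated.
    naive = conn.execute("""
        SELECT v.location, SUM(u.net_worth)
        FROM visit v
        JOIN "user" u ON u.id = v.user_id
        GROUP BY v.location
        ORDER BY v.location
    """).fetchall()
    # Deduplicated join: each (location, user_id) pair contributes once.
    deduped = conn.execute("""
        SELECT v.location, SUM(u.net_worth)
        FROM (SELECT DISTINCT location, user_id FROM visit) v
        JOIN "user" u ON u.id = v.user_id
        GROUP BY v.location
        ORDER BY v.location
    """).fetchall()
    conn.close()
    return naive, deduped

naive, deduped = net_worth_by_location()
print(naive)    # mcdonalds counts user 1 twice
print(deduped)  # matches the ideal output in the question
```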
You can use a window function to get the unique users, then join that to the user table:
select v.location, sum(u.net_worth)
from "user" u
join (
select location, user_id,
row_number() over (partition by location, user_id order by id) as rn
from visit
) v on v.user_id = u.id and v.rn = 1
group by v.location;
The above is standard ANSI SQL. In Postgres this can also be expressed using distinct on ():
select v.location, sum(u.net_worth)
from "user" u
join (
select distinct on (user_id, location) *
from visit
order by user_id, location, id
) v on v.user_id = u.id
group by v.location;
You can join the user table with the distinct combinations of location and user id, as in the generic SQL below.
SELECT v.location, SUM(u.net_worth)
FROM (SELECT location, user_id FROM visit GROUP BY location, user_id) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location;

SQL Query Inactive Users with last end date

This is a follow-up to a question I asked about a year ago (old thread).
The answers I got then have worked fine but now I discovered that I need to tweak the query to be able to get the latest end date for the inactive users.
So again, here's a quick example table of users. Some are active, some are inactive, and some have several periods of employment.
When someone is re-employed, a new row is added for that employment period.
The username will always be the same.
I want to find the users that are disabled and don't have an active employment. If there are several periods of employment, I want the one with the latest end date: one row per username, with all the columns.
The database is SQL Server 2016.
Example table:
| username | name | active | Job title | enddate
+-----------+----------- +--------+-------------+----------
| 1111 | Jane Doe | 1 | CIO | 1/3/2022
| 1111 | Jane Doe | 0 | Janitor | 1/2/2018
| 1112 | Bob Doe | 1 | Coder | NULL
| 1113 | James Doe | 0 | Coder | 1/3/2018
| 1114 | Ray Doe | 1 | Manager | NULL
| 1114 | Ray Doe | 0 | Clerk | 2/2/2019
| 1115 | Emma Doe | 1 | Waiter | NULL
| 1116 | Sarah Doe | 0 | Greeter | 3/4/2016
| 1116 | Sarah Doe | 0 | Trainer | 4/5/2019
So for user 1116 I would ideally get one row with enddate 4/5/2019
The query I use from the answers in the old thread is this one:
;WITH NonActiveDisabledUsers AS
(
SELECT DISTINCT
U.username
FROM
UserEmployment AS U
WHERE
U.active = 0 AND
NOT EXISTS (SELECT 'no current active employment'
FROM UserEmployment AS C
WHERE U.username = C.username AND
C.active = 1 AND
(C.enddate IS NULL OR C.enddate >= CONVERT(DATE, GETDATE())))
)
SELECT
R.*
FROM
NonActiveDisabledUsers AS N
CROSS APPLY (
SELECT TOP 1 -- Just 1 record
U.*
FROM
UserEmployment AS U
WHERE
N.username = U.username AND
U.active = 0
ORDER BY
U.enddate DESC -- Determine which record should we display
) AS R
This gives me the right users and employment status, but not the latest end date, since it returns the first result it finds for user 1116.
We can use conditional aggregation with a window aggregate to get the number of active rows for this user.
We then filter to only inactive, and row-number the result by enddate taking the first row per group:
SELECT
username,
name,
active,
[Job title],
enddate
FROM (
SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY username ORDER BY enddate DESC)
FROM (
SELECT *,
CountOfActive = COUNT(CASE WHEN
Active = 1 AND
(enddate IS NULL OR enddate >= CONVERT(DATE, GETDATE())) THEN 1 END
) OVER (PARTITION BY username)
FROM UserEmployment
) AS t
WHERE CountOfActive = 0
) AS t
WHERE rn = 1;
Note that the row-numbering does not take into account nulls in enddate which would be sorted last. You would need a conditional ordering:
ROW_NUMBER() OVER (PARTITION BY username ORDER BY CASE WHEN enddate IS NULL THEN 0 ELSE 1 END ASC, enddate DESC)
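The effect of that conditional ordering can be checked in isolation. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module (SQLite, like SQL Server, treats NULL as the lowest value when sorting), a made-up `periods` table, and a hypothetical user 2000 with one open-ended period.

```python
import sqlite3

def order_end_dates():
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: one user with two closed periods and one open one.
    conn.executescript("""
        CREATE TABLE periods (username TEXT, enddate TEXT);
        INSERT INTO periods VALUES
            ('2000', '2016-03-04'),
            ('2000', NULL),
            ('2000', '2019-04-05');
    """)
    # Plain descending order: the NULL end date is sorted last.
    plain = [r[0] for r in conn.execute(
        "SELECT enddate FROM periods ORDER BY enddate DESC")]
    # Conditional order: NULL end dates first, then the latest date.
    nulls_first = [r[0] for r in conn.execute("""
        SELECT enddate FROM periods
        ORDER BY CASE WHEN enddate IS NULL THEN 0 ELSE 1 END ASC,
                 enddate DESC
    """)]
    conn.close()
    return plain, nulls_first

plain, nulls_first = order_end_dates()
print(plain)
print(nulls_first)
```

With the CASE expression in front, a row with no end date gets row number 1 instead of being ranked after every dated row.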
Hmmm . . . I think you can just get the most recent record and check that it is not active:
select ue.*
from (select ue.*,
row_number() over (partition by username
order by active desc, enddate desc
) as seqnum
from UserEmployment ue
) ue
where seqnum = 1 and active = 0;
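A runnable sketch of this "rank active rows first, then keep the top row only if it is inactive" idea, under several assumptions: it uses Python's sqlite3 module instead of SQL Server (window functions need SQLite 3.25 or later), the dates from the example table are rewritten as ISO strings assuming month/day order, and the function name `inactive_users_latest_period` is hypothetical.

```python
import sqlite3

def inactive_users_latest_period():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UserEmployment (
            username TEXT, name TEXT, active INTEGER,
            "Job title" TEXT, enddate TEXT);
        INSERT INTO UserEmployment VALUES
            ('1111', 'Jane Doe',  1, 'CIO',     '2022-01-03'),
            ('1111', 'Jane Doe',  0, 'Janitor', '2018-01-02'),
            ('1112', 'Bob Doe',   1, 'Coder',   NULL),
            ('1113', 'James Doe', 0, 'Coder',   '2018-01-03'),
            ('1114', 'Ray Doe',   1, 'Manager', NULL),
            ('1114', 'Ray Doe',   0, 'Clerk',   '2019-02-02'),
            ('1115', 'Emma Doe',  1, 'Waiter',  NULL),
            ('1116', 'Sarah Doe', 0, 'Greeter', '2016-03-04'),
            ('1116', 'Sarah Doe', 0, 'Trainer', '2019-04-05');
    """)
    # Active rows sort first within each user. If row 1 is inactive, the
    # user has no active row at all, and row 1 is the latest end date.
    rows = conn.execute("""
        SELECT username, name, active, "Job title", enddate
        FROM (
            SELECT ue.*,
                   ROW_NUMBER() OVER (PARTITION BY username
                                      ORDER BY active DESC, enddate DESC) AS seqnum
            FROM UserEmployment ue
        ) ranked
        WHERE seqnum = 1 AND active = 0
        ORDER BY username
    """).fetchall()
    conn.close()
    return rows

for row in inactive_users_latest_period():
    print(row)
```

This simplified form treats any row with active = 1 as current employment; it does not reproduce the extra check from the original query that an active row's enddate is NULL or in the future.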
Here is a variant that returns the latest end date per inactive user (only the username and end date, not the other columns):
SELECT username, max(enddate) AS enddate
FROM UserEmployment as t1
WHERE
t1.active = 0 AND
NOT EXISTS (select username from UserEmployment as t2 WHERE active = 1 AND
(t2.enddate IS NULL OR t2.enddate >= CONVERT(DATE, GETDATE())) AND
t1.username = t2.username)
GROUP BY username

Exclude first record associated with each parent record in Postgres

There are 2 tables, users and job_experiences.
I want to return a list of all job_experiences except the first associated with each user.
users
id
---
1
2
3
job_experiences
id | start_date | user_id
--------------------------
1 | 201001 | 1
2 | 201201 | 1
3 | 201506 | 1
4 | 200901 | 2
5 | 201005 | 2
Desired result
id | start_date | user_id
--------------------------
2 | 201201 | 1
3 | 201506 | 1
5 | 201005 | 2
Current query
select
*
from job_experiences
order by start_date asc
offset 1
But this doesn't work as it would need to apply the offset to each user individually.
You can do this with a lateral join:
select je.*
from users u cross join lateral
(select je.*
from job_experiences je
where u.id = je.user_id
order by id
offset 1 -- all except the first
) je;
For performance, an index on job_experiences(user_id, id) is recommended.
Use the row_number() window function: number each user's rows by start_date and keep everything after the first one.
with cte as
(
select e.*,
row_number() over (partition by user_id order by start_date) rn
from job_experiences e
)
select * from cte
where rn > 1
You could use an intermediate CTE to get the first (MIN) jobs for each user, and then use that to determine which records to exclude:
WITH user_first_je("user_id", "job_id") AS
(
SELECT "user_id", MIN("id")
FROM job_experiences
GROUP BY "user_id"
)
SELECT job_experiences.*
FROM job_experiences
LEFT JOIN user_first_je ON
user_first_je.job_id = job_experiences.id
WHERE user_first_je.job_id IS NULL;
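The CTE plus anti-join approach can be tried on the sample data as below. This is a sketch under assumptions: it uses Python's sqlite3 module with an in-memory SQLite database rather than Postgres, and the function name `all_but_first_job` is made up. As in the answer, "first" means the lowest id per user, which happens to coincide with the earliest start_date in the sample.

```python
import sqlite3

def all_but_first_job():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE job_experiences (
            id INTEGER PRIMARY KEY, start_date INTEGER, user_id INTEGER);
        INSERT INTO job_experiences VALUES
            (1, 201001, 1),
            (2, 201201, 1),
            (3, 201506, 1),
            (4, 200901, 2),
            (5, 201005, 2);
    """)
    # The CTE finds each user's first job id; the LEFT JOIN ... IS NULL
    # filter then keeps only the rows that are not one of those first jobs.
    rows = conn.execute("""
        WITH user_first_je(user_id, job_id) AS (
            SELECT user_id, MIN(id)
            FROM job_experiences
            GROUP BY user_id
        )
        SELECT je.id, je.start_date, je.user_id
        FROM job_experiences je
        LEFT JOIN user_first_je f ON f.job_id = je.id
        WHERE f.job_id IS NULL
        ORDER BY je.id
    """).fetchall()
    conn.close()
    return rows

print(all_but_first_job())
```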

sql for status using most recent date

I've searched several other SO questions but I still can't seem to get the right result. I would like to get all records from the Document table based on CaseId and having the most recent Status. I have two tables:
Document:
DocumentId | CaseId | Name
----------------------------------------
2 | 23 | Document 1
3 | 23 | Document 2
4 | 24 | Document 3
AuditLog:
AuditLogId | Status | DocumentId | Date Created
---------------------------------------------------------
10 | Active | 2 | 4/2/2017
11 | Draft | 2 | 4/1/2017
12 | Released | 2 | 4/3/2017
13 | Draft | 3 | 4/17/2017
14 | Draft | 4 | 4/17/2017
So the desired result for CaseId: 23 would be:
Status | DocumentId | CaseId | Name
----------------------------------------------
Released | 2 | 23 | Document 1
Draft | 3 | 23 | Document 2
I have got close with this query, however this only gives me the most recent of all results for CaseId 23, rather than grouping by DocumentId:
Select s.Status, d.* from Document d join(
Select Status, DocumentId
FROM AuditLog
WHERE [Date Created] = (select max([Date Created])
from AuditLog)) s on d.DocumentId = s.DocumentId
WHERE d.CaseId = 23
Use cross apply() to get the latest Status for each DocumentId:
select d.*, al.Status
from Document d
cross apply (
select top 1 i.Status
from AuditLog i
where i.DocumentId = d.DocumentId
order by i.[Date Created] desc
) as al
where d.CaseId = 23
A top with ties version using row_number():
select top 1 with ties d.*, al.Status
from Document d
inner join AuditLog al
on d.DocumentId = al.DocumentId
order by row_number() over (partition by al.DocumentId order by al.[Date Created] desc)
I think this would be at least one way to do it.
All you need is the value of Status from the latest row, according to the Date Created column in the AuditLog table.
SELECT (SELECT TOP 1 Status
FROM AuditLog
WHERE AuditLog.DocumentId = Document.DocumentId
ORDER BY [Date Created] DESC) AS Status,
DocumentId, CaseId, Name
FROM Document
WHERE CaseId = 23
And shouldn't the value for Document 2 be "Draft"?
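The correlated-subquery approach can be checked against the sample data with the sketch below. The assumptions: it uses Python's sqlite3 module rather than SQL Server, so TOP 1 becomes ORDER BY ... LIMIT 1 and the bracketed column name becomes a double-quoted one; the dates are rewritten as ISO strings; and the function name `latest_status_for_case` is hypothetical.

```python
import sqlite3

def latest_status_for_case(case_id):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Document (DocumentId INTEGER PRIMARY KEY,
                               CaseId INTEGER, Name TEXT);
        CREATE TABLE AuditLog (AuditLogId INTEGER PRIMARY KEY, Status TEXT,
                               DocumentId INTEGER, "Date Created" TEXT);
        INSERT INTO Document VALUES
            (2, 23, 'Document 1'),
            (3, 23, 'Document 2'),
            (4, 24, 'Document 3');
        INSERT INTO AuditLog VALUES
            (10, 'Active',   2, '2017-04-02'),
            (11, 'Draft',    2, '2017-04-01'),
            (12, 'Released', 2, '2017-04-03'),
            (13, 'Draft',    3, '2017-04-17'),
            (14, 'Draft',    4, '2017-04-17');
    """)
    # For each document, the scalar subquery picks the status of its
    # newest audit row (LIMIT 1 stands in for SQL Server's TOP 1).
    rows = conn.execute("""
        SELECT (SELECT al.Status
                FROM AuditLog al
                WHERE al.DocumentId = d.DocumentId
                ORDER BY al."Date Created" DESC
                LIMIT 1) AS Status,
               d.DocumentId, d.CaseId, d.Name
        FROM Document d
        WHERE d.CaseId = ?
        ORDER BY d.DocumentId
    """, (case_id,)).fetchall()
    conn.close()
    return rows

print(latest_status_for_case(23))
```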
You can use a Common Table Expression with ROW_NUMBER in order to prioritize records of AuditLog table. Then join to Document table to get expected result:
;WITH AuditLog_Rn AS (
SELECT Status, DocumentId,
ROW_NUMBER() OVER (PARTITION BY DocumentId
ORDER BY [Date Created] DESC) AS rn
FROM AuditLog
)
SELECT d.DocumentId, d.CaseId, d.Name, al.Status
FROM Document AS d
JOIN AuditLog_Rn AS al ON d.DocumentId = al.DocumentId AND al.rn = 1

Select distinct where date is max

This feels really stupid to ask, but I can't do this selection in SQL Server Compact (CE).
If I have two tables like this:
Statuses
id | status | thedate
-------------------------
0 | Single | 2014-01-01
0 | Engaged | 2014-01-02
1 | Single | 2014-01-03
0 | Divorced | 2014-01-04
Users
id | name
-----------------------
0 | Lisa
1 | John
How can I now select the latest status for each person in Statuses?
The result should be:
Id | Name | Date | Status
--------------------------------
0 | Lisa | 2014-01-04 | Divorced
1 | John | 2014-01-03 | Single
That is, select the distinct ids where the date is the highest, and join the name. As a bonus, sort the list so the latest record is on top.
In SQL Server CE, you can do this using a join:
select u.id, u.name, s.thedate, s.status
from users u join
statuses s
on u.id = s.id join
(select id, max(thedate) as mtd
from statuses
group by id
) as maxs
on s.id = maxs.id and s.thedate = maxs.mtd;
The subquery calculates the maximum date and uses that as a filter for the statuses table.
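Alternatively, use the following query. Note that the IN subquery is not correlated on id, so a status row whose date happens to equal another person's latest date is returned as well; the join in the previous answer avoids this: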
SELECT U.Id AS Id, U.Name AS Name, S.thedate AS Date, S.status AS Status
FROM Statuses S
INNER JOIN Users U on S.id = U.id
WHERE S.thedate IN (
SELECT MAX(thedate)
FROM statuses
GROUP BY id);
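The difference between the two answers only shows up when one person's older date equals another person's latest date. The sketch below makes that visible under stated assumptions: it uses Python's sqlite3 module rather than SQL Server CE, it adds one hypothetical row (id 0, 'Dating', 2014-01-03) that is not in the question's data, and the function name `compare_latest_status_queries` is made up.

```python
import sqlite3

def compare_latest_status_queries():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE statuses (id INTEGER, status TEXT, thedate TEXT);
        INSERT INTO users VALUES (0, 'Lisa'), (1, 'John');
        INSERT INTO statuses VALUES
            (0, 'Single',   '2014-01-01'),
            (0, 'Engaged',  '2014-01-02'),
            (1, 'Single',   '2014-01-03'),
            (0, 'Divorced', '2014-01-04'),
            -- Hypothetical extra row: same date as John's latest status.
            (0, 'Dating',   '2014-01-03');
    """)
    # Uncorrelated IN: any row whose date equals *some* person's maximum.
    uncorrelated = conn.execute("""
        SELECT u.id, u.name, s.thedate, s.status
        FROM statuses s
        JOIN users u ON s.id = u.id
        WHERE s.thedate IN (SELECT MAX(thedate) FROM statuses GROUP BY id)
        ORDER BY s.thedate DESC, u.id
    """).fetchall()
    # Join on (id, max date): only each person's own latest row survives.
    joined = conn.execute("""
        SELECT u.id, u.name, s.thedate, s.status
        FROM users u
        JOIN statuses s ON u.id = s.id
        JOIN (SELECT id, MAX(thedate) AS mtd
              FROM statuses
              GROUP BY id) maxs
          ON s.id = maxs.id AND s.thedate = maxs.mtd
        ORDER BY s.thedate DESC
    """).fetchall()
    conn.close()
    return uncorrelated, joined

uncorrelated, joined = compare_latest_status_queries()
print(uncorrelated)  # includes Lisa's 'Dating' row as well
print(joined)        # one row per person, latest first
```

The ORDER BY thedate DESC in both queries also covers the bonus request to list the latest record on top.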