Select distinct where date is max - sql

This feels really stupid to ask, but I can't do this selection in SQL Server Compact (CE).
If I have two tables like this:
Statuses
id | status | thedate
-------------------------
0 | Single | 2014-01-01
0 | Engaged | 2014-01-02
1 | Single | 2014-01-03
0 | Divorced | 2014-01-04
Users
id | name
-------------
0 | Lisa
1 | John
How can I now select the latest status for each person in Statuses?
The result should be:
Id | Name | Date | Status
--------------------------------
0 | Lisa | 2014-01-04 | Divorced
1 | John | 2014-01-03 | Single
That is, select distinct ids where the date is the highest, and join the name. As a bonus, sort the list so the latest record is on top.

In SQL Server CE, you can do this using a join:
select u.id, u.name, s.thedate, s.status
from users u join
statuses s
on u.id = s.id join
(select id, max(thedate) as mtd
from statuses
group by id
) as maxs
on s.id = maxs.id and s.thedate = maxs.mtd
order by s.thedate desc;
The subquery calculates the maximum date per id and uses it as a filter on the statuses table; the order by puts the latest record on top.
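As a quick check of the join-on-max approach, here is a sketch that runs the same query against the sample data. It uses Python's built-in sqlite3 module rather than SQL Server CE, an assumption made only so the example runs anywhere. Dates are stored as ISO-8601 text, which sorts correctly as strings. The names SETUP, QUERY and latest_status_per_person are hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE statuses (id INTEGER, status TEXT, thedate TEXT);
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO statuses VALUES
    (0, 'Single', '2014-01-01'),
    (0, 'Engaged', '2014-01-02'),
    (1, 'Single', '2014-01-03'),
    (0, 'Divorced', '2014-01-04');
INSERT INTO users VALUES (0, 'Lisa'), (1, 'John');
"""

QUERY = """
SELECT u.id, u.name, s.thedate, s.status
FROM users u
JOIN statuses s ON u.id = s.id
JOIN (SELECT id, MAX(thedate) AS mtd   -- latest date per person
      FROM statuses
      GROUP BY id) AS maxs
  ON s.id = maxs.id AND s.thedate = maxs.mtd
ORDER BY s.thedate DESC                 -- latest record on top
"""

def latest_status_per_person():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

for row in latest_status_per_person():
    print(row)
```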

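Use the following query. The subquery has to be correlated with the outer row; an uncorrelated IN (SELECT MAX(thedate) ... GROUP BY id) would also match an older status whose date happens to equal another person's latest date:
SELECT U.Id AS Id, U.Name AS Name, S.thedate AS Date, S.status AS Status
FROM Statuses S
INNER JOIN Users U on S.id = U.id
WHERE S.thedate IN (
SELECT MAX(S2.thedate)
FROM Statuses S2
WHERE S2.id = S.id)
ORDER BY S.thedate DESC;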
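Correlating the IN subquery with the outer row only changes the result when one person's older date coincides with another person's latest date. The sketch below uses hypothetical rows where that happens: John's only status shares Lisa's oldest date. It runs both variants in SQLite via Python's sqlite3 module, which stands in for SQL Server CE here as an assumption. The helper name run is hypothetical.

```python
import sqlite3

# Hypothetical rows: John's only date equals Lisa's oldest date.
SETUP = """
CREATE TABLE statuses (id INTEGER, status TEXT, thedate TEXT);
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO statuses VALUES
    (0, 'Single', '2014-01-01'),
    (0, 'Divorced', '2014-01-04'),
    (1, 'Engaged', '2014-01-01');
INSERT INTO users VALUES (0, 'Lisa'), (1, 'John');
"""

# Uncorrelated: compares against the set of every person's maximum date.
UNCORRELATED = """
SELECT U.id, U.name, S.thedate, S.status
FROM statuses S JOIN users U ON S.id = U.id
WHERE S.thedate IN (SELECT MAX(thedate) FROM statuses GROUP BY id)
ORDER BY S.thedate DESC, S.id
"""

# Correlated: compares against this person's own maximum date only.
CORRELATED = """
SELECT U.id, U.name, S.thedate, S.status
FROM statuses S JOIN users U ON S.id = U.id
WHERE S.thedate IN (SELECT MAX(S2.thedate) FROM statuses S2 WHERE S2.id = S.id)
ORDER BY S.thedate DESC, S.id
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

uncorrelated = run(UNCORRELATED)
correlated = run(CORRELATED)
print(uncorrelated)
print(correlated)
```

The uncorrelated form also returns Lisa's stale 'Single' row. Correlating on S2.id = S.id restricts each row to its own person's maximum.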

Related

SUM CASE when DISTINCT?

Joining two tables and grouping, we're trying to get the sum of a user's value but only include a user's value once if that user is represented in a grouping multiple times.
Some sample tables:
user table:
| id | net_worth |
------------------
| 1 | 100 |
| 2 | 1000 |
visit table:
| id | location | user_id |
-----------------------------
| 1 | mcdonalds | 1 |
| 2 | mcdonalds | 1 |
| 3 | mcdonalds | 2 |
| 4 | subway | 1 |
We want to find the total net worth of users visiting each location. User 1 visited McDonalds twice, but we don't want to double count their net worth. Ideally we can use a SUM but only add in the net worth value if that user hasn't already been counted for at that location. Something like this:
-- NOTE: Hypothetical query
SELECT
location,
SUM(CASE WHEN DISTINCT user.id then user.net_worth ELSE 0 END) as total_net_worth
FROM visit
JOIN user on user.id = visit.user_id
GROUP BY 1;
The ideal output being:
| location | total_net_worth |
-------------------------------
| mcdonalds | 1100 |
| subway | 100 |
This particular database is Redshift/PostgreSQL, but it would be interesting if there is a generic SQL solution. Is something like the above possible?
You don't want to count duplicate entries in the visit table, so select distinct (location, user_id) rows from it instead. Note that user is a reserved word in PostgreSQL and Redshift, so the table name has to be quoted:
SELECT
v.location,
SUM(u.net_worth) as total_net_worth
FROM (SELECT DISTINCT location, user_id FROM visit) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location
ORDER BY v.location;
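A small sketch of why the deduplication matters. It loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module, a stand-in for Redshift/PostgreSQL assumed only for runnability. The naive join counts user 1 twice at McDonald's; the DISTINCT subquery does not. Helper names such as run are hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE "user" (id INTEGER, net_worth INTEGER);
CREATE TABLE visit (id INTEGER, location TEXT, user_id INTEGER);
INSERT INTO "user" VALUES (1, 100), (2, 1000);
INSERT INTO visit VALUES
    (1, 'mcdonalds', 1), (2, 'mcdonalds', 1),
    (3, 'mcdonalds', 2), (4, 'subway', 1);
"""

# Naive join: user 1 is counted once per visit, so mcdonalds is overstated.
NAIVE = """
SELECT v.location, SUM(u.net_worth)
FROM visit v JOIN "user" u ON u.id = v.user_id
GROUP BY v.location ORDER BY v.location
"""

# Deduplicate (location, user_id) pairs before summing.
DEDUPED = """
SELECT v.location, SUM(u.net_worth) AS total_net_worth
FROM (SELECT DISTINCT location, user_id FROM visit) v
JOIN "user" u ON u.id = v.user_id
GROUP BY v.location ORDER BY v.location
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

naive = run(NAIVE)
deduped = run(DEDUPED)
print(naive)
print(deduped)
```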
You can use a window function to pick one row per (location, user_id) pair, then join that to the user table:
select v.location, sum(u.net_worth)
from "user" u
join (
select location, user_id,
row_number() over (partition by location, user_id order by id) as rn
from visit
) v on v.user_id = u.id and v.rn = 1
group by v.location;
The above is standard ANSI SQL. In Postgres, this can also be expressed using distinct on () (Redshift does not support distinct on):
select v.location, sum(u.net_worth)
from "user" u
join (
select distinct on (user_id, location) *
from visit
order by user_id, location, id
) v on v.user_id = u.id
group by v.location;
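The row_number() variant can be tried the same way in SQLite through Python's sqlite3 module. This is an assumption made for runnability: SQLite supports window functions from version 3.25 but has no DISTINCT ON, so only the ANSI form is shown. The window's own ORDER BY id decides which visit gets rn = 1. The function name total_net_worth_by_location is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE "user" (id INTEGER, net_worth INTEGER);
CREATE TABLE visit (id INTEGER, location TEXT, user_id INTEGER);
INSERT INTO "user" VALUES (1, 100), (2, 1000);
INSERT INTO visit VALUES
    (1, 'mcdonalds', 1), (2, 'mcdonalds', 1),
    (3, 'mcdonalds', 2), (4, 'subway', 1);
"""

QUERY = """
SELECT v.location, SUM(u.net_worth) AS total_net_worth
FROM "user" u
JOIN (
    SELECT location, user_id,
           ROW_NUMBER() OVER (PARTITION BY location, user_id ORDER BY id) AS rn
    FROM visit
) v ON v.user_id = u.id AND v.rn = 1  -- one visit per (location, user)
GROUP BY v.location
ORDER BY v.location
"""

def total_net_worth_by_location():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

print(total_net_worth_by_location())
```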
You can join the user table with the distinct combinations of location and user_id, as in the generic SQL below (user is quoted because it is a reserved word):
SELECT v.location, SUM(u.net_worth)
FROM (SELECT location, user_id FROM visit GROUP BY location, user_id) v
JOIN "user" u on u.id = v.user_id
GROUP BY v.location;

Selecting the most recent entry on a timestamp cell

I have these two tables:
User:
=======================
id | Name | Email
=======================
1 | User-A| a#mail
2 | User-B| b#mail
=======================
Entry:
=================================================
id | agree | createdOn | userId
=================================================
1 | true | 2020-11-10 19:22:23 | 1
2 | false | 2020-11-10 22:22:23 | 1
3 | true | 2020-11-11 12:22:23 | 1
4 | true | 2020-11-04 22:22:23 | 2
5 | false | 2020-11-12 02:22:23 | 2
================================================
I need to get the following result:
=============================================================
Name | Email | agree | createdOn
=============================================================
User-A | a#mail | true | 2020-11-11 12:22:23
User-B | b#mail | false | 2020-11-12 02:22:23
=============================================================
The Postgres query I'm running is:
select distinct on (e."createdOn", u.id)
u.id , e.id ,u."Name" , u.email, e.agree, e."createdOn" from "user" u
inner join public.entry e on u."id" = e."userId"
order by "createdOn" desc
But the problem is that it returns all the entries after the join, whereas I only want the most recent entry (by createdOn) for each user.
You want the latest entry per user. For this, you need the user id in the distinct on clause, and no other column. This guarantees one row in the resultset per user.
Then, you need to put that column first in the order by clause, followed by createdOn desc. This breaks the ties and decides which row will be retained in each group:
select distinct on (u.id) u.id , e.id ,u."Name" , u.email, e.agree, e."createdOn"
from "user" u
inner join public.entry e on u."id" = e."userId"
order by u.id, "createdOn" desc
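To make the DISTINCT ON semantics concrete, here is a plain-Python sketch with no database involved. Sorting by createdOn descending and then stable-sorting by userId reproduces ORDER BY userId, createdOn DESC. Keeping the first row of each userId group then mimics DISTINCT ON (userId). The tuple layout and the function name distinct_on_user are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# (id, agree, createdOn, userId) rows from the question's Entry table
ENTRIES = [
    (1, True, "2020-11-10 19:22:23", 1),
    (2, False, "2020-11-10 22:22:23", 1),
    (3, True, "2020-11-11 12:22:23", 1),
    (4, True, "2020-11-04 22:22:23", 2),
    (5, False, "2020-11-12 02:22:23", 2),
]

def distinct_on_user(rows):
    # ORDER BY userId, createdOn DESC: sort by the secondary key first,
    # then stable-sort by the primary key so the secondary order survives.
    ordered = sorted(rows, key=itemgetter(2), reverse=True)
    ordered = sorted(ordered, key=itemgetter(3))
    # DISTINCT ON (userId): keep only the first row of each userId group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(3))]

latest = distinct_on_user(ENTRIES)
print(latest)
```

This also shows why the distinct on column must lead the order by: groupby only sees adjacent rows, just as DISTINCT ON only compares rows in sorted order.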
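You can also use ROW_NUMBER() to pick the latest entry per user and then do the join (the quoted camelCase names match the question's schema, and user must be quoted because it is a reserved word):
SELECT * FROM "user" A
LEFT JOIN (
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY "userId" ORDER BY "createdOn" DESC) AS RN
FROM public.entry
) K WHERE RN = 1
) B
ON A.id = B."userId";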
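A runnable sketch of the ROW_NUMBER() approach in SQLite via Python's sqlite3, a stand-in for Postgres assumed for runnability. SQLite has no boolean type, so agree is stored as 1/0. A hypothetical User-C with no entries is added to show that the LEFT JOIN keeps users without entries, with NULLs in the entry columns. The function name latest_entries is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE "user" (id INTEGER, "Name" TEXT, "Email" TEXT);
CREATE TABLE entry (id INTEGER, agree INTEGER, "createdOn" TEXT, "userId" INTEGER);
INSERT INTO "user" VALUES
    (1, 'User-A', 'a#mail'),
    (2, 'User-B', 'b#mail'),
    (3, 'User-C', 'c#mail');  -- hypothetical user with no entries
INSERT INTO entry VALUES
    (1, 1, '2020-11-10 19:22:23', 1),
    (2, 0, '2020-11-10 22:22:23', 1),
    (3, 1, '2020-11-11 12:22:23', 1),
    (4, 1, '2020-11-04 22:22:23', 2),
    (5, 0, '2020-11-12 02:22:23', 2);
"""

QUERY = """
SELECT A."Name", A."Email", B.agree, B."createdOn"
FROM "user" A
LEFT JOIN (
    SELECT * FROM (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY "userId" ORDER BY "createdOn" DESC) AS rn
        FROM entry
    ) K
    WHERE rn = 1            -- newest entry per user
) B ON A.id = B."userId"
ORDER BY A.id
"""

def latest_entries():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

for row in latest_entries():
    print(row)
```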
Alternatively, group the entries by userId to get max(createdOn) per user, then join back on that value so only each user's latest entry is kept.
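The max-and-join-back idea can be sketched as follows, again in SQLite via Python's sqlite3 (an assumption), with agree stored as 1/0. One caveat of this pattern: if a user has two entries with the same latest createdOn, both rows come back, whereas DISTINCT ON or ROW_NUMBER() keep exactly one. The function name latest_by_max is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE "user" (id INTEGER, "Name" TEXT, "Email" TEXT);
CREATE TABLE entry (id INTEGER, agree INTEGER, "createdOn" TEXT, "userId" INTEGER);
INSERT INTO "user" VALUES (1, 'User-A', 'a#mail'), (2, 'User-B', 'b#mail');
INSERT INTO entry VALUES
    (1, 1, '2020-11-10 19:22:23', 1),
    (2, 0, '2020-11-10 22:22:23', 1),
    (3, 1, '2020-11-11 12:22:23', 1),
    (4, 1, '2020-11-04 22:22:23', 2),
    (5, 0, '2020-11-12 02:22:23', 2);
"""

QUERY = """
SELECT u."Name", u."Email", e.agree, e."createdOn"
FROM "user" u
JOIN entry e ON e."userId" = u.id
JOIN (SELECT "userId", MAX("createdOn") AS max_created   -- newest per user
      FROM entry
      GROUP BY "userId") m
  ON m."userId" = e."userId" AND m.max_created = e."createdOn"
ORDER BY u.id
"""

def latest_by_max():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

print(latest_by_max())
```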

SQL Server matching two tables on a column

I have two tables: one storing user skills and another storing the skills required for a job. I want to count how many of each user's skills match each job.
The table structure is:
Table1: User_Skills
| ID | User_ID | Skill |
---------------------------
| 1 | 1 | .Net |
---------------------------
| 2 | 1 | Software|
---------------------------
| 3 | 1 | Engineer|
---------------------------
| 4 | 2 | .Net |
---------------------------
| 5 | 2 | Software|
---------------------------
Table2: Job_Skills_Requirement
| ID | Job_ID | Skill |
--------------------------
| 1 | 1 | .Net |
---------------------------
| 2 | 1 | Engineer|
---------------------------
| 3 | 1 | HTML |
---------------------------
| 4 | 2 | Software|
---------------------------
| 5 | 2 | HTML |
---------------------------
I tried concatenating the skills into comma-separated lists and comparing them, but the skills can be in a different order.
Edit
All the answers here are excellent. The result I am looking for matches all jobs with all users, as later on I will match other properties as well.
You could join the tables by the skill columns and count the matches:
SELECT user_id, job_id, COUNT(*) AS matching_skills
FROM user_skills u
JOIN job_skills_requirement j ON u.skill = j.skill
GROUP BY user_id, job_id
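To see what the inner join returns, here is a sketch with the question's tables in SQLite via Python's sqlite3, standing in for SQL Server as an assumption. A hypothetical user 3 whose only skill is 'Java' is added. Because none of that user's skills match, the inner join produces no row for them. The helper name run is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE user_skills (id INTEGER, user_id INTEGER, skill TEXT);
CREATE TABLE job_skills_requirement (id INTEGER, job_id INTEGER, skill TEXT);
INSERT INTO user_skills VALUES
    (1, 1, '.Net'), (2, 1, 'Software'), (3, 1, 'Engineer'),
    (4, 2, '.Net'), (5, 2, 'Software'),
    (6, 3, 'Java');  -- hypothetical user whose skill matches no job
INSERT INTO job_skills_requirement VALUES
    (1, 1, '.Net'), (2, 1, 'Engineer'), (3, 1, 'HTML'),
    (4, 2, 'Software'), (5, 2, 'HTML');
"""

QUERY = """
SELECT user_id, job_id, COUNT(*) AS matching_skills
FROM user_skills u
JOIN job_skills_requirement j ON u.skill = j.skill
GROUP BY user_id, job_id
ORDER BY user_id, job_id
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

rows = run(QUERY)
print(rows)  # user 3 has no matching skill, so the inner join drops it
```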
EDIT:
If you also want to show users and jobs that have no matching skills, you can use a full outer join instead:
SELECT user_id, job_id, COUNT(*) AS matching_skills
FROM user_skills u
FULL OUTER JOIN job_skills_requirement j ON u.skill = j.skill
GROUP BY user_id, job_id
EDIT 2:
As Jiri Tousek commented, the above query will produce nulls where there's no match between a user and a job. If you want the full Cartesian product between them, you could use (abuse?) the cross join syntax and count how many skills actually match between each user and each job:
SELECT user_id,
job_id,
COUNT(CASE WHEN u.skill = j.skill THEN 1 END) AS matching_skills
FROM user_skills u
CROSS JOIN job_skills_requirement j
GROUP BY user_id, job_id
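Here the same hypothetical data, including user 3 with only 'Java', goes through the cross join version, again in SQLite via Python's sqlite3 as an assumption. Every user/job pair now appears. Pairs with no common skill get a count of 0 because COUNT ignores the NULLs produced by the CASE expression. The helper name run is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE user_skills (id INTEGER, user_id INTEGER, skill TEXT);
CREATE TABLE job_skills_requirement (id INTEGER, job_id INTEGER, skill TEXT);
INSERT INTO user_skills VALUES
    (1, 1, '.Net'), (2, 1, 'Software'), (3, 1, 'Engineer'),
    (4, 2, '.Net'), (5, 2, 'Software'),
    (6, 3, 'Java');  -- hypothetical user whose skill matches no job
INSERT INTO job_skills_requirement VALUES
    (1, 1, '.Net'), (2, 1, 'Engineer'), (3, 1, 'HTML'),
    (4, 2, 'Software'), (5, 2, 'HTML');
"""

QUERY = """
SELECT user_id, job_id,
       COUNT(CASE WHEN u.skill = j.skill THEN 1 END) AS matching_skills
FROM user_skills u
CROSS JOIN job_skills_requirement j
GROUP BY user_id, job_id
ORDER BY user_id, job_id
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

rows = run(QUERY)
print(rows)
```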
If you want to match all users and all jobs, then Mureinik's otherwise excellent answer is not correct.
You need to generate all the rows first, which I would do using a cross join and then count the matching ones:
select u.user_id, j.job_id, count(jsr.job_id) as skills_in_common
from users u cross join
jobs j left join
user_skills us
on us.user_id = u.user_id left join
Job_Skills_Requirement jsr
on jsr.job_id = j.job_id and
jsr.skill = us.skill
group by u.user_id, j.job_id;
Note: This assumes the existence of a users and a jobs table. You can of course generate these using subqueries.
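A sketch of the cross join plus left joins approach, assuming hypothetical users and jobs tables (the answer notes these may not exist). Here user 3 appears only in users and has no rows in user_skills at all, yet still gets a row per job with skills_in_common = 0. It runs in SQLite via Python's sqlite3 as a stand-in for SQL Server. The helper name run is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE users (user_id INTEGER);   -- hypothetical
CREATE TABLE jobs (job_id INTEGER);     -- hypothetical
CREATE TABLE user_skills (id INTEGER, user_id INTEGER, skill TEXT);
CREATE TABLE job_skills_requirement (id INTEGER, job_id INTEGER, skill TEXT);
INSERT INTO users VALUES (1), (2), (3); -- user 3 has no skills at all
INSERT INTO jobs VALUES (1), (2);
INSERT INTO user_skills VALUES
    (1, 1, '.Net'), (2, 1, 'Software'), (3, 1, 'Engineer'),
    (4, 2, '.Net'), (5, 2, 'Software');
INSERT INTO job_skills_requirement VALUES
    (1, 1, '.Net'), (2, 1, 'Engineer'), (3, 1, 'HTML'),
    (4, 2, 'Software'), (5, 2, 'HTML');
"""

QUERY = """
SELECT u.user_id, j.job_id, COUNT(jsr.job_id) AS skills_in_common
FROM users u
CROSS JOIN jobs j                      -- every user/job pair
LEFT JOIN user_skills us
  ON us.user_id = u.user_id
LEFT JOIN job_skills_requirement jsr   -- NULL unless the skill matches
  ON jsr.job_id = j.job_id AND jsr.skill = us.skill
GROUP BY u.user_id, j.job_id
ORDER BY u.user_id, j.job_id
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

rows = run(QUERY)
print(rows)
```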
Using CTEs for the sample data, this SQL Server version also lists which skills match:
WITH User_Skills(ID,User_ID,Skill)AS(
SELECT 1,1,'.Net' UNION ALL
SELECT 2,1,'Software' UNION ALL
SELECT 3,1,'Engineer' UNION ALL
SELECT 4,2,'.Net' UNION ALL
SELECT 5,2 ,'Software'
),Job_Skills_Requirement(ID,Job_ID,Skill)AS(
SELECT 1,1,'.Net' UNION ALL
SELECT 2,1,'Engineer' UNION ALL
SELECT 3,1,'HTML' UNION ALL
SELECT 4,2,'Software' UNION ALL
SELECT 5,2 ,'HTML'
),Job_User_Skill AS (
SELECT j.Job_ID,u.User_ID,u.Skill
FROM Job_Skills_Requirement AS j INNER JOIN User_Skills AS u ON u.Skill=j.Skill
)
SELECT jus.Job_ID,jus.User_ID,COUNT(jus.Skill) AS Matching_Skills,STUFF(c.Skills,1,1,'') AS Skill
FROM Job_User_Skill AS jus
CROSS APPLY(SELECT ','+j.Skill FROM Job_User_Skill AS j WHERE j.Job_ID=jus.Job_ID AND j.User_ID=jus.User_ID FOR XML PATH('')) c(Skills)
GROUP BY jus.Job_ID,jus.User_ID,c.Skills
ORDER BY jus.Job_ID
Job_ID User_ID Matching_Skills Skill
----------- ----------- ----------- -------------
1 1 2 .Net,Engineer
1 2 1 .Net
2 1 1 Software
2 2 1 Software
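SQLite has no FOR XML PATH, but its GROUP_CONCAT aggregate gives the same count-plus-skill-list result. On SQL Server 2017 and later, STRING_AGG fills the same role. The sketch below works under those assumptions, using Python's sqlite3 and the question's data. GROUP_CONCAT does not guarantee the order of the concatenated values, so code consuming the list should not rely on it. The helper name run is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE user_skills (id INTEGER, user_id INTEGER, skill TEXT);
CREATE TABLE job_skills_requirement (id INTEGER, job_id INTEGER, skill TEXT);
INSERT INTO user_skills VALUES
    (1, 1, '.Net'), (2, 1, 'Software'), (3, 1, 'Engineer'),
    (4, 2, '.Net'), (5, 2, 'Software');
INSERT INTO job_skills_requirement VALUES
    (1, 1, '.Net'), (2, 1, 'Engineer'), (3, 1, 'HTML'),
    (4, 2, 'Software'), (5, 2, 'HTML');
"""

QUERY = """
SELECT j.job_id, u.user_id,
       COUNT(*) AS matching_skills,
       GROUP_CONCAT(u.skill, ',') AS skills   -- order is not guaranteed
FROM job_skills_requirement j
JOIN user_skills u ON u.skill = j.skill
GROUP BY j.job_id, u.user_id
ORDER BY j.job_id, u.user_id
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

rows = run(QUERY)
for row in rows:
    print(row)
```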

Count how many times a value appears in tables SQL

Here's the situation:
So, in my database, a person is "responsible" for job X and "linked" to job Y. What I want is a query that returns the person's name, their ID, and the number of jobs they are linked to or responsible for. So far I have this:
select id_job, count(id_job) number_jobs
from
(
select responsible.id
from responsible
union all
select linked.id
from linked
GROUP BY id
) id_job
GROUP BY id_job
It returns a table with the id in the first column and the number of occurrences in the second. What I can't do is associate the person's name with it: when I add the name to the outer SELECT, it gives me all the possible combinations. How can I solve this? Thanks in advance!
Example data and desired output:
| Person |
id | name
1 | John
2 | Francis
3 | Chuck
4 | Anthony
| Responsible |
process_no | id
100 | 2
200 | 2
300 | 1
400 | 4
| Linked |
process_no | id
101 | 4
201 | 1
301 | 1
401 | 2
OUTPUT:
| OUTPUT |
id | name | number_jobs
1 | John | 3
2 | Francis | 3
3 | Chuck | 0
4 | Anthony | 2
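Try this way (a left join keeps people with no jobs, such as Chuck, and counting a.process_no instead of * gives them 0):
select prs.id, prs.name, count(a.process_no) as number_jobs from Person prs
left join (select process_no, id
from Responsible res
union all
select process_no, id
from Linked lin) a on a.id = prs.id
group by prs.id, prs.name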
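A sketch comparing an inner join with a left join on the question's sample data, in SQLite via Python's sqlite3. The database is an assumption; the question doesn't name one. With an inner join and COUNT(*) Chuck disappears; with a left join and COUNT(a.process_no) he is kept with 0, as in the desired output. The helper name run is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE person (id INTEGER, name TEXT);
CREATE TABLE responsible (process_no INTEGER, id INTEGER);
CREATE TABLE linked (process_no INTEGER, id INTEGER);
INSERT INTO person VALUES (1, 'John'), (2, 'Francis'), (3, 'Chuck'), (4, 'Anthony');
INSERT INTO responsible VALUES (100, 2), (200, 2), (300, 1), (400, 4);
INSERT INTO linked VALUES (101, 4), (201, 1), (301, 1), (401, 2);
"""

JOBS = "(SELECT process_no, id FROM responsible UNION ALL SELECT process_no, id FROM linked)"

INNER = f"""
SELECT prs.id, prs.name, COUNT(*)
FROM person prs JOIN {JOBS} a ON a.id = prs.id
GROUP BY prs.id, prs.name ORDER BY prs.id
"""

# COUNT(a.process_no) skips the NULL row a left join produces for Chuck.
LEFT = f"""
SELECT prs.id, prs.name, COUNT(a.process_no) AS number_jobs
FROM person prs LEFT JOIN {JOBS} a ON a.id = prs.id
GROUP BY prs.id, prs.name ORDER BY prs.id
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

inner_rows = run(INNER)
left_rows = run(LEFT)
print(inner_rows)
print(left_rows)
```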
I would recommend aggregating each of the tables by the person and then joining the results back to the person table:
select p.*, coalesce(r.cnt, 0) + coalesce(l.cnt, 0) as numjobs
from person p left join
(select id, count(*) as cnt
from responsible
group by id
) r
on r.id = p.id left join
(select id, count(*) as cnt
from linked
group by id
) l
on l.id = p.id;
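The aggregate-then-join approach on the same sample data, in SQLite via Python's sqlite3, assumed for runnability. Aggregating each table before joining means the Responsible and Linked rows never multiply against each other, and COALESCE turns a missing count into 0. The function name jobs_per_person is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE person (id INTEGER, name TEXT);
CREATE TABLE responsible (process_no INTEGER, id INTEGER);
CREATE TABLE linked (process_no INTEGER, id INTEGER);
INSERT INTO person VALUES (1, 'John'), (2, 'Francis'), (3, 'Chuck'), (4, 'Anthony');
INSERT INTO responsible VALUES (100, 2), (200, 2), (300, 1), (400, 4);
INSERT INTO linked VALUES (101, 4), (201, 1), (301, 1), (401, 2);
"""

QUERY = """
SELECT p.id, p.name, COALESCE(r.cnt, 0) + COALESCE(l.cnt, 0) AS numjobs
FROM person p
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM responsible GROUP BY id) r
  ON r.id = p.id
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM linked GROUP BY id) l
  ON l.id = p.id
ORDER BY p.id
"""

def jobs_per_person():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

print(jobs_per_person())
```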
select id, name, count(process_no) FROM (
select pr.id, pr.name, res.process_no from Person pr
LEFT JOIN Responsible res on pr.id = res.id
UNION
select pr.id, pr.name, lin.process_no from Person pr
LEFT JOIN Linked lin on pr.id = lin.id) src
group by id, name
order by id
The query isn't tested, so give it a shot, but this is the way you want to go.
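Because this answer combines the two branches with UNION rather than UNION ALL, identical (id, name, process_no) rows are collapsed. That makes no difference on the sample data, but it would if the same process number appeared in both tables for one person. The sketch below adds such a hypothetical overlap (process 500 for John in both tables) and runs both set operators in SQLite via Python's sqlite3. Which count is right depends on whether that overlap should count once or twice. The function name count_jobs is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE person (id INTEGER, name TEXT);
CREATE TABLE responsible (process_no INTEGER, id INTEGER);
CREATE TABLE linked (process_no INTEGER, id INTEGER);
INSERT INTO person VALUES (1, 'John'), (2, 'Francis'), (3, 'Chuck'), (4, 'Anthony');
INSERT INTO responsible VALUES (100, 2), (200, 2), (300, 1), (400, 4), (500, 1);
INSERT INTO linked VALUES (101, 4), (201, 1), (301, 1), (401, 2), (500, 1);
"""

def count_jobs(set_op):
    # set_op is one of two fixed literals, never user input.
    assert set_op in ("UNION", "UNION ALL")
    query = f"""
    SELECT id, name, COUNT(process_no) FROM (
        SELECT pr.id, pr.name, res.process_no FROM person pr
        LEFT JOIN responsible res ON pr.id = res.id
        {set_op}
        SELECT pr.id, pr.name, lin.process_no FROM person pr
        LEFT JOIN linked lin ON pr.id = lin.id
    ) src
    GROUP BY id, name
    ORDER BY id
    """
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

print(count_jobs("UNION"))      # the shared process 500 is counted once for John
print(count_jobs("UNION ALL"))  # ...and twice here
```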

Select one row from non unique rows based on row value

I have a quiz table:
id | user_id | quiz_id
--------------------------
1 | 34567 | 12334
2 | 34567 | 12334
3 | 34567 | 23455
Rows with id 1 and 2 show that the same quiz can be assigned to the same user twice.
I also have a quiz transaction table:
id | date | status
------------------------
1 | 2014 | assigned
2 | 2014 | assigned
3 | 2014 | assigned
------------------------
1 | 2014 | completed
id is a foreign key to the quiz table's id. The last row shows that whenever a user finishes a quiz, the transaction table gets a row with status 'completed'.
Expected result: I want a table with a structure like:
id | user_id | quiz_id | date | status
------------------------------------------
1 | 34567 | 12334 | 2014 | completed
2 | 34567 | 12334 | 2014 | assigned
3 | 34567 | 23455 | 2014 | assigned
My query is
SELECT q.id, q.user_id, q.quiz_id, qt.date, qt.status FROM quiz q
LEFT JOIN
quiz_transaction qt ON
q.id = qt.id
but it gives me an extra row (as this query naturally will):
1 | 34567 | 12334 | 2014 | assigned
I cannot use
ON qt.status = 'completed'
because if the quiz is completed it should return the completed row, and if not it should return the assigned row, but not both.
So the result must not contain both of these rows:
1 | 34567 | 12334 | 2014 | completed
1 | 34567 | 12334 | 2014 | assigned
How can I do it?
How about simply using the MAX() function with GROUP BY? Because 'completed' sorts after 'assigned' alphabetically, MAX(status) picks the completed row whenever there is one:
SELECT q.id, q.user_id, q.quiz_id, MAX(qt.date) AS date, MAX(qt.status) AS Status
FROM quiz q
LEFT JOIN quiz_transaction qt ON q.id = qt.id
GROUP BY q.id, q.user_id, q.quiz_id
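A sketch of the MAX() approach on the question's data in SQLite via Python's sqlite3, standing in for SQL Server as an assumption. It relies on 'completed' sorting after 'assigned' as text. The date column is quoted as "date", and MAX(date) is used so that quizzes whose transactions have different dates still collapse to one row. The function name latest_quiz_status is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE quiz (id INTEGER, user_id INTEGER, quiz_id INTEGER);
CREATE TABLE quiz_transaction (id INTEGER, "date" TEXT, status TEXT);
INSERT INTO quiz VALUES (1, 34567, 12334), (2, 34567, 12334), (3, 34567, 23455);
INSERT INTO quiz_transaction VALUES
    (1, '2014', 'assigned'), (2, '2014', 'assigned'),
    (3, '2014', 'assigned'), (1, '2014', 'completed');
"""

QUERY = """
SELECT q.id, q.user_id, q.quiz_id,
       MAX(qt."date") AS "date",
       MAX(qt.status) AS status      -- 'completed' > 'assigned' as text
FROM quiz q
LEFT JOIN quiz_transaction qt ON q.id = qt.id
GROUP BY q.id, q.user_id, q.quiz_id
ORDER BY q.id
"""

def latest_quiz_status():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

for row in latest_quiz_status():
    print(row)
```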
EDIT: If you need the strings ordered in a particular way, you could use a CASE expression to convert each string to a number, take the MAX value, and then convert it back:
SELECT m.id, m.user_id, m.quiz_id, MAX(m.date) AS date,
CASE WHEN MAX(m.status) = 1 THEN 'assigned'
WHEN MAX(m.status) = 2 THEN 'doing'
WHEN MAX(m.status) = 3 THEN 'completed' END AS Status
FROM
(
SELECT q.id, q.user_id, q.quiz_id, qt.date,
CASE WHEN qt.status = 'assigned' THEN 1
WHEN qt.status = 'doing' THEN 2
WHEN qt.status = 'completed' THEN 3 END AS Status
FROM quiz q
LEFT JOIN quiz_transaction qt ON q.id = qt.id
) AS m
GROUP BY m.id, m.user_id, m.quiz_id;
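Why the CASE mapping matters: with a hypothetical intermediate status 'doing', plain MAX(status) picks 'doing' over 'completed' because 'd' sorts after 'c'. The sketch below compares plain MAX with the rank-then-map-back version on quiz_transaction alone. It runs in SQLite via Python's sqlite3, which is an assumption. The helper name run is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE quiz_transaction (id INTEGER, "date" TEXT, status TEXT);
INSERT INTO quiz_transaction VALUES
    (1, '2014', 'assigned'), (2, '2014', 'assigned'), (3, '2014', 'assigned'),
    (1, '2014', 'doing'),      -- hypothetical intermediate status
    (1, '2014', 'completed');
"""

# Alphabetical MAX: 'doing' beats 'completed', which is wrong here.
PLAIN_MAX = """
SELECT id, MAX(status) FROM quiz_transaction GROUP BY id ORDER BY id
"""

# Map each status to a rank, take the MAX rank, then map it back.
RANKED_MAX = """
SELECT id,
       CASE MAX(CASE status WHEN 'assigned' THEN 1
                            WHEN 'doing' THEN 2
                            WHEN 'completed' THEN 3 END)
            WHEN 1 THEN 'assigned'
            WHEN 2 THEN 'doing'
            WHEN 3 THEN 'completed' END AS status
FROM quiz_transaction
GROUP BY id
ORDER BY id
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query).fetchall()
    finally:
        con.close()

plain = run(PLAIN_MAX)
ranked = run(RANKED_MAX)
print(plain)
print(ranked)
```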
Depending on your release, SQL Server supports standard SQL window functions. ROW_NUMBER() will give you a single row per id:
SELECT
q.id
,q.user_id
,q.quiz_id
,qt.date
,qt.status
FROM quiz q
JOIN
(
SELECT
id
,date
,status
,ROW_NUMBER()
OVER (PARTITION BY id
ORDER BY Status DESC) as rn
FROM quiz_transaction
) as qt
ON q.id = qt.id
WHERE rn = 1
If you have more complex ordering rules, you need to use a CASE expression:
,ROW_NUMBER()
OVER (PARTITION BY id
ORDER BY CASE Status WHEN 'completed' THEN 1
WHEN 'doing' THEN 2
WHEN 'assigned' THEN 3
END) as rn
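Here is the ROW_NUMBER() version with the CASE ordering, sketched in SQLite via Python's sqlite3. The use of SQLite is an assumption, and window functions need SQLite 3.25 or later. Quiz 1 gets a hypothetical 'doing' row as an extra transaction, to show that the CASE rank, not alphabetical order, decides which row gets rn = 1. The function name preferred_status is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE quiz (id INTEGER, user_id INTEGER, quiz_id INTEGER);
CREATE TABLE quiz_transaction (id INTEGER, "date" TEXT, status TEXT);
INSERT INTO quiz VALUES (1, 34567, 12334), (2, 34567, 12334), (3, 34567, 23455);
INSERT INTO quiz_transaction VALUES
    (1, '2014', 'assigned'), (2, '2014', 'assigned'), (3, '2014', 'assigned'),
    (1, '2014', 'doing'),      -- hypothetical intermediate status
    (1, '2014', 'completed');
"""

QUERY = """
SELECT q.id, q.user_id, q.quiz_id, qt.status
FROM quiz q
JOIN (
    SELECT id, status,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY CASE status WHEN 'completed' THEN 1
                                    WHEN 'doing' THEN 2
                                    WHEN 'assigned' THEN 3 END) AS rn
    FROM quiz_transaction
) qt ON q.id = qt.id
WHERE qt.rn = 1          -- best-ranked transaction per quiz row
ORDER BY q.id
"""

def preferred_status():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

for row in preferred_status():
    print(row)
```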
Try this:
SELECT q.id, q.user_id, q.quiz_id, q1.date, qt.status FROM quiz q
LEFT JOIN
(Select id , convert(varchar,max(convert(varbinary,status ))) AS Status
from quiz_transaction
group by id
) qt ON
q.id = qt.id
left join quiz_transaction q1 on q1.id = qt.id and q1.status=qt.status