Removing duplicates *after* ordering in a SQL query - sql

I have the following SQL query:
select
id,
name
from
project
inner join job on project.id = job.project_id
where
job.user_id = 'me'
order by
project.modified desc
limit 10
The idea is to get information about the 10 most recently used projects for a given user.
The problem is that this can return duplicates in the case where multiple jobs have the same project. Instead of having duplicates, I want to order all rows by modified desc, remove duplicates based on id and name, then limit to 10.
I've not been able to figure out how to achieve this. Can anyone point me in the right direction?

You are getting duplicates because of the join. As you only want columns from the project table (I assume id and name are from that table), not creating the duplicates in the first place would be better than removing them after the join:
select p.id,
p.name
from project p
where exists (select *
from job job
where job.project_id = p.id
and job.user_id = 'me')
order by p.modified desc
limit 10

try this using row_number - it will work on postgresql
select * from
(select
id,
name,row_number() over(partition by id,name order by modified desc) as rn
from
project inner join job on project.id = job.project_id
where
job.user_id = 'me')a where rn=1 order by modified desc limit 10

Did you try SELECT DISTINCT?
select distinct
id,
name
from
project
inner join job on project.id = job.project_id
where
job.user_id = 'me'
order by
project.modified desc
limit 10

I ended up with the following, which seems to work fine:
select
p.id,
p.name
from
project p
inner join
(
select
j.id,
max(j.modified) as max_modified
from
job j
where
t.user_id = 'me'
group by
j.id
order by
max_modified desc
limit 10
) ids on p.id = ids.id
order by
max_modified desc

Related

How to use DISTINCT ON but ORDER BY another expression?

The model Subscription has_many SubscriptionCart.
A SubscriptionCart has a status and an authorized_at date.
I need to pick the cart with the oldest authorized_at date from all the carts associated to a Subscription, and then I have to order all the returned Subscription results by this subscription_carts.authorized_at column.
The query below is working but I can't figure out how to select DISTINCT ON subscription.id to avoid duplicates but ORDER BY subscription_carts.authorized_at .
raw sql query so far:
select distinct on (s.id) s.id as subscription_id, subscription_carts.authorized_at, s.*
from subscriptions s
join subscription_carts subscription_carts on subscription_carts.subscription_id = s.id
and subscription_carts.plan_id = s.plan_id
where subscription_carts.status = 'processed'
and s.status IN ('authorized','in_trial', 'paused')
order by s.id, subscription_carts.authorized_at
If I try to ORDER BY subscription_carts.authorized_at first, I get an error because the DISTINCT ON and ORDER BY expressions must be in the same order.
The solutions I've found seem too complicated for what I need and I've failed to implement them because I don't understand them fully.
Would it be better to GROUP BY subscription_id and then pick from that group instead of using DISTINCT ON? Any help appreciated.
This requirement is necessary to make DISTINCT ON work; to change the final order, you can add an outer query with another ORDER BY clause:
SELECT *
FROM (SELECT DISTINCT ON (s.id)
s.id as subscription_id, subscription_carts.authorized_at, s.*
FROM subscriptions s
JOIN ...
WHERE ...
ORDER BY s.id, subscription_carts.authorized_at
) AS subq
ORDER BY authorized_at;
You don't have to use DISTINCT ON. While it is occasionally useful, I personally find window function based approaches much more clear:
-- Optionally, list all columns explicitly, to remove the rn column again
SELECT *
FROM (
SELECT
s.id AS subscription_id,
c.authorized_at,
s.*,
ROW_NUMBER () OVER (PARTITION BY s.id ORDER BY c.authorized_at) rn
FROM subscriptions s
JOIN subscription_carts c
ON c.subscription_id = s.id
AND c.plan_id = s.plan_id
WHERE c.status = 'processed'
AND s.status IN ('authorized', 'in_trial', 'paused')
) t
WHERE rn = 1
ORDER BY subscription_id, authorized_at

simple SQL subquery

It seems im a complete idiot when it comes to SQL....
All i need is get one value from other table, but there is multiple rows with same customerId on second table.. and i would need to get one with highest timestamp
CREATE OR REPLACE VIEW CUS_SETTINGS as
SELECT
c.id as id,
c.LANG as Language,
c.ALLOWEMAIL as AllowEmail,
l.CONFIRMED as confirmed
FROM cus.CUSTOMER c
????? something with l
/
LEFT JOIN will bring every row so i have multiple duplicate id's
What i need is propably subquery, but i cant get it to work...
(SELECT CONFIRMED FROM settings WHERE ?? c.id == l.id ?? AND MAX(TIMESTAMP) )
i've tried many many variations of joins and subqueries.. but for some reason.. SQL is just
too confusing....
You can use a ROW_NUMBER() in the subquery:
SELECT c.id as id, c.LANG as Language, c.ALLOWEMAIL as AllowEmail,
l.CONFIRMED as confirmed
FROM cus.CUSTOMER c JOIN
(SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.timestamp DESC) as seqnum
FROM settings s
) s
ON s.id = c.id AND s.seqnum = 1;
Note: You might want a LEFT JOIN if you want to keep all customers, even those with no settings.
You can use analytic functions. MAX() OVER(PARTITION BY) clause can give you max timestamped id.
Analytic Functions Docs 11gR2
SELECT SECONDD.CONFIRMED
FROM CUSTOMER CU
INNER JOIN
(SELECT
*
FROM (SELECT SECONDD.*,
MAX (S.TIMESTAMP) OVER (PARTITION BY S.ID)
AS MAXTIMESTAMP
FROM SETTINGS SECONDD)
WHERE TIMESTAMP = MAXTIMESTAMP) SECONDD
ON SECONDD.ID = CU.ID
Don't worry; while this sounds very basic, it isn't :-)
The easiest way to get the CONFIRMED for the latest TIMESTAMP in Oracle is KEEP LAST. E.g.:
SELECT customer_id, MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp)
FROM settings
GROUP BY customer_id;
The related CREATE VIEW statement:
CREATE OR REPLACE VIEW cus_settings as
SELECT
c.id AS id,
c.lang AS language,
c.allowemail AS allowemail,
l.last_confirmed AS confirmed
FROM cus.customer c
LEFT JOIN
(
SELECT
customer_id,
MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp) AS last_confirmed
FROM settings
GROUP BY customer_id;
) l ON l.customer_id = c.id;
Or:
CREATE OR REPLACE VIEW cus_settings as
SELECT
c.id AS id,
c.lang AS language,
c.allowemail AS allowemail,
(
SELECT MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp)
FROM settings s
WHERE s.customer_id = c.id
) AS confirmed
FROM cus.customer c;

How to filter data out, when using GROUP in SQL

I got this SQL-statement:
SELECT t.*
FROM (SELECT *
FROM accelerator_clinic_patient_status
ORDER BY status_id DESC) t
INNER JOIN accelerator_clinic_campaign_patient_signup signup
ON (signup.patient_id = t.patient_id)
INNER JOIN accelerator_clinic_campaign campaign
ON (signup.campaign_id = campaign.campaign_id
AND campaign.clinic_user_id = 4978)
GROUP BY t.patient_id
The code finds out the latest status in the table "accelerator_clinic_patient_status"
It contains columns patient_id which is a foregin-key, each patient_id can contain multiple status's.
But what I am interested in, is getting a list of patient id whose LATEST status is "Booked" -- I want to filter out Not Booked in the list, when doing the select-statement (for example in the posted image, the data should not be shown in the query, because the latest status is Not Booked). The code right now returns a list with the latest status.
Any idea how to do this?
Try This,
Please Improve the following code syntactically,
select *,Rank() over (partiton by X.Patient_Id order by X.date_added desc) as Rankk from (SELECT t.*
FROM (SELECT *
FROM accelerator_clinic_patient_status
ORDER BY status_id DESC) t
INNER JOIN accelerator_clinic_campaign_patient_signup signup
ON (signup.patient_id = t.patient_id)
INNER JOIN accelerator_clinic_campaign campaign
ON (signup.campaign_id = campaign.campaign_id
AND campaign.clinic_user_id = 4978)
GROUP BY t.patient_id) as X where Rankk=1 and Status = 'Booked'
In Mysql ,Something like this,
SET #rank=0;
SELECT #rank:=#rank+1,*
FROM (accelerator_clinic_patient_status
ORDER BY status_id DESC) t
INNER JOIN accelerator_clinic_campaign_patient_signup signup
ON (signup.patient_id = t.patient_id)
INNER JOIN accelerator_clinic_campaign campaign
ON (signup.campaign_id = campaign.campaign_id
AND campaign.clinic_user_id = 4978)
GROUP BY t.patient_id) X
Group By Patient_Id Order By date_added Desc

Sub query in a inner join

Well, in order to get the last entry by date i've done that query :
select cle
from ( select cle, clepersonnel, datedebut, row_number()
over(partition by clepersonnel order by datedebut desc) as rn
from periodeoccupation) as T
where rn = 1
This one is working and give me the last entry by date, so far i'm ok.
But its the first time i work with subquery and i have to make my query way more complex with multiple joins. But i cannot figure out how to make a subquery in a inner join.
This is what i try :
select personnel.prenom, personnel.nom
from personnel
inner join
( select cle, clepersonnel, datedebut, row_number()
over(partition by clepersonnel order by datedebut desc) as rn
from periodeoccupation) as T
ON personnel.cle = periodeoccupation.clepersonnel
where rn = 1
but its not working !
If you have any idea or tips...Thank you !
Simply change
ON personnel.cle = periodeoccupation.clepersonnel
to
ON personnel.cle = T.clepersonnel
The query join has been aliased with T and you have reference the alias as the table in the aliased query is out of scope in your outer statement.

How to group my table for latest date and ID?

I have a table like this:
I need group this table latest date for every ID.
I mean, I want to get last row every ID. Here is my query:
SELECT DISTINCT ch.Date,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, [Date] from tblCommonHistory ) ch
ON ch.TableIDentity = rk.ID order by ID
How can I do what I want?
EDIT: This query worked for me:
SELECT DISTINCT ch.dt,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, max([Date]) as dt from tblCommonHistory group by TableIdentity) ch ON ch.TableIDentity = rk.ID order by ID
Just use aggregation:
select TableIdentity, max([date])
from tblCommonHistory
group by TableIdentity;
Your question only mentions one table. Your query has two; I don't understand the discrepancy.
It's strange that you have duplicated TableIdentity in tblCommonHistory, but otherwise you should not be getting multiple dates for the same ID from your query.
And also, the only reason to join the 2 tables seems to be that you need to skip those ID that are not present in the tblrisk (is it what you need to do?)
In that case, I'd suggest
SELECT max(ch.Date) AS [Date],ID FROM dbo.tblrisk AS rk
inner join tblCommonHistory AS ch ON ch.TableIDentity = rk.ID
group by ID order by ID