How to use DISTINCT ON but ORDER BY another expression? - sql

The model Subscription has_many SubscriptionCart.
A SubscriptionCart has a status and an authorized_at date.
I need to pick the cart with the oldest authorized_at date from all the carts associated to a Subscription, and then I have to order all the returned Subscription results by this subscription_carts.authorized_at column.
The query below works, but I can't figure out how to select DISTINCT ON subscription.id to avoid duplicates while still ordering by subscription_carts.authorized_at.
Raw SQL query so far:
select distinct on (s.id) s.id as subscription_id, subscription_carts.authorized_at, s.*
from subscriptions s
join subscription_carts subscription_carts on subscription_carts.subscription_id = s.id
and subscription_carts.plan_id = s.plan_id
where subscription_carts.status = 'processed'
and s.status IN ('authorized','in_trial', 'paused')
order by s.id, subscription_carts.authorized_at
If I try to ORDER BY subscription_carts.authorized_at first, I get an error because the DISTINCT ON and ORDER BY expressions must be in the same order.
The solutions I've found seem too complicated for what I need and I've failed to implement them because I don't understand them fully.
Would it be better to GROUP BY subscription_id and then pick from that group instead of using DISTINCT ON? Any help appreciated.

This requirement is necessary to make DISTINCT ON work; to change the final order, you can add an outer query with another ORDER BY clause:
SELECT *
FROM (SELECT DISTINCT ON (s.id)
s.id as subscription_id, subscription_carts.authorized_at, s.*
FROM subscriptions s
JOIN ...
WHERE ...
ORDER BY s.id, subscription_carts.authorized_at
) AS subq
ORDER BY authorized_at;

You don't have to use DISTINCT ON. While it is occasionally useful, I personally find window-function-based approaches much clearer:
-- Optionally, list all columns explicitly, to remove the rn column again
SELECT *
FROM (
SELECT
s.id AS subscription_id,
c.authorized_at,
s.*,
ROW_NUMBER () OVER (PARTITION BY s.id ORDER BY c.authorized_at) rn
FROM subscriptions s
JOIN subscription_carts c
ON c.subscription_id = s.id
AND c.plan_id = s.plan_id
WHERE c.status = 'processed'
AND s.status IN ('authorized', 'in_trial', 'paused')
) t
WHERE rn = 1
ORDER BY authorized_at
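To try the window-function approach without a PostgreSQL instance, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The rows are made-up sample data and only the columns the query touches are modelled; the final ORDER BY sorts by authorized_at, as the question asks.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, plan_id INTEGER, status TEXT);
    CREATE TABLE subscription_carts (
        subscription_id INTEGER, plan_id INTEGER, status TEXT, authorized_at TEXT
    );
    INSERT INTO subscriptions VALUES
        (1, 10, 'authorized'), (2, 10, 'paused'), (3, 10, 'cancelled');
    INSERT INTO subscription_carts VALUES
        (1, 10, 'processed', '2021-03-01'),
        (1, 10, 'processed', '2021-05-01'),
        (2, 10, 'processed', '2021-01-15'),
        (2, 10, 'processed', '2021-02-20'),
        (3, 10, 'processed', '2020-12-01');
""")

# rn = 1 marks the oldest processed cart of each subscription;
# the outer query is then free to sort by any column it likes.
oldest_carts = conn.execute("""
    SELECT subscription_id, authorized_at
    FROM (
        SELECT s.id AS subscription_id,
               c.authorized_at,
               ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY c.authorized_at) AS rn
        FROM subscriptions s
        JOIN subscription_carts c
          ON c.subscription_id = s.id
         AND c.plan_id = s.plan_id
        WHERE c.status = 'processed'
          AND s.status IN ('authorized', 'in_trial', 'paused')
    ) t
    WHERE rn = 1
    ORDER BY authorized_at
""").fetchall()

print(oldest_carts)
```

Subscription 3 drops out because its status is not in the allowed list, and each remaining subscription appears exactly once.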

Related

simple SQL subquery

It seems I'm a complete idiot when it comes to SQL...
All I need is to get one value from another table, but there are multiple rows with the same customerId in the second table, and I need the one with the highest timestamp.
CREATE OR REPLACE VIEW CUS_SETTINGS as
SELECT
c.id as id,
c.LANG as Language,
c.ALLOWEMAIL as AllowEmail,
l.CONFIRMED as confirmed
FROM cus.CUSTOMER c
????? something with l
/
A LEFT JOIN brings back every row, so I get multiple duplicate ids.
What I need is probably a subquery, but I can't get it to work...
(SELECT CONFIRMED FROM settings WHERE ?? c.id == l.id ?? AND MAX(TIMESTAMP) )
I've tried many, many variations of joins and subqueries, but for some reason SQL is just too confusing...
You can use a ROW_NUMBER() in the subquery:
SELECT c.id as id, c.LANG as Language, c.ALLOWEMAIL as AllowEmail,
l.CONFIRMED as confirmed
FROM cus.CUSTOMER c JOIN
(SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.timestamp DESC) as seqnum
FROM settings s
) s
ON s.id = c.id AND s.seqnum = 1;
Note: You might want a LEFT JOIN if you want to keep all customers, even those with no settings.
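As a rough illustration of the LEFT JOIN variant mentioned in the note, here is a small sketch using Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER()). The table layout and rows are assumptions for the demo, and the timestamp column is renamed to ts. The seqnum = 1 condition stays in the ON clause; moving it to WHERE would filter out customers that have no settings row.

```python
import sqlite3

# Hypothetical tables and rows, only for demonstrating the join shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, lang TEXT);
    CREATE TABLE settings (id INTEGER, confirmed TEXT, ts TEXT);
    INSERT INTO customer VALUES (1, 'en'), (2, 'fi'), (3, 'de');
    INSERT INTO settings VALUES
        (1, 'N', '2020-01-01'),
        (1, 'Y', '2020-06-01'),
        (2, 'N', '2020-03-01');
""")

# seqnum = 1 is the newest settings row per customer id.
latest_settings = conn.execute("""
    SELECT c.id, c.lang, s.confirmed
    FROM customer c
    LEFT JOIN (
        SELECT st.*,
               ROW_NUMBER() OVER (PARTITION BY st.id ORDER BY st.ts DESC) AS seqnum
        FROM settings st
    ) s
      ON s.id = c.id AND s.seqnum = 1
    ORDER BY c.id
""").fetchall()

print(latest_settings)
```

Customer 3 has no settings and comes back with a NULL (None in Python) for confirmed instead of disappearing.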
You can use analytic functions. MAX() OVER (PARTITION BY ...) gives you the latest timestamp per id, which you can then filter on.
Analytic Functions Docs 11gR2
SELECT SECONDD.CONFIRMED
FROM CUSTOMER CU
INNER JOIN
(SELECT
*
FROM (SELECT S.*,
MAX (S.TIMESTAMP) OVER (PARTITION BY S.ID)
AS MAXTIMESTAMP
FROM SETTINGS S)
WHERE TIMESTAMP = MAXTIMESTAMP) SECONDD
ON SECONDD.ID = CU.ID
Don't worry; while this sounds very basic, it isn't :-)
The easiest way to get the CONFIRMED for the latest TIMESTAMP in Oracle is KEEP LAST. E.g.:
SELECT customer_id, MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp)
FROM settings
GROUP BY customer_id;
The related CREATE VIEW statement:
CREATE OR REPLACE VIEW cus_settings as
SELECT
c.id AS id,
c.lang AS language,
c.allowemail AS allowemail,
l.last_confirmed AS confirmed
FROM cus.customer c
LEFT JOIN
(
SELECT
customer_id,
MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp) AS last_confirmed
FROM settings
GROUP BY customer_id
) l ON l.customer_id = c.id;
Or:
CREATE OR REPLACE VIEW cus_settings as
SELECT
c.id AS id,
c.lang AS language,
c.allowemail AS allowemail,
(
SELECT MAX(confirmed) KEEP (DENSE_RANK LAST ORDER BY timestamp)
FROM settings s
WHERE s.customer_id = c.id
) AS confirmed
FROM cus.customer c;

Removing duplicates *after* ordering in a SQL query

I have the following SQL query:
select
id,
name
from
project
inner join job on project.id = job.project_id
where
job.user_id = 'me'
order by
project.modified desc
limit 10
The idea is to get information about the 10 most recently used projects for a given user.
The problem is that this can return duplicates in the case where multiple jobs have the same project. Instead of having duplicates, I want to order all rows by modified desc, remove duplicates based on id and name, then limit to 10.
I've not been able to figure out how to achieve this. Can anyone point me in the right direction?
You are getting duplicates because of the join. As you only want columns from the project table (I assume id and name are from that table), not creating the duplicates in the first place would be better than removing them after the join:
select p.id,
p.name
from project p
where exists (select *
from job job
where job.project_id = p.id
and job.user_id = 'me')
order by p.modified desc
limit 10
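To see the difference between the join and the EXISTS form, here is a minimal sketch with Python's sqlite3 module. The schema and rows are invented for the demo (only the columns named in the question are assumed), so treat it as an illustration rather than the asker's actual data.

```python
import sqlite3

# Hypothetical data: project 1 has two jobs for user 'me'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT, modified TEXT);
    CREATE TABLE job (id INTEGER PRIMARY KEY, project_id INTEGER, user_id TEXT);
    INSERT INTO project VALUES
        (1, 'alpha', '2022-01-01'),
        (2, 'beta',  '2022-02-01'),
        (3, 'gamma', '2022-03-01');
    INSERT INTO job (project_id, user_id) VALUES
        (1, 'me'), (1, 'me'), (2, 'me'), (3, 'someone_else');
""")

# The join yields one row per matching job, so 'alpha' shows up twice.
joined = conn.execute("""
    SELECT p.id, p.name
    FROM project p
    JOIN job j ON j.project_id = p.id
    WHERE j.user_id = 'me'
    ORDER BY p.modified DESC
    LIMIT 10
""").fetchall()

# EXISTS only asks whether at least one job matches, so each project appears once.
deduplicated = conn.execute("""
    SELECT p.id, p.name
    FROM project p
    WHERE EXISTS (SELECT 1
                  FROM job j
                  WHERE j.project_id = p.id
                    AND j.user_id = 'me')
    ORDER BY p.modified DESC
    LIMIT 10
""").fetchall()

print(joined)
print(deduplicated)
```

Because no duplicates are produced in the first place, the LIMIT applies to distinct projects rather than to joined rows.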
Try this using row_number; it will work on PostgreSQL:
select * from
(select
project.id,
project.name,
project.modified,
row_number() over(partition by project.id, project.name order by project.modified desc) as rn
from
project inner join job on project.id = job.project_id
where
job.user_id = 'me') a where rn = 1 order by modified desc limit 10
Did you try SELECT DISTINCT?
select distinct
id,
name
from
project
inner join job on project.id = job.project_id
where
job.user_id = 'me'
order by
project.modified desc
limit 10
I ended up with the following, which seems to work fine:
select
p.id,
p.name
from
project p
inner join
(
select
j.id,
max(j.modified) as max_modified
from
job j
where
j.user_id = 'me'
group by
j.id
order by
max_modified desc
limit 10
) ids on p.id = ids.id
order by
max_modified desc

How to group my table for latest date and ID?

I have a table like this:
I need to get the latest date for every ID in this table.
I mean, I want to get the last row for every ID. Here is my query:
SELECT DISTINCT ch.Date,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, [Date] from tblCommonHistory ) ch
ON ch.TableIDentity = rk.ID order by ID
How can I do what I want?
EDIT: This query worked for me:
SELECT DISTINCT ch.dt,ID FROM dbo.tblrisk AS rk
inner join (Select TableIdentity, max([Date]) as dt from tblCommonHistory group by TableIdentity) ch ON ch.TableIDentity = rk.ID order by ID
Just use aggregation:
select TableIdentity, max([date])
from tblCommonHistory
group by TableIdentity;
Your question only mentions one table. Your query has two; I don't understand the discrepancy.
It's strange that you have duplicated TableIdentity in tblCommonHistory, but otherwise you should not be getting multiple dates for the same ID from your query.
Also, the only reason to join the two tables seems to be that you need to skip those IDs that are not present in tblrisk (is that what you need to do?).
In that case, I'd suggest:
SELECT max(ch.Date) AS [Date],ID FROM dbo.tblrisk AS rk
inner join tblCommonHistory AS ch ON ch.TableIDentity = rk.ID
group by ID order by ID
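A quick way to check the join-plus-aggregation behaviour is a small sketch with Python's sqlite3 module. The rows below are invented, and SQLite is used in place of SQL Server (it accepts the same [Date] bracket quoting); TableIdentity 9 has history but no row in tblrisk, so it should be skipped.

```python
import sqlite3

# Hypothetical rows; only the columns used in the query are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblrisk (ID INTEGER PRIMARY KEY);
    CREATE TABLE tblCommonHistory (TableIdentity INTEGER, [Date] TEXT);
    INSERT INTO tblrisk VALUES (1), (2);
    INSERT INTO tblCommonHistory VALUES
        (1, '2016-01-05'),
        (1, '2016-02-10'),
        (2, '2016-01-20'),
        (9, '2016-03-01');
""")

# One row per ID with its latest history date; the inner join drops ID 9.
latest_per_id = conn.execute("""
    SELECT rk.ID, MAX(ch.[Date]) AS latest_date
    FROM tblrisk AS rk
    INNER JOIN tblCommonHistory AS ch ON ch.TableIdentity = rk.ID
    GROUP BY rk.ID
    ORDER BY rk.ID
""").fetchall()

print(latest_per_id)
```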

Find MAX with JOIN where Field also shows up in another Table

I have 3 tables: Master, Paper and iCodes. For a certain set of Master.Ref's, I need to find Max(Paper.Date), where the Paper.Code is also in the iCodes table (i.e., Paper.Code is a type of iCode). Master is joined to Paper by the File field.
EDIT:
I only need the Max(Paper.Date) and its corresponding Code; I do not need all of the Codes.
I wrote the following but it is very slow. I have a few hundred ref #'s to look for. What is a better way to do this?
SELECT Master.Ref,
Paper.Code,
mp.MaxDate
FROM ( SELECT p.File ,
MAX(p.Date) AS MaxDate
FROM Paper AS p
LEFT JOIN Master AS m ON p.File = m.File
WHERE m.Ref IN ('ref1', 'ref2', 'ref3', 'ref4', 'ref5', 'ref6'... )
AND p.Code IN ( SELECT DISTINCT i.iCode
FROM iCodes AS i
)
GROUP BY p.File
) AS mp
LEFT JOIN Master ON mp.File = Master.File
LEFT JOIN Paper ON Master.File = Paper.File
AND mp.MaxDate = Paper.Date
WHERE Paper.Code IN ( SELECT DISTINCT iCodes.iCode
FROM iCodes
)
Does this do what you want?
SELECT m.Ref, p.Code, max(p.date)
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
GROUP BY m.Ref, p.Code;
EDIT:
To get the code on the max date, then use window functions:
select ref, code, date
from (SELECT m.Ref, p.Code, p.date,
row_number() over (partition by m.Ref order by p.date desc) as seqnum
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
) mp
where seqnum = 1;
The function row_number() assigns a sequential number starting at 1 to a group of rows. The groups are defined by the partition by clause, so in this case everything with the same m.Ref value would be in a single group. Within the group, rows are assigned the number based on the order by clause. So, the one with the biggest date gets the value of 1. That is the row you want.
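The numbering described above can be made visible with a small sketch using Python's sqlite3 module (SQLite 3.25+). The single flattened table, its column names (ref, code, paper_date) and its rows are assumptions for the demo, standing in for the Master/Paper join.

```python
import sqlite3

# Hypothetical, already-joined rows: one line per (ref, code, date).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paper (ref TEXT, code TEXT, paper_date TEXT);
    INSERT INTO paper VALUES
        ('ref1', 'A', '2015-01-01'),
        ('ref1', 'B', '2015-06-01'),
        ('ref2', 'C', '2014-03-01'),
        ('ref2', 'A', '2014-01-01'),
        ('ref2', 'B', '2014-02-01');
""")

# Show the sequence number every row receives within its ref group.
numbered = conn.execute("""
    SELECT ref, code, paper_date,
           ROW_NUMBER() OVER (PARTITION BY ref ORDER BY paper_date DESC) AS seqnum
    FROM paper
    ORDER BY ref, seqnum
""").fetchall()

# Keeping only seqnum = 1 leaves the newest row of each group.
latest = [(ref, code, d) for ref, code, d, seqnum in numbered if seqnum == 1]

for row in numbered:
    print(row)
print(latest)
```

Numbering restarts at 1 for each ref, so filtering on seqnum = 1 returns the code that belongs to the maximum date rather than every code.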

order by count when you have to count only certain items

I have this query:
SELECT t.*
FROM `tips` `t`
LEFT JOIN tip_usage
ON tip_usage.tip_id = t.id
GROUP BY t.id
ORDER BY COUNT(CASE
WHEN status = 'Active' THEN status
ELSE NULL
END) DESC
As you see here, I use a left join and I count only the records which are Active
Can I make this query to be different, and to lose the case stuff from the count?
If you want to return all tips, regardless of status, but sort by the number of Active records, then this is as pretty as you are going to get.
If you only want to return active tips, then you can add Where status = 'Active' and then just order by Count(t.id) desc.
One alternative is to add a NumActive int column to the tips table and keep it updated whenever a tip_usage record is added or modified for a given tip. This puts more overhead on the insert/delete/update operations for tip_usage, but makes this query much simpler:
select *
from tips
Order by tips.NumActive desc
Another alternative is:
Select tips.*
From tips
Order by (
Select count(tip_id)
From tip_usage as t
Where t.tip_id = tips.id and status = 'Active') DESC
Though this exchanges a case for a subquery, so it is just complex in a different way.
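To compare the CASE-based count with the correlated subquery, here is a small sketch using Python's sqlite3 module in place of MySQL. The rows are invented, and the extra t.id tie-breaker in ORDER BY is an addition of mine so the output is deterministic.

```python
import sqlite3

# Hypothetical data: tip 2 has two Active usages, tip 1 has one, tip 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tips (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tip_usage (tip_id INTEGER, status TEXT);
    INSERT INTO tips VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tip_usage VALUES
        (1, 'Active'), (1, 'Inactive'),
        (2, 'Active'), (2, 'Active'), (2, 'Inactive');
""")

# Variant 1: LEFT JOIN plus a conditional count (CASE without ELSE yields NULL,
# and COUNT ignores NULLs).
case_order = conn.execute("""
    SELECT t.id
    FROM tips t
    LEFT JOIN tip_usage u ON u.tip_id = t.id
    GROUP BY t.id
    ORDER BY COUNT(CASE WHEN u.status = 'Active' THEN 1 END) DESC, t.id
""").fetchall()

# Variant 2: no join, the count lives in a correlated subquery.
subquery_order = conn.execute("""
    SELECT t.id
    FROM tips t
    ORDER BY (SELECT COUNT(*)
              FROM tip_usage u
              WHERE u.tip_id = t.id
                AND u.status = 'Active') DESC, t.id
""").fetchall()

print(case_order)
print(subquery_order)
```

Both variants keep tips with no usage at all and produce the same ordering on this data.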
Quick note, you cannot select t.* and group on t.id. So with that being said:
SELECT t.id,coalesce(tu.cntUsed,0) as cntUsed
FROM `tips` `t`
LEFT
JOIN (Select tip_id,count(*) as cntUsed
from tip_usage
WHERE status='Active'
group by tip_id
) tu
ON t.id = tu.tip_id
ORDER BY coalesce(tu.cntUsed,0)
Since you want to left-join and include the tips that have no usage, this at least sorts them all at the top with a value of zero, which is the most accurate statement of the reality of what is in the tables.
SELECT tips.*, COUNT(*) AS number
FROM tip_usage
LEFT JOIN tips ON tips.id = tip_id
WHERE STATUS = "Active"
GROUP BY tip_id
ORDER BY number DESC