Get the row with max(timestamp) - sql

I need to select most recently commented articles, with the last comment for each article, i.e. other columns of the row which contains max(c.created):
SELECT a.id, a.title, a.text, max(c.created) AS cid, c.text?
FROM subscriptions s
JOIN articles a ON a.id=s.article_id
JOIN comments c ON c.article_id=a.id
WHERE s.user_id=%d
GROUP BY a.id, a.title, a.text
ORDER BY max(c.created) DESC LIMIT 10;
Postgres tells me that I have to put c.text into GROUP BY. Obviously, I don't want to do this. min/max doesn't fit too. I don't have idea, how to select this.
Please advice.

In PostgreSQL, DISTINCT ON is probably the optimal solution for this kind of query:
SELECT DISTINCT ON (a.id)
a.id, a.title, a.text, c.created, c.text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = %d
ORDER BY a.id, c.created DESC
This retrieve articles with the latest comment and associated additional columns.
Explanation, links and a benchmark in this closely related answer.
To get the latest 10, wrap this in a subquery:
SELECT *
FROM (
SELECT DISTINCT ON (a.id)
a.id, a.title, a.text, c.created, c.text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = 12
ORDER BY a.id, c.created DESC
) x
ORDER BY created DESC
LIMIT 10;
Alternatively, you could use window functions in combination with standard DISTINCT:
SELECT DISTINCT
a.id, a.title, a.text, c.created, c.text
,first_value(c.created) OVER w AS c_created
,first_value(c.text) OVER w AS c_text
FROM subscriptions s
JOIN articles a ON a.id = s.article_id
JOIN comments c ON c.article_id = a.id
WHERE s.user_id = 12
WINDOW w AS (PARTITION BY c.article_id ORDER BY c.created DESC)
ORDER BY c_created DESC
LIMIT 10;
This works, because DISTINCT (unlike aggregate functions) is applied after window functions.
You'd have to test which is faster. I'd guess the last one is slower.

Related

Combine a CROSS JOIN and a LEFT JOIN

I have two tables named author and commit_metrics. Both of them have an id field. Author has author_name and author_email. Commit_metrics has author_id and author_date.
I am trying to write a query that will get the number of commits that each author had in a given week, even if that number is 0. Here's what I have so far:
SELECT a.id, a.author_name, a.author_email, c.week_num, COUNT(c.id)
FROM author AS a
CROSS JOIN generate_series(1, 610) AS s(n)
LEFT JOIN (SELECT c.id,
c.author_id,
c.author_date,
WEEK_NUMBER(c.author_date) AS week_num
FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
WHERE c.week_num IS NOT NULL
GROUP BY a.id, a.author_name, a.author_email, c.week_num
ORDER BY c.week_num DESC, a.author_name;
WEEK_NUMBER is a function I wrote for this query:
CREATE OR REPLACE FUNCTION WEEK_NUMBER(date TIMESTAMP) RETURNS INTEGER AS
$$
SELECT TRUNC(DATE_PART('day', date - '2008-01-01') / 7)::INTEGER;
$$ LANGUAGE SQL;
Currently, the query works like a charm with one major caveat. It doesn't properly calculate 0 when the author made no commits in a given week. I'm not sure why it doesn't. When I do the query with just the FROM and CROSS JOIN, it properly prints the many thousand combined authors/weeks. However, when I add the LEFT JOIN, it loses any week where the author did not make a commit.
Any help would be greatly appreciated. I'm open to doing away with the generate_series call if it's unnecessary.
Also, I found this post, but I don't think it's helpful for my case.
Although you are using a left join, "WHERE c.week_num IS NOT NULL" filters out all of the cases where there is no post. Try this:
SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id) as post_count
FROM author AS a
CROSS JOIN generate_series(1, 610) AS s(n)
LEFT JOIN (SELECT c.id,
c.author_id,
c.author_date,
WEEK_NUMBER(c.author_date) AS week_num
FROM commit_metrics c) AS c ON s.n = c.week_num AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, s.n
ORDER BY s.n DESC, a.author_name;
Your WHERE clause is excluding the records on commit_metrics that are null, which is the case when the author has no commits during the week selected. You should just remove this from the WHERE clause to get your desired output.
If you need the WHERE clause to eliminate some of the CROSS JOIN records based on your data, you will need that CROSS JOIN and WHERE to be in a sub-select that you LEFT JOIN to, or create some more complicated logic in the current WHERE clause.
Remove the filtering condition. Also a subquery is not needed and you want to select s.n instead of c.week_num:
SELECT a.id, a.author_name, a.author_email, s.n as week_num, COUNT(c.id)
FROM author a CROSS JOIN
generate_series(1, 610) AS s(n) LEFT JOIN
commit_metrics c
ON s.n = WEEK_NUMBER(c.author_date) AND a.id = c.author_id
GROUP BY a.id, a.author_name, a.author_email, c.week_num
ORDER BY c.week_num DESC, a.author_name;

How can I select the highest post per topic and associate them given this schema design?

I am working on a small forum component for a site and I am creating a page where I want to display each topic along with its highest rated answer. Here are what the tables look like:
POST USER TOPIC
id id id
date name title
text bio date
views
likes
topic_id
author_id
My query looks like so:
select
u.id, u.name, u.bio,
p.id, p.date, p.text, p.views, p.likes,
t.id, t.title, t.date
from
( select p.id, max(p.likes) as likes, p.topic_id
from post as p group by p.topic_id ) as q
inner join post as p on q.id = p.id
inner join topic as t on t.id = q.topic_id
inner join user as u on u.id = p.author_id
order by date desc;
One of the problems I'm having running this is withing "q". Postgresql wont let me run the "q" query because it wants "p.id" to be in the "group by" clause or in an aggregate function. I tried to use "distinct on (p.id)" but I got the same error message: p.id must appear in the GROUP BY clause or be used in an aggregate function.
Without the p.id attribute, I cannot meaningfully link it to the other tables; is there another way of accomplishing this?
;WITH cte AS (
SELECT
u.id AS UserId
,u.name
,u.bio
,p.id AS PostId
,p.[date] AS PostDate
,p.text
,p.views
,p.Likes
,t.id AS TopidId
,t.title
,t.[date] AS TopicDate
,p.Likes
,ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY p.Likes DESC, p.[date] DESC) AS RowNum
,DENSE_RANK() OVER (PARTITION BY t.id ORDER BY p.Likes DESC) AS RankNum
FROM
topic t
INNER JOIN post p
ON t.id = p.topic_id
INNER JOIN [user] u
ON p.author_id = u.id
)
SELECT *
FROM
cte
WHERE
RowNum = 1
;
switch RowNum to RankNum if you want to see ties for most liked
This is a common need: when grouping, show each group's first/last a ranked by some other criteria b. I don't have a name for it, but this seems to be the canonical question. You can see there are a lot of choices! My favorite solution is probably a lateral join:
SELECT u.id, u.name, u.bio,
p.id, p.date, p.text, p.views, p.likes,
t.id, t.title, t.date
FROM topic t
LEFT OUTER JOIN LATERAL (
SELECT *
FROM post
WHERE post.topic_id = t.id
ORDER BY post.likes DESC
LIMIT 1
) p
ON true
LEFT OUTER JOIN "user" u
ON p.author_id = u.id
;
SELECT
u.id AS uid, u.name, u.bio
, p.id AS pid, p."date" AS pdate, p.text, p.views, p.likes
, t.id AS tid, t.title, t."date" AS tdate
FROM post p
JOIN topic t ON t.id = p.topic_id
JOIN user u ON u.id = p.author_id
WHERE NOT EXISTS ( SELECT *
FROM post nx
WHERE nx.topic_id = p.topic_id
AND nx.likes > p.likes)
ORDER BY p."date" DESC
;

Query returning too many results

SQL query that returns expected 29 results for a.id = 366
select a.name, c.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
where a.id = 366
GROUP BY a.name, c.name
order by MAX(b.renew_date), MAX(b.date) desc;
SQL code below that returns 34 results, multiple results where two different Provides supplied the same course. I know these extra results are because I added e.name to the list to be returned. But all that is needed is the 29 entries with the latest date and Providers names.
select a.name, c.name, e.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
GROUP BY a.name, c.name, e.name
order by MAX(b.renew_date), MAX(b.date) desc;
Can anyone rework this code to return a single DISTINCT Provider name with the MAX(renew_date) for each course.
This returns exactly one row per distinct combination of (a.name, c.name):
The one with the latest renew_date.
Among these, the one with the latest date (may differ from global max(date)!).
Among these, the one with the alphabetically first e.name:
SELECT DISTINCT ON (a.name, c.name)
a.name AS a_name, c.name AS c_name, e.name AS e_name
, b.renew_date, b.date
FROM boson_course c
JOIN boson_coursedetail b on c.id = b.course_id
JOIN boson_coursedetail_attendance d on d.coursedetail_id = b.id
JOIN boson_employee a on a.id = d.employee_id
JOIN boson_provider e on b.provider_id = e.id
WHERE a.id = 366
ORDER BY a.name, c.name
, b.renew_date DESC NULLS LAST
, b.date DESC NULLS LAST
, e.name;
The result is sorted by a_name, c_name first. If you need your original sort order, wrap this in a subquery:
SELECT *
FROM (<query from above>) sub
ORDER BY renew_date DESC NULLS LAST
, date DESC NULLS LAST
, a_name, c_name, e_name;
Explanation for DISTINCT ON:
Select first row in each GROUP BY group?
Why DESC NULL LAST?
PostgreSQL sort by datetime asc, null first?
Aside: Don't use basic type names like date ad column names. Also, name is hardly ever a good name. As you can see, we have to use aliases to make this query useful. Some general advice on naming conventions:
How to implement a many-to-many relationship in PostgreSQL?
Try using distinct on:
select distinct on (a.name, c.name, e.name), a.name, c.name, e.name,
B.date, b.renew_date as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
ORDER BY a.name, c.name, e.name, B.date desc
order by MAX(b.renew_date), MAX(b.date) desc;

Problems with SQL Inner join

Having some problems while trying to optimize my SQL.
I got 2 tables like this:
Names
id, analyseid, name
Analyses
id, date, analyseid.
I want to get the newest analyse from Analyses (ordered by date) for every name (they are unique) in Names. I can't really see how to do this without using 2 x nested selects.
My try (Dont get confused about the names. It's the same principle):
SELECT
B.id,
B.chosendatetime,
vStockNames.name
FROM
vStockNames
INNER JOIN
(
SELECT TOP 1
vAnalysesHistory.id,
vAnalysesHistory.chosendatetime,
vAnalysesHistory.companyid
FROM
vAnalysesHistory
ORDER BY
vAnalysesHistory.chosendatetime DESC
) AS B
ON
B.companyid = vStockNames.stockid
In my example the problem is that i only get 1 row returned (because of top 1). But if I exclude this, I can get multiple analyses of the same name.
Can you help me ? - THanks in advance.
SQL Server 2000+:
SELECT (SELECT TOP 1
a.id
FROM vAnalysesHistory AS a
WHERE a.companyid = n.stockid
ORDER BY a.chosendatetime DESC) AS id,
n.name,
(SELECT TOP 1
a.chosendatetime
FROM vAnalysesHistory AS a
WHERE a.companyid = n.stockid
ORDER BY a.chosendatetime DESC) AS chosendatetime
FROM vStockNames AS n
SQL Server 2005+, using CTE:
WITH cte AS (
SELECT a.id,
a.date,
a.analyseid,
ROW_NUMBER() OVER(PARTITION BY a.analyseid
ORDER BY a.date DESC) AS rk
FROM ANALYSES a)
SELECT n.id,
n.name,
c.date
FROM NAMES n
JOIN cte c ON c.analyseid = n.analyseid
AND c.rk = 1
...without CTE:
SELECT n.id,
n.name,
c.date
FROM NAMES n
JOIN (SELECT a.id,
a.date,
a.analyseid,
ROW_NUMBER() OVER(PARTITION BY a.analyseid
ORDER BY a.date DESC) AS rk
FROM ANALYSES a) c ON c.analyseid = n.analyseid
AND c.rk = 1
You're only asking for the TOP 1, so that's all you're getting. If you want one per companyId, you need to specify that in the SELECT on vAnalysesHistory. Of course, JOINs must be constant and do not allow this. Fortunately, CROSS APPLY comes to the rescue in cases like this.
SELECT
B.id,
B.chosendatetime,
vStockNames.name
FROM
vStockNames
CROSS APPLY
(
SELECT TOP 1
vAnalysesHistory.id,
vAnalysesHistory.chosendatetime,
vAnalysesHistory.companyid
FROM
vAnalysesHistory
WHERE companyid = vStockNames.stockid
ORDER BY
vAnalysesHistory.chosendatetime DESC
) AS B
You could also use ROW_NUMBER() to do the same:
SELECT
B.id,
B.chosendatetime,
vStockNames.name
FROM
vStockNames
INNER JOIN
(
SELECT
vAnalysesHistory.id,
vAnalysesHistory.chosendatetime,
vAnalysesHistory.companyid,
ROW_NUMBER() OVER (PARTITION BY companyid ORDER BY chosendatetime DESC) AS row
FROM
vAnalysesHistory
) AS B
ON
B.companyid = vStockNames.stockid AND b.row = 1
Personally I'm a fan of the first approach. It will likely be faster and is easier to read IMO.
Will something like this work for you?
;with RankedAnalysesHistory as
(
SELECT
vah.id,
vah.chosendatetime,
vah.companyid
,rank() over (partition by vah.companyid order by vah.chosendatetime desc) rnk
FROM
vAnalysesHistory vah
)
SELECT
B.id,
B.chosendatetime,
vsn.name
FROM
vStockNames vsn
join RankedAnalysesHistory as rah on rah.companyid = vsn.stockid and vah.rnk = 1
It seems to me that you only need SQL-92 for this. Of course, explicit documentation of the joining columns between the tables would help.
Simple names
SELECT B.ID, C.ChosenDate, N.Name
FROM (SELECT A.AnalyseID, MAX(A.Date) AS ChosenDate
FROM Analyses AS A
GROUP BY A.AnalyseID) AS C
JOIN Analyses AS B ON C.AnalyseID = B.AnalyseID AND C.ChosenDate = B.Date
JOIN Names AS N ON N.AnalyseID = C.AnalyseID
The sub-select generates the latest analysis for each company; the join with Analyses picks up the Analyse.ID value corresponding to that latest analysis, and the join with Names picks up the company name. (The C.ChosenDate in the select-list could be replaced by B.Date AS ChosenDate, of course.)
Complicated names
SELECT B.ID, C.ChosenDateTime, N.Name
FROM (SELECT A.CompanyID, MAX(A.ChosenDateTime) AS ChosenDateTime
FROM vAnalysesHistory AS A
GROUP BY A.CompanyID) AS C
JOIN vAnalysesHistory AS B ON C.CompanyID = B.CompanyID
AND C.ChosenDateTime = B.ChosenDateTime
JOIN vStockNames AS N ON N.AnalyseID = C.AnalyseID
Same query with systematic renaming (and slightly different layout to avoid horizontal scrollbars).

Simple SQL question about getting rows and associated counts

this oughta be an easy one.
My question is very similar to this one; basically, I've got a table of posts, a table of comments with a foreign key for the post_id, and a table of votes with a foreign key for the post id. I'd like to do a single query and get back a result set containing one row per post, along with the count of associated comments and votes.
From the question I've linked to above, it seems that for getting a table back containing just a row for each post and a comment count, this is the right approach:
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
GROUP BY a.ID, a.Title
I thought adding vote count would be as easy as adding another left join, as in
SELECT a.ID, a.Title, COUNT(c.ID) AS NumComments, COUNT(v.id AS NumVotes)
FROM Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY a.ID, a.Title
but I'm getting bad numbers back. What am I missing?
SELECT
a.ID,
a.Title,
COUNT(DISTINCT c.ID) AS NumComments,
COUNT(DISTINCT v.id) AS NumVotes
FROM
Articles a
LEFT JOIN Comments c ON c.ParentID = a.ID
LEFT JOIN Votes v ON v.ParentID = a.ID
GROUP BY
a.ID,
a.Title
SELECT id, title,
(
SELECT COUNT(*)
FROM comments c
WHERE c.ParentID = a.ID
) AS NumComments,
(
SELECT COUNT(*)
FROM votes v
WHERE v.ParentID = a.ID
) AS NumVotes
FROM articles a
try:
COUNT(DISTINCT c.ID) AS NumComments
You are thinking in trees, not recordsets.
In the recordset the you get each Comment and each Vote returned multiple times combined with each other. Run the query without the group by and the count to see what I mean.
The solution is simple: use COUNT(DISCTINCT c.ID) and COUNT(DISTINCT v.ID)