Comments and replies with Postgres

I have the SQL database tables posts, comments, and replies:
posts
======================================
postid  body                      created_at
--------------------------------------
1       The bucks beat the bills  1/16
2       Soccer tricks and tips    1/17
======================================
comments
======================================================================
commentid  postid (references posts.postid)  body             created_at
----------------------------------------------------------------------
78         1                                 Yayyy            1/18
79         1                                 Booo             1/19
79         2                                 These tips suck  1/20
======================================================================
replies
======================================================================
replyid  commentid (references comments.commentid)  body          created_at
----------------------------------------------------------------------
167      79                                          I agree       1/21
167      78                                          yayyyy        1/22
168      79                                          No they dont  1/23
======================================================================
I want to do 2 things.

1.
We are GIVEN the postid. For instance, postid = 1.
In a single call to the database, I want to query the database to GET:
- the post body
- the first 2 comments on the post, sorted by created_at
- for each of those comments, the first 2 replies, sorted by created_at
Querying POST, COMMENTS, and REPLIES from the database using SQL would look something like,
const POST = fetch(`
  select * from posts
  where postid = 1
`)

const COMMENTS = fetch(`
  select * from comments
  where postid = ${POST.postid}
  order by created_at
  limit 2
`)

const REPLIES = COMMENTS.map((COMMENT) =>
  fetch(`
    select * from replies
    where commentid = ${COMMENT.commentid}
    order by created_at
    limit 2
  `)
)
How do I write these queries as a SINGLE SQL call to the database?
The returned data should be in a nested form. Something like
const POST = {
  postid: 1,
  body: "...",
  comments: [{commentid: 1, body: "..."}, ...]
}
But if you have a different form that is easier, I'm open.
2.
In the call above, how do I also include the comments and replies aggregates?
For instance, the returned data should look like
const POST = {
  // same as before
  postid: 1,
  body: "...",
  comments: [{commentid: 1, body: "..."}, ...],
  // aggregate for comments
  comments_aggregate: 2
}

Your schema is incorrect: you cannot reference the column commentid if it is not unique (commentid 79 appears twice in your sample data). Assuming you properly make postid, commentid, and replyid the respective PRIMARY KEY of their tables, this method should provide the information you are after.
A few comments inline.
SELECT json_build_object(
         'postid', p.postid,
         'body', p.body,
         'comments', json_agg(
           json_build_object(
             'commentid', c.commentid,
             'body', c.body,
             'replies', replies,
             'reply_aggregate', ReplyCount
           )
         ),
         'comments_aggregate', CommentCount
       )
FROM posts p
LEFT OUTER JOIN (
  SELECT postid, commentid, body, created_at, CommentCount
  FROM (
    SELECT *,
           /* Count the total number of comments per post */
           COUNT(*) OVER (PARTITION BY postid) AS CommentCount,
           /* Number the comments of a post to later select the first 2 */
           ROW_NUMBER() OVER (PARTITION BY postid ORDER BY created_at) AS CommentNumber
    FROM comments
  ) FirstTwoComments
  WHERE CommentNumber <= 2
) c ON p.postid = c.postid
LEFT OUTER JOIN (
  SELECT commentid,
         ReplyCount,
         json_agg(
           json_build_object(
             'replyid', replyid,
             'body', body
           )
         ) AS replies
  FROM (
    SELECT *,
           /* Count the total number of replies per comment */
           COUNT(*) OVER (PARTITION BY commentid) AS ReplyCount,
           /* Number the replies of a comment to later select the first 2 */
           ROW_NUMBER() OVER (PARTITION BY commentid ORDER BY created_at) AS ReplyNumber
    FROM replies
  ) FirstTwoReplies
  WHERE ReplyNumber <= 2
  GROUP BY commentid, ReplyCount
) r ON c.commentid = r.commentid
WHERE p.postid = 1
GROUP BY p.postid, p.body, CommentCount
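The top-N-per-group trick in this answer (ROW_NUMBER to keep the first two comments, COUNT(*) OVER for the aggregate, JSON functions to nest the result) can be sanity-checked outside Postgres. A minimal sketch using Python's bundled sqlite3 with invented sample data; SQLite's json_object/json_group_array stand in for Postgres' json_build_object/json_agg, and window functions require SQLite >= 3.25:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE posts (postid INTEGER PRIMARY KEY, body TEXT, created_at TEXT);
CREATE TABLE comments (commentid INTEGER PRIMARY KEY,
                       postid INTEGER REFERENCES posts(postid),
                       body TEXT, created_at TEXT);
INSERT INTO posts VALUES (1, 'The bucks beat the bills', '2023-01-16');
INSERT INTO comments VALUES
  (78, 1, 'Yayyy', '2023-01-18'),
  (79, 1, 'Booo',  '2023-01-19'),
  (80, 1, 'Late comment', '2023-01-20');
""")

# First two comments per post via ROW_NUMBER, total via COUNT(*) OVER,
# then fold everything into one JSON document per post.
row = con.execute("""
SELECT json_object(
         'postid', p.postid,
         'body', p.body,
         'comments', json_group_array(
           json_object('commentid', c.commentid, 'body', c.body)
         ),
         'comments_aggregate', c.CommentCount
       )
FROM posts p
JOIN (
  SELECT *,
         COUNT(*) OVER (PARTITION BY postid) AS CommentCount,
         ROW_NUMBER() OVER (PARTITION BY postid ORDER BY created_at) AS CommentNumber
  FROM comments
) c ON c.postid = p.postid
WHERE c.CommentNumber <= 2 AND p.postid = 1
GROUP BY p.postid, p.body, c.CommentCount
""").fetchone()

post = json.loads(row[0])
print(post)
```

Note that comments_aggregate is 3 here (the total) while only 2 comments are embedded, which is exactly the distinction the question asks for.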

Related

Access Bare Columns w/ Aggregate Function w/o adding to Group By

I have 2 tables in postgres.
users
====================
auth0_id  email
--------------------
123-A     a@a
123-B     b@b
123-C     c@c
====================
auth0_logs
======================================================
id     date                      user_id  client_name
------------------------------------------------------
abc-1  021-10-16T00:18:41.381Z   123-A    example_client
abc-2  ...                       123-A    example_client
abc-3  ...                       123-B    example_client
abc-4  ...                       123-A    example_client
abc-5  ...                       123-B    example_client
abc-6  ...                       123-C    example_client
======================================================
I am trying to get the last login information (a single row in the auth0_logs table, based on MAX(auth0_logs.date)) for each unique user (auth0_logs.user_id), joined to the users table on users.auth0_id.
[
  {
    // auth0_logs information
    user_id: "123-A",
    last_login: "021-10-16T00:18:41.381Z",
    client_name: "example_client",
    // users information
    email: "a@a"
  },
  {
    user_id: "123-B",
    last_login: "...",
    client_name: "example_client",
    email: "b@b"
  },
  {
    user_id: "123-C",
    last_login: "...",
    client_name: "example_client",
    email: "c@c"
  }
]
I know this is a problem with "bare" columns not being allowed in queries that use aggregate functions (without being added to the GROUP BY; but adding them to the GROUP BY returned more than one row per user), yet I cannot get a solution that works from other SO posts (best post I've found: SQL select only rows with max value on a column). I promise you I have been on this for many hours over the past few days...
-- EDIT: start --
I have removed my incorrect attempts so as not to confuse / misdirect future readers. Please see @MichaelRobellard's answer using the WITH clause based on the above information.
-- EDIT: end --
Any help or further research direction would be greatly appreciated!
with user_data as (
  select user_id, max(date) as date
  from auth0_logs
  group by user_id
)
select *
from user_data
join auth0_logs on user_data.user_id = auth0_logs.user_id
               and user_data.date = auth0_logs.date
join users on user_data.user_id = users.auth0_id
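This greatest-row-per-group pattern (aggregate to find each user's max date, then join back to recover the full row) can be sketched end to end; note that max(date) needs an alias so the join condition can refer to it. A runnable sanity check in Python's sqlite3 with made-up log rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (auth0_id TEXT PRIMARY KEY, email TEXT);
CREATE TABLE auth0_logs (id TEXT PRIMARY KEY, date TEXT,
                         user_id TEXT, client_name TEXT);
INSERT INTO users VALUES ('123-A', 'a@a'), ('123-B', 'b@b');
INSERT INTO auth0_logs VALUES
  ('abc-1', '2021-10-16T00:18:41.381Z', '123-A', 'example_client'),
  ('abc-2', '2021-10-17T09:00:00.000Z', '123-A', 'example_client'),
  ('abc-3', '2021-10-15T12:00:00.000Z', '123-B', 'example_client');
""")

# Aggregate to one (user_id, max date) pair per user, then join back to
# auth0_logs to pick up the rest of that row, and to users for the email.
rows = con.execute("""
WITH user_data AS (
  SELECT user_id, MAX(date) AS date
  FROM auth0_logs
  GROUP BY user_id
)
SELECT l.user_id, l.date AS last_login, l.client_name, u.email
FROM user_data d
JOIN auth0_logs l ON l.user_id = d.user_id AND l.date = d.date
JOIN users u ON u.auth0_id = d.user_id
ORDER BY l.user_id
""").fetchall()
print(rows)
```

One caveat of the join-back approach: if two log rows for the same user share the exact max timestamp, both come back; DISTINCT ON (as in the next answer) avoids that on Postgres.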
with
t as (
  select distinct on (user_id) *
  from login_logs
  order by user_id, ldate desc
),
tt as (
  select auth0_id user_id, ldate last_login, client_name, email
  from t join users on auth0_id = user_id
)
select json_agg(to_json(tt.*)) from tt;

SQL Query With Max Value from Child Table

Three pertinent tables: tracks (music tracks), users, and follows.
The follows table is a many to many relationship relating users (followers) to users (followees).
I'm looking for this as a final result:
<track_id>, <user_id>, <most popular followee>
The first two columns are simple and result from a relationship between tracks and users. The third is my problem. I can join with the follows table and get all of the followees that each user follows, but how do I get only the followee with the highest number of follows?
Here are the tables with their pertinent columns:
tracks: id, user_id (fk to users.id), song_title
users: id
follows: followee_id (fk to users.id), follower_id (fk to users.id)
Here's some sample data:
TRACKS
1, 1, Some song title
USERS
1
2
3
4
FOLLOWS
2, 1
3, 1
4, 1
3, 4
4, 2
4, 3
DESIRED RESULT
1, 1, 4
For the desired result, the 3rd field is 4 because, as you can see in the FOLLOWS table, user 4 has the most followers.
I and a few great minds around me are still scratching our heads.
So I threw this into Linqpad because I'm better with Linq.
Tracks
  .Where(t => t.TrackId == 1)
  .Select(t => new {
    TrackId = t.TrackId,
    UserId = t.UserId,
    MostPopularFolloweeId = Followers
      .GroupBy(f => f.FolloweeId)
      .OrderByDescending(g => g.Count())
      .FirstOrDefault()
      .Key
  });
The resulting SQL query was the following (@p0 being the track id):
-- Region Parameters
DECLARE @p0 Int = 1
-- EndRegion
SELECT [t0].[TrackId], [t0].[UserId], (
  SELECT [t3].[FolloweeId]
  FROM (
    SELECT TOP (1) [t2].[FolloweeId]
    FROM (
      SELECT COUNT(*) AS [value], [t1].[FolloweeId]
      FROM [Followers] AS [t1]
      GROUP BY [t1].[FolloweeId]
    ) AS [t2]
    ORDER BY [t2].[value] DESC
  ) AS [t3]
) AS [MostPopularFolloweeId]
FROM [Tracks] AS [t0]
WHERE [t0].[TrackId] = @p0
That outputs the expected response, and should be a start to a cleaner query.
This sounds like an aggregation query with row_number(). I'm a little confused on how all the joins come together:
select t.*
from (
  select t.id, f.followee_id, count(*) as cnt,
         row_number() over (partition by t.id order by count(*) desc) as seqnum
  from follows f
  join tracks t on f.follower_id = t.user_id
  group by t.id, f.followee_id
) t
where seqnum = 1;
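Since the "most popular followee" in the sample data does not depend on the track row, a plain scalar subquery that counts and ranks follows is also enough. A hedged sketch in Python's sqlite3 using the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tracks (id INTEGER, user_id INTEGER, song_title TEXT);
CREATE TABLE follows (followee_id INTEGER, follower_id INTEGER);
INSERT INTO tracks VALUES (1, 1, 'Some song title');
INSERT INTO follows VALUES (2,1),(3,1),(4,1),(3,4),(4,2),(4,3);
""")

# Count follows per followee, order by that count descending, and
# attach the winner to each track row as a scalar subquery.
rows = con.execute("""
SELECT t.id, t.user_id,
       (SELECT followee_id
        FROM follows
        GROUP BY followee_id
        ORDER BY COUNT(*) DESC
        LIMIT 1) AS most_popular_followee
FROM tracks t
WHERE t.id = 1
""").fetchall()
print(rows)
```

User 4 appears three times as followee in the sample FOLLOWS data, so the third column comes back as 4, matching the question's desired result (1, 1, 4).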

SQL to Paginate Data Where Pagination Starts at a Given Primary Key

Edit: The original example I used had an int for the primary key, when in fact my primary key is a varchar containing a UUID as a string. I've updated the question below to reflect this.
Caveat: Solution must work on postgres.
Issue: I can easily paginate data when starting from a known page number or index into the list of results, but how can this be done if all I know is the primary key of the row to start from? For example, say my table has this data:
TABLE: article
======================================
id categories content
--------------------------------------
B7F79F47 local a
6cb80450 local b
563313df local c
9205AE5A local d
E88F7520 national e
5ab669a5 local f
fb047cf6 local g
591c6b50 national h
======================================
Given an article primary key of '9205AE5A' (article.id == '9205AE5A'), and given that the categories column must contain 'local', what SQL can I use to return a result set that includes the articles on either side of this one if it were paginated? I.e. the returned result should contain 3 items (previous, current, next articles):
('563313df','local','c'),('9205AE5A','local','d'),('5ab669a5','local','f')
Here is my example setup:
-- setup test table and some dummy data
create table article (
id varchar(36),
categories varchar(256),
content varchar(256)
);
insert into article values
('B7F79F47', 'local', 'a'),
('6cb80450', 'local', 'b'),
('563313df', 'local', 'c'),
('9205AE5A', 'local', 'd'),
('E88F7520', 'national', 'e'),
('5ab669a5', 'local', 'f'),
('fb047cf6', 'local', 'g'),
('591c6b50', 'national', 'h');
I want to paginate the rows in the article table, but the starting point I have is the 'id' of an article. In order to provide "Previous Article" and "Next Article" links on the rendered page, I also need the articles that would come on either side of the article whose id I know.
On the server side I could run my pagination sql and iterate through each result set to find the index of the given item. See the following inefficient pseudo code / sql to do this:
page = 0;
resultsPerPage = 10;
articleIndex = 0;
do {
  resultSet = select * from article where categories like '%local%' order by content limit resultsPerPage offset (page * resultsPerPage);
  for (result in resultSet) {
    if (result.id == '9205AE5A') {
      // we have found the article's index ('articleIndex') in the paginated list.
      // Now we can do a normal pagination to return the list of 3 items starting
      // at the article prior to the one found.
      return select * from article where categories like '%local%' order by content limit 3 offset (articleIndex - 1);
    }
    articleIndex++;
  }
  page++;
} while (resultSet.length > 0);
This is horrendously slow if the given article is way down the paginated list. How can this be done without the ugly while+for loops?
Edit 2: I can get the result using two sql calls
SELECT 'CurrentArticle' AS type, *
FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY content ASC) AS RowNum, *
  FROM article
  WHERE categories LIKE '%local%'
  ORDER BY content ASC
) AS tagCloudArticles
WHERE id = '9205AE5A'
ORDER BY content ASC
LIMIT 1 OFFSET 0
From that result returned e.g.
('CurrentArticle', 4, '9205AE5A', 'local', 'd')
I can get the RowNum value (4) and then run the sql again to get RowNum+1 (5) and RowNum-1 (3)
SELECT 'PrevNextArticle' AS type, *
FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY content ASC) AS RowNum, *
  FROM article
  WHERE categories LIKE '%local%'
  ORDER BY content ASC
) AS tagCloudArticles
WHERE RowNum IN (3, 5)
ORDER BY content ASC
LIMIT 2 OFFSET 0
with result
('PrevNextArticle', 3, '563313df', 'local', 'c'),
('PrevNextArticle', 5, '5ab669a5', 'local', 'f')
It would be nice to do this in one efficient sql call though.
If the only information about the surrounding articles shown on the page is "Next" and "Previous", there is no need to fetch their rows in advance. When the user chooses "Previous" or "Next", use these queries:
-- Previous
select *
from article
where categories = 'local' and id < 3
order by id desc
limit 1
;
-- Next
select *
from article
where categories = 'local' and id > 3
order by id
limit 1
;
If it is necessary to get information about the previous and next articles:
with ordered as (
  select id, content,
         row_number() over (order by content) as rn
  from article
  where categories = 'local'
), rn as (
  select rn
  from ordered
  where id = '9205AE5A'
)
select o.id, o.content, o.rn - rn.rn as rn
from ordered o cross join rn
where o.rn between rn.rn - 1 and rn.rn + 1
order by o.rn
The articles will have rn values -1, 0, and 1, if they exist.
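This windowed approach can be verified directly: number the filtered rows once, locate the anchor row's position, then keep positions within one of it, all in a single pass with no OFFSET scan. A small sketch in Python's sqlite3 (window functions require SQLite >= 3.25) using the question's sample rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE article (id TEXT, categories TEXT, content TEXT);
INSERT INTO article VALUES
  ('B7F79F47','local','a'), ('6cb80450','local','b'),
  ('563313df','local','c'), ('9205AE5A','local','d'),
  ('E88F7520','national','e'), ('5ab669a5','local','f'),
  ('fb047cf6','local','g'), ('591c6b50','national','h');
""")

# Number the 'local' rows by content, find the anchor article's row
# number, and return the rows whose number is within +/- 1 of it.
rows = con.execute("""
WITH ordered AS (
  SELECT id, content,
         ROW_NUMBER() OVER (ORDER BY content) AS rn
  FROM article
  WHERE categories = 'local'
), anchor AS (
  SELECT rn FROM ordered WHERE id = '9205AE5A'
)
SELECT o.id, o.content
FROM ordered o CROSS JOIN anchor a
WHERE o.rn BETWEEN a.rn - 1 AND a.rn + 1
ORDER BY o.rn
""").fetchall()
print(rows)
```

With the sample data this yields the previous, current, and next articles (c, d, f), matching the result set the question asks for.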
Check whether the following query solves your issue; the id is passed in the filter along with the category:
SELECT * FROM
(
  SELECT (1 + ROW_NUMBER() OVER (ORDER BY id ASC)) AS RowNo, *
  FROM article
  WHERE categories LIKE '%local%' AND id >= 3
  UNION
  (SELECT 1, * FROM article WHERE categories LIKE '%local%' AND id < 3 ORDER BY id DESC LIMIT 1)
) AS TEMP
WHERE RowNo BETWEEN 1 AND (1 + 10 - 1)
ORDER BY RowNo
I think this query will yield the result you want:
(SELECT *, 2 AS ordering from article where categories like '%local%' AND id = 3 LIMIT 1)
UNION
(SELECT *, 1 AS ordering from article where categories like '%local%' AND id < 3 ORDER BY id DESC LIMIT 1 )
UNION
(SELECT *, 3 AS ordering from article where categories like '%local%' AND id > 3 ORDER BY id ASC LIMIT 1 )

SQL query to iterate on a table to calculate the time difference between the 1st and 2nd record that have the same value in one of their fields?

I have a postgresql table storing posts from an online forum. Each post belongs to a thread. I want to calculate the time it takes for the post that starts a thread to get its first response (sometimes a thread never gets a response, so that has to be taken into consideration).
The posts table has these fields:
post_id, post_timestamp, thread_id
There can be one or more posts per thread_id. This query, for example, returns the first and second post of a thread with id 1234:
select * from posts where thread_id = 1234 order by post_timestamp limit 2
I want to calculate the time difference between first and second post and store it in a separate table with these fields:
thread_id, seconds_between_1s_and_2nd
SELECT (
         SELECT post_timestamp
         FROM posts
         WHERE thread_id = t.id
         ORDER BY post_timestamp
         LIMIT 1 OFFSET 1
       ) -
       (
         SELECT post_timestamp
         FROM posts
         WHERE thread_id = t.id
         ORDER BY post_timestamp
         LIMIT 1 OFFSET 0
       )
FROM threads t
, or, in PostgreSQL 8.4+:
SELECT (
         SELECT post_timestamp - LAG(post_timestamp) OVER (ORDER BY post_timestamp)
         FROM posts
         WHERE thread_id = t.id
         ORDER BY post_timestamp
         LIMIT 1 OFFSET 1
       )
FROM threads t
To express this in seconds, use EXTRACT(epoch FROM AGE(time1, time2))
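To get one seconds value per thread, with NULL when a thread never got a reply, the same idea can be sketched with ranked rows. A runnable sanity check in Python's sqlite3 with invented timestamps and thread ids; SQLite has no interval type, so julianday() arithmetic stands in for Postgres' EXTRACT(epoch FROM ...):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE posts (post_id INTEGER, post_timestamp TEXT, thread_id INTEGER);
INSERT INTO posts VALUES
  (1, '2023-01-01 10:00:00', 100),
  (2, '2023-01-01 10:05:30', 100),
  (3, '2023-01-01 11:00:00', 100),
  (4, '2023-01-02 09:00:00', 200);  -- thread 200 never got a reply
""")

# Rank each thread's posts, then subtract the 1st timestamp from the 2nd.
# julianday() gives fractional days, so * 86400 converts to seconds;
# the LEFT JOIN keeps reply-less threads with a NULL difference.
rows = con.execute("""
WITH ranked AS (
  SELECT thread_id, post_timestamp,
         ROW_NUMBER() OVER (PARTITION BY thread_id ORDER BY post_timestamp) AS rn
  FROM posts
)
SELECT f.thread_id,
       CAST(ROUND((julianday(s.post_timestamp) - julianday(f.post_timestamp)) * 86400)
            AS INTEGER) AS seconds_between_1st_and_2nd
FROM ranked f
LEFT JOIN ranked s ON s.thread_id = f.thread_id AND s.rn = 2
WHERE f.rn = 1
ORDER BY f.thread_id
""").fetchall()
print(rows)
```

The resulting rows could be inserted straight into the thread_id / seconds_between_1st_and_2nd table the question describes.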

Retrieve 2 last posts for each category

Let's say I have 2 tables: blog_posts and categories. Each blog post belongs to only ONE category, so there is basically a foreign key between the 2 tables here.
I would like to retrieve the 2 last posts from each category. Is it possible to achieve this in a single request?
GROUP BY would group everything and leave me with only one row in each category. But I want 2 of them.
It would be easy to perform 1 + N query (N = number of category). First retrieve the categories. And then retrieve 2 posts from each category.
I believe it would also be quite easy to perform M queries (M = number of posts I want from each category). First query selects the first post for each category (with a group by). Second query retrieves the second post for each category. etc.
I'm just wondering if someone has a better solution for this. I don't really mind doing 1+N queries for that, but for curiosity and general SQL knowledge, it would be appreciated!
Thanks in advance to whom can help me with this.
Check out this MySQL article on how to work with the top N things in arbitrarily complex groupings; it's good stuff. You can try this:
SET @counter = 0;
SET @category = '';

SELECT *
FROM (
  SELECT
    @counter := IF(posts.category = @category, @counter + 1, 0) AS counter,
    @category := posts.category,
    posts.*
  FROM (
    SELECT *
    FROM test
    ORDER BY category, date DESC
  ) posts
) posts
HAVING counter < 2
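The user-variable counter above predates window functions; on MySQL 8+, Postgres, or SQLite, "the latest 2 posts per category" is a ROW_NUMBER over a partition. A sketch in Python's sqlite3 with made-up blog_posts rows (the column names are assumptions, not the asker's schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE blog_posts (id INTEGER, category TEXT, date TEXT);
INSERT INTO blog_posts VALUES
  (1, 'sql',  '2023-01-01'), (2, 'sql',  '2023-01-05'),
  (3, 'sql',  '2023-01-09'), (4, 'perl', '2023-01-02'),
  (5, 'perl', '2023-01-03');
""")

# Number posts within each category, newest first, and keep the top 2.
rows = con.execute("""
SELECT id, category, date
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY date DESC) AS rn
  FROM blog_posts
)
WHERE rn <= 2
ORDER BY category, date DESC
""").fetchall()
print(rows)
```

Unlike the GROUP BY approaches the asker considered, this keeps N full rows per group in one query, with no session variables and no 1+N round trips.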
SELECT p.*
FROM (
  SELECT id,
         COALESCE(
           (
             SELECT datetime
             FROM posts pi
             WHERE pi.category = c.id
             ORDER BY pi.category DESC, pi.datetime DESC, pi.id DESC
             LIMIT 1, 1
           ), '1900-01-01') AS post_datetime,
         COALESCE(
           (
             SELECT id
             FROM posts pi
             WHERE pi.category = c.id
             ORDER BY pi.category DESC, pi.datetime DESC, pi.id DESC
             LIMIT 1, 1
           ), 0) AS post_id
  FROM category c
) q
JOIN posts p
  ON p.category <= q.id
 AND p.category >= q.id
 AND p.datetime >= q.post_datetime
 AND (p.datetime, p.id) >= (q.post_datetime, q.post_id)
Make an index on posts (category, datetime, id) for this to be fast.
Note the p.category <= q.id AND p.category >= q.id hack: this makes MySQL use "Range checked for each record", which is more index efficient.
See this article in my blog for a similar problem:
MySQL: emulating ROW_NUMBER with multiple ORDER BY conditions