EDIT:
As requested, our table schema is:
posts:
  postid (primary key)
  post_text
comments:
  commentid (primary key)
  postid (foreign key referencing posts.postid)
  comment_text
replies:
  replyid (primary key)
  commentid (foreign key referencing comments.commentid)
  reply_text
I have the tables posts, comments, and replies in a SQL database. (Obviously, a post can have comments, and a comment can have replies.)
I want to return a post based on its id, postid.
So I would like a database function with the following input and output:
input:
  postid
output:
  post = {
    postid,
    post_text,
    comments: [comment, ...]
  }
where comments and replies are nested in the post:
  comment = {
    commentid,
    comment_text,
    replies: [reply, ...]
  }
  reply = {
    replyid,
    reply_text
  }
I have tried using joins, but the returned data is highly redundant, and it seems stupid. For instance, fetching the data from two different replies gives:
postid | post_text | commentid | comment_text | replyid | reply_text
1      | POST_TEXT | 78        | COMMENT_TEXT | 14      | REPLY1_TEXT
1      | POST_TEXT | 78        | COMMENT_TEXT | 15      | REPLY2_TEXT
It seems that instead I should make 3 separate queries in sequence (first to the posts table, then to comments, then to replies).
How do I do this?
The “highly redundant” join result is normally the best way, because it is the natural thing in a relational database. Relational databases aim at avoiding redundancy in data storage, but not in query output. Avoiding that redundancy comes at an extra cost: you have to aggregate the data on the server side, and the client probably has to unpack the nested JSON data again.
Here is some sample code that demonstrates how you could aggregate the results:
SELECT postid, post_text,
       jsonb_agg(
          jsonb_build_object(
             'commentid', commentid,
             'comment_text', comment_text,
             'replies', replies
          )
       ) AS comments
FROM (SELECT postid, post_text, commentid, comment_text,
             jsonb_agg(
                jsonb_build_object(
                   'replyid', replyid,
                   'reply_text', reply_text
                )
             ) AS replies
      FROM /* your join */
      GROUP BY postid, post_text, commentid, comment_text) AS q
GROUP BY postid, post_text;
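If you keep the redundant join result instead, folding it into the nested shape on the client is a short loop. The sketch below is a minimal illustration only: it assumes an in-memory SQLite database as a stand-in for the real DBMS and the question's schema (replies reference commentid); the function name fetch_post and the sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for the real database here (assumption); schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts    (postid INTEGER PRIMARY KEY, post_text TEXT);
CREATE TABLE comments (commentid INTEGER PRIMARY KEY,
                       postid INTEGER REFERENCES posts(postid), comment_text TEXT);
CREATE TABLE replies  (replyid INTEGER PRIMARY KEY,
                       commentid INTEGER REFERENCES comments(commentid), reply_text TEXT);
INSERT INTO posts    VALUES (1, 'POST_TEXT');
INSERT INTO comments VALUES (78, 1, 'COMMENT_TEXT'), (79, 1, 'NO_REPLIES');
INSERT INTO replies  VALUES (14, 78, 'REPLY1_TEXT'), (15, 78, 'REPLY2_TEXT');
""")

def fetch_post(conn, postid):
    """Run one (redundant) LEFT JOIN and fold the flat rows into nested dicts."""
    rows = conn.execute(
        """
        SELECT p.postid, p.post_text, c.commentid, c.comment_text, r.replyid, r.reply_text
        FROM posts p
        LEFT JOIN comments c ON c.postid = p.postid
        LEFT JOIN replies r ON r.commentid = c.commentid
        WHERE p.postid = ?
        ORDER BY c.commentid, r.replyid
        """,
        (postid,),
    ).fetchall()
    if not rows:
        return None  # no such post
    post = {"postid": rows[0][0], "post_text": rows[0][1], "comments": []}
    by_id = {}  # commentid -> comment dict, so each comment is created only once
    for _, _, commentid, comment_text, replyid, reply_text in rows:
        if commentid is None:
            continue  # post without comments: the LEFT JOIN produced NULLs
        comment = by_id.get(commentid)
        if comment is None:
            comment = {"commentid": commentid, "comment_text": comment_text, "replies": []}
            by_id[commentid] = comment
            post["comments"].append(comment)
        if replyid is not None:  # comment without replies: skip the NULL reply
            comment["replies"].append({"replyid": replyid, "reply_text": reply_text})
    return post

print(fetch_post(conn, 1))
```

The repeated post and comment columns are simply ignored after their first appearance, which is the client-side "unpacking" cost the answer mentions.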
The redundant data stems from a cross join of a post's comments and replies, i.e. for each post you join each comment with each reply. Comment 78 relates to neither reply 14 nor reply 15, but merely to the same post.
The typical approach to select the data would hence be three queries:
select * from posts;
select * from comments;
select * from replies;
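A minimal sketch of the three-query approach applied to the question's schema (where replies reference commentid rather than postid), written in Python with SQLite as an assumed stand-in: one query per table, each filtered to a single post, then stitched together with a dictionary keyed by commentid. The helper name fetch_post_three_queries and the sample data are hypothetical.

```python
import sqlite3
from collections import defaultdict

# SQLite stands in for the real database (assumption); schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts    (postid INTEGER PRIMARY KEY, post_text TEXT);
CREATE TABLE comments (commentid INTEGER PRIMARY KEY,
                       postid INTEGER REFERENCES posts(postid), comment_text TEXT);
CREATE TABLE replies  (replyid INTEGER PRIMARY KEY,
                       commentid INTEGER REFERENCES comments(commentid), reply_text TEXT);
INSERT INTO posts    VALUES (1, 'POST_TEXT');
INSERT INTO comments VALUES (78, 1, 'COMMENT_TEXT'), (79, 1, 'NO_REPLIES');
INSERT INTO replies  VALUES (14, 78, 'REPLY1_TEXT'), (15, 78, 'REPLY2_TEXT');
""")

def fetch_post_three_queries(conn, postid):
    # Query 1: the post itself
    post_row = conn.execute(
        "SELECT postid, post_text FROM posts WHERE postid = ?", (postid,)).fetchone()
    if post_row is None:
        return None
    # Query 2: all comments of that post
    comments = conn.execute(
        "SELECT commentid, comment_text FROM comments WHERE postid = ? ORDER BY commentid",
        (postid,)).fetchall()
    # Query 3: all replies to any comment of that post, in one round trip
    replies = conn.execute(
        """SELECT replyid, commentid, reply_text FROM replies
           WHERE commentid IN (SELECT commentid FROM comments WHERE postid = ?)
           ORDER BY replyid""",
        (postid,)).fetchall()
    replies_by_comment = defaultdict(list)  # commentid -> list of reply dicts
    for replyid, commentid, reply_text in replies:
        replies_by_comment[commentid].append({"replyid": replyid, "reply_text": reply_text})
    return {
        "postid": post_row[0],
        "post_text": post_row[1],
        "comments": [
            {"commentid": cid, "comment_text": text, "replies": replies_by_comment[cid]}
            for cid, text in comments
        ],
    }

print(fetch_post_three_queries(conn, 1))
```

No row is transferred twice, at the price of three round trips instead of one.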
You can also reduce this to two queries and join the posts table to the comments query, the replies query, or both. This, again, will lead to selecting redundant data, but may ease data handling in your app.
If you want to avoid joins, but must avoid database round trips, you can glue query results together:
select *
from
(
select postid as id, postid, 'post' as type, post_text as text from posts
union all
select commentid as id, postid, 'comment' as type, comment_text as text from comments
union all
select replyid as id, postid, 'reply' as type, reply_text as text from replies
) glued
order by postid, type, id;
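A small sketch of running the glued query and splitting its typed rows back apart in the app, using Python with SQLite as an assumed stand-in. Like the query above, it assumes replies carry their own postid column (the question's schema links replies via commentid instead); the sample rows and the split_glued helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption (as in the glued query): replies have their own postid column.
conn.executescript("""
CREATE TABLE posts    (postid INTEGER PRIMARY KEY, post_text TEXT);
CREATE TABLE comments (commentid INTEGER PRIMARY KEY, postid INTEGER, comment_text TEXT);
CREATE TABLE replies  (replyid INTEGER PRIMARY KEY, postid INTEGER, reply_text TEXT);
INSERT INTO posts    VALUES (1, 'POST_TEXT');
INSERT INTO comments VALUES (78, 1, 'COMMENT_TEXT');
INSERT INTO replies  VALUES (14, 1, 'REPLY1_TEXT'), (15, 1, 'REPLY2_TEXT');
""")

GLUED = """
SELECT * FROM (
  SELECT postid AS id, postid, 'post' AS type, post_text AS text FROM posts
  UNION ALL
  SELECT commentid, postid, 'comment', comment_text FROM comments
  UNION ALL
  SELECT replyid, postid, 'reply', reply_text FROM replies
) AS glued
ORDER BY postid, type, id
"""

def split_glued(rows):
    """Sort the typed rows of the single result set back into one list per type."""
    parts = {"post": [], "comment": [], "reply": []}
    for id_, postid, type_, text in rows:
        parts[type_].append((id_, postid, text))
    return parts

rows = conn.execute(GLUED).fetchall()  # one round trip for all three tables
parts = split_glued(rows)
print(rows)
print(parts)
```

Note that ordering by the type string is alphabetical, so within a post the 'comment' rows come before the 'post' row.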
Finally, you can create JSON in your DBMS. Again, don't cross join comments with replies; instead, join the aggregated comments and the aggregated replies to the post.
select p.postid, p.post_text, c.comments, r.replies
from posts p
left join
(
  select
    postid,
    -- jsonb_agg collects the objects into an array (jsonb_object_agg expects key/value pairs)
    jsonb_agg(jsonb_build_object('commentid', commentid,
                                 'comment_text', comment_text)
    ) as comments
  from comments
  group by postid
) c on c.postid = p.postid
left join
(
  select
    postid,
    jsonb_agg(jsonb_build_object('replyid', replyid,
                                 'reply_text', reply_text)
    ) as replies
  from replies
  group by postid
) r on r.postid = p.postid;
Your idea to store things in JSON is a good one if you have something to parse it down the line.
As an alternative to the previous answers that involve JSON, you can also get a normal SQL result set (table definition and sample data are below the query):
WITH MyFilter(postid) AS (
   VALUES (1),(2) /* rest of your filter */
)
SELECT 'Post' AS PublicationType, postid, NULL AS CommentID, NULL AS ReplyToID, post_text
FROM Posts
WHERE postID IN (SELECT postid FROM MyFilter)
UNION ALL
/* test with IS NULL: a simple CASE compares with =, which never matches NULL */
SELECT CASE WHEN ReplyToID IS NULL THEN 'Comment' ELSE 'Reply' END, postid, commentid, replyToID, comment_text
FROM Comments
WHERE postid IN (SELECT postid FROM MyFilter)
ORDER BY postid, CommentID NULLS FIRST, ReplyToID NULLS FIRST;
Note: the PublicationType column was added for the sake of clarity. You can alternatively inspect CommentID and ReplyToId and see what is null to determine the type of publication.
This should leave you with very little, if any, redundant data to transfer back to the SQL client.
This approach with UNION ALL will work with 3 tables too (you only have to add 1 UNION ALL) but in your case, I would rather go with a 2-table schema:
CREATE TABLE posts (
postid SERIAL primary key,
post_text text NOT NULL
);
CREATE TABLE comments (
commentid SERIAL primary key,
ReplyToID INTEGER NULL REFERENCES Comments(CommentID) /* ON DELETE CASCADE? */,
postid INTEGER NOT NULL references posts(postid) /* ON DELETE CASCADE? */,
comment_text Text NOT NULL
);
INSERT INTO posts(post_text) VALUES ('Post 1'),('Post 2'),('Post 3');
INSERT INTO Comments(postid, comment_text) VALUES (1, 'Comment 1.1'), (1, 'Comment 1.2'), (2, 'Comment 2.1');
INSERT INTO Comments(replytoId, postid, comment_text) VALUES (1, 1, 'Reply to comment 1.1'), (3, 2, 'Reply to comment 2.1');
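To check the two-table approach, here is a small sketch that loads the sample data above into an in-memory SQLite database from Python and runs the UNION ALL query. Assumptions: SQLite stands in for PostgreSQL, so SERIAL becomes INTEGER PRIMARY KEY and NULLS FIRST is dropped (SQLite already sorts NULLs first in ascending order). The comment type is tested with IS NULL, because a simple `CASE ReplyToID WHEN NULL` compares with =, which is never true for NULL, so every row would be labeled 'Reply'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the PostgreSQL schema above (SERIAL -> INTEGER PRIMARY KEY)
conn.executescript("""
CREATE TABLE posts (postid INTEGER PRIMARY KEY, post_text TEXT NOT NULL);
CREATE TABLE comments (
  commentid INTEGER PRIMARY KEY,
  replytoid INTEGER REFERENCES comments(commentid),
  postid INTEGER NOT NULL REFERENCES posts(postid),
  comment_text TEXT NOT NULL
);
INSERT INTO posts (post_text) VALUES ('Post 1'), ('Post 2'), ('Post 3');
INSERT INTO comments (postid, comment_text)
  VALUES (1, 'Comment 1.1'), (1, 'Comment 1.2'), (2, 'Comment 2.1');
INSERT INTO comments (replytoid, postid, comment_text)
  VALUES (1, 1, 'Reply to comment 1.1'), (3, 2, 'Reply to comment 2.1');
""")

QUERY = """
WITH myfilter(postid) AS (VALUES (1), (2))
SELECT 'Post' AS publicationtype, postid, NULL AS commentid, NULL AS replytoid, post_text
FROM posts WHERE postid IN (SELECT postid FROM myfilter)
UNION ALL
SELECT CASE WHEN replytoid IS NULL THEN 'Comment' ELSE 'Reply' END,
       postid, commentid, replytoid, comment_text
FROM comments WHERE postid IN (SELECT postid FROM myfilter)
ORDER BY postid, commentid, replytoid  -- SQLite puts NULLs first in ascending order
"""

for row in conn.execute(QUERY):
    print(row)
```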
This makes 1 fewer table and allows level-2 replies (replies to replies), or deeper, rather than just replies to comments. A recursive query (there are plenty of examples of that on SO) can always link a reply back to the original comment if you want.
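Such a recursive query could look like the following sketch (Python with SQLite as an assumed stand-in, same two-table comments schema; the extra level-2 reply row is hypothetical sample data). It starts from each top-level comment and walks down through replytoid, tagging every row with the id of its root comment and its depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
  commentid INTEGER PRIMARY KEY,
  replytoid INTEGER REFERENCES comments(commentid),
  postid INTEGER NOT NULL,
  comment_text TEXT NOT NULL
);
INSERT INTO comments (postid, comment_text)
  VALUES (1, 'Comment 1.1'), (1, 'Comment 1.2'), (2, 'Comment 2.1');
INSERT INTO comments (replytoid, postid, comment_text)
  VALUES (1, 1, 'Reply to comment 1.1'), (3, 2, 'Reply to comment 2.1');
-- hypothetical level-2 reply: a reply to the reply with commentid 4
INSERT INTO comments (replytoid, postid, comment_text) VALUES (4, 1, 'Reply to reply');
""")

THREAD = """
WITH RECURSIVE thread(commentid, rootid, depth) AS (
  -- anchor: top-level comments are their own root
  SELECT commentid, commentid, 0 FROM comments WHERE replytoid IS NULL
  UNION ALL
  -- step: a reply inherits the root of the comment it replies to
  SELECT c.commentid, t.rootid, t.depth + 1
  FROM comments c JOIN thread t ON c.replytoid = t.commentid
)
SELECT commentid, rootid, depth FROM thread ORDER BY commentid
"""

rows = conn.execute(THREAD).fetchall()
for row in rows:
    print(row)
```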
Edit: I noticed your comment just a bit late. Of course, no matter what solution you take, there is no need to execute a request to get the replies to each and every comment.
Even with 3 tables, even without JSON, the query to get all the replies for all the comments at once is:
SELECT *
FROM replies
WHERE commentid IN (
SELECT commentid
FROM comments
WHERE postid IN (
/* List your post ids here or nest another SELECT postid FROM posts WHERE ... */
)
)
I have a table that contains all of the posts. I also have a table where a row is added when a user likes a post with foreign keys user_id and post_id.
I want to retrieve a list of ALL of the posts and whether or not a specific user has liked each post. Using an outer join, I end up getting some posts twice: once for user 1 and once for user 2. If I use a WHERE to filter for likes.user_id = 1 OR likes.user_id IS NULL, I don't get the posts that are only liked by other users.
Ideally I would do this with a single query. SQL isn't my strength, so I'm not even really sure if a sub query is needed or if a join is sufficient.
Apologies for being this vague but I think this is a common enough query that it should make some sense.
EDIT: I have created a DB Fiddle with the two queries that I mentioned. https://www.db-fiddle.com/f/oFM2zWsR9WFKTPJA16U1Tz/4
UPDATE: Figured it out last night. This is what I ended up with:
SELECT
posts.id AS post_id,
posts.title AS post_title,
CASE
WHEN EXISTS (
SELECT *
FROM likes
WHERE posts.id = likes.post_id
AND likes.user_id = 1
) THEN TRUE
ELSE FALSE END
AS liked
FROM posts;
Although I was able to resolve it, thanks to @wildplasser for his answer as well.
Data (I needed to change it a bit, because one should not assign to serials):
CREATE TABLE posts (
id serial,
title varchar
);
CREATE TABLE users (
id serial,
name varchar
);
CREATE TABLE likes (
id serial,
user_id int,
post_id int
);
INSERT INTO posts (title) VALUES ('First Post');
INSERT INTO posts (title) VALUES ('Second Post');
INSERT INTO posts (title) VALUES ('Third Post');
INSERT INTO users (name) VALUES ('Obama');
INSERT INTO users (name) VALUES ('Trump');
INSERT INTO likes (user_id, post_id) VALUES (1, 1);
INSERT INTO likes (user_id, post_id) VALUES (2, 1);
INSERT INTO likes (user_id, post_id) VALUES (2, 2);
-- I want to retrieve a list of ALL of the posts and whether or not a specific user has liked that post
SELECT id, title
, EXISTS(
--EXISTS() yields a boolean value
SELECT *
FROM likes lk
JOIN users u ON u.id = lk.user_id AND lk.post_id=p.id
WHERE u.name ='Obama'
) AS liked_by_Obama
FROM posts p
;
Results:
id | title | liked_by_obama
----+-------------+----------------
1 | First Post | t
2 | Second Post | f
3 | Third Post | f
(3 rows)
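The same result can be reproduced with the sketch below (Python with SQLite as an assumed stand-in; the UNIQUE constraint on likes is an added assumption that a user likes a post at most once). It runs the EXISTS form, which needs no CASE wrapper, next to a LEFT JOIN variant, a swapped-in alternative not taken from the answers, that addresses the duplicate-row problem from the question by moving the user filter from WHERE into the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- assumed: a user can like a given post at most once
CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INT, post_id INT, UNIQUE (user_id, post_id));
INSERT INTO posts (title) VALUES ('First Post'), ('Second Post'), ('Third Post');
INSERT INTO users (name) VALUES ('Obama'), ('Trump');
INSERT INTO likes (user_id, post_id) VALUES (1, 1), (2, 1), (2, 2);
""")

# EXISTS (...) already yields a boolean (0/1 in SQLite), so no CASE is needed
BY_EXISTS = """
SELECT p.id, p.title,
       EXISTS (SELECT * FROM likes lk WHERE lk.post_id = p.id AND lk.user_id = ?) AS liked
FROM posts p
ORDER BY p.id
"""

# LEFT JOIN variant: the user filter sits in ON, not WHERE, so unliked posts survive
BY_LEFT_JOIN = """
SELECT p.id, p.title, lk.user_id IS NOT NULL AS liked
FROM posts p
LEFT JOIN likes lk ON lk.post_id = p.id AND lk.user_id = ?
ORDER BY p.id
"""

for sql in (BY_EXISTS, BY_LEFT_JOIN):
    print(conn.execute(sql, (1,)).fetchall())
```

The LEFT JOIN form returns one row per post only because of the assumed uniqueness of (user_id, post_id); EXISTS does not depend on it.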
As far as I understand, you have two tables: a post table, which includes all posts from different users, and a like table with the user id and post id. If you want to retrieve only the posts, then:
select * from posts
If you need user information as well, which is in the user table, then you can do the following:
select "user".user_name, post.postdata
from "user"  -- "user" is a reserved word in PostgreSQL, so it must be quoted
join post on post.userid = "user".userid
In the above query, user_name is a column in the user table and postdata is a column in the post table.
I have a comments table and need to get the first comment date (the first inserted record) of each user in the table.
The output should be:
user_id | first_comment_date
You can use MIN(); try:
SELECT user_id, MIN(comment_date) AS first_comment_date
FROM commentstable
GROUP BY user_id;
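A quick sketch of this aggregate in Python with SQLite (assumed stand-in; the id column and sample rows are hypothetical). It assumes comment_date is stored as an ISO-8601 string, so text comparison matches chronological order, and that the earliest date is the first inserted record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commentstable (id INTEGER PRIMARY KEY, user_id INT, comment_date TEXT);
INSERT INTO commentstable (user_id, comment_date) VALUES
  (1, '2021-03-05 10:00:00'),
  (1, '2021-01-10 08:30:00'),
  (2, '2021-02-01 12:00:00');
""")

# MIN() per group picks each user's earliest date; ORDER BY only makes output stable
rows = conn.execute("""
SELECT user_id, MIN(comment_date) AS first_comment_date
FROM commentstable
GROUP BY user_id
ORDER BY user_id
""").fetchall()
print(rows)
```

If you need the whole first row rather than just its date, MIN() alone is not enough; you would have to join the result back to the table.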
What is the easiest and fastest way to write a clause where all elements in an array must be matched, not just one as with IN? It should behave like MongoDB's $all.
Thinking about group conversations where conversation_users is a join table between conversation_id and user_id I have something like this in mind:
WHERE (conversations_users.user_id ALL IN (1,2))
UPDATE 16.07.12
Adding more info about schema and case:
The join-table is rather simple:
Table "public.conversations_users"
Column | Type | Modifiers | Storage | Description
-----------------+---------+-----------+---------+-------------
conversation_id | integer | | plain |
user_id | integer | | plain |
A conversation has many users and a user belongs to many conversations. In order to find all users in a conversation I am using this join table.
In the end, I am trying to write a Ruby on Rails scope that finds a conversation based on its participants, e.g.:
scope :between, ->(*users) {
joins(:users).where('conversations_users.user_id all in (?)', users.map(&:id))
}
UPDATE 23.07.12
My question is about finding an exact match of people. Therefore:
Conversation between (1,2,3) won't match if querying for (1,2)
Assuming the join table follows good practice and has a unique compound key defined, i.e. a constraint to prevent duplicate rows, then something like the following simple query should do.
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
It's important to note that the number 2 at the end is the length of the list of user_ids, so it must change whenever the list changes length. If you can't assume your join table is free of duplicates, change "count(*)" to "count(distinct user_id)", at some possible cost in performance.
This query finds all conversations that include all the specified users even if the conversation also includes additional users.
If you want only conversations with exactly the specified set of users, one approach is to use a nested subquery in the where clause as below. Note, first and last lines are the same as the original query, only the middle two lines are new.
select conversation_id from conversations_users where user_id in (1, 2)
and conversation_id not in
(select conversation_id from conversations_users where user_id not in (1,2))
group by conversation_id having count(*) = 2
Equivalently, you can use a set difference operator if your database supports it. Here is an example in Oracle syntax. (For Postgres or DB2, change the keyword "minus" to "except".)
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
minus
select conversation_id from conversations_users where user_id not in (1,2)
A good query optimizer should treat the last two variations identically, but check with your particular database to be sure. For example, the Oracle 11GR2 query plan sorts the two sets of conversation ids before applying the minus operator, but skips the sort step for the last query. So either query plan could be faster depending on multiple factors such as the number of rows, cores, cache, indices etc.
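Both variants can be tried with the sketch below, written in Python against SQLite (an assumed stand-in). The sample rows follow the table shown in a later answer, plus a hypothetical conversation 4 with a third user to show the difference between "at least these users" and "exactly these users". The composite primary key enforces the no-duplicates assumption, and the HAVING count is generated from the list length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations_users (
  conversation_id INTEGER, user_id INTEGER,
  PRIMARY KEY (conversation_id, user_id)  -- no duplicate rows, so COUNT(*) is safe
);
INSERT INTO conversations_users VALUES
  (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2),
  (4, 1), (4, 2), (4, 3);  -- hypothetical extra conversation with a third user
""")

def conversations_with(conn, user_ids, exact=False):
    """Conversations containing all of user_ids; with exact=True, nobody else either."""
    ids = sorted(set(user_ids))
    marks = ", ".join("?" for _ in ids)  # only placeholders are formatted into the SQL
    sql = f"SELECT conversation_id FROM conversations_users WHERE user_id IN ({marks})"
    params = list(ids)
    if exact:
        # drop conversations that have any participant outside the list
        sql += (" AND conversation_id NOT IN (SELECT conversation_id"
                f" FROM conversations_users WHERE user_id NOT IN ({marks}))")
        params += ids
    sql += " GROUP BY conversation_id HAVING COUNT(*) = ? ORDER BY conversation_id"
    params.append(len(ids))  # the HAVING count must equal the list length
    return [row[0] for row in conn.execute(sql, params)]

print(conversations_with(conn, [1, 2]))
print(conversations_with(conn, [1, 2], exact=True))
```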
I'm collapsing those users into an array. I'm also using a CTE (the thing in the WITH clause) to make this more readable.
=> select * from conversations_users ;
conversation_id | user_id
-----------------+---------
1 | 1
1 | 2
2 | 1
2 | 3
3 | 1
3 | 2
(6 rows)
=> WITH users_on_conversation AS (
SELECT conversation_id, array_agg(user_id) as users
FROM conversations_users
WHERE user_id in (1, 2) --filter here for performance
GROUP BY conversation_id
)
SELECT * FROM users_on_conversation
WHERE users @> array[1, 2];
conversation_id | users
-----------------+-------
1 | {1,2}
3 | {1,2}
(2 rows)
EDIT (Some resources)
array functions: http://www.postgresql.org/docs/9.1/static/functions-array.html
CTEs: http://www.postgresql.org/docs/9.1/static/queries-with.html
This preserves ActiveRecord objects.
In the below example, I want to know the time sheets which are associated with all codes in the array.
codes = [8,9]
Timesheet.joins(:codes).select('count(*) as count, timesheets.*').
where('codes.id': codes).
group('timesheets.id').
having('count(*) = ?', codes.length)
You should have the full ActiveRecord objects to work with. If you want it to be a true scope, you can just use your above example and pass in the results with .pluck(:id).
While @Alex's answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:
CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
RETURNS SETOF conversations AS
$BODY$
DECLARE
_sql text := '
SELECT c.*
FROM conversations c';
i int;
BEGIN
FOREACH i IN ARRAY _user_arr LOOP
_sql := _sql || '
JOIN conversations_users x' || i || ' USING (conversation_id)';
END LOOP;
_sql := _sql || '
WHERE TRUE';
FOREACH i IN ARRAY _user_arr LOOP
_sql := _sql || '
AND x' || i || '.user_id = ' || i;
END LOOP;
/* uncomment for conversations with the exact list of users and no more,
   and change the EXECUTE below to: RETURN QUERY EXECUTE _sql USING _user_arr;
_sql := _sql || '
AND NOT EXISTS (
   SELECT 1
   FROM conversations_users u
   WHERE u.conversation_id = c.conversation_id
   AND u.user_id <> ALL ($1)
   )';
*/
-- RAISE NOTICE '%', _sql;
RETURN QUERY EXECUTE _sql;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT * FROM f_conversations_among_users('{1,2}')
The function dynamically builds and executes a query of the form:
SELECT c.*
FROM conversations c
JOIN conversations_users x1 USING (conversation_id)
JOIN conversations_users x2 USING (conversation_id)
...
WHERE TRUE
AND x1.user_id = 1
AND x2.user_id = 2
...
This form performed best in an extensive test of queries for relational division.
You could also build the query in your app, but I went by the assumption that you want to use one array parameter. Also, this is probably fastest anyway.
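Building that query form in the app could look like this sketch in Python against SQLite (both assumptions; the answer itself builds the string in PL/pgSQL). It deliberately differs in three details: join aliases are numbered by position rather than by user id, the user ids are passed as ? parameters instead of being concatenated into the SQL, and ON replaces USING. The sample data and the function name conversations_among are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (conversation_id INTEGER PRIMARY KEY);
CREATE TABLE conversations_users (
  conversation_id INTEGER REFERENCES conversations(conversation_id),
  user_id INTEGER,
  PRIMARY KEY (conversation_id, user_id)
);
CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
INSERT INTO conversations VALUES (1), (2), (3), (4);
INSERT INTO conversations_users VALUES
  (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3);
""")

def conversations_among(conn, user_ids):
    """Relational division by one self-join per requested user (superset match)."""
    ids = list(dict.fromkeys(user_ids))  # drop duplicate ids, keep their order
    sql = "SELECT c.conversation_id FROM conversations c"
    for n in range(len(ids)):
        # one copy of the join table per user; aliases x0, x1, ... by position
        sql += f" JOIN conversations_users x{n} ON x{n}.conversation_id = c.conversation_id"
    sql += " WHERE 1 = 1"
    for n in range(len(ids)):
        sql += f" AND x{n}.user_id = ?"  # the user id itself is a bound parameter
    sql += " ORDER BY c.conversation_id"
    return [row[0] for row in conn.execute(sql, ids)]

print(conversations_among(conn, [1, 2]))
print(conversations_among(conn, [1, 2, 3]))
```

Like the function above, this finds conversations that include at least the listed users.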
Either query requires an index like the following to be fast:
CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
A multi-column primary (or unique) key on (user_id, conversation_id) works just as well, but one on (conversation_id, user_id) (as you may very well have!) would be inferior. You can find a short rationale at the link above, or a comprehensive assessment under this related question on dba.SE.
I also assume you have a primary key on conversations.conversation_id.
Can you run a performance test with EXPLAIN ANALYZE on @Alex's query and this function and report your findings?
Note that both solutions find conversations where at least the users in the array take part - including conversations with additional users.
If you want to exclude those, un-comment the additional clause in my function (or add it to any other query).
Tell me if you need more explanation on the features of the function.
Create a mapping table (mapping_table) containing all the user ids to match, and use this:
select t1.conversation_id
from conversations_users as t1
inner join mapping_table as map on t1.user_id = map.user_id
group by t1.conversation_id
having count(distinct t1.user_id) =
       (select count(distinct user_id) from mapping_table)
select id from conversations where not exists(
select * from conversations_users cu
where cu.conversation_id=conversations.id
and cu.user_id not in(1,2,3)
)
This can easily be made into a Rails scope.
I am guessing that you don't really want to start messing with temporary tables.
Your question was unclear as to whether you want conversations with exactly the set of users, or conversations with a superset. The following is for the superset:
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u left outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null
For this query to work well, it assumes that you have indexes on user_id in both users and conversations_users.
For the exact set . . .
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u full outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null and u.user_id is not null
Based on @Alex Blakemore's answer, the equivalent Rails 4 scope on your Conversation class would be:
# Conversations exactly with users array
scope :by_users, -> (users) {
self.by_any_of_users(users)
.group("conversations.id")
.having("COUNT(*) = ?", users.length) -
joins(:conversations_users)
.where("conversations_users.user_id NOT IN (?)", users)
}
# generates an IN clause
scope :by_any_of_users, -> (users) { joins(:conversations_users).where(conversations_users: { user_id: users }).distinct }
Note that you could optimize this: instead of the Rails - (minus), you could use a .where("... NOT IN ...") condition, but that would be much harder to read.
Based on Alex Blakemore's answer:
select conversation_id
from conversations_users cu
where user_id in (1, 2)
group by conversation_id
having count(distinct user_id) = 2
I have found an alternative query with the same goal, finding the conversation_id of a conversation that contains user_1 and user_2 (ignoring additional users):
select *
from conversations_users cu1
where 2 = (
select count(distinct user_id)
from conversations_users cu2
where user_id in (1, 2) and cu1.conversation_id = cu2.conversation_id
)
It is slower according to the EXPLAIN analysis in Postgres, and I guess that is because more conditions are evaluated: the subquery is correlated, so it gets executed at least once for each row of conversations_users. The positive point of this query is that you aren't grouping, so you can select additional fields of the conversations_users table. In some situations (like mine) that can be handy.
I would like to figure out how to retrieve data from the database more efficiently, without a performance cost.
My plan is as follows:
Select id from the article table;
store the ids in List<int> arr;
find the last article id: int x = arr.Count();
Select * from article_tbl where id = x; run the query;
post it on my page.
Am I planning right? What is better way of retrieving data from database?
Thanks a lot
Try something like this - you can call it "ad-hoc" or wrap it up in a stored procedure:
-- get the "latest" ID from the "Article" table
-- but you need to define *latest* by WHAT criteria?? A date?? The ID itself??
DECLARE @LastID INT
SELECT TOP 1 @LastID = ID
FROM dbo.Article
ORDER BY .......... -- order by date? id? what??
-- get the detail data for that ID from the "Article_tbl"
SELECT (list of columns)
FROM dbo.Article_tbl
WHERE ID = @LastID
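A minimal sketch of the same idea as a single round trip, in Python with SQLite standing in for SQL Server (so LIMIT 1 replaces TOP 1). It assumes "latest" means the highest ID; the columns and rows are hypothetical. It also shows why the plan's arr.Count() is not the last id once ids have gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE article_tbl (id INTEGER PRIMARY KEY, body TEXT);
INSERT INTO article VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO article_tbl VALUES (1, 'body A'), (2, 'body B'), (3, 'body C');
DELETE FROM article WHERE id = 2;  -- leave a gap in the ids
DELETE FROM article_tbl WHERE id = 2;
""")

# The plan's shortcut: the number of ids is not the last id once there are gaps
ids = [row[0] for row in conn.execute("SELECT id FROM article")]
print(len(ids), max(ids))

# One round trip: pick the latest id (assumed: the highest id) and fetch its details
latest = conn.execute("""
SELECT t.id, t.body
FROM article_tbl t
WHERE t.id = (SELECT id FROM article ORDER BY id DESC LIMIT 1)
""").fetchone()
print(latest)
```

If "latest" should mean a date column, only the ORDER BY inside the subquery changes.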