Multiple joined tables and conditions - sql

I have the following table structure:
users
- id int(PK)
- role varchar(20)
questions
- id int (PK)
- status varchar(20)
answers
- id int (PK)
- question_id int (refs questions id)
- user_id int (refs users id)
- created_at timestamp
My goal is to get the questions whose status is 'opened', whose last answer (based on created_at) was made by a user with the admin role, and whose last answer by a non-admin user was made at least 1 week ago.
I came up with a very long query. Most of it works, but it also matches a question when a non-admin answer that is not the last one is at least 1 week old.
I'm also open to simplifying my query to get rid of that bunch of inner joins.
select distinct q.* from questions q
inner join answers a on a.question_id = q.id
inner join answers a2 on a.question_id = q.id
inner join users u ON u.id = a.user_id
WHERE q.status = 'opened' AND u.role = 'admin'
and a.id in (select a3.id from answers a3 inner join questions q2 on q.id = a3.ticket_id inner join users u on u.id = a3.user_id where u."role" = 'admin' and a3.created_at = (select max(a3.created_at) from "answers" a3 where a3.question_id = q2.id))
and a2.id in (select a.id from "answers" a
inner join users u on u.id = a.user_id
where u."role" = 'user'
and a.created_at < (SELECT now() - interval '1 week'))

You are joining the answers table twice without filtering it. You can solve this by using subqueries, something like this:
SELECT questions.*
FROM questions
JOIN (
SELECT question_id, MAX(created_at) AS max_created_at
FROM answers
WHERE user_id IN (SELECT id FROM users WHERE role = 'admin')
GROUP BY question_id
) admin_answers ON questions.id = admin_answers.question_id
JOIN (
SELECT question_id, MAX(created_at) AS max_created_at
FROM answers
WHERE user_id NOT IN (SELECT id FROM users WHERE role = 'admin')
GROUP BY question_id
) nonadmin_answers ON questions.id = nonadmin_answers.question_id
WHERE questions.status = 'opened'
AND admin_answers.max_created_at = (SELECT MAX(created_at) FROM answers WHERE question_id = questions.id)
AND nonadmin_answers.max_created_at <= NOW() - INTERVAL '1 week'
If this does not work, please post some sample data for the tables along with the expected result and I will check it.
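Since no sample data was posted, here is a minimal sketch for trying the subquery approach locally. It assumes an in-memory SQLite database through Python's sqlite3 module, invented sample rows, and a fixed reference time in place of NOW(); SQLite has no INTERVAL syntax, so datetime(:now, '-7 days') stands in for NOW() - INTERVAL '1 week'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, role TEXT);
CREATE TABLE questions (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE answers (
    id INTEGER PRIMARY KEY,
    question_id INTEGER REFERENCES questions(id),
    user_id INTEGER REFERENCES users(id),
    created_at TEXT
);
-- Hypothetical sample data, not from the original question.
INSERT INTO users VALUES (1, 'admin'), (2, 'user');
INSERT INTO questions VALUES (1, 'opened'), (2, 'opened'), (3, 'opened'), (4, 'closed');
INSERT INTO answers (question_id, user_id, created_at) VALUES
    (1, 2, '2024-01-01 10:00:00'),  -- q1: old user answer
    (1, 1, '2024-01-20 10:00:00'),  -- q1: admin answered last, should match
    (2, 2, '2024-01-01 10:00:00'),  -- q2: old user answer (not the last one)
    (2, 2, '2024-01-19 10:00:00'),  -- q2: recent user answer
    (2, 1, '2024-01-20 10:00:00'),  -- q2: admin last, but user answer too recent
    (3, 1, '2024-01-05 10:00:00'),
    (3, 2, '2024-01-06 10:00:00'),  -- q3: user answered last
    (4, 2, '2024-01-01 10:00:00'),
    (4, 1, '2024-01-20 10:00:00');  -- q4: would match, but it is closed
""")

# Fixed reference time instead of NOW() so the result is reproducible.
now = "2024-01-21 00:00:00"

rows = conn.execute("""
SELECT q.id
FROM questions q
JOIN (
    SELECT question_id, MAX(created_at) AS max_created_at
    FROM answers
    WHERE user_id IN (SELECT id FROM users WHERE role = 'admin')
    GROUP BY question_id
) admin_answers ON q.id = admin_answers.question_id
JOIN (
    SELECT question_id, MAX(created_at) AS max_created_at
    FROM answers
    WHERE user_id NOT IN (SELECT id FROM users WHERE role = 'admin')
    GROUP BY question_id
) nonadmin_answers ON q.id = nonadmin_answers.question_id
WHERE q.status = 'opened'
  AND admin_answers.max_created_at =
      (SELECT MAX(created_at) FROM answers WHERE question_id = q.id)
  AND nonadmin_answers.max_created_at <= datetime(:now, '-7 days')
ORDER BY q.id
""", {"now": now}).fetchall()

matching_ids = [r[0] for r in rows]
print(matching_ids)  # only question 1 satisfies all three conditions
```

Question 2 is the case from the original post: it has an old non-admin answer, but because only the latest non-admin answer is compared against the cutoff, it is not returned.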

Related

How do I find out which users with a specific RoleID have not been active within a time interval?

The query below returns the users who were not active during a given timeframe.
USE Database
SELECT u.*
FROM [dbo].[tbl_Users] u
WHERE NOT EXISTS (SELECT 1
FROM [dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID AND ct.CreationDate between '2019-01-01' and '2019-12-31'
);
And this query below will tell me the users that have the specific role id I'm looking for.
Use Database;
SELECT UserID, DepartmentID, RoleId
FROM tbl_UsersBelongsTo
WHERE RoleID=6
How can I integrate both queries and essentially get what I'm looking for? I presume it's with a JOIN clause but how??
I think you just want join or additional exists:
SELECT u.*
FROM [dbo].[tbl_Users] u
WHERE NOT EXISTS (SELECT 1
FROM [dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID AND
ct.CreationDate between '2019-01-01' and '2019-12-31'
) AND
EXISTS (SELECT 1
FROM tbl_UsersBelongsTo ubt
WHERE ubt.RoleID = 6 AND ubt.userId = u.userId
);
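A minimal sketch of the NOT EXISTS plus EXISTS combination, for checking it against known data. It assumes an in-memory SQLite database (so the [dbo]. schema prefix is dropped) and invented rows; the CaseID column is also an assumption, since the original schema was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Users (UserID INTEGER PRIMARY KEY);
CREATE TABLE tbl_UsersBelongsTo (UserID INTEGER, DepartmentID INTEGER, RoleID INTEGER);
CREATE TABLE CaseTable (CaseID INTEGER PRIMARY KEY, tUserID INTEGER, CreationDate TEXT);
-- Hypothetical sample data.
INSERT INTO tbl_Users VALUES (1), (2), (3), (4);
INSERT INTO tbl_UsersBelongsTo VALUES (1, 10, 6), (2, 10, 6), (3, 10, 5), (4, 20, 6);
INSERT INTO CaseTable (tUserID, CreationDate) VALUES
    (1, '2019-05-01'),  -- user 1 was active in 2019
    (2, '2018-03-01'),  -- user 2 was only active before 2019
    (4, '2020-01-01');  -- user 4 was only active after 2019
""")

rows = conn.execute("""
SELECT u.UserID
FROM tbl_Users u
WHERE NOT EXISTS (SELECT 1
                  FROM CaseTable ct
                  WHERE ct.tUserID = u.UserID
                    AND ct.CreationDate BETWEEN '2019-01-01' AND '2019-12-31')
  AND EXISTS (SELECT 1
              FROM tbl_UsersBelongsTo ubt
              WHERE ubt.RoleID = 6 AND ubt.UserID = u.UserID)
ORDER BY u.UserID
""").fetchall()

inactive_role6 = [r[0] for r in rows]
print(inactive_role6)  # users 2 and 4: role 6 and no case created in 2019
```

User 3 has no cases at all but is excluded because its role is 5, which shows that both conditions are applied.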
Please try to use an inner join like below:
SELECT u.*
FROM [dbo].[tbl_Users] u
INNER JOIN
(
SELECT UserID
FROM tbl_UsersBelongsTo
WHERE RoleID=6
) x ON u.UserID = x.UserID
WHERE NOT EXISTS (SELECT 1
FROM [dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID AND ct.CreationDate between '2019-01-01' and '2019-12-31'
);
You can read more about JOINS here.
If I understood the question correctly - you are using 2 different databases and the name of the 2nd database is pisacara. It is possible to join tables from different databases in SQL Server as long as those databases are on the same server and you use the same credentials for both databases.
Assuming that tbl_Users table has a UserID field as well, the query would look something like this:
SELECT u.*
FROM [1st_database_name].[dbo].[tbl_Users] u
INNER JOIN [piscara].[dbo].[tbl_UsersBelongsTo] a
ON u.UserID = a.UserID
WHERE NOT EXISTS (SELECT 1
FROM [1st_database_name].[dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID
AND ct.CreationDate BETWEEN '2019-01-01' AND '2019-12-31'
)
AND a.RoleID=6;
You can also try putting the 2nd query in the WHERE clause, as a sub-query, like so:
SELECT u.*
FROM [1st_database_name].[dbo].[tbl_Users] u
WHERE NOT EXISTS (SELECT 1
FROM [1st_database_name].[dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID
AND ct.CreationDate BETWEEN '2019-01-01' AND '2019-12-31'
)
AND u.UserID IN (SELECT UserID
FROM [piscara].[dbo].[tbl_UsersBelongsTo]
WHERE RoleID=6);

How to loop through a cte in main query

I am trying to rank users on my system based on the user's totalArticleViews and the user's totalArticles on my system. The ranking should be based on the formula (totalArticleViews + ( totalArticles * 500 )) / 100
I have a system that allows users to post articles, a record is created every time any of these articles are read by anyone. My database has the following tables. users, articles, reads.
I have managed to get the views to insert into the formula, but I'm having trouble getting each user's article count and multiplying it by 500 so that it can go into the formula and rank everyone.
with article_views AS (
SELECT article_id, COUNT(reads.id) AS views, 1 * 500 AS points
FROM reads
WHERE article_id IN (
SELECT id FROM articles WHERE articles.published_on IS NOT NULL AND
articles.deleted_at IS NULL
)
GROUP BY article_id
),
published AS (
SELECT COUNT(articles.id) AS TotalArticle, COUNT(articles.id) * 500 AS
points
FROM articles
WHERE published_on IS NOT NULL AND deleted_at IS NULL
GROUP BY articles.user_id
)
SELECT
users.id AS user_id,
ROUND((SUM(article_views.views) + () ) / 100.0, 2) AS points,
ROW_NUMBER() OVER (ORDER BY ROUND((SUM(article_views.views) + ()) /
100.0, 2) DESC)
FROM users
LEFT JOIN articles ON users.id = articles.user_id
LEFT JOIN reads ON articles.id = reads.article_id
LEFT JOIN article_views ON reads.article_id = article_views.article_id
WHERE
users.id IN (SELECT user_id FROM role_user WHERE role_id = 2)
AND status = 'ACTIVE'
GROUP BY users.id
ORDER BY points DESC NULLS LAST
I'm stuck at this point
(SUM(article_views.views) + () ) / 100.0, 2)
Simply use the published CTE by including the GROUP BY column user_id in SELECT and then joining published to users by this field in main level query.
WITH article_views AS (
SELECT r.article_id,
COUNT(r.id) AS views,
1 * 500 AS points
FROM reads r
WHERE r.article_id IN (
SELECT id
FROM articles a
WHERE a.published_on IS NOT NULL
AND a.deleted_at IS NULL
)
GROUP BY r.article_id
),
published AS (
SELECT a.user_id,
COUNT(a.id) AS TotalArticle,
COUNT(a.id) * 500 AS points
FROM articles a
WHERE a.published_on IS NOT NULL
AND a.deleted_at IS NULL
GROUP BY a.user_id
)
SELECT u.id AS user_id,
-- p.points already holds totalArticles * 500
ROUND((COALESCE(SUM(av.views), 0) + p.points) / 100.0, 2) AS points,
ROW_NUMBER() OVER (ORDER BY ROUND((COALESCE(SUM(av.views), 0) + p.points)
/ 100.0, 2) DESC) AS rn
FROM users u
LEFT JOIN articles a ON u.id = a.user_id
-- join article_views on the article itself; joining through reads would repeat each view count once per read
LEFT JOIN article_views av ON a.id = av.article_id
LEFT JOIN published p ON u.id = p.user_id
WHERE u.id IN (
SELECT user_id FROM role_user WHERE role_id = 2
)
AND u.status = 'ACTIVE'
-- p.points is not aggregated, so it has to be part of the GROUP BY
GROUP BY u.id, p.points
ORDER BY points DESC NULLS LAST
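To sanity-check the formula (totalArticleViews + (totalArticles * 500)) / 100 on known numbers, here is a minimal sketch. It is a variation rather than the exact query above: it assumes an in-memory SQLite database (version 3.25 or later, for window functions), invented sample rows, a hypothetical CTE name user_views, and it leaves out the role_user filter. Views and article counts are each aggregated per user before they are joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER,
                       published_on TEXT, deleted_at TEXT);
CREATE TABLE reads (id INTEGER PRIMARY KEY, article_id INTEGER);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, 'ACTIVE'), (2, 'ACTIVE'), (3, 'ACTIVE');
INSERT INTO articles VALUES
    (1, 1, '2024-01-01', NULL),
    (2, 1, '2024-01-02', NULL),
    (3, 2, '2024-01-03', NULL),
    (4, 3, NULL, NULL);           -- unpublished, must not count
INSERT INTO reads (article_id) VALUES (1), (1), (1), (2), (3), (3), (4);
""")

rows = conn.execute("""
WITH user_views AS (             -- total reads of published articles per user
    SELECT a.user_id, COUNT(r.id) AS total_views
    FROM articles a
    JOIN reads r ON r.article_id = a.id
    WHERE a.published_on IS NOT NULL AND a.deleted_at IS NULL
    GROUP BY a.user_id
),
published AS (                   -- number of published articles per user
    SELECT user_id, COUNT(id) AS total_articles
    FROM articles
    WHERE published_on IS NOT NULL AND deleted_at IS NULL
    GROUP BY user_id
)
SELECT u.id,
       ROUND((COALESCE(v.total_views, 0)
              + COALESCE(p.total_articles, 0) * 500) / 100.0, 2) AS points,
       ROW_NUMBER() OVER (
           ORDER BY COALESCE(v.total_views, 0)
                    + COALESCE(p.total_articles, 0) * 500 DESC) AS rn
FROM users u
LEFT JOIN user_views v ON v.user_id = u.id
LEFT JOIN published p ON p.user_id = u.id
WHERE u.status = 'ACTIVE'
ORDER BY rn
""").fetchall()

for user_id, points, rn in rows:
    print(rn, user_id, points)
# User 1: (4 views + 2 * 500) / 100; user 2: (2 + 1 * 500) / 100; user 3: 0.
```

Because both CTEs are already grouped by user_id, the outer query needs no GROUP BY, and a user's views cannot be multiplied by the number of rows in another joined table.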

How to optimize multiple subqueries to the same data set

Imagine I have a query like the following one:
SELECT
u.ID,
( SELECT
COUNT(*)
FROM
POSTS p
WHERE
p.USER_ID = u.ID
AND p.TYPE = 1
) AS interesting_posts,
( SELECT
COUNT(*)
FROM
POSTS p
WHERE
p.USER_ID = u.ID
AND p.TYPE = 2
) AS boring_posts,
( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
AND c.TYPE = 1
) AS interesting_comments,
( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
AND c.TYPE = 2
) AS boring_comments
FROM
USERS u;
(Hopefully it's correct; I just came up with it and haven't tested it.)
The query calculates the number of interesting and boring posts and comments that each user has.
Now, the problem with this query is that it runs 2 sequential scans on each of the posts and comments tables, and I wonder if there is a way to avoid that.
I could probably LEFT JOIN both posts and comments to the users table and do some aggregation, but that would generate a lot of rows before aggregation and I am not sure it's a good way to go.
Aggregate posts and comments and outer join them to the users table.
select
u.id as user_id,
coalesce(p.interesting, 0) as interesting_posts,
coalesce(p.boring, 0) as boring_posts,
coalesce(c.interesting, 0) as interesting_comments,
coalesce(c.boring, 0) as boring_comments
from users u
left join
(
select
user_id,
count(case when type = 1 then 1 end) as interesting,
count(case when type = 2 then 1 end) as boring
from posts
group by user_id
) p on p.user_id = u.id
left join
(
select
user_id,
count(case when type = 1 then 1 end) as interesting,
count(case when type = 2 then 1 end) as boring
from comments
group by user_id
) c on c.user_id = u.id;
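A minimal sketch of this aggregate-then-join approach on a small data set, assuming an in-memory SQLite database and invented rows. Each of posts and comments is read once, grouped by user, and only then joined to users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, type INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, type INTEGER);
-- Hypothetical sample data.
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO posts (user_id, type) VALUES (1, 1), (1, 1), (1, 2), (2, 2);
INSERT INTO comments (user_id, type) VALUES (1, 1), (3, 2), (3, 2);
""")

rows = conn.execute("""
SELECT u.id AS user_id,
       COALESCE(p.interesting, 0) AS interesting_posts,
       COALESCE(p.boring, 0)      AS boring_posts,
       COALESCE(c.interesting, 0) AS interesting_comments,
       COALESCE(c.boring, 0)      AS boring_comments
FROM users u
LEFT JOIN (
    -- CASE without ELSE yields NULL, which COUNT ignores
    SELECT user_id,
           COUNT(CASE WHEN type = 1 THEN 1 END) AS interesting,
           COUNT(CASE WHEN type = 2 THEN 1 END) AS boring
    FROM posts
    GROUP BY user_id
) p ON p.user_id = u.id
LEFT JOIN (
    SELECT user_id,
           COUNT(CASE WHEN type = 1 THEN 1 END) AS interesting,
           COUNT(CASE WHEN type = 2 THEN 1 END) AS boring
    FROM comments
    GROUP BY user_id
) c ON c.user_id = u.id
ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

The derived tables have at most one row per user, so the joins cannot multiply rows, and COALESCE turns the NULLs of users without posts or comments into zeros.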
Compare the results and the execution plan (here you scan posts once):
with c as (
select distinct
count(1) filter (where TYPE = 1) over (partition by USER_ID) interesting_posts
, count(1) filter (where TYPE = 2) over (partition by USER_ID) boring_posts
, USER_ID
from POSTS
)
, p as (select USER_ID,max(interesting_posts) interesting_posts, max(boring_posts) boring_posts from c group by USER_ID)
SELECT
u.ID, interesting_posts,boring_posts
, ( SELECT
COUNT(*)
FROM
COMMENTS c
WHERE
c.USER_ID = u.ID
) AS comments
FROM
USERS u
JOIN p on p.USER_ID = u.ID

Complex SQL query

I have these tables:
- Users
- id
- Photos
- id
- user_id
- Classifications
- id
- user_id
- photo_id
I would like to order Users by the total number of Photos + Classifications which they own.
I wrote this query:
SELECT users.id,
COUNT(photos.id) AS n_photo,
COUNT(classifications.id) AS n_classifications,
(COUNT(photos.id) + COUNT(classifications.id)) AS n_sum
FROM users
LEFT JOIN photos ON (photos.user_id = users.id)
LEFT JOIN classifications ON (classifications.user_id = users.id)
GROUP BY users.id
ORDER BY (COUNT(photos.id) + COUNT(classifications.id)) DESC
The problem is that this query does not work as I expect and returns high numbers while I have only a few photos and classifications in the db. It returns something like this:
id n_photo n_classifications n_sum
29 19241 19241 38482
16 16905 16905 33810
1 431 0 431
...
You are missing distinct.
SELECT U.Id, COUNT(DISTINCT P.Id) + COUNT(DISTINCT C.Id) AS Total
FROM Users U
LEFT JOIN Photos P ON P.User_Id = U.Id
LEFT JOIN Classifications C ON C.User_Id = U.Id
GROUP BY U.Id
ORDER BY COUNT(DISTINCT P.Id) + COUNT(DISTINCT C.Id) DESC
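To see why the original counts blow up, here is a minimal sketch assuming an in-memory SQLite database and invented rows. Joining both child tables to users pairs every photo with every classification of the same user, so a user with 3 photos and 2 classifications produces 6 joined rows and both plain counts report 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE classifications (id INTEGER PRIMARY KEY, user_id INTEGER, photo_id INTEGER);
-- Hypothetical sample data: user 1 has 3 photos and 2 classifications,
-- user 2 has 1 photo and no classifications.
INSERT INTO users VALUES (1), (2);
INSERT INTO photos (user_id) VALUES (1), (1), (1), (2);
INSERT INTO classifications (user_id, photo_id) VALUES (1, 1), (1, 2);
""")

joins = """
FROM users
LEFT JOIN photos ON photos.user_id = users.id
LEFT JOIN classifications ON classifications.user_id = users.id
GROUP BY users.id
ORDER BY users.id
"""

# Plain COUNT counts joined rows, i.e. photos x classifications per user.
naive = conn.execute(
    "SELECT users.id, COUNT(photos.id), COUNT(classifications.id)" + joins
).fetchall()

# COUNT(DISTINCT ...) collapses the repeated ids again.
distinct = conn.execute(
    "SELECT users.id, COUNT(DISTINCT photos.id), COUNT(DISTINCT classifications.id)" + joins
).fetchall()

print(naive)
print(distinct)
```

This matches the pattern in the posted output, where n_photo and n_classifications are identical for users that have both, and correct only for the user with no classifications.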
I could be misinterpreting your schema, but shouldn't this:
LEFT JOIN classifications ON (classifications.user_id = users.id)
Be this:
LEFT JOIN classifications ON (classifications.user_id = users.id)
AND (classifications.photo_id = photos.id)
?
SELECT users1.id, users1.n_photo, users2.n_classifications
FROM (
SELECT users.id, COUNT(photos.id) AS n_photo
FROM users LEFT OUTER JOIN photos ON photos.user_id = users.id
GROUP BY users.id
) users1
INNER JOIN (
SELECT users.id, COUNT(classifications.id) AS n_classifications
FROM users LEFT OUTER JOIN classifications ON classifications.user_id = users.id
GROUP BY users.id
) users2 ON users1.id = users2.id
Try something more like this instead:
SELECT t.n_id, t.n_photos, t.n_classifications,
(t.n_photos + t.n_classifications) AS n_sum
FROM (
SELECT users.id AS n_id,
(SELECT COUNT(photos.id) FROM photos WHERE photos.user_id = users.id) AS n_photos,
(SELECT COUNT(classifications.id) FROM classifications WHERE classifications.user_id = users.id) AS n_classifications
FROM users
) t
ORDER BY n_sum DESC

Joining to a table with multiple rows for the join item

I have a table users which has a primary key userid and a datetime column pay_date.
I've also got a table user_actions which references users via the column userid, and a datetime column action_date.
I want to join the two tables together, fetching only the earliest action from the user_actions table which has an action_date later than or equal to pay_date.
I'm trying things like:
select users.userid from users
left join user_actions on user_actions.userid = users.userid
where user_actions.action_date >= users.pay_date
order by user_actions.pay_date
But obviously that returns me multiple rows per user (one for every user action occurring on or after pay_date). Any idea where to go from here?
Apologies for what probably seems like a simple question, I'm fairly new to t-sql.
CROSS APPLY is your friend:
select users.*, t.* from users
CROSS APPLY(SELECT TOP 1 * FROM user_actions WHERE user_actions.userid = users.userid
AND user_actions.action_date >= users.pay_date
order by user_actions.action_date) AS t
If you have a PRIMARY KEY on user_actions:
SELECT u.*, ua.*
FROM users u
LEFT JOIN
user_actions ua
ON ua.id =
(
SELECT TOP 1 id
FROM user_actions uai
WHERE uai.userid = u.userid
AND uai.action_date >= u.pay_date
ORDER BY
uai.action_date
)
If you don't:
WITH j AS
(
SELECT u.userid, u.pay_date, ua.action_date,
-- partition by the users row so that a user without a matching action keeps a single NULL row
ROW_NUMBER() OVER (PARTITION BY u.userid ORDER BY ua.action_date) AS rn
FROM users u
LEFT JOIN
user_actions ua
ON ua.userid = u.userid
AND ua.action_date >= u.pay_date
)
SELECT *
FROM j
WHERE rn = 1
Update:
CROSS APPLY proposed by @AlexKuznetsov is more elegant and efficient.
select u.*, ua.* from
users u join user_actions ua on u.userid = ua.userid
where
ua.action_date in
(select min(action_date) from user_actions ua1
where
ua1.action_date >= u.pay_date and
u.userid=ua1.userid)
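A minimal sketch of the "earliest action on or after pay_date" lookup, assuming an in-memory SQLite database and invented rows. SQLite has neither CROSS APPLY nor TOP, so this uses the correlated MIN(action_date) form; the LEFT JOIN is a deliberate variation that keeps users with no qualifying action (their action_date comes back as NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, pay_date TEXT);
CREATE TABLE user_actions (id INTEGER PRIMARY KEY, userid INTEGER, action_date TEXT);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, '2024-01-10'), (2, '2024-02-01'), (3, '2024-03-01');
INSERT INTO user_actions (userid, action_date) VALUES
    (1, '2024-01-05'),  -- before pay_date, ignored
    (1, '2024-01-12'),  -- earliest action on or after pay_date
    (1, '2024-01-20'),
    (2, '2024-01-15'),  -- user 2 has no action on or after pay_date
    (3, '2024-03-01'),  -- same day as pay_date counts
    (3, '2024-03-02');
""")

rows = conn.execute("""
SELECT u.userid, ua.action_date
FROM users u
LEFT JOIN user_actions ua
       ON ua.userid = u.userid
      AND ua.action_date = (SELECT MIN(ua1.action_date)
                            FROM user_actions ua1
                            WHERE ua1.userid = u.userid
                              AND ua1.action_date >= u.pay_date)
ORDER BY u.userid
""").fetchall()

for row in rows:
    print(row)
```

If two actions of the same user share the minimum action_date, this form returns both rows; the TOP 1 and ROW_NUMBER variants above return only one.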