Joining to a table with multiple rows for the join item - sql

I have a table users which has a primary key userid and a datetime column pay_date.
I've also got a table user_actions which references users via the column userid, and a datetime column action_date.
I want to join the two tables together, fetching only the earliest action from the user_actions table which has an action_date later than or equal to pay_date.
I'm trying things like:
select users.userid from users
left join user_actions on user_actions.userid = users.userid
where user_actions.action_date >= users.pay_date
order by user_actions.action_date
But obviously that returns me multiple rows per user (one for every user action occurring on or after pay_date). Any idea where to go from here?
Apologies for what probably seems like a simple question, I'm fairly new to t-sql.

CROSS APPLY is your friend:
select users.*, t.* from users
CROSS APPLY(SELECT TOP 1 * FROM user_actions WHERE user_actions.userid = users.userid
AND user_actions.action_date >= users.pay_date
order by user_actions.action_date) AS t

If you have a PRIMARY KEY on user_actions:
SELECT u.*, ua.*
FROM users u
LEFT JOIN
user_actions ua
ON ua.id =
(
SELECT TOP 1 id
FROM user_actions uai
WHERE uai.userid = u.userid
AND uai.action_date >= u.pay_date
ORDER BY
uai.action_date
)
If you don't:
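WITH j AS
(
-- List any further ua columns explicitly; ua.* would duplicate userid inside the CTE.
-- Partition by u.userid so users without a matching action keep their own row.
SELECT u.*, ua.action_date, ROW_NUMBER() OVER (PARTITION BY u.userid ORDER BY ua.action_date) AS rn
FROM users u
LEFT JOIN
user_actions ua
ON ua.userid = u.userid
AND ua.action_date >= u.pay_date
)
SELECT *
FROM j
WHERE rn = 1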
Update:
CROSS APPLY proposed by @AlexKuznetsov is more elegant and efficient.

select u.*, ua.* from
users u join user_actions ua on u.userid = ua.userid
where
ua.action_date in
(select min(action_date) from user_actions ua1
where
ua1.action_date >= u.pay_date and
u.userid=ua1.userid)
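
For anyone who wants to try the "earliest action on or after pay_date" logic without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a made-up data set and an id primary key on user_actions, and it swaps TOP 1 for SQLite's LIMIT 1, so treat it as an illustration of the correlated-subquery join rather than T-SQL you can paste as-is.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, pay_date TEXT);
    CREATE TABLE user_actions (
        id INTEGER PRIMARY KEY,
        userid INTEGER,
        action_date TEXT
    );
    INSERT INTO users VALUES
        (1, '2020-01-10'), (2, '2020-02-01'), (3, '2020-03-01');
    INSERT INTO user_actions (userid, action_date) VALUES
        (1, '2020-01-05'), (1, '2020-01-10'), (1, '2020-01-20'),
        (2, '2020-02-15'), (2, '2020-02-03'),
        (3, '2020-02-01');
""")

# For each user, join only the single earliest action on or after pay_date.
# LIMIT 1 plays the role of TOP 1 in the T-SQL version.
rows = conn.execute("""
    SELECT u.userid, ua.action_date
    FROM users u
    LEFT JOIN user_actions ua
      ON ua.id = (SELECT uai.id
                  FROM user_actions uai
                  WHERE uai.userid = u.userid
                    AND uai.action_date >= u.pay_date
                  ORDER BY uai.action_date
                  LIMIT 1)
    ORDER BY u.userid
""").fetchall()

for row in rows:
    print(row)
```

Because it is a LEFT JOIN, user 3 (whose only action predates pay_date) still appears, with a NULL action_date.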

Related

SQL filtering multiple joins by row-specific date window

I am trying to obtain aggregate stats of each customer in their first 60 days. Each user has a different join date, which resides in a user_info table. Currently, the best way I have of doing this is to repeatedly join to the user table each time I need to get aggregate stats from another table, then joining each pair together in a nested subquery. With multiple tables, this query becomes very sluggish and unwieldy. How can I do this in a more parsimonious manner?
My current solution:
SELECT t1.userid
,t1.total_transactions
,t1.days_transact
,t2.total_vouchers
,t2.days_redeemed
FROM (
SELECT u.userid
,SUM(s.transactions) AS total_transactions
,COUNT(DISTINCT s.dated) AS days_transact
FROM (
SELECT userid
,created
FROM schema.user_info
) u
LEFT JOIN (
SELECT userid
,transactions
,dated
FROM schema.transactions
) s
ON u.userid = s.userid
AND s.dated BETWEEN u.created AND DATE_ADD(u.created, 61)
GROUP BY u.userid
) t1
LEFT JOIN (
SELECT u.userid
,SUM(v.vouchers) AS total_vouchers
,COUNT(DISTINCT v.dated) AS days_redeemed
FROM (
SELECT userid
,created
FROM schema.user_info
) u
LEFT JOIN (
SELECT userid
,vouchers
,dated
FROM schema.vouchers
) v
ON u.userid = v.userid
AND v.dated BETWEEN u.created AND DATE_ADD(u.created, 61)
GROUP BY u.userid
) t2
ON t1.userid = t2.userid

How do I find users with a specific RoleID who have not been active within a time interval?

The query below lists the users who were not active during a given timeframe.
USE Database;
SELECT u.*
FROM [dbo].[tbl_Users] u
WHERE NOT EXISTS (SELECT 1
FROM [dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID AND ct.CreationDate between '2019-01-01' and '2019-12-31'
);
And this query below will tell me the users that have the specific role id I'm looking for.
Use Database;
SELECT UserID, DepartmentID, RoleId
FROM tbl_UsersBelongsTo
WHERE RoleID=6
How can I integrate both queries and essentially get what I'm looking for? I presume it's with a JOIN clause but how??
I think you just want join or additional exists:
SELECT u.*
FROM [dbo].[tbl_Users] u
WHERE NOT EXISTS (SELECT 1
FROM [dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID AND
ct.CreationDate between '2019-01-01' and '2019-12-31'
) AND
EXISTS (SELECT 1
FROM tbl_UsersBelongsTo ubt
WHERE ubt.RoleID = 6 AND ubt.userId = u.userId
);
Please try to use an inner join like below:
SELECT u.*
FROM [dbo].[tbl_Users] u
INNER JOIN
(
SELECT UserID
FROM tbl_UsersBelongsTo
WHERE RoleID=6
) x ON u.UserID = x.UserID
WHERE NOT EXISTS (SELECT 1
FROM [dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID AND ct.CreationDate between '2019-01-01' and '2019-12-31'
);
You can read more about JOINS here.
If I understood the question correctly, you are using 2 different databases and the name of the 2nd database is piscara. It is possible to join tables from different databases in SQL Server as long as those databases are on the same server and you use the same credentials for both databases.
Assuming that tbl_Users table has a UserID field as well, the query would look something like this:
SELECT u.*
FROM [1st_database_name].[dbo].[tbl_Users] u
INNER JOIN [piscara].[dbo].[tbl_UsersBelongsTo] a
ON u.UserID = a.UserID
WHERE NOT EXISTS (SELECT 1
FROM [1st_database_name].[dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID
AND ct.CreationDate BETWEEN '2019-01-01' AND '2019-12-31'
)
AND a.RoleID=6;
You can also try putting the 2nd query in the WHERE clause, as a sub-query, like so:
SELECT u.*
FROM [1st_database_name].[dbo].[tbl_Users] u
WHERE NOT EXISTS (SELECT 1
FROM [1st_database_name].[dbo].[CaseTable] ct
WHERE ct.tUserID = u.UserID
AND ct.CreationDate BETWEEN '2019-01-01' AND '2019-12-31'
)
AND u.UserID IN (SELECT UserID
FROM [piscara].[dbo].[tbl_UsersBelongsTo]
WHERE RoleID=6);
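
To see the NOT EXISTS plus EXISTS combination working end to end, here is a small sketch on SQLite through Python's sqlite3 module. The rows and the Name column are invented for the demo. One deliberate change from the answers above: the date filter is written as a half-open range (>= '2019-01-01' AND < '2020-01-01'), on the assumption that CreationDate carries a time of day, since BETWEEN ... AND '2019-12-31' on a datetime column stops at midnight at the start of 31 December.

```python
import sqlite3

# Hypothetical sample data; table and key names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Users (UserID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE CaseTable (
        CaseID INTEGER PRIMARY KEY,
        tUserID INTEGER,
        CreationDate TEXT
    );
    CREATE TABLE tbl_UsersBelongsTo (
        UserID INTEGER, DepartmentID INTEGER, RoleID INTEGER
    );
    INSERT INTO tbl_Users VALUES
        (1, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'dee'), (5, 'eli');
    INSERT INTO tbl_UsersBelongsTo VALUES
        (1, 10, 6), (2, 10, 6), (3, 20, 5), (4, 20, 6), (5, 30, 6);
    INSERT INTO CaseTable (tUserID, CreationDate) VALUES
        (1, '2019-06-15 09:00:00'),   -- active in 2019
        (2, '2018-11-02 14:30:00'),   -- only active before 2019
        (4, '2019-12-31 16:00:00');   -- active on the last afternoon of 2019
""")

# Users with RoleID 6 that have no case created during 2019.
inactive_role6 = [row[0] for row in conn.execute("""
    SELECT u.UserID
    FROM tbl_Users u
    WHERE NOT EXISTS (SELECT 1
                      FROM CaseTable ct
                      WHERE ct.tUserID = u.UserID
                        AND ct.CreationDate >= '2019-01-01'
                        AND ct.CreationDate <  '2020-01-01')
      AND EXISTS (SELECT 1
                  FROM tbl_UsersBelongsTo ubt
                  WHERE ubt.UserID = u.UserID
                    AND ubt.RoleID = 6)
    ORDER BY u.UserID
""")]

print(inactive_role6)
```

User 3 is inactive but has the wrong role, and user 4 has the right role but was active on 31 December, so neither is returned.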

How to loop through a cte in main query

I am trying to rank users on my system based on each user's totalArticleViews and totalArticles. The ranking should be based on the formula (totalArticleViews + (totalArticles * 500)) / 100.
I have a system that allows users to post articles; a record is created every time any of these articles is read by anyone. My database has the following tables: users, articles, reads.
I have tried to get the views to insert into the formula, but I'm having issues getting each user's article count and multiplying it by 500 to insert into the formula to rank them all:
with article_views AS (
SELECT article_id, COUNT(reads.id) AS views, 1 * 500 AS points
FROM reads
WHERE article_id IN (
SELECT id FROM articles WHERE articles.published_on IS NOT NULL AND
articles.deleted_at IS NULL
)
GROUP BY article_id
),
published AS (
SELECT COUNT(articles.id) AS TotalArticle, COUNT(articles.id) * 500 AS
points
FROM articles
WHERE published_on IS NOT NULL AND deleted_at IS NULL
GROUP BY articles.user_id
)
SELECT
users.id AS user_id,
ROUND((SUM(article_views.views) + () ) / 100.0, 2) AS points,
ROW_NUMBER() OVER (ORDER BY ROUND((SUM(article_views.views) + ()) /
100.0, 2) DESC)
FROM users
LEFT JOIN articles ON users.id = articles.user_id
LEFT JOIN reads ON articles.id = reads.article_id
LEFT JOIN article_views ON reads.article_id = article_views.article_id
WHERE
users.id IN (SELECT user_id FROM role_user WHERE role_id = 2)
AND status = 'ACTIVE'
GROUP BY users.id
ORDER BY points DESC NULLS LAST
I'm stuck at this point
(SUM(article_views.views) + () ) / 100.0, 2)
Include the GROUP BY column user_id in the published CTE's SELECT list, then join published to users on that column in the main query.
WITH article_views AS (
SELECT r.article_id,
COUNT(r.id) AS views,
1 * 500 AS points
FROM reads r
WHERE r.article_id IN (
SELECT id
FROM articles a
WHERE a.published_on IS NOT NULL
AND a.deleted_at IS NULL
)
GROUP BY r.article_id
),
published AS (
SELECT a.user_id,
COUNT(a.id) AS TotalArticle,
COUNT(a.id) * 500 AS points
FROM articles a
WHERE a.published_on IS NOT NULL
AND a.deleted_at IS NULL
GROUP BY a.user_id
)
SELECT u.id AS user_id,
-- p.points is already TotalArticle * 500, as the formula requires
ROUND((SUM(av.views) + p.points) / 100.0, 2) AS points,
ROW_NUMBER() OVER (ORDER BY ROUND((SUM(av.views) + p.points)
/ 100.0, 2) DESC NULLS LAST) AS rn
FROM users u
LEFT JOIN articles a ON u.id = a.user_id
-- join article_views straight to articles; going through reads would
-- repeat each article's view count once per read row
LEFT JOIN article_views av ON a.id = av.article_id
LEFT JOIN published p ON u.id = p.user_id
WHERE u.id IN (
SELECT user_id FROM role_user WHERE role_id = 2
)
AND u.status = 'ACTIVE'
-- p.points must be grouped because it is not aggregated
GROUP BY u.id, p.points
ORDER BY points DESC NULLS LAST
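
As a sanity check on the formula (totalArticleViews + (totalArticles * 500)) / 100, here is a minimal sketch on SQLite via Python's sqlite3 module. The sample rows are invented, the role_user and status filters are left out to keep it short, and the views CTE and the COALESCE calls are my own simplifications (so users with no articles or no reads score 0 instead of NULL). It is one possible shape for the query, not the only one.

```python
import sqlite3

# Hypothetical data: user 1 has two published articles with 4 reads in total,
# user 2 has one published article (no reads) plus an unpublished one,
# user 3 has no articles at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY, user_id INTEGER,
        published_on TEXT, deleted_at TEXT
    );
    CREATE TABLE reads (id INTEGER PRIMARY KEY, article_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO articles VALUES
        (1, 1, '2020-01-01', NULL),
        (2, 1, '2020-01-02', NULL),
        (3, 2, '2020-01-03', NULL),
        (4, 2, NULL, NULL);
    INSERT INTO reads (article_id) VALUES (1), (1), (1), (2), (4), (4);
""")

# Aggregate articles and views per user separately, then combine them,
# so neither count is multiplied by the other join.
rows = conn.execute("""
    WITH published AS (
        SELECT user_id, COUNT(*) AS total_articles
        FROM articles
        WHERE published_on IS NOT NULL AND deleted_at IS NULL
        GROUP BY user_id
    ),
    views AS (
        SELECT a.user_id, COUNT(r.id) AS total_views
        FROM articles a
        JOIN reads r ON r.article_id = a.id
        WHERE a.published_on IS NOT NULL AND a.deleted_at IS NULL
        GROUP BY a.user_id
    )
    SELECT u.id,
           ROUND((COALESCE(v.total_views, 0)
                  + COALESCE(p.total_articles, 0) * 500) / 100.0, 2) AS points
    FROM users u
    LEFT JOIN published p ON p.user_id = u.id
    LEFT JOIN views v ON v.user_id = u.id
    ORDER BY points DESC, u.id
""").fetchall()

for user_id, points in rows:
    print(user_id, points)
```

Reads of the unpublished article (id 4) are ignored, which matches the published_on / deleted_at filter used in the CTEs above.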

How to get Max column with group by using MS SQL?

What I want to query is "get each user's last fatal log". The statement below only returns the username and logDate fields, but I also want the rest of the row that this logDate belongs to (logid, logdata):
SELECT user.username, MAX(log.logDate) FROM user
INNER JOIN log ON user.userid = log.userid
WHERE log.logtype = 'fatal'
GROUP BY user.username
My user table;
userid username
-----------------
1 robert
2 ronaldo
log table;
logid logDate logtype userid logdata
----------------------------------------------------------
1 2016-11-28 19:37:53.000 fatal 1 data
2 2016-11-28 22:37:53.000 fatal 1 data
3 2016-11-28 12:37:53.000 fatal 2 data
I would do this using CROSS APPLY (the preferred approach, given a suitable index on the log table):
SELECT *
FROM [USER] u
CROSS apply (SELECT TOP 1 *
FROM log l
WHERE u.userid = l.userid
AND l.logtype = 'fatal'
ORDER BY l.logDate DESC) cs
If the log table is very large, create a nonclustered index on it to improve performance:
CREATE NONCLUSTERED INDEX NIX_Log_logtype_userid
ON [log] (logtype,userid)
INCLUDE (logid,logDate,logdata)
Another approach using ROW_NUMBER
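SELECT *
FROM (SELECT u.username, l.logid, l.logDate, l.logtype, l.logdata,
-- explicit columns: SELECT * here would expose userid twice, which a derived table rejects
Row_number()OVER(partition BY u.userid ORDER BY l.logDate DESC) AS rn
FROM [USER] u
INNER JOIN log l
ON u.userid = l.userid
WHERE l.logtype = 'fatal') A
WHERE rn = 1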
Another approach using ROW_NUMBER and TOP 1 with ties
SELECT TOP 1 WITH ties *
FROM [USER]
INNER JOIN log
ON [USER].userid = log.userid
WHERE log.logtype = 'fatal'
ORDER BY Row_number()OVER(partition BY [USER].username ORDER BY log.logDate DESC)
Note: the SELECT * queries above return every column from both tables; list only the columns you need.
You can use ROW_NUMBER for this:
SELECT t.username,
t.logid, t.logtype, t.logDate, t.logdata
FROM (
SELECT u.username,
l.logid, l.logtype, l.logDate, l.logdata,
ROW_NUMBER() OVER (PARTITION BY u.username
ORDER BY l.logDate DESC) AS rn
FROM [user] u
INNER JOIN log l ON u.userid = l.userid
WHERE l.logtype = 'fatal') AS t
WHERE t.rn = 1
A quick option would be to get the max logdate in a subquery and join back to the log table on it. This way, you can select any fields you need from either table and don't have to aggregate in the outer query. The only catch is that logdate must be unique per user, otherwise tied rows are all returned. If it's a datetime then duplicates aren't likely, but you may have them if it's just a date field. Worth checking.
SELECT
u.username
,lg.logdate
,lg.logid
,lg.logdata
FROM [user] u
INNER JOIN (SELECT
userid
,MAX(logdate) MaxLog
FROM log
WHERE logtype = 'fatal'
GROUP BY userid) l
ON u.userid = l.userid
-- join back to log to fetch the full row for that max date
INNER JOIN log lg
ON lg.userid = l.userid
AND lg.logdate = l.MaxLog
AND lg.logtype = 'fatal'
WITH MaxLogDate AS (
SELECT u.userid, MAX(log.logDate) logDate FROM [user] u
INNER JOIN log ON u.userid = log.userid
WHERE log.logtype = 'fatal'
GROUP BY u.userid
)
SELECT log.logid, log.logDate, log.logtype, u.userid, u.username
FROM [user] u
JOIN MaxLogDate m ON u.userid = m.userid
JOIN log ON log.logDate = m.logDate AND log.userid = m.userid
WHERE log.logtype = 'fatal' -- Keeps a non-fatal row with the same timestamp from matching.
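
Here is a small runnable sketch of the "MAX per user, then join back to log" idea, using Python's sqlite3 module with the sample rows from the question. The extra info row (logid 4) is an assumption I added to show that a later non-fatal entry does not interfere. SQLite stands in for SQL Server here, so the table name is quoted as "user" rather than [user].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE log (
        logid INTEGER PRIMARY KEY, logDate TEXT,
        logtype TEXT, userid INTEGER, logdata TEXT
    );
    INSERT INTO "user" VALUES (1, 'robert'), (2, 'ronaldo');
    INSERT INTO log VALUES
        (1, '2016-11-28 19:37:53.000', 'fatal', 1, 'data'),
        (2, '2016-11-28 22:37:53.000', 'fatal', 1, 'data'),
        (3, '2016-11-28 12:37:53.000', 'fatal', 2, 'data'),
        (4, '2016-11-28 23:00:00.000', 'info',  1, 'data');
""")

# Step 1 (subquery m): latest fatal logDate per user.
# Step 2: join back to log to pick up the full row for that date.
rows = conn.execute("""
    SELECT u.username, l.logid, l.logDate, l.logdata
    FROM "user" u
    JOIN (SELECT userid, MAX(logDate) AS maxLog
          FROM log
          WHERE logtype = 'fatal'
          GROUP BY userid) m
      ON m.userid = u.userid
    JOIN log l
      ON l.userid = m.userid
     AND l.logDate = m.maxLog
     AND l.logtype = 'fatal'
    ORDER BY u.userid
""").fetchall()

for row in rows:
    print(row)
```

If two fatal rows for the same user share the exact same logDate, both come back; the ROW_NUMBER and CROSS APPLY variants above return a single row in that case.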

How to eliminate 'The multi-part identifier "" could not be bound' error?

I have this query, it's supposed to return results of non validated accounts in a database, that were created after a certain date. I keep getting this error and I'm not sure how to eliminate it. Here is the query:
select count(*) (nolock)
from dbo.[User]
where ID is not null
and UserStatusID!=2
and CreateDateTime>='5/1/2012'
and not exists (select userid from dbo.UserValidation where dbo.[User].UserID=dbo.UserValidation.UserID)
It errors out on the "where dbo.[User].UserID=dbo.UserValidation.UserID" part. What am I doing wrong here?
Try aliasing the tables:
select count(*)
from dbo.[User] u with (nolock)
where ID is not null
and UserStatusID != 2
and CreateDateTime >= '5/1/2012'
and not exists (select uv.userid from dbo.UserValidation uv where u.UserID = uv.UserID)
Without the schema:
select count(*)
from [User] u with (nolock)
where ID is not null
and UserStatusID != 2
and CreateDateTime >= '5/1/2012'
and not exists (select uv.userid from UserValidation uv where u.UserID = uv.UserID)
When joining or correlating tables, it's better to explicitly qualify every column in the query, as below.
select count(u.userid)
from [User] u
where u.ID is not null
and u.UserStatusID != 2
and u.CreateDateTime >= '5/1/2012'
and not exists
(
select uv.userid
from UserValidation uv
where uv.UserID = u.UserID
)