Select COUNT DISTINCT based on a combination of two columns - SQL

I have the query below, which gets counts of distinct USERIDs from several tables and sums them into a grand total. I am expecting a total of 35, but I only get 30 from this query. It appears that when the same USERID occurs in more than one row of a table, it is counted only once. (It is fine for a USERID to appear more than once in a table, given how the tables are structured.)
I would like to get distinct values based on the combination of USERID and EXAM_DT, as this combination gives the uniqueness I need.
SQL:
SELECT 'TOTAL', '',
COUNT(DISTINCT G.USERID) + COUNT(DISTINCT H.USERID) +
COUNT(DISTINCT J.USERID) + COUNT(DISTINCT M.USERID) +
COUNT(DISTINCT P.USERID) + COUNT(DISTINCT S.USERID) +
COUNT(DISTINCT V.USERID) + COUNT(DISTINCT Y.USERID)
FROM PS_JOB F INNER JOIN PS_EMPLMT_SRCH_QRY F1 ON (F.USERID = F1.USERID AND F.EMPL_RCD = F1.EMPL_RCD)
LEFT OUTER JOIN PS_GHS_HS_ANN_EXAM G ON F.USERID = G.USERID AND G.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ANTINEO H ON F.USERID = H.USERID AND H.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_AUDIO J ON F.USERID = J.USERID AND J.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_DOT M ON F.USERID = M.USERID AND M.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_HAZMAT P ON F.USERID = P.USERID AND P.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_PREPLACE S ON F.USERID = S.USERID AND S.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GH_RESP_FIT V ON F.USERID = V.USERID AND V.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ASBESTOS Y ON F.USERID = Y.USERID
WHERE ( ( F.EFFDT =
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED
WHERE F.USERID = F_ED.USERID
AND F.EMPL_RCD = F_ED.EMPL_RCD
AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))
AND F.EFFSEQ =
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES
WHERE F.USERID = F_ES.USERID
AND F.EMPL_RCD = F_ES.EMPL_RCD
AND F.EFFDT = F_ES.EFFDT) ))
My results:
(No column name)  (No column name)  (No column name)
TOTAL                               30
Here is an example from one of the tables in the query. It contains USERID 816455 twice, but the query above counts only one distinct occurrence of it, whereas I need the distinct count to be based on the combination of USERID and EXAM_DT:
USERID  USER_RCD  EXAM_DT     EXAM_TYPE_CD  EXPIRE_DT
001     0         2018-04-17  ANN           2019-04-17
03      0         2018-04-03  ANN           2019-04-27
816455  0         2018-03-02  ANN           2018-03-31
816455  0         2018-03-26  ANN           2018-06-30
410908  0         2018-03-05  ANN           2019-05-30
I would like to avoid using subqueries to do the aggregation on the joins if possible, because I need to add the SQL to a tool that doesn't support that. Any help is appreciated!
EDIT:
As LukStorms suggested I tried "Method 1" from his answer as follows:
SELECT count(distinct concat(G.USERID, G.EXAM_DT))
+ count(distinct concat(H.USERID, H.EXAM_DT))
+ count(distinct concat(J.USERID, J.EXAM_DT))
+ count(distinct concat(M.USERID, M.EXAM_DT))
+ count(distinct concat(P.USERID, P.EXAM_DT))
+ count(distinct concat(S.USERID, S.EXAM_DT))
+ count(distinct concat(V.USERID, V.EXAM_DT))
+ count(distinct concat(Y.USERID, Y.EXAM_DT)) AS 'Total_Unique'
FROM PS_JOB F
LEFT OUTER JOIN PS_GHS_HS_ANN_EXAM H ON F.USERID = H.USERID AND H.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ANTINEO G ON F.USERID = G.USERID AND G.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_AUDIO J ON F.USERID = J.USERID AND J.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_DOT M ON F.USERID = M.USERID AND M.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_HAZMAT P ON F.USERID = P.USERID AND P.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_PREPLACE S ON F.USERID = S.USERID AND S.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GH_RESP_FIT V ON F.USERID = V.USERID AND V.EMPL_RCD = F.EMPL_RCD
LEFT OUTER JOIN PS_GHS_HS_ASBESTOS Y ON F.USERID = Y.USERID
WHERE ( ( F.EFFDT =
(SELECT MAX(F_ED.EFFDT) FROM PS_JOB F_ED
WHERE F.USERID = F_ED.USERID
AND F.EMPL_RCD = F_ED.EMPL_RCD
AND F_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))
AND F.EFFSEQ =
(SELECT MAX(F_ES.EFFSEQ) FROM PS_JOB F_ES
WHERE F.USERID = F_ES.USERID
AND F.EMPL_RCD = F_ES.EMPL_RCD
AND F.EFFDT = F_ES.EFFDT) ))
With the query above, I get a total count of 42, not 30. When I looked at the data without the COUNT aggregation, it appeared to be retrieving a blank row from the tables along with the concatenated data.

So you want to count distinct values based on the combination of USERID and EXAM_DT?
In SQL Server, COUNT(DISTINCT ...) accepts only one expression, so combine the two fields into one. CONCAT works for that.
Alternatively, group on the date, count the distinct users per date, and then sum those totals.
Simplified example snippet:
declare @T table (id int identity(1,1) primary key, userid int, exam_dt datetime);
insert into @T (userid, exam_dt) values
(100, GETDATE()),(200, GETDATE()),(100, GETDATE()-1),(200, GETDATE()+0.001),(NULL,NULL);
select * from @T;
-- Method 1.1
select count(distinct concat(userid,'_',cast(exam_dt as date))) as total_unique from @T where userid is not null;
-- Method 1.2: adjustment for the LEFT JOINs. When there is no match, the joined table's columns come back as NULL; CONCAT turns those NULLs into empty strings, giving '_', and NULLIF turns that back into NULL so it is not counted
select count(distinct nullif(concat(userid,'_',cast(exam_dt as date)),'_')) as total_unique from @T;
-- Method 2
select sum(total) as total_unique
from (
select count(distinct t.userid) as total
from @T t
group by cast(t.exam_dt as date)
) q;
Each method returns 3:
userid 100 has 2 records with different dates, so it counts as 2,
while userid 200 has 2 records with the same date, so it counts as 1.
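To check both methods against the sample rows from the question, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. It assumes SQLite's `||` operator in place of T-SQL CONCAT, along with a hypothetical table name `ann_exam` whose user IDs are stored as TEXT so that "001" keeps its leading zeros. The counting logic is the same as in Methods 1 and 2.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; ann_exam is a
# hypothetical table holding the sample rows shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ann_exam (userid TEXT, empl_rcd INTEGER, exam_dt TEXT)")
conn.executemany(
    "INSERT INTO ann_exam VALUES (?, ?, ?)",
    [
        ("001", 0, "2018-04-17"),
        ("03", 0, "2018-04-03"),
        ("816455", 0, "2018-03-02"),
        ("816455", 0, "2018-03-26"),
        ("410908", 0, "2018-03-05"),
    ],
)

# Distinct USERID only: both 816455 rows collapse to one value.
distinct_users = conn.execute(
    "SELECT COUNT(DISTINCT userid) FROM ann_exam"
).fetchone()[0]

# Method 1: distinct on the combined key userid + '_' + exam_dt.
distinct_pairs = conn.execute(
    "SELECT COUNT(DISTINCT userid || '_' || exam_dt) FROM ann_exam"
).fetchone()[0]

# Method 2: count distinct users per date, then sum the per-date counts.
summed_per_date = conn.execute(
    """
    SELECT SUM(total) FROM (
        SELECT COUNT(DISTINCT userid) AS total
        FROM ann_exam
        GROUP BY exam_dt
    ) AS q
    """
).fetchone()[0]

print(distinct_users, distinct_pairs, summed_per_date)  # 4 5 5
```

Counting distinct USERID alone gives 4 because 816455 collapses to one value. Counting the USERID/EXAM_DT combination, or summing the distinct users per date, gives 5.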
Simplified example snippet with joins:
declare @T table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
declare @F1 table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
declare @F2 table (id int identity(1,1) primary key, userid int, empl_rcd int default 0, exam_dt date);
insert into @T (userid, exam_dt) values (100, GETDATE()),(200, GETDATE()),(100, GETDATE()-1),(200, GETDATE()),(300, GETDATE());
insert into @F1 (userid, exam_dt) values (100, GETDATE()),(200, GETDATE()),(200, GETDATE()+1);
insert into @F2 (userid, exam_dt) values (100, GETDATE()),(300, GETDATE()+1),(300, GETDATE()+2);
select (total0 + total1 + total2) as total, q.*
from (
select
count(distinct nullif(concat(t0.userid,'_',t0.exam_dt),'_')) as total0,
count(distinct nullif(concat(f1.userid,'_',f1.exam_dt),'_')) as total1,
count(distinct nullif(concat(f2.userid,'_',f2.exam_dt),'_')) as total2
from @T t0
left join @F1 f1 on (f1.userid = t0.userid and f1.empl_rcd = t0.empl_rcd)
left join @F2 f2 on (f2.userid = t0.userid and f2.empl_rcd = t0.empl_rcd)
) q;
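The NULLIF adjustment is needed because of how CONCAT treats NULL. Here is a minimal sketch in Python's sqlite3. It uses COALESCE(x, '') to mimic T-SQL CONCAT, because SQLite's own `||` would return NULL instead. The tables `job` and `exam` are hypothetical simplifications of PS_JOB and one exam table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job (userid INTEGER, empl_rcd INTEGER);
    CREATE TABLE exam (userid INTEGER, empl_rcd INTEGER, exam_dt TEXT);
    INSERT INTO job VALUES (100, 0), (200, 0), (300, 0);
    -- user 300 has no exam rows, so the LEFT JOIN yields NULLs for it
    INSERT INTO exam VALUES (100, 0, '2018-03-02'), (100, 0, '2018-03-26'),
                            (200, 0, '2018-03-05');
    """
)

# T-SQL CONCAT treats NULL as an empty string; COALESCE(..., '') mimics that here.
concat_like = "COALESCE(e.userid, '') || '_' || COALESCE(e.exam_dt, '')"
join_sql = (
    "FROM job j LEFT JOIN exam e "
    "ON e.userid = j.userid AND e.empl_rcd = j.empl_rcd"
)

# Without NULLIF, the unmatched row becomes '_' and is counted as an extra key.
naive = conn.execute(
    f"SELECT COUNT(DISTINCT {concat_like}) {join_sql}"
).fetchone()[0]

# Method 1.2: NULLIF turns the '_' placeholder back into NULL, which COUNT ignores.
fixed = conn.execute(
    f"SELECT COUNT(DISTINCT NULLIF({concat_like}, '_')) {join_sql}"
).fetchone()[0]

print(naive, fixed)  # 4 3
```

Each LEFT JOIN with unmatched rows can add one extra '_' key in this way, which is one way a summed total can come out higher than expected.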

Related

Oracle query optimization recommendation

The query below takes a long time. The predicates below are used only to get unique records, so I was wondering whether there is a different way to write the same query to get the unique ID without evaluating these predicates multiple times.
(select max(c.id) from plocation c where c.ids = y.ids and c.idc = y.idc)
(select max(cr.id) from plocation_log cr where cr.ids = yt.ids and cr.idc = yt.idc)
(select max(pr.id) from patent pr where pr.ids = p.ids and pr.idc = p.idc)
My full sample query
SELECT to_char(p.pid) AS patentid,
p.name,
x.dept,
y.location
FROM patent p
JOIN pdetails x ON p.pid = x.pid AND x.isactive = 1
JOIN plocation y
ON y.idr = p.idr
AND y.idc = p.idc
AND y.id = (select max(c.id) from plocation c where c.ids = y.ids and c.idc = y.idc)
AND y.idopstype in (36, 37)
JOIN plocation_log yt
ON yt.idr = y.idr
AND yt.idc= y.idc
AND yt.id = (select max(cr.id) from plocation_log cr where cr.ids = yt.ids and cr.idc = yt.idc)
AND yt.idopstype in (36,37)
WHERE
p.idp IN (10,20,30)
AND p.id = (select max(pr.id) from patent pr where pr.ids = p.ids and pr.idc = p.idc)
AND p.idopstype in (36,37)
Consider joining to aggregate CTEs that calculate the MAX values once per group, instead of a row-wise MAX calculation for every outer query row. Also, be sure to use more informative table aliases instead of the a, b, c or x, y, z style.
WITH loc_max AS
(select ids, idc, max(id) as max_id from plocation group by ids, idc)
, log_max AS
(select ids, idc, max(id) as max_id from plocation_log group by ids, idc)
, pat_max AS
(select ids, idc, max(id) as max_id from patent pr group by ids, idc)
SELECT to_char(pat.pid) AS patentid
, pat.name
, det.dept
, loc.location
FROM patent pat
JOIN pdetails det
ON pat.pid = det.pid
AND det.isactive = 1
JOIN plocation loc
ON loc.idr = pat.idr
AND loc.idc = pat.idc
AND loc.idopstype IN (36, 37)
JOIN loc_max -- ADDED CTE JOIN
ON loc.id = loc_max.max_id
AND loc.ids = loc_max.ids
AND loc.idc = loc_max.idc
JOIN plocation_log log
ON log.idr = loc.idr
AND log.idc = loc.idc
AND log.idopstype in (36,37)
JOIN log_max -- ADDED CTE JOIN
ON log.id = log_max.max_id
AND log.ids = log_max.ids
AND log.idc = log_max.idc
JOIN pat_max -- ADDED CTE JOIN
ON pat.id = pat_max.max_id
AND pat.ids = pat_max.ids
AND pat.idc = pat_max.idc
WHERE pat.idp IN (10, 20, 30)
AND pat.idopstype IN (36, 37)
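One quick check that the aggregate CTE join keeps the same rows as the correlated MAX predicate is to run both against toy data. This sketch uses Python's sqlite3 in place of Oracle, with a hypothetical cut-down plocation table that has only the id, ids, idc and location columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE plocation (id INTEGER, ids INTEGER, idc INTEGER, location TEXT);
    INSERT INTO plocation VALUES
        (1, 10, 1, 'old A'), (2, 10, 1, 'new A'),
        (3, 20, 1, 'only B'),
        (4, 10, 2, 'old C'), (5, 10, 2, 'new C');
    """
)

# Original style: a correlated MAX subquery evaluated for each outer row.
correlated = conn.execute(
    """
    SELECT y.id, y.location FROM plocation y
    WHERE y.id = (SELECT MAX(c.id) FROM plocation c
                  WHERE c.ids = y.ids AND c.idc = y.idc)
    ORDER BY y.id
    """
).fetchall()

# CTE style: compute MAX(id) once per (ids, idc), then join to it.
cte_join = conn.execute(
    """
    WITH loc_max AS (
        SELECT ids, idc, MAX(id) AS max_id FROM plocation GROUP BY ids, idc
    )
    SELECT loc.id, loc.location FROM plocation loc
    JOIN loc_max
      ON loc.id = loc_max.max_id
     AND loc.ids = loc_max.ids
     AND loc.idc = loc_max.idc
    ORDER BY loc.id
    """
).fetchall()

print(correlated)              # [(2, 'new A'), (3, 'only B'), (5, 'new C')]
print(correlated == cte_join)  # True
```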
As commented by The Impaler, one option is to use analytic functions instead of correlated subqueries. The idea is to rank records within subqueries using RANK(), then filter in the outer query (join conditions or WHERE clause). Because the original predicates keep MAX(id), the ranking has to order by id DESC so that rn = 1 is the row with the highest id.
Consider:
SELECT to_char(p.pid) AS patentid,
p.name,
x.dept,
y.location
FROM (SELECT p.*, RANK() OVER(PARTITION BY ids, idc ORDER BY id DESC) rn FROM patent p) p
JOIN pdetails x ON p.pid = x.pid AND x.isactive = 1
JOIN (SELECT y.*, RANK() OVER(PARTITION BY ids, idc ORDER BY id DESC) rn FROM plocation y) y
ON y.idr = p.idr
AND y.idc = p.idc
AND y.idopstype in (36, 37)
AND y.rn = 1
JOIN (SELECT yt.*, RANK() OVER(PARTITION BY ids, idc ORDER BY id DESC) rn FROM plocation_log yt) yt
ON yt.idr = y.idr
AND yt.idc= y.idc
AND yt.idopstype in (36,37)
AND yt.rn = 1
WHERE
p.idp IN (10,20,30)
AND p.idopstype in (36,37)
AND p.rn = 1
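The direction of the ORDER BY inside RANK() decides which row gets rn = 1. Descending id reproduces the MAX(id) predicate, and ascending id picks MIN(id) instead. Here is a small sketch with Python's sqlite3, which needs SQLite 3.25 or later for window functions, using the same hypothetical plocation columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE plocation (id INTEGER, ids INTEGER, idc INTEGER, location TEXT);
    INSERT INTO plocation VALUES
        (1, 10, 1, 'old A'), (2, 10, 1, 'new A'),
        (3, 20, 1, 'only B'),
        (4, 10, 2, 'old C'), (5, 10, 2, 'new C');
    """
)


def top_ranked(direction):
    # direction is a fixed literal ('ASC' or 'DESC'), never user input.
    sql = f"""
        SELECT id, location FROM (
            SELECT y.*,
                   RANK() OVER (PARTITION BY ids, idc ORDER BY id {direction}) AS rn
            FROM plocation y
        ) AS ranked
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


newest = top_ranked("DESC")  # same rows as the MAX(id) predicate
oldest = top_ranked("ASC")   # MIN(id) per group instead

print(newest)  # [(2, 'new A'), (3, 'only B'), (5, 'new C')]
print(oldest)  # [(1, 'old A'), (3, 'only B'), (4, 'old C')]
```

If id can repeat within a group, RANK keeps all tied rows, just as the `= MAX(id)` predicate does. ROW_NUMBER would keep exactly one row per group.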

extra sql query column

I need to select extra columns from another table in my SQL query.
SELECT
d.UnitID,
b.BookingID,
d.ProjectID,
b.ClientName,
(SELECT LetterTypeID
FROM Letters
WHERE ProjectID = 27 AND BookingID = b.BookingID)
FROM
ScheduledDues AS d
INNER JOIN
Booking AS b ON d.BookingID = b.BookingID
INNER JOIN
Units AS u ON d.UnitID = u.UnitID
WHERE
d.ProjectID = 27
AND DueFrom <= GETDATE()
GROUP BY
d.BookingID, d.UnitID, d.ProjectID,
u.UnitNo, b.ClientName
HAVING
SUM(DueTill) = 0
How can I do this and still keep the GROUP BY working? Is selecting LetterTypeID possible?
You need to add LetterTypeID to the GROUP BY, as below. Alternatively, you can fetch the results without the Letters table into a temp table and then join that to Letters to get LetterTypeID.
SELECT d.UnitID,
b.BookingID,
d.ProjectID,
b.ClientName,
l.LetterTypeID
FROM ScheduledDues AS d INNER JOIN
Booking AS b ON d.BookingID = b.BookingID INNER JOIN
Units AS u ON d.UnitID = u.UnitID INNER JOIN
Letters AS l ON b.BookingID=l.BookingID AND l.ProjectID=27
WHERE d.ProjectID = 27 AND
DueFrom <= GETDATE()
GROUP BY b.BookingID,
d.UnitID,
d.ProjectID,
u.UnitNo,
b.ClientName,
l.LetterTypeID
HAVING SUM(DueTill) = 0
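Here is the grouped query with LetterTypeID run against toy data. The sketch uses Python's sqlite3 with cut-down, hypothetical versions of ScheduledDues, Booking and Letters. Units is left out, and date('now') stands in for GETDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Booking (BookingID INTEGER, ClientName TEXT);
    CREATE TABLE ScheduledDues (BookingID INTEGER, UnitID INTEGER, ProjectID INTEGER,
                                DueFrom TEXT, DueTill INTEGER);
    CREATE TABLE Letters (BookingID INTEGER, ProjectID INTEGER, LetterTypeID INTEGER);
    INSERT INTO Booking VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO ScheduledDues VALUES
        (1, 101, 27, '2020-01-01', 0), (1, 101, 27, '2020-02-01', 0),
        (2, 102, 27, '2020-01-01', 500);
    INSERT INTO Letters VALUES (1, 27, 3), (2, 27, 4);
    """
)

rows = conn.execute(
    """
    SELECT d.UnitID, b.BookingID, d.ProjectID, b.ClientName, l.LetterTypeID
    FROM ScheduledDues AS d
    INNER JOIN Booking AS b ON d.BookingID = b.BookingID
    INNER JOIN Letters AS l ON b.BookingID = l.BookingID AND l.ProjectID = 27
    WHERE d.ProjectID = 27 AND d.DueFrom <= date('now')
    -- every non-aggregated column in the SELECT list is also grouped on
    GROUP BY d.UnitID, b.BookingID, d.ProjectID, b.ClientName, l.LetterTypeID
    HAVING SUM(d.DueTill) = 0
    """
).fetchall()

print(rows)  # [(101, 1, 27, 'Ann', 3)]
```

Booking 2 drops out because its dues sum to 500. Since the join to Letters is an INNER JOIN, bookings without any letter would also drop out, and a booking with several letters would produce one row per LetterTypeID.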
Can you try this query?
SELECT BookingID, UnitID, ProjectID,
UnitNo, ClientName, LetterTypeID FROM (SELECT
d.UnitID,
b.BookingID,
d.ProjectID,
u.UnitNo,
b.ClientName,
DueTill,
(SELECT LetterTypeID
FROM Letters
WHERE ProjectID = 27 AND BookingID = b.BookingID) LetterTypeID
FROM
ScheduledDues AS d
INNER JOIN
Booking AS b ON d.BookingID = b.BookingID
INNER JOIN
Units AS u ON d.UnitID = u.UnitID
WHERE
d.ProjectID = 27
AND DueFrom <= GETDATE()) x
GROUP BY
BookingID, UnitID, ProjectID,
UnitNo, ClientName ,LetterTypeID
HAVING
SUM(DueTill) = 0

How can I select the highest review from a user?

I need to select reviews for a product, but unique by user (i.e. one review per user).
With my code, I select all reviews, and I can see several reviews left by the same user.
SELECT
tr.reviewText, tr.reviewDate, tr.reviewRating,
u.userName AS userName,
u.userFirstName AS userFirstName, u.userSurname AS userSurname,
u.countryId AS countryId
FROM
tblReviews tr
INNER JOIN
tblOrderProduct op ON op.orderProductId = tr.orderProductId
AND op.productOptionId IN (SELECT productOptionId
FROM tblProductOption
WHERE productSubCuId = 111
AND productOptionActive = 1)
LEFT JOIN
tblOrder o ON o.orderId = op.orderId
LEFT JOIN
tblUser u ON u.userRandomId = o.userRandomId
WHERE
tr.reviewsStatusId = 2
ORDER BY
tr.reviewRating DESC, tr.reviewDate DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY
Can I get just one review from each user?
Maybe I need to select userId, group the results by userId, and pick one per group? (I tried to do so, but didn't succeed.)
You can use ROW_NUMBER() to number each user's reviews and keep one per user, like below. The ORDER BY / OFFSET ... FETCH paging belongs in the outer query, so that it is applied after one review per user has been picked:
;WITH per_user_one_review
AS
(SELECT tr.reviewText, tr.reviewDate, tr.reviewRating,
u.userName AS userName,
u.userFirstName AS userFirstName, u.userSurname AS userSurname,
u.countryId AS countryId,
ROW_NUMBER() OVER (PARTITION BY u.userRandomId ORDER BY tr.reviewDate DESC) AS rn
FROM tblReviews tr
INNER JOIN tblOrderProduct op
ON op.orderProductId = tr.orderProductId
AND op.productOptionId IN (
SELECT productOptionId FROM tblProductOption
WHERE productSubCuId = 111 AND productOptionActive = 1
)
LEFT JOIN tblOrder o ON o.orderId = op.orderId
LEFT JOIN tblUser u ON u.userRandomId = o.userRandomId
WHERE tr.reviewsStatusId = 2
)
SELECT * FROM per_user_one_review
WHERE rn = 1
ORDER BY reviewRating DESC, reviewDate DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY;
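It will pick the latest review (reviewDate DESC) from each user. To pick each user's highest-rated review instead, use ORDER BY tr.reviewRating DESC, tr.reviewDate DESC inside OVER (...).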
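The ORDER BY inside OVER (...) decides which review survives as rn = 1. This sketch uses Python's sqlite3 (SQLite 3.25+ for ROW_NUMBER) with a single hypothetical `reviews` table that already carries userId. That table stands in for the tblReviews/tblOrder/tblUser joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reviews (reviewId INTEGER, userId TEXT,
                          reviewRating INTEGER, reviewDate TEXT);
    INSERT INTO reviews VALUES
        (1, 'u1', 5, '2023-01-01'),
        (2, 'u1', 3, '2023-02-01'),
        (3, 'u2', 4, '2023-01-15');
    """
)


def one_review_per_user(order_by):
    # order_by is a fixed, trusted ORDER BY list, never user input.
    sql = f"""
        WITH per_user_one_review AS (
            SELECT reviewId, userId,
                   ROW_NUMBER() OVER (PARTITION BY userId ORDER BY {order_by}) AS rn
            FROM reviews
        )
        SELECT reviewId FROM per_user_one_review
        WHERE rn = 1
        ORDER BY userId
    """
    return [row[0] for row in conn.execute(sql)]


latest = one_review_per_user("reviewDate DESC")
highest = one_review_per_user("reviewRating DESC, reviewDate DESC")
print(latest, highest)  # [2, 3] [1, 3]
```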
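If you need the last review, you could instead join to a subquery that finds the max review date grouped by orderProductId.
(As a suggestion, you could also use an INNER JOIN instead of an IN clause based on a subquery.) Note that grouping by orderProductId gives the latest review per ordered product, not per user. For one review per user, the grouping would need to be on the user's key (o.userRandomId from tblOrder).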
select tr.reviewText
, tr.reviewDate
, tr.reviewRating
, u.userName
, u.userFirstName
, u.userSurname
, u.countryId
from tblReviews tr
INNER JOIN (
select max(reviewDate) max_date, orderProductId
from tblReviews
group by orderProductId
) t1 on t1.orderProductId = tr.orderProductId and t1.max_date = tr.reviewDate
INNER JOIN tblOrderProduct op ON op.orderProductId = tr.orderProductId
INNER JOIN (
SELECT productOptionId
FROM tblProductOption
WHERE productSubCuId = 111 AND productOptionActive = 1
) t2 ON op.productOptionId = t2.productOptionId
LEFT JOIN tblOrder o ON o.orderId = op.orderId
LEFT JOIN tblUser u ON u.userRandomId = o.userRandomId
WHERE tr.reviewsStatusId = 2
ORDER BY tr.reviewRating DESC, tr.reviewDate DESC
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY

SQL correct query or not

Given these relationships, how could you query the following?
The tourists (name and email) who booked at least one pension with a rating greater than 9, but didn't book any 3-star hotel with a rating less than 9.
Is the following correct?
SELECT Tourists.name, Tourists.email
FROM Tourists
WHERE EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Pension' AND
AccomodationEstablishments.rating > 9
) AND NOT EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Hotel' AND
AccomodationEstablishments.noOfStars = 3 AND
AccomodationEstablishments.rating < 9
)
I would do this using aggregation and having:
SELECT t.name, t.email
FROM Bookings b INNER JOIN
Tourists t
ON b.touristId = t.id INNER JOIN
AccomodationEstablishments ae
ON b.accEstId = ae.id INNER JOIN
AccomodationTypes a
ON ae.accType = a.id
GROUP BY t.id, t.name, t.email
HAVING SUM(CASE WHEN a.name = 'Pension' AND ae.rating > 9 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN a.name = 'Hotel' AND ae.noOfStars = 3 AND ae.rating < 9 THEN 1 ELSE 0 END) = 0;
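Your method can also work, but the subqueries need to be tied to the outer tourist through t.id (e.g. WHERE Bookings.touristId = Tourists.id, without joining Tourists again inside the subquery). As written, each subquery joins its own copy of Tourists, so it is not correlated with the outer row and evaluates the same way for every tourist.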
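To check that the HAVING version and a correctly correlated EXISTS / NOT EXISTS version agree, here is a sketch using Python's sqlite3 with made-up rows in tables named as in the question (the column types are assumptions). Ana booked a pension rated 9.5 and only a 4-star hotel. Ben also booked a 3-star hotel rated 8. Cai booked only a pension rated 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Tourists (id INTEGER, name TEXT, email TEXT);
    CREATE TABLE AccomodationTypes (id INTEGER, name TEXT);
    CREATE TABLE AccomodationEstablishments (id INTEGER, accType INTEGER,
                                             noOfStars INTEGER, rating REAL);
    CREATE TABLE Bookings (id INTEGER, touristId INTEGER, accEstId INTEGER);
    INSERT INTO AccomodationTypes VALUES (1, 'Pension'), (2, 'Hotel');
    INSERT INTO AccomodationEstablishments VALUES
        (1, 1, NULL, 9.5),  -- pension rated above 9
        (2, 2, 3, 8.0),     -- 3-star hotel rated below 9
        (3, 2, 4, 7.0),     -- 4-star hotel rated below 9 (allowed)
        (4, 1, NULL, 8.0);  -- pension rated 9 or less
    INSERT INTO Tourists VALUES (1, 'Ana', 'ana@example.com'),
                                (2, 'Ben', 'ben@example.com'),
                                (3, 'Cai', 'cai@example.com');
    INSERT INTO Bookings VALUES (1, 1, 1), (2, 1, 3), (3, 2, 1), (4, 2, 2), (5, 3, 4);
    """
)

# Conditional aggregation, grouped on t.id so two tourists sharing a name stay apart.
having_rows = conn.execute(
    """
    SELECT t.name, t.email
    FROM Bookings b
    JOIN Tourists t ON b.touristId = t.id
    JOIN AccomodationEstablishments ae ON b.accEstId = ae.id
    JOIN AccomodationTypes a ON ae.accType = a.id
    GROUP BY t.id, t.name, t.email
    HAVING SUM(CASE WHEN a.name = 'Pension' AND ae.rating > 9 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN a.name = 'Hotel' AND ae.noOfStars = 3 AND ae.rating < 9
                    THEN 1 ELSE 0 END) = 0
    ORDER BY t.id
    """
).fetchall()

# EXISTS / NOT EXISTS, correlated to the outer tourist through b.touristId = t.id.
exists_rows = conn.execute(
    """
    SELECT t.name, t.email
    FROM Tourists t
    WHERE EXISTS (
        SELECT 1 FROM Bookings b
        JOIN AccomodationEstablishments ae ON b.accEstId = ae.id
        JOIN AccomodationTypes a ON ae.accType = a.id
        WHERE b.touristId = t.id AND a.name = 'Pension' AND ae.rating > 9)
    AND NOT EXISTS (
        SELECT 1 FROM Bookings b
        JOIN AccomodationEstablishments ae ON b.accEstId = ae.id
        JOIN AccomodationTypes a ON ae.accType = a.id
        WHERE b.touristId = t.id AND a.name = 'Hotel'
          AND ae.noOfStars = 3 AND ae.rating < 9)
    ORDER BY t.id
    """
).fetchall()

print(having_rows)                 # [('Ana', 'ana@example.com')]
print(exists_rows == having_rows)  # True
```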

sql group by clause error

This is the error:
Msg 8120, Level 16, State 1, Procedure FollowingUpdates, Line 10
Column 'TopicsComplete.TopicCreationDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I get this after adding the two lines below. I need to count rows from a separate table (the number of rows, not the count of topicid) and include that count in the result. Any ideas? Thanks.
,COUNT(DISTINCT MC.topicid) AS NewMessagesCount
LEFT OUTER JOIN Messages AS MC ON MC.TopicId = T.TopicId AND MC.userid = @id
@id int = null
,@UserGroupId int = null
AS
SELECT
*
FROM
(SELECT
ROW_NUMBER()
OVER ( ORDER BY TopicOrder desc
, CASE WHEN M.MessageCreationDate > T.TopicCreationDate
THEN M.MessageCreationDate
ELSE T.TopicCreationDate
END desc )
AS RowNumber,
T.TopicId, T.TopicTitle, T.TopicShortName, T.TopicDescription, T.TopicCreationDate, T.TopicViews, T.TopicReplies, T.UserId, T.TopicTags, T.TopicIsClose, T.TopicOrder, T.LastMessageId, T.UserName, M.MessageCreationDate, T.ReadAccessGroupId, T.PostAccessGroupId, U.UserGroupId, U.UserPhoto, T.UserFullName ,M.UserId AS MessageUserId ,MU.UserName AS MessageUserName
,COUNT(DISTINCT MC.topicid) AS NewMessagesCount
FROM TopicsComplete AS T
LEFT OUTER JOIN Messages AS M ON M.TopicId = T.TopicId AND M.MessageId = T.LastMessageId AND M.Active = 1
LEFT OUTER JOIN Messages AS MC ON MC.TopicId = T.TopicId AND MC.userid = @id
INNER JOIN Users AS U ON U.UserId = T.UserId
LEFT JOIN Users MU ON MU.UserId = M.UserId
WHERE EXISTS
(SELECT * FROM TopicsComplete
LEFT OUTER JOIN Messages AS M ON M.TopicId = T.TopicId AND M.MessageId = T.LastMessageId AND M.Active = 1 INNER JOIN
topicfollows AS TF ON T.TopicId != TF.topicid INNER JOIN
Users AS U ON U.UserId = T.UserId LEFT OUTER JOIN
Users AS MU ON MU.UserId = M.UserId
WHERE (T.UserId = @id)
UNION SELECT * FROM TopicsComplete
LEFT OUTER JOIN Messages AS M ON M.TopicId = T.TopicId AND M.MessageId = T.LastMessageId AND M.Active = 1 INNER JOIN
topicfollows AS TF ON T.TopicId = TF.topicid INNER JOIN
Users AS U ON U.UserId = T.UserId LEFT JOIN
Users MU ON MU.UserId = M.UserId
WHERE (TF.userid = @id)
)
) T
When you have an aggregation function in the select, SQL Server quite reasonably assumes that you want to do an aggregation. All columns not in aggregation functions should then be in the group by clause.
In your case, the select list contains COUNT(DISTINCT MC.topicid) AS NewMessagesCount, so all the other columns in that SELECT must appear in a GROUP BY clause. There is no GROUP BY, but you get the error anyway, because one is required.
You need to have any column not contained in an aggregate (max, min, count, sum, etc.) in the GROUP BY clause.
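There are two ways to get a per-topic row count from Messages without triggering Msg 8120. The first, following the answers above, is to count a non-NULL column of the joined table and group on every other selected column. The second is a correlated COUNT(*) subquery in the select list. That technique is swapped in here rather than taken from the answers, and it needs no GROUP BY at all. Note that COUNT(DISTINCT MC.topicid) can only be 0 or 1 per topic here, because MC is joined on TopicId. The sketch below uses Python's sqlite3 with hypothetical cut-down Topics and Messages tables. SQLite is more lenient than SQL Server about ungrouped columns, so the sketch shows only the corrected forms, not the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Topics (TopicId INTEGER, TopicTitle TEXT);
    CREATE TABLE Messages (MessageId INTEGER, TopicId INTEGER, UserId INTEGER);
    INSERT INTO Topics VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Messages VALUES (1, 1, 7), (2, 1, 7), (3, 1, 8), (4, 2, 7);
    """
)
user_id = 7  # plays the role of the procedure's @id parameter

# LEFT JOIN + GROUP BY every non-aggregated column; COUNT(column) skips the NULLs
# produced for topics with no matching messages.
grouped = conn.execute(
    """
    SELECT T.TopicId, T.TopicTitle,
           COUNT(MC.MessageId) AS NewMessagesCount,
           COUNT(DISTINCT MC.TopicId) AS DistinctTopicCount
    FROM Topics T
    LEFT JOIN Messages MC ON MC.TopicId = T.TopicId AND MC.UserId = ?
    GROUP BY T.TopicId, T.TopicTitle
    ORDER BY T.TopicId
    """,
    (user_id,),
).fetchall()

# Correlated subquery: one COUNT(*) per outer row, no GROUP BY needed.
correlated = conn.execute(
    """
    SELECT T.TopicId, T.TopicTitle,
           (SELECT COUNT(*) FROM Messages MC
            WHERE MC.TopicId = T.TopicId AND MC.UserId = ?) AS NewMessagesCount
    FROM Topics T
    ORDER BY T.TopicId
    """,
    (user_id,),
).fetchall()

print(grouped)     # [(1, 'A', 2, 1), (2, 'B', 1, 1), (3, 'C', 0, 0)]
print(correlated)  # [(1, 'A', 2), (2, 'B', 1), (3, 'C', 0)]
```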