Why is my SQL COUNT statement counting too many times? - sql

I have a query that is supposed to count how many times a user has logged into two different versions of our software, based on unique session IDs. However, the COUNT in my outer SELECT statement returns far too high a number. For example, I get 31000 sessions for one user, which is incorrect; it should be closer to 40. Why is this happening?
SELECT X.FirstName, X.LastName, X.CompanyName, X.AQ8Sessions, AQ360Sessions = COUNT(RRUI.SessionId)
FROM(
SELECT RRUI.UserId, RRUI.FirstName, RRUI.LastName, RRUI.CompanyName, COUNT(distinct RRUI.SessionId) AQ8Sessions
FROM Authentication.dbo.RegReportUserInfo RRUI
INNER JOIN Authentication.dbo.RegReportSessions RRS
ON RRUI.SessionId = RRS.SessionId
INNER JOIN WebCatalog.Published.People P
ON P.PKey = RRUI.UserId
WHERE RRUI.ClientType = 'aq8' AND RRS.ExpiresAt <= '2013-11-24 23:59:59.999'
AND RRS.ExpiresAt >= '2013-11-18 00:00:00.000' AND RRUI.CompanyName NOT LIKE 'AutoQuotes%'
AND P.EMail NOT LIKE '%@aqnet.com'
GROUP BY RRUI.FirstName, RRUI.LastName, RRUI.CompanyName, RRUI.UserId
) X
INNER JOIN Authentication.dbo.RegReportSessions RRS
ON RRS.UserId = X.UserId
AND RRS.ExpiresAt <= '2013-11-24 23:59:59.999'
AND RRS.ExpiresAt >= '2013-11-18 00:00:00.000'
LEFT OUTER JOIN Authentication.dbo.RegReportUserInfo RRUI
ON X.UserId = RRUI.UserId AND RRUI.ClientType = 'aq360'
GROUP BY X.FirstName, X.LastName, X.CompanyName, X.AQ8Sessions
ORDER BY X.AQ8Sessions DESC, COUNT(RRUI.SessionId) DESC

Hard to say for sure without seeing the data, but I expect one or both of these will fix it:
COUNT(DISTINCT RRUI.SessionId)
and / or
INNER JOIN Authentication.dbo.RegReportUserInfo
where you had
LEFT OUTER JOIN Authentication.dbo.RegReportUserInfo
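One likely cause of the inflated number is join fan-out: the outer query joins both RegReportSessions and RegReportUserInfo on UserId alone, so every session row for a user is paired with every user-info row for that user, and COUNT counts each pairing. The sketch below reproduces the effect in SQLite via Python; the tables `sessions` and `user_info` and their contents are hypothetical stand-ins, not the asker's data, and SQL Server syntax may differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for RegReportSessions and RegReportUserInfo
    CREATE TABLE sessions  (user_id INTEGER, session_id TEXT);
    CREATE TABLE user_info (user_id INTEGER, session_id TEXT);
    INSERT INTO sessions  VALUES (1, 's1'), (1, 's2'), (1, 's3'), (1, 's4');
    INSERT INTO user_info VALUES (1, 's1'), (1, 's2'), (1, 's3');
""")

# Joining on user_id alone pairs each of the 4 session rows with each of
# the 3 user_info rows, producing 12 joined rows for this user.
query = """
    SELECT COUNT(ui.session_id), COUNT(DISTINCT ui.session_id)
    FROM sessions s
    LEFT JOIN user_info ui ON ui.user_id = s.user_id
    WHERE s.user_id = 1
"""
plain_count, distinct_count = conn.execute(query).fetchone()
print(plain_count, distinct_count)
```

In this toy data the plain COUNT sees every joined row, while COUNT(DISTINCT ...) collapses the repeats back to the number of unique session ids.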

Related

SQL (Snowflake) - how can I return one row from a join, or use MAX in a second join on the result of the first?

I have a large query; I have pasted parts of it below.
I want to use the result of the first join in my second join.
What I am trying to do is get the last session that has a lead_conversion, and then get all sessions between that one and the current row.
This is the part I am struggling with:
left join (
select ss.id, ss.session_start, ss.lead_id
from sessions ss
inner join lead_conversions inner_lc on inner_lc.session_id = ss.id
) prev_lc
on prev_lc.lead_id = lc.lead_id
and prev_lc.session_start::TIMESTAMP < s.session_start::TIMESTAMP
left join cte_sessions reset_prev_sess
on reset_prev_sess.lead_id = lc.lead_id
and reset_prev_sess.session_start::TIMESTAMP <= s.session_start::TIMESTAMP
and (
prev_lc.session_start::TIMESTAMP IS NULL
OR
reset_prev_sess.session_start::TIMESTAMP > prev_lc.session_start::TIMESTAMP
)
My issue is that I can't just fetch the last prev_lc, and I can't seem to use max(prev_lc.session_start).
I have tried grouping in the first select and using max, but this does not work, as I believe it is run before the ON clause is applied:
left join (
select max(ss.session_start) as session_start, max(ss.lead_id) as lead_id
from sessions ss
inner join lead_conversions inner_lc on inner_lc.session_id = ss.id
group by inner_lc.id
) prev_lc on prev_lc.lead_id = lc.lead_id
I have also tried using max in the second join, but this gives the error:
SQL compilation error: Invalid aggregate function in ON clause [MAX(CAST(PREV_LC.SESSION_START AS TIMESTAMP_NTZ(9)))]
left join cte_sessions reset_prev_sess
on reset_prev_sess.lead_id = lc.lead_id
and reset_prev_sess.session_start::TIMESTAMP <= s.session_start::TIMESTAMP
and (
prev_lc.session_start::TIMESTAMP IS NULL
OR
reset_prev_sess.session_start::TIMESTAMP > max(prev_lc.session_start::TIMESTAMP)
)
Any help with this would be very much appreciated.
Thank you
If I understand correctly, you want to join with the last session start. You can order by session_start in your subquery and limit it to one record:
left join (
select ss.id, ss.session_start, ss.lead_id
from sessions ss
inner join lead_conversions inner_lc on inner_lc.session_id = ss.id
order by ss.session_start desc
limit 1
) prev_lc
The rest of the query stays untouched.
I found a solution for this, in case anyone comes across it. I ended up rethinking my approach
and adding a row number for each conversion:
with cte_sessions as (
select
s.id
,s.lead_id
,s.session_start::TIMESTAMP as session_start
,CASE WHEN MAX(lc.id) IS NOT NULL
then ROW_NUMBER() over (partition by s.lead_id, (CASE WHEN
MAX(lc.id) IS NOT NULL then 1 else 0 end)
order by s.session_start
)
END as conversion_row
from sessions s
left join lead_conversions lc on lc.session_id = s.id
group by s.id, s.session_start, s.lead_id, s.project_id, s.crawler_id
order by s.session_start
)
Then I just did this in the join:
left join cte_sessions prev_lc on prev_lc.lead_id = lc.lead_id and prev_lc.conversion_row = s.conversion_row - 1
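Since an aggregate is not allowed in an ON clause, another way to express "the latest earlier conversion for the same lead" is a correlated scalar subquery with MAX. This is a different technique from the row-number approach above, shown here only as a sketch in SQLite via Python; the sample rows are invented, and whether Snowflake accepts this exact correlated form has not been verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table and column names follow the question; the rows are made up.
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, lead_id INTEGER, session_start TEXT);
    CREATE TABLE lead_conversions (id INTEGER PRIMARY KEY, session_id INTEGER);
    INSERT INTO sessions VALUES
        (1, 10, '2021-01-01 09:00:00'),
        (2, 10, '2021-01-02 09:00:00'),
        (3, 10, '2021-01-03 09:00:00'),
        (4, 10, '2021-01-04 09:00:00'),
        (5, 20, '2021-01-02 12:00:00');
    -- Sessions 1 and 3 are the ones that converted.
    INSERT INTO lead_conversions VALUES (1, 1), (2, 3);
""")

# For each session, find the start of the most recent converting session
# of the same lead that began strictly earlier (NULL if there is none).
query = """
    SELECT s.id,
           (SELECT MAX(ss.session_start)
              FROM sessions ss
              JOIN lead_conversions c ON c.session_id = ss.id
             WHERE ss.lead_id = s.lead_id
               AND ss.session_start < s.session_start) AS prev_conversion_start
    FROM sessions s
    ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each session gets at most one previous-conversion timestamp, so the result can be used as a lower bound when collecting the sessions since the last conversion.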

Finding user with max(Audit_Date) and date not in range

I'm working on a SQL query to find users whose maximum Audit_Date is not in a given range, meaning they haven't used the system for a long time. I tried it this way:
SELECT DISTINCT
UserID
,max(Audit_Date)
FROM RV_USERS RV_USERS
INNER JOIN RV_AUDIT RV_AUDIT ON
RV_USERS.UserID=RV_AUDIT.UserID
group BY RV_USERS.UserID
AND --it doesn't like the "and" here
not exists(
select *
FROM RV_USERS RV_USERS
INNER JOIN RV_AUDIT RV_AUDIT ON ON RV_USERS.UserID=RV_AUDIT.UserID
where
Audit_Date not between '2019-05-29 00:00:00' and '10/29/2019'
)
I tried to use NOT EXISTS, like the example, but it's not working in this case. I get "incorrect syntax near the keyword 'AND'", right before the NOT EXISTS. I need to make a view out of this, so I think variables and temp tables are out. It's going to be used in a Crystal Report and scheduled in Central Management Console.
Update 1: Tried this per answer:
SELECT DISTINCT
"RV_USERS"."UserID"
,"RV_AUDIT"."Audit_Date"
FROM "RV_USERS" "RV_USERS"
INNER JOIN "RV_AUDIT" "RV_AUDIT" ON "RV_USERS"."UserID"="RV_AUDIT"."UserID"
group BY "RV_USERS"."UserID", "RV_AUDIT"."Audit_Date"
HAVING
max("RV_AUDIT"."Audit_Date") < '2019-05-29 00:00:00'
and
"RV_USERS".UserID='me'
This returns dates in May, even though I have used the system since May. I checked that by removing the max date part; my dates then run up to today.
Update 2: Tried this per other answer:
SELECT DISTINCT
"RV_USERS"."UserID"
,max("RV_AUDIT"."Audit_Date")
FROM "RV_USERS" "RV_USERS"
INNER JOIN "RV_AUDIT" "RV_AUDIT" ON "RV_USERS"."UserID"="RV_AUDIT"."UserID"
WHERE
not exists(
select *
FROM "RV_USERS" u2
INNER JOIN "RV_AUDIT" a2 ON u2."UserID"=a2."UserID"
where
a2."Audit_Date" not between '2019-05-27 00:00:00' and '10/31/2019'
)
group BY "RV_USERS"."UserID"
This is not returning anything, but we know there are managers who haven't used the system.
Update 3: Tried this per answer:
SELECT DISTINCT
u."UserID"
,max(a."Audit_Date")
FROM "RV_USERS" u
INNER JOIN "RV_AUDIT" a ON u."UserID"=a."UserID"
WHERE
u.UserID not in(
select u2.UserID
FROM "RV_USERS" u2
INNER JOIN "RV_AUDIT" a2 ON u2."UserID"=a2."UserID"
where
a2."Audit_Date" between '2019-05-27 00:00:00' and '10/31/2019'
)
group BY u."UserID"
You have more than one mistake. Here is the corrected code:
SELECT DISTINCT
RV_U.UserID
, max(RV_A.Audit_Date)
FROM RV_USERS RV_U
INNER JOIN RV_AUDIT RV_A ON RV_U.UserID=RV_A.UserID
WHERE
NOT EXISTS(
SELECT *
FROM RV_USERS RV_USERS
INNER JOIN RV_AUDIT RV_AUDIT ON RV_USERS.UserID=RV_AUDIT.UserID
WHERE Audit_Date not between '2019-05-29 00:00:00' and '10/29/2019'
)
GROUP BY RV_U.UserID;
Your line INNER JOIN RV_AUDIT RV_AUDIT ON ON RV_USERS.UserID=RV_AUDIT.UserID has ON two times.
GROUP BY should go at the end.
AND should be replaced with WHERE.
Try to implement these changes.
I would suggest a having clause:
SELECT RV_USERS.UserID, max(Audit_Date)
FROM RV_USERS RV_USERS INNER JOIN
RV_AUDIT RV_AUDIT
ON RV_USERS.UserID = RV_AUDIT.UserID
GROUP BY RV_USERS.UserID
HAVING max(Audit_Date) < '2019-05-29';
Your query seems much more complicated than necessary.
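A small sketch of the HAVING approach, run in SQLite via Python with invented rows (the user ids 'active' and 'dormant' are hypothetical). It also shows why adding Audit_Date to the GROUP BY, as in Update 1 of the question, returns old rows for users who are still active: each audit row then forms its own group, so MAX is evaluated per row instead of per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table and column names follow the question; the rows are made up.
    CREATE TABLE RV_USERS (UserID TEXT PRIMARY KEY);
    CREATE TABLE RV_AUDIT (UserID TEXT, Audit_Date TEXT);
    INSERT INTO RV_USERS VALUES ('active'), ('dormant');
    INSERT INTO RV_AUDIT VALUES
        ('active',  '2019-04-01 08:00:00'),
        ('active',  '2019-10-15 08:00:00'),
        ('dormant', '2019-03-10 08:00:00'),
        ('dormant', '2019-05-01 08:00:00');
""")

# One group per user: keeps only users whose latest audit predates the cutoff.
per_user = conn.execute("""
    SELECT u.UserID, MAX(a.Audit_Date)
    FROM RV_USERS u
    JOIN RV_AUDIT a ON a.UserID = u.UserID
    GROUP BY u.UserID
    HAVING MAX(a.Audit_Date) < '2019-05-29'
    ORDER BY u.UserID
""").fetchall()

# One group per (user, date): every old row passes, even for active users.
per_user_and_date = conn.execute("""
    SELECT u.UserID, a.Audit_Date
    FROM RV_USERS u
    JOIN RV_AUDIT a ON a.UserID = u.UserID
    GROUP BY u.UserID, a.Audit_Date
    HAVING MAX(a.Audit_Date) < '2019-05-29'
    ORDER BY u.UserID, a.Audit_Date
""").fetchall()

print(per_user)
print(per_user_and_date)
```

Note that an inner join only returns users who have at least one audit row; users with no audit history at all would need a LEFT JOIN or a separate check.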
Replace AND with a WHERE clause and move the GROUP BY to the end; that will at least make the above query run.

SQL sum aggregate function gives wrong results

My data is given below.
The correct sum is 601,050.00,
but the SQL SUM aggregate function gives me the wrong answer, 5078150.00000.
15,000.00 27,950.00 24,750.00 11,550.00 7,400.00 7,500.00 14,650.00 12,500.00 32,800.00 35,700.00 94,100.00 10,100.00 19,700.00 22,100.00 35,450.00 28,050.00 50,150.00 69,750.00 13,800.00 3,600.00 18,600.00 2,350.00 7,200.00 21,600.00 7,700.00 4,500.00 2,500.00
select sum(SO_SalesOrder.OrderTotal),l.Name as [Store Name]
From SO_SalesOrder inner join BASE_Location l on
SO_SalesOrder.LocationId = l.LocationId
inner join SO_SalesOrder_Line on SO_SalesOrder.SalesOrderId =
SO_SalesOrder_Line.SalesOrderId
inner join BASE_Product on BASE_Product.ProdId =
SO_SalesOrder_Line.ProdId
inner join BASE_Category on BASE_Category.CategoryId =
BASE_Product.CategoryId
where SO_SalesOrder.OrderDate >= '2018-02-01' and
SO_SalesOrder.OrderDate <= '2018-02-28' and BASE_Category.Name = '1MHNZ'
group by l.Name
There is likely a problem with one (or more) of your joins: maybe you have duplicate rows, or the join conditions are not right.
Remove the group by l.Name and the SUM() aggregate, and see if the returned values for SO_SalesOrder.OrderTotal are what you expect them to be (you might need to filter on a particular l.Name in a WHERE clause). It's very likely you will see duplicate amounts, or amounts you did not consider when arriving at the value 601,050.00.
If so, try joining the tables one by one and see which join makes your rows multiply.
In my opinion, your problem lies in the logic of the query.
You have a master-detail relationship between SO_SalesOrder and SO_SalesOrder_Line, joined on the SalesOrderId column.
So if you have three lines in your order, you will sum the same OrderTotal three times.
Try something like this:
select sum(SO_SalesOrder.OrderTotal) Total, l.Name as [Store Name]
From SO_SalesOrder
join BASE_Location l on SO_SalesOrder.LocationId = l.LocationId
where SO_SalesOrder.OrderDate >= '2018-02-01' and SO_SalesOrder.OrderDate <= '2018-02-28'
and exists (
select 0 x
From SO_SalesOrder_Line
join BASE_Product on BASE_Product.ProdId = SO_SalesOrder_Line.ProdId
join BASE_Category on BASE_Category.CategoryId = BASE_Product.CategoryId
where BASE_Category.Name = '1MHNZ'
and SO_SalesOrder_Line.SalesOrderId = SO_SalesOrder.SalesOrderId
)
group by l.Name
P.S.
Also check the date columns: if they contain a time portion as well, you should reconsider your upper-bound filter.
I suggest using SO_SalesOrder.OrderDate < '2018-03-01' instead of <= '2018-02-28'.
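The master-detail duplication described above can be reproduced with a toy example. This is a sketch in SQLite via Python; the tables `orders` and `order_lines` and their values are hypothetical simplifications of SO_SalesOrder and SO_SalesOrder_Line, with the category stored directly on the line to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the order header and line tables
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, store TEXT, order_total REAL);
    CREATE TABLE order_lines (order_id INTEGER, category TEXT);
    INSERT INTO orders VALUES (1, 'Main', 100.0), (2, 'Main', 50.0);
    INSERT INTO order_lines VALUES
        (1, '1MHNZ'), (1, '1MHNZ'), (1, '1MHNZ'),
        (2, '1MHNZ'), (2, 'OTHER');
""")

# Joining to the detail table repeats the header total once per matching line.
joined_total = conn.execute("""
    SELECT SUM(o.order_total)
    FROM orders o
    JOIN order_lines l ON l.order_id = o.order_id
    WHERE l.category = '1MHNZ'
""").fetchone()[0]

# EXISTS only tests for a matching line, so each order is summed once.
exists_total = conn.execute("""
    SELECT SUM(o.order_total)
    FROM orders o
    WHERE EXISTS (
        SELECT 1 FROM order_lines l
        WHERE l.order_id = o.order_id AND l.category = '1MHNZ'
    )
""").fetchone()[0]

print(joined_total, exists_total)
```

Order 1 has three matching lines, so the join counts its total three times; the EXISTS version counts it once.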

Adding an extra statement to existing SQL

I am trying to modify an SQL statement that returns the number of Incidents logged by a user. The current statement is -
SELECT
USERS.NAME,
Count(INCIDENTS_H.SERVICEREQNO)
FROM Sostenuto.sunrise.INCIDENTS_H INCIDENTS_H
INNER JOIN Sostenuto.sunrise.USERS USERS
ON INCIDENTS_H.OWNERACCOUNT = USERS.SERVICEREQNO
WHERE (INCIDENTS_H.ADDEDDATE >= {ts '2013-11-25 00:00:00'})
AND (INCIDENTS_H.OPERATIONID = 102005166)
AND (INCIDENTS_H.OWNERGROUP = 123000012
OR INCIDENTS_H.OWNERGROUP=123000031
OR INCIDENTS_H.OWNERGROUP=123000047)
AND (INCIDENTS_H.ADDEDBY=INCIDENTS_H.OWNERACCOUNT)
GROUP BY USERS.NAME
Which works fine. However, I need to add another clause to the statement, involving a different table. I also need to include -
INCIDENTS.ADDEDBY = INCIDENTS_H.OWNERACCOUNT
However I am struggling to modify the original statement to include this. Can anyone give me any pointers?
SELECT
USERS.NAME,
Count(INCIDENTS_H.SERVICEREQNO)
FROM Sostenuto.sunrise.INCIDENTS_H INCIDENTS_H
INNER JOIN Sostenuto.sunrise.USERS USERS
ON INCIDENTS_H.OWNERACCOUNT = USERS.SERVICEREQNO
INNER JOIN INCIDENTS I
ON I.ADDEDBY = INCIDENTS_H.OWNERACCOUNT
WHERE (INCIDENTS_H.ADDEDDATE >= {ts '2013-11-25 00:00:00'})
AND (INCIDENTS_H.OPERATIONID = 102005166)
AND (INCIDENTS_H.OWNERGROUP = 123000012
OR INCIDENTS_H.OWNERGROUP=123000031
OR INCIDENTS_H.OWNERGROUP=123000047)
AND (INCIDENTS_H.ADDEDBY=INCIDENTS_H.OWNERACCOUNT)
GROUP BY USERS.NAME
try:
SELECT u.NAME, Count(h.SERVICEREQNO)
FROM Sostenuto.sunrise.INCIDENTS_H h
Join Sostenuto.sunrise.USERS u
ON h.OWNERACCOUNT = u.SERVICEREQNO
Join Incidents i
On i.ADDEDBY = h.OWNERACCOUNT
WHERE (h.ADDEDDATE>={ts '2013-11-25 00:00:00'})
AND (h.OPERATIONID=102005166)
AND (h.OWNERGROUP=123000012 OR h.OWNERGROUP=123000031 OR h.OWNERGROUP=123000047)
AND (h.ADDEDBY=h.OWNERACCOUNT)
GROUP BY u.NAME

Adding new table to existing query

I have this existing query that works fine:
SELECT data_tool.name as tool,
MIN(data_cst.date_time) "start",
MAX(data_cst.date_time) "end",
data_cst.recipe_id,
data_target.name as target,
data_lot.name as lot,
data_wafer.name as wafer,
data_measparams.name as mp
FROM data_cst
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
WHERE data_target.id IN (130, 539)
AND data_cst.date_time BETWEEN '2010-01-11 00:00:00' AND '2013-01-11 23:59:59'
AND data_cst.tool_id IN (14,16)
GROUP BY wafer_id, data_cst.lot_id, data_file_id, target_name_id
HAVING count(*) < 100
ORDER BY start, tool
Now I need to add something to it. I have another table called event_message_idx that has columns recipe_id, lot_id, tool_id, date_time, and message_idx.
I need to find out how many rows in that table have message_idx = 'OM' and how many have message_idx = 'SEM', joined with the above query on recipe_id, lot_id, and tool_id, and with date_time between start and end.
I have not been able to figure out how to do this in one query (which I'd really prefer to a subquery, as these tables are very large and subquery performance has been poor in the past on this system).
I've been playing around with a left join like this:
SELECT data_tool.name as tool,
MIN(data_cst.date_time) "start",
MAX(data_cst.date_time) "end",
data_cst.recipe_id,
data_target.name as target,
data_lot.name as lot,
data_wafer.name as wafer,
data_measparams.name as mp,
event_message_idx.message_idx,
COUNT(event_message_idx.message_idx)
FROM data_cst
LEFT JOIN event_message_idx
ON event_message_idx.recipe_id = data_cst.recipe_id
AND event_message_idx.message_idx IN ('OM', 'SEM')
AND event_message_idx.lot_id = data_cst.lot_id
AND event_message_idx.tool_id = data_cst.tool_id
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
WHERE data_target.id IN (130, 539)
AND data_cst.date_time BETWEEN '2010-01-11 00:00:00' AND '2013-01-11 23:59:59'
AND data_cst.tool_id IN (14,16)
GROUP BY wafer_id, data_cst.lot_id, data_file_id, target_name_id,
event_message_idx.message_idx
HAVING count(*) < 100
ORDER BY start, tool
But there are 2 issues here:
1. I get double the number of rows I want, one for OM and one for SEM. I don't want that; I just want to know how many OM and SEM rows there are (really I just want to know whether there are 0 or more than 0; the actual count doesn't matter).
2. I am not taking the date range into account. I only want to count rows from event_message_idx that are between start and end, and I can't figure out how to do that.
Is this possible? I'm thinking it's not and I'll have to use 2 queries (which will really complicate the app), or a subquery (which I'm also struggling to write).
Calculate those in a subquery, e.g.:
SELECT data_tool.name as tool,
MIN(data_cst.date_time) "start",
MAX(data_cst.date_time) "end",
data_cst.recipe_id,
data_target.name as target,
data_lot.name as lot,
data_wafer.name as wafer,
data_measparams.name as mp,
COALESCE(a.totalOM, 0) totalOM,
COALESCE(a.totalSEM, 0) totalSEM
FROM data_cst
INNER JOIN data_tool
ON data_tool.id = data_cst.tool_id
INNER JOIN data_target
ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot
ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer
ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams
ON data_measparams.id = data_cst.meas_params_name_id
LEFT JOIN
(
SELECT recipe_id, lot_id, tool_id,
SUM(CASE WHEN message_idx = 'OM' THEN 1 ELSE 0 END) totalOM,
SUM(CASE WHEN message_idx = 'SEM' THEN 1 ELSE 0 END) totalSEM
FROM event_message_idx
WHERE date_time BETWEEN '2010-01-11 00:00:00' AND '2013-01-11 23:59:59'
GROUP BY recipe_id, lot_id, tool_id
) a ON data_cst.recipe_id = a.recipe_id AND
data_cst.lot_id = a.lot_id AND
data_cst.tool_id = a.tool_id
WHERE data_target.id IN (130, 539) AND
(data_cst.date_time BETWEEN '2010-01-11 00:00:00' AND '2013-01-11 23:59:59') AND
data_cst.tool_id IN (14,16)
GROUP BY wafer_id, data_cst.lot_id, data_file_id, target_name_id
HAVING COUNT(*) < 100
ORDER BY `start`, tool
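The core of the answer is conditional aggregation inside a derived table: the event rows are collapsed to one row per (recipe_id, lot_id, tool_id) before the join, so the OM and SEM counts arrive as two columns and do not double the outer rows. Below is a minimal sketch in SQLite via Python; the `measurements` table is a hypothetical, cut-down stand-in for data_cst, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'measurements' is a hypothetical stand-in for data_cst
    CREATE TABLE measurements (id INTEGER PRIMARY KEY, recipe_id INTEGER,
                               lot_id INTEGER, tool_id INTEGER);
    CREATE TABLE event_message_idx (recipe_id INTEGER, lot_id INTEGER,
                                    tool_id INTEGER, message_idx TEXT);
    INSERT INTO measurements VALUES (1, 7, 100, 14), (2, 7, 100, 14), (3, 8, 200, 16);
    INSERT INTO event_message_idx VALUES
        (7, 100, 14, 'OM'), (7, 100, 14, 'OM'), (7, 100, 14, 'SEM');
""")

query = """
    SELECT m.recipe_id, m.lot_id, m.tool_id,
           COUNT(*)                 AS n_measurements,
           COALESCE(a.totalOM, 0)   AS totalOM,
           COALESCE(a.totalSEM, 0)  AS totalSEM
    FROM measurements m
    LEFT JOIN (
        -- One row per key, with OM and SEM counted into separate columns
        SELECT recipe_id, lot_id, tool_id,
               SUM(CASE WHEN message_idx = 'OM'  THEN 1 ELSE 0 END) AS totalOM,
               SUM(CASE WHEN message_idx = 'SEM' THEN 1 ELSE 0 END) AS totalSEM
        FROM event_message_idx
        GROUP BY recipe_id, lot_id, tool_id
    ) a ON a.recipe_id = m.recipe_id
       AND a.lot_id    = m.lot_id
       AND a.tool_id   = m.tool_id
    GROUP BY m.recipe_id, m.lot_id, m.tool_id, a.totalOM, a.totalSEM
    ORDER BY m.recipe_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the derived table has at most one row per key, the outer COUNT(*) still reflects only the measurement rows, and keys with no events fall back to 0 through COALESCE. Restricting events to each group's own start and end, rather than a fixed date range, would need a further step that this sketch does not cover.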