Calculate number of distinct instances of value in column - sql

long time lurker. I've searched and searched though none of the solutions work for me.
I'm working in a Sybase (ASE) db (most mssql/mysql transactional db solutions will work just fine)
In my example, I'm trying to calculate/count the number of times a specific 'party_id' is listed in a column. The problem I'm having is that it's only counting FOR each row- so of course the count is always going to be 1.
See output:
(I would like for party_id 130568 to show '2' in the refs column, 125555 to show '5', etc.)
output
Here is my query:
select
count(distinct p.party_id) as refs,
p.party_id,
sp_first_party(casenum),
c.casenum,
mld.mailing_list,
p.our_client
from cases c
inner join party p on c.casenum=p.case_id
inner join names n on n.names_id=p.party_id
inner join mailing_list_defined mld on n.names_id=mld.names_id
where
mld.mailing_list like 'Mattar Stars'
and mld.addr_type like 'Home'
and n.deceased='N'
and p.our_client='Y'
group by p.party_id, c.casenum, mld.mailing_list, p.our_client
order by sp_first_party(casenum) asc
Any tips would be greatly appreciated.
Thank you

Sounds like you need to be using an APPLY statement. Not sure if the join criteria on the APPLY statement is correct, but you should be able to extrapolate the logic. See if that will work with Sybase.
SELECT pic.PartyInstanceCount AS refs
,p.party_id
,sp_first_party(casenum)
,c.casenum
,mld.mailing_list
,p.our_client
FROM cases AS c
INNER JOIN party AS p ON c.casenum = p.case_id
INNER JOIN names AS n ON n.names_id = p.party_id
INNER JOIN mailing_list_defined AS mld ON n.names_id = mld.names_id
OUTER APPLY (
SELECT COUNT(1) AS PartyInstanceCount
FROM party p2
WHERE p2.case_id = c.casenum
) pic
WHERE mld.mailing_list LIKE 'Mattar Stars'
AND mld.addr_type LIKE 'Home'
AND n.deceased = 'N'
AND p.our_client = 'Y'
ORDER BY
sp_first_party(casenum) ASC

Related

How to filter where a condition is true at least once

I need to filter down to only service orders that have a "service" work group value in at least one of their tasks. However, I don't want to get rid of the rows that aren't work group = "Service" if at least one of the task rows has that value. The end result would leave out all data from service orders that didn't have at least one BI_WRKFLW_TASK_KEY that was equal to "SERVICE". I know how to do normal filters but getting it to this specificity is beyond my current experience.
I've experimented with normal filters but they leave out rows that are a part of the same Service Order but just don't have that work group.
SELECT W.BI_WRKFLW_KEY,
T.BI_WORK_EVENT_CD,
T.BI_TASK_CD,
T.BI_WORKGRP,
**M.BI_SO_NBR**,
M.BI_SO_TYPE_CD,
M.BI_CLOSE_DT,
M.BI_OPEN_DT,
M.BI_SO_STAT_CD,
R.BI_WRKFLW_TMPLT_NM,
T.BI_WRKFLW_TASK_SEQ_NBR,
T.BI_WORKGRP,
A.BI_WORK_EVENT_CD,
A.BI_EVENT_DT_TM,
A.SY_JOB_QUEUE_ID,
**A.BI_WORKGRP**,
A.SY_USER_ID,
**A.BI_WRKFLW_TASK_KEY**
FROM BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_SO_MASTER M ON D.BI_SO_NBR = M.BI_SO_NBR
LEFT JOIN BI_WRKFLW_TMPLT_REF R ON W.BI_WRKFLW_TMPLT_ID = R.BI_WRKFLW_TMPLT_ID
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEY
WHERE M.BI_OPEN_DT >= ADD_MONTHS(CURRENT_DATE, -'12')
--AND M.BI_SO_TYPE_CD IN ('IVC-NEW1')
--AND M.BI_SO_STAT_CD LIKE 'O'
ORDER BY M.BI_SO_NBR, T.BI_EVENT_DT_TM
Any Service order row where the Service order has at least one BI_WRKFLOW_TASK_CD = "Service" would be kept and all other service orders filtered out.
I tried to map this out, i may not have got it quite right,
I think you are asking for BI_SO_MASTER records that have >=1 BI_WRKFLW_TASKS that belong to a certain group.
Try using a CTE to get the detail rows with a correct task, then you can find the SO population... then you can ???not sure what the ultimate result set goal is?
;with matchingTasks as ( D.BI_SO_NBR, D.<id> , W.BI_WRKFLW_KEY , T.<key> , A.Key
from BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEYW
Where
<good dates>
and <A.field is what I am looking for>
)
/*Here you have the SO population
as well as the ids that helped this SO qualify.
*/
, My_SO_Population as (select Distinct BI_SO_NBR from matchingTasks )
/*now you can go get what you need.
the challenge of finding SOs w/ >=1 matching task has been solved...
*/
select <necessary fields> from
My_SO_Population
join <whatever you need....this is where i am cloudy>
if i am missing the goal, let me know where...
You can just add this to your WHERE clause:
AND T.BI_WRKFLW_KEY IN (
SELECT BI_WRKFLW_KEY
FROM BI_WRKFLW_TASKS
WHERE BI_WRKFLOW_TASK_CD = 'Service')

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON v.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
Just remove GROUP BY Clause
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

Sum(IIF( including results with 0 count

Hey All i am using sum iff to return a count based on multiple criteria.
i am basically running a report on calls recieved per site, however i need sites with 0 calls included in the result set, with the value of 0 or even Null, if they have no calls for that week.
only issue is that my where cluase has only included sites that have had calls in the week
Any ideas.
Code:
SELECT
d.sitename,
count(c.Chargeablecalls) AS All_Calls,
SUM(IIf(c.ChargeableCalls Like "Chargeable",1,0)) AS Chargeable_calls,
d.sitetype
FROM
(Callstatus AS s LEFT JOIN statusconversion AS c ON s.description=c.reportheading)
INNER JOIN sitedetails AS d ON s.zone=d.zone
WHERE s.date_loaded BETWEEN
(SELECT reportdate FROM reportMonth) AND (SELECT priorweek FROM reportMonth)
GROUP BY d.sitename, d.sitetype;
You need a RIGHT JOIN for sitedetails in order to get all the sites even those with no calls.
You may need to do the first half of query separately and then use that query in the main query.
create a new query - qryCallStatus:
SELECT DISTINCT zone, description
FROM Callstatus, reportMonth
WHERE
Callstatus.date_loaded BETWEEN reportMonth.reportdate AND reportMonth.priorweek;
Then change your output query to:
SELECT
d.sitename,
count(c.Chargeablecalls) AS All_Calls,
SUM(IIf(c.ChargeableCalls Like "Chargeable",1,0)) AS Chargeable_calls,
d.sitetype
FROM
(sitedetails AS d LEFT JOIN qryCallStatus AS s ON d.zone=s.zone)
LEFT JOIN statusconversion AS c ON s.description=c.reportheading
GROUP BY d.sitename, d.sitetype;

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)

Sorting rows by count of a many-to-many associated record

I know there are a lot of other SO entries that seem like this one, but I haven't found one that actually answers my question so hopefully one of you can either answer it or point me to another SO question that is related.
Basically, I have the following query that returns Venues that have any CheckIns that contain the searched Keyword ("foobar" in this example).
SELECT DISTINCT v.*
FROM "venues" v
INNER JOIN "check_ins" c ON c."venue_id" = v."id"
INNER JOIN "keywordings" ks ON ks."check_in_id" = c."id"
INNER JOIN "keywords" k ON ks."keyword_id" = k."id"
WHERE (k."name" = 'foobar')
I want to SELECT and ORDER BY the count of the matched Keyword for each given Venue. E.g. if there have been 5 CheckIns that have been created, associated with that Keyword, then there should be a returned column (called something like keyword_count) with the value 5 which is sorted.
Ideally this should be done without any queries in the SELECT clause, or preferably none at all.
I've been struggling with this for a while and my mind is just going blank (perhaps it's been too long a day) so some help would be greatly appreciated here.
Thanks in advance!
Sounds like you need something like:
SELECT v.x, v.y, count(*) AS keyword_count
FROM "venues" v
INNER JOIN "check_ins" c ON c."venue_id" = v."id"
INNER JOIN "keywordings" ks ON ks."check_in_id" = c."id"
INNER JOIN "keywords" k ON ks."keyword_id" = k."id"
WHERE (k."name" = 'foobar')
GROUP BY v.x, v.y
ORDER BY 3