sql how to set a condition with having count - sql

I have this table
codice
cfmoglie
cfmarito
data
numero
invitati
I have to find the couples that are already married.
I'm tryng this one and it works:
SELECT 'moglie' as persona,
matrimonio.cfmoglie as cf,
COUNT(matrimonio.cfmoglie) as già_sposati
FROM matrimonio
GROUP BY matrimonio.cfmoglie
HAVING COUNT(matrimonio.cfmoglie)> 1
UNION
SELECT 'marito' as persona,
matrimonio.cfmarito as cf,
COUNT(matrimonio.cfmarito) as già_sposati
FROM matrimonio
GROUP BY matrimonio.cfmarito
HAVING COUNT(matrimonio.cfmarito)> 1;
Changing cfmarito or cfmoglie I have, for example, 7 records, but I need just the couples that have already married, not the person. How I can solve?

You can use analytic functions to find the rows where the cfmoglie and cfmarito have both previously occurred once or more:
SELECT *
FROM (
SELECT cfmoglie,
cfmarito,
data,
COUNT(*) OVER (PARTITION BY cfmoglie ORDER BY data) AS num_moglie,
COUNT(*) OVER (PARTITION BY cfmarito ORDER BY data) AS num_marito
FROM table_name
)
WHERE num_moglie > 1
AND num_marito > 1
If you want to find couples where at least one partner was already married (rather than both partners) then change AND to OR.
If you want to find couples who are renewing their marriage vows (i.e. the same couple has already married before) then:
SELECT *
FROM (
SELECT cfmoglie,
cfmarito,
data,
COUNT(*) OVER (PARTITION BY cfmoglie, cfmarito ORDER BY data)
AS num_married
FROM table_name
)
WHERE num_married > 1

Related

SQL query to return duplicate rows for certain column, but with unique values for another column

I have written the query shown here that combines three tables and returns rows where the at_ticket_num from appeal_tickets is duplicated but against a different at_sys_ref value
select top 100
t.t_reference, at.at_system_ref, at_ticket_num, a.a_case_ref
from
tickets t, appeal_tickets at, appeals_2 a
where
t.t_reference in ('AB123','AB234') -- filtering on these values so that I can see that its working
and t.t_number = at.at_ticket_num
and at.at_system_ref = a.a_system_ref
and at.at_ticket_num IN (select at_ticket_num
from appeal_tickets
group by at_ticket_num
having count(distinct at_system_ref) > 1)
order by
t.t_reference desc
This is the output:
t_reference at_system_ref at_ticket_num a_case_ref
-------------------------------------------------------
AB123 30838974 23641583 1111979010
AB123 30838976 23641583 1111979010
AB234 30839149 23641520 1111977352
AB234 30839209 23641520 1111988003
I want to modify this so that it only returns records where t_reference is duplicated but against a different a_case_ref. So in above case only records for AB234 would be returned.
Any help would be much appreciated.
You want all ticket appeals that have more than one system reference and more than one case reference it seems. You can join the tables, count the occurrences per ticket and then only keep the tickets that match these criteria.
select *
from
(
select
t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
count(distinct a.a_case_ref) over (partition by at.at_ticket_num) as caserefs
from tickets t
join appeal_tickets at on at.at_ticket_num = t.t_number
join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref;
Correction
It seems that SQL Server still doesn't support COUNT(DISTINCT ...) OVER (...). You can count distinct values in a subquery though. Replace
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
by
(
select count(distinct a2.a_system_ref)
from appeal_tickets at2
join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
where at2.at_ticket_num = t.t_number
) as sysrefs,
An alternative workaround is to use DENSE_RANK in two directions (found here: https://stackoverflow.com/a/53518204/2270762):
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref) +
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc) -
1 as sysrefs,
with data as (
<your query plus one column>,
case when
min() over (partition by t.t_reference)
<>
max() over (partition by t.t_reference)
then 1 end as dup
)
select * from data where dup = 1

How can I count duplicates that fall within a date range? (SQL)

I have a table that contains Applicant ID, Application Date and Job Description.
I am trying to identify duplicates, defined as when the same Applicant ID applies for the same Job Description within 3 days of their other application.
I have already done this for the same date, this way:
CREATE TABLE Duplicates
SELECT
COUNT (ApplicantID) as ApplicantCount
ApplicantID
ApplicationDate
JobDescription
FROM Applications
GROUP BY ApplicantID,ApplicationDate,JobDescription
-
DELETE FROM Duplicates WHERE ApplicantCount <2
SELECT COUNT(*) FROM Duplicates
I'm now trying to make it so it doesn't have to match exactly on the ApplicationDate, but falls within a range. How do you do this?
You can use lead()/lag(). Here is an example that returns the first application when there is a duplicate:
SELECT a.*
FROM (SELECT a.*,
LEAD(ApplicationDate) OVER (PARTITION BY ApplicantID, JobDescription) as next_ad
FROM Applications a
) a
WHERE next_ad <= ApplicationDate + INTERVAL 3 DAY;
You can also phrase this using exists:
select a.*
from applications a
where exists (select 1
from applications a2
where a2.ApplicantID = a.ApplicantID and
a2.JobDescription = a.JobDescription and
a2.ApplicationDate > a.ApplicationDate and
a2.ApplicationDate <= a.ApplicationDate + interval 3 day
);

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID,
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
enter image description here
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id,
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that that that does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Got a error message when I try to find out which patient account have duplicated record.

When I run the script below, I got a error message "Cannot perform an aggregate function on an expression containing an aggregate or a subquery" Please provide some advice. Thanks
SELECT
CONVERT(DECIMAL(18,5),SUM(CASE WHEN PATIENT_ACCOUNT_NO IN (
SELECT PATIENT_ACCOUNT_NO
FROM STND_ENCOUNTER
GROUP BY PATIENT_ACCOUNT_NO
HAVING ( COUNT(PATIENT_ACCOUNT_NO) > 1)) THEN 0 ELSE 1 END)) dupPatNo
FROM [DBO].[STND_ENCOUNTER]
I think the error message is pretty clear. You have a sum() function with a subquery in it (albeit within a case, but that doesn't matter).
It seems that you want to choose patients that have more than one encounter, then add 0 if the patients is in the list and 1 if the patient is not. Hmmm. . . sounds like you want to count the number of patients with only one encounter.
Try using this logic instead:
select count(*)
from (select se.*, count(*) over (partition by PATIENT_ACCOUNT_NO) as NumEncounters
from dbo.stnd_encounter se
) se
where NumEncounters = 1;
As a note, the variable you are assigning is called DupPatientNo. This sounds like the number of patients that have duplicates. In that case, the query is:
select count(distinct PATIENT_ACCOUNT_NO)
from (select se.*, count(*) over (partition by PATIENT_ACCOUNT_NO) as NumEncounters
from dbo.stnd_encounter se
) se
where NumEncounters > 1;
(Or use count(*) if you want the number of encounters on duplicate patients.)
If you want to find number of PATIENT_ACCOUNT_NO that does not have any duplicates then use the following
SELECT COUNT(DISTINCT dupPatNo.PATIENT_ACCOUNT_NO)
FROM (
SELECT PATIENT_ACCOUNT_NO
FROM STND_ENCOUNTER
GROUP BY PATIENT_ACCOUNT_NO
HAVING COUNT(PATIENT_ACCOUNT_NO) = 1
) dupPatNo
If you want to find number of PATIENT_ACCOUNT_NO that have atleast one duplicate then use the following
SELECT COUNT(DISTINCT dupPatNo.PATIENT_ACCOUNT_NO)
FROM (
SELECT PATIENT_ACCOUNT_NO
FROM STND_ENCOUNTER
GROUP BY PATIENT_ACCOUNT_NO
HAVING COUNT(PATIENT_ACCOUNT_NO) > 1
) dupPatNo
Use of DISTINCT will make the query not count same item again and again
Though your query looks for first result, its not clear what you want. Hence giving query for both

I need a sql query to group by name but return other fields based on the most recent entry

I'm querying a single table called PhoneCallNotes. The caller FirstName, LastName and DOB are recorded for each call as well as many other fields including a unique ID for the call (PhoneNoteID) but no unique ID for the caller.
My requirement is to return a list of callers with duplicates removed along with the PhoneNoteID, etc from their most recent entry.
I can get the list of users I want using a Group By on name, DOB and Max(CreatedOn) but how do I include uniqueID (of the most recent entry in the results?)
select O.CallerFName,O.CallerLName,O.CallerDOB,Max(O.CreatedOn)
from [dbo].[PhoneCallNotes] as O
where O.CallerLName like 'Public'
group by O.CallerFName,O.CallerLName,O.CallerDOB order by Max(O.CreatedOn)
Results:
John Public 4/4/2001 4/6/12 16:42
Joe Public 4/12/1988 4/6/12 16:52
John Public 1/2/1950 4/6/12 17:01
Thanks
You can also write what Andrey wrote somewhat more compactly if you select TOP (1) WITH TIES and put the ROW_NUMBER() expression in the ORDER BY clause:
SELECT TOP (1) WITH TIES
CallerFName,
CallerLName,
CallerDOB,
CreatedOn,
PhoneNoteID
FROM [dbo].[PhoneCallNotes]
WHERE CallerLName = 'Public'
ORDER BY ROW_NUMBER() OVER(
PARTITION BY CallerFName, CallerLName, CallerDOB
ORDER BY CreatedOn DESC
)
(By the way, there's no reason to use LIKE for a simple string comparison.)
Try something like that:
;WITH CTE AS (
SELECT
O.CallerFName,
O.CallerLName,
O.CallerDOB,
O.CreatedOn,
PhoneNoteID,
ROW_NUMBER() OVER(PARTITION BY O.CallerFName, O.CallerLName, O.CallerDOB ORDER BY O.CreatedOn DESC) AS rn
FROM [dbo].[PhoneCallNotes] AS O
WHERE
O.CallerLName LIKE 'Public'
)
SELECT
CallerFName,
CallerLName,
CallerDOB,
CreatedOn,
PhoneNoteID
FROM CTE
WHERE rn = 1
ORDER BY
CreatedOn
Assuming that the set of [FirstName, LastName, DateOfBirth] are unique (#shudder#), I believe the following should work, on pretty much every major RDBMS:
SELECT a.callerFName, a.callerLName, a.callerDOB, a.createdOn, a.phoneNoteId
FROM phoneCallNotes as a
LEFT JOIN phoneCallNotes as b
ON b.callerFName = a.callerFName
AND b.callerLName = a.callerLName
AND b.callerDOB = a.callerDOB
AND b.createdOn > a.createdOn
WHERE a.callerLName LIKE 'Public'
AND b.phoneNoteId IS NULL
Basically, the query is looking for every phone-call-note for a particular name/dob combination, where there is not a more-recent row (b is null). If you have two rows with the same create time, you'll get duplicate rows, though.