Finding duplicate names in SQL

SELECT COUNT(organization.ID)
FROM organization
WHERE name in ( SELECT name FROM organization GROUP BY name HAVING count( name ) >1 )
AND organization.APPROVED=0
AND organization.CREATED_AT>'2010-07-30 10:30:21'
I'm trying to find duplicates, but this query takes a very long time, roughly 5-6 seconds. Is there another way to find duplicates without using this method? Thanks.
Timing: the subquery alone takes 0.28 seconds; the whole query takes 5.98 seconds.

SELECT organization.name, COUNT(organization.ID)
FROM organization
WHERE organization.APPROVED=0
AND organization.CREATED_AT>'2010-07-30 10:30:21'
GROUP BY name
HAVING count(organization.id) > 1;

There is no need to use the subquery in the WHERE clause. You can use a GROUP BY and a HAVING clause to accomplish this:
SELECT COUNT(o.ID)
FROM organization o
WHERE o.APPROVED=0
AND o.CREATED_AT>'2010-07-30 10:30:21'
GROUP BY o.Name
HAVING COUNT(o.name) > 1

Why not just do it like:
SELECT COUNT(organization.ID)
FROM organization
WHERE organization.APPROVED=0
AND organization.CREATED_AT>'2010-07-30 10:30:21'
GROUP BY organization.name
HAVING count(organization.name) > 1;
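The GROUP BY / HAVING rewrite can be tried on a toy table. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for the asker's database, with made-up rows. It also shows one thing worth checking before swapping queries: the grouped form only counts duplicates among the filtered rows, while the original IN subquery looks for duplicate names across the whole table, so the two can return different results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organization ("
    "id INTEGER PRIMARY KEY, name TEXT, approved INTEGER, created_at TEXT)"
)
# Hypothetical sample rows, not taken from the question.
rows = [
    (1, "Acme", 0, "2010-08-01 09:00:00"),
    (2, "Acme", 0, "2010-08-02 09:00:00"),
    (3, "Globex", 0, "2010-08-03 09:00:00"),
    (4, "Globex", 1, "2010-06-01 09:00:00"),  # approved and older than the cutoff
    (5, "Initech", 0, "2010-08-04 09:00:00"),
]
conn.executemany("INSERT INTO organization VALUES (?, ?, ?, ?)", rows)

# Grouped form from the answers: duplicates among the filtered rows only.
grouped = conn.execute(
    """
    SELECT name, COUNT(id)
    FROM organization
    WHERE approved = 0 AND created_at > '2010-07-30 10:30:21'
    GROUP BY name
    HAVING COUNT(id) > 1
    ORDER BY name
    """
).fetchall()

# Original form from the question: names duplicated anywhere in the table.
original = conn.execute(
    """
    SELECT COUNT(id)
    FROM organization
    WHERE name IN (SELECT name FROM organization
                   GROUP BY name HAVING COUNT(name) > 1)
      AND approved = 0 AND created_at > '2010-07-30 10:30:21'
    """
).fetchone()[0]

print(grouped)   # one row per duplicated name within the filter
print(original)  # a single count of matching rows
```

In this sample, "Globex" is duplicated only when the approved, older row is included, so the original query counts it and the grouped query does not.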

Related

SQL Help - Think I need a subquery

I have written the following query.
SELECT pro.[Id], COUNT(*) AS Count
FROM {Task} tsk
JOIN {profile} pro ON tsk.[ProfileId]=pro.[Id]
GROUP BY
pro.[Id]
HAVING
COUNT(*) > 1
This returns the records I am interested in, but only as counts per ID:
ID Count
12345 3
21254 2
25458 2
I now need to take it a stage further, and I think I need to use the query above within another query to get what I need.
I basically need to see the underlying data behind each count, e.g. the task number. So the end result would look something like this, based on the above example:
ID Count
12345 123-345
12345 135-564
12345 136-985
21254 124-856
21254 135-854
25458 214-854
25458 365-850
Am I correct in thinking I need a subquery to do this, and how would I go about it?
Thanks

You could go with a CTE: count, filter, then join back to the task table:
WITH CTE AS (
SELECT pro.[Id], COUNT(*) AS Count
FROM {Task} tsk
JOIN {profile} pro ON tsk.[ProfileId]=pro.[Id]
GROUP BY
pro.[Id]
HAVING
COUNT(*) > 1
)
SELECT tsk.[Id], tsk.[ProfileId] FROM CTE
JOIN {Task} tsk ON CTE.[Id] = tsk.[ProfileId]
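The count, filter, then join pattern can be checked on sample data. This is a minimal sketch, assuming SQLite through Python's sqlite3 module. The plain table names `task` and `profile` and the `task_number` column are assumptions for illustration (the question only mentions a "task-number"), and the extra profile 99999 is made up to show the filter working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profile (id INTEGER PRIMARY KEY);
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        profile_id INTEGER,
        task_number TEXT  -- hypothetical column name
    );
    INSERT INTO profile VALUES (12345), (21254), (99999);
    INSERT INTO task (profile_id, task_number) VALUES
        (12345, '123-345'), (12345, '135-564'), (12345, '136-985'),
        (21254, '124-856'), (21254, '135-854'),
        (99999, '777-000');  -- only one task, so it should be filtered out
    """
)

details = conn.execute(
    """
    WITH multi AS (
        -- Step 1 and 2: count tasks per profile, keep profiles with more than one.
        SELECT pro.id
        FROM task tsk
        JOIN profile pro ON tsk.profile_id = pro.id
        GROUP BY pro.id
        HAVING COUNT(*) > 1
    )
    -- Step 3: join back to the task table to list the underlying rows.
    SELECT tsk.profile_id, tsk.task_number
    FROM multi
    JOIN task tsk ON multi.id = tsk.profile_id
    ORDER BY tsk.profile_id, tsk.task_number
    """
).fetchall()

for profile_id, task_number in details:
    print(profile_id, task_number)
```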

Is there a way to use DISTINCT and COUNT(*) together to bulletproof your code against DUPLICATE entries?

I got help with a function yesterday to correctly get the count of multiple items in a column based on multiple criteria/columns. Now I would like to know whether there is a way to get the DISTINCT count of all the entries in the table based on an aggregated GROUP BY statement.
SELECT TIME = ap.day,
acms.tenantId,
acms.CallingService,
policyList = ltrim(sp.value),
policyInstanceList = ltrim(dp.value),
COUNT(*) AS DISTINCTCount
FROM dbo.acms_data acms
CROSS APPLY string_split(acms.policyList, ',') sp
CROSS APPLY string_split(acms.policyInstanceList, ',') dp
CROSS APPLY (select day = convert(date, acms.[Time])) ap
GROUP BY ap.day, acms.tenantId, sp.value, dp.value, acms.CallingService
I would like to know whether there is a workaround for using DISTINCT and COUNT(*) together, and whether it would affect my results, so that this query is robust against duplicate entries.
I have to use COUNT(*) because I am aggregating over every column in the table, not just one or a few specific columns.

We can use DISTINCT together with COUNT, as in this example:
USE AdventureWorks2012
GO
-- This query shows 290 JobTitle
SELECT COUNT(JobTitle) Total_JobTitle
FROM [HumanResources].[Employee]
GO
-- This query shows only 67 JobTitle
SELECT COUNT( DISTINCT JobTitle) Total_Distinct_JobTitle
FROM [HumanResources].[Employee]
GO
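The difference between COUNT(column) and COUNT(DISTINCT column) can be reproduced without AdventureWorks. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, with made-up tables `employee` and `events`. The second query shows one possible way to combine DISTINCT with COUNT(*) for whole-row duplicates, namely de-duplicating in a derived table before counting; this is an illustration, not something the answer above proposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for HumanResources.Employee.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, job_title TEXT)")
titles = ["Engineer", "Engineer", "Designer", "Engineer", "Designer", "Manager"]
conn.executemany(
    "INSERT INTO employee (job_title) VALUES (?)", [(t,) for t in titles]
)

total = conn.execute("SELECT COUNT(job_title) FROM employee").fetchone()[0]
distinct = conn.execute(
    "SELECT COUNT(DISTINCT job_title) FROM employee"
).fetchone()[0]
print(total, distinct)

# Hypothetical log table containing one exact duplicate row.
conn.execute("CREATE TABLE events (tenant TEXT, policy TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("t1", "p1", "2020-01-01"),
        ("t1", "p1", "2020-01-01"),  # exact duplicate of the row above
        ("t1", "p1", "2020-01-02"),
        ("t2", "p2", "2020-01-01"),
    ],
)

# COUNT(*) over the raw rows counts the duplicate twice.
raw = conn.execute(
    "SELECT tenant, policy, COUNT(*) FROM events "
    "GROUP BY tenant, policy ORDER BY tenant"
).fetchall()

# De-duplicate whole rows first, then COUNT(*) per group.
deduped = conn.execute(
    "SELECT tenant, policy, COUNT(*) "
    "FROM (SELECT DISTINCT tenant, policy, day FROM events) AS d "
    "GROUP BY tenant, policy ORDER BY tenant"
).fetchall()
print(raw)
print(deduped)
```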

How to get datetime duplicate rows in SQL Server?

I'm trying to find duplicate DATETIME rows in a table.
My column has datetime values such as 2015-01-11 11:24:10.000.
I need to find duplicates at minute precision, i.e. in the form 2015-01-11 11:24; the rest of the value is not important. I get the right value when I SELECT 'convert(nvarchar(16),column,121)', but in my query I also have to use a GROUP BY clause.
My code is:
SELECT ID,
RECEIPT_BARCODE,
convert(nvarchar(16),TRANS_DATE,121),
PTYPE
FROM TRANSACTION_HEADER
WHERE TRANS_DATE BETWEEN '11.01.2015' AND '12.01.2015'
GROUP BY ID,RECEIPT_BARCODE,convert(nvarchar(16),TRANS_DATE,121),PTYPE
HAVING COUNT(convert(nvarchar(16),TRANS_DATE,121)) > 1
Since SQL forces me to include 'convert(nvarchar(16),TRANS_DATE,121)' in the GROUP BY clause, I can't get the duplicate values.
Any ideas?
Thanks in advance.

If you want the actual rows that are duplicated, then use window functions instead:
SELECT th.*, convert(nvarchar(16),TRANS_DATE,121)
FROM (SELECT th.*, COUNT(*) OVER (PARTITION BY convert(nvarchar(16),TRANS_DATE,121)) as cnt
FROM TRANSACTION_HEADER th
WHERE TRANS_DATE BETWEEN '11.01.2015' AND '12.01.2015'
) th
WHERE cnt > 1;
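The window-function approach can be sketched on a small table. This is a minimal sketch, assuming SQLite 3.25 or newer (the release that added window functions) through Python's sqlite3 module. `substr(trans_date, 1, 16)` stands in here for SQL Server's `convert(nvarchar(16), TRANS_DATE, 121)`, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transaction_header ("
    "id INTEGER PRIMARY KEY, receipt_barcode TEXT, trans_date TEXT)"
)
# Hypothetical sample rows; ids 1-2 and 4-5 share the same minute.
rows = [
    (1, "A1", "2015-01-11 11:24:10.000"),
    (2, "A2", "2015-01-11 11:24:55.000"),
    (3, "A3", "2015-01-11 11:25:03.000"),
    (4, "A4", "2015-01-11 12:00:00.000"),
    (5, "A5", "2015-01-11 12:00:30.000"),
]
conn.executemany("INSERT INTO transaction_header VALUES (?, ?, ?)", rows)

duplicates = conn.execute(
    """
    SELECT id, minute_key
    FROM (
        SELECT id,
               substr(trans_date, 1, 16) AS minute_key,
               -- Count rows sharing the same minute without collapsing them.
               COUNT(*) OVER (PARTITION BY substr(trans_date, 1, 16)) AS cnt
        FROM transaction_header
    ) AS th
    WHERE cnt > 1
    ORDER BY id
    """
).fetchall()

for row in duplicates:
    print(row)
```

Because the window function keeps every row, columns such as the ID no longer need to appear in a GROUP BY, which is what prevented the grouped query in the question from finding the duplicates.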

SELECT ID,RECEIPT_BARCODE,convert(nvarchar(16),TRANS_DATE,121), PTYPE ,COUNT(*)
FROM TRANSACTION_HEADER
WHERE TRANS_DATE BETWEEN '11.01.2015' AND '12.01.2015'
GROUP BY ID,RECEIPT_BARCODE,convert(nvarchar(16),TRANS_DATE,121), PTYPE
HAVING COUNT(*)>1;
I think you can use COUNT(*) directly here. Try the query above.

Got an error message when I try to find out which patient accounts have duplicate records

When I run the script below, I get the error message "Cannot perform an aggregate function on an expression containing an aggregate or a subquery". Please provide some advice. Thanks.
SELECT
CONVERT(DECIMAL(18,5),SUM(CASE WHEN PATIENT_ACCOUNT_NO IN (
SELECT PATIENT_ACCOUNT_NO
FROM STND_ENCOUNTER
GROUP BY PATIENT_ACCOUNT_NO
HAVING ( COUNT(PATIENT_ACCOUNT_NO) > 1)) THEN 0 ELSE 1 END)) dupPatNo
FROM [DBO].[STND_ENCOUNTER]

I think the error message is pretty clear. You have a sum() function with a subquery in it (albeit within a case, but that doesn't matter).
It seems that you want to choose patients that have more than one encounter, then add 0 if the patient is in the list and 1 if the patient is not. Hmmm... it sounds like you want to count the number of patients with only one encounter.
Try using this logic instead:
select count(*)
from (select se.*, count(*) over (partition by PATIENT_ACCOUNT_NO) as NumEncounters
from dbo.stnd_encounter se
) se
where NumEncounters = 1;
As a note, the column alias you are assigning is called dupPatNo. This sounds like the number of patients that have duplicates. In that case, the query is:
select count(distinct PATIENT_ACCOUNT_NO)
from (select se.*, count(*) over (partition by PATIENT_ACCOUNT_NO) as NumEncounters
from dbo.stnd_encounter se
) se
where NumEncounters > 1;
(Or use count(*) if you want the number of encounters on duplicate patients.)

If you want to find the number of PATIENT_ACCOUNT_NO values that do not have any duplicates, use the following:
SELECT COUNT(DISTINCT dupPatNo.PATIENT_ACCOUNT_NO)
FROM (
SELECT PATIENT_ACCOUNT_NO
FROM STND_ENCOUNTER
GROUP BY PATIENT_ACCOUNT_NO
HAVING COUNT(PATIENT_ACCOUNT_NO) = 1
) dupPatNo
If you want to find the number of PATIENT_ACCOUNT_NO values that have at least one duplicate, use the following:
SELECT COUNT(DISTINCT dupPatNo.PATIENT_ACCOUNT_NO)
FROM (
SELECT PATIENT_ACCOUNT_NO
FROM STND_ENCOUNTER
GROUP BY PATIENT_ACCOUNT_NO
HAVING COUNT(PATIENT_ACCOUNT_NO) > 1
) dupPatNo
Using DISTINCT ensures the query does not count the same item more than once.
Your query appears to be after the first result, but it's not clear what you want, so I have given a query for both.
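The three counts discussed in these answers (patients with exactly one encounter, patients with duplicates, and encounters belonging to duplicated patients) can be compared side by side. This is a minimal sketch, assuming SQLite through Python's sqlite3 module and a made-up `stnd_encounter` table with only the account number column. It uses the GROUP BY / HAVING derived-table form, so nothing aggregates over a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stnd_encounter (patient_account_no TEXT)")
# Hypothetical data: P1 has 2 encounters, P3 has 3, P2 and P4 have 1 each.
accounts = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]
conn.executemany(
    "INSERT INTO stnd_encounter VALUES (?)", [(a,) for a in accounts]
)

# Patients with exactly one encounter.
single_patients = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT patient_account_no
          FROM stnd_encounter
          GROUP BY patient_account_no
          HAVING COUNT(*) = 1) AS s
    """
).fetchone()[0]

# Patients with more than one encounter, and how many encounters they hold.
dup_patients, dup_encounters = conn.execute(
    """
    SELECT COUNT(*), SUM(n)
    FROM (SELECT patient_account_no, COUNT(*) AS n
          FROM stnd_encounter
          GROUP BY patient_account_no
          HAVING COUNT(*) > 1) AS d
    """
).fetchone()

print(single_patients, dup_patients, dup_encounters)
```

Since the derived table already has one row per patient, a plain COUNT(*) over it is enough in this form; DISTINCT is harmless here but not required.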

Switch case in aggregate query

I want a switch/case in my SQL query so that when the GROUP BY does not group any elements together (the group has a single row) I do not aggregate, and otherwise I do. Is that possible?
My query is something like this:
select count(1),AVG(student_mark) ,case when Count(1)=1 then student_subjectid else null end from Students
group by student_id
I get this error: Column 'student_subjectid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Thanks in advance.

SELECT
student_id,
COUNT(*) AS MarkCount,
AVG(student_mark) AS student_mark,
CASE COUNT(*) WHEN 1 THEN MIN(student_subjectid) END AS student_subjectid
FROM Students
GROUP BY student_id

Why in the world would you complicate it?
select count(1), AVG(Student_mark) Student_mark
from Students
group by student_id
If there is only one student_mark, it is also the SUM, AVG, MIN and MAX - so just continue to use the aggregate!
EDIT
The dataset that would eventuate with your requirement will not normally make sense. The way to achieve that would be to merge (union) two different results
select
numRecords,
Student_mark,
case when numRecords = 1 then student_subjectid end -- else is implicitly NULL
from
(
select
count(1) AS numRecords,
AVG(Student_mark) Student_mark,
min(student_subjectid) as student_subjectid
from Students
group by student_id
) x
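Both answers rely on the same idea: wrap the non-grouped column in an aggregate such as MIN, then use CASE on the group's row count to decide whether to show it. This is a minimal sketch, assuming SQLite through Python's sqlite3 module and a made-up `students` table, to show what the result looks like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "student_id INTEGER, student_subjectid INTEGER, student_mark REAL)"
)
# Hypothetical data: student 1 has two marks, student 2 has one.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, 10, 80.0), (1, 11, 90.0), (2, 12, 70.0)],
)

result = conn.execute(
    """
    SELECT student_id,
           COUNT(*) AS mark_count,
           AVG(student_mark) AS student_mark,
           -- MIN makes the column legal in a grouped query; for a
           -- single-row group it is simply that row's value.
           CASE WHEN COUNT(*) = 1 THEN MIN(student_subjectid) END
               AS student_subjectid
    FROM students
    GROUP BY student_id
    ORDER BY student_id
    """
).fetchall()

for row in result:
    print(row)
```

The subject ID comes back as NULL (None in Python) for student 1, whose group has two rows, and as 12 for student 2, whose group has one.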
