How to limit duplicated rows - sql

I would like help regarding an SQL query.
Looking around the site, I found several code snippets to return duplicate rows.
Here is the one I went with:
select unumber, name, localid
from table1
where unumber
in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber
which works fine, however, in the table I have other columns as well, like timestamp etc.
As such, when I run the query I indeed get the duplicate rows, however, I get the duplicates several times due to different timestamps for example.
Is there any way to limit the results to 'unique' duplicate rows only?
Hope this makes sense!
Thank you in advance!

For what you describe, you can just use select distinct:
select distinct unumber, name, localid
from table1
where unumber in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber;
However, I would be more likely to write this using window functions:
select unumber, name, localid
from (select t1.*,
count(*) over (partition by unumber) as cnt,
row_number() over (partition by unumber, name, localid order by unumber) as seqnum
from table1 t1
) t1
where cnt > 1 and seqnum = 1;

Related

best way to get count and distinct count of rows in single query

What is the best way to get count of rows and distinct rows in a single query?
To get distinct count we can use subquery like this:
select count(*) from
(
select distinct * from table
)
I have 15+ columns and have many duplicates rows as well and I want to calculate count of rows as well as distinct count of rows in one query.
More if I use this
select count(*) as Rowcount , count(distinct *) as DistinctCount from table
This will not give accurate results as count(distinct *) doesn't work.
Why don't you just put the subquery inside another query?
select count(*),
(select count(*) from (select distinct * from table))
from table;
create table tbl
(
col int
);
insert into tbl values(1),(2),(1),(3);
select count(*) as distinct_count, sum(sum) as all_count
from (
select count(col) sum from tbl group by col
)A
I think I have understood what you are looking for. You need to use some window function. So, you query should be look like =>
Select COUNT(*) OVER() YourRowcount ,
COUNT(*) OVER(Partition BY YourColumnofGroup) YourDistinctCount --Basic of the distinct count
FROM Yourtable
NEW Update
select top 1
COUNT(*) OVER() YourRowcount,
DENSE_RANK() OVER(ORDER BY YourColumn) YourDistinctCount
FROM Yourtable ORDER BY TT DESC
Note: This code is written sql server. Please check the code and let me know.

HIVE - Getting ALL columns of the table with COUNT(*) with DISTINCT values

I have the table below called Current_Table
I want to get the output that is,
The Column personalemailtrim to be DISTINCT
The column Occurrences must be over Count >1
Order by the column personalemailtrim
My Query so far build is wrong in many levels, Group by cant with DISTINCT and also using Count(*) doesnt give me any results with Group my etc....
SELECT id,
personalemailtrim,
personworksatnumberofbsbs,
region,
district,
branch,
num,
countofapptsatbsb,
COUNT(personalemailtrim) occurrences
FROM Current_table
GROUP BY id,
personalemailtrim,
personworksatnumberofbsbs,
region,
district,
branch,
num,
countofapptsatbsb
HAVING COUNT(*) > 1
ORDER BY personalemailtrim
Any help provided is really appreciated . I tried several breaking down code methods but i am stuck on this
further to elaborate , The expected output should look like below
As you can see the,
Occurrences are > 1
personalemailtrim is now DISTINCT
I think you want:
select t.*
from (select t.*,
row_number() over (partition by personalemailtrim order by id) as seqnum
from Current_table t
) t
where seqnum = 1 and occurrences > 1;
This assumes that occurrences is the same for each personalemailtrim, which is consistent with your data and with your question.

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. Best to explain using an example so please see my script below and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM #sales_detail [s]
GROUP BY
[timeframe]
You can try this below option-
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select count form sub query
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
--caculate the count with first sub query
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty),
count(distinct employee) as employee_count1,
sum( (seqnum = 1)::int ) as employee_count2
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.

How to sort by number of counts in SQL Server?

I want to bring the two biggest numbers and how many it.
picture is a table, table name SatisBasligi.
picture is a output.
How can I write a code for picture?
One needs to write code according to the second.
select sbb.musteriId, count(1) from SatisBaslik sbb group by
sbb.musteriId having count(1) = (select x.adet from (select count(1)
adet from SatisBaslik sb group by sb.musteriId having
count(1)
"(select x.adet from (select count(1) adet from SatisBaslik sb group
by sb.musteriId having count(1)
Try this query:
select musteriId, count(musteriId) from SatisBasligi where musteriId in (7, 8, 9);
Use Group By and Having clause
select musteriId, count(musteriId)
from SatisBasligi
Group by musteriId
Having count(musteriId) > 1
For mysql:
select musteriId, count(*)
from SatisBasligi
group by musteriId
having count(musteriId) > 1
In order to get the greatest two counts, use ORDER BY with TOP and include WITH TIES in order to get musteriIds with the same count.
select top(2) with ties
musteriId,
count(*)
from SatisBasligi
group by musteriId
order by count(*) desc;

SQL - select only results having multiple entries

this seems simple but I cannot figure out how to do it or the proper description to correcltly google it :(
Briefly, have a table with:
PatientID | Date | Feature_of_Interest...
I'm wanting to plot some results for patients with multiple visits, when they have the feature of interest. No problem filtering out by feature of interest, but then I only want my resulting query to contain patients who have multiple entries.
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND (Filter out PatientID's that only appear once)
So - just not sure how to approach this. I tried doing:
WITH X AS (Above SELECT, Count(*),...,Group by PatientID)
Then re-running query, but it did not work. I can post that all out if needed, but am getting the impression I am approaching this completely backward, so will defer for now.
Using SQL Server 2008.
Try this:
WITH qry AS
(
SELECT a.*,
COUNT(1) OVER(PARTITION BY PatientID) cnt
FROM myTable a
WHERE Feature_Of_Interest = 'present '
)
SELECT *
FROM qry
WHERE cnt >1
You'll want to join a subquery
JOIN (
SELECT
PatientID
FROM myTable
WHERE Feature_Of_Interest is present
GROUP BY PatientID
HAVING COUNT(*) > 1
) s ON myTable.PatientID = s.PatientID
You could start with a counting query for visits:
SELECT PatientID, COUNT(*) as numvisits FROM myTable
GROUP BY PatientID HAVING(numvisits > 1);
Then you can base further queries off this one by joining.
Quick answer as I head off to bed, so its untested code but, in short, you can use a sub query..
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND patientid in (select PatientID, count(patientid) as counter
FROM myTable
WHERE Feature_Of_Interest is present group by patientid having counter>1)
Im surprised your attempt didnt work, it sounds a little like it should have, except you didnt say having count > 1 hence it probably just returned them all.
You should be able to get what you need by using a window function similar to this:
WITH ctePatient AS (
SELECT PatientID, Date, SUM(1) OVER (PARTITION BY PatientID) Cnt
FROM tblPatient
WHERE Feature_Of_Interest = 1
)
SELECT *
FROM ctePatient
WHERE Cnt > 1