SQL - select only results having multiple entries - sql

this seems simple but I cannot figure out how to do it or the proper description to correcltly google it :(
Briefly, have a table with:
PatientID | Date | Feature_of_Interest...
I'm wanting to plot some results for patients with multiple visits, when they have the feature of interest. No problem filtering out by feature of interest, but then I only want my resulting query to contain patients who have multiple entries.
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND (Filter out PatientID's that only appear once)
So - just not sure how to approach this. I tried doing:
WITH X AS (Above SELECT, Count(*),...,Group by PatientID)
Then re-running query, but it did not work. I can post that all out if needed, but am getting the impression I am approaching this completely backward, so will defer for now.
Using SQL Server 2008.

Try this:
WITH qry AS
(
SELECT a.*,
COUNT(1) OVER(PARTITION BY PatientID) cnt
FROM myTable a
WHERE Feature_Of_Interest = 'present '
)
SELECT *
FROM qry
WHERE cnt >1

You'll want to join a subquery
JOIN (
SELECT
PatientID
FROM myTable
WHERE Feature_Of_Interest is present
GROUP BY PatientID
HAVING COUNT(*) > 1
) s ON myTable.PatientID = s.PatientID

You could start with a counting query for visits:
SELECT PatientID, COUNT(*) as numvisits FROM myTable
GROUP BY PatientID HAVING(numvisits > 1);
Then you can base further queries off this one by joining.

Quick answer as I head off to bed, so its untested code but, in short, you can use a sub query..
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND patientid in (select PatientID, count(patientid) as counter
FROM myTable
WHERE Feature_Of_Interest is present group by patientid having counter>1)
Im surprised your attempt didnt work, it sounds a little like it should have, except you didnt say having count > 1 hence it probably just returned them all.

You should be able to get what you need by using a window function similar to this:
WITH ctePatient AS (
SELECT PatientID, Date, SUM(1) OVER (PARTITION BY PatientID) Cnt
FROM tblPatient
WHERE Feature_Of_Interest = 1
)
SELECT *
FROM ctePatient
WHERE Cnt > 1

Related

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. Best to explain using an example so please see my script below and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM #sales_detail [s]
GROUP BY
[timeframe]
You can try this below option-
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select count form sub query
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
--caculate the count with first sub query
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty),
count(distinct employee) as employee_count1,
sum( (seqnum = 1)::int ) as employee_count2
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.

How to limit duplicated rows

I would like help regarding an SQL query.
Looking around the site, I found several code snippets to return duplicate rows.
Here is the one I went with:
select unumber, name, localid
from table1
where unumber
in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber
which works fine, however, in the table I have other columns as well, like timestamp etc.
As such, when I run the query I indeed get the duplicate rows, however, I get the duplicates several times due to different timestamps for example.
Is there any way to limit the results to 'unique' duplicate rows only?
Hope this makes sense!
Thank you in advance!
For what you describe, you can just use select distinct:
select distinct unumber, name, localid
from table1
where unumber in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber;
However, I would be more likely to write this using window functions:
select unumber, name, localid
from (select t1.*,
count(*) over (partition by unumber) as cnt,
row_number() over (partition by unumber, name, localid order by unumber) as seqnum
from table1 t1
) t1
where cnt > 1 and seqnum = 1;

SQL removing duplicates in a certain column

Good afternoon all!
Had a question regarding the removal of duplicates in one column and making it remove the whole row. I will provide an example in a screenshot in Excel as to not provide proprietary info.
I am looking to remove one of the rows highlighted in yellow for example but do not want to limit it to one Dr.Mike or one Health Partners clinic for example. Relatively new to this so any help would be greatly appreciated.
Thank you
You can do:
select distinct prov, clinic, address
from t;
I have used the below script to remove duplicates.
SELECT Prov, Clinic,Address, count(*)
FROM SomeTable
group by Prov, Clinic,Address
having count(*)>1
SELECT Prov, Clinic,Address, countrows = count(*)
INTO [holdkey]
FROM SomeTable
GROUP BY Prov, Clinic,Address
HAVING count(*) > 1
SELECT DISTINCT sometable.*
INTO holdingtable
FROM sometable, holdkey
WHERE sometable.Prov = holdkey.Prov
AND sometable.Clinic = holdkey.Clinic
AND sometable.Address = holdkey.Address
SELECT Prov, Clinic,Address, count(*)
FROM holdingtable
group by Prov, Clinic,Address
having count(*)>1
DELETE sometable
FROM sometable, holdkey
WHERE sometable.Prov = holdkey.Prov
AND sometable.Clinic = holdkey.Clinic
AND sometable.Address = holdkey.Address
INSERT sometable SELECT * FROM holdingtable

SQL: find most common values for specific members in column 1

I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards
This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final
Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company oder by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company
and ranked.address = mytable.address;
This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statment for A!
You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.

adding count( ) column on each row

I'm not sure if this is even a good question or not.
I have a complex query with lot's of unions that searches multiple tables for a certain keyword (user input). All tables in which there is searched are related to the table book.
There is paging on the resultset using LIMIT, so there's always a maximum of 10 results that get withdrawn.
I want an extra column in the resultset displaying the total amount of results found however. I do not want to do this using a separate query. Is it possible to add a count() column to the resultset that counts every result found?
the output would look like this:
ID Title Author Count(...)
1 book_1 auth_1 23
2 book_2 auth_2 23
4 book_4 auth_.. 23
...
Thanks!
This won't add the count to each row, but one way to get the total count without running a second query is to run your first query using the SQL_CALC_FOUND_ROWS option and then select FOUND_ROWS(). This is sometimes useful if you want to know how many total results there are so you can calculate the page count.
Example:
select SQL_CALC_FOUND_ROWS ID, Title, Author
from yourtable
limit 0, 10;
SELECT FOUND_ROWS();
From the manual:
http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_found-rows
The usual way of counting in a query is to group on the fields that are returned:
select ID, Title, Author, count(*) as Cnt
from ...
group by ID, Title, Author
order by Title
limit 1, 10
The Cnt column will contain the number of records in each group, i.e. for each title.
Regarding second query:
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl
cross join (select count(*) as cnt from tbl) as x
If you will not join to other table(s):
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl, (select count(*) as cnt from tbl) as x
My Solution:
SELECT COUNT(1) over(partition BY text) totalRecordNumber
FROM (SELECT 'a' text, id_consult_req
FROM consult_req cr);
If your problem is simply the speed/cost of doing a second (complex) query I would suggest you simply select the resultset into a hash-table and then count the rows from there while returning, or even more efficiently use the rowcount of the previous resultset, then you do not even have to recount
This will add the total count on each row:
select count(*) over (order by (select 1)) as Cnt,*
from yourtable
Here is your answare:
SELECT *, #cnt count_rows FROM (
SELECT *, (#cnt := #cnt + 1) row_number FROM your_table
CROSS JOIN (SELECT #cnt := 0 AS variable) t
) t;
You simply cannot do this, you'll have to use a second query.