SQL - SELECTING multiple columns with duplicate information and some with unique

I need to produce a query that will pull all the records with:
Same First_Name
Same Last_Name
Same DOB
Same client_ID (Client_ID is given "1011")
Different Member_ID
Note: I have a huge database with millions of records, and as soon as I add more than one subquery it takes hours to return even a first sample of data (though maybe my subqueries were incorrect).
I've tried building this query step by step, but it still fails to filter the way I need.
Select
ta.Member_ID,
ta.First_Name,
ta.LAST_NAME,
ta.date_of_birth,
ta.client_id
From TestTable ta
WHERE client_id = '1011'
AND
(SELECT COUNT(*)
FROM TestTable ta2
WHERE ta.date_of_birth=ta2.date_of_birth
AND ta.FIRST_NAME=ta2.FIRST_NAME
AND ta.LAST_NAME=ta2.LAST_NAME)>1
I haven't even got to the point of selecting different Member_ID values, and this query still pulls records that do not necessarily match those criteria.
Please help.
Here is sample data, highlighted is the pair that I want to be able to get:
My Sample Table

Just use window functions:
SELECT ta.Member_ID, ta.First_Name, ta.LAST_NAME, ta.date_of_birth,
ta.client_id
FROM (SELECT ta.*,
COUNT(*) OVER (PARTITION BY FIRST_NAME, LAST_NAME, date_of_birth) as cnt
FROM TestTable ta
WHERE client_id = '1011' -- filter first so the count only covers this client
) ta
WHERE cnt > 1;
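The query above counts rows, so it does not by itself check the "different Member_ID" requirement. Here is a minimal sketch of one way to add that check. It runs against an in-memory SQLite database through Python's sqlite3 module purely for illustration; the sample rows are invented, and SQLite 3.25 or later is assumed for window functions. Comparing MIN and MAX of Member_ID within each partition keeps only groups that hold at least two distinct IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestTable (
    Member_ID TEXT, First_Name TEXT, Last_Name TEXT,
    date_of_birth TEXT, client_id TEXT
);
-- Invented sample rows, for illustration only.
INSERT INTO TestTable VALUES
    ('M1', 'John', 'Smith', '1980-01-01', '1011'),
    ('M2', 'John', 'Smith', '1980-01-01', '1011'),
    ('M3', 'Jane', 'Doe',   '1990-05-05', '1011'),
    ('M4', 'Jane', 'Doe',   '1990-05-05', '2022'),
    ('M5', 'Bob',  'Lee',   '1975-03-03', '1011'),
    ('M5', 'Bob',  'Lee',   '1975-03-03', '1011');
""")

query = """
SELECT Member_ID, First_Name, Last_Name, date_of_birth, client_id
FROM (SELECT ta.*,
             MIN(Member_ID) OVER (PARTITION BY First_Name, Last_Name, date_of_birth) AS min_id,
             MAX(Member_ID) OVER (PARTITION BY First_Name, Last_Name, date_of_birth) AS max_id
      FROM TestTable ta
      WHERE client_id = '1011') AS t
WHERE min_id <> max_id   -- at least two different Member_ID values in the group
ORDER BY Member_ID
"""
duplicate_members = [row[0] for row in conn.execute(query)]
print(duplicate_members)
```

In this sample only the two John Smith rows come back: Jane Doe's second row belongs to another client, and Bob Lee's two rows share one Member_ID.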

As a general note, avoid correlated subqueries unless you absolutely have to. Performance can take a severe hit, because the subquery may be evaluated once for every row of the outer query. A simple join should work:
Select
ta.Member_ID,
ta.First_Name,
ta.LAST_NAME,
ta.date_of_birth,
ta.client_id
From TestTable ta JOIN TestTable ta2
ON ta.date_of_birth=ta2.date_of_birth
AND ta.FIRST_NAME=ta2.FIRST_NAME
AND ta.LAST_NAME=ta2.LAST_NAME
AND ta.client_id=ta2.client_id
AND ta.Member_ID <> ta2.Member_ID
WHERE ta.client_id = '1011'

If your only intention is to find records with the same details but a different Member_ID, use a basic GROUP BY to filter the data. This is usually less costly than joining the table to itself.
Select
ta.First_Name,
ta.LAST_NAME,
ta.date_of_birth,
ta.client_id
From TestTable ta
group by
ta.First_Name,
ta.LAST_NAME,
ta.date_of_birth,
ta.client_id
having count(distinct Member_ID) > 1
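The grouped query returns one row per duplicated person, not the individual Member_ID values. Here is a hedged sketch of joining the grouped result back to the table to list the matching rows. It uses an in-memory SQLite database via Python's sqlite3 as a stand-in, with invented sample rows; the alias `dup` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestTable (
    Member_ID TEXT, First_Name TEXT, Last_Name TEXT,
    date_of_birth TEXT, client_id TEXT
);
-- Invented sample rows, for illustration only.
INSERT INTO TestTable VALUES
    ('M1', 'John', 'Smith', '1980-01-01', '1011'),
    ('M2', 'John', 'Smith', '1980-01-01', '1011'),
    ('M3', 'Jane', 'Doe',   '1990-05-05', '1011');
""")

query = """
SELECT ta.Member_ID, ta.First_Name, ta.Last_Name
FROM TestTable ta
JOIN (SELECT First_Name, Last_Name, date_of_birth, client_id
      FROM TestTable
      WHERE client_id = '1011'
      GROUP BY First_Name, Last_Name, date_of_birth, client_id
      HAVING COUNT(DISTINCT Member_ID) > 1) dup
  ON  ta.First_Name    = dup.First_Name
  AND ta.Last_Name     = dup.Last_Name
  AND ta.date_of_birth = dup.date_of_birth
  AND ta.client_id     = dup.client_id
ORDER BY ta.Member_ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The aggregation runs once over the filtered rows, and the join only fetches the rows that belong to the surviving groups.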

Related

Count() how many times a name shows up in a table with the rest of info

I have read in various websites about the count() function but I still cannot make this work.
I made a small table with (id, name, last name, age) and I need to retrieve all columns plus a new one. In this new column I want to display how many times a name shows up or repeats itself in the table.
I have run some tests and can retrieve the name column together with the count column, but I haven't been able to retrieve all the data from the table.
Currently I have this
select a.n_showsup, p.*
from [test1].[dbo].[person] p,
(select count(*) n_showsup
from [test1].[dbo].[person])a
This gives me all the data in the output, but the n_showsup column just contains the total number of rows. I know this is because I'm missing a GROUP BY, but when I write GROUP BY name it shows me a lot of records. This is an example of what I need:

You can use window functions, if your RDBMS supports them:
select t.*, count(*) over(partition by name) n_showsup
from mytable t
Alternatively, you can join the table with an aggregation query that counts the number of occurrences of each name:
select t.*, x.n_showsup
from mytable t
inner join (select name, count(*) n_showsup from mytable group by name) x
on x.name = t.name
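To see that the two forms give the same per-row count, here is a small sketch run against an in-memory SQLite database through Python's sqlite3. The `person` table layout and its rows are assumptions made up for the example, and SQLite 3.25 or later is assumed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, last_name TEXT, age INTEGER);
-- Invented sample rows, for illustration only.
INSERT INTO person (name, last_name, age) VALUES
    ('Ana', 'Lopez', 30),
    ('Ben', 'Kim',   41),
    ('Ana', 'Ruiz',  25);
""")

# Window-function form: every row keeps its columns and gains the count.
window_rows = conn.execute("""
SELECT id, name, COUNT(*) OVER (PARTITION BY name) AS n_showsup
FROM person
ORDER BY id
""").fetchall()

# Join form: count per name in a derived table, then join it back.
join_rows = conn.execute("""
SELECT p.id, p.name, x.n_showsup
FROM person p
INNER JOIN (SELECT name, COUNT(*) AS n_showsup FROM person GROUP BY name) x
        ON x.name = p.name
ORDER BY p.id
""").fetchall()

print(window_rows)
print(join_rows)
```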

While the window function approach (GMB's answer) is the right way to go, thinking through this from a subquery approach (like you were headed towards) would look something like:
select p.*, a.n_showsup
from [test1].[dbo].[person] p
INNER JOIN (
select name, count(*) n_showsup
from [test1].[dbo].[person]
GROUP BY name
) a ON p.name = a.name
This is VERY close to what you had. The difference is that we group the subquery by name (so we get a count per name) and then use that name in the join criteria, in the ON clause of the INNER JOIN.
You should really never ever use a comma in your FROM clause. Instead use a JOIN.

SQL: find most common values for specific members in column 1

I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards

This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final

Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company order by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company;
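Here is a minimal sketch of the count-then-rank idea, run on an in-memory SQLite database via Python's sqlite3 (SQLite 3.25 or later assumed). It is a variant rather than the query above: it groups by company and address before ranking, and it breaks ties alphabetically by address, which is an assumption of this sketch. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (company TEXT, address TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO mytable VALUES
    ('A', '1 Main St'),
    ('A', '1 Main St'),
    ('A', '9 Oak Ave'),
    ('B', '5 Elm Rd');
""")

query = """
SELECT m.company, m.address, ranked.address AS most_common_address
FROM mytable m
JOIN (SELECT company, address,
             ROW_NUMBER() OVER (PARTITION BY company
                                ORDER BY cnt DESC, address) AS rn
      FROM (SELECT company, address, COUNT(*) AS cnt
            FROM mytable
            GROUP BY company, address) counted
     ) ranked
  ON ranked.company = m.company AND ranked.rn = 1
ORDER BY m.company, m.address
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every source row is kept, and each one carries the top-ranked address of its company in the third column.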

This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statement for A!

You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.

Get the first instance of a row using MS Access

EDITED:
I have this query wherein I want to SELECT the first instance of a record from the table petTable.
SELECT id,
pet_ID,
FIRST(petName),
First(Description)
FROM petTable
GROUP BY pet_ID;
The problem is that I have a huge number of records and this query is too slow. I discovered that GROUP BY slows down the query. Do you have any idea how to make this query faster, or better, a query that doesn't need GROUP BY?

"The problem is I have huge number of records and this query is too slow. I discovered that GROUP BY slows down the query. Do you have any idea that could make this query faster?"
Add an index on pet_ID, then create and test this query:
SELECT pet_ID, Min(id) AS MinOfid
FROM petTable
GROUP BY pet_ID;
Once you have that query working, you can join it back to the original table --- then it will select only the original rows which match based on id and you can retrieve the other fields you want from those matching rows.
SELECT pt.id, pt.pet_ID, pt.petName, pt.Description
FROM
petTable AS pt
INNER JOIN
(
SELECT pet_ID, Min(id) AS MinOfid
FROM petTable
GROUP BY pet_ID
) AS sub
ON pt.id = sub.MinOfid;
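The same two-step pattern can be tried outside Access. Below is a sketch using an in-memory SQLite database through Python's sqlite3 as a stand-in, so dialect details differ from Access. The index name `idx_pet_id` and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE petTable (
    id INTEGER PRIMARY KEY, pet_ID INTEGER, petName TEXT, Description TEXT
);
CREATE INDEX idx_pet_id ON petTable (pet_ID);  -- hypothetical index name
-- Invented sample rows, for illustration only.
INSERT INTO petTable (id, pet_ID, petName, Description) VALUES
    (1, 10, 'Rex', 'first Rex row'),
    (2, 10, 'Rex', 'later Rex row'),
    (3, 20, 'Tom', 'first Tom row'),
    (4, 20, 'Tom', 'later Tom row');
""")

# Step 1: lowest id per pet_ID. Step 2: join back to fetch the full rows.
first_rows = conn.execute("""
SELECT pt.id, pt.pet_ID, pt.petName, pt.Description
FROM petTable AS pt
INNER JOIN (SELECT pet_ID, MIN(id) AS MinOfid
            FROM petTable
            GROUP BY pet_ID) AS sub
        ON pt.id = sub.MinOfid
ORDER BY pt.id
""").fetchall()
print(first_rows)
```

Here "first" means the row with the smallest id for each pet_ID, which is the definition the join relies on.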

Your query could be changed to:
SELECT ID, pet_ID, petName, Description
FROM petTable
WHERE ID IN
(SELECT Min(ID) As MinID FROM petTable GROUP BY pet_ID);
Or use the TOP clause:
SELECT petTable.pet_ID, petTable.petName, petTable.[description]
FROM petTable
WHERE petTable.ID IN
(SELECT TOP 1 ID
FROM petTable AS tmpTbl
WHERE tmpTbl.pet_ID = petTable.pet_ID
ORDER BY tmpTbl.ID)
ORDER BY petTable.pet_ID, petTable.petName, petTable.[description];

Select entry of each group having exactly 1 entry

I am looking for an optimized query.
Let me show you a small example.
Let's suppose I have a table with three fields: studentId, teacherId and subject.
Now I want the rows in which a physics teacher is teaching only one student, i.e.
teacher 300 is only teaching student 3, and so on.
What I have tried till now
select sid,tid from tabletesting with(nolock)
where tid in (select tid from tabletesting with(nolock)
where subject='physics' group by tid having count(tid) = 1)
and subject='physics'
The above query works fine, but I want a different solution in which I don't have to scan the same table twice.
I also tried using Rank() and Row_Number(), but without success.
FYI:
This is only an example, not the actual table I am working with. My table contains a huge number of rows and columns, and the WHERE clause is also very complex (date comparisons etc.), so I don't want to repeat the same WHERE clause in the subquery and the outer query.

You can do this with window functions. Assuming that there are no duplicate students for a given teacher (as in your sample data):
select tt.sid, tt.tid
from (select tt.*, count(*) over (partition by tid) as scnt
from TableTesting tt
) tt
where scnt = 1;
Another way to approach this, which might be more efficient, is to use a NOT EXISTS clause:
select tt.sid, tt.tid
from TableTesting tt
where not exists (select 1 from TableTesting tt1 where tt1.tid = tt.tid and tt1.sid <> tt.sid)

Another option is to use an analytic function:
select sid, tid, subject from
(
select sid, tid, subject, count(sid) over (partition by subject, tid) cnt
from tabletesting
) X
where cnt = 1
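Since the question asks to avoid repeating a complex WHERE clause, here is a sketch of the analytic-function form with the filter written once, inside the derived table. It runs on an in-memory SQLite database via Python's sqlite3 (SQLite 3.25 or later assumed). The sample rows are invented, and the `subject = 'physics'` filter stands in for the real, more complex condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tabletesting (sid INTEGER, tid INTEGER, subject TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO tabletesting VALUES
    (1, 100, 'physics'),
    (2, 100, 'physics'),
    (3, 300, 'physics'),
    (4, 400, 'math'),
    (5, 300, 'math');
""")

single_student = conn.execute("""
SELECT sid, tid
FROM (SELECT sid, tid, COUNT(*) OVER (PARTITION BY tid) AS cnt
      FROM tabletesting
      WHERE subject = 'physics'   -- the filter appears only here
     ) AS x
WHERE cnt = 1
ORDER BY tid
""").fetchall()
print(single_student)
```

Because the filter is applied before the window count, teacher 300's math row does not affect the physics count.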

SQL - select only results having multiple entries

This seems simple, but I cannot figure out how to do it, or find the proper description to correctly google it.
Briefly, I have a table with:
PatientID | Date | Feature_of_Interest...
I'm wanting to plot some results for patients with multiple visits, when they have the feature of interest. No problem filtering out by feature of interest, but then I only want my resulting query to contain patients who have multiple entries.
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND (Filter out PatientID's that only appear once)
So - just not sure how to approach this. I tried doing:
WITH X AS (Above SELECT, Count(*),...,Group by PatientID)
Then re-running query, but it did not work. I can post that all out if needed, but am getting the impression I am approaching this completely backward, so will defer for now.
Using SQL Server 2008.

Try this:
WITH qry AS
(
SELECT a.*,
COUNT(1) OVER(PARTITION BY PatientID) cnt
FROM myTable a
WHERE Feature_Of_Interest = 'present'
)
SELECT *
FROM qry
WHERE cnt >1

You'll want to join a subquery
JOIN (
SELECT
PatientID
FROM myTable
WHERE Feature_Of_Interest is present
GROUP BY PatientID
HAVING COUNT(*) > 1
) s ON myTable.PatientID = s.PatientID

You could start with a counting query for visits:
SELECT PatientID, COUNT(*) as numvisits FROM myTable
GROUP BY PatientID HAVING COUNT(*) > 1;
Then you can base further queries off this one by joining.

Quick answer as I head off to bed, so it's untested code, but in short, you can use a subquery:
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND patientid in (select PatientID
FROM myTable
WHERE Feature_Of_Interest is present group by patientid having count(*) > 1)
I'm surprised your attempt didn't work. It sounds like it should have, except that you didn't say HAVING COUNT(*) > 1, so it probably just returned them all.

You should be able to get what you need by using a window function similar to this:
WITH ctePatient AS (
SELECT PatientID, Date, SUM(1) OVER (PARTITION BY PatientID) Cnt
FROM tblPatient
WHERE Feature_Of_Interest = 1
)
SELECT *
FROM ctePatient
WHERE Cnt > 1
