How to select distinct with ID on the result? - sql

How can I select distinct rows from a table and still include the ID column in the result?
For example (this query fails):
SELECT ID,City,Street from (SELECT distinct City, Street from Location)
The table Location
CREATE TABLE Location(
ID int identity not null,
City varchar(max) not null,
Street varchar(max) not null
)
The result should show the ID column together with the distinct City and Street values.
Is there a query that produces this result?

If you want, for instance, the lowest ID for each unique City/Street combination, you can do
select min(id) as ID, City, Street
from Location
group by City, Street
Generally, you have to tell the database which ID to take by using an aggregate function such as min() or max(), because each distinct City/Street pair can match several rows.
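A quick way to check the aggregate approach is to run it against a throwaway database. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption on my part: INTEGER PRIMARY KEY replaces identity, the sample rows are made up, and the helper name lowest_id_per_location is hypothetical.

```python
import sqlite3


def lowest_id_per_location(rows):
    """Return (lowest ID, City, Street) for each distinct City/Street pair."""
    conn = sqlite3.connect(":memory:")
    # SQLite stand-in for the Location table from the question.
    conn.execute(
        "CREATE TABLE Location ("
        " ID INTEGER PRIMARY KEY,"
        " City TEXT NOT NULL,"
        " Street TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO Location (City, Street) VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT MIN(ID) AS MinID, City, Street "
        "FROM Location GROUP BY City, Street ORDER BY MinID"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: the first two rows are duplicates of each other.
sample = [("Paris", "Rue A"), ("Paris", "Rue A"), ("Lyon", "Rue B"), ("Paris", "Rue C")]
location_result = lowest_id_per_location(sample)
print(location_result)
```

Swapping MIN for MAX would keep the highest ID of each pair instead.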

Related

How to group data which have same Id number in SQL

Query a list of CITY names from STATION for cities that have an even ID number. Print the results in any order, but exclude duplicates from the answer.
The STATION table is described as follows:
CREATE TABLE STATION
(
Id int,
CITY varchar(50),
STATE varchar(50),
LAT_N int,
LONG_W int
)
I have this SQL which throws an error when I run:
SELECT CITY, STATE
FROM STATION
WHERE ID % 2 = 0
GROUP BY ID, CITY, STATE
There is no need to group by ID (doing so keeps a separate row for every ID, so duplicate cities remain):
select CITY,STATE from STATION where (ID % 2)=0 group by CITY,STATE
But you don't even need GROUP BY; DISTINCT will also do the trick:
select DISTINCT CITY, STATE from STATION where (ID % 2)=0
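To see that the two forms return the same rows, here is a small sketch using Python's sqlite3 module in place of the original database (an assumption; the station rows are invented for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STATION (ID INTEGER, CITY TEXT, STATE TEXT, LAT_N INTEGER, LONG_W INTEGER)"
)
# Invented rows: Austin appears under two even IDs, Boston only under an odd ID.
conn.executemany(
    "INSERT INTO STATION VALUES (?, ?, ?, ?, ?)",
    [
        (2, "Austin", "TX", 30, 97),
        (4, "Austin", "TX", 30, 97),
        (5, "Boston", "MA", 42, 71),
        (6, "Denver", "CO", 39, 104),
    ],
)

# GROUP BY on the selected columns only (no ID) removes the duplicates.
grouped = conn.execute(
    "SELECT CITY, STATE FROM STATION WHERE ID % 2 = 0 "
    "GROUP BY CITY, STATE ORDER BY CITY"
).fetchall()

# DISTINCT gives the same rows without any grouping.
distinct = conn.execute(
    "SELECT DISTINCT CITY, STATE FROM STATION WHERE ID % 2 = 0 ORDER BY CITY"
).fetchall()

# Grouping by ID as well keeps one row per ID, so Austin shows up twice.
grouped_with_id = conn.execute(
    "SELECT CITY, STATE FROM STATION WHERE ID % 2 = 0 "
    "GROUP BY ID, CITY, STATE ORDER BY CITY"
).fetchall()
conn.close()

print(grouped, distinct, grouped_with_id)
```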

Match records together in SQL Server by multiple groupings

I have a scenario that I need to "match" records based on multiple attributes of a person. For instance, if a FirstName and LastName match, or a NickName and LastName match, those two scenarios should be grouped into one larger match. Here's the example data in SQLFiddle:
http://www.sqlfiddle.com/#!18/0ca91/7
I'm generating a match key from the record attributes. The result gives me two different match keys and three total records. I need a result that has only one match key generated, and eventually I'm going to group all three records into one golden record in a separate step. I cannot figure out a way to logically group these records together either by "group by" or by using DENSE_RANK to generate my match key. Any help would be greatly appreciated! Thanks!
CREATE TABLE Persons (
ID int,
FirstName varchar(255),
LastName varchar(255),
NickName varchar(255)
);
INSERT INTO Persons
SELECT 1 AS ID, 'NIKKI' AS FNAME, 'MADISON' AS LNAME, 'Nikki' AS NickName
UNION ALL
SELECT 2 AS ID, 'NICOLE' AS FNAME, 'MADISON' AS LNAME, 'NICOLE' AS NickName
UNION ALL
SELECT 3 AS ID, 'NICOLE' AS FNAME, 'MADISON' AS LNAME, 'Nikki' AS NickName;
SELECT
*
, DENSE_RANK() OVER (ORDER BY TRIM(LastName), TRIM(FirstName)) AS GroupKey
FROM Persons
Desired Result:
GroupKey
1
1
1
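The matching rule described above is transitive: record 2 matches record 3 on first and last name, and record 3 matches record 1 on nickname and last name, so all three belong together. One way to sketch that is a union-find (connected components) pass outside the database. This is only an illustration under my own assumptions: it is written in Python rather than T-SQL, names are compared case-insensitively after trimming, and assign_group_keys is a hypothetical helper, not something from the question.

```python
def assign_group_keys(persons):
    """Map each person ID to a group key; records linked directly or through a chain share a key."""
    parent = {p["ID"]: p["ID"] for p in persons}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}
    for p in persons:
        last = p["LastName"].strip().upper()
        # Two independent match rules: FirstName + LastName, NickName + LastName.
        keys = [
            ("first", p["FirstName"].strip().upper(), last),
            ("nick", p["NickName"].strip().upper(), last),
        ]
        for key in keys:
            if key in first_seen:
                union(first_seen[key], p["ID"])
            else:
                first_seen[key] = p["ID"]

    # Number the groups 1, 2, 3, ... in order of their smallest ID.
    roots = sorted({find(pid) for pid in parent})
    rank = {root: i + 1 for i, root in enumerate(roots)}
    return {pid: rank[find(pid)] for pid in parent}


persons = [
    {"ID": 1, "FirstName": "NIKKI", "LastName": "MADISON", "NickName": "Nikki"},
    {"ID": 2, "FirstName": "NICOLE", "LastName": "MADISON", "NickName": "NICOLE"},
    {"ID": 3, "FirstName": "NICOLE", "LastName": "MADISON", "NickName": "Nikki"},
]
group_keys = assign_group_keys(persons)
print(group_keys)
```

A single DENSE_RANK over one ORDER BY cannot express this, because each record has to be compared under two different keys and the links have to be followed through intermediate records.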

How to hide distinct column in sql selection

I am doing a query in sql to find rows with distinct value of name as below:
select distinct name, age, sex from person
it works but I don't want to show the name column in the result set. Is there a way to hide this column?
EDIT1
The reason I put distinct name there is to avoid returning multiple rows with the same name. My table has people with the same name but different age and sex, so I want the result to be distinct by name without showing the name.
You could try something like this.
select age, sex from (
select distinct name, age, sex from person) as t;
I'm presuming you might have people with the same age and sex but a different name.
Otherwise, just remove name from the select list.
Here is my solution (sql server 2016):
create table person (age varchar(20), [name] varchar(20), gender varchar(20))
go
insert into person values ('20', 'joe', 'm')
insert into person values ('19', 'tom', 'm')
insert into person values ('25', 'sally', 'f')
insert into person values ('28', 'Tammy', 'f')
go
select age, gender from (select distinct name, age, gender from person) t
You have to use your query as a subquery here.
From that subquery, select only age and sex.
select age, sex from (select distinct name, age, sex from person) As x
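For anyone who wants to see what the subquery form actually returns, here is a minimal sketch using Python's sqlite3 module as a stand-in database (an assumption; the rows are made up). Note that people with different names but the same age and sex still produce repeated (age, sex) rows in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, sex TEXT)")
# Made-up rows: joe is stored twice; tom shares joe's age and sex.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("joe", 20, "m"), ("joe", 20, "m"), ("tom", 20, "m"), ("sally", 25, "f")],
)

# The inner query de-duplicates on (name, age, sex); the outer one drops name.
hidden_name_rows = conn.execute(
    "SELECT age, sex FROM "
    "(SELECT DISTINCT name, age, sex FROM person) AS x "
    "ORDER BY age, sex"
).fetchall()
conn.close()

print(hidden_name_rows)
```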

Listing duplicated records using T SQL

I have a database that is used to record patient information for a small clinic. We use MS SQL Server 2008 as the backend. The patient table contains the following columns:
Id int identity(1,1),
FamilyName varchar(30),
FirstName varchar (20),
DOB datetime,
AddressLine1 varchar (50),
AddressLine2 varchar (50),
State varchar (20),
Postcode varchar (4),
NextOfKin varchar (20),
Homephone varchar (20),
Mobile varchar (20)
Occasionally the staff register a new patient, unaware that the patient already has a record in the system. We end up with several thousand duplicate records.
What I would like to do is present a list of patients who have duplicate records so the staff can merge them during quiet time. We consider two records to be duplicates if they have exactly the same FamilyName, FirstName and DOB. At the moment I use a subquery to return the records, as follows:
SELECT FamilyName,
FirstName,
DOB,
AddressLine1,
AddressLine2,
State,
Postcode,
NextOfKin,
HomePhone,
Mobile
FROM
Patients AS p1
WHERE Id IN
(
SELECT Max(Id)
FROM Patients AS p2
GROUP BY
FamilyName,
FirstName,
DOB
HAVING COUNT(Id) > 1
)
This produces the result, but the performance is terrible. Is there a better way to do it? The only requirement is that I need to show all the fields in the Patients table, as the users of the system want to view all the details before deciding whether to merge the records.
This will output every row that has a duplicate, based on FamilyName, FirstName and DOB:
SELECT DISTINCT p1.*
FROM Patients AS p1
INNER JOIN Patients AS p2
ON p1.FamilyName = p2.FamilyName
AND p1.FirstName = p2.FirstName
AND p1.DOB = p2.DOB
AND p1.Id <> p2.Id
I suggest you build an index on the 3 fields you use to detect duplicates,
then try this query:
with Duplicates as
(
select FamilyName, FirstName, DOB
from Patients
group by FamilyName, FirstName, DOB
having count(*) > 1
)
Select Patients.*
from Patients
inner join Duplicates
on Patients.FamilyName = Duplicates.FamilyName
And Patients.FirstName= Duplicates.FirstName
and Patients.DOB= Duplicates.DOB
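As a rough check of the index plus CTE suggestion, the sketch below runs the same query shape against an in-memory SQLite database from Python. SQLite stands in for SQL Server 2008 here, which is an assumption, as are the trimmed-down column list, the index name IX_Patients_Dup and the sample patients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the Patients table: only the columns needed for the demo.
conn.execute(
    "CREATE TABLE Patients ("
    " Id INTEGER PRIMARY KEY,"
    " FamilyName TEXT, FirstName TEXT, DOB TEXT, Mobile TEXT)"
)
conn.executemany(
    "INSERT INTO Patients (FamilyName, FirstName, DOB, Mobile) VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Ann", "1980-01-01", "0400 000 001"),
        ("Smith", "Ann", "1980-01-01", "0400 000 002"),  # duplicate of row 1
        ("Jones", "Bob", "1975-05-05", "0400 000 003"),
        ("Smith", "Ann", "1990-01-01", "0400 000 004"),  # different DOB
    ],
)
# Composite index on the three columns used to detect duplicates.
conn.execute("CREATE INDEX IX_Patients_Dup ON Patients (FamilyName, FirstName, DOB)")

duplicate_rows = conn.execute(
    "WITH Duplicates AS ("
    "  SELECT FamilyName, FirstName, DOB FROM Patients"
    "  GROUP BY FamilyName, FirstName, DOB HAVING COUNT(*) > 1"
    ") "
    "SELECT p.Id, p.FamilyName, p.FirstName, p.DOB, p.Mobile "
    "FROM Patients AS p "
    "INNER JOIN Duplicates AS d "
    "  ON p.FamilyName = d.FamilyName"
    " AND p.FirstName = d.FirstName"
    " AND p.DOB = d.DOB "
    "ORDER BY p.Id"
).fetchall()
conn.close()

print(duplicate_rows)
```

Unlike the Max(Id) version in the question, this returns every copy of a duplicated patient, so the staff can compare the records side by side.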
WITH CTE
AS
(
SELECT Id, FamilyName, FirstName, DOB,
ROW_NUMBER() OVER(PARTITION BY FamilyName, FirstName, DOB ORDER BY Id) AS DuplicateCount
FROM Patients
)
select * from CTE where DuplicateCount > 1
If I were in your shoes, I'd do the following:
add indexes on FamilyName, FirstName and DOB
create a view for your subquery
modify the query as follows
Select p.* FROM Patients p INNER JOIN view_name v ON v.FirstName=p.Firstname AND ...
select FamilyName, FirstName, DOB
from Patients
group by FamilyName, FirstName, DOB
having count(*)>1
This will show all duplicated FamilyName, FirstName and DOB combinations.
However, please consider names that are written differently but are similar. You might want to look into the topics 'data deduplication' and/or 'record linkage'. I solved the problem using a string similarity algorithm (modified Jaro/Winkler and Levenshtein).
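To give an idea of what a string similarity check looks like, here is a plain Levenshtein edit distance in Python. This is the textbook algorithm, not the modified Jaro/Winkler and Levenshtein combination mentioned above, and the similar_names helper with its max_distance threshold is a hypothetical example of how it might be applied to names.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                )
            )
        previous = current
    return previous[-1]


def similar_names(a: str, b: str, max_distance: int = 1) -> bool:
    """Hypothetical helper: treat names as similar if they differ by at most max_distance edits."""
    return levenshtein(a.strip().lower(), b.strip().lower()) <= max_distance


print(levenshtein("Smith", "Smyth"), similar_names("Smith", "smyth "))
```

A threshold this simple will produce false matches on short names, so in practice it is usually combined with other fields such as DOB.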

How to remove multiple records for the same zipcode, keeping at least one record for that zipcode, in a database table

How can I remove multiple records for the same zipcode while keeping at least one record for that zipcode in a database table?
id zipcode
1 38000
2 38000
3 38000
4 38005
5 38005
I want a table with two columns, id and zipcode.
My final result should be the following:
id zipcode
1 38000
4 38005
How about
delete from myTable
where id not in (
select Min( id )
from myTable
group by zipcode )
That lets you keep your lowest IDs, which is what you seemed to want.
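Here is a small sketch that runs this delete against the sample data from the question, using Python's sqlite3 module as a stand-in database (an assumption; the table name myTable is taken from the answer above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, zipcode TEXT)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO myTable (id, zipcode) VALUES (?, ?)",
    [(1, "38000"), (2, "38000"), (3, "38000"), (4, "38005"), (5, "38005")],
)

# Keep the lowest id per zipcode and delete everything else.
conn.execute(
    "DELETE FROM myTable "
    "WHERE id NOT IN (SELECT MIN(id) FROM myTable GROUP BY zipcode)"
)
remaining_min = conn.execute("SELECT id, zipcode FROM myTable ORDER BY id").fetchall()
conn.close()

print(remaining_min)
```

One thing to watch with NOT IN: if the subquery could ever return NULL, no rows would be deleted, so this relies on id being non-null.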
To just select that result set, group by zipcode and take the lowest id in each group:
SELECT MIN(id) AS id, zipcode
FROM table
GROUP BY zipcode
To delete the other records and keep only one, you use a subquery like so:
DELETE FROM table
WHERE id NOT IN
(SELECT MIN(id)
FROM table
GROUP BY zipcode
)
You can also accomplish this using a join if you prefer.
with cte as (
select row_number() over (partition by zipcode order by id desc) as rn
from table)
delete from cte
where rn > 1;
This has the advantage of correctly handling duplicates and offers tight control over what gets deleted and what gets kept.
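The same row_number idea can be tried out in SQLite from Python, with one caveat: SQLite cannot delete through a CTE the way SQL Server can, so this sketch filters by id in a subquery instead. It assumes SQLite 3.25 or later (needed for window functions) and uses a hypothetical table name zips with the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zips (id INTEGER PRIMARY KEY, zipcode TEXT)")
conn.executemany(
    "INSERT INTO zips (id, zipcode) VALUES (?, ?)",
    [(1, "38000"), (2, "38000"), (3, "38000"), (4, "38005"), (5, "38005")],
)

# Number the rows inside each zipcode, highest id first, then delete
# everything except the first row of each partition.
conn.execute(
    "DELETE FROM zips WHERE id IN ("
    "  SELECT id FROM ("
    "    SELECT id,"
    "           ROW_NUMBER() OVER (PARTITION BY zipcode ORDER BY id DESC) AS rn"
    "    FROM zips"
    "  ) AS ranked"
    "  WHERE rn > 1"
    ")"
)
remaining_max = conn.execute("SELECT id, zipcode FROM zips ORDER BY id").fetchall()
conn.close()

print(remaining_max)
```

Because the ordering is id desc, the row that survives is the one with the highest id in each zipcode; ordering by id ascending would keep the lowest instead.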
Create temporary table with desired result:
select min(id) as id, zipcode
into tmp_sometable
from sometable
group by zipcode
Remove the original table:
drop table sometable
Rename temporary table:
sp_rename 'tmp_sometable', 'sometable';
or something like:
delete from sometable
where id not in
(
select min(id)
from sometable
group by zipcode
)
delete from table where id not in (select min(id) from table group by zipcode);
select distinct zipcode from table - would give the distinct zipcodes in the table
select min(id) from table group by zipcode - would give the min ID for each zipcode
delete from table where id not in (select min(id) from table group by zipcode) - would delete all the records in the table that are not in the result of query 2
There's an easier way if you want the lowest ID number. I just tested this:
SELECT
min(ID),
zipcode
FROM #zip
GROUP BY zipcode