SQL Find/Count similar duplicates

I am trying to count all records that are similar.
E.g. I have 2 records: one is 1A Smith St and the other is 1 Smith St.
This data needs to be returned/counted as 1.
Any assistance would be greatly appreciated.
Thanks

This counts the (street, number) combinations that aren't unique:
SELECT COUNT(*) AS Total
FROM
(
SELECT Street, Number
FROM your_table
GROUP BY Street, Number
HAVING COUNT(*) > 1
) q
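A minimal sketch of the query above, run against an in-memory SQLite database from Python. The table name `addresses`, its columns and the sample rows are assumptions made for illustration, since the question does not show its schema. It assumes the house number and its letter suffix are stored in separate columns, so 1A Smith St and 1 Smith St share the same (street, number) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the letter suffix is kept apart from the house number.
conn.execute("CREATE TABLE addresses (number INTEGER, suffix TEXT, street TEXT)")
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?)",
    [
        (1, "A", "Smith St"),  # 1A Smith St
        (1, "", "Smith St"),   # 1 Smith St
        (2, "", "Smith St"),
        (7, "", "Jones Rd"),
    ],
)

# Count the (street, number) pairs that occur more than once.
total = conn.execute(
    """
    SELECT COUNT(*) AS total
    FROM (
        SELECT street, number
        FROM addresses
        GROUP BY street, number
        HAVING COUNT(*) > 1
    ) q
    """
).fetchone()[0]
print(total)  # prints 1
```

If the suffix is stored inside a single address string instead, it would have to be split out before grouping; that step is not covered here.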

Use:
SELECT COUNT(1) AS NUMBER, Suffix, Street
FROM TableName
GROUP BY Suffix, Street

Related

How to get values of one column without the aggregate column?

I have this table:
first_name last_name age country
John Doe 31 USA
Robert Luna 22 USA
David Robinson 22 UK
John Reinhardt 25 UK
Betty Doe 28 UAE
How can I get only the names of the oldest per country?
When I do this query
SELECT first_name,last_name, MAX(age)
FROM Customers
GROUP BY country
I get this result:
first_name last_name MAX(age)
Betty Doe 31
John Reinhardt 22
John Doe 31
But I want to get only first name and last name without the aggregate function.

If window functions are an option, you can use ROW_NUMBER for this task.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY country ORDER BY age DESC) AS rn
FROM Customers
)
SELECT first_name, last_name, age, country
FROM cte
WHERE rn = 1
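As a rough check of the ROW_NUMBER approach, the sketch below loads the question's sample rows into an in-memory SQLite database from Python. It assumes the table is called `Customers` and that the bundled SQLite is version 3.25 or newer (the release that added window functions). The `ORDER BY country` in the outer query is only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (first_name TEXT, last_name TEXT, age INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("John", "Doe", 31, "USA"),
        ("Robert", "Luna", 22, "USA"),
        ("David", "Robinson", 22, "UK"),
        ("John", "Reinhardt", 25, "UK"),
        ("Betty", "Doe", 28, "UAE"),
    ],
)

# Number the rows of each country from oldest to youngest, keep row 1.
oldest = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY country ORDER BY age DESC) AS rn
        FROM Customers
    )
    SELECT first_name, last_name
    FROM cte
    WHERE rn = 1
    ORDER BY country
    """
).fetchall()
print(oldest)
# [('Betty', 'Doe'), ('John', 'Reinhardt'), ('John', 'Doe')]
```

Note that ROW_NUMBER keeps exactly one row per country; if two people share the maximum age, which one is kept is arbitrary unless the ORDER BY adds a tie-breaker.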

It sounds like you want to get the oldest age per country first,
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
Then join that result back to the original table to see which names match each maximum.
Something like this, perhaps:
SELECT Customers.*
FROM Customers
INNER JOIN
(
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
) AS max_per_country_query
ON Customers.Country = max_per_country_query.Country
AND Customers.Age = max_per_country_query.MAX_AGE_IN_COUNTRY
If your database supports it, I prefer using the CTE style of handling these subqueries because it's easier to read and debug.
WITH cte_max_per_country AS (
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
)
SELECT C.*
FROM Customers C
INNER JOIN cte_max_per_country
ON C.Country = cte_max_per_country.Country
AND C.Age = cte_max_per_country.MAX_AGE_IN_COUNTRY
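The join-back approach can be tried the same way. This sketch uses Python's sqlite3 module with the question's rows plus one extra row (Ann Lee, 31, USA) that is invented here purely to show how ties behave: because the join matches on age, every person who shares the maximum age in a country is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (first_name TEXT, last_name TEXT, age INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("John", "Doe", 31, "USA"),
        ("Robert", "Luna", 22, "USA"),
        ("David", "Robinson", 22, "UK"),
        ("John", "Reinhardt", 25, "UK"),
        ("Betty", "Doe", 28, "UAE"),
        ("Ann", "Lee", 31, "USA"),  # hypothetical row, ties with John Doe
    ],
)

# Find the maximum age per country, then join back to pick up the names.
oldest = conn.execute(
    """
    SELECT c.first_name, c.last_name, c.country
    FROM Customers c
    INNER JOIN (
        SELECT country, MAX(age) AS max_age_in_country
        FROM Customers
        GROUP BY country
    ) m
        ON c.country = m.country
       AND c.age = m.max_age_in_country
    ORDER BY c.country, c.last_name
    """
).fetchall()
print(oldest)
# [('Betty', 'Doe', 'UAE'), ('John', 'Reinhardt', 'UK'),
#  ('John', 'Doe', 'USA'), ('Ann', 'Lee', 'USA')]
```

Whether returning both tied rows is desirable depends on the requirement; it is simply how this form of the query behaves.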

SQL Server : finding duplicates based on first few characters on column

I want to find duplicates based on the first three characters of the surname. Is there a way to do that in SQL? I can compare the whole name, but how do we compare only the first few characters?
Below is my table
custid forename surname dateofbirth
----------------------------------------
1 David John 16-09-1985
2 David Jon 16-09-1985
3 Sarah Smith 10-08-2015
4 Peter Proca 11-06-2011
5 Peter Proka 11-06-2011
This is my query that I am currently running to compare
SELECT
y.id, y.forename, y.surname
FROM
customers y
INNER JOIN
(SELECT
forename, surname, COUNT(*) AS CountOf
FROM customers
GROUP BY forename, surname
HAVING COUNT(*) > 1) dt ON y.forename = dt.forename

You can use left():
select c.*
from (select c.*, count(*) over (partition by left(surname, 3)) as cnt
from customers c
) c
where cnt > 1
order by surname;
You can include the forename as well in the partition by if you mean forename and first three letters of surname.

You can use exists as follows:
select t.* from t
Where exists
(select 1 from t tt
Where left(t.surname, 3) = left(tt.surname, 3) and t.custid <> tt.custid
)
order by t.surname;
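A small runnable sketch of the EXISTS variant, using Python's sqlite3 module and the question's rows. SQLite has no LEFT() function, so `substr(surname, 1, 3)` is swapped in here; on SQL Server the `left(surname, 3)` form shown above applies. The table name `customers` is taken from the question's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (custid INTEGER, forename TEXT, surname TEXT, dateofbirth TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "David", "John", "16-09-1985"),
        (2, "David", "Jon", "16-09-1985"),
        (3, "Sarah", "Smith", "10-08-2015"),
        (4, "Peter", "Proca", "11-06-2011"),
        (5, "Peter", "Proka", "11-06-2011"),
    ],
)

# Keep a row when another row shares the first three surname characters.
dupes = conn.execute(
    """
    SELECT t.custid, t.forename, t.surname
    FROM customers t
    WHERE EXISTS (
        SELECT 1
        FROM customers tt
        WHERE substr(t.surname, 1, 3) = substr(tt.surname, 1, 3)
          AND t.custid <> tt.custid
    )
    ORDER BY t.custid
    """
).fetchall()
print(dupes)
# [(4, 'Peter', 'Proca'), (5, 'Peter', 'Proka')]
```

With this data, John and Jon are not reported, because their first three characters ("Joh" and "Jon") differ; a two-character prefix would be needed to catch that pair.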

Distinct on specific columns in SQL

I know similar questions have already been asked here. However, most of them still want to return the first or last row when multiple rows have the same attributes. In my case, I simply want to discard the rows that share the same specific attributes.
For example, I have a toy dataset like this:
gender age name
f 20 zoe
f 20 natalia
m 39 tom
f 20 erika
m 37 eric
m 37 shane
f 22 jenn
I want to group on gender and age only, then discard every row whose (gender, age) combination appears more than once, which returns:
gender age name
m 39 tom
f 22 jenn

You could use the window (analytic) variant of count to find the rows that have just one occurrence of the gender/age combination:
SELECT gender, age, name
FROM (SELECT gender, age, name, COUNT(*) OVER (PARTITION BY gender, age) AS cnt
FROM mytable) t
WHERE cnt = 1
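To see the windowed count in action, here is a sketch using Python's sqlite3 module with the question's toy dataset. It assumes the table is named `mytable`, as in the answer, and that the bundled SQLite is 3.25 or newer so that window functions are available. The final `ORDER BY name` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (gender TEXT, age INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("f", 20, "zoe"),
        ("f", 20, "natalia"),
        ("m", 39, "tom"),
        ("f", 20, "erika"),
        ("m", 37, "eric"),
        ("m", 37, "shane"),
        ("f", 22, "jenn"),
    ],
)

# Attach the size of each (gender, age) group to every row,
# then keep only the rows whose group has a single member.
unique_rows = conn.execute(
    """
    SELECT gender, age, name
    FROM (
        SELECT gender, age, name,
               COUNT(*) OVER (PARTITION BY gender, age) AS cnt
        FROM mytable
    ) t
    WHERE cnt = 1
    ORDER BY name
    """
).fetchall()
print(unique_rows)
# [('f', 22, 'jenn'), ('m', 39, 'tom')]
```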

Use the HAVING clause in a CTE.
;WITH DistinctGenderAges AS
(
SELECT gender
,age
FROM YourTable
GROUP BY gender
,age
HAVING COUNT(*) = 1
)
SELECT yt.gender, yt.age, yt.name
FROM DistinctGenderAges dga
INNER JOIN YourTable yt ON dga.gender = yt.gender AND dga.age = yt.age

No matter what, you have to tell the database which value to pick for name. If you don't care, an easy solution is to group:
SELECT gender, age, MIN(name) as name FROM mytable GROUP BY gender, age HAVING COUNT(*)=1
You can use any valid aggregate for name, but you have to pick something.

Get maximum number with combination of columns

I have a table with columns AutoID, Number, Name, City, State, Country.
What I want is the maximum number entered in the "Number" column for each combination of Name, City, State and Country.
Example:
Name City State Country
Smith NY NY USA
John NY NY USA
John NJ NY USA
Now Smith should get "Number" 1, John 2, and the second John (in NJ) 1, as he is the first from NJ.
I could simply put a WHERE clause in the query and take the max number + 1. The problem is that as the amount of data and the number of users grow, that query will become really slow. I am also inserting data into the same table, so it will keep piling up.
I hope I have made myself clear.
Vipul Parekh

I am guessing that this is what you want:
Name City State Country Number
Smith NY NY USA 1
John NY NY USA 2
John NJ NY USA 1
This is provided by row_number():
select name, city,state, country,
row_number() over (partition by city, state, country order by (select null)) as Number
from table t;
Note that the sequencing of the Number within a group is arbitrary, because you don't provide an id or createdat column. There is no guarantee that it is in the same order as the table, because SQL tables are inherently unordered.
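The sketch below runs the same idea from Python's sqlite3 module. To make the numbering deterministic it orders by the AutoID column mentioned in the question instead of `(select null)`; that substitution, the table name `people` and the AutoID values are assumptions made for illustration. It also assumes SQLite 3.25 or newer for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (autoid INTEGER PRIMARY KEY, name TEXT, "
    "city TEXT, state TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO people (name, city, state, country) VALUES (?, ?, ?, ?)",
    [
        ("Smith", "NY", "NY", "USA"),
        ("John", "NY", "NY", "USA"),
        ("John", "NJ", "NY", "USA"),
    ],
)

# Number the rows inside each (city, state, country) group in insert order.
numbered = conn.execute(
    """
    SELECT name, city,
           ROW_NUMBER() OVER (
               PARTITION BY city, state, country
               ORDER BY autoid
           ) AS number
    FROM people
    ORDER BY autoid
    """
).fetchall()
print(numbered)
# [('Smith', 'NY', 1), ('John', 'NY', 2), ('John', 'NJ', 1)]
```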

You can create a trigger on your table. Whenever you insert a row, it counts the existing rows for that combination and updates the new row with count + 1.
To get around the slowness, you can create an index on Name, City, State and Country.
Let me know if you need a sample code or some pointers.
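One way to picture the "index plus number assigned at insert time" idea is the sketch below, written with Python's sqlite3 module. It is not the trigger the answer describes: the number is computed in the INSERT statement itself, and the helper `insert_person`, the table `people` and the index `ix_people_location` are hypothetical names. The grouping uses (city, state, country), which is what the question's expected output implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (autoid INTEGER PRIMARY KEY, number INTEGER, "
    "name TEXT, city TEXT, state TEXT, country TEXT)"
)
# The index lets MAX(number) for one location be found without a full scan.
conn.execute(
    "CREATE INDEX ix_people_location ON people (city, state, country, number)"
)


def insert_person(name, city, state, country):
    """Insert a row, assigning the next number within its location."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO people (number, name, city, state, country)
            SELECT COALESCE(MAX(number), 0) + 1, ?, ?, ?, ?
            FROM people
            WHERE city = ? AND state = ? AND country = ?
            """,
            (name, city, state, country, city, state, country),
        )


insert_person("Smith", "NY", "NY", "USA")
insert_person("John", "NY", "NY", "USA")
insert_person("John", "NJ", "NY", "USA")

assigned = conn.execute(
    "SELECT name, city, number FROM people ORDER BY autoid"
).fetchall()
print(assigned)
# [('Smith', 'NY', 1), ('John', 'NY', 2), ('John', 'NJ', 1)]
```

On a multi-user server database, two concurrent inserts could compute the same number, so a unique constraint or appropriate locking would probably be needed as well.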

You can use ROW_NUMBER:
SELECT * FROM
(
Select Name,
City,
State,
Country,
Row_Number() over (Partition by City, State, Country ORDER BY Name) AS Number
From table1
) AS T

1)
WITH Optimize_CTE (id, cityid,stateid, countryid)
AS
-- Define the CTE query.
(
SELECT id, cityid,stateid, countryid
FROM Optimize
WHERE id IS NOT NULL
)
-- Define the outer query referencing the CTE name.
SELECT id,
convert(Varchar(255),cityid) as CityId,
convert(Varchar(255),stateid) as StateId,
convert(Varchar(255),countryid) as CountryId
FROM Optimize_CTE OPTION (MAXRECURSION 1)

DECLARE @RowsPerPage INT = 10, @PageNumber INT = 1
SELECT InvoiceID, CustomerID
FROM dbo.SalesInvoice
ORDER BY InvoiceID
OFFSET (@PageNumber-1)*@RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY
GO

SQL find repeated values

I need to identify rows where a certain value is repeated. Here is a sample table:
COUNTRY CITY
Italy Milan
England London
USA New York
Canada London
USA Atlanta
The query should return...
COUNTRY CITY
England London
Canada London
...because London is repeated. Thank you in advance for your help.

The easiest way is to use a subquery that counts the number of times each city appears (and filter to those values that appear more than once):
SELECT * FROM Cities
WHERE City in
(
SELECT City FROM Cities
GROUP BY City
HAVING COUNT(*) > 1
)
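The subquery approach can be checked with a short sketch using Python's sqlite3 module and the question's sample rows (with the country spelled England). The table name `Cities` follows the answer; the `ORDER BY country` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (country TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO Cities VALUES (?, ?)",
    [
        ("Italy", "Milan"),
        ("England", "London"),
        ("USA", "New York"),
        ("Canada", "London"),
        ("USA", "Atlanta"),
    ],
)

# The inner query lists cities that occur more than once;
# the outer query returns every row for those cities.
repeated = conn.execute(
    """
    SELECT country, city
    FROM Cities
    WHERE city IN (
        SELECT city
        FROM Cities
        GROUP BY city
        HAVING COUNT(*) > 1
    )
    ORDER BY country
    """
).fetchall()
print(repeated)
# [('Canada', 'London'), ('England', 'London')]
```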

If your DBMS supports windowed aggregates.
SELECT COUNTRY,
CITY
FROM (SELECT COUNTRY,
CITY,
COUNT(*) OVER (PARTITION BY CITY) AS Cnt
FROM Cities) T
WHERE Cnt > 1

select country, city
from aTable
where city in
(
select city
from aTable
group by city
HAVING count(1) > 1
)
Try it here: http://sqlfiddle.com/#!3/e9b1a/1
Or if the same city & country combo appears twice and you're only interested where the countries are different:
select distinct country, city
from aTable
where city in
(
select city
from aTable
group by city
HAVING count(distinct country) > 1
)
Try it here: http://sqlfiddle.com/#!3/2dfaa/2
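The difference between the two queries only shows up when the same country and city pair is stored more than once. The sketch below, using Python's sqlite3 module, adds a second (USA, Atlanta) row that is not in the question, purely to illustrate this. Both queries use `SELECT DISTINCT` and an `ORDER BY` here so the two results are directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aTable (country TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO aTable VALUES (?, ?)",
    [
        ("Italy", "Milan"),
        ("England", "London"),
        ("USA", "New York"),
        ("Canada", "London"),
        ("USA", "Atlanta"),
        ("USA", "Atlanta"),  # hypothetical exact duplicate
    ],
)

QUERY = """
    SELECT DISTINCT country, city
    FROM aTable
    WHERE city IN (
        SELECT city FROM aTable GROUP BY city HAVING {condition}
    )
    ORDER BY city, country
"""

# Any repeated city name, including exact duplicate rows.
any_repeat = conn.execute(QUERY.format(condition="COUNT(1) > 1")).fetchall()
# Only city names that occur in more than one country.
cross_country = conn.execute(
    QUERY.format(condition="COUNT(DISTINCT country) > 1")
).fetchall()

print(any_repeat)
# [('USA', 'Atlanta'), ('Canada', 'London'), ('England', 'London')]
print(cross_country)
# [('Canada', 'London'), ('England', 'London')]
```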

This one works. Got it from my wife (she finally had time to look into this). Thought you might be interested.
SELECT * FROM Cities
WHERE City in ( select city
from (SELECT City,
count(distinct country)
FROM Cities
GROUP BY City
HAVING count(distinct country) > 1) a )
