SQL: Remove non-duplicate entries in a table

I have a table with two columns, CountryCode and CountryName. There are duplicate entries in CountryCode, but I want to remove the non-duplicate entries and keep the rows whose value is duplicated in the CountryCode column. I am trying to write an SQL statement to do this. I think I have to use HAVING, but I'm not too sure how exactly to incorporate it. Thanks.

That's a bit odd. I was expecting you to want to remove the duplicate entries, not the other way around. But something like this should work regardless of the database you are using:
delete from TableName
where CountryCode in (select CountryCode
from TableName
group by CountryCode
having count(*) = 1)
So to be clear, the subquery:
select CountryCode
from TableName
group by CountryCode
having count(*) = 1
... returns the CountryCode values that appear only once. And then the delete statement:
delete from TableName
where CountryCode in (...)
... deletes those unique rows so that the only rows remaining in your table should be the ones with duplicates.
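If you want to be cautious, you can preview exactly which rows the delete would remove by running the same subquery inside a plain select first (a minimal sketch using the TableName and CountryCode names from above):
select *
from TableName
where CountryCode in (select CountryCode
from TableName
group by CountryCode
having count(*) = 1)
If that result looks right, the delete above is safe to run.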
However, by your comments, it sounds like you just want a query that returns only the duplicates. If that's the case, then just use the subquery inside a select statement, but modify the having clause to return only duplicates:
select *
from TableName
where CountryCode in (select CountryCode
from TableName
group by CountryCode
having count(*) > 1)

This is a quick solution. It is probably not the fastest with a lot of entries, but it works.
SELECT * FROM [table] AS tbl
WHERE countrycode IN
(SELECT countrycode FROM [table] WHERE tbl.countryname <> countryname)
/* Words in uppercase are SQL Syntax */
By aliasing the first table as tbl, you can reference it in the nested query.


Is the GROUP BY needed to insert

INSERT INTO NEWTABLE
(Street,
Number,
NuDate,
XValue)
SELECT
a1.Street,
a2.Number,
a2.NuDate,
a2.XValue
FROM
ABC.dbo.Faculty a1 INNER JOIN
ABC.dbo.Faculty2 a2
ON a1.NameID = a2.NameID
WHERE
a1.Bologna = 'True'
GROUP BY
a1.Street,
a2.Number,
a2.NuDate,
a2.XValue
In this completely fictitious SQL statement, is the GROUP BY needed to insert properly into NEWTABLE? And does the GROUP BY need to match up perfectly with the INSERT INTO column list for this statement to work properly?
EDIT: Sorry, I realized I had the wrong values in the GROUP BY clause; they're supposed to match the INSERT INTO.
In this completely fictitious SQL statement, is the GROUP by needed to insert properly into NEWTABLE?
It's not necessary if you don't mind duplicates.
If you don't want duplicate rows then yes, you'll need to use GROUP BY (or DISTINCT):
SELECT DISTINCT
a1.Street,
a2.Number,
a2.NuDate,
a2.XValue
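In context, the whole statement with DISTINCT would look something like the following sketch, reusing the fictitious tables from the question:
INSERT INTO NEWTABLE
(Street,
Number,
NuDate,
XValue)
SELECT DISTINCT
a1.Street,
a2.Number,
a2.NuDate,
a2.XValue
FROM
ABC.dbo.Faculty a1 INNER JOIN
ABC.dbo.Faculty2 a2
ON a1.NameID = a2.NameID
WHERE
a1.Bologna = 'True'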
does the group by need to match up perfectly with the INSERT INTO for this statement to work properly?
Yes, the GROUP BY columns must match the selected (non-aggregated) columns.
There are cases where what you group by doesn't match what you select, but that's when you're aggregating:
SELECT
column1, -- no aggregation, must match
sum(column2) -- aggregation, so does not need to match
FROM a
GROUP BY column1

SQL - Find duplicate fields and count how many fields are matched

I have a large customer database where customers have been added multiple times in some circumstances, which is causing problems. I am able to use a query to identify the records that are an exact match, although some records have slight variations such as different addresses or given names.
I want to query across 10 fields. Some records will match on all 10, which is clearly a duplicate, while other records may only match another record on 5 fields and require further investigation. Therefore I want to create a result set with a column counting how many fields have been matched, basically a rating of the likelihood that the result is an actual match. All 10 would be a clear duplicate, but 5 would only be a possible duplicate.
Some will only match on POSTCODE and FIRSTNAME, which can generally be discounted.
Something like this helps, but as it only returns records that match exactly on all three fields, it's not really useful due to the sheer amount of data.
SELECT field1,field2,field3, count(*)
FROM table_name
GROUP BY field1,field2,field3
HAVING count(*) > 1
You are just missing the magic of CUBE(), which generates all the combinations of columns automatically:
DECLARE @duplicate_column_threshold int = 5;
WITH cte AS (
SELECT
field1,field2,...,field10
,duplicate_column_count = (SELECT COUNT(col) FROM (VALUES (field1),(field2),...,(field10)) c(col))
FROM table_name
GROUP BY CUBE(field1,field2,...,field10)
HAVING COUNT(*) > 1
)
SELECT *
INTO #duplicated_rows
FROM cte
WHERE duplicate_column_count >= @duplicate_column_threshold
Update: to fetch the rows from the original table, join it against #duplicated_rows using a technique that treats NULLs as wildcards when comparing the columns. NULLIF(b.field1, a.field1) is NULL both when the two values are equal and when b.field1 itself is NULL (a column that CUBE collapsed), so the condition passes in either case:
SELECT
a.*
,b.duplicate_column_count
FROM table_name a
INNER JOIN #duplicated_rows b
ON NULLIF(b.field1,a.field1) IS NULL
AND NULLIF(b.field2,a.field2) IS NULL
...
AND NULLIF(b.field10,a.field10) IS NULL
You might try something like:
Select field1, field2, field3, ... , field10, count(1)
from customerdatabase
group by field1, field2, field3, ... , field10
order by field1, field2, field3, ... , field10
Where field1 through field10 are ordered from the most identifiable/important to the least.
This is as close as I've got to what I'm trying to achieve; it will return all records that have any duplicate fields. I want to add a column to the results which indicates how many fields have matched any other record in the table. There are around 40,000 records in total.
select * from [CUST].[dbo].[REPORTA] as a
where exists
(select [GIVEN.NAMES],[FAMILY.NAME],[DATE.OF.BIRTH],[POST.CODE],[STREET],[TOWN.COUNTRY]
from [CUST].[dbo].[REPORTA] as b
where a.[GIVEN.NAMES] = b.[GIVEN.NAMES]
or a.[FAMILY.NAME] = b.[FAMILY.NAME]
or a.[DATE.OF.BIRTH] = b.[DATE.OF.BIRTH]
or a.[POST.CODE] = b.[POST.CODE]
or a.[STREET] = b.[STREET]
or a.[TOWN.COUNTRY] = b.[TOWN.COUNTRY]
group by [GIVEN.NAMES],[FAMILY.NAME],[DATE.OF.BIRTH],[POST.CODE],[STREET],[TOWN.COUNTRY]
having count(*) >= 1)
This query will return thousands of records, but I'm generally interested in the records with a high count of exactly matching fields.
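One way to get that count directly is a correlated subquery that adds up one CASE comparison per field. This is only a sketch: [CUSTOMER.ID] is a hypothetical key column standing in for whatever uniquely identifies a row, and on ~40,000 rows the pairwise comparison will be slow without supporting indexes.
-- Sketch: [CUSTOMER.ID] is hypothetical; replace it with the table's real key column.
select a.*,
(select max( (case when a.[GIVEN.NAMES] = b.[GIVEN.NAMES] then 1 else 0 end)
+ (case when a.[FAMILY.NAME] = b.[FAMILY.NAME] then 1 else 0 end)
+ (case when a.[DATE.OF.BIRTH] = b.[DATE.OF.BIRTH] then 1 else 0 end)
+ (case when a.[POST.CODE] = b.[POST.CODE] then 1 else 0 end)
+ (case when a.[STREET] = b.[STREET] then 1 else 0 end)
+ (case when a.[TOWN.COUNTRY] = b.[TOWN.COUNTRY] then 1 else 0 end))
from [CUST].[dbo].[REPORTA] as b
where b.[CUSTOMER.ID] <> a.[CUSTOMER.ID]) as matched_fields
from [CUST].[dbo].[REPORTA] as a
order by matched_fields desc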

SQL query for removing non-unique row

I'm using PostgreSQL 9.2.
Say I have the following table:
id      name          definition
serial  varchar(128)  text
1       name1         definition1
...
I need to write a query that removes rows with duplicated names so that every remaining row has a unique name. If two rows have the same name, their definitions are also the same.
Use the row_number() function partitioned by name and remove all rows that have row_number() > 1.
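A minimal sketch of that approach, assuming the table is called mytable (as in the answer below) and using id to decide which duplicate to keep:
DELETE FROM mytable
WHERE id IN (
    SELECT id
    FROM (SELECT id,
                 row_number() OVER (PARTITION BY name ORDER BY id) AS rn
          FROM mytable) t
    WHERE rn > 1
);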
Here is an example query for deleting duplicates:
DELETE FROM mytable dd
WHERE EXISTS (
SELECT *
FROM mytable ex
WHERE ex.name = dd.name
AND ex.id < dd.id
);
Why do you even let client applications add rows with duplicate names in the first place?
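If duplicate names should never be allowed going forward, a unique constraint would let the database reject them at insert time. A sketch, again assuming the mytable name and a made-up constraint name; run it only after the existing duplicates have been removed, or it will fail:
ALTER TABLE mytable
    ADD CONSTRAINT mytable_name_unique UNIQUE (name);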

Building a SELECT clause with dynamic number of columns

Is it possible to create a SELECT clause with a varying number of columns to be returned depending on joined tables?
For instance.
If I join a table depending on a value in the WHERE clause, I want to return either tbl1.col1, tbl1.col2 if table tbl1 is joined, or tbl2.col4, tbl2.col5, tbl2.col8 if table tbl2 is joined.
Is this possible? How?
No, you can't write one query that sometimes returns n columns and other times m columns. What you can do is something like this: use UNION ALL on two queries with conditions such that either query 1 or query 2 returns data. Make the columns match up, so where one query has no value for a column, let it select null in its place.
select tbl1.col1 as firstname, tbl1.col2 as lastname, null as street, tbl1.col3 as job from ...
where @variable = 1
UNION ALL
select tbl2.col4 as firstname, tbl2.col5 as lastname, tbl2.col8 as street, null as job from ...
where @variable = 2;
Or you just build your SQL dynamically with whatever language and use completely different SQL, which is what one would normally do.
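As a rough sketch of the dynamic approach in T-SQL (assuming SQL Server; tbl1 and tbl2 are hypothetical stand-ins for the elided FROM clauses above):
DECLARE @variable int = 1;        -- whatever drives the choice of columns
DECLARE @sql nvarchar(max);

IF @variable = 1
    SET @sql = N'SELECT tbl1.col1 AS firstname, tbl1.col2 AS lastname FROM tbl1';
ELSE
    SET @sql = N'SELECT tbl2.col4 AS firstname, tbl2.col5 AS lastname, tbl2.col8 AS street FROM tbl2';

EXEC sp_executesql @sql;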

How can I write a SQL statement to update a column in a table from another column in the same table?

I have an Oracle DB where a program writes into two columns when it updates the table. The 2nd column is based on the value from the 1st column. Well, over time people have hand-edited the database and forgotten to insert values into the 2nd column. I'd like to write a simple SQL statement that updates all rows and syncs the 2nd column with the 1st column. I know there's some simple statement to do this. Doing a little googling, I thought of something like the following:
UPDATE suppliers
SET supplier_name = (
SELECT customers.name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id
)
WHERE EXISTS (
SELECT customers.name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id
);
However, that is between two different tables, whereas I would be doing it on the same table.
The following works in SQL Server (haven't checked Oracle yet).
UPDATE SUPPLIERS SET Supplier_Name = CustomerName
I'd give this a try and see if it works...
If both columns are in the same table, you can use the simplest query:
UPDATE your_table
SET column1 = column2
WHERE column1 != column2;
This assumes that both columns are NOT NULL. If the columns are nullable, however, use decode instead:
UPDATE your_table
SET column1 = column2
WHERE decode(column1, column2, 1, 0) = 0;
update tableName set col2 = col1
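Since it is only the hand-edited rows that are missing the second value, you may want to restrict the update to those rows, for example (a sketch assuming the missing values are stored as NULL):
update tableName set col2 = col1 where col2 is null;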