Alter a column to indicate there are duplicates in a separate column? - sql

I have a table in which one of the columns, column_A has duplicate values. I also have a blank column_indicator which I would like to populate with 1s for all cases where the value in column_A occurs more than once.
I know how to SELECT the duplicates and have used the following query:
SELECT [dbo].[myTable].[column_A], COUNT(*)
FROM [dbo].[myTable]
GROUP BY [dbo].[myTable].[column_A]
HAVING COUNT(*) > 1
How do I update column_indicator? I have tried:
UPDATE [dbo].[myTable]
SET [dbo].[myTable].[column_indicator] = 1
WHERE
GROUP BY [dbo].[myTable].[column_A]
HAVING COUNT(*) > 1
I know I am off base but cannot figure out how to proceed with this column update.

You can do a window count in a common table expression and then update it:
WITH cte AS (
SELECT
[column_indicator],
[column_A],
COUNT(*) OVER(PARTITION BY [column_A]) cnt
FROM [dbo].[myTable]
)
UPDATE cte SET [column_indicator] = 1 WHERE cnt > 1

I think you can also use a nested update query to solve this.
UPDATE [dbo].[myTable] SET [column_indicator] = 1
FROM
(
SELECT [column_A]
FROM [dbo].[myTable]
GROUP BY [column_A]
HAVING COUNT(*) > 1
) t
WHERE [dbo].[myTable].[column_A] = t.[column_A];
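To try the duplicate-flagging logic on a throwaway database, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table stands in for [dbo].[myTable], and the sample values and the helper name flag_duplicates are made up for illustration. It uses the IN (... GROUP BY ... HAVING COUNT(*) > 1) form, which SQLite accepts, rather than SQL Server's updatable CTE.

```python
import sqlite3


def flag_duplicates(conn):
    """Set column_indicator = 1 on every row whose column_A value occurs more than once."""
    conn.execute(
        """
        UPDATE myTable
        SET column_indicator = 1
        WHERE column_A IN (
            SELECT column_A
            FROM myTable
            GROUP BY column_A
            HAVING COUNT(*) > 1
        )
        """
    )


# Hypothetical sample data: 'a' and 'b' repeat, 'c' is unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (column_A TEXT, column_indicator INTEGER)")
conn.executemany(
    "INSERT INTO myTable (column_A) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

flag_duplicates(conn)
rows = conn.execute(
    "SELECT column_A, column_indicator FROM myTable ORDER BY rowid"
).fetchall()
print(rows)
```

Rows whose value is unique keep their original NULL indicator; only the repeated values are set to 1.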

Related

Delete results in subquery

I would like to delete the results of the subquery below. How do I do that?
delete
from mytable
where rowid in (select rowid, count(*) as count from mytable group by mygroup having count > 50)
Did not work:
[1] [SQLITE_ERROR] SQL error or missing database (sub-select returns 8
columns - expected 1)
If you check this page, you would notice that when using IN and a subquery:
A list of values is a fixed value list or a result set of one column returned by a subquery. The returned type of the expression and values in the list must be the same.
Your mistake here is that your subquery is returning more than 1 column (rowid and count).
To fix that you can try this:
delete
from mytable
where rowid in (select rowid
from (select rowid,
count(*) as count
from mytable
group by mygroup
having count > 50)
)
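Note that, assuming you want to delete every row of the groups whose count exceeds 50, this query won't do the full job: the grouped subquery yields only one rowid per group, so only one row of each group is affected. Filter on mygroup instead. The SELECT below (with a threshold of 1) returns every row of the matching groups: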
select * FROM mytable
where mygroup in (select mygroup
from (select mygroup,
count(*) as count
from mytable
group by mygroup
having count > 1)
);
as stated in D-Shish's answer.
Here's a demo to see the two different result sets from these two queries.
delete
from mytable
where rowid in (select rowid from (select rowid, count(*) as count from mytable group by mygroup having count > 50))
I guess you want the following instead. When you use an aggregate function, the non-aggregated columns you select should appear in the GROUP BY, so filter on mygroup rather than rowid:
delete
from mytable
where mygroup in (select mygroup from mytable group by mygroup having count(*) > 50)
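The difference between filtering on rowid and filtering on mygroup can be checked with a small experiment. This is a sketch using Python's sqlite3 module; the sample rows, the payload column, the helper names and the threshold of 3 (standing in for 50) are assumptions for illustration.

```python
import sqlite3

THRESHOLD = 3  # stands in for the 50 used in the question


def make_table():
    """Build a sample table: 5 rows in group 'big', 2 rows in group 'small'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (mygroup TEXT, payload INTEGER)")
    rows = [("big", i) for i in range(5)] + [("small", i) for i in range(2)]
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    return conn


def remaining(conn):
    """Return (mygroup, row count) pairs for what is left in the table."""
    return conn.execute(
        "SELECT mygroup, COUNT(*) FROM mytable GROUP BY mygroup ORDER BY mygroup"
    ).fetchall()


# Variant 1: filter on rowid. The grouped subquery yields one rowid per group,
# so only a single row of the oversized group is deleted.
by_rowid = make_table()
by_rowid.execute(
    """
    DELETE FROM mytable
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid, COUNT(*) AS cnt
            FROM mytable
            GROUP BY mygroup
            HAVING COUNT(*) > ?
        )
    )
    """,
    (THRESHOLD,),
)

# Variant 2: filter on mygroup. Every row of the oversized group is deleted.
by_group = make_table()
by_group.execute(
    """
    DELETE FROM mytable
    WHERE mygroup IN (
        SELECT mygroup FROM mytable GROUP BY mygroup HAVING COUNT(*) > ?
    )
    """,
    (THRESHOLD,),
)

print(remaining(by_rowid))
print(remaining(by_group))
```

With this data the rowid variant leaves four of the five 'big' rows in place, while the mygroup variant removes the whole group.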

LIMIT equivalent for SQL Server 2012 in an UPDATE statement

I have the following table and am trying to update the first available row with a user ID through a query, but I need to limit this to updating only one row, not multiple.
ID Model UserID
1 X12T5 1
2 X13T5 2
3 X14T5 NULL
4 X15T5 NULL
The first available row would be where ID is 3. I would update it with the following query:
UPDATE Table SET UserID = '3' WHERE UserID IS NULL
But I want to make sure it affects only one row and not all the rows that are available. LIMIT doesn't exist in SQL Server.
What would be the best way to achieve this?
You can do this with UPDATE TOP. It's the equivalent of a SELECT TOP but for updates; and TOP is SQL Server's equivalent of MySQL's LIMIT.
See further info.
UPDATE Table SET UserID = '3'
WHERE UserID IS NULL
AND ID IN (SELECT TOP 1 ID FROM Table WHERE UserID IS NULL ORDER BY ID)
You can also use any of the following methods (Assuming column ID is of integer datatype)
Using a subquery:
UPDATE a
SET a.UserID='3'
FROM YourTable a
WHERE ID =( SELECT MIN(ID)
FROM YourTable
WHERE UserID is NULL)
Using JOIN:
UPDATE a
SET a.UserID='3'
FROM YourTable a
JOIN
( SELECT MIN(ID) MinId
FROM YourTable
WHERE UserID is NULL) b
ON a.ID=b.MinId
Using a CTE:
WITH cte_a
AS
(SELECT MIN(ID) MinId
FROM YourTable
WHERE UserID is NULL)
UPDATE a
SET a.UserID='3'
FROM YourTable a
JOIN cte_a b ON a.ID=b.MinId
You can also use a CTE and perform SELECT/UPDATE/DELETE operations on it:
; WITH CTE AS (
SELECT TOP 10 * FROM TABLE
)
UPDATE CTE
SET ...
DELETE\UPDATE\SELECT works with CTE ... Cool !!!
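The MIN(ID) subquery approach above is portable enough to try outside SQL Server. Here is a minimal sketch using Python's sqlite3 module, with the table from the question loaded into an in-memory database. The table name devices and the helper assign_first_free are hypothetical, and UserID is stored as an integer here.

```python
import sqlite3


def assign_first_free(conn, user_id):
    """Give user_id to the unassigned row with the lowest ID; return rows changed."""
    cur = conn.execute(
        """
        UPDATE devices
        SET UserID = ?
        WHERE ID = (SELECT MIN(ID) FROM devices WHERE UserID IS NULL)
        """,
        (user_id,),
    )
    return cur.rowcount


# Sample data copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (ID INTEGER PRIMARY KEY, Model TEXT, UserID INTEGER)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?, ?)",
    [(1, "X12T5", 1), (2, "X13T5", 2), (3, "X14T5", None), (4, "X15T5", None)],
)

changed = assign_first_free(conn, 3)
rows = conn.execute("SELECT ID, UserID FROM devices ORDER BY ID").fetchall()
print(changed, rows)
```

Because the subquery returns a single ID, at most one row is updated; when no unassigned row is left, MIN(ID) is NULL and nothing is changed.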

SQL: Delete all NOT MAX Records in GroupBy

My goal is to delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey. So if I have 9 records, made up of 3 CaseKeys with 3 dates each, I'd delete the 2 lower dates of each set and end up with 3 total records: only the MAX(recordDate) of each CaseKey remains.
I have the following SQL Query:
DELETE FROM table
WHERE tableID NOT IN (
SELECT tableID
FROM (
Select MAX(recordDate) As myDate, tableID From table
Group By CaseKey
) As foo
)
I receive the error:
Error on Line 3... Column 'table.tableID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Obviously I could add tableID to my Group By clause, but then the result of that statement is incorrect and returns all rows instead of just returning the MAX recordDate of the grouped CaseKeys.
Server is down right now, but the apparent answer is: (tiny tweak from WildPlasser's answer)
DELETE zt FROM ztable zt
WHERE EXISTS (
SELECT * FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
In other words, for each record in zt, run a query to see if the same CaseKey also has a record with a higher recordDate. If so, the WHERE EXISTS condition passes and the record is deleted; otherwise the condition fails and the record is kept, because it holds the MAX recordDate for its CaseKey.
Thank you, WildPlasser, for that simplistic methodology that I was somehow blowing up.
There is one special property of MAX: there is no record with a higher value than max. So we can delete all the records for which a record with the same CaseKey, but with a higher recordDate exists:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
BTW: The above query (as well as the MAX() version) assumes that there is only one record with the maximum date. There could be ties.
In the case of ties, you'll need to add an extra field to the WHERE clause as a tie-breaker. Assuming that TableId can function as such, the query would become:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ( ex.recordDate > zt.recordDate
OR (ex.recordDate = zt.recordDate AND ex.TableId > zt.TableId)
)
);
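The tie-breaker version can be exercised on a small data set. This is a sketch using Python's sqlite3 module. The sample rows and the helper name keep_latest_per_case are made up, and the outer table is referenced by name rather than by alias to stay within syntax SQLite is sure to accept.

```python
import sqlite3


def keep_latest_per_case(conn):
    """Delete every row for which a 'later' row with the same CaseKey exists.

    'Later' means a higher recordDate, or the same recordDate with a higher TableId.
    """
    conn.execute(
        """
        DELETE FROM ztable
        WHERE EXISTS (
            SELECT 1
            FROM ztable AS ex
            WHERE ex.CaseKey = ztable.CaseKey
              AND (ex.recordDate > ztable.recordDate
                   OR (ex.recordDate = ztable.recordDate
                       AND ex.TableId > ztable.TableId))
        )
        """
    )


# Hypothetical data: case B has a tie on its latest date (TableId 4 and 5).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ztable (TableId INTEGER PRIMARY KEY, CaseKey TEXT, recordDate TEXT)"
)
conn.executemany(
    "INSERT INTO ztable VALUES (?, ?, ?)",
    [
        (1, "A", "2020-01-01"),
        (2, "A", "2020-02-01"),
        (3, "A", "2020-03-01"),
        (4, "B", "2020-05-01"),
        (5, "B", "2020-05-01"),
        (6, "B", "2020-04-01"),
        (7, "C", "2020-06-01"),
    ],
)

keep_latest_per_case(conn)
survivors = conn.execute(
    "SELECT TableId, CaseKey, recordDate FROM ztable ORDER BY TableId"
).fetchall()
print(survivors)
```

Exactly one row per CaseKey survives: the tie in case B is resolved in favour of the higher TableId.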
Just express
"delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey"
in SQL as
DELETE FROM table t1
WHERE t1.recordDate <>
(SELECT MAX(recordDate)
FROM table t2
WHERE t2.CaseKey = t1.CaseKey)
You can rank the records within each caseKey by recordDate in descending order and pick those with rank > 1, which are the lower dates. That way you can use your tableID.
DELETE FROM [table]
WHERE [tableID] IN
(SELECT
[sub].[tableID]
FROM
(
SELECT
[tableID],
Rank() OVER (PARTITION BY [caseKey] ORDER BY [recordDate] DESC, [tableID] DESC) AS [rank]
FROM [table]
) AS [sub]
WHERE [sub].[rank] > 1)

update rows with duplicate entries

I have the same situation as this other question, but I don't want to select the rows, I want to update these rows.
I used the solution Scott Saunders made:
select * from table where email in (
select email from table group by email having count(*) > 1
)
That worked, but I wanted to change/update a row-value in these entries, so I tried:
UPDATE `members` SET `banned` = "1" WHERE `ip` IN (
SELECT `ip` FROM `members` GROUP BY `ip` HAVING COUNT(*) > 1
)
but I get this error:
You can't specify target table 'members' for update in FROM clause
Use an intermediate subquery to get around the 1093 error:
UPDATE `members`
SET `banned` = '1'
WHERE `ip` IN (SELECT x.ip
FROM (SELECT `ip`
FROM `members`
GROUP BY `ip`
HAVING COUNT(*) > 1) x)
Otherwise, use a JOIN on a derived table:
UPDATE `members`
JOIN (SELECT `ip`
FROM `members`
GROUP BY `ip`
HAVING COUNT(*) > 1) x ON x.ip = `members`.ip
SET `banned` = '1'
This error means you can't update the members table using a subquery that reads directly from the same members table, because the update would be changing the table the subquery depends on. Think of it as a chicken-and-egg problem.
You'll need to make a temporary reference table or save/paste the ip ranges in order to run that update statement.

How do I find duplicate values in a table in Oracle?

What's the simplest SQL statement that will return the duplicate values for a given column and the count of their occurrences in an Oracle database table?
For example: I have a JOBS table with the column JOB_NUMBER. How can I find out if I have any duplicate JOB_NUMBERs, and how many times they're duplicated?
Aggregate the column by COUNT, then use a HAVING clause to find values that appear more than once.
SELECT column_name, COUNT(column_name)
FROM table_name
GROUP BY column_name
HAVING COUNT(column_name) > 1;
Another way:
SELECT *
FROM TABLE A
WHERE EXISTS (
SELECT 1 FROM TABLE
WHERE COLUMN_NAME = A.COLUMN_NAME
AND ROWID < A.ROWID
)
This works fine (quickly enough) when there is an index on column_name, and it's a better way to delete or update duplicate rows.
Simplest I can think of:
select job_number, count(*)
from jobs
group by job_number
having count(*) > 1;
You don't need to even have the count in the returned columns if you don't need to know the actual number of duplicates. e.g.
SELECT column_name
FROM table
GROUP BY column_name
HAVING COUNT(*) > 1
How about:
SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;
To answer the example above, it would look like:
SELECT job_number, count(*)
FROM jobs
GROUP BY job_number HAVING COUNT(*) > 1;
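The GROUP BY ... HAVING COUNT(*) > 1 pattern is standard SQL, so it can be sanity-checked outside Oracle. Here is a minimal sketch using Python's sqlite3 module. The sample job numbers and the helper name find_duplicates are made up, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def find_duplicates(conn):
    """Return (job_number, occurrences) for every job_number that appears more than once."""
    return conn.execute(
        """
        SELECT job_number, COUNT(*)
        FROM jobs
        GROUP BY job_number
        HAVING COUNT(*) > 1
        ORDER BY job_number
        """
    ).fetchall()


# Hypothetical data: 101 appears three times, 102 twice, 103 once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, job_number INTEGER)")
conn.executemany(
    "INSERT INTO jobs (job_number) VALUES (?)",
    [(101,), (102,), (101,), (103,), (102,), (101,)],
)

duplicates = find_duplicates(conn)
print(duplicates)
```

Values that occur only once (103 here) are filtered out by the HAVING clause.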
Where multiple columns together identify a unique row (e.g. a relation table), you can use the following approach, which is based on the row ID.
For example, take emp_dept(empid, deptid, startdate, enddate) and suppose that empid and deptid together are meant to be unique and identify a row. In that case:
select oed.empid, count(oed.empid)
from emp_dept oed
where exists ( select *
from emp_dept ied
where oed.rowid <> ied.rowid and
ied.empid = oed.empid and
ied.deptid = oed.deptid )
group by oed.empid having count(oed.empid) > 1 order by count(oed.empid);
If such a table has a primary key, use the primary key instead of rowid. For example, if id is the primary key:
select oed.empid, count(oed.empid)
from emp_dept oed
where exists ( select *
from emp_dept ied
where oed.id <> ied.id and
ied.empid = oed.empid and
ied.deptid = oed.deptid )
group by oed.empid having count(oed.empid) > 1 order by count(oed.empid);
Doing
select j1.job_number, j1.id, j2.id
from jobs j1 join jobs j2 on (j1.job_number = j2.job_number)
where j1.id != j2.id
will give you the duplicated rows' ids.
SELECT SocialSecurity_Number, Count(*) no_of_rows
FROM SocialSecurity
GROUP BY SocialSecurity_Number
HAVING Count(*) > 1
Order by Count(*) desc
I usually use Oracle Analytic function ROW_NUMBER().
Say you want to check the duplicates you have regarding a unique index or primary key built on columns (c1, c2, c3).
Then you can proceed as follows, selecting the ROWIDs of the rows whose ROW_NUMBER() within their (c1, c2, c3) partition is greater than 1:
Select *
From Table_With_Duplicates
Where Rowid In (Select rid
From (Select Rowid rid,
ROW_NUMBER() Over (
Partition By c1, c2, c3
Order By c1, c2, c3
) nbLines
From Table_With_Duplicates) t2
Where nbLines > 1)
I know it's an old thread, but this may help someone.
If you need to print other columns of the table while checking for duplicates, use the query below:
select * from table where column_name in
(select ing.column_name from table ing group by ing.column_name having count(*) > 1)
order by column_name desc;
You can also add additional filters in the WHERE clause if needed.
Here is an SQL request to do that:
select column_name, count(1)
from table
group by column_name
having count (column_name) > 1;
1. solution
select * from emp
where rowid not in
(select max(rowid) from emp group by empno);
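The NOT IN (SELECT MAX(rowid) ... GROUP BY ...) query returns every copy of a duplicated value except the one with the highest rowid. SQLite also exposes a rowid, so the behaviour can be sketched with Python's sqlite3 module. The sample rows, the ename column and the helper name extra_copies are assumptions for illustration.

```python
import sqlite3


def extra_copies(conn):
    """Return all rows except the highest-rowid row of each empno."""
    return conn.execute(
        """
        SELECT * FROM emp
        WHERE rowid NOT IN (SELECT MAX(rowid) FROM emp GROUP BY empno)
        ORDER BY rowid
        """
    ).fetchall()


# Hypothetical data: empno 1 appears three times, empno 2 and 3 once each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "a"), (2, "b"), (1, "c"), (1, "d"), (3, "e")],
)

extras = extra_copies(conn)
print(extras)
```

The last-inserted copy of empno 1 is treated as the one to keep, so only the two earlier copies are listed; swapping SELECT * for DELETE would remove exactly those rows.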
You can also try something like this to list all duplicate values in a table, say poitem:
SELECT count(poid)
FROM poitem
WHERE poid = 50
AND rownum < any (SELECT count(*) FROM poitem WHERE poid = 50)
GROUP BY poid
MINUS
SELECT count(poid)
FROM poitem
WHERE poid in (50)
GROUP BY poid
HAVING count(poid) > 1;