Delete results in subquery - sql

I would like to delete the results of the subquery below. How do I do that?
delete
from mytable
where rowid in (select rowid, count(*) as count from mytable group by mygroup having count > 50)
Did not work:
[1] [SQLITE_ERROR] SQL error or missing database (sub-select returns 8
columns - expected 1)

If you check the documentation for IN with a subquery, you will notice the following:
A list of values is a fixed value list or a result set of one column returned by a subquery. The returned type of the expression and values in the list must be the same.
Your mistake here is that your subquery is returning more than 1 column (rowid and count).
To fix that you can try this:
delete
from mytable
where rowid in (select rowid
from (select rowid,
count(*) as count
from mytable
group by mygroup
having count > 50)
)
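Note that if you want to delete every row of the groups whose count is greater than 50, this query won't do the full job: the grouped subquery returns only one rowid per group, so only one row of each group is matched. To match every row of those groups, filter on mygroup instead. The query below selects them (here with a threshold of 1); replace select * with delete and use your own threshold to remove them: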
select * FROM mytable
where mygroup in (select mygroup
from (select mygroup,
count(*) as count
from mytable
group by mygroup
having count > 1)
);
as stated in D-Shish's answer.
The two queries above return different result sets.
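The difference between filtering on rowid and filtering on mygroup can be checked with a small script. This is only a minimal sketch using Python's sqlite3 module: the data, the val column, and the threshold of 1 are made up for illustration.

```python
import sqlite3

# Hypothetical data: group "a" has 3 rows, group "b" has 1 row.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (mygroup text, val integer)")
conn.executemany(
    "insert into mytable (mygroup, val) values (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("b", 4)],
)

# Filtering on rowid: the grouped subquery yields one rowid per group,
# so only one row of group "a" is matched.
by_rowid = conn.execute(
    """
    select count(*) from mytable
    where rowid in (select rowid from mytable
                    group by mygroup having count(*) > 1)
    """
).fetchone()[0]

# Filtering on mygroup: every row of group "a" is matched.
by_group = conn.execute(
    """
    select count(*) from mytable
    where mygroup in (select mygroup from mytable
                      group by mygroup having count(*) > 1)
    """
).fetchone()[0]

# Deleting with the mygroup filter removes the whole group.
conn.execute(
    """
    delete from mytable
    where mygroup in (select mygroup from mytable
                      group by mygroup having count(*) > 1)
    """
)
remaining = conn.execute("select mygroup, val from mytable").fetchall()
print(by_rowid, by_group, remaining)
```

The aggregate is written as count(*) directly in the having clause, so no alias and no extra column are needed in the subquery.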

delete
from mytable
where rowid in (select rowid from (select rowid, count(*) as count from mytable group by mygroup having count > 50))

I guess you might want to do this. When you use an aggregate function, every non-aggregated column in the select list needs to appear in the group by:
delete
from mytable
where mygroup in (select mygroup from mytable group by mygroup having count(*) > 50)

Related

I can't figure out how to do this DISTINCT

Good morning
I have tried and tried to understand why this query gives the usual error on GROUP BY. I would like to find the duplicate rows and delete them. I found this query on Microsoft's MSDN, but it still gives me an error on GROUP BY.
The main table has three fields, "Id, Item, Description" (Item is the [Articolo] column in the query), and the table is named "tlbDescrizione". The query should, in theory, create a table named "duplicate_table", insert the duplicate values into it, delete those values from "tlbDescrizione", and finally drop "duplicate_table".
If someone can kindly give me a hand
Thank you
Fabrizio
This is the query:
SELECT DISTINCT *
INTO duplicate_table
FROM [tlbDescrizione]
GROUP BY [Articolo]
HAVING COUNT([Articolo]) > 1
DELETE [tlbDescrizione]
WHERE [Articolo] IN (SELECT [Articolo] FROM duplicate_table)
INSERT [tlbDescrizione]
SELECT * FROM duplicate_table
DROP TABLE duplicate_table
This query doesn't make sense:
SELECT DISTINCT *
INTO duplicate_table
FROM [tlbDescrizione]
GROUP BY [Articolo]
HAVING COUNT([Articolo]) > 1;
It is selecting all columns but is an aggregation query because of the GROUP BY. Hence, the SELECT columns are inconsistent with the GROUP BY columns and you get an error.
If you want all the columns then you can use window functions:
SELECT DISTINCT *
INTO duplicate_table
FROM (SELECT d.*, COUNT(*) OVER (PARTITION BY d.Articolo) as cnt
FROM tlbDescrizione d
) d
WHERE cnt > 1;
Or, if you want only the ids:
SELECT Articolo
INTO duplicate_table
FROM tlbDescrizione
GROUP BY [Articolo]
HAVING COUNT(*) > 1;
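Both approaches can be tried on a small in-memory table. The sketch below uses Python's sqlite3 module rather than SQL Server, so SELECT ... INTO is left out and only the SELECT parts are shown. It assumes SQLite 3.25 or later for window functions; the Descrizione column name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Descrizione is an assumed column name; the rows are made up.
conn.execute(
    "create table tlbDescrizione "
    "(Id integer primary key, Articolo text, Descrizione text)"
)
conn.executemany(
    "insert into tlbDescrizione (Id, Articolo, Descrizione) values (?, ?, ?)",
    [(1, "A1", "first"), (2, "A1", "second"), (3, "B2", "third")],
)

# Window count: keeps every column and tags each row with the size of its group.
with_counts = conn.execute(
    """
    select Id, Articolo, cnt
    from (select d.*, count(*) over (partition by d.Articolo) as cnt
          from tlbDescrizione d) as t
    where cnt > 1
    order by Id
    """
).fetchall()

# Plain aggregation: only the grouped column can be selected.
dup_keys = conn.execute(
    """
    select Articolo
    from tlbDescrizione
    group by Articolo
    having count(*) > 1
    """
).fetchall()
print(with_counts, dup_keys)
```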

Alter a column to indicate there are duplicates in a separate column?

I have a table in which one of the columns, column_A has duplicate values. I also have a blank column_indicator which I would like to populate with 1s for all cases where the value in column_A occurs more than once.
I know how to SELECT the duplicates and have used the following query:
SELECT [dbo].[myTable].[column_A], COUNT(*)
FROM [dbo].[myTable]
GROUP BY [dbo].[myTable].[column_A]
HAVING COUNT(*) > 1
How do I update column_indicator? I have tried:
UPDATE [dbo].[myTable]
SET [dbo].[myTable].[column_indicator] = 1
WHERE
GROUP BY [dbo].[myTable].[column_A]
HAVING COUNT(*) > 1
I know I am off base but cannot figure out how to proceed with this column update.
You can do a window count in a common table expression and then update it:
WITH cte AS (
SELECT
[column_indicator],
[column_A],
COUNT(*) OVER(PARTITION BY [column_A]) cnt
FROM [dbo].[myTable]
)
UPDATE cte SET [column_indicator] = 1 WHERE cnt > 1
I think you can also use a nested update query to solve this. Note that the aggregate has to be repeated in the HAVING clause, because a column alias cannot be referenced there:
UPDATE [dbo].[myTable] SET [column_indicator] = 1
FROM
(
SELECT a.[column_A], COUNT(*) AS cnt
FROM [dbo].[myTable] a
GROUP BY a.[column_A]
HAVING COUNT(*) > 1
) t
WHERE [dbo].[myTable].[column_A] = t.[column_A];
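The same flagging idea can also be written without UPDATE ... FROM or an updatable CTE, using a plain IN subquery. The following is a minimal sketch in Python's sqlite3 module, not T-SQL; the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (column_A text, column_indicator integer)")
# Hypothetical data: "x" occurs twice, "y" once; the indicator starts as NULL.
conn.executemany(
    "insert into myTable (column_A) values (?)",
    [("x",), ("x",), ("y",)],
)

# Flag every row whose column_A value occurs more than once.
conn.execute(
    """
    update myTable
    set column_indicator = 1
    where column_A in (select column_A
                       from myTable
                       group by column_A
                       having count(*) > 1)
    """
)

flags = conn.execute(
    "select column_A, column_indicator from myTable order by rowid"
).fetchall()
print(flags)
```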

T-SQL, select rows

As shown in the screenshot, for the different ids, I want to select all rows that have a common updateTime. For example, all three IDs here share the updateTime values 9:30:02 and 9:30:04. Therefore I want to select the 3rd and 4th rows (for id 211709), the 6th and 8th rows (for id 301801), and the 9th and 10th rows (for id 931801), six rows in total. What SQL code should I write? Thanks in advance!
As you need to return only update times that are common for all IDs, you can use a query like this:
SELECT * FROM MyTable WHERE UpdateTime IN (
SELECT UpdateTime
FROM MyTable
GROUP BY UpdateTime
HAVING COUNT(DISTINCT Id) = (SELECT COUNT(DISTINCT Id) FROM MyTable)
)
If you wonder what the HAVING clause does: for every UpdateTime, you are checking whether the number of IDs with this UpdateTime equals the total number of IDs.
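To see the "common to all IDs" check in action, here is a small sketch using Python's sqlite3 module. The IDs come from the question; the update times other than 09:30:02 and 09:30:04 are invented, since the screenshot is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table MyTable (Id integer, UpdateTime text)")
# Made-up rows: only 09:30:02 and 09:30:04 appear for all three IDs.
conn.executemany(
    "insert into MyTable (Id, UpdateTime) values (?, ?)",
    [
        (211709, "09:30:00"), (211709, "09:30:01"),
        (211709, "09:30:02"), (211709, "09:30:04"),
        (301801, "09:30:02"), (301801, "09:30:03"), (301801, "09:30:04"),
        (931801, "09:30:02"), (931801, "09:30:04"),
    ],
)

# Keep an UpdateTime only if its distinct-ID count equals the overall one.
rows = conn.execute(
    """
    select Id, UpdateTime
    from MyTable
    where UpdateTime in (
        select UpdateTime
        from MyTable
        group by UpdateTime
        having count(distinct Id) = (select count(distinct Id) from MyTable)
    )
    order by Id, UpdateTime
    """
).fetchall()

common_times = sorted({t for _, t in rows})
print(len(rows), common_times)
```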
This query gives you all update times that occur more than once (not necessarily for every ID):
Select UpdateTime
From MyTable
Group By UpdateTime
Having Count (*) > 1
Use this as a sub query
Select *
From MyTable
Where UpdateTime IN
(
Select UpdateTime
From MyTable
Group By UpdateTime
Having Count (*) > 1
)

SQL: Delete all NOT MAX Records in GroupBy

My goal is to delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey. So if I have 9 records with 3 sets of 3 casekeys, and each casekey has its 3 dates. I'd delete the 2 lower dates of each set and come up with 3 total records, only the MAX(recordDate) of each remaining.
I have the following SQL Query:
DELETE FROM table
WHERE tableID NOT IN (
SELECT tableID
FROM (
Select MAX(recordDate) As myDate, tableID From table
Group By CaseKey
) As foo
)
I receive the error:
Error on Line 3... Column 'table.tableID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Obviously I could add tableID to my Group By clause, but then the result of that statement is incorrect and returns all rows instead of just returning the MAX recordDate of the grouped CaseKeys.
Server is down right now, but the apparent answer is: (tiny tweak from WildPlasser's answer)
DELETE zt FROM ztable zt
WHERE EXISTS (
SELECT * FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
In other words, for each record in zt, run a query to see whether another record with the same CaseKey has a higher recordDate. If so, the WHERE EXISTS condition passes and the record is deleted; otherwise the condition fails and the record holds the MAX recordDate for its CaseKey.
Thank you, WildPlasser, for that simple approach that I was somehow overcomplicating.
There is one special property of MAX: there is no record with a higher value than max. So we can delete all the records for which a record with the same CaseKey, but with a higher recordDate exists:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ex.recordDate > zt.recordDate
);
BTW: The above query (as well as the MAX() version) assumes that there is only one record with the maximum date. There could be ties.
In the case of ties, you'll need to add an extra field to the WHERE clause as a tie-breaker. Assuming that TableId can function as such, the query would become:
DELETE FROM ztable zt
WHERE EXISTS (
SELECT *
FROM ztable ex
WHERE ex.CaseKey = zt.CaseKey
AND ( ex.recordDate > zt.recordDate
OR (ex.recordDate = zt.recordDate AND ex.TableId > zt.TableId)
)
);
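The tie-breaker version can be tried out on a few rows. This is a minimal sketch with Python's sqlite3 module and made-up data; the outer table is referenced by name instead of the zt alias, purely to keep the statement portable across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ztable (TableId integer primary key, CaseKey text, recordDate text)"
)
# Hypothetical rows: K1 has a tie on its latest date (TableId 2 and 3).
conn.executemany(
    "insert into ztable (TableId, CaseKey, recordDate) values (?, ?, ?)",
    [
        (1, "K1", "2020-01-01"),
        (2, "K1", "2020-03-01"),
        (3, "K1", "2020-03-01"),
        (4, "K2", "2020-02-01"),
    ],
)

# Delete a row if a "better" row exists for the same CaseKey:
# a later date, or the same date with a higher TableId.
conn.execute(
    """
    delete from ztable
    where exists (
        select 1
        from ztable ex
        where ex.CaseKey = ztable.CaseKey
          and (ex.recordDate > ztable.recordDate
               or (ex.recordDate = ztable.recordDate
                   and ex.TableId > ztable.TableId))
    )
    """
)

kept = conn.execute(
    "select TableId, CaseKey, recordDate from ztable order by TableId"
).fetchall()
print(kept)
```

Exactly one row per CaseKey survives, because the row with the latest date and highest TableId never has a "better" row.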
Just express
delete all records from my table that are NOT the MAX(recordDate) of a
grouped CaseKey
in sql as
DELETE FROM table t1
WHERE t1.recordDate <>
(SELECT MAX(recordDate)
FROM table t2
WHERE t2.CaseKey = t1.CaseKey)
You can rank the records within each caseKey by recordDate in descending order; the rows with rank > 1 are the lower dates. That way you can use your tableID.
DELETE FROM [table]
WHERE [tableID] IN
(SELECT
[sub].[tableID]
FROM
(
SELECT
[tableID],
Rank() OVER (PARTITION BY [caseKey] ORDER BY [recordDate] DESC, [tableID] DESC) AS [rank]
FROM [table]
) AS [sub]
WHERE [sub].[rank] > 1)

How do I find duplicate values in a table in Oracle?

What's the simplest SQL statement that will return the duplicate values for a given column and the count of their occurrences in an Oracle database table?
For example: I have a JOBS table with the column JOB_NUMBER. How can I find out if I have any duplicate JOB_NUMBERs, and how many times they're duplicated?
Aggregate the column by COUNT, then use a HAVING clause to find values that appear more than once.
SELECT column_name, COUNT(column_name)
FROM table_name
GROUP BY column_name
HAVING COUNT(column_name) > 1;
Another way:
SELECT *
FROM TABLE A
WHERE EXISTS (
SELECT 1 FROM TABLE
WHERE COLUMN_NAME = A.COLUMN_NAME
AND ROWID < A.ROWID
)
This works fine (quickly enough) when there is an index on column_name. It is also a better way to delete or update duplicate rows.
Simplest I can think of:
select job_number, count(*)
from jobs
group by job_number
having count(*) > 1;
You don't need to even have the count in the returned columns if you don't need to know the actual number of duplicates. e.g.
SELECT column_name
FROM table
GROUP BY column_name
HAVING COUNT(*) > 1
How about:
SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;
To answer the example above, it would look like:
SELECT job_number, count(*)
FROM jobs
GROUP BY job_number HAVING COUNT(*) > 1;
In cases where multiple columns together identify a unique row (e.g. a relation table), you can use the following approach, which relies on the rowid.
For example, take emp_dept(empid, deptid, startdate, enddate).
Suppose empid and deptid together are meant to be unique and identify a row. In that case:
select oed.empid, count(oed.empid)
from emp_dept oed
where exists ( select *
from emp_dept ied
where oed.rowid <> ied.rowid and
ied.empid = oed.empid and
ied.deptid = oed.deptid )
group by oed.empid having count(oed.empid) > 1 order by count(oed.empid);
If the table has a primary key, use it instead of rowid. For example, if id is the primary key:
select oed.empid, count(oed.empid)
from emp_dept oed
where exists ( select *
from emp_dept ied
where oed.id <> ied.id and
ied.empid = oed.empid and
ied.deptid = oed.deptid )
group by oed.empid having count(oed.empid) > 1 order by count(oed.empid);
Doing
select j1.job_number, j1.id, j2.id
from jobs j1 join jobs j2 on (j1.job_number = j2.job_number)
where j1.id != j2.id
will give you the duplicated rows' ids.
SELECT SocialSecurity_Number, Count(*) no_of_rows
FROM SocialSecurity
GROUP BY SocialSecurity_Number
HAVING Count(*) > 1
Order by Count(*) desc
I usually use Oracle Analytic function ROW_NUMBER().
Say you want to check the duplicates you have regarding a unique index or primary key built on columns (c1, c2, c3).
Then you can go this way, returning the ROWIDs of the rows whose ROW_NUMBER() within the partition is greater than 1:
Select *
From Table_With_Duplicates
Where Rowid In (Select rid
From (Select Rowid As rid,
ROW_NUMBER() Over (
Partition By c1, c2, c3
Order By c1, c2, c3
) nbLines
From Table_With_Duplicates) t2
Where nbLines > 1)
I know it's an old thread, but this may help someone.
If you need to show other columns of the table while checking for duplicates, use the query below:
select * from table where column_name in
(select ing.column_name from table ing group by ing.column_name having count(*) > 1)
order by column_name desc;
You can also add additional filters in the where clause if needed.
Here is a SQL query that does that:
select column_name, count(1)
from table
group by column_name
having count (column_name) > 1;
One solution:
select * from emp
where rowid not in
(select max(rowid) from emp group by empno);
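The "keep the highest rowid per key" pattern also works in SQLite, which has its own rowid, so it can be tried locally. This is a minimal sketch with Python's sqlite3 module, not Oracle; the ename column and the rows are made up. It lists the surplus copies first and then deletes them with the same predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (empno integer, ename text)")
# Hypothetical rows: empno 100 was inserted twice.
conn.executemany(
    "insert into emp (empno, ename) values (?, ?)",
    [(100, "Ann"), (100, "Ann"), (200, "Bob")],
)

# Every row that is not the highest rowid of its empno is a surplus copy.
surplus = conn.execute(
    """
    select rowid, empno from emp
    where rowid not in (select max(rowid) from emp group by empno)
    """
).fetchall()

# The same predicate removes the surplus copies and keeps one row per empno.
conn.execute(
    """
    delete from emp
    where rowid not in (select max(rowid) from emp group by empno)
    """
)
remaining = conn.execute(
    "select rowid, empno, ename from emp order by rowid"
).fetchall()
print(surplus, remaining)
```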
You can also try something like this to list all duplicate values in a table, say poitem:
SELECT count(poid)
FROM poitem
WHERE poid = 50
AND rownum < any (SELECT count(*) FROM poitem WHERE poid = 50)
GROUP BY poid
MINUS
SELECT count(poid)
FROM poitem
WHERE poid in (50)
GROUP BY poid
HAVING count(poid) > 1;