What is the quickest way in Oracle SQL to find out if one or more duplicates exist in a table? - sql

I'm looking to create a statement that stops and returns true the very second it finds a duplicate value on a column. I don't care what the value is and simply need to know whether a duplicate exists or not; nothing else.
I know I can write Select count(*) from myTable group by primary_id having count(*) > 1; but this goes through every single row of the table, whereas I want the query to stop as soon as it encounters a single duplicate.
My best attempt so far, with what I know, is this:
select 1 as thingy from dual outer_qry
where exists
(
select * from
(
select some_ID,
case when COUNT(*) > 1 then 'X' else 'N' end as TRIG
from myTable
group by some_ID
)INNER_QRY
where INNER_QRY.trig = outer_qry.dummy
);
However this takes 13 seconds and I doubt it takes that long to find the first duplicate.
Can anyone please suggest where my thinking is going wrong as, hopefully from my SQL, my assumption is that the EXISTS function will be checked for every row returned for the inner_qry, but this doesn't seem to be the case.

You would use exists. This returns all the duplicates:
select t.*
from mytable t
where exists (select 1
from mytable t2
where t2.some_id = t.some_id and t2.rowid <> t.rowid
);
In Oracle 12c and later, you would add fetch first 1 row only so the query stops at the first duplicate it finds. It can also take advantage of an index on mytable(some_id).
In earlier versions:
select 1 as HasDuplicate
from (select t.*
from mytable t
where exists (select 1
from mytable t2
where t2.some_id = t.some_id and t2.rowid <> t.rowid
)
) t
where rownum = 1;
If this returns no rows, then there are no duplicates.
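As a minimal, runnable sketch of the same "stop at the first duplicate" idea, the snippet below uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example can be executed). The table contents and the helper name has_duplicates are hypothetical; the point is that wrapping the self-join in EXISTS allows the engine to stop once one matching pair is found.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (some_id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(1,), (2,), (2,), (3,)])


def has_duplicates(conn):
    """Return True if any some_id value appears on more than one row."""
    # EXISTS only needs one matching pair, so the engine may stop early.
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM mytable t
            JOIN mytable t2
              ON t2.some_id = t.some_id
             AND t2.rowid <> t.rowid
        )
        """
    ).fetchone()
    return bool(row[0])


dup_found = has_duplicates(conn)

# Remove the second row holding some_id = 2 (rowid 3) and check again.
conn.execute("DELETE FROM mytable WHERE rowid = 3")
dup_after_cleanup = has_duplicates(conn)
```

How early the engine actually stops depends on the plan it chooses; an index on some_id usually helps.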

select * from table1 t1 natural join table1 t2 where t1.rowid < t2.rowid;

You can use this to find out which IDs are duplicated:
select some_ID
from myTable
group by some_ID having count(*) >1

Related

How to exclude all rows with the same ID based on one record's value in psql?

Say I have the results above, and want to exclude all rows with ID of 14010497 because at least one of the rows has a date of 2/25. How would I filter it down? Using a WHERE table.end_date > '2019-02-25' would still include the row with a date of 2-23
Try something like this:
select * from your_table
where id not in (
select distinct id
from your_table
where end_date > '2019-02-25'
)
/
I would use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.id = t.id and t2.end_date = '2019-02-25'
);
I strongly advise using not exists over not in because it handles NULL values much more intuitively. NOT IN will return no rows at all if any value in the subquery is NULL.
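The NULL behaviour described above can be seen in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption for runnability), and the sample rows, including the row with a NULL id, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, end_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2019-02-23"),
        (1, "2019-02-25"),
        (2, "2019-02-20"),
        (None, "2019-02-25"),  # a NULL id that ends up in the subquery result
    ],
)

# NOT IN: the subquery yields (1, NULL). "2 NOT IN (1, NULL)" evaluates to
# NULL rather than TRUE, so even the row with id = 2 is filtered out.
not_in_rows = conn.execute(
    """
    SELECT id, end_date FROM t
    WHERE id NOT IN (SELECT id FROM t WHERE end_date = '2019-02-25')
    """
).fetchall()

# NOT EXISTS: the correlated comparison simply never matches for NULL ids,
# so id = 2 is kept (and so is the NULL-id row, which matches nothing).
not_exists_rows = conn.execute(
    """
    SELECT t.id, t.end_date FROM t
    WHERE NOT EXISTS (SELECT 1 FROM t t2
                      WHERE t2.id = t.id AND t2.end_date = '2019-02-25')
    """
).fetchall()
```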

How to get duplicate text values from SQL query

I need to return only the rows with duplicate text values using a SQL query. I used HAVING COUNT(columnname) > 1, but instead of only the duplicated values I am getting all values.
Can anyone suggest whether I have to add anything to my query?
Thanks.
Use the query below. Put the column that contains the duplicates in the PARTITION BY clause.
with CTE_1
AS
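-- No ORDER BY inside OVER(): with one, COUNT becomes a running count and can miss the first row of each duplicate group.
(SELECT *,COUNT(1) OVER(PARTITION BY LTRIM(RTRIM(REPLACE(yourDuplicateColumn,' ','')))) cnt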
FROM YourTable
)
SELECT *
FROM CTE_1
WHERE cnt>1
Assuming id is a primary key
select *
from myTable t1
where exists (select 1
from myTable t2
where t2.text = t1.text and t2.id != t1.id)
You can use a query similar to the following:
SELECT
column1, COUNT(*)
FROM table
GROUP BY column1
HAVING COUNT(*) > 1
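To make the difference between "which values are duplicated" and "which rows carry a duplicated value" concrete, here is a small sketch using Python's built-in sqlite3 module (chosen only so it can be run as-is). The table my_table, the column text_col, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, text_col TEXT)")
conn.executemany(
    "INSERT INTO my_table (text_col) VALUES (?)",
    [("apple",), ("pear",), ("apple",), ("plum",), ("pear",)],
)

# One row per duplicated value, with how often it occurs.
duplicate_values = conn.execute(
    """
    SELECT text_col, COUNT(*)
    FROM my_table
    GROUP BY text_col
    HAVING COUNT(*) > 1
    ORDER BY text_col
    """
).fetchall()

# Every full row whose text value occurs more than once.
duplicate_rows = conn.execute(
    """
    SELECT id, text_col
    FROM my_table
    WHERE text_col IN (SELECT text_col
                       FROM my_table
                       GROUP BY text_col
                       HAVING COUNT(*) > 1)
    ORDER BY id
    """
).fetchall()
```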

exporting unique rows using row id

I have a huge partitioned table (500 GB, almost 2 billion records) that contains duplicates.
I am planning to write a condition that treats records as duplicates when 3 column values match (say we get 4 duplicate records), and then export just one of those records, the one with the min/max row ID.
I know there are other methods, such as deleting rows or creating a new table xyz, but we decided to use the query option in export. Please help me get the correct syntax.
I have been trying with:
query= schema.table:"WHERE ROWID <>
(SELECT MAX(ROWID) FROM schema.table A WHERE A.col1 = A.col1 AND A.col2 = A.col2 AND A.col3 = A.col3)"
But this will probably pick up duplicates. I also tried the = and <= signs, and that did not help: it says it is exporting 0 rows.
Any suggestions?
Use an analytic function to get the minimum for each group; this will only require a single table scan (i.e. no correlated sub-queries).
SELECT a,b,c -- ,d,e,f,g ...
FROM (
SELECT t.*,
ROWID AS rid,
MIN( ROWID ) OVER ( PARTITION BY a, b, c ) AS min_rid
FROM schema_name.table_name t
)
WHERE rid = min_rid;
You can use the NOT EXISTS operator to pick only the record with the highest ROWID from each group of duplicates:
SELECT *
FROM table A
WHERE NOT EXISTS (
SELECT 1 FROM table B
WHERE A.col1 = B.col1 AND A.col2 = B.col2 AND A.col3 = B.col3
AND A.ROWID < B.ROWID
)
Use GROUP BY on your duplicate columns together with HAVING:
select a,b,c, count(*), min(rowid), max(rowid)
from your_table
group by a,b,c
having count(*) > 1
Then you can select the needed rows like this:
select *
from your_table
where rowid in (
select min(rowid)--, max(rowid)
from your_table
group by a,b,c
having count(*) > 1
)
Sorry, if you need all rows, including the unique ones and one row from each set of duplicates, there is no need for HAVING:
select *
from your_table
where rowid in (
select min(rowid)--, max(rowid)
from your_table
group by a,b,c
)
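The "one row per (a, b, c) group" selection above can be tried out with a short sketch. It uses Python's built-in sqlite3 module; SQLite's integer rowid is only a stand-in for Oracle's ROWID (an assumption), and the payload column and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (a TEXT, b TEXT, c TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("x", "y", "z", "first"),
        ("x", "y", "z", "second"),
        ("p", "q", "r", "only"),
        ("x", "y", "z", "third"),
    ],
)

# Keep exactly one row per (a, b, c): the one with the smallest rowid.
# Unique rows are kept too, because their group has a single member.
kept_rows = conn.execute(
    """
    SELECT a, b, c, payload
    FROM your_table
    WHERE rowid IN (SELECT MIN(rowid)
                    FROM your_table
                    GROUP BY a, b, c)
    ORDER BY rowid
    """
).fetchall()
```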
Thanks, everyone. I really appreciate the fast response. I think I had tried similar logic; I don't remember exactly, but I will certainly try it at the subpartition level and check.
My quick question to everyone: will what you have specified work in the query parameter of the export?

SQL: Get running row delta for records

Let's say we have this table with columns RowID and Call:
RowID Call DesiredOut
1 A 0
2 A 0
3 B
4 A 1
5 A 0
6 A 0
7 B
8 B
9 A 2
10 A 0
I want to SQL query the last column DesiredOut as follows:
Each time Call is 'A' go back until 'A' is found again and count the number of records which are in between two 'A' entries.
Example: RowID 4 has 'A' and the nearest predecessor is in RowID 2. Between RowID 2 and RowID 4 we have one Call 'B', so we count 1.
Is there an elegant and performant way to do this with ANSI SQL?
I would approach this by first finding the rowid of the previous "A" value. Then count the number of values in-between.
The following query implements this logic using correlated subqueries:
select t.*,
(case when t.call = 'A'
then (select count(*)
from table t3
where t3.rowid < t.rowid and t3.rowid > t.prevA
)
end) as InBetweenCount
from (select t.*,
(select max(rowid)
from table t2
where t2.call = 'A' and t2.rowid < t.rowid
) as prevA
from table t
) t;
If you know that rowid is sequential with no gaps, you can just use subtraction instead of a subquery for the calculation in the outer query.
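The correlated-subquery approach can be checked against the sample data from the question with the sketch below. It runs on Python's built-in sqlite3 module (an assumption for runnability); the table name calls and the column names row_id and call_type are hypothetical renamings of RowID and Call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (row_id INTEGER PRIMARY KEY, call_type TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    list(enumerate(["A", "A", "B", "A", "A", "A", "B", "B", "A", "A"], start=1)),
)

# For each 'A' row: find the previous 'A' row, then count the rows strictly
# between the two. Non-'A' rows get NULL. If there is no previous 'A', the
# comparison against NULL matches nothing and the count is 0.
rows = conn.execute(
    """
    SELECT t.row_id,
           t.call_type,
           CASE WHEN t.call_type = 'A' THEN
               (SELECT COUNT(*)
                FROM calls t3
                WHERE t3.row_id < t.row_id
                  AND t3.row_id > (SELECT MAX(t2.row_id)
                                   FROM calls t2
                                   WHERE t2.call_type = 'A'
                                     AND t2.row_id < t.row_id))
           END AS desired_out
    FROM calls t
    ORDER BY t.row_id
    """
).fetchall()

desired_out = [r[2] for r in rows]
```

The resulting column matches the DesiredOut values listed in the question, with None standing for the blank entries on the 'B' rows.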
You could use a query to find the previous Call = A row. Then, you could count the number of rows between that row and the current row:
select RowID
, `Call`
, (
select count(*)
from YourTable t2
where RowID < t1.RowID
and RowID > coalesce(
(
select RowID
from YourTable t3
where `Call` = 'A'
and RowID < t1.RowID
order by
RowID DESC
limit 1
),0)
)
from YourTable t1
Example at SQL Fiddle.
Here is another solution using window functions:
with flagged as (
select *,
case
when call = 'A' and lead(call) over (order by rowid) <> 'A' then 'end'
when call = 'A' and lag(call) over (order by rowid) <> 'A' then 'start'
end as change_flag
from calls
)
select t1.rowid,
t1.call,
case
when change_flag = 'start' then rowid - (select max(t2.rowid) from flagged t2 where t2.change_flag = 'end' and t2.rowid < t1.rowid) - 1
when call = 'A' then 0
end as desiredout
from flagged t1
order by rowid;
The CTE first marks the start and end of each "A"-Block and the final select then uses these markers to get the difference between the start of one block and the end of the previous one.
If the rowid is not gapless, you can easily add a gapless rownumber inside the CTE to calculate the difference.
I'm not sure about the performance though. I wouldn't be surprised if Gordon's answer is faster.
SQLFiddle example: http://sqlfiddle.com/#!15/e1840/1
Believe it or not, this will be pretty fast if the two columns are indexed.
select r1.RowID, r1.CallID, isnull( R1.RowID - R2.RowID - 1, 0 ) as DesiredOut
from RollCall R1
left join RollCall R2
on R2.RowID =(
select max( RowID )
from RollCall
where RowID < R1.RowID
and CallID = 'A')
and R1.CallID = 'A';
Here is the Fiddle.
You could do something like that:
SELECT a.rowid - b.rowid
FROM table as a,
(SELECT rowid FROM table where rowid < a.rowid order by rowid) as b
WHERE <something>
ORDER BY a.rowid
As I cannot say which DBMS you are using this is more kind of pseudo code which could work based on your system.

t-sql - delete second value only

I would like to run a SQL statement that will delete ONLY the second value, for example:
delete from table1 where condition1
I want this statement to delete ONLY the second value.
How can I accomplish this?
To clarify: I have a field called field1, which is an autonumber and the primary key, so it increments. I would like to delete the record containing the greater number.
You could also employ the ROW_NUMBER() function of SQL server to number each row, and use this number to isolate just the second item for deletion, according to your own custom ordering in the inner query ( over (ORDER BY <myKey> asc) ). This provides a great deal of flexibility.
DELETE a
FROM table1 a
JOIN (
select ROW_NUMBER() over (ORDER BY <myKey> asc) as AutoNumber, <myKey> from table1
) b on a.<myKey> = b.<myKey>
WHERE condition1
AND b.AutoNumber = 2
Do you want to delete only the last duplicate, or all but the first?
For all but the first: (Edited to use CTE per @Martin's suggestion.)
with target as (select * from table1 where condition1)
delete from target goner
where exists (select * from target keeper
where keeper.field1 < goner.field1)
In other words, if there is another matching record with a lower field1, delete this record.
EDIT:
To delete only the last:
with target as (select * from table1 where condition1)
delete from target goner
where exists (select * from target keeper
where keeper.field1 < goner.field1)
and not exists (select * from target missing
where missing.field1 > goner.field1)
In other words, if there is another matching record with a lower field1, AND there is no matching record with a higher field1, then we have the highest duplicate, so nuke it.
It's been a while (so my syntax may not be quite right), and this may not be the best solution, but the "academic" answer would be something like:
delete from table1 where condition1
and field1 = (select max(field1) from table1 where condition1)
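A quick way to convince yourself that the MAX-based delete removes only the highest-numbered matching record is the sketch below. It uses Python's built-in sqlite3 module instead of SQL Server (an assumption for runnability); the name column and the predicate name = 'a' are hypothetical stand-ins for condition1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (field1 INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
conn.executemany(
    "INSERT INTO table1 (name) VALUES (?)", [("a",), ("b",), ("a",), ("c",)]
)

# Among the rows matching the condition, delete only the one with the
# greatest autonumber value (field1 = 3 here).
conn.execute(
    """
    DELETE FROM table1
    WHERE name = 'a'
      AND field1 = (SELECT MAX(field1) FROM table1 WHERE name = 'a')
    """
)

remaining = conn.execute(
    "SELECT field1, name FROM table1 ORDER BY field1"
).fetchall()
```

Note that if only one row matches the condition, this statement deletes that single row, so a guard on the match count may be needed in practice.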
Try this:
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(id) as id, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.id= KeepRows.id
WHERE
KeepRows.id IS NULL
UPDATE
While this might not be as "pretty" as @Jeffrey's, it works. From what I can tell, @Jeffrey's does not. See the SQL below (DELETE replaced with SELECT * for demonstration):
WITH TEMP as
(
SELECT 1 as id,'A' as a,'Z' as b
UNION
SELECT 2,'A','Z'
UNION
SELECT 3,'B','Z'
UNION
SELECT 4,'B','Z'
)
SELECT *
FROM TEMP
LEFT OUTER JOIN (
SELECT MIN(id) as id, a, b
FROM TEMP
GROUP BY a, b
) as KeepRows ON
temp.id= KeepRows.id
WHERE
KeepRows.id IS NULL
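The demonstration above can be reproduced end to end with the following sketch, which uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption). Because SQLite has no DELETE ... JOIN syntax, the delete step is written with NOT IN over the same MIN(id) subquery; that is a swapped-in formulation, not the T-SQL statement from the answer. The table name temp_rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_rows (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO temp_rows VALUES (?, ?, ?)",
    [(1, "A", "Z"), (2, "A", "Z"), (3, "B", "Z"), (4, "B", "Z")],
)

# Anti-join: rows with no partner in keep_rows are the ones to delete.
rows_to_delete = conn.execute(
    """
    SELECT t.id, t.a, t.b
    FROM temp_rows t
    LEFT OUTER JOIN (SELECT MIN(id) AS id, a, b
                     FROM temp_rows
                     GROUP BY a, b) AS keep_rows
      ON t.id = keep_rows.id
    WHERE keep_rows.id IS NULL
    ORDER BY t.id
    """
).fetchall()

# Equivalent delete for SQLite: keep only the lowest id in each (a, b) group.
conn.execute(
    """
    DELETE FROM temp_rows
    WHERE id NOT IN (SELECT MIN(id) FROM temp_rows GROUP BY a, b)
    """
)
remaining = conn.execute(
    "SELECT id, a, b FROM temp_rows ORDER BY id"
).fetchall()
```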