Find duplicate values in oracle - sql

I'm using this query to find duplicate values in a table:
select col1,
count(col1)
from table1
group by col1
having count (col1) > 1
order by 2 desc;
But also I want to add another column from the same table, like this:
select col1,
col2,
count(col1)
from table1
group by col1
having count (col1) > 1
order by 2 desc;
I get an ORA-00979 error with that second query.
How can I add another column in my search?

Your query should be
SELECT * FROM (
select col1,
col2,
count(col1) over (partition by col1) col1_cnt
from table1
)
WHERE col1_cnt > 1
order by 2 desc;

Presumably you want to get col2 for each duplicate of col1 that turns up. You can't really do that in a single query^. Instead, what you need to do is get your list of duplicates, then use that to retrieve any other associated values:
select col1, col2
from table1
where col1 in (select col1
from table1
group by col1
having count (col1) > 1)
order by col2 desc
^ Okay, you can, by using analytic functions, as @rs. demonstrated. For this scenario, I suspect that the nested query will be more efficient, but both should give you the same results.
Based on comments, it seems like you're not clear on why you can't just add the second column. Assume you have sample data that looks like this:
Col1 | Col2
-----+-----
1 | A
1 | B
2 | C
2 | D
3 | E
If you run
select Col1, count(*) as cnt
from table1
group by Col1
having count(*) > 1
then your results will be:
Col1 | Cnt
-----+-----
1 | 2
2 | 2
You can't just add Col2 to this query without adding it to the group by clause because the database will have no way of knowing which value you actually want (i.e. for Col1=1 should the DB return 'A' or 'B'?). If you add Col2 to the group by clause, you get the following:
select Col1, Col2, count(*) as cnt
from table1
group by Col1, Col2
having count(*) > 1
Col1 | Col2 | Cnt
-----+------+----
[no results]
This is because the count is for each combination of Col1 and Col2 (each of which is unique).
Finally, by using either a nested query (as in my answer) or an analytic function (as in @rs.'s answer), you'll get the following result (query changed slightly to return the count):
select t1.col1, t1.col2, cnt
from table1 t1
join (select col1, count(*) as cnt
from table1
group by col1
having count (col1) > 1) t2
on t1.col1 = t2.col1
Col1 | Col2 | Cnt
-----+------+----
1 | A | 2
1 | B | 2
2 | C | 2
2 | D | 2
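The three queries in this explanation can be tried end to end. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is runnable); the table and data are the sample from this answer.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "C"), (2, "D"), (3, "E")],
)

# Grouping by col1 alone finds the duplicated keys.
by_col1 = conn.execute("""
    SELECT col1, COUNT(*) AS cnt
    FROM table1
    GROUP BY col1
    HAVING COUNT(*) > 1
    ORDER BY col1
""").fetchall()

# Grouping by both columns counts each (col1, col2) pair, and every pair is unique.
by_both = conn.execute("""
    SELECT col1, col2, COUNT(*) AS cnt
    FROM table1
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
""").fetchall()

# Joining back to the duplicated keys returns col2 alongside the count.
joined = conn.execute("""
    SELECT t1.col1, t1.col2, t2.cnt
    FROM table1 t1
    JOIN (SELECT col1, COUNT(*) AS cnt
          FROM table1
          GROUP BY col1
          HAVING COUNT(*) > 1) t2
      ON t1.col1 = t2.col1
    ORDER BY t1.col1, t1.col2
""").fetchall()

print(by_col1)
print(by_both)
print(joined)
```

Only the join version keeps one output row per source row, which is why it is the one that can carry col2.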

You should list all selected columns in the group by clause as well.
select col1,
col2,
count(col1)
from table1
group by col1, col2
having count (col1) > 1
order by 2 desc;

Cause of Error
You tried to execute an SQL SELECT statement that included a GROUP BY
function (ie: SQL MIN Function, SQL MAX Function, SQL SUM Function,
SQL COUNT Function) and an expression in the SELECT list that was not
in the SQL GROUP BY clause.
select col1,
col2,
count(col1)
from table1
group by col1,col2
having count (col1) > 1
order by 2 desc;

Related

How can I find groups with more than one row and list the rows in each such group?

I have a table "mytable" in a database.
Given a subset of the columns of the table, I would like to group by that subset and find the groups with more than one row:
For example, if the table is
col1 col2 col3
1 1 1
1 1 2
1 2 1
2 2 1
2 2 3
2 1 1
I am interested in finding groups by col1 and col2 with more than one row, which are:
col1 col2 col3
1 1 1
1 1 2
and
col1 col2 col3
2 2 1
2 2 3
I was wondering how to write a SQL query for that purpose?
Is the following the best way to do that?
First get the col1 and col2 values of such groups:
SELECT col1, col2, COUNT(*)
FROM mytable
GROUP BY col1, col2
HAVING COUNT(*) > 1
Then based on the output of the previous query, manually write a query for each group:
SELECT *
FROM mytable
WHERE col1 = val1 AND col2 = val2
If there are many such groups, then I will have to manually write many queries, which can be a disadvantage.
I am using SQL Server.
Thanks.
This is a common problem. One solution is to get the "keys" in a derived table and join to that to get the rows.
declare @test as table (col1 int, col2 int, col3 int)
insert into @test values (1,1,1),(1,1,2),(1,2,1),(2,2,1),(2,2,3),(2,1,1)
select t.*
from @test t
inner join (
select col1, col2
from @test
group by col1, col2
having count(*) > 1
) k
on k.col1 = t.col1 and k.col2 = t.col2
col1 col2 col3
----------- ----------- -----------
1 1 1
1 1 2
2 2 1
2 2 3
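The derived-table join can be checked against the question's sample data. This is a minimal sketch that assumes Python's sqlite3 module in place of SQL Server, so an ordinary table named mytable replaces the table variable; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 2, 1), (2, 2, 3), (2, 1, 1)],
)

# k holds the (col1, col2) keys that occur more than once;
# joining back to the table returns every row of those groups.
rows = conn.execute("""
    SELECT t.col1, t.col2, t.col3
    FROM mytable t
    INNER JOIN (SELECT col1, col2
                FROM mytable
                GROUP BY col1, col2
                HAVING COUNT(*) > 1) k
      ON k.col1 = t.col1 AND k.col2 = t.col2
    ORDER BY t.col1, t.col2, t.col3
""").fetchall()

for row in rows:
    print(row)
```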
The window function sum() over() may help here
Example
with cte as (
Select *
,Cnt = sum(1) over (partition by Col1,Col2)
From YourTable
)
Select *
From cte
Where Cnt>=2
Another option (less performant)
Select top 1 with ties *
From YourTable
Order By case when sum(1) over (partition by Col1,Col2) > 1 then 1 else 2 end

Oracle query - Selecting unique row number based on order of another column

I'm trying to find the best way to make a query with two columns, one is a number and the other a date:
Doing a select and ordering by the date column.
Table1:
col1 (NUMBER) | col2 (DATE)
--------------+------------
1 | 02/2019
2 | 02/2019
3 | 02/2019
4 | 03/2019
2 | 04/2019
3 | 05/2019
I'm doing a query like this:
select col1, col2
from table1
order by col2 asc, col1 asc
fetch next 10 rows only;
The result also includes values from later dates and repeats col1 values, like this:
col1 (NUMBER) | col2 (DATE)
--------------+------------
1 | 02/2019
2 | 02/2019
3 | 02/2019
4 | 03/2019
2 | 04/2019
3 | 05/2019
But I would like a filter to limit to only a sequential col1 value like this:
col1 (NUMBER) | col2 (DATE)
--------------+------------
1 | 02/2019
2 | 02/2019
3 | 02/2019
4 | 03/2019
ignoring values that would come in a "next batch" and not going through the risk of repeating col1 values, or getting col1 values that have a bigger col2 value than a previous result.
Any ideas on the best way to do this?
If I understand correctly, you can use a cumulative max():
select col1, col2
from (select t1.*,
max(col1) over (order by col2, col1 rows between unbounded preceding and 1 preceding) as running_max
from table1 t1
) t1
where running_max is null or col1 > running_max;
This returns rows whose value is greater than the values on the preceding rows.
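To see what the running maximum keeps, here is a minimal sketch that assumes Python's sqlite3 module in place of Oracle (window functions need SQLite 3.25 or later). Dates are stored as 'YYYY-MM' strings so they sort correctly, and the final row (5, '2019-06') is a hypothetical addition that is not in the question.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "2019-02"), (2, "2019-02"), (3, "2019-02"),
     (4, "2019-03"), (2, "2019-04"), (3, "2019-05"),
     (5, "2019-06")],  # hypothetical extra row
)

# running_max is the largest col1 seen on all earlier rows (NULL on the first row).
rows = conn.execute("""
    SELECT col1, col2
    FROM (SELECT t1.*,
                 MAX(col1) OVER (ORDER BY col2, col1
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                     AS running_max
          FROM table1 t1) t1
    WHERE running_max IS NULL OR col1 > running_max
    ORDER BY col2, col1
""").fetchall()

for row in rows:
    print(row)
```

The repeated values 2 and 3 are dropped, but the later row with col1 = 5 is kept because it exceeds every earlier value.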
EDIT:
If you want to return rows only up to the first time there is a decline, then:
select t1.*
from (select t1.*,
sum(case when prev_col1 > col1 then 1 else 0 end) over (order by col2, col1) as num_decreases
from (select t1.*,
lag(col1) over (order by col2, col1) as prev_col1
from table1 t1
) t1
) t1
where num_decreases = 0;
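The stop-at-first-decline variant can be sketched the same way, again assuming Python's sqlite3 module in place of Oracle and 'YYYY-MM' strings for the dates. The row (5, '2019-06') is a hypothetical addition used to show that nothing after the first decline is returned, even a value larger than all earlier ones.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "2019-02"), (2, "2019-02"), (3, "2019-02"),
     (4, "2019-03"), (2, "2019-04"), (3, "2019-05"),
     (5, "2019-06")],  # hypothetical extra row
)

# Inner query: previous col1 in (col2, col1) order.
# Middle query: running count of rows where col1 dropped below its predecessor.
# Outer query: keep only the rows seen before the first drop.
rows = conn.execute("""
    SELECT col1, col2
    FROM (SELECT t2.*,
                 SUM(CASE WHEN prev_col1 > col1 THEN 1 ELSE 0 END)
                     OVER (ORDER BY col2, col1) AS num_decreases
          FROM (SELECT t1.*,
                       LAG(col1) OVER (ORDER BY col2, col1) AS prev_col1
                FROM table1 t1) t2) t3
    WHERE num_decreases = 0
    ORDER BY col2, col1
""").fetchall()

for row in rows:
    print(row)
```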

select query to fetch rows corresponding to all values in a column

Consider this example table "Table1".
Col1 Col2
A 1
B 1
A 4
A 5
A 3
A 2
D 1
B 2
C 3
B 4
I am trying to fetch those values from Col1 which corresponds to all values (in this case, 1,2,3,4,5). Here the result of the query should return 'A' as none of the others have all values 1,2,3,4,5 in Col2.
Note that the values in Col2 are decided by other parameters in the query and they will always return some numeric values. Out of those values the query needs to fetch values from Col1 corresponding to all in Col2. The values in Col2 could be 11,12,1,2,3,4 for instance (meaning not necessarily in sequence).
I have tried the following select query:
select distinct Col1 from Table1 where Col1 in (1,2,3,4,5);
select distinct Col1 from Table1 where Col1 exists (select distinct Col2 from Table1);
and its different variations. But the problem is that I need to apply an 'and' for Col2 not an 'or'.
like Return a value from Col1 where Col2 'contains' all values between 1 and 5.
Appreciate any suggestion.
You could use analytic ROW_NUMBER() function.
SQL Fiddle for a setup and working demonstration.
SELECT col1
FROM
(SELECT col1,
col2,
row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
FROM your_table
WHERE col2 IN (1,2,3,4,5)
)
WHERE rn =5;
UPDATE As requested by OP, some explanation about how the query works.
The inner sub-query gives you the following resultset:
SQL> SELECT col1,
2 col2,
3 row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
4 FROM t
5 WHERE col2 IN (1,2,3,4,5);
C COL2 RN
- ---------- ----------
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
B 1 1
B 2 2
B 4 3
C 3 1
D 1 1
10 rows selected.
The PARTITION BY clause groups the rows by col1, and ORDER BY sorts col2 within each group, so the sub-query numbers the rows of each col1 group in order. A group only reaches row number 5 if it has five rows with col2 in (1,2,3,4,5), so in the outer query all you need to do is filter with WHERE rn = 5. Note that this relies on each (col1, col2) pair appearing only once.
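The ROW_NUMBER() approach can be exercised on the question's data. This is a minimal sketch assuming Python's sqlite3 module in place of Oracle (window functions need SQLite 3.25 or later). The two duplicate rows inserted for B are hypothetical and only illustrate what happens when a (col1, col2) pair repeats.

```python
import sqlite3

QUERY = """
    SELECT col1
    FROM (SELECT col1, col2,
                 ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rn
          FROM table1
          WHERE col2 IN (1, 2, 3, 4, 5))
    WHERE rn = 5
    ORDER BY col1
"""

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("A", 1), ("B", 1), ("A", 4), ("A", 5), ("A", 3),
     ("A", 2), ("D", 1), ("B", 2), ("C", 3), ("B", 4)],
)

# With unique (col1, col2) pairs, only A reaches row number 5.
unique_pairs = [r[0] for r in conn.execute(QUERY)]

# Hypothetical duplicates: B now has five rows (1, 1, 2, 4, 4) but still lacks 3 and 5.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [("B", 1), ("B", 4)])
with_duplicates = [r[0] for r in conn.execute(QUERY)]

print(unique_pairs)
print(with_duplicates)
```

The second result includes B even though it is missing two values, so the query is only safe when duplicates cannot occur or are removed first.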
You can use listagg function, like
SELECT Col1
FROM
(select Col1,listagg(Col2,',') within group (order by Col2) Col2List from Table1
group by Col1)
WHERE Col2List = '1,2,3,4,5'
You can also use below
SELECT COL1
FROM TABLE_NAME
GROUP BY COL1
HAVING
COUNT(COL1)=5
AND
SUM(
(CASE WHEN COL2=1 THEN 1 ELSE 0
END)
+
(CASE WHEN COL2=2 THEN 1 ELSE 0
END)
+
(CASE WHEN COL2=3 THEN 1 ELSE 0
END)
+
(CASE WHEN COL2=4 THEN 1 ELSE 0
END)
+
(CASE WHEN COL2=5 THEN 1 ELSE 0
END))=5
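The conditional-sum query can be generated for any list of required values instead of being typed out by hand. The following is a minimal sketch assuming Python's sqlite3 module in place of Oracle; the `wanted` list and the string-building step are illustrative additions, not part of the answer. Like the original, it assumes each (col1, col2) pair appears only once.

```python
import sqlite3

wanted = [1, 2, 3, 4, 5]  # values that must all be present in col2

# One CASE term per wanted value, added together as in the hand-written query.
terms = " + ".join(
    f"(CASE WHEN col2 = {int(v)} THEN 1 ELSE 0 END)" for v in wanted
)
query = f"""
    SELECT col1
    FROM table1
    GROUP BY col1
    HAVING COUNT(col1) = {len(wanted)}
       AND SUM({terms}) = {len(wanted)}
    ORDER BY col1
"""

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("A", 1), ("B", 1), ("A", 4), ("A", 5), ("A", 3),
     ("A", 2), ("D", 1), ("B", 2), ("C", 3), ("B", 4)],
)

matches = [r[0] for r in conn.execute(query)]
print(matches)
```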

Removing rows in SQL that have a duplicate column value

I have looked high and low on SO for an answer over the last couple of hours (subqueries, CTEs, left joins with derived tables), but none of the solutions really meet my criteria.
I have a table with data like this :
COL1 COL2 COL3
1 A 0
2 A 1
3 A 1
4 B 0
5 B 0
6 B 0
7 B 0
8 B 1
Where column 1 is the primary key and is an int. Column 2 is nvarchar(max) and column 3 is an int. I have determined that by using this query:
select name, COUNT(name) as 'count'
FROM [dbo].[AppConfig]
group by Name
having COUNT(name) > 3
I can return the total counts of "A, B and C" only if they have an occurrence of column C more than 3 times. I am now trying to remove all the rows that occur after the initial value of column 3. The sample table I provided would look like this now:
COL1 COL2 COL3
1 A 0
2 A 1
4 B 0
8 B 1
Could anyone assist me with this?
If all you want is the first row with a ColB-ColC combination, the following will do it:
select min(id) as id, colB, colC
from tbl
group by colB, colC
order by id
This should work:
;WITH numbered_rows as (
SELECT
Col1,
Col2,
Col3,
ROW_NUMBER() OVER(PARTITION BY Col2, Col3 ORDER BY Col1) as row
FROM AppConfig)
SELECT
Col1,
Col2,
Col3
FROM numbered_rows
WHERE row = 1
SELECT DISTINCT MIN(COL1) AS COL1,COL2,COL3
FROM TABLE
GROUP BY COL2,COL3
ORDER BY COL1
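Both styles of answer, MIN() with GROUP BY and ROW_NUMBER() per group, can be compared on the question's sample data. This is a minimal sketch assuming Python's sqlite3 module in place of SQL Server; the table name appconfig and the alias rn are illustrative choices, and the window is ordered by col1 so that the lowest key in each group is the row kept.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appconfig (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 INTEGER)"
)
conn.executemany(
    "INSERT INTO appconfig VALUES (?, ?, ?)",
    [(1, "A", 0), (2, "A", 1), (3, "A", 1), (4, "B", 0),
     (5, "B", 0), (6, "B", 0), (7, "B", 0), (8, "B", 1)],
)

# Smallest key per (col2, col3) combination.
grouped = conn.execute("""
    SELECT MIN(col1) AS col1, col2, col3
    FROM appconfig
    GROUP BY col2, col3
    ORDER BY col1
""").fetchall()

# Number the rows of each (col2, col3) group by key and keep the first one.
numbered = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT col1, col2, col3,
                 ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY col1) AS rn
          FROM appconfig)
    WHERE rn = 1
    ORDER BY col1
""").fetchall()

print(grouped)
print(numbered)
```

The ROW_NUMBER() form is the easier one to extend when the table has further columns that should come from the same kept row.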

Oracle equivalent of Postgres' DISTINCT ON?

In postgres, you can query for the first value in a group with DISTINCT ON. How can this be achieved in Oracle?
From the postgres manual:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the "first row" of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first.
For example, for a given table:
col1 | col2
------+------
A | AB
A | AD
A | BC
B | AN
B | BA
C | AC
C | CC
Ascending sort:
> select distinct on(col1) col1, col2 from tmp order by col1, col2 asc;
col1 | col2
------+------
A | AB
B | AN
C | AC
Descending sort:
> select distinct on(col1) col1, col2 from tmp order by col1, col2 desc;
col1 | col2
------+------
A | BC
B | BA
C | CC
The same effect can be replicated in Oracle either by using the first_value() function or by using one of the rank() or row_number() functions.
Both variants also work in Postgres.
first_value()
select distinct col1,
first_value(col2) over (partition by col1 order by col2 asc)
from tmp
first_value gives the first value for the partition, but repeats it for each row, so it is necessary to use it in combination with distinct to get a single row for each partition.
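The first_value() variant can be run for both sort directions. This is a minimal sketch assuming Python's sqlite3 module in place of Oracle (window functions need SQLite 3.25 or later); the helper name first_per_group is hypothetical, and the outer ORDER BY col1 is added only to make the output order predictable.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO tmp VALUES (?, ?)",
    [("A", "AB"), ("A", "AD"), ("A", "BC"),
     ("B", "AN"), ("B", "BA"),
     ("C", "AC"), ("C", "CC")],
)

def first_per_group(direction):
    """Return one (col1, first col2) pair per col1, sorting col2 ASC or DESC."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    # DISTINCT collapses the value that first_value repeats on every row.
    return conn.execute(f"""
        SELECT DISTINCT col1,
               FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 {direction})
        FROM tmp
        ORDER BY col1
    """).fetchall()

ascending = first_per_group("ASC")
descending = first_per_group("DESC")
print(ascending)
print(descending)
```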
row_number() / rank()
select col1, col2 from (
select col1, col2,
row_number() over (partition by col1 order by col2 asc) as rownumber
from tmp
) foo
where rownumber = 1
Replacing row_number() with rank() in this example yields the same result, because col2 has no ties within any col1 group; with ties, rank() would return every tied row.
A feature of this variant is that it can be used to fetch the first N rows for a given partition (e.g. "last 3 updated") simply by changing rownumber = 1 to rownumber <= N.
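The first-N behaviour can be sketched by turning the row-number limit into a parameter. This is a minimal sketch assuming Python's sqlite3 module in place of Oracle; the helper name first_n is hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO tmp VALUES (?, ?)",
    [("A", "AB"), ("A", "AD"), ("A", "BC"),
     ("B", "AN"), ("B", "BA"),
     ("C", "AC"), ("C", "CC")],
)

def first_n(n):
    """Return the first n rows of each col1 group, ordered by col2 ascending."""
    return conn.execute("""
        SELECT col1, col2
        FROM (SELECT col1, col2,
                     ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC)
                         AS rownumber
              FROM tmp) foo
        WHERE rownumber <= ?
        ORDER BY col1, col2
    """, (n,)).fetchall()

top_one = first_n(1)
top_two = first_n(2)
print(top_one)
print(top_two)
```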
If you have more than two fields, then use beerbajay's answer as a subquery (note the DESC order):
select col1,col2, col3,col4 from tmp where col2 in
(
select distinct
first_value(col2) over (partition by col1 order by col2 DESC) as col2
from tmp
--WHERE you decide conditions
)