SQL how to highlight duplicates under some conditions - sql

I need to mark duplicates in the data, but only under some complex conditions. Let's say I have a table like this:
col1 col2
1 a
1 a
1 a
2 #B
2 #B
1 a
3 #B
3 #B
2 #B
1 a
4 #A
4 #A
5 c
I need to mark those records where:
value in col2 begins with a '#' AND ( it is a duplicate value in col2 AND it is under different values in col1).
so I need to get this:
col1 col2 newcol
1 a
1 a
1 a
2 #B 1
2 #B 1
1 a
3 #B 1
3 #B 1
2 #B 1
1 a
4 #A
4 #A
5 c
The reason why rows with "#B" in col2 are marked is that it is a duplicate in col2 AND "#B" can be found under "3" and "2" (so 2 or more different values) in col1. The reason why records with "#A" are NOT marked is that, while they are a duplicate in col2, they are only under one value ("4") in col1.
I am working in dashDB

I think DashDB supports window functions. If so, you can do:
select col1, col2,
       (case when col2 like '#%' and min_col1 <> max_col1 then 1 end) as flag
from (select t.*,
             min(col1) over (partition by col2) as min_col1,
             max(col1) over (partition by col2) as max_col1
      from t
     ) t;
You can also do something similar without window functions.
Here is an alternative method:
select t.*, t2.flag
from t join
     (select col2,
             (case when col2 like '#%' and min(col1) <> max(col1) then 1 end) as flag
      from t
      group by col2
     ) t2
     on t.col2 = t2.col2;
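The condition "found under different values in col1" can also be written as a distinct count, with the '#' prefix test stated explicitly. This is a minimal sketch, assuming the table is named t as in the answers above, that the column types are as guessed here, and that the engine accepts multi-row VALUES; the sample rows are the ones from the question.

```sql
-- Hypothetical setup: the table name t and the column types are assumptions.
CREATE TABLE t (col1 INTEGER, col2 VARCHAR(10));

INSERT INTO t (col1, col2) VALUES
  (1, 'a'), (1, 'a'), (1, 'a'), (2, '#B'), (2, '#B'), (1, 'a'), (3, '#B'),
  (3, '#B'), (2, '#B'), (1, 'a'), (4, '#A'), (4, '#A'), (5, 'c');

-- Flag rows whose col2 starts with '#' and occurs under two or more
-- distinct col1 values; all other rows get NULL in newcol.
SELECT t.col1, t.col2,
       CASE WHEN t.col2 LIKE '#%' AND g.n_col1 > 1 THEN 1 END AS newcol
FROM t
JOIN (SELECT col2, COUNT(DISTINCT col1) AS n_col1
      FROM t
      GROUP BY col2) g
  ON g.col2 = t.col2;
```

On this data only the five '#B' rows are flagged: '#B' appears under col1 values 2 and 3, while '#A' appears only under 4.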

Related

How can I find groups with more than one row and list the rows in each such group?

I have a table "mytable" in a database.
Given a subset of the columns of the table, I would like to group by that subset of columns and find those groups with more than one row:
For example, if the table is
col1 col2 col3
1 1 1
1 1 2
1 2 1
2 2 1
2 2 3
2 1 1
I am interested in finding the groups by col1 and col2 with more than one row, which are:
col1 col2 col3
1 1 1
1 1 2
and
col1 col2 col3
2 2 1
2 2 3
I was wondering how to write a SQL query for that purpose?
Is the following the best way to do that?
First get the col1 and col2 values of such groups:
SELECT col1, col2, COUNT(*)
FROM mytable
GROUP BY col1, col2
HAVING COUNT(*) > 1
Then based on the output of the previous query, manually write a query for each group:
SELECT *
FROM mytable
WHERE col1 = val1 AND col2 = val2
If there are many such groups, then I will have to manually write many queries, which can be a disadvantage.
I am using SQL Server.
Thanks.
This is a common problem. One solution is to get the "keys" in a derived table and join to that to get the rows.
declare @test as table (col1 int, col2 int, col3 int)
insert into @test values (1,1,1),(1,1,2),(1,2,1),(2,2,1),(2,2,3),(2,1,1)
select t.*
from @test t
inner join (
    select col1, col2
    from @test
    group by col1, col2
    having count(*) > 1
) k
    on k.col1 = t.col1 and k.col2 = t.col2
col1 col2 col3
----------- ----------- -----------
1 1 1
1 1 2
2 2 1
2 2 3
The window function sum() over() may help here
Example
with cte as (
Select *
,Cnt = sum(1) over (partition by Col1,Col2)
From YourTable
)
Select *
From cte
Where Cnt>=2
Another option (less performant)
Select top 1 with ties *
From YourTable
Order By case when sum(1) over (partition by Col1,Col2) > 1 then 1 else 2 end

select query to fetch rows corresponding to all values in a column

Consider this example table "Table1".
Col1 Col2
A 1
B 1
A 4
A 5
A 3
A 2
D 1
B 2
C 3
B 4
I am trying to fetch those values from Col1 which correspond to all values (in this case, 1,2,3,4,5). Here the query should return 'A', as none of the others have all of the values 1,2,3,4,5 in Col2.
Note that the values in Col2 are decided by other parameters in the query and they will always return some numeric values. Out of those values the query needs to fetch values from Col1 corresponding to all in Col2. The values in Col2 could be 11,12,1,2,3,4 for instance (meaning not necessarily in sequence).
I have tried the following select query:
select distinct Col1 from Table1 where Col1 in (1,2,3,4,5);
select distinct Col1 from Table1 where Col1 exists (select distinct Col2 from Table1);
and its different variations. But the problem is that I need to apply an 'and' for Col2 not an 'or'.
like Return a value from Col1 where Col2 'contains' all values between 1 and 5.
Appreciate any suggestion.
You could use analytic ROW_NUMBER() function.
SQL Fiddle for a setup and working demonstration.
SELECT col1
FROM
(SELECT col1,
col2,
row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
FROM your_table
WHERE col2 IN (1,2,3,4,5)
)
WHERE rn =5;
UPDATE As requested by OP, some explanation about how the query works.
The inner sub-query gives you the following resultset:
SQL> SELECT col1,
2 col2,
3 row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
4 FROM t
5 WHERE col2 IN (1,2,3,4,5);
C COL2 RN
- ---------- ----------
A 1 1
A 2 2
A 3 3
A 4 4
A 5 5
B 1 1
B 2 2
B 4 3
C 3 1
D 1 1
10 rows selected.
The PARTITION BY clause groups the rows by col1, and ORDER BY sorts col2 within each group, so the sub-query numbers the rows of each col1 group in col2 order. Because the WHERE clause admits only the five values 1 to 5, a group reaches row number 5 only if it contains all five of them (assuming no (col1, col2) pair is repeated). So all the outer query needs to do is filter with WHERE rn = 5.
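If a (col1, col2) pair can occur more than once, the row number can reach 5 without all five values being present. A variant that is not affected by repeats counts distinct values instead. This is a minimal sketch, not the answer's method, assuming the table and column names from the question and guessed column types.

```sql
-- Hypothetical setup mirroring the question's sample; column types are
-- assumptions. Multi-row VALUES is assumed to be available; some engines
-- (older Oracle releases, for example) need one INSERT per row.
CREATE TABLE Table1 (Col1 VARCHAR(10), Col2 INTEGER);

INSERT INTO Table1 (Col1, Col2) VALUES
  ('A', 1), ('B', 1), ('A', 4), ('A', 5), ('A', 3),
  ('A', 2), ('D', 1), ('B', 2), ('C', 3), ('B', 4);

-- Keep only the wanted values, then require that a group holds
-- all five distinct ones.
SELECT Col1
FROM Table1
WHERE Col2 IN (1, 2, 3, 4, 5)
GROUP BY Col1
HAVING COUNT(DISTINCT Col2) = 5;
```

On the sample data this returns only 'A'; 'B' has three of the five values, and 'C' and 'D' have one each.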
You can use listagg function, like
SELECT Col1
FROM
(select Col1,listagg(Col2,',') within group (order by Col2) Col2List from Table1
group by Col1)
WHERE Col2List = '1,2,3,4,5'
You can also use below
SELECT COL1
FROM TABLE_NAME
GROUP BY COL1
HAVING COUNT(COL1) = 5
   AND SUM((CASE WHEN COL2 = 1 THEN 1 ELSE 0 END)
         + (CASE WHEN COL2 = 2 THEN 1 ELSE 0 END)
         + (CASE WHEN COL2 = 3 THEN 1 ELSE 0 END)
         + (CASE WHEN COL2 = 4 THEN 1 ELSE 0 END)
         + (CASE WHEN COL2 = 5 THEN 1 ELSE 0 END)) = 5
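The question notes that the set of values in Col2 is not fixed in advance, so hardcoding 1 to 5 may not fit. One possible way to avoid the literal list is to compare each group's distinct count with the distinct count over the whole table. This is a sketch under the assumption that "all values" means every distinct value currently present in Col2; the setup repeats the question's sample with guessed column types.

```sql
-- Hypothetical setup; column types are assumptions. Multi-row VALUES is
-- assumed to be available in the engine being used.
CREATE TABLE Table1 (Col1 VARCHAR(10), Col2 INTEGER);

INSERT INTO Table1 (Col1, Col2) VALUES
  ('A', 1), ('B', 1), ('A', 4), ('A', 5), ('A', 3),
  ('A', 2), ('D', 1), ('B', 2), ('C', 3), ('B', 4);

-- A Col1 value qualifies when it has as many distinct Col2 values
-- as the table has overall.
SELECT Col1
FROM Table1
GROUP BY Col1
HAVING COUNT(DISTINCT Col2) = (SELECT COUNT(DISTINCT Col2) FROM Table1);
```

The sample has five distinct Col2 values overall and only 'A' has all five, so 'A' is the only row returned. If the required values come from another query, that query would replace the scalar subquery and also restrict the rows being grouped.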

Removing rows in SQL that have a duplicate column value

I have looked high and low on SO for an answer over the last couple of hours (subqueries, CTE's, left-joins with derived tables) to this question but none of the solutions are really meeting my criteria..
I have a table with data like this :
COL1 COL2 COL3
1 A 0
2 A 1
3 A 1
4 B 0
5 B 0
6 B 0
7 B 0
8 B 1
Where column 1 is the primary key and is an int, column 2 is nvarchar(max), and column 3 is an int. I have determined that by using this query:
select name, COUNT(name) as 'count'
FROM [dbo].[AppConfig]
group by Name
having COUNT(name) > 3
I can return the total counts of "A, B and C" only if they have an occurrence of column C more than 3 times. I am now trying to remove all the rows that occur after the initial value of column 3. The sample table I provided would look like this now:
COL1 COL2 COL3
1 A 0
2 A 1
4 B 0
8 B 1
Could anyone assist me with this?
If all you want is the first row with a ColB-ColC combination, the following will do it:
select min(id) as id, colB, colC
from tbl
group by colB, colC
order by id
This should work:
;WITH numbered_rows as (
    SELECT
        Col1,
        Col2,
        Col3,
        -- order by the key so that the row kept is the first one in each group
        ROW_NUMBER() OVER(PARTITION BY Col2, Col3 ORDER BY Col1) as rn
    FROM AppConfig)
SELECT
    Col1,
    Col2,
    Col3
FROM numbered_rows
WHERE rn = 1
SELECT DISTINCT MIN(COL1) AS COL1,COL2,COL3
FROM TABLE
GROUP BY COL2,COL3
ORDER BY COL1
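The queries above select the rows to keep; since the question asks about removing rows, the same grouping can drive a DELETE. This is a minimal sketch, assuming the table is named AppConfig with the columns shown in the sample, and using VARCHAR(50) for Col2 purely for illustration.

```sql
-- Hypothetical setup mirroring the question's sample; the column types
-- here are assumptions.
CREATE TABLE AppConfig (Col1 INTEGER PRIMARY KEY, Col2 VARCHAR(50), Col3 INTEGER);

INSERT INTO AppConfig (Col1, Col2, Col3) VALUES
  (1, 'A', 0), (2, 'A', 1), (3, 'A', 1), (4, 'B', 0),
  (5, 'B', 0), (6, 'B', 0), (7, 'B', 0), (8, 'B', 1);

-- Delete every row that is not the lowest Col1 of its (Col2, Col3) group.
DELETE FROM AppConfig
WHERE Col1 NOT IN (SELECT MIN(Col1)
                   FROM AppConfig
                   GROUP BY Col2, Col3);

-- Check what is left.
SELECT Col1, Col2, Col3
FROM AppConfig
ORDER BY Col1;
```

On the sample data the rows with Col1 = 1, 2, 4 and 8 remain, which matches the desired table in the question. It may be worth running the inner SELECT on its own first to confirm which rows will be kept.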

selection based on certain condition

select col1, col2, col3 from tab1
rownum col1 col2 col3
1 1 10 A
2 1 15 B
3 1 0 A
4 1 0 C
5 2 0 B
6 3 20 C
7 3 0 D
8 4 10 B
9 5 0 A
10 5 0 B
Output required is
col1 col2 col3
1 10 A
1 15 B
2 0 B
3 20 C
4 10 B
5 0 A
5 0 B
col1 and col2 are my lookup/joining columns. If a col1 group has any non-zero value in col2, I need to filter out its records where col2 is 0 (in the above example, rownum 3, 4 and 7). If a col1 group has no non-zero value in col2, I need to select its records with 0 (in the above example, col1 values 2 and 5).
I'm trying to write SQL for this. I hope I have stated the requirement clearly; please let me know if you need anything more from my side. I seem to have gone blank on this one.
Database - Oracle 10g
SELECT col1,
col2,
col3
FROM (SELECT col1,
col2,
col3,
sum(col2) OVER (PARTITION BY col1) sum_col2
FROM tab1)
WHERE ( ( sum_col2 <> 0
AND col2 <> 0)
OR sum_col2 = 0)
If col2 can be negative and the requirement is that the sum of col2 has "non-zero" data then the above is OK, however, if it is the requirement that any col2 value has "non-zero" data then it should be changed to:
SELECT col1,
col2,
col3
FROM (SELECT col1,
col2,
col3,
sum(abs(col2)) OVER (PARTITION BY col1) sum_col2
FROM tab1)
WHERE ( ( sum_col2 <> 0
AND col2 <> 0)
OR sum_col2 = 0)
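The difference between the two versions only shows up when non-zero values in a group cancel out. The sketch below uses made-up rows (they are not from the question) in which col1 = 1 holds 5, -5 and 0, and it uses MAX(ABS(col2)) in a grouped join rather than a window function, which is a substitution of mine and not part of the answer above.

```sql
-- Hypothetical data chosen to show the cancelling case; the table name
-- tab1 follows the question, the rows and column types are assumptions.
CREATE TABLE tab1 (col1 INTEGER, col2 INTEGER, col3 VARCHAR(1));

INSERT INTO tab1 (col1, col2, col3) VALUES
  (1, 5, 'A'), (1, -5, 'B'), (1, 0, 'C'),  -- non-zero values that sum to 0
  (2, 0, 'A'), (2, 0, 'B');                -- only zeros

-- max_abs is 0 only when every col2 in the group is 0.
SELECT t1.col1, t1.col2, t1.col3
FROM tab1 t1
JOIN (SELECT col1, MAX(ABS(col2)) AS max_abs
      FROM tab1
      GROUP BY col1) g
  ON g.col1 = t1.col1
WHERE (g.max_abs <> 0 AND t1.col2 <> 0)
   OR g.max_abs = 0;
```

Here the zero row (1, 0, 'C') is filtered out and both rows of group 2 are kept. With a plain SUM(col2), group 1 would sum to 0 and its zero row would be kept as well.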
SELECT t1.*
FROM tab1 t1
JOIN (SELECT "col1", MAX("col2") AS max2
FROM tab1
GROUP BY "col1") t2
ON t1."col1" = t2."col1"
WHERE ((max2 = 0 AND "col2" = 0)
OR
(max2 != 0 AND "col2" != 0))
ORDER BY "rownum"

SQL Delete duplicate records and leave the rest

I have two tables, A and B. A has 5 records, and B has the same records as A but 7 rows; that is, the same values in 7 rows. I want to delete only the first 5 records in B, since their row numbers match those in A. How can I do this? Please help me.
table :A
col1 col2 col3 DuplicateCount
1 2 n 1
1 2 n 2
1 2 n 3
1 2 n 4
2 2 m 1
2 2 m 2
table b:
col1 col2 col3 DuplicateCount
1 2 n 1
1 2 n 2
1 2 n 3
1 2 n 4
1 2 n 5
1 2 n 6
desired data should reside in table b is
col1 col2 col3 DuplicateCount
1 2 n 5
1 2 n 6
which is nothing but the last 2 rows in the table b.
Try this :
delete from TableB
WHERE Id IN
(
select b.id
from TableB b, TableA a
WHERE b.Id = a.ID
)
I added id column to identify rows in table B, I am not sure how to delete only some of duplicate rows without id column:
declare @a table
(
    id int primary key,
    col1 int,
    col2 int,
    col3 varchar
)
declare @b table
(
    id int primary key,
    col1 int,
    col2 int,
    col3 varchar
)
insert into @a values (1,1,2,'n')
insert into @a values (2,1,2,'n')
insert into @a values (3,1,2,'n')
insert into @a values (4,1,2,'n')
insert into @a values (5,2,2,'n')
insert into @a values (6,2,2,'n')
insert into @b values (10,1,2,'n')
insert into @b values (20,1,2,'n')
insert into @b values (30,1,2,'n')
insert into @b values (40,1,2,'n')
insert into @b values (50,1,2,'n')
insert into @b values (60,1,2,'n')
delete from @b
where id in
(
    select t1.id
    from
    (
        select
            id, col1, col2, col3,
            cnt = count(*) over(partition by col1, col2, col3),
            rn = row_number() over(partition by col1, col2, col3 order by id)
        from @b
    ) t1
    join
    (
        select
            *,
            cnt = count(*) over(partition by col1, col2, col3)
        from @a
    ) t2 on
        -- match on the data columns so each group in @b is compared with its own group in @a
        t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3
        and t1.cnt > 1 and t1.rn <= t2.cnt
)
select * from @b
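Before running a delete like this, it can help to list the ids that would be removed. The sketch below is a preview query in plain SQL with ordinary tables rather than table variables. The table names a and b, the id column and the column types are assumptions, and the rows follow the tables shown in the question.

```sql
-- Hypothetical setup; ids are added so that individual rows can be told apart.
CREATE TABLE a (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER, col3 VARCHAR(1));
CREATE TABLE b (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER, col3 VARCHAR(1));

INSERT INTO a (id, col1, col2, col3) VALUES
  (1, 1, 2, 'n'), (2, 1, 2, 'n'), (3, 1, 2, 'n'), (4, 1, 2, 'n'),
  (5, 2, 2, 'm'), (6, 2, 2, 'm');

INSERT INTO b (id, col1, col2, col3) VALUES
  (10, 1, 2, 'n'), (20, 1, 2, 'n'), (30, 1, 2, 'n'),
  (40, 1, 2, 'n'), (50, 1, 2, 'n'), (60, 1, 2, 'n');

-- Number the rows of b within each group of identical values, then pick
-- as many rows per group as table a holds for that same group.
SELECT bn.id
FROM (SELECT id, col1, col2, col3,
             ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY id) AS rn
      FROM b) bn
JOIN (SELECT col1, col2, col3, COUNT(*) AS cnt
      FROM a
      GROUP BY col1, col2, col3) ac
  ON ac.col1 = bn.col1 AND ac.col2 = bn.col2 AND ac.col3 = bn.col3
WHERE bn.rn <= ac.cnt
ORDER BY bn.id;
```

Table a holds four rows of (1, 2, 'n'), so the query lists ids 10, 20, 30 and 40, leaving 50 and 60 as the rows that would remain in b. Grouping a first gives one row per group, so no id is listed twice.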
You can use the TOP keyword to delete the first five records. TOP takes a row count rather than a SELECT list, and without an ordered subquery the rows it removes are arbitrary:
DELETE TOP (5) FROM TableB
or
Note: The below is an example for deleting one or more records based on their IDs
DELETE From yourTable where ID in (2,3,4,5,6)