Selecting duplicates with a unique identifying column

I have a table that looks like this (simplified)
| uniqueID | value1 | value2 | value3 |
|:--------:|:------:|:------:|:------:|
| 1 | a | b | c |
| 2 | e | f | g |
| 3 | a | b | c |
| 4 | a | b | c |
| 5 | e | f | g |
The end goal is to get a list of uniqueIDs whose value1, value2, and value3 match those of an earlier row, that is, every duplicate except the first occurrence. For the table above, I would like the query to return:
| uniqueID |
|:--------:|
| 3 |
| 4 |
| 5 |
This way I can remove those uniqueIDs from the table later. My current code looks like this:
select value1, value2, value3, count(*)
from myTable
group by value1, value2, value3 having count(*) > 1;
This gets me:
| value1 | value2 | value3 | count(*) |
|:------:|:------:|:------:|:--------:|
| a | b | c | 3 |
| e | f | g | 2 |
This works well for seeing which sets of values are duplicated, but it does not help me identify the uniqueIDs of the duplicate rows.
Thanks

You might try something like this:
SELECT uniqueID, value1, value2, value3 FROM (
SELECT uniqueID, value1, value2, value3
, ROW_NUMBER() OVER ( PARTITION BY value1, value2, value3 ORDER BY uniqueID ) AS rn
FROM mytable
) WHERE rn > 1;
This numbers the rows within each combination of value1, value2, and value3 in order of uniqueID, then drops the first row of each combination by filtering on the result of ROW_NUMBER(), where "first" is the row with the minimum uniqueID for that combination. Combinations that occur only once have no row with rn > 1, so they never appear in the result.
If you want the rows to keep rather than the rows to remove, you could do the following instead:
SELECT uniqueID, value1, value2, value3 FROM (
SELECT uniqueID, value1, value2, value3
, ROW_NUMBER() OVER ( PARTITION BY value1, value2, value3 ORDER BY uniqueID ) AS rn
FROM mytable
) WHERE rn = 1;
EDIT: Fixed some identifier names. Really, not a good idea to use CamelCase and headlessCamelCase in Oracle, where your table names and column names are just going to be converted to uppercase (unless you quote your identifiers).
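
For anyone who wants to try this end to end, here is a minimal sketch that runs the same ROW_NUMBER() filter against the sample data and then deletes the duplicates. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (needed for window functions) rather than Oracle; the subquery alias `numbered` and the Python variable names are only illustrative.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (an assumption;
# the answer above targets Oracle).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "uniqueID INTEGER PRIMARY KEY, value1 TEXT, value2 TEXT, value3 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),
        (2, "e", "f", "g"),
        (3, "a", "b", "c"),
        (4, "a", "b", "c"),
        (5, "e", "f", "g"),
    ],
)

# Number the rows inside each (value1, value2, value3) group by uniqueID
# and keep everything except the first row of each group.
DUPLICATES_SQL = """
SELECT uniqueID
FROM (
    SELECT uniqueID,
           ROW_NUMBER() OVER (PARTITION BY value1, value2, value3
                              ORDER BY uniqueID) AS rn
    FROM mytable
) AS numbered
WHERE rn > 1
ORDER BY uniqueID
"""

duplicate_ids = [row[0] for row in conn.execute(DUPLICATES_SQL)]
print(duplicate_ids)  # [3, 4, 5]

# Remove the duplicates, leaving the first occurrence of each group.
conn.executemany(
    "DELETE FROM mytable WHERE uniqueID = ?",
    [(unique_id,) for unique_id in duplicate_ids],
)
remaining_ids = [
    row[0] for row in conn.execute("SELECT uniqueID FROM mytable ORDER BY uniqueID")
]
print(remaining_ids)  # [1, 2]
```

Collecting the IDs first and deleting them in a second step mirrors the two-step plan in the question; a single DELETE with the subquery in an IN clause would also work.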

Related

Filter out rows from the final result, while still utilizing some of their values?

To give an example, let's say I have a view that returns the following result:
| id | foreignkey | value1 | value2 |
|----|------------|--------|--------|
| 1 | 500 | -100 | 0 |
| 2 | 500 | 900 | 15 |
| 3 | 500 | 570 | 25 |
| 4 | 999 | 100 | 57 |
| 5 | 999 | 150 | 0 |
The logic I'm trying to implement is as follows -
Filter out all rows that have value2 = 0.
But, for rows that have value2 = 0, I need to add their value1 to the value1 of all other rows with the same foreign key where value2 != 0. If there are no other rows with the same foreign key, then the rows with value2 = 0 simply get filtered out.
So in this example, I want the final result to be
| id | foreignkey | value1 | value2 |
|----|------------|--------|--------|
| 2 | 500 | 800 | 15 |
| 3 | 500 | 470 | 25 |
| 4 | 999 | 250 | 57 |
Any ideas? I was thinking something with group by might be possible but haven't been able to come up with a solution yet.

With SUM() window function:
select id, foreignkey, value1 + coalesce(total, 0) value1, value2
from (
select *,
sum(case when value2 = 0 then value1 end) over (partition by foreignkey) total
from tablename
) t
where value2 <> 0
Results:
> id | foreignkey | value1 | value2
> -: | ---------: | -----: | -----:
> 2 | 500 | 800 | 15
> 3 | 500 | 470 | 25
> 4 | 999 | 250 | 57
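
To reproduce these results locally, here is a small sketch of the same SUM() window query. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer; the extra row with id 6 is hypothetical and is there only to show what COALESCE does when a foreign key has no value2 = 0 rows.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER, foreignkey INTEGER, value1 INTEGER, value2 INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, 500, -100, 0),
        (2, 500, 900, 15),
        (3, 500, 570, 25),
        (4, 999, 100, 57),
        (5, 999, 150, 0),
        (6, 777, 40, 3),  # hypothetical: foreign key with no value2 = 0 row
    ],
)

# total = sum of value1 over the value2 = 0 rows of the same foreign key.
# It is NULL when the group has no such row, hence the COALESCE.
ADJUST_SQL = """
SELECT id, foreignkey, value1 + COALESCE(total, 0) AS value1, value2
FROM (
    SELECT *,
           SUM(CASE WHEN value2 = 0 THEN value1 END)
               OVER (PARTITION BY foreignkey) AS total
    FROM tablename
) AS t
WHERE value2 <> 0
ORDER BY id
"""

adjusted_rows = conn.execute(ADJUST_SQL).fetchall()
for row in adjusted_rows:
    print(row)
# (2, 500, 800, 15)
# (3, 500, 470, 25)
# (4, 999, 250, 57)
# (6, 777, 40, 3)
```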

Hmmm . . . assuming that this doesn't filter out all rows, you can use window functions like this:
select id, foreignkey, value1 + coalesce(value1_0, 0) as value1, value2
from (select t.*,
sum(case when value2 = 0 then value1 end) over (partition by foreignkey) as value1_0
from t
) t
where value2 <> 0;

One way is to treat all zero rows as one group and all others as another group (based on foreignkey) and then simply join and add the values and finally select only the required ones:
;with cte as
(
select id, foreignkey, value1, value2,dense_rank() over (partition by foreignkey order by (case when value2 = 0 then 0 else 1 end)) as rn
from #t t1
)
,cte2 as
(
select t1.id, t1.foreignkey, t1.value1 + isnull(t2.value1,0) as value1, t1.value2
from cte t1
left join cte t2 on (t2.foreignkey = t1.foreignkey and t1.rn<> t2.rn)
)
select * from cte2
where value2 <> 0

How to make 2 columns from one in one select in sqlite?

I've got one table with two columns (id and value). There are two types of values, and each id has both of these values. How can I write a select against this table that returns three columns (id, value1 and value2)?
I've tried CASE and GROUP BY, but it shows only one result for each id.
Example of a db:
| id | value |
|----|-------|
| 0 | a |
| 0 | b |
| 1 | a |
| 1 | b |
Example of the result I am looking for is:
| id | value_a | value_b |
|----|---------|---------|
| 0 | a | b |
| 1 | a | b |
UPDATE:
As was noted in the comments, the data in the example is too simple.
The problem is more complicated.
An example that would better describe it:
DB:
| id | value | value2 | value3 |
|----|-------|--------|--------|
| 0 | a | a2 | a3 |
| 0 | b | b2 | b3 |
| 1 | a | c2 | c3 |
| 1 | b | d2 | d3 |
RESULT:
| id | value_a | value_b | value2_a | value2_b | value3_a | value3_b |
|----|---------|---------|----------|----------|----------|----------|
| 0 | a | b | a2 | b2 | a3 | b3 |
| 1 | a | b | c2 | d2 | c3 | d3 |
The output should be sorted by id and have all the info from both rows of each id.

If there are always two values per ID, you can try an aggregation using min() and max().
SELECT id,
min(value) value_a,
max(value) value_b
FROM elbat
GROUP BY id;

select t0.id,t0.Value as Value_A, t1.Value as Value_B
from test t0
inner join test t1 on t0.id = t1.id
where t0.Value = 'a' and t1.value = 'b';

I have used this method to turn "rows" into "columns". Depending on the number of unique values that exist in the table, you may or may not want to use this :)
SELECT id, SUM(CASE WHEN value = 'a' then 1 else 0 END) value_a,
SUM(CASE WHEN value = 'b' then 1 else 0 END) value_b,
SUM(CASE WHEN value = 'c' then 1 else 0 END) value_c,
SUM(CASE WHEN value = 'a2' then 1 else 0 END) value_a2,
.
.
.
FROM table
GROUP BY id;
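
The SUM(CASE ...) query above counts how often each value occurs per id. To carry the values themselves into columns, as in the updated example in the question, the same conditional-aggregation pattern can be used with MAX() instead. This is only a sketch: it assumes Python's built-in sqlite3 module, a hypothetical table name `table1`, and that 'a' and 'b' are the only two row types.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, value TEXT, value2 TEXT, value3 TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (0, "a", "a2", "a3"),
        (0, "b", "b2", "b3"),
        (1, "a", "c2", "c3"),
        (1, "b", "d2", "d3"),
    ],
)

# Each CASE yields the column only for the matching row type and NULL
# otherwise; MAX() ignores the NULLs, so one value survives per id.
PIVOT_SQL = """
SELECT id,
       MAX(CASE WHEN value = 'a' THEN value  END) AS value_a,
       MAX(CASE WHEN value = 'b' THEN value  END) AS value_b,
       MAX(CASE WHEN value = 'a' THEN value2 END) AS value2_a,
       MAX(CASE WHEN value = 'b' THEN value2 END) AS value2_b,
       MAX(CASE WHEN value = 'a' THEN value3 END) AS value3_a,
       MAX(CASE WHEN value = 'b' THEN value3 END) AS value3_b
FROM table1
GROUP BY id
ORDER BY id
"""

pivoted_rows = conn.execute(PIVOT_SQL).fetchall()
for row in pivoted_rows:
    print(row)
# (0, 'a', 'b', 'a2', 'b2', 'a3', 'b3')
# (1, 'a', 'b', 'c2', 'd2', 'c3', 'd3')
```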

Thanks all for the answers! This is how I did it:
WITH a_table AS
(
SELECT id, value, value2, value3 FROM table1 WHERE table1.value = 0
),
b_table AS
(
SELECT id, value, value2, value3 FROM table1 WHERE table1.value = 1
)
SELECT DISTINCT
a_table.id AS id,
a_table.value AS value_a,
a_table.value2 AS value2_a,
a_table.value3 AS value3_a,
b_table.value AS value_b,
b_table.value2 AS value2_b,
b_table.value3 AS value3_b
FROM a_table
JOIN b_table ON a_table.id = b_table.id
GROUP BY a_table.id;

Selecting row where Value = Parameter or Value is NULL AND Parameter is not in Value

I am trying to select a row from a table with the logic that if a value in a field matches a parameter or the field is null and no record matches the parameter, return that row. This is a super simple example:
Sample Table:
ID | Value1 | Value2 | Value3
===|========|========|===========
1 | NULL | Hello | World
2 | Hello | NULL | World
3 | NULL | Hello | Everybody
Sample query that doesn't quite work:
SELECT *
FROM table
WHERE (Value1 IS NULL OR Value1 = 'Hello')
AND (Value3 IS NULL OR Value3 = 'World')
Results that I get with above query:
ID | Value1 | Value2 | Value3
===|========|========|========
1 | NULL | Hello | World
2 | Hello | NULL | World
My Desired Results:
ID | Value1 | Value2 | Value3
===|========|========|========
2 | Hello | NULL | World
Unfortunately, due to the complexity of the table and the select, I can't do something similar to the following:
SELECT *
FROM table
WHERE ((Value1 IS NULL
AND NOT EXISTS (SELECT * FROM table WHERE Value1 = 'Hello')
OR Value1 = 'Hello'))
AND (Value3 IS NULL OR Value3 = 'World')
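
One possible way to get the fallback behaviour without a correlated NOT EXISTS is to count the matches once with a window function and test that count in the outer WHERE. The following is only a sketch under several assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer), a hypothetical table name `sample`, hypothetical helper and column names, and it counts matches over the whole table rather than over rows that already passed the other filter.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (ID INTEGER, Value1 TEXT, Value2 TEXT, Value3 TEXT)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?)",
    [
        (1, None, "Hello", "World"),
        (2, "Hello", None, "World"),
        (3, None, "Hello", "Everybody"),
    ],
)

# For each filtered column, count how many rows match the parameter.
# A NULL in that column is accepted only when the count is zero.
FALLBACK_SQL = """
SELECT ID, Value1, Value2, Value3
FROM (
    SELECT s.*,
           SUM(CASE WHEN Value1 = :v1 THEN 1 ELSE 0 END) OVER () AS value1_matches,
           SUM(CASE WHEN Value3 = :v3 THEN 1 ELSE 0 END) OVER () AS value3_matches
    FROM sample AS s
) AS counted
WHERE (Value1 = :v1 OR (Value1 IS NULL AND value1_matches = 0))
  AND (Value3 = :v3 OR (Value3 IS NULL AND value3_matches = 0))
ORDER BY ID
"""


def find_rows(value1_param, value3_param):
    """Hypothetical helper: run the query for one pair of parameters."""
    params = {"v1": value1_param, "v3": value3_param}
    return conn.execute(FALLBACK_SQL, params).fetchall()


exact_match = find_rows("Hello", "World")
print(exact_match)  # [(2, 'Hello', None, 'World')]

# No row has Value1 = 'Goodbye', so rows with a NULL Value1 become eligible.
null_fallback = find_rows("Goodbye", "World")
print(null_fallback)  # [(1, None, 'Hello', 'World')]
```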

SQL DB2 Select multiple columns values for multiple instances of IDs

Here is my data:
| ID | FIELD1 | FIELD2 | FIELD3 |
|-------------------------------|
| 1 | NULL | value1 | value2 |
|-------------------------------|
| 2 | NULL | value3 | NULL |
|-------------------------------|
| 3 | value4 | NULL | NULL |
|-------------------------------|
| 4 | value5 | value6 | value7 |
|-------------------------------|
| .. | ... | .... | .... |
Here is what I need to select:
| ID | ID2 | FIELDX |
|-------------------|
| 1 | 10 | value1 |
| 1 | 10 | value2 |
| 2 | 20 | value3 |
| 3 | 30 | value4 |
| 4 | 40 | value5 |
| 4 | 40 | value6 |
| 4 | 40 | value7 |
| .. | .. | .... |
The order of the data doesn't really matter. What matters is that each ID appears once for every associated FIELD1,2,3... value. Please note that there are many fields. I just chose to use these three as an example.
My attempt at the solution was this query:
SELECT x.ID, a.ID2, x.FIELDX
FROM (
SELECT t.ID, t.FIELD1 AS FIELDX
FROM SCHEMA1.TABLE1 t
UNION ALL
SELECT t.ID, t.FIELD2
FROM SCHEMA1.TABLE1 t
UNION ALL
SELECT t.ID, t.FIELD3
FROM SCHEMA1.TABLE1 t
) x
JOIN SCHEMA2.TABLE2 a ON x.ID = a.ID
WHERE x.FIELDX != NULL
WITH UR;
While this does do the job, I would rather not have to add a new inner select statement for each additional field. Moreover, I feel as though there is a more efficient way to do it.
Please advise.

DB2 doesn't have an explicit unpivot and your method is fine. A more efficient method is probably to do:
SELECT id, id2, fieldx
FROM (SELECT x.ID, a.ID2,
(case when col = 'field1' then field1
when col = 'field2' then field2
when col = 'field3' then field3
end) as FIELDX
FROM SCHEMA1.TABLE1 x join
SCHEMA2.TABLE2 a
on x.ID = a.ID cross join
(select 'field1' as col from sysibm.sysdummy1 union all
select 'field2' from sysibm.sysdummy1 union all
select 'field3' from sysibm.sysdummy1
) c
) x
WHERE x.FIELDX is not NULL;
This doesn't necessarily simplify the code. It does make it easier for DB2 to optimize the joins. And it only requires reading table1 once instead of one time for each column.
As a note: you should use fieldx is not null rather than fieldx != null.
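
Here is a small sketch of the cross-join unpivot that can be run without a DB2 instance. It assumes Python's built-in sqlite3 module, so the sysibm.sysdummy1 selects are replaced by plain SELECTs and the schema prefixes are dropped; the table contents come from the question, with ID2 values taken from the expected output. It also shows why `!= NULL` is a problem: the comparison is never true, so it filters out every row.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (an assumption;
# the answer above targets DB2).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER, FIELD1 TEXT, FIELD2 TEXT, FIELD3 TEXT)"
)
conn.execute("CREATE TABLE table2 (ID INTEGER, ID2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, None, "value1", "value2"),
        (2, None, "value3", None),
        (3, "value4", None, None),
        (4, "value5", "value6", "value7"),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, 40)]
)

# Cross join every row with a list of column names, then pick the
# matching column with CASE. table1 is read once, not once per field.
UNPIVOT_SQL = """
SELECT ID, ID2, FIELDX
FROM (
    SELECT x.ID, a.ID2,
           CASE c.col
               WHEN 'field1' THEN x.FIELD1
               WHEN 'field2' THEN x.FIELD2
               WHEN 'field3' THEN x.FIELD3
           END AS FIELDX
    FROM table1 AS x
    JOIN table2 AS a ON x.ID = a.ID
    CROSS JOIN (
        SELECT 'field1' AS col UNION ALL
        SELECT 'field2' UNION ALL
        SELECT 'field3'
    ) AS c
) AS unpivoted
WHERE FIELDX IS NOT NULL
ORDER BY ID, FIELDX
"""

unpivoted_rows = conn.execute(UNPIVOT_SQL).fetchall()
for row in unpivoted_rows:
    print(row)

# Comparing with NULL using != never evaluates to true, so no row passes.
not_equal_null_count = conn.execute(
    "SELECT COUNT(*) FROM table1 WHERE FIELD2 != NULL"
).fetchone()[0]
print(not_equal_null_count)  # 0
```

Adding another field then means one more WHEN branch and one more entry in the column list, rather than another full pass over the table.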

SQL to merge max values from multiple rows

Suppose I have a table
-----------------------------------------------
| id | value1 | value2 | value3 |
-----------------------------------------------
| 102 | 10 | 1 | 3 |
-----------------------------------------------
| 102 | 2 | 11 | 0 |
-----------------------------------------------
| 102 | 0 | 9 | 13 |
-----------------------------------------------
| 102 | 3 | 5 | 7 |
-----------------------------------------------
and for each distinct id I want to return a row with max value in columns value1, value2 and value3, i.e.
-----------------------------------------------
| id | value1 | value2 | value3 |
-----------------------------------------------
| 102 | 10 | 11 | 13 |
-----------------------------------------------
(of course there are other ids than 102 in the table)
I managed to do it with "partition by", but the problem is that I have to use it in a PowerBuilder DataWindow, and as soon as I paste it there the whole IDE crashes and the project gets corrupted.
I also managed to write a SQL query that, for each row, does 3 inner joins with selects that return the max of each column.
Is there any other easier way?
Thanks in advance for answering!

Use GROUP BY and MAX():
SELECT id,
MAX(value1) val1,
MAX(value2) val2,
MAX(value3) val3
FROM tableName
GROUP BY ID

SELECT id, MAX(value1) value1, MAX(value2) value2, MAX(value3) value3
FROM yourtable
GROUP BY id
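
As a quick check of the GROUP BY approach, here is a minimal sketch using Python's built-in sqlite3 module (an assumption, chosen only so the example runs anywhere). The rows for id 102 come from the question; the rows for id 205 are hypothetical and are there only to show that each id gets its own row.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable ("
    "id INTEGER, value1 INTEGER, value2 INTEGER, value3 INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (102, 10, 1, 3),
        (102, 2, 11, 0),
        (102, 0, 9, 13),
        (102, 3, 5, 7),
        (205, 4, 8, 1),  # hypothetical second id
        (205, 6, 2, 9),  # hypothetical second id
    ],
)

# One output row per id; each MAX() is evaluated independently per column,
# so the three maxima may come from different source rows.
MAX_SQL = """
SELECT id, MAX(value1) AS value1, MAX(value2) AS value2, MAX(value3) AS value3
FROM yourtable
GROUP BY id
ORDER BY id
"""

max_rows = conn.execute(MAX_SQL).fetchall()
for row in max_rows:
    print(row)
# (102, 10, 11, 13)
# (205, 6, 8, 9)
```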
