Search duplicated values and show their context

Search duplicated values and show their context - sql

I'm not sure if the solution for this question is incredible simple or not possible in pure SQL.
I have a simple table with 2 columns
Number Text
1 a
1 b
2 m
3 x
3 y
3 z
Now the task is:
Search all repeated numbers and show the "Text" which uses these duplicated numbers.
We see: 1 is used twice (with a and b), 3 is used with x and y and z. But no line is completely duplicated.
Edit:
So I expect something like this.
Dup_Num Text
1 a
1 b
3 x
3 y
3 z
The search for the duplicate is easy, but I don't have an idea how to connect is with "Text", because when I add "Text" to my SELECT I have to use it for GROUP and this give no duplicates ..
Thanks for help on a lousy day ..

If I understand correctly, you can use exists:
select t.*
from t
where exists (select 1 from t t2 where t2.number = t.number and t2.text <> t.text)
order by t.number;
For performance, you want an index on (number, text).

The canonical way to find duplicates in SQL is the self join.
In your example:
select s1.*
from stuff s1
inner join stuff s2
on s1.number = s2.number
and s1.text <> s2.text

In your case, you might want to use LISTAGG to group those values and its relation to the other column
SQL> with result
as (
select '1' as c1 , 'a' as c2 from dual union all
select '1' as c1 , 'b' as c2 from dual union all
select '2' as c1 , 'm' as c2 from dual union all
select '3' as c1 , 'x' as c2 from dual union all
select '3' as c1 , 'y' as c2 from dual union all
select '3' as c1 , 'z' as c2 from dual )
select c1, listagg(c2,',') within group(order by c1) as c3 from result
group by c1;
C C3
- --------------------
1 a,b
2 m
3 x,y,z
SQL>

This might also help you.
select * from t where Number in
(select Number from t group by Number having count(*) > 1)
order by Text
Fiddle

Related

SQL query : how to check existence of multiple rows with one query

I have this table MyTable:
PROG VALUE
-------------
1 aaaaa
1 bbbbb
2 ccccc
4 ddddd
4 eeeee
now I'm checking the existence of a tuple with a certain id with a query like
SELECT COUNT(1) AS IT_EXISTS
FROM MyTable
WHERE ROWNUM = 1 AND PROG = {aProg}
For example I obtain with aProg = 1 :
IT_EXISTS
---------
1
I get with aProg = 3 :
IT_EXISTS
---------
0
The problem is that I must do multiple queries, one for every value of PROG to check.
What I want is something that with a query like
SELECT PROG, ??? AS IT_EXISTS
FROM MyTable
WHERE PROG IN {1, 2,3, 4, 5} AND {some other condition}
I can get something like
PROG IT_EXISTS
------------------
1 1
2 1
3 0
4 1
5 0
The database is Oracle...
Hope I'm clear
regards
Paolo

Take a step back and ask yourself this: Do you really need to return the rows that don't exist to solve your problem? I suspect the answer is no. Your application logic can determine that records were not returned which will allow you to simplify your query.
SELECT PROG
FROM MyTable
WHERE PROG IN (1, 2, 3, 4, 5)
If you get a row back for a given PROG value, it exists. If not, it doesn't exist.
Update:
In your comment in the question above, you stated:
the prog values are from others tables. The table of the question has only a subset of the all prog values
This suggests to me that a simple left outer join could do the trick. Assuming your other table with the PROG values you're interested in is called MyOtherTable, something like this should work:
SELECT a.PROG,
CASE WHEN b.PROG IS NOT NULL THEN 1 ELSE 0 END AS IT_EXISTS
FROM MyOtherTable AS a
LEFT OUTER JOIN MyTable AS b ON b.PROG = a.PROG
A WHERE clause could be tacked on to the end if you need to do some further filtering.

I would recommend something like this. If at most one row can match a prog in your table:
select p.prog,
(case when t.prog is null then 0 else 1 end) as it_exists
from (select 1 as prog from dual union all
select 2 as prog from dual union all
select 3 as prog from dual union all
select 4 as prog from dual union all
select 5 as prog from dual
) p left join
mytable t
on p.prog = t.prog and <some conditions>;
If more than one row could match, you'll want to use aggregation to avoid duplicates:
select p.prog,
max(case when t.prog is null then 0 else 1 end) as it_exists
from (select 1 as prog from dual union all
select 2 as prog from dual union all
select 3 as prog from dual union all
select 4 as prog from dual union all
select 5 as prog from dual
) p left join
mytable t
on p.prog = t.prog and <some conditions>
group by p.prog
order by p.prog;

One solution is to use (arguably abuse) a hierarchical query to create an arbitrarily long list of numbers (in my example, I've set the largest number to max(PROG), but you could hardcode this if you knew the top range you were looking for). Then select from that list and use EXISTS to check if it exists in MYTABLE.
select
PROG
, case when exists (select 1 from MYTABLE where PROG = A.PROG) then 1 else 0 end IT_EXISTS
from (
select level PROG
from dual
connect by level <= (select max(PROG) from MYTABLE) --Or hardcode, if you have a max range in mind
) A
;

It's still not very clear where you get the prog values to check. But if you can read them from a table, and assuming that the table doesn't contain duplicate prog values, this is the query I would use:
select a.prog, case when b.prog is null then 0 else 1 end as it_exists
from prog_values_to_check a
left join prog_values_to_check b
on a.prog = b.prog
and exists (select null
from MyTable t
where t.prog = b.prog)
If you do need to hard code the values, you can do it rather simply by taking advantage of the SYS.DBMS_DEBUG_VC2COLL function, which allows you to convert a comma-delimited list of values into rows.
with prog_values_to_check(prog) as (
select to_number(column_value) as prog
from table(SYS.DBMS_DEBUG_VC2COLL(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) -- type your values here
)
select a.prog, case when b.prog is null then 0 else 1 end as it_exists
from prog_values_to_check a
left join prog_values_to_check b
on a.prog = b.prog
and exists (select null
from MyTable t
where t.prog = b.prog)
Note: The above queries take into account that the MyTable table may have multiple rows with the same prog value, but that you only want one row in the result. I make this assumption based the WHERE ROWNUM = 1 condition in your question.

Denormalized table. SQL Select

As I understand I have a denormalized table. Here is some list of table columns:
... C, F, T, C1, F1, T1, .... C8, T8, F8.....
Is it possible to select those values in a rows?
Something like this:
C, F, T
C1, F1, T1
......
C8, F8, T8

You can do it easily with a union all:
select C, F, T from table t
union all
select C1, F1, T1 from table t
union all
. . .
select C8, F8, T8 from table t;
Note the use of union all instead of union. union does automatic duplicate elimination, so you might not get all your values with union (as well as it being a more expensive operation).
This will generally result in the table being scanned 9 times. If you have a large table, there are other methods that are likely to be more efficient.
EDIT:
A more efficient method is likely to be a cross join and case. In DB2, I think this would be:
select (case n.n when 0 then C
when 1 then C1
. . .
when 8 then C8
end) as C,
(case n.n when 0 then F
when 1 then F1
. . .
when 8 then F8
end) as F,
(case n.n when 0 then T
when 1 then T1
. . .
when 8 then T8
end) as T
from table t cross join
(select 0 as n from sysibm.sysdummy1 union all select 1 from sysibm.sysdummy1 union all . . .
select 9 from sysibm.sysdummy1
) n;
This may seem like more work, but it should only be reading the bigger table once, with the rest of the work being in-memory operations.

select c,f,t from table
union all
select c1,f1,t1 from table
union all
select c8,f8,t8 from table
Make sure to filter by WHERE clause each SELECT statement.

A trick similar to Gordons is to use LATERAL to avoid multiple scans:
with t(c1,t1,f1,c2,t2,f2) as ( values (1,2,3,4,5,6) )
select y.c, y.t, y.f
from t x
cross join lateral ( values (x.c1, x.t1, x.f1)
, (x.c2, x.t2, x.f2) ) y(c,t,f)
C T F
----------- ----------- -----------
1 2 3
4 5 6

How can I Pivot a table in DB2? [duplicate]

This question already has answers here:
Pivoting in DB2
(3 answers)
Closed 5 years ago.
I have table A, below, where for each unique id, there are three codes with some value.
ID Code Value
---------------------
11 1 x
11 2 y
11 3 z
12 1 p
12 2 q
12 3 r
13 1 l
13 2 m
13 3 n
I have a second table B with format as below:
Id Code1_Val Code2_Val Code3_Val
Here there is just one row for each unique id. I want to populate this second table B from first table A for each id from the first table.
For the first table A above, the second table B should come out as:
Id Code1_Val Code2_Val Code3_Val
---------------------------------------------
11 x y z
12 p q r
13 l m n
How can I achieve this in a single SQL query?

select Id,
max(case when Code = '1' then Value end) as Code1_Val,
max(case when Code = '2' then Value end) as Code2_Val,
max(case when Code = '3' then Value end) as Code3_Val
from TABLEA
group by Id

SELECT Id,
max(DECODE(Code, 1, Value)) AS Code1_Val,
max(DECODE(Code, 2, Value)) AS Code2_Val,
max(DECODE(Code, 3, Value)) AS Code3_Val
FROM A
group by Id

If your version doesn't have DECODE(), you can also use this:
INSERT INTO B (id, code1_val, code2_val, code3_val)
WITH Ids (id) as (SELECT DISTINCT id
FROM A) -- Only to construct list of ids
SELECT Ids.id, a1.value, a2.value, a3.value
FROM Ids -- or substitute the actual id table
JOIN A a1
ON a1.id = ids.id
AND a1.code = 1
JOIN A a2
ON a2.id = ids.id
AND a2.code = 2
JOIN A a3
ON a3.id = ids.id
AND a3.code = 3
(Works on my V6R1 DB2 instance, and have an SQL Fiddle Example).

Here is a SQLFiddle example
insert into B (ID,Code1_Val,Code2_Val,Code3_Val)
select Id, max(V1),max(V2),max(V3) from
(
select ID,Value V1,'' V2,'' V3 from A where Code=1
union all
select ID,'' V1, Value V2,'' V3 from A where Code=2
union all
select ID,'' V1, '' V2,Value V3 from A where Code=3
) AG
group by ID

Here is the SQL Query:
insert into pivot_insert_table(id,code1_val,code2_val, code3_val)
select * from (select id,code,value from pivot_table)
pivot(max(value) for code in (1,2,3)) order by id ;

WITH Ids (id) as
(
SELECT DISTINCT id FROM A
)
SELECT Ids.id,
(select sub.value from A sub where Ids.id=sub.id and sub.code=1 fetch first rows only) Code1_Val,
(select sub.value from A sub where Ids.id=sub.id and sub.code=2 fetch first rows only) Code2_Val,
(select sub.value from A sub where Ids.id=sub.id and sub.code=3 fetch first rows only) Code3_Val
FROM Ids

You want to pivot your data. Since DB2 has no pivot function, yo can use Decode (basically a case statement.)
The syntax should be:
SELECT Id,
DECODE(Code, 1, Value) AS Code1_Val,
DECODE(Code, 2, Value) AS Code2_Val,
DECODE(Code, 3, Value) AS Code3_Val
FROM A

SELECT DISTINCT for data groups

I have following table:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID Data
101 A
102 A
102 B
103 C
104 D
So DataID 102 would resemble data (A,B), DataID 103 would resemble data (C), etc. In order to be able to rewrite my original table in this form:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'

In my opinion You have to create a custom aggregate that concatenates data (in case of strings CLR approach is recommended for perf reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number()function or add a dense_rank() your choice. Anyway it should look like this
with groupings as (
select concat(data) groups
from Table1
group by ID
)
select groups, rownumber() over () from groupings

The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
SELECT
id,
MAX(CASE data WHEN 'A' THEN data ELSE '' END) +
MAX(CASE data WHEN 'B' THEN data ELSE '' END) +
MAX(CASE data WHEN 'C' THEN data ELSE '' END) +
MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM t1
GROUP BY id
ID DataGroups
1 A
2 AB
3 AB
4 C
5 D
6 AB
However, this kind of logic will only work in case you the "Data" values are both fixed and known before hand.
In your case, you do say that is the case. However, considering that you also say that they are 1000 of them, this will be frankly, a ridiculous looking query for sure :-)
LuckyLuke's suggestion above would, frankly, be the more generic way and probably saner way to go about implementing the solution though in your case.

From your sample data (having added the missing 2,'A' tuple, the following gives the renumbered (and uniqueified) data:
with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
select ID,data
from #t1 where id in (select id from NonDups)
), Renumbered as (
select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered
Giving:
1 A
2 A
2 B
3 C
4 D
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.

Just to share my own dirty solution that I'm using for the moment:
SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY (
SELECT CAST(Data AS VARCHAR) + ','
FROM #t1 t2
WHERE t2.id = t1.id
ORDER BY Data ASC
FOR XML PATH('') )
D ( Data )
And then going analog to LuckyLuke's solution.

sql select to start with a particular record

Is there any way to write a select record starting with a particular record? Suppose I have an table with following data:
SNO ID ISSUE
----------------------
1 A1 unknown
2 A2 some_issue
3 A1 unknown2
4 B1 some_issue2
5 B3 ISSUE4
6 B1 ISSUE4
Can I write a select to start showing records starting with B1 and then the remaining records? The output should be something like this:
4 B1 some_issue2
6 B1 ISSUE4
1 A1 unknown
2 A2 some_issue
3 A1 unknown2
5 B3 ISSUE4
It doesn't matter if B3 is last, just that B1 should be displayed first.

Couple of different options depending on what you 'know' ahead of time (i.e. the id of the record you want to be first, the sno, etc.):
Union approach:
select 1 as sortOrder, SNO, ID, ISSUE
from tableName
where ID = 'B1'
union all
select 2 as sortOrder, SNO, ID, ISSUE
from tableName
where ID <> 'B1'
order by sortOrder;
Case statement in order by:
select SNO, ID, ISSUE
from tableName
order by case when ID = 'B1' then 1 else 2 end;
You could also consider using temp tables, cte's, etc., but those approaches would likely be less performant...try a couple different approaches in your environment to see which works best.

Assuming you are using MySQL, you could either use IF() in an ORDER BY clause...
SELECT SNO, ID, ISSUE FROM table ORDER BY IF( ID = 'B1', 0, 1 );
... or you could define a function that imposes your sort order...
DELIMITER $$
CREATE FUNCTION my_sort_order( ID VARCHAR(2), EXPECTED VARCHAR(2) )
RETURNS INT
BEGIN
RETURN IF( ID = EXPECTED, 0, 1 );
END$$
DELIMITER ;
SELECT SNO, ID, ISSUE FROM table ORDER BY my_sort_sort( ID, 'B1' );

select * from table1
where id = 'B1'
union all
select * from table1
where id <> 'B1'

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Search duplicated values and show their context - sql

If I understand correctly, you can use exists: select t.* from t where exists (select 1 from t t2 where t2.number = t.number and t2.text <> t.text) order by t.number; For performance, you want an index on (number, text).

The canonical way to find duplicates in SQL is the self join. In your example: select s1.* from stuff s1 inner join stuff s2 on s1.number = s2.number and s1.text <> s2.text

This might also help you. select * from t where Number in (select Number from t group by Number having count(*) > 1) order by Text Fiddle

Related

SQL query : how to check existence of multiple rows with one query

Denormalized table. SQL Select

How can I Pivot a table in DB2? [duplicate]

SELECT DISTINCT for data groups

sql select to start with a particular record

Categories

Resources