I want to find the duplicates of two columns in the same table.
Example data set is as follows.
Column_1 Column_2
**15440100000220** 15440300002980
15440100000150 **15440100000220**
15440100000170 **15440300002160**
**15440300002160** 15440100006170
As you can see, I have duplicates in the two columns. Records in the first column are present in the second column and records in the second are present in the first.
I looked for a solution but only came across examples comparing duplicates of two tables.
Is there a way to get these duplicates into a select query? If a record in column 2 is present in column1, then that record in column 2 should be captured in the query.
Another approach to just list the column_2 values that also appear in column_1 is to use exists:
select column_2
from your_table yt
where exists (
select null
from your_table yt2
where yt2.column_1 = yt.column_2
);
I think the intent of this is clearer, but you should check the performance of the various approaches.
you could use an having on the union subselect
select column_1, count(*) from (
select column_1 as column_1
from my_table
union all
select column_2
from my_table
) t
group by column_1
having count(*) > 1
You could self-join the table:
SELECT t1.column_1 AS col1, t1.column_2 AS col2,
t2.column_1 AS duplicate_col1, t2.column+2 AS duplicate_col2
FROM mytable t1
JOIN mytable t2 ON t1.column_1 = t2.column_2
You just want those duplicated id's? Do a self join:
select distinct t1.column_1
from tablename t1
join tablename t2 on t1.column_1 = t2.column_2
Related
I have a table (Table1) like the following:
Col1
Col2
First
Code1,Code2,Code3
Second
Code2
So Col2 can contain multiple values comma separated, I have another table (Table2) that contains this:
ColA
ColB
Code1
Value1
Code2
Vaue2
Code3
Vaue3
I need to create a view that joins the two tables (Table1 and Table2) and returns something like this:
Col1
Col2
First
Value1,Value2,Value3
Second
Value2
Is that possible? (I'm on Oracle DB if that helps.)
It's a violation of first normal form to have a list in a column value like that. It causes a lot of difficulties in a relational database, like the one you are encountering now.
However, you can get what you want by using the LIKE operator to find colA values that are substrings of the Col2 column. Add delimiters before and after to catch the first and last ones. Then aggregate back up to a single list using LISTAGG.
SELECT table1.col1,
LISTAGG(table2.colB,',') WITHIN GROUP (ORDER BY table2.colB) value_list
FROM table1,
table2
WHERE ','||table1.col2||',' LIKE '%,'||table2.colA||',%'
GROUP BY table1.col1
This will not perform well on large volumes, because without an equijoin it's going to use nested loops, and you can't use an index on a LIKE predicate with % at the beginning. The combination of nested loops + FTS is not pleasant with large volumes of data. Therefore, if this is your situation, you will need to fix the 1NF problem by transforming table1 into normal relational format, and then join it to table2 with an equijoin, which will enable it to use a hash join instead. So:
SELECT table1.col1,
LISTAGG(table2.colB,',') WITHIN GROUP (ORDER BY table2.colB) value_list
FROM (SELECT t.col1,
SUBSTR(t.col2,INSTR(t.col2,',',1,seq)+1,INSTR(t.col2,',',1,seq+1)-(INSTR(t.col2,',',1,seq)+1)) col2_piece
FROM (SELECT col1,
','||col2||',' col2
FROM table1) t,
(SELECT ROWNUM seq FROM dual CONNECT BY LEVEL < 10) x) table1,
table2
WHERE table1.col2_piece IS NOT NULL
AND table1.col2_piece = table2.colA
GROUP BY table1.col1
If you want the values in the same order in the list as the terms then you can use:
SELECT t1.col1,
LISTAGG(t2.colb, ',') WITHIN GROUP (
ORDER BY INSTR(','||t1.col2||',', ','||t2.colA||',')
) AS value2
FROM table1 t1
INNER JOIN table2 t2
ON INSTR(','||t1.col2||',', ','||t2.colA||',') > 0
GROUP BY
t1.col1
Which, for the sample data:
CREATE TABLE Table1 (Col1, Col2) AS
SELECT 'First', 'Code1,Code2,Code3' FROM DUAL UNION ALL
SELECT 'Second', 'Code2' FROM DUAL;
CREATE TABLE Table2 (ColA, ColB) AS
SELECT 'Code1', 'XXXX' FROM DUAL UNION ALL
SELECT 'Code2', 'ZZZZ' FROM DUAL UNION ALL
SELECT 'Code3', 'YYYY' FROM DUAL;
Outputs:
COL1
VALUE2
First
XXXX,ZZZZ,YYYY
Second
ZZZZ
fiddle
I've got a query like this
select column, count(*)
from mytable
where column in ('XXX','YYY','ZZZ',....)
group by column;
But I want also to get a row for values the aren't in the table.
Let's suppose that 'ZZZ' doesn't exist in mytable, I'd like to get:
COLUMN COUNT(*)
XXX 3
YYY 2
ZZZ 0 (or NULL)
Oracle version 10g
Thanks in advance
Mark
In general, you would need to have a second table which contains all the possible column values whose counts you want to appear in the output. For demo purposes only, we can use a CTE for that:
WITH vals AS (
SELECT 'XXX' AS val UNION ALL
SELECT 'YYY' UNION ALL
SELECT 'ZZZ'
)
SELECT t1.val, COUNT(t2.col) AS cnt
FROM vals t1
LEFT JOIN mytable t2
ON t2.col = t1.val
GROUP BY
t1.val;
I am trying to create a query with multiple WITH clause in bigquery. I am getting an error : Duplicate column names in the result are not supported. Found duplicate(s): because I have some repeated columns in the tables.
Problem is I can't remove them as I need it to display in my table also they are needed in the group by clause in the tables.
My code somewhat looks like this:
WITH table0 as (## query0),
table1 AS (## query1),
table2 as (## query2),
table3 as (## query3),
table4 as (## query4),
table5 as (## query 5)
select
*
from
table0,
table1,
table2,
table3,
table4,
table5
How do I handle duplicate columns in multiple WITH clause in SQL
Why are you creating a Cartesian product of the subqueries?
In any case, BigQuery gives you more control over the columns than other databases. So, if col1 is common to table0 and table1, you can do:
select t1.*, t2.* except (col1)
If you want to keep both values:
select t1.*, t2.* except (col1), t2.col1 as t2_col1
or
select t1.* except (col1),
t2.* except (col1),
t1.col1 as t1_col1,
t2.col1 as t2_col1
From your previous question (that looks like was deleted) I think I remember your use case and there you had (I can be wrong ) only fields that are used for JOINs are "duplicate"
In such cases you can use below approach (in below example it is assumed that those duplicate fields are id and day)
#standardSQL
SELECT *
FROM `project.dataset.table0`
JOIN `project.dataset.table1` USING(id, day)
JOIN `project.dataset.table2` USING(id, day)
JOIN `project.dataset.table3` USING(id, day)
for example in below super-simplified dummy example
#standardSQL
WITH `project.dataset.table0` AS (
SELECT 1 id, '2019-01-01' day, 0 col0
), `project.dataset.table1` AS (
SELECT 1 id, '2019-01-01' day, 1 col1
), `project.dataset.table2` AS (
SELECT 1 id, '2019-01-01' day, 2 col2
), `project.dataset.table3` AS (
SELECT 1 id, '2019-01-01' day, 3 col3
)
SELECT *
FROM `project.dataset.table0`
JOIN `project.dataset.table1` USING(id, day)
JOIN `project.dataset.table2` USING(id, day)
JOIN `project.dataset.table3` USING(id, day)
result will be
Row id day col0 col1 col2 col3
1 1 2019-01-01 0 1 2 3
w/o any complains about duplicate fields
As you can see from above example - using USING() instead of ON "magicaly" resolves the issue - but again - note - only for case when "duplicate" fields are all JOIN fields
Using SQL Server, I have two tables that I want to merge. Table2 has a tbID column that has NULL values. I would like for NULL values to be automatically updated, beginning with the next value of the tbID column, from Table1 when the tables are merged.
I am entering something like this..
SELECT
A.tbID,
A.etID,
A.cNumber,
A.cName
FROM
Table1 AS A
UNION ALL
SELECT
B.tbID,
B.etID,
B.cNumber,
B.cName
FROM
Table2 AS B
My results has a NULL value (instead of an automatically inserted number), in the records from Table2.
If by "merge" you want all the records to end up in table1 - just leave the id column out of the insert:
INSERT INTO table1 (etID, cNumber, cName)
SELECT etID, cNumber, cName
FROM table2
If you just want to select and see incremented ids for the data coming from table2, here is one way:
SELECT A.tbID,
A.etID,
A.cNumber,
A.cName
FROM Table1 AS A
UNION ALL
SELECT (SELECT MAX(tbID) FROM Table1) +
ROW_NUMBER() OVER (ORDER BY B.etID),
B.etID,
B.cNumber,
B.cName
FROM Table2 AS B
If you just want a result set, you can use:
SELECT A.tbID, A.etID, A.cNumber, A.cName
FROM Table1 A
UNION ALL
SELECT (a.maxID + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) as tbID,
B.etID, B.cNumber, B.cName
FROM Table2 B CROSS JOIN
(SELECT MAX(A.tbID) as maxID FROM Table1 as A) a;
If you are inserting these rows into a new table, then the id can be assigned automatically -- but the values might differ for A.
I want to delete all records that are returned by a certain query, but I can't figure out a proper way to do this. I tried to DELETE FROM mytable WHERE EXISTS (subquery), however, that deleted all records from the table and not just the ones selected by the subquery.
My subquery looks like this:
SELECT
MAX(columnA) as columnA,
-- 50 other columns
FROM myTable
GROUP BY
-- the 50 other columns above
having count(*) > 1;
This should be easy enough, but my mind is just stuck right now. I'm thankful for any suggestions.
Edit: columnA is not unique (also no other column in that table is globally unique)
Presumably, you want to use in:
DELETE FROM myTable
WHERE columnA IN (SELECT MAX(columnA) as columnA
FROM myTable
GROUP BY -- the 50 other columns above
HAVING count(*) > 1
);
This assumes that columnA is globally unique in the table. Otherwise, you will have to work a bit harder.
DELETE FROM myTable t
WHERE EXISTS (SELECT 1
FROM (SELECT MAX(columnA) as columnA,
col1, col2, . . .
FROM myTable
GROUP BY -- the 50 other columns above
HAVING count(*) > 1
) t2
WHERE t.columnA = t2.columnA AND
t.col1 = t2.col1 AND
t.col2 = t2.col2 AND . . .
);
And, even this isn't guaranteed to work if any of the columns have NULL values (although the conditions can be easily modified to handle this).
Another solution if the uniqueness is only guaranteed by a set of columns:
delete table1 where (col1, col2, ...) in (
select min(col1), col2, ...
from table1
where...
group by col2, ...
)
Null values will be ignored and not deleted.
To achieve this, try something like
with data (id, val1, val2) as
(
select 1, '10', 10 from dual union all
select 2, '20', 21 from dual union all
select 2, null, 21 from dual union all
select 2, '20', null from dual
)
-- map null values in column to a nonexistent value in this column
select * from data d where (d.id, nvl(d.val1, '#<null>')) in
(select dd.id, nvl(dd.val1, '#<null>') from data dd)
If you need to delete all the rows of a table such that the value of a given field is in the result of a query, you can use something like
delete table
my column in ( select column from ...)