Oracle SQL: How to find duplicate values in different columns?

I have a set of rows with many columns. For example,
ID | Col1 | Col2 | Col3 | Duplicate
------------------------------------
81 | 101 | 102 | 101 | YES
82 | 101 | 103 | 104 | NO
I need to calculate the "Duplicate" column. It is a duplicate because it has the same value in Col1 and Col3. I know there is the LEAST function, which is similar to the MIN function but for columns. Does something similar exist to achieve this?
The approach I have in mind is to write all possible combinations in a case like this:
SELECT ID, col1, col2, col3,
CASE WHEN col1 = col2 or col1 = col3 or col2 = col3 then 1 else 0 end as Duplicate
FROM table
But I wish to avoid that, since I have many columns in some cases, and it is very prone to errors.
What is the best way to solve this?

Hmmm. You are looking for within-row duplicates. This is painful. More recent versions of Oracle support lateral joins. But for just a handful of non-NULL columns, you can do:
select id, col1, col2, col3,
(case when col1 in (col2, col3) or col2 in (col3) then 1 else 0 end) as Duplicate
from t;
For each additional column, you need to add one more IN comparison and extend the existing IN lists.
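If the column list gets long, the predicate can be generated rather than hand-typed. A minimal sketch (Python, not part of the answer) that builds the CASE expression from a list of column names:

```python
from itertools import combinations

def duplicate_case_expr(cols):
    """Build the CASE predicate comparing every pair of columns."""
    pairs = " or ".join(f"{a} = {b}" for a, b in combinations(cols, 2))
    return f"case when {pairs} then 1 else 0 end as Duplicate"

print(duplicate_case_expr(["col1", "col2", "col3"]))
# case when col1 = col2 or col1 = col3 or col2 = col3 then 1 else 0 end as Duplicate
```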

Something like this... note that in the lateral clause we still need to unpivot, but that happens one row at a time - possibly resulting in much faster execution than a simple unpivot and standard aggregation.
with
input_data ( id, col1, col2, col3 ) as (
select 81, 101, 102, 101 from dual union all
select 82, 101, 103, 104 from dual
)
-- End of simulated input data (for testing purposes only).
-- Solution (SQL query) begins BELOW THIS LINE.
select i.id, i.col1, i.col2, i.col3, l.duplicates
from input_data i,
lateral ( select case when count (distinct val) = count(val)
then 'NO' else 'YES'
end as duplicates
from input_data
unpivot ( val for col in ( col1, col2, col3 ) )
where id = i.id
) l
;
ID COL1 COL2 COL3 DUPLICATES
-- ---- ---- ---- ----------
81 101 102 101 YES
82 101 103 104 NO
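The count-distinct test at the heart of this query is easy to check outside Oracle. A sketch using Python's sqlite3 (SQLite has no UNPIVOT or LATERAL, so the unpivot is emulated with UNION ALL; table and data as in the answer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE input_data (id INT, col1 INT, col2 INT, col3 INT);
INSERT INTO input_data VALUES (81, 101, 102, 101), (82, 101, 103, 104);
""")
# The duplicate test is the same as in the Oracle query: a row has a
# within-row duplicate iff count(distinct val) < count(val).
rows = con.execute("""
SELECT id,
       CASE WHEN COUNT(DISTINCT val) = COUNT(val) THEN 'NO' ELSE 'YES' END
FROM (SELECT id, col1 AS val FROM input_data
      UNION ALL SELECT id, col2 FROM input_data
      UNION ALL SELECT id, col3 FROM input_data)
GROUP BY id
ORDER BY id
""").fetchall()
print(rows)  # [(81, 'YES'), (82, 'NO')]
```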

You can do this by unpivoting and then counting the distinct values per id and checking if it equals the number of rows for that id. Equal means there are no duplicates. Then left join this result to the original table to calculate the duplicate column.
SELECT t.*,
CASE WHEN x.id IS NOT NULL THEN 'Yes' ELSE 'No' END AS duplicate
FROM t
LEFT JOIN
(SELECT id
FROM
(SELECT *
FROM t
unpivot (val FOR col IN (col1,col2,col3)) u
) t
GROUP BY id
HAVING count(*)<>count(DISTINCT val)
) x ON x.id=t.id

The best way† is to avoid storing repeating groups of columns. If you have multiple columns that essentially store comparable data (i.e. a multi-valued attribute), move the data to a dependent table, and use one column.
CREATE TABLE child (
ref_id INT,
col INT
);
INSERT INTO child VALUES
(81, 101), (81, 102), (81, 101),
(82, 101), (82, 103), (82, 104);
Then it's easier to find cases where a value occurs more than once:
SELECT ref_id, col, COUNT(*)
FROM child
GROUP BY ref_id, col
HAVING COUNT(*) > 1;
If you can't change the structure of the table, you could simulate it using UNIONs:
SELECT id, col, COUNT(*)
FROM (
SELECT id, col1 AS col FROM mytable
UNION ALL SELECT id, col2 FROM mytable
UNION ALL SELECT id, col3 FROM mytable
... for more columns ...
) t
GROUP BY id, col
HAVING COUNT(*) > 1;
† Best for the query you are trying to run. A denormalized storage strategy might be better for some other types of queries.
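A quick way to verify the child-table query outside Oracle - a sketch using Python's sqlite3 with the same data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE child (ref_id INT, col INT);
INSERT INTO child VALUES
  (81, 101), (81, 102), (81, 101),
  (82, 101), (82, 103), (82, 104);
""")
# With values stored one per row, duplicates become a plain GROUP BY question.
dupes = con.execute("""
SELECT ref_id, col, COUNT(*) FROM child
GROUP BY ref_id, col
HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [(81, 101, 2)]
```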

SELECT ID, col1, col2,
NVL2(NULLIF(col1, col2), 'Not duplicate', 'Duplicate')
FROM table;
If you want to compare more than 2 columns, you can implement the same logic with COALESCE.
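SQLite lacks NVL2, so a sketch of the same NULLIF trick has to spell the null test out with CASE (sample data assumed, modeled on the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# NULLIF(col1, col2) is NULL exactly when the two values match (or col1 is
# NULL), which is the condition NVL2 keys on in the Oracle version.
rows = con.execute("""
SELECT id,
       CASE WHEN NULLIF(col1, col2) IS NULL
            THEN 'Duplicate' ELSE 'Not duplicate' END
FROM (SELECT 81 AS id, 101 AS col1, 101 AS col2
      UNION ALL SELECT 82, 101, 103)
ORDER BY id
""").fetchall()
print(rows)  # [(81, 'Duplicate'), (82, 'Not duplicate')]
```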

I think you want fresh data that does not contain any duplicate values inside the table. If that is right, then use a SELECT DISTINCT statement like:
SELECT DISTINCT * FROM TABLE_NAME
It will contain duplicate-free data.
Note: it is also applicable to a particular column, like:
SELECT DISTINCT col1 FROM TABLE_NAME

Related

Merge into (in SQL), but ignore the duplicates

I try to merge two tables in snowflake with:
On CONCAT(tab1.column1, tab1.column2) = CONCAT(tab2.column1, tab2.column2)
The problem here is that there are duplicates, that is, rows where column1 and column2 are identical and the only difference is the timestamp column. Therefore I would like to have two options: either ignore the duplicates and take only one row (the one with the biggest timestamp), or distinguish again based on the timestamp. The second would be nicer.
But I have no clue how to do it.
Example:
Table1:
Col1 Col2 Col3 Timestamp
24 10 3 05.05.2022
34 19 2 04.05.2022
24 10 4 06.05.2022
Table2:
Col1 Col2 Col3
24 10 Null
34 19 Null
What I want to do:
MERGE INTO table1 AS dest USING
(SELECT * FROM table2) AS src
ON CONCAT(dest.col1, dest.col2) = CONCAT(src.col1, src.col2)
WHEN MATCHED THEN UPDATE
SET dest.col3 = src.col3
It feels like you want to update from TABLE1 to TABLE2, not the other way around, because as your example stands there are no duplicates in TABLE2.
It also feels like you want to use two equi-joins on col1 AND col2, not concatenate them together.
Given how I see your data, and the words you used, I think you should do this:
create or replace table table1(Col1 number, Col2 number, Col3 number, timestamp date);
insert into table1 values
(24, 10, 3, '2022-05-05'::date),
(34, 19, 2, '2022-05-04'::date),
(24, 10, 4, '2022-05-06'::date);
create or replace table table2(Col1 number, Col2 number, Col3 number);
insert into table2 values
(24, 10 ,Null),
(34, 19 ,Null);
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON d.col1 = s.col1 AND d.col2 = s.col2
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
which runs fine:
number of rows updated
2
select * from table2;
shows it has been updated:
COL1 COL2 COL3
24   10   4
34   19   2
But the JOIN can also be written your way, as you have used it, if that is correct for your application, albeit it feels very wrong to me.
MERGE INTO table2 AS d
USING (
select *
from table1
qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
ON concat(d.col1, d.col2) = concat(s.col1, s.col2)
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;
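The dedupe-then-update logic can be checked outside Snowflake. A sketch using Python's sqlite3 (SQLite has no MERGE or QUALIFY, so ROW_NUMBER in a subquery plus a correlated UPDATE stands in for them; data as in the answer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (col1 INT, col2 INT, col3 INT, ts TEXT);
INSERT INTO table1 VALUES
  (24, 10, 3, '2022-05-05'),
  (34, 19, 2, '2022-05-04'),
  (24, 10, 4, '2022-05-06');
CREATE TABLE table2 (col1 INT, col2 INT, col3 INT);
INSERT INTO table2 VALUES (24, 10, NULL), (34, 19, NULL);
""")
# Keep only the newest row per (col1, col2) of table1, then copy its col3
# into the matching table2 row.
con.execute("""
UPDATE table2 SET col3 = (
  SELECT s.col3 FROM (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1, col2
                              ORDER BY ts DESC) AS rn
    FROM table1) s
  WHERE s.rn = 1 AND s.col1 = table2.col1 AND s.col2 = table2.col2)
""")
result = con.execute("SELECT * FROM table2 ORDER BY col1").fetchall()
print(result)  # [(24, 10, 4), (34, 19, 2)]
```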
This is it:
WITH CTE AS
(
SELECT *,
RANK() OVER (PARTITION BY col1,col2
ORDER BY Timestamp desc) AS rn
FROM table1
)
UPDATE CTE
SET col3 = (select col3 from table2 where CONCAT(table2.col1,table2.col2) = CONCAT(CTE.col1, CTE.col2))
where CTE.rn =1;

Select (show) only different columns from almost similar rows

I have a table with many (50+) columns. In order to make decisions I analyze any varying data.
Actually my query:
SELECT maincol, count(maincol) FROM table where (conditions) group by maincol having count(maincol) > 1
then:
SELECT * FROM table where (conditions) and maincol = (previous result)
The previous query displays all rows, and I have to search them one by one:
col1, col2, col3, col4, col5, col6, manycolumns..., colN
5 7 1 13 341 9 123
5 7 2 13 341 5 123
I want to get:
col3, col6
1 9
2 5
because it's difficult searching manually column by column.
- N columns could be different
- I don't have access to credentials, then I can't use a programing language to manage results.
- Working on DB2
This will be a little tedious but worth it. This assumes that col1 through coln are all of the same type. If not, cast each to character in the select clause.
The result set will identify the maincol values that occur more than once that also have one or more columns with differing values. The columns that differ will be named.
Select maincol, colname, count(distinct colvalue)
From (
Select maincol, 'col1' as colname, col1 as colvalue
from table
Union
Select maincol, 'col2' as colname, col2 as colvalue
from table
Union
Select maincol, 'col3' as colname, col3 as colvalue
from table
-- repeat this pattern for the remaining columns
)
Group by maincol, colname
Having count(distinct colvalue) > 1
You could even join the result set from above with the original table to show the entire row including the name of the columns that differ:
Select b.colname, a.*
From table a, (
-- include the entire query from above
) b
Where a.maincol = b.maincol
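The same "differs" test - more than one distinct value per column within a group - is easy to state outside SQL. A sketch in Python with assumed sample rows modeled on the question:

```python
# Assumed sample data (maincol plus a few columns), not from the answer.
rows = [
    {"maincol": 5, "col1": 7, "col2": 1, "col3": 13, "col4": 341, "col5": 9},
    {"maincol": 5, "col1": 7, "col2": 2, "col3": 13, "col4": 341, "col5": 5},
]
cols = [c for c in rows[0] if c != "maincol"]
# A column "differs" when it holds more than one distinct value in the
# group - the same test the HAVING clause performs.
differing = [c for c in cols if len({r[c] for r in rows}) > 1]
print(differing)  # ['col2', 'col5']
```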

Use the value from a field in a table as a select statement

I am trying to figure out how to use the values from a table as a select statement for example
table1 contains:
column- cl1
Value - numb
table2:
column - numb
values 1,2,3,4
I want to select cl1 (value: numb) and then use it to run a statement:
select numb from table2
So far I have used:
select (select cl1 from table1) from table2;
This returns numb 4 times, but I want the actual values. The output that I expect is:
1,2,3,4
I want the query to select from table1, which will return the field name, and then use that field name (numb) as part of the select statement, so I expect the end SQL to look like:
select numb from table2;
However, numb will be whatever is in table1.
You CAN do it, but you need to expand out your table to include the list of possible columns that you are selecting, with one row per column per source row. IF this is a large list of possibilities or a large data set... well, it ain't gonna be pretty.
For example:
With thedata as (
select 1 row_id, 11 col1, 12 col2, 13 col3 from dual union all
select 2 row_id, 21 col1, 22 col2, 23 col3 from dual union all
select 3 row_id, 31 col1, 32 col2, 33 col3 from dual union all
select 4 row_id, 41 col1, 42 col2, 43 col3 from dual )
, col_list as (
select 1 col_id, 'col1' col from dual union all
select 2 col_id, 'col2' col from dual union all
select 3 col_id, 'col3' col from dual )
select row_id, coldata
FROM (
-- here's where I have to multiply the source data, generating one row for each possible column, and hard-coding that column to join to
SELECT row_id, 'col1' as col, col1 as coldata from thedata
union all
SELECT row_id, 'col2' as col, col2 as coldata from thedata
union all
SELECT row_id, 'col3' as col, col3 as coldata from thedata
) expanded_Data
JOIN col_list
on col_list.col = expanded_data.col
where col_id = :your_id;
Set the id to 2 and get:
ROW_ID COLDATA
1 12
2 22
3 32
4 42
So yes it can be done, but not truly dynamically as you need to be fully aware before-hand and hard-code the possible column name values that you are pulling from your table. If you need a truly dynamic select that may pick any column, or from any table, then you need to build your query dynamically and EXECUTE IMMEDIATE.
Edit - Add this caveat:
I should add also that this only works if all of the possible columns grabbed are of the same datatype, or you will need to cast them all to a common data type.
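The truly dynamic route mentioned above (EXECUTE IMMEDIATE in Oracle) amounts to reading the column name first and then building the query text. A sketch of that two-step approach using Python's sqlite3 with the question's schema; the validation step before interpolating the name is an addition, not part of the answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (cl1 TEXT);
INSERT INTO table1 VALUES ('numb');
CREATE TABLE table2 (numb INT);
INSERT INTO table2 VALUES (1), (2), (3), (4);
""")
# Step 1: read the column name stored in table1.
(colname,) = con.execute("SELECT cl1 FROM table1").fetchone()
# Step 2: validate it against table2's real columns before interpolating
# (never splice untrusted text into SQL), then build and run the query.
valid = {row[1] for row in con.execute("PRAGMA table_info(table2)")}
assert colname in valid
vals = [v for (v,) in con.execute(f"SELECT {colname} FROM table2 ORDER BY 1")]
print(vals)  # [1, 2, 3, 4]
```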
You can use a variable to store the value of numb, then reuse it in your SELECT statement, like this:
DECLARE @numb int
SET @numb = (SELECT cl1 FROM table1)
SELECT * FROM stat1 s WHERE s.numb = @numb

SQL script for retrieving 5 unique values in a table ( google big query )

I am looking for a query where I can get unique values (5) in a table. The table consists of 100+ columns. Is there any way I can get unique values?
I am using google big query and tried this option
select col1 col2 ... coln
from tablename
where col1 is not null and col2 is not null
group by col1,col2... coln
order by col1, col2... coln
limit 5
But the problem is that it gives zero records if all the columns are null.
I think you might be able to do this in Google bigquery, assuming that the types for the columns are compatible:
select colname, colval
from (select 'col1' as colname, col1 as colvalue
from t
where col1 is not null
group by col1
limit 5
),
(select 'col2' as colname, col2 as colvalue
from t
where col2 is not null
group by col2
limit 5
),
. . .
For those not familiar with the syntax, a comma in the from clause means union all, not cross join, in this dialect. Why did they have to change this?
Try this one, I hope it works:
;With CTE as (
select * ,ROW_NUMBER () over (partition by isnull(col1,''),isnull(col2,'')... isnull(coln,'') order by isnull(col1,'')) row_id
from tablename
) select * from CTE where row_id =1
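The ROW_NUMBER dedupe above behaves like SELECT DISTINCT. A sketch using Python's sqlite3 (IFNULL stands in for isnull; assumed two-column data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (col1 INT, col2 INT);
INSERT INTO t VALUES (1, 2), (1, 2), (3, NULL), (3, NULL), (4, 5);
""")
# Partition by every column (NULLs coalesced, as the isnull calls do) and
# keep row 1 of each partition: one representative per distinct row.
rows = con.execute("""
SELECT col1, col2 FROM (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY IFNULL(col1, ''), IFNULL(col2, '')) AS row_id
  FROM t)
WHERE row_id = 1
ORDER BY col1
""").fetchall()
print(rows)  # [(1, 2), (3, None), (4, 5)]
```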

SQL move data from rows to cols

Sorry for the bad title - I simply do not know what to call the thing I want to do.
Here it goes:
In MS SQL Server 2008
I have a temp table with 4000+ rows created with the WITH statement looking like this:
ID (varchar) DATE (int)
AB1135000097 | 20151221
AB1135000097 | 20160119
AB1135000097 | 20160219
AB1135001989 | 20120223
AB1135001989 | 20120323
AB1135001989 | 20120423
.
.
.
I want to pair the data in date-ranges based on DATE.
AB1135000097 | 20151221 | 20160119
AB1135000097 | 20160119 | 20160219
AB1135001989 | 20120223 | 20120323
AB1135001989 | 20120323 | 20120423
Does this action have a name ? (I will add tags to the post when I know what I'm asking for)
Assumed schema
I am assuming that your table is like:
CREATE TABLE "TABLE"
(
tag CHAR(1) NOT NULL,
value INTEGER NOT NULL,
PRIMARY KEY(tag, value)
);
I really shouldn't have to guess the schema though.
Possible answer
Superficially, you might be after:
SELECT t1.tag, t1.value, t2.value
FROM "TABLE" AS t1
JOIN "TABLE" AS t2
ON t1.tag = t2.tag AND t2.value = t1.value + 1
ORDER BY t1.tag, t1.value;
This joins the table with itself, combining rows where the tag column values (A, B, ...) are the same, and where the value column in one row is one more than the value column in the other.
On the other hand, if you add a row ('A', 5) to the table and expect it to appear in the output as part of a row ('A', 3, 5), then the query is much harder to write without using OLAP features.
If you are using an Oracle database, then you can refer to the following query to solve this question:
with t as
(
SELECT 'A' Col1, 1 Col2
FROM Dual
UNION ALL
SELECT 'A' Col1, 2 Col2
FROM Dual
UNION ALL
SELECT 'A' Col1, 3 Col2
FROM Dual
UNION ALL
SELECT 'B' Col1, 4 Col2
FROM Dual
UNION ALL
SELECT 'B' Col1, 5 Col2
FROM Dual
UNION ALL
SELECT 'B' Col1, 6 Col2 FROM Dual
)
SELECT *
FROM (SELECT Col1,
Col2,
Lead(Col1) Over(ORDER BY Col1, Col2) Col3,
Lead(Col2) Over(ORDER BY Col1, Col2) Col4
FROM t --(your table name)
ORDER BY Col1, Col2)
WHERE Col1 = Col3
As I don't have your table name and table structure, I have created one temp table in the query itself.
You need to change FROM t to FROM with your table name. Please also change the col1 and col2 column names accordingly.
I found a solution to my problem. Inspired by Jonathan Leffler's solution. Thanks a lot!
It is based on adding row-numbers to the table ordered by ID and DATE, and then self-join with ROW+1 to get the next date as a second date column.
with
SCHEDULE as
( -- remove duplicates and NULL entries
select DISTINCT ID, DATE from TABLE1
where DATE IS NOT NULL
),
SCHEDULE_WITH_ROW as
(
select * from (
select DISTINCT
ROW_NUMBER() OVER (ORDER BY ID, DATE) AS ROW,
ID, DATE
from SCHEDULE) AS SCHED
)
select
S1.ID
, S1.DATE
, S2.DATE
from SCHEDULE_WITH_ROW S1
join SCHEDULE_WITH_ROW S2 on S2.ID = S1.ID and S1.ROW + 1 = S2.ROW
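The row-number-plus-self-join pairing also runs unchanged on other engines. A sketch using Python's sqlite3 with the question's data (column DATE renamed dt here, since reserved words vary by engine):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id TEXT, dt INT);
INSERT INTO table1 VALUES
  ('AB1135000097', 20151221), ('AB1135000097', 20160119),
  ('AB1135000097', 20160219), ('AB1135001989', 20120223),
  ('AB1135001989', 20120323), ('AB1135001989', 20120423);
""")
# Number the de-duplicated rows by (id, dt), then self-join on rn + 1 so
# each date is paired with the next date for the same id.
pairs = con.execute("""
WITH s AS (
  SELECT id, dt, ROW_NUMBER() OVER (ORDER BY id, dt) AS rn
  FROM (SELECT DISTINCT id, dt FROM table1 WHERE dt IS NOT NULL))
SELECT s1.id, s1.dt, s2.dt
FROM s s1
JOIN s s2 ON s2.id = s1.id AND s1.rn + 1 = s2.rn
ORDER BY s1.id, s1.dt
""").fetchall()
for p in pairs:
    print(p)
```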