How to intersect two tables without losing the duplicate values (Oracle, SQL)

How to intersect two tables without losing the duplicate values in Oracle?
TAB1:
A
A
B
C
TAB2:
A
A
B
D
Output:
A
A
B

A subquery will filter the rows:
select *
from tab1
where col in (select col from tab2)

If I understand correctly:
select a.*, row_number() over (partition by col1 order by col1)
from a
intersect
select b.*, row_number() over (partition by col1 order by col1)
from b;
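This adds a sequential number to each row within each group of equal values. INTERSECT then matches rows only up to the smaller of the two duplicate counts, so a value that appears twice in both tables is returned twice.
This uses partition by col1, but col1 is just an example. You may need to include all columns in the partition by.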
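A minimal sketch of the row_number() plus INTERSECT idea, run against the sample data from the question. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for Oracle (an assumption made only so the example is runnable; window functions need SQLite 3.25 or later), and the column name col is taken from the subquery answer above.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (col TEXT);
    CREATE TABLE tab2 (col TEXT);
    INSERT INTO tab1 VALUES ('A'), ('A'), ('B'), ('C');
    INSERT INTO tab2 VALUES ('A'), ('A'), ('B'), ('D');
""")

# Number the duplicates within each value, then intersect on (value, number).
query = """
    SELECT col, ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS rn
    FROM tab1
    INTERSECT
    SELECT col, ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS rn
    FROM tab2
"""
rows = sorted(conn.execute(query).fetchall())

# Drop the helper number to get the multiset intersection.
values = [col for col, _ in rows]
print(values)
```

The helper number rn is what keeps the second A alive: without it, INTERSECT would collapse the duplicates into a single row.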

Related

SQL query to remove duplicates from a table with 139 columns and load all columns to another table

I need to remove the duplicates from a table with 139 columns, based on 2 columns, and load the unique rows with all 139 columns into another table.
e.g.:
col1 col2 col3 .....col139
a b .............
b c .............
a b .............
Output:
col1 col2 col3 .....col139
a b .............
b c .............
I need a SQL query for DB2.
If the "other table" does not exist yet, you can create it like this:
CREATE TABLE othertable LIKE originaltable
And then insert the requested rows with this statement:
INSERT INTO othertable
SELECT col1,...,coln
FROM (SELECT
t.*,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
FROM t) t
WHERE num = 1
There are numerous tools that generate queries and column lists, so if you do not want to write the list by hand you can generate it with one of them, or use another SQL statement to select the column names from the Db2 catalog view (syscat.columns).
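The insert-with-generated-column-list idea can be sketched as follows. This is an illustration only: it runs on SQLite through Python's sqlite3 module rather than Db2, a three-column table t stands in for the 139-column one, PRAGMA table_info plays the role of the catalog lookup, and CREATE TABLE ... AS SELECT replaces CREATE TABLE ... LIKE, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO t VALUES ('a', 'b', 'x'), ('b', 'c', 'y'), ('a', 'b', 'z');
    -- SQLite stand-in for CREATE TABLE ... LIKE: same columns, no rows.
    CREATE TABLE othertable AS SELECT * FROM t WHERE 0;
""")

# Build the column list from table metadata instead of typing it by hand.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(t)")]
column_list = ", ".join(columns)

# Keep one row per (col1, col2) and copy only the real columns, not num.
conn.execute(f"""
    INSERT INTO othertable ({column_list})
    SELECT {column_list}
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
          FROM t) AS d
    WHERE num = 1
""")
deduped_keys = sorted(conn.execute("SELECT col1, col2 FROM othertable").fetchall())
print(deduped_keys)
```

Because the ORDER BY repeats a partition column, which of the duplicate rows survives is arbitrary; add a further column to the ORDER BY if a specific row should win.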
You might be better just deleting the duplicates in place. This can be done without specifying a column list.
DELETE FROM
( SELECT
ROW_NUMBER() OVER (PARTITION BY col1, col2) AS DUP
FROM t
)
WHERE
DUP > 1
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by a, b order by a) as seqnum
from t
) t
where seqnum = 1;
If you don't want seqnum in the result set, though, you need to list out all the columns.
To find duplicate values in col1 or any column, you can run the following query:
SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1;
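And if you want to delete those rows using the value of col1, you can run the following query. Note that it deletes every row whose col1 value is duplicated, including the first occurrence, so no copy is kept: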
DELETE FROM your_table WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1);
You can use the same approach to delete duplicate rows from the table using col2 values.
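A small sketch of what the IN (...) delete does, compared with a variant that keeps one row per value. It runs on SQLite through Python's sqlite3 module; the sample rows are invented, and the second variant relies on SQLite's implicit rowid column, so it is an assumption that does not carry over to other databases as written.

```python
import sqlite3

def make_table():
    """Create a fresh in-memory table with one duplicated col1 value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE your_table (col1 TEXT, col2 TEXT);
        INSERT INTO your_table VALUES ('a', 'p'), ('a', 'q'), ('b', 'r');
    """)
    return conn

# Variant 1: the IN (...) delete removes every row whose col1 is duplicated.
conn = make_table()
conn.execute("""
    DELETE FROM your_table
    WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1)
""")
after_in_delete = conn.execute("SELECT col1, col2 FROM your_table").fetchall()

# Variant 2 (SQLite-specific): keep the row with the lowest rowid per col1.
conn = make_table()
conn.execute("""
    DELETE FROM your_table
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM your_table GROUP BY col1)
""")
after_keep_one = sorted(conn.execute("SELECT col1, col2 FROM your_table").fetchall())

print(after_in_delete, after_keep_one)
```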

SQL DISTINCT based on a single column, but keep all columns as output

--mytable
col1 col2 col3
1 A red
2 A green
3 B purple
4 C blue
Let's call the table above mytable. I want to select only distinct values from col2:
SELECT DISTINCT
col2
FROM
mytable
When I do this the output looks like this, which is expected:
col2
A
B
C
But how do I perform the same type of query, yet keep all columns? The output would look like below. In essence I'm going through mytable looking at col2, and when there are multiple occurrences of a col2 value I only keep the first row.
col1 col2 col3
1 A red
3 B purple
4 C blue
Do SQL functions (eg DISTINCT) have arguments I could set? I could imagine it to be something like KeepAllColumns = TRUE for this DISTINCT function? Or do I need to perform JOINs to get what I want?
You can use window functions, particularly row_number():
select t.*
from (select t.*, row_number() over (partition by col2 order by col2) as seqnum
from mytable t
) t
where seqnum = 1;
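row_number() enumerates the rows within each partition, starting with 1. The order by inside the over clause controls which row gets number 1, so you can choose the oldest, earliest, biggest, smallest . . . Ordering by the partition column itself (col2) leaves that choice arbitrary; order by col1 to keep the row with the lowest col1.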
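To illustrate how the ORDER BY picks the surviving row, here is a minimal sketch on the sample mytable data. It uses Python's sqlite3 module with an in-memory SQLite database (an assumption for runnability, SQLite 3.25 or later), and the helper first_per_group is a hypothetical name introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO mytable VALUES
        (1, 'A', 'red'), (2, 'A', 'green'), (3, 'B', 'purple'), (4, 'C', 'blue');
""")

def first_per_group(order_by):
    """Return one row per col2 value; order_by decides which row gets seqnum 1.

    order_by is interpolated into the SQL text, so pass trusted expressions only.
    """
    query = f"""
        SELECT col1, col2, col3
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY {order_by}) AS seqnum
              FROM mytable t) AS d
        WHERE seqnum = 1
        ORDER BY col1
    """
    return conn.execute(query).fetchall()

lowest = first_per_group("col1 ASC")    # keeps (1, 'A', 'red') for group A
highest = first_per_group("col1 DESC")  # keeps (2, 'A', 'green') for group A
print(lowest)
print(highest)
```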
You can use the QUALIFY clause in Teradata:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col2) = 1 -- Get 1st row per group
If you want to change the ordering for how to determine which col2 row to get, just change the expression in the ORDER BY.
With NOT EXISTS:
select m.* from mytable m
where not exists (
select 1 from mytable
where col2 = m.col2 and col1 < m.col1
)
This code will return the rows for which there is not another row with the same col2 and a smaller value in col1.
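The NOT EXISTS variant can be checked the same way. This is a sketch on an in-memory SQLite database via Python's sqlite3 module (a stand-in, not the asker's database); the inner alias o is added here only to make the column references explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO mytable VALUES
        (1, 'A', 'red'), (2, 'A', 'green'), (3, 'B', 'purple'), (4, 'C', 'blue');
""")

# Keep a row only if no other row shares its col2 and has a smaller col1.
query = """
    SELECT m.col1, m.col2, m.col3
    FROM mytable m
    WHERE NOT EXISTS (
        SELECT 1 FROM mytable o
        WHERE o.col2 = m.col2 AND o.col1 < m.col1
    )
    ORDER BY m.col1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Unlike row_number(), this form returns every tied row if two rows in a group share the smallest col1, so it fits best when col1 is unique.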

In an SQL join operation, how to get the rows from the left join and only the aggregate of two columns from the right table

I am trying to SUM the quantity in the tpos table and count the distinct number of stores for each item that is in tpos.
For each row in inv_dtl there could be multiple rows in the tpos table. I would like to put a script together that gives me all the rows from the inv_dtl table and adds two aggregate columns, sum(tpos.quantity) and count(distinct tpos.store_number), over the rows that match the join condition.
Here is what I have so far. The aggregates are working, but my output contains as many rows as match in tpos.
For example, 1 row in inv_dtl could have 100 rows in tpos. My output should contain 1 row plus the two aggregate columns, but my current script generates 100 rows.
WITH FT1 As
(
SELECT * FROM inv_dtl WHERE inv_no IN (16084, 23456, 14789)
),
FT2 As
(
SELECT
FT1.*,
SUM(tpos.quantity) OVER (partition by tpos.item_id) As pos_qty,
DENSE_RANK() OVER (partition by tpos.store_number ORDER BY tpos.item_id ASC) +
DENSE_RANK() OVER (partition by tpos.store_number ORDER BY tpos.item_id DESC)
As unique_store_cnt
FROM FT1
LEFT JOIN tpos
ON tpos.item_id = FT1.ITEM_ID
And tpos.movement_date Between FT1.SDATE And FT1.EDATE
And tpos.store_number != 'CMPNY'
)
SELECT * FROM FT2 ORDER BY ITEM_ID
Just use a conventional GROUP BY, which will reduce the number of rows. I have no idea which columns you want from the first table, so I have invented four (col1 to col4) as an example.
WITH
FT1 AS (
SELECT
col1, col2, col3, col4, item_id, sdate, edate -- join columns must be selected too
FROM inv_dtl
WHERE inv_no IN (16084, 23456, 14789)
),
FT3 AS (
SELECT
FT1.col1, FT1.col2, FT1.col3, FT1.col4
, SUM(tpos.quantity) AS pos_qty
, COUNT(DISTINCT tpos.store_number) AS unique_store_cnt
FROM FT1
LEFT JOIN tpos ON tpos.item_id = FT1.item_id
AND tpos.movement_date BETWEEN FT1.sdate AND FT1.edate
AND tpos.store_number != 'CMPNY'
GROUP BY
FT1.col1, FT1.col2, FT1.col3, FT1.col4
)
SELECT
*
FROM FT3
ORDER BY col1, col2, col3, col4
Please note that RANK() and DENSE_RANK() can repeat numbers if data is of "equal rank". To guarantee a unique integer per row, use ROW_NUMBER() instead.
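A runnable sketch of the GROUP BY approach with plain aggregates, SUM and COUNT(DISTINCT ...), over a LEFT JOIN. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in; the table and column names follow the question, but the sample rows and ISO-formatted text dates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inv_dtl (inv_no INTEGER, item_id INTEGER, sdate TEXT, edate TEXT);
    CREATE TABLE tpos (item_id INTEGER, store_number TEXT,
                       movement_date TEXT, quantity INTEGER);
    INSERT INTO inv_dtl VALUES
        (16084, 1, '2020-01-01', '2020-01-31'),
        (23456, 2, '2020-01-01', '2020-01-31');
    INSERT INTO tpos VALUES
        (1, 'S1',    '2020-01-05', 10),
        (1, 'S1',    '2020-01-06', 5),
        (1, 'S2',    '2020-01-07', 7),
        (1, 'CMPNY', '2020-01-08', 100),  -- excluded by the join condition
        (1, 'S3',    '2020-02-01', 50);   -- outside the date range
""")

# One output row per inv_dtl row; the aggregates collapse the matching tpos rows.
query = """
    SELECT d.inv_no, d.item_id,
           SUM(p.quantity) AS pos_qty,
           COUNT(DISTINCT p.store_number) AS unique_store_cnt
    FROM inv_dtl d
    LEFT JOIN tpos p
           ON p.item_id = d.item_id
          AND p.movement_date BETWEEN d.sdate AND d.edate
          AND p.store_number != 'CMPNY'
    WHERE d.inv_no IN (16084, 23456, 14789)
    GROUP BY d.inv_no, d.item_id
    ORDER BY d.item_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

An inv_dtl row with no matching tpos rows still appears, with a NULL sum and a store count of 0, because the filters sit in the ON clause rather than in WHERE.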

How to select all columns for rows where I check if just 1 or 2 columns contain duplicate values

I'm having difficulty with what I figure should be an easy problem. I want to select all the columns in a table for which one particular column has duplicate values.
I've been trying to use aggregate functions, but that's constraining me as I want to just match on one column and display all values. Using aggregates seems to require that I 'group by' all columns I'm going to want to display.
If I understood you correctly, this should do:
SELECT *
FROM YourTable A
WHERE EXISTS(SELECT 1
FROM YourTable
WHERE Col1 = A.Col1
GROUP BY Col1
HAVING COUNT(*) > 1)
You can join on a derived table where you aggregate and determine "col" values which are duplicated:
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT col
FROM Table1
GROUP BY col
HAVING COUNT(1) > 1
) b ON a.col = b.col
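This query numbers the rows within each colb group, ordered by cola (ascending or descending, as you choose), and returns only the second and later rows of each group. The first occurrence of each duplicated value is therefore not included.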
with cl
as
(
select *, ROW_NUMBER() OVER(partition by colb order by cola ) as rn
from tbl)
select *
from cl
where rn > 1
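The difference between the join on a derived table (all rows that share a duplicated value) and the rn > 1 filter (only the extra copies) can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module as a stand-in; the id column and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER, col TEXT);
    INSERT INTO Table1 VALUES (1, 'x'), (2, 'y'), (3, 'x'), (4, 'z');
""")

# All rows whose col value occurs more than once.
all_duplicates = conn.execute("""
    SELECT a.id, a.col
    FROM Table1 a
    INNER JOIN (SELECT col FROM Table1 GROUP BY col HAVING COUNT(*) > 1) b
            ON a.col = b.col
    ORDER BY a.id
""").fetchall()

# Only the second and later rows of each group (first occurrence excluded).
extra_copies = conn.execute("""
    WITH cl AS (
        SELECT id, col,
               ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS rn
        FROM Table1
    )
    SELECT id, col FROM cl WHERE rn > 1 ORDER BY id
""").fetchall()

print(all_duplicates, extra_copies)
```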

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. To do this I have to perform a max() and a GROUP BY, which prevents me from retrieving the other columns from the row that held the max (because they are not contained in the GROUP BY or in an aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there were an extra column, Col3, and of the rows returned by the above query I only wanted the one with, say, the minimum Col3 value, how would I do this?
If you are using SQL Server 2005+, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
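As I understand it, it could be something like this:
SELECT tab2.* FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 and tab1.Col2 = tab2.Col2
and tab2.Col3 = (select min(Col3) from Table t where t.Col1 = tab2.Col1 and t.Col2 = tab2.Col2)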
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to the GROUP BY in your subquery, and the ORDER BY determines the order in which the rows are numbered. In this case Col2 DESC starts with the highest value of Col2 (equivalent to your MAX).
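For the tie-break on Col3 that the question asks about, one option is to add Col3 as a second sort key in the OVER clause. The sketch below is an assumption-laden illustration: it runs on SQLite via Python's sqlite3 module rather than SQL Server, uses a hypothetical table name tbl in place of the placeholder Table, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    INSERT INTO tbl VALUES
        ('a', 5, 3), ('a', 5, 1), ('a', 5, 2),  -- three rows tie on max Col2
        ('a', 4, 0),
        ('b', 9, 7);
""")

# Highest Col2 first; among ties, the smallest Col3 gets RowNumber 1.
query = """
    WITH T AS (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (PARTITION BY Col1
                                  ORDER BY Col2 DESC, Col3 ASC) AS RowNumber
        FROM tbl
    )
    SELECT Col1, Col2, Col3
    FROM T
    WHERE RowNumber = 1
    ORDER BY Col1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Exactly one row per Col1 comes back even when several rows share the maximum Col2, which is the case the join-back approach in the question struggles with.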