Get the values from the Max date after union - sql

This query works, but I'd like to see if there's a more optimized or shorter way to get the same result. I'd like to retrieve all the data with the maximum date from the union of 3 tables, TABLE_0501, TABLE_0502 and TABLE_0503. Whichever table has the latest BILL_DATE, I want to retrieve the rows for that BILL_DATE. There will always be more than one row for the same PID and BILL_DATE.
SELECT T1.PID, T1.BILL_DATE, T2.COL3, T2.COL4, T2.COL5
FROM
(
SELECT T.PID, MAX(T.BILL_DATE) AS BILL_DATE
FROM
(
SELECT DISTINCT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0501
GROUP BY 1,2,3,4,5
UNION ALL
SELECT DISTINCT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0502
GROUP BY 1,2,3,4,5
UNION ALL
SELECT DISTINCT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0503
GROUP BY 1,2,3,4,5
) T
GROUP BY 1
) T1
INNER JOIN
( SELECT DISTINCT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0501
GROUP BY 1,2,3,4,5
UNION ALL
SELECT DISTINCT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0502
GROUP BY 1,2,3,4,5
UNION ALL
SELECT DISTINCT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0503
GROUP BY 1,2,3,4,5
) T2
ON T1.PID = T2.PID
AND T1.BILL_DATE = T2.BILL_DATE

Yes, a QUALIFY clause (supported by Teradata, Snowflake and BigQuery, among others) comes in handy here:
SELECT * FROM
(
SELECT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0501
GROUP BY 1,2,3,4,5
UNION ALL
SELECT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0502
GROUP BY 1,2,3,4,5
UNION ALL
SELECT PID, BILL_DATE, COL3, COL4, COL5
FROM TABLE_0503
GROUP BY 1,2,3,4,5
) T
QUALIFY RANK() OVER (PARTITION BY PID ORDER BY BILL_DATE DESC) = 1;
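Within each PID group, RANK() numbers the rows from the latest BILL_DATE down to the earliest. QUALIFY ... = 1 keeps the rows ranked first, i.e. every row with the latest BILL_DATE for that PID. RANK() gives tied rows the same rank, so all rows for that date are returned.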
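To check the ranking logic outside the original database, here is a hedged sketch (not the answerer's code). It runs in an in-memory SQLite database through Python's sqlite3 module, which assumes SQLite 3.25 or later for window functions. SQLite has no QUALIFY clause, so the RANK() filter moves into an outer WHERE. The table names follow the question; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("TABLE_0501", "TABLE_0502", "TABLE_0503"):
    conn.execute(f"CREATE TABLE {name} (PID INT, BILL_DATE TEXT, COL3 TEXT, COL4 TEXT, COL5 TEXT)")

# Made-up sample rows: PID 1 has two rows on its latest date, in TABLE_0502.
conn.executemany("INSERT INTO TABLE_0501 VALUES (?, ?, ?, ?, ?)",
                 [(1, "2023-01-31", "a", "b", "c"), (2, "2023-01-31", "x", "y", "z")])
conn.executemany("INSERT INTO TABLE_0502 VALUES (?, ?, ?, ?, ?)",
                 [(1, "2023-02-28", "d", "e", "f"), (1, "2023-02-28", "g", "h", "i")])
conn.executemany("INSERT INTO TABLE_0503 VALUES (?, ?, ?, ?, ?)",
                 [(2, "2023-01-15", "p", "q", "r")])

# The RANK() that QUALIFY would filter on is computed in a subquery,
# and the outer WHERE keeps rank 1 (all rows tied on the latest date).
query = """
SELECT PID, BILL_DATE, COL3, COL4, COL5
FROM (
    SELECT T.*,
           RANK() OVER (PARTITION BY PID ORDER BY BILL_DATE DESC) AS rnk
    FROM (
        SELECT PID, BILL_DATE, COL3, COL4, COL5 FROM TABLE_0501
        GROUP BY PID, BILL_DATE, COL3, COL4, COL5
        UNION ALL
        SELECT PID, BILL_DATE, COL3, COL4, COL5 FROM TABLE_0502
        GROUP BY PID, BILL_DATE, COL3, COL4, COL5
        UNION ALL
        SELECT PID, BILL_DATE, COL3, COL4, COL5 FROM TABLE_0503
        GROUP BY PID, BILL_DATE, COL3, COL4, COL5
    ) T
)
WHERE rnk = 1
ORDER BY PID, COL3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

PID 1 keeps both 2023-02-28 rows, and PID 2 keeps its 2023-01-31 row. Because tied rows share rank 1, RANK() fits here better than ROW_NUMBER().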

Related

Delete duplicate rows where some columns equal zero

I have a SQL Server table that has col1, col2, col3, col4, col5, col6, col7, col8, col9, col10.
I want to delete duplicates based on col1, col2, col3.
The rows that should be deleted are those where col6=0 and col7=0 and col8=0.
We can use a deletable CTE here:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY col1, col2, col3) cnt
FROM yourTable
)
DELETE
FROM cte
WHERE cnt > 1 AND col6 = 0 AND col7 = 0 AND col8 = 0;
The CTE above identifies "duplicates" according to your definition, which is 2 or more records having the same values for col1, col2, and col3. Then we delete duplicates meeting the requirements on the other 3 columns.
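SQLite cannot DELETE through a CTE the way SQL Server can. This sketch therefore uses the same windowed COUNT(*) to pick the rowids to delete. It runs through Python's sqlite3 module on an in-memory database, assuming SQLite 3.25+ for window functions. yourTable is trimmed to the columns the rule needs, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col1, col2, col3, col6, col7, col8)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 0, 0, 0),  # duplicate group (1,1,1), all zeros -> deleted
        (1, 1, 1, 5, 0, 0),  # same group, col6 non-zero -> kept
        (2, 2, 2, 0, 0, 0),  # not a duplicate -> kept even though all zeros
    ],
)

# COUNT(*) OVER (PARTITION BY ...) marks group sizes; the outer filter
# applies the zero rule, and the matching rowids are deleted.
conn.execute("""
    DELETE FROM yourTable
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid, col6, col7, col8,
                   COUNT(*) OVER (PARTITION BY col1, col2, col3) AS cnt
            FROM yourTable
        )
        WHERE cnt > 1 AND col6 = 0 AND col7 = 0 AND col8 = 0
    )
""")
remaining = conn.execute("SELECT * FROM yourTable ORDER BY col1, col6").fetchall()
print(remaining)
```

This rule deletes every all-zero row in a duplicate group. A group in which every row has zeros in col6, col7 and col8 would therefore disappear entirely.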

SQL Union not including duplicates based on single column?

I'm trying to union two tables but I need to essentially 'prefer' the first table using just one 'id' column.
If an 'id' appears in the second table that already exists in the first, I do not want to include that record.
The query looks like this:
select id, col2, col3
from table(p_package.getData(param))
union
select id, col2, col3
from table1
where col7 = 'pass'
and col8 <> 'A'
and col9 = to_date(Date, 'mm/dd/yyyy')
The p_package.getData(param) is a pipelined function that returns a table. I would like to avoid calling it twice for performance reasons.
You can use the ROW_NUMBER() analytic function to remove the duplicates:
SELECT id, col2, col3
FROM (
SELECT id, col2, col3,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
FROM (
select id, col2, col3, 1 AS priority
from table(p_package.getData(param))
UNION ALL
select id, col2, col3, 2
from table1
where col7 = 'pass'
and col8 <> 'A'
and col9 = to_date(Date, 'mm/dd/yyyy')
)
)
WHERE rn = 1
As a bonus, since ROW_NUMBER() already filters out the duplicates, the inner query can use UNION ALL instead of UNION.
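This sketch runs the priority trick in SQLite through Python's sqlite3 module. The pipelined function is replaced by an ordinary table, func_rows, which is a hypothetical stand-in. The filters on table1 are reduced to col7 = 'pass' for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE func_rows (id, col2, col3)")  # stand-in for the pipelined function
conn.execute("CREATE TABLE table1 (id, col2, col3, col7)")
conn.executemany("INSERT INTO func_rows VALUES (?, ?, ?)",
                 [(1, "f", "first"), (2, "f", "first")])
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                 [(2, "t", "second", "pass"),   # id 2 already in func_rows -> dropped
                  (3, "t", "second", "pass"),   # new id -> kept
                  (4, "t", "second", "fail")])  # filtered out by col7

# priority 1 rows sort first within each id, so rn = 1 prefers func_rows.
rows = conn.execute("""
    SELECT id, col2, col3
    FROM (
        SELECT id, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
        FROM (
            SELECT id, col2, col3, 1 AS priority FROM func_rows
            UNION ALL
            SELECT id, col2, col3, 2 FROM table1 WHERE col7 = 'pass'
        )
    )
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(rows)
```

id 2 appears in both sources, and only the func_rows version survives because it carries priority 1.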
If you can have duplicate id values from the pipelined function and you want to keep those, but not any from table1, then:
SELECT id, col2, col3
FROM (
SELECT id, col2, col3, priority,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
FROM (
select id, col2, col3, 1 AS priority
from table(p_package.getData(param))
UNION ALL
select id, col2, col3, 2
from table1
where col7 = 'pass'
and col8 <> 'A'
and col9 = to_date(Date, 'mm/dd/yyyy')
)
)
WHERE priority = 1
OR rn = 1
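The following sketch runs the duplicate-keeping variant in SQLite through Python's sqlite3 module. A hypothetical table, func_rows, stands in for the pipelined function. The sample rows are invented, and table1's filters are reduced to col7 = 'pass'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE func_rows (id, col2, col3)")  # stand-in for the pipelined function
conn.execute("CREATE TABLE table1 (id, col2, col3, col7)")
conn.executemany("INSERT INTO func_rows VALUES (?, ?, ?)",
                 [(1, "f", "a"), (1, "f", "b")])        # repeated id from the function
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                 [(1, "t", "c", "pass"),                 # id 1 already present -> dropped
                  (2, "t", "d", "pass")])                # new id -> kept

# Every priority 1 row is kept; a priority 2 row survives only if it is
# the first (rn = 1) row for its id, i.e. the function returned no such id.
rows = conn.execute("""
    SELECT id, col2, col3
    FROM (
        SELECT id, col2, col3, priority,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
        FROM (
            SELECT id, col2, col3, 1 AS priority FROM func_rows
            UNION ALL
            SELECT id, col2, col3, 2 FROM table1 WHERE col7 = 'pass'
        )
    )
    WHERE priority = 1 OR rn = 1
    ORDER BY id, col3
""").fetchall()
print(rows)
```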
Assuming you don't want the second half of the union to include any id value that is already included in the first half, you could use a NOT EXISTS clause (note that this references the pipelined function twice):
select id, col2, col3
from table(p_package.getData(param))
union
select id, col2, col3
from table1 t1
where col7 = 'pass' and col8 <> 'A' and col9 = to_date(Date, 'mm/dd/yyyy') and
not exists (select 1 from table(p_package.getData(param)) t2
where t1.id = t2.id);
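This sketch of the NOT EXISTS approach runs in SQLite through Python's sqlite3 module. A hypothetical table, func_rows, stands in for the pipelined function. It appears twice in the query, which is exactly the double call the asker wanted to avoid. The rows are made up, and table1's filters are reduced to col7 = 'pass'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE func_rows (id, col2, col3)")  # stand-in for the pipelined function
conn.execute("CREATE TABLE table1 (id, col2, col3, col7)")
conn.executemany("INSERT INTO func_rows VALUES (?, ?, ?)",
                 [(1, "f", "x"), (2, "f", "x")])
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                 [(2, "t", "y", "pass"),   # id 2 exists in func_rows -> excluded
                  (3, "t", "y", "pass")])  # new id -> included

# The second branch only contributes ids that the first branch lacks.
rows = conn.execute("""
    SELECT id, col2, col3 FROM func_rows
    UNION
    SELECT id, col2, col3
    FROM table1 t1
    WHERE col7 = 'pass'
      AND NOT EXISTS (SELECT 1 FROM func_rows t2 WHERE t1.id = t2.id)
    ORDER BY id
""").fetchall()
print(rows)
```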
The other solutions work, but I opted to use a common table expression, as suggested by xQbert:
with cte as
(select id, col2, col3
from table(p_package.getData(param)))
select * from cte
union
select id, col2, col3
from table1
where col7 = 'pass'
and col8 <> 'A'
and col9 = to_date(Date, 'mm/dd/yyyy')
and id not in (select id from cte)
EDIT: I realized that a CTE does not actually store the data returned by a query; it stores the query itself. So while this works, it does not avoid calling the pipelined function twice.

with clause in union query

I have a WITH clause in a UNION query, like:
with t1 as(...) ---common for both query
select * from t2
union
select * from t3
How do I use the same WITH clause in both queries?
You can reuse a Common Table Expression in more than one branch of the UNION.
For example:
with cte as
(
select col1, col2, col3, col4, col5, col6
from sometable
where col1 = 42
)
select col1, col2, col3
from cte as t1
union all
select col4, col5, col6
from cte as t2
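This minimal check shows one CTE feeding both branches of a UNION ALL. It runs in SQLite through Python's sqlite3 module, and sometable holds made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (col1, col2, col3, col4, col5, col6)")
conn.executemany("INSERT INTO sometable VALUES (?, ?, ?, ?, ?, ?)",
                 [(42, "a", "b", "c", "d", "e"),
                  (7, "x", "y", "z", "w", "v")])  # filtered out by col1 = 42

# The WITH clause is defined once and referenced by both SELECTs.
rows = conn.execute("""
    WITH cte AS (
        SELECT col1, col2, col3, col4, col5, col6
        FROM sometable
        WHERE col1 = 42
    )
    SELECT col1, col2, col3 FROM cte AS t1
    UNION ALL
    SELECT col4, col5, col6 FROM cte AS t2
""").fetchall()
print(rows)
```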
If you need more than one CTE, separate them with commas:
with cte1 as
(
select col1, col2, col3
from sometable
where col1 = 42
group by col1, col2, col3
)
, cte2 as
(
select col4, col5, col6
from sometable
where col4 > col5
group by col4, col5, col6
)
select col1, col2, col3
from cte1 as t1
union all
select col4, col5, col6
from cte2 as t2
In this example, though, the CTEs are mostly an aesthetic choice that puts the more complicated queries at the top of the SQL. It would be more straightforward to union the two queries directly:
select col1, col2, col3
from sometable
where col1 = 42
group by col1, col2, col3
union all
select col4, col5, col6
from sometable
where col4 > col5
group by col4, col5, col6

Pgsql Delete rows with some columns (not all) duplicate

Table: col_pk, col1, col2, col3, col4, col_date_updated
This table has some rows with duplicate column values for col2 and col3.
I want to keep only the rows whose col_date_updated is the latest (max).
E.g.:
col_pk, col1, col2, col3, col4, col_date_updated
1, A, hello, now, 200.00, 2017-12-12 15:09:44.437546
2, B, hello, now, 490.00, 2017-12-12 15:09:42.437065
3, C, hi, now, 300.00, 2017-12-12 15:09:41.436617
4, D, hello, now, 250.00, 2017-12-12 15:09:45.436617
5, E, hi, now, 250.00, 2017-12-12 10:09:41.436617
Expected Result:
col_pk, col1, col2, col3, col4, col_date_updated
3, C, hi, now, 300.00, 2017-12-12 15:09:41.436617
4, D, hello, now, 250.00, 2017-12-12 15:09:45.436617
Check this.
SELECT DISTINCT ON (col2, col3) t.*
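FROM yourtable t
ORDER BY col2, col3, col_date_updated DESC
Apply DISTINCT ON to col2 and col3 because you want them to be unique, and keep the latest row of each group by ordering col_date_updated DESC. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, so col2 and col3 come first in the ORDER BY.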
If you just want a SELECT that returns your expected output, then ROW_NUMBER comes in handy:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col2, col3
ORDER BY col_date_updated DESC) rn
FROM yourTable
)
SELECT col_pk, col1, col2, col3, col4, col_date_updated
FROM cte
WHERE rn = 1;
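If you instead want to delete the other records, attach the same CTE to the DELETE, since a CTE is only visible to the statement it belongs to:
WITH cte AS (SELECT col_pk, ROW_NUMBER() OVER (PARTITION BY col2, col3 ORDER BY col_date_updated DESC) rn FROM yourTable)
DELETE FROM yourTable WHERE col_pk IN (SELECT col_pk FROM cte WHERE rn > 1);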
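This sketch tries the ROW_NUMBER() approach on the question's own sample rows. It uses SQLite (3.25+ assumed for window functions) through Python's sqlite3 module rather than PostgreSQL, and attaches the WITH clause to both the SELECT and the DELETE.

```python
import sqlite3

# Sample rows copied from the question; yourTable is the answer's placeholder name.
ROWS = [
    (1, "A", "hello", "now", 200.00, "2017-12-12 15:09:44.437546"),
    (2, "B", "hello", "now", 490.00, "2017-12-12 15:09:42.437065"),
    (3, "C", "hi", "now", 300.00, "2017-12-12 15:09:41.436617"),
    (4, "D", "hello", "now", 250.00, "2017-12-12 15:09:45.436617"),
    (5, "E", "hi", "now", 250.00, "2017-12-12 10:09:41.436617"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col_pk INTEGER PRIMARY KEY, col1, col2, col3, col4, col_date_updated)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# rn = 1 is the newest row of each (col2, col3) group. The timestamps are
# ISO-formatted text, so they sort correctly as strings.
WITH_CTE = """
    WITH cte AS (
        SELECT col_pk,
               ROW_NUMBER() OVER (PARTITION BY col2, col3
                                  ORDER BY col_date_updated DESC) AS rn
        FROM yourTable
    )
"""
kept = [pk for (pk,) in conn.execute(WITH_CTE + "SELECT col_pk FROM cte WHERE rn = 1 ORDER BY col_pk")]

# Same CTE, now attached to the DELETE so it is in scope there.
conn.execute(WITH_CTE + "DELETE FROM yourTable WHERE col_pk IN (SELECT col_pk FROM cte WHERE rn > 1)")
remaining = [pk for (pk,) in conn.execute("SELECT col_pk FROM yourTable ORDER BY col_pk")]
print(kept, remaining)
```

Both the preview and the delete leave col_pk 3 and 4, matching the expected result in the question.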
You could try something like this.
SELECT t.*
FROM yourtable t
WHERE col_date_updated IN (SELECT MAX (col_date_updated)
FROM yourtable i
WHERE t.col2 = i.col2 AND t.col3 = i.col3);
If you wish to delete the other records, you can use this:
DELETE
FROM yourtable t
WHERE col_date_updated NOT IN (SELECT MAX (col_date_updated)
FROM yourtable i
WHERE t.col2 = i.col2 AND t.col3 = i.col3);
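This sketch runs the correlated MAX() version on the question's sample rows. It uses SQLite through Python's sqlite3 module instead of PostgreSQL. In the DELETE it refers to the outer table by its name rather than the alias t. Inside the subquery that name resolves to the outer row, because the inner copy is aliased i.

```python
import sqlite3

# Sample rows copied from the question; yourtable is the answer's placeholder name.
ROWS = [
    (1, "A", "hello", "now", 200.00, "2017-12-12 15:09:44.437546"),
    (2, "B", "hello", "now", 490.00, "2017-12-12 15:09:42.437065"),
    (3, "C", "hi", "now", 300.00, "2017-12-12 15:09:41.436617"),
    (4, "D", "hello", "now", 250.00, "2017-12-12 15:09:45.436617"),
    (5, "E", "hi", "now", 250.00, "2017-12-12 10:09:41.436617"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (col_pk INTEGER PRIMARY KEY, col1, col2, col3, col4, col_date_updated)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Keep rows whose date equals the max date of their (col2, col3) group.
latest = [pk for (pk,) in conn.execute("""
    SELECT t.col_pk
    FROM yourtable t
    WHERE t.col_date_updated IN (SELECT MAX(i.col_date_updated)
                                 FROM yourtable i
                                 WHERE t.col2 = i.col2 AND t.col3 = i.col3)
    ORDER BY t.col_pk
""")]

# Delete every row whose date is not its group's max.
conn.execute("""
    DELETE FROM yourtable
    WHERE col_date_updated NOT IN (SELECT MAX(i.col_date_updated)
                                   FROM yourtable i
                                   WHERE yourtable.col2 = i.col2
                                     AND yourtable.col3 = i.col3)
""")
remaining = [pk for (pk,) in conn.execute("SELECT col_pk FROM yourtable ORDER BY col_pk")]
print(latest, remaining)
```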
If you want to suppress all but the most recent rows for any {col2,col3}:
SELECT *
FROM thetable zt
WHERE NOT EXISTS (
-- If a record exists with the same col2,col3,
-- but a more recent date than zt.col_date_updated
-- then zt.* cannot be the most recent one
SELECT *
FROM thetable nx
WHERE nx.col2 = zt.col2 -- same value
AND nx.col3 = zt.col3 -- same value
AND nx.col_date_updated > zt.col_date_updated -- more recent
);
If you want to physically delete all but the most recent rows for the same {col2,col3}:
DELETE
FROM thetable zt
WHERE EXISTS (
-- If a record exists with the same col2,col3,
-- but a more recent date than zt.col_date_updated
-- then zt.* cannot be the most recent one
-- and we can delete zt.
SELECT *
FROM thetable nx
WHERE nx.col2 = zt.col2 -- same value
AND nx.col3 = zt.col3 -- same value
AND nx.col_date_updated > zt.col_date_updated -- more recent
);
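This sketch checks the NOT EXISTS and EXISTS queries on the question's sample rows. It uses SQLite through Python's sqlite3 module. thetable is the answer's placeholder name, and the DELETE refers to the outer row by table name instead of the zt alias.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "A", "hello", "now", 200.00, "2017-12-12 15:09:44.437546"),
    (2, "B", "hello", "now", 490.00, "2017-12-12 15:09:42.437065"),
    (3, "C", "hi", "now", 300.00, "2017-12-12 15:09:41.436617"),
    (4, "D", "hello", "now", 250.00, "2017-12-12 15:09:45.436617"),
    (5, "E", "hi", "now", 250.00, "2017-12-12 10:09:41.436617"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (col_pk INTEGER PRIMARY KEY, col1, col2, col3, col4, col_date_updated)")
conn.executemany("INSERT INTO thetable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# A row is the most recent one if no row in its group has a later date.
most_recent = [pk for (pk,) in conn.execute("""
    SELECT zt.col_pk
    FROM thetable zt
    WHERE NOT EXISTS (SELECT * FROM thetable nx
                      WHERE nx.col2 = zt.col2
                        AND nx.col3 = zt.col3
                        AND nx.col_date_updated > zt.col_date_updated)
    ORDER BY zt.col_pk
""")]

# Delete any row for which a more recent row exists in the same group.
conn.execute("""
    DELETE FROM thetable
    WHERE EXISTS (SELECT * FROM thetable nx
                  WHERE nx.col2 = thetable.col2
                    AND nx.col3 = thetable.col3
                    AND nx.col_date_updated > thetable.col_date_updated)
""")
remaining = [pk for (pk,) in conn.execute("SELECT col_pk FROM thetable ORDER BY col_pk")]
print(most_recent, remaining)
```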
Another way is to number the rows of each (col2, col3) group from newest to oldest and work with the primary keys. To list the rows that would be removed (everything except the newest row of each group):
SELECT * FROM tablename WHERE col_pk IN
(SELECT col_pk FROM
(SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated DESC) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
If you want to delete them:
DELETE FROM tablename WHERE col_pk IN
(SELECT col_pk FROM
(SELECT col_pk, ROW_NUMBER() OVER (partition BY col2, col3 ORDER BY col_date_updated DESC) AS rnum
FROM tablename) t
WHERE t.rnum > 1);

SQL query to simulate distinct

SELECT DISTINCT col1, col2 FROM table t ORDER BY col1;
This gives me the distinct combinations of col1 & col2. Is there an alternative way of writing the Oracle SQL query to get the unique combinations of col1 & col2 without using the DISTINCT keyword?
Use the UNIQUE keyword which is a synonym for DISTINCT:
SELECT UNIQUE col1, col2 FROM table t ORDER BY col1;
I don't see why you would want to, but you could do:
SELECT col1, col2 FROM table_t GROUP BY col1, col2 ORDER BY col1
Another - yet overly complex and somewhat useless - solution:
select *
from (
select col1,
col2,
row_number() over (partition by col1, col2 order by col1, col2) as rn
from the_table
)
where rn = 1
order by col1
select col1, col2
from table
group by col1, col2
order by col1
or a less elegant way:
select col1,col2 from table
UNION
select col1,col2 from table
order by col1;
or an even less elegant way:
select a.col1, a.col2
from (select col1, col2 from table
UNION
select NULL, NULL from dual) a
where a.col1 is not null
order by a.col1
Yet another ...
select
col1,
col2
from
table t1
where
not exists (select *
from table t2
where t2.col1 = t1.col1 and
t2.col2 = t1.col2 and
t2.rowid > t1.rowid)
order by
col1;
Variations on the UNION solution by @aF.:
INTERSECT
SELECT col1, col2 FROM tableX
INTERSECT
SELECT col1, col2 FROM tableX
ORDER BY col1;
MINUS
SELECT col1, col2 FROM tableX
MINUS
SELECT col1, col2 FROM tableX WHERE 0 = 1
ORDER BY col1;
MINUS (second version; if there is a (NULL, NULL) group, it returns one row fewer than the other versions)
SELECT col1, col2 FROM tableX
MINUS
SELECT NULL, NULL FROM dual
ORDER BY col1;
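This sketch compares the set-operation variants side by side in SQLite through Python's sqlite3 module. SQLite spells Oracle's MINUS as EXCEPT and needs no FROM dual. The rows are made up and include a duplicated (NULL, NULL) pair to show the caveat about the second MINUS version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (col1, col2)")
conn.executemany("INSERT INTO tableX VALUES (?, ?)",
                 [(1, "a"), (1, "a"), (2, "b"), (None, None), (None, None)])

# Set operators treat NULLs as equal when removing duplicates,
# so each query collapses the two (NULL, NULL) rows into one.
queries = {
    "distinct": "SELECT DISTINCT col1, col2 FROM tableX",
    "group_by": "SELECT col1, col2 FROM tableX GROUP BY col1, col2",
    "union": "SELECT col1, col2 FROM tableX UNION SELECT col1, col2 FROM tableX",
    "intersect": "SELECT col1, col2 FROM tableX INTERSECT SELECT col1, col2 FROM tableX",
    "except_empty": "SELECT col1, col2 FROM tableX EXCEPT SELECT col1, col2 FROM tableX WHERE 0 = 1",
    # Subtracting (NULL, NULL) also removes that group, hence one row fewer.
    "except_null": "SELECT col1, col2 FROM tableX EXCEPT SELECT NULL, NULL",
}
results = {name: conn.execute(sql + " ORDER BY col1, col2").fetchall()
           for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```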
Another ...
select col1,
col2
from (
select col1,
col2,
rowid,
min(rowid) over (partition by col1, col2) min_rowid
from table)
where rowid = min_rowid
order by col1;
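The two rowid-based answers can be tried in SQLite, which also exposes a rowid, through Python's sqlite3 module. The sample data is invented and includes NULLs on purpose. The window version groups NULLs together as DISTINCT does. The NOT EXISTS version compares with =, which is never true for NULL, so duplicated (NULL, NULL) rows all survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (col1, col2)")
conn.executemany("INSERT INTO the_table VALUES (?, ?)",
                 [(1, "a"), (1, "a"), (2, "b"), (None, None), (None, None)])

# Keep the row with the smallest rowid in each (col1, col2) partition.
window_rows = conn.execute("""
    SELECT col1, col2
    FROM (
        SELECT col1, col2, rowid AS rid,
               MIN(rowid) OVER (PARTITION BY col1, col2) AS min_rid
        FROM the_table
    )
    WHERE rid = min_rid
    ORDER BY col1, col2
""").fetchall()

# Keep a row only if no later row (higher rowid) has equal values.
not_exists_rows = conn.execute("""
    SELECT col1, col2
    FROM the_table t1
    WHERE NOT EXISTS (SELECT * FROM the_table t2
                      WHERE t2.col1 = t1.col1
                        AND t2.col2 = t1.col2
                        AND t2.rowid > t1.rowid)
    ORDER BY col1, col2
""").fetchall()
print(window_rows)
print(not_exists_rows)
```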