How to select the latest row from a dataset in ODI - sql

My table structure :
ID1 ID2 ID3 Value Last_Update_date
10 11 12 0.1 21-SEP-17 01.46.12.623580000 PM
10 11 12 0.2 20-SEP-17 01.46.12.623580000 PM
10 11 12 0.3 19-SEP-17 01.46.12.623580000 PM
20 21 22 0.4 01-SEP-17 01.46.12.623580000 PM
20 21 22 0.5 12-SEP-17 01.46.12.623580000 PM
20 21 22 0.4 21-SEP-17 01.46.12.623580000 PM
I am treating ID1+ID2+ID3 as a composite key and I need to extract the latest row for each composite key.
For example, for the combination ID1, ID2, ID3 -> 10, 11, 12, my select query should return 10, 11, 12, 0.1 (as 21-SEP-17 is the latest).
I have tried the below code :
SELECT a.ID1 ,
a.ID2,
a.ID3 ,
a.value ,
a.Last_update_date
FROM a,
(SELECT ID1,
ID2,
ID3,
MAX(last_update_date) last_update_date
FROM a
GROUP BY ID1,
ID2,
ID3
) b
WHERE a.ID1 = b.ID1
AND a.ID2 = b.ID2
AND a.ID3 = b.ID3
AND a.last_update_date = b.last_update_date
Is there a better way to write this?
I will be using the code in ODI, so I am limited to simple SQL features like GROUP BY.
Thanks

I believe ODI supports window functions...
And your join syntax is bad, really bad, like, never do it again bad. Use explicit joins (not that you need them here at all)
select x.*
from
(
select a.*,
row_number() over(partition by id1, id2, id3 order by last_update_date desc) rn
from a
) x
where rn = 1

Using ROW_NUMBER will not return the same rows as your query when multiple rows share the same maximum value, because it keeps only one row per group. You could use RANK or DENSE_RANK instead:
SELECT ID1,
ID2,
ID3,
value,
Last_update_date
FROM (
SELECT ID1,
ID2,
ID3,
value,
last_update_date,
RANK() OVER ( PARTITION BY id1, id2, id3 ORDER BY last_update_date DESC )
AS rnk
FROM a
)
WHERE rnk = 1
However, the direct equivalent of your query using analytic functions is:
SELECT ID1,
ID2,
ID3,
value,
Last_update_date
FROM (
SELECT ID1,
ID2,
ID3,
value,
last_update_date,
MAX(last_update_date) OVER ( PARTITION BY id1, id2, id3 )
AS max_last_update_date
FROM a
)
WHERE last_update_date = max_last_update_date
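To make the difference on ties concrete, here is a minimal sketch that runs both ranking functions against a small copy of the table. It assumes SQLite (through Python's sqlite3 module) as a stand-in for Oracle, ISO date strings instead of Oracle timestamps, and one hypothetical extra row that ties on the latest date; none of that comes from the question.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; dates are ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (id1 INT, id2 INT, id3 INT, value REAL, last_update_date TEXT)"
)
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?, ?)",
    [
        (10, 11, 12, 0.1, "2017-09-21"),
        (10, 11, 12, 0.2, "2017-09-21"),  # hypothetical tie on the latest date
        (10, 11, 12, 0.3, "2017-09-19"),
        (20, 21, 22, 0.4, "2017-09-21"),
        (20, 21, 22, 0.5, "2017-09-12"),
    ],
)


def latest_rows(ranking_function):
    """Return the rows ranked first per composite key by the given function."""
    sql = f"""
        SELECT id1, id2, id3, value
        FROM (
            SELECT a.*,
                   {ranking_function}() OVER (
                       PARTITION BY id1, id2, id3
                       ORDER BY last_update_date DESC
                   ) AS rnk
            FROM a
        )
        WHERE rnk = 1
        ORDER BY id1, value
    """
    return conn.execute(sql).fetchall()


rows_row_number = latest_rows("ROW_NUMBER")  # exactly one row per key
rows_rank = latest_rows("RANK")              # every row tied for the latest date
print(len(rows_row_number), len(rows_rank))
```

With the tied row present, ROW_NUMBER keeps one arbitrary row for key 10, 11, 12, while RANK keeps both, which matches what the MAX-based join returns.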

Related

Select value for each id after sorting on multiple columns

Looking for a way to select one "status" per "id1" based on the lowest "id2", then the MAX "the_date". I was able to do this by creating multiple subqueries: finding the min id2 per id1 first, then finding the max "the_date" per id2, and joining them back to the original table. But it seems like there should be a way to do this with one query?
with data as(
Select 101 as id1, 11 as id2, to_date('01/02/2019','MM/DD/YYYY') as the_date, 'a' as status from dual union all
Select 101 as id1, 11 as id2, to_date('01/01/2019','MM/DD/YYYY') as the_date, 'b' as status from dual union all
Select 101 as id1, 24 as id2, to_date('01/02/2019','MM/DD/YYYY') as the_date, 'g' as status from dual union all
Select 200 as id1, 41 as id2, to_date('01/02/2017','MM/DD/YYYY') as the_date, 'c' as status from dual union all
Select 200 as id1, 61 as id2, to_date('01/02/2019','MM/DD/YYYY') as the_date, 'z' as status from dual)
The result of the query should be:
id1|id2|the_date|status
101|11|'01/02/2019'|a
200|41|'01/02/2017'|c
You can use row_number():
select d.*
from (select d.*,
row_number() over (partition by id1 order by id2, the_date desc) as seqnum
from data d
) d
where seqnum = 1;
Here is a db<>fiddle.
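As a quick check of the ordering logic (lowest id2 first, then latest the_date), the sketch below runs the same row_number() query on the sample rows. It assumes SQLite via Python's sqlite3 module in place of Oracle, a real table named data instead of the CTE, and ISO date strings.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the_date is stored as an ISO string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id1 INT, id2 INT, the_date TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (101, 11, "2019-01-02", "a"),
        (101, 11, "2019-01-01", "b"),
        (101, 24, "2019-01-02", "g"),
        (200, 41, "2017-01-02", "c"),
        (200, 61, "2019-01-02", "z"),
    ],
)

# Rank rows within each id1: ascending id2 first, then descending the_date.
picked = conn.execute(
    """
    SELECT id1, id2, the_date, status
    FROM (
        SELECT d.*,
               ROW_NUMBER() OVER (
                   PARTITION BY id1
                   ORDER BY id2, the_date DESC
               ) AS seqnum
        FROM data d
    )
    WHERE seqnum = 1
    ORDER BY id1
    """
).fetchall()

for row in picked:
    print(row)
```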

Select a subcategory ID to associate with a primary ID based off which has the highest sum

I have a primary ID, ID1, and a secondary ID, ID2. ID1 can be associated with multiple ID2 values, and vice versa. I want to sum a third Values column by ID2 under each ID1, and pull the ID2 with the highest sum. The source data is structured like:
ID1 ID2 Value
1 10 1
1 10 2
1 20 1
2 10 1
2 30 2
And I want the final results to look like:
ID1 ID2
1 10
2 30
So far, I only have a nonfunctioning query:
SELECT ID1,
CASE WHEN ID2_Value = MAX(ID2_Value) THEN ID2
ELSE NULL
END AS PrimaryID2
FROM ( SELECT ID1,
ID2,
SUM(Value) AS ID2_Value
FROM SOME_SCHEMA
GROUP BY ID1, ID2
) AS ID2_Value
GROUP BY ID1;
My query doesn't work right now because it expects me to include ID2_Value in the GROUP BY statement, but I don't want to group by those values.
I would use row_number():
select id1, id2
from (select id1, id2, sum(value) as sumv,
row_number() over (partition by id1 order by sum(value) desc) as seqnum
from t
group by id1, id2
) t
where seqnum = 1;

Recursive SQL retrieve all levels

I am unable to retrieve the desired result from my query when using Oracle's recursive approach:
Foo
ID1 ID2
1 2
1 3
4 2
4 3
4 5
Query:
select sys_connect_by_path(id2,' -> ')
FROM Foo
START WITH id1 = 1
CONNECT BY PRIOR id1 = id2
ORDER BY 1;
Outputs only level 1 hierarchy (2,3). I want it to detect the tree ( 1 -> (2,3) -> 4 -> 5 ), such that selecting distinct ID2 yields (2,3,5). Thank you.
If you are using Oracle 11.2 or above, a recursive CTE (Common Table Expression) is preferred over Oracle's CONNECT BY clause.
WITH
aset -- Create pseudo table with ID2 as ID1 and vice versa
AS
(SELECT id1, id2
FROM (SELECT id1, id2
FROM foo
UNION
SELECT id2, id1
FROM foo)
WHERE id1 < id2),
bset (id1, id2) -- Extract hierarchy from pseudo table
AS
(SELECT id1, id2
FROM aset
WHERE id1 = 1
UNION ALL
SELECT aset.id1, aset.id2
FROM bset INNER JOIN aset ON bset.id2 = aset.id1
WHERE bset.id1 <> aset.id2)
SELECT DISTINCT bset.id2 -- Only keep values that were originally ID2
FROM bset INNER JOIN foo ON bset.id2 = foo.id2
ORDER BY id2;
Here is the same thing using CONNECT BY:
WITH
aset
-- Create pseudo table with ID2 as ID1 and vice versa
AS
(SELECT id1, id2
FROM (SELECT id1, id2
FROM foo
UNION
SELECT id2, id1
FROM foo)
WHERE id1 < id2),
bset
-- Extract hierarchy from pseudo table
AS
( SELECT id2
FROM aset
START WITH id1 = 1
CONNECT BY PRIOR id2 = id1)
SELECT DISTINCT bset.id2
-- Only keep values that were originally ID2
FROM bset INNER JOIN foo ON bset.id2 = foo.id2
ORDER BY id2
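The recursive CTE version can be tried without an Oracle instance. The sketch below is an adaptation, not the answer's exact statement: it assumes SQLite via Python's sqlite3 module, which needs the RECURSIVE keyword and an explicit column list on aset, and it loads the Foo rows from the question.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; WITH RECURSIVE is SQLite's spelling.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id1 INT, id2 INT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, 2), (1, 3), (4, 2), (4, 3), (4, 5)],
)

reachable = conn.execute(
    """
    WITH RECURSIVE
    aset(id1, id2) AS (          -- edges in both directions, kept as low -> high
        SELECT id1, id2
        FROM (SELECT id1, id2 FROM foo
              UNION
              SELECT id2, id1 FROM foo)
        WHERE id1 < id2
    ),
    bset(id1, id2) AS (          -- walk the edges starting from id1 = 1
        SELECT id1, id2 FROM aset WHERE id1 = 1
        UNION ALL
        SELECT aset.id1, aset.id2
        FROM bset INNER JOIN aset ON bset.id2 = aset.id1
        WHERE bset.id1 <> aset.id2
    )
    SELECT DISTINCT bset.id2     -- keep only values that were originally ID2
    FROM bset INNER JOIN foo ON bset.id2 = foo.id2
    ORDER BY bset.id2
    """
).fetchall()

print([row[0] for row in reachable])
```

The walk visits 2 and 3, then 4, then 5; the final join drops 4 because it never appears as an ID2 in Foo.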

Oracle 11.2 SQL - help to condense data in ordered set

I have a data-set with a timestamp column and multiple identifier columns. I want to condense it to a single row for each "block" of adjacent rows with equal identifiers, when ordered by the timestamp. The min and max timestamp for each block is required.
Source Data:
TSTAMP ID1 ID2
t1 A B <= start of new block
t2 A B
t3 C D <= start of new block
t4 E F <= start of new block
t5 E F
t6 E F
t7 A B <= start of new block
t8 G H <= start of new block
Desired Result:
MIN_TSTAMP MAX_TSTAMP ID1 ID2
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
I thought this was ripe for a window-ing analytic function but I cannot partition without grouping ALL equal combinations of IDn - rather than only those in adjacent rows, when ordered by timestamp.
A workaround is to create a key column first in an in-line view that I can later group by i.e. with same value for each row in the block and different value for each block. I can do this using LAG analytic function to compare row values and then calling a PL/SQL function to return nextval/currval values of a sequence (calling nextval/currval directly in the SQL is restricted in this context).
select min(ilv.tstamp), max(ilv.tstamp), id1, id2
from (
select case when (id1 != lag(id1,1,'*') over (partition by (1) order by tstamp)
or id2 != lag(id2,1,'*') over (partition by (1) order by tstamp))
then
pk_seq_utils.gav_get_nextval
else
pk_seq_utils.gav_get_currval
end ident, t.*
from tab1 t
order by tstamp) ilv
group by ident, id1, id2
order by 1;
where the gav_get_xxx functions simply return currval/nextval from a sequence.
But I would like to use SQL only and avoid PL/SQL (as I could also write this easily in PL/SQL and pipe out the result-rows from a pipeline function).
Any ideas?
Thanks.
Tabibitosan to the rescue!
with sample_data as (select 't1' tstamp, 'A' id1, 'B' id2 from dual union all
select 't2' tstamp, 'A' id1, 'B' id2 from dual union all
select 't3' tstamp, 'C' id1, 'D' id2 from dual union all
select 't4' tstamp, 'E' id1, 'F' id2 from dual union all
select 't5' tstamp, 'E' id1, 'F' id2 from dual union all
select 't6' tstamp, 'E' id1, 'F' id2 from dual union all
select 't7' tstamp, 'A' id1, 'B' id2 from dual union all
select 't8' tstamp, 'G' id1, 'H' id2 from dual)
select min(tstamp) min_tstamp, max(tstamp) max_tstamp, id1, id2
from (select tstamp,
id1,
id2,
row_number() over (order by tstamp) - row_number() over (partition by id1, id2 order by tstamp) grp
from sample_data)
group by id1,
id2,
grp
order by min(tstamp);
MIN_TSTAMP MAX_TSTAMP ID1 ID2
---------- ---------- --- ---
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
You can use an analytic 'trick' to identify the gaps and islands: compare the position of each row when ordered by tstamp across all rows with its position when ordered by tstamp within its id1, id2 combination:
select tstamp, id1, id2,
row_number() over (partition by id1, id2 order by tstamp)
- row_number() over (order by tstamp) as block_id
from tab1;
TS I I BLOCK_ID
-- - - ----------
t1 A B 0
t2 A B 0
t3 C D -2
t4 E F -3
t5 E F -3
t6 E F -3
t7 A B -4
t8 G H -7
The actual value of block_id doesn't matter, just that it's unique for each block within an id1, id2 combination. You can then group using that:
select min(tstamp) as min_tstamp, max(tstamp) as max_tstamp, id1, id2
from (
select tstamp, id1, id2,
row_number() over (partition by id1, id2 order by tstamp)
- row_number() over (order by tstamp) as block_id
from tab1
)
group by id1, id2, block_id
order by min(tstamp);
MI MA I I
-- -- - -
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
You should be able to use the row_number window function to do this, like below:
select
min(tstamp) mints, max(tstamp) maxts, id1, id2
from (
select
t.*,
row_number() over (order by tstamp)
- row_number() over (partition by id1, id2 order by tstamp) as rn
from t
) subq
group by id1, id2, rn
order by rn
I haven't been able to test it with any Oracle db, but it works with MSSQL and should work in Oracle too as the window function works the same way.
You need to do this step by step:
1. Detect ID changes with LAG, marking each change with a flag = 1.
2. Generate keys for the groups (i.e. adjacent records with the same ID) with SUM over the ID change flags (running total).
3. Group by the generated group key and get the min/max timestamp.
Query:
select
min(tstamp) as min_tstamp,
max(tstamp) as max_tstamp,
min(id1) as id1,
min(id2) as id2
from
(
select
grouped.*,
sum(newgroup) over (order by tstamp) as groupkey
from
(
select
mytable.*,
case when id1 <> lag(id1) over (order by tstamp)
or id2 <> lag(id2) over (order by tstamp)
then 1 else 0 end as newgroup
from mytable
order by tstamp
) grouped
)
group by groupkey
order by groupkey;
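The three steps can be checked on the question's sample rows. This is a minimal sketch assuming SQLite via Python's sqlite3 module instead of Oracle; the table name mytable follows the query above, and the inline alias flagged is a hypothetical name.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; tstamp values are the labels t1..t8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (tstamp TEXT, id1 TEXT, id2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("t1", "A", "B"), ("t2", "A", "B"), ("t3", "C", "D"),
        ("t4", "E", "F"), ("t5", "E", "F"), ("t6", "E", "F"),
        ("t7", "A", "B"), ("t8", "G", "H"),
    ],
)

blocks = conn.execute(
    """
    SELECT MIN(tstamp), MAX(tstamp), MIN(id1), MIN(id2)
    FROM (
        -- Step 2: running total of the change flags gives one key per block.
        SELECT flagged.*,
               SUM(newgroup) OVER (ORDER BY tstamp) AS groupkey
        FROM (
            -- Step 1: flag rows whose ids differ from the previous row.
            -- The first row has no predecessor, so LAG is NULL and the flag is 0.
            SELECT mytable.*,
                   CASE WHEN id1 <> LAG(id1) OVER (ORDER BY tstamp)
                          OR id2 <> LAG(id2) OVER (ORDER BY tstamp)
                        THEN 1 ELSE 0 END AS newgroup
            FROM mytable
        ) flagged
    )
    -- Step 3: collapse each block to its first and last timestamp.
    GROUP BY groupkey
    ORDER BY groupkey
    """
).fetchall()

for block in blocks:
    print(block)
```

Because groupkey increases with tstamp, ordering by it also returns the blocks in chronological order.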

How to map the 2 different record set one by one?

Let's say I have two SQL queries. Table A contains:
ID
--
1
1
1
2
3
4
This query,
Select distinct ID1 FROM A
gives me,
ID
--
1
2
3
4
Second one
Select ID2 FROM B
which gives me,
ID2
--
8
21
33
43
How do I get this record set?
ID1 ID2
--- ---
1 8
2 21
3 33
4 43
You did not specify which version of SQL Server you are using, but if it is SQL Server 2008+, one way to do this is to add row_number() to each table and then join on the row number:
select a.id, b.id2
from
(
select id, row_number() over(order by id) rn
from a
) a
inner join
(
select id2, row_number() over(order by id2) rn
from b
) b
on a.rn = b.rn
See SQL Fiddle with Demo
If you want to only use DISTINCT values, then you should be able to use:
select a.id, b.id2
from
(
select id, row_number() over(order by id) rn
from
(
select distinct id
from a
) a
) a
inner join
(
select id2, row_number() over(order by id2) rn
from b
) b
on a.rn = b.rn;
See SQL Fiddle with Demo
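For a quick local check of the DISTINCT variant, the sketch below pairs the rows by position. It assumes SQLite via Python's sqlite3 module rather than SQL Server, and the inline aliases d, da and nb are hypothetical names chosen to avoid reusing the table names.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; tables follow the question's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INT)")
conn.execute("CREATE TABLE b (id2 INT)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (1,), (1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO b VALUES (?)", [(8,), (21,), (33,), (43,)])

pairs = conn.execute(
    """
    SELECT da.id, nb.id2
    FROM (
        -- Number the distinct ids of a in ascending order.
        SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM (SELECT DISTINCT id FROM a) d
    ) da
    INNER JOIN (
        -- Number the rows of b in ascending order.
        SELECT id2, ROW_NUMBER() OVER (ORDER BY id2) AS rn
        FROM b
    ) nb ON da.rn = nb.rn
    ORDER BY da.rn
    """
).fetchall()

print(pairs)
```

The pairing depends entirely on the two ORDER BY clauses inside row_number(), so a different sort on either side produces a different mapping.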
If you have a different number of rows in each table, then you might want to use a FULL OUTER JOIN:
select a.id, b.id2
from
(
select id, row_number() over(order by id) rn
from
(
select distinct id
from a
) a
) a
full outer join
(
select id2, row_number() over(order by id2) rn
from b
) b
on a.rn = b.rn;
See SQL Fiddle with Demo