Combine the most recent entries from a number of tables - sql

I have a master table with a number of IDs in it:
ID ...
0 ...
1 ...
And multiple tables (say vtbl1, vtbl2, vtbl3) with a foreign key to master, a timestamp and a value:
ID Timestamp Value
0 01/01/01.. 2
1 01/01/02.. 7
0 01/01/03.. 5
I would like to get one or more entries for each ID in master with an entry (or null if no entries exist) containing the most recent entry in each v... table (grouped by timestamps):
ID Timestamp vtbl1.Value vtbl2.Value vtbl3.value
0 01/01/03.. 5 2
0 01/01/01.. 4
1 01/01/02.. 7 4 9
I'm sure this is fairly simple but my SQL is rusty and I've been going in circles. Any help would be appreciated.
Clarification
These values come from one or more sensors able to read one or more of the values. So the latest value in each value table for the ID is to be considered the current system state for that ID. If the timestamps match they are considered one update.
I need the minimal set of updates required for each ID to give a full data set for the current state.
Also the values can be of different types.

If I understand your question correctly, one option is to use conditional aggregation and union all:
select id, timestamp,
max(case when tbl = 'tbl1' then value end) t1value,
max(case when tbl = 'tbl2' then value end) t2value,
max(case when tbl = 'tbl3' then value end) t3value
from (
select id, timestamp, value, 'tbl1' tbl
from tbl1
union all
select id, timestamp, value, 'tbl2' tbl
from tbl2
union all
select id, timestamp, value, 'tbl3' tbl
from tbl3
) t
group by id, timestamp
Or, if you have multiple records per id and want only the most recent one from each table by timestamp, you can include row_number() in your subquery:
select id, timestamp,
max(case when tbl = 'tbl1' then value end) t1value,
max(case when tbl = 'tbl2' then value end) t2value,
max(case when tbl = 'tbl3' then value end) t3value
from (
select id, timestamp, value, 'tbl1' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl1
union all
select id, timestamp, value, 'tbl2' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl2
union all
select id, timestamp, value, 'tbl3' tbl,
row_number() over (partition by id order by timestamp desc) rn
from tbl3
) t
where rn = 1
group by id, timestamp
This can get difficult though if max(timestamp) values aren't the same in each of the child tables. Which do you join on at that point?
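To see what happens when the latest timestamps differ between child tables, here is a minimal sketch of the row_number() variant. It runs on SQLite through Python's built-in sqlite3 module rather than on the asker's database; the two tables, the ts column name and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical child tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER, ts TEXT, value INTEGER);
    CREATE TABLE tbl2 (id INTEGER, ts TEXT, value INTEGER);
    INSERT INTO tbl1 VALUES (0, '2001-01-01', 2), (1, '2001-01-02', 7), (0, '2001-01-03', 5);
    INSERT INTO tbl2 VALUES (0, '2001-01-01', 4), (1, '2001-01-02', 4);
""")

# Keep the newest row per id in each table, then pivot per (id, ts).
latest_per_table = conn.execute("""
    SELECT id, ts,
           MAX(CASE WHEN tbl = 'tbl1' THEN value END) AS t1value,
           MAX(CASE WHEN tbl = 'tbl2' THEN value END) AS t2value
    FROM (
        SELECT id, ts, value, 'tbl1' AS tbl,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
        FROM tbl1
        UNION ALL
        SELECT id, ts, value, 'tbl2' AS tbl,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
        FROM tbl2
    ) t
    WHERE rn = 1
    GROUP BY id, ts
    ORDER BY id, ts DESC
""").fetchall()

for row in latest_per_table:
    print(row)
```

With this data, id 0 comes back as two rows (one per distinct latest timestamp) while id 1 collapses into a single row, because both of its latest entries share a timestamp.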

select m.*, v1.value as t1_val, v2.value as t2_val, v3.value as t3_val
from master m
left join (select x.*
from vtbl1 x
join (select id, max(timestamp) as last_ts
from vtbl1
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v1
on m.id = v1.id
left join (select x.*
from vtbl2 x
join (select id, max(timestamp) as last_ts
from vtbl2
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v2
on m.id = v2.id
left join (select x.*
from vtbl3 x
join (select id, max(timestamp) as last_ts
from vtbl3
group by id) y
on x.id = y.id
and x.timestamp = y.last_ts) v3
on m.id = v3.id

The fastest query technique depends on the distribution of values. DISTINCT ON would be a simple solution in Postgres, ideal for just a few values per id in each child table. But guessing from your description I expect many rows per id, so I suggest a solution with LATERAL joins. Requires Postgres 9.3+:
Optimize GROUP BY query to retrieve latest record per user
One more complication for your already-not-so-simple case:
Also the values can be of different types
Alternative 1
Cast all values to text. Every data type can be cast to text.
Base query
SELECT m.id, v.timestamp, 1 AS tbl, v.value -- simple int as table id
FROM master m
, LATERAL (
SELECT timestamp, value::text -- cast to text
FROM vtbl1
WHERE id = m.id -- lateral reference
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 2 AS tbl, v.value -- ascending without gaps
FROM master m
, LATERAL (
SELECT timestamp, value::text
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 3 AS tbl, value
FROM ...
;
All you need for this to be fast is an index on (id, timestamp) for each child table. Best in this form (adding value is only useful if you get index-only scans out of it):
CREATE INDEX vtbl1_combo_idx ON vtbl1 (id, timestamp DESC NULLS LAST, value)
1a. Aggregate (pseudo-crosstab)
To format as desired use aggregate functions on CASE expressions in Postgres 9.3 or older (as demonstrated by @sgeddes) or (better) the new aggregate FILTER clause in Postgres 9.4+:
How can I simplify this game statistics query?
SELECT id, timestamp
, max(value) FILTER (WHERE tbl = 1) AS val1
, max(value) FILTER (WHERE tbl = 2) AS val2
, ...
FROM ( <query from above> ) t
GROUP BY 1, 2;
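As a rough illustration of the FILTER-based pivot, the sketch below runs the same aggregate shape on SQLite via Python's sqlite3 module (SQLite also accepts the aggregate FILTER clause in recent versions, which is an assumption about the local build). The latest table stands in for the result of the base query and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the base query result:
    -- one row per (id, child table) holding the latest value cast to text.
    CREATE TABLE latest (id INTEGER, ts TEXT, tbl INTEGER, value TEXT);
    INSERT INTO latest VALUES
        (0, '2001-01-03', 1, '5'),
        (0, '2001-01-01', 2, '4'),
        (1, '2001-01-02', 1, '7'),
        (1, '2001-01-02', 2, '4'),
        (1, '2001-01-02', 3, '9');
""")

# One output column per child table; rows sharing (id, ts) are merged.
pivoted = conn.execute("""
    SELECT id, ts,
           max(value) FILTER (WHERE tbl = 1) AS val1,
           max(value) FILTER (WHERE tbl = 2) AS val2,
           max(value) FILTER (WHERE tbl = 3) AS val3
    FROM latest
    GROUP BY id, ts
    ORDER BY id, ts DESC
""").fetchall()

for row in pivoted:
    print(row)
```

Each group holds at most one row per tbl, so max() here only picks that single value; any aggregate that ignores NULL would serve.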
1b. Crosstab
Actual cross tabulation (also called "pivot" in other RDBMS) should be considerably faster. You need the additional module tablefunc installed, instructions below.
The special difficulty here: we have a composite "row name" (id, timestamp), but the function expects a single column as row name. So we substitute with row_number(), but do not display that surrogate key in the result:
SELECT id, timestamp, val1, val2, val3, ...
-- normally SELECT * is enough; explicit list to filter rn
FROM crosstab(
$$
SELECT row_number() OVER (ORDER BY id, timestamp DESC NULLS LAST) AS rn
, id, timestamp, tbl, value
FROM ( <query from above> ) t
ORDER BY 1
$$
, 'SELECT generate_series(1,3)' -- replace 3 with highest table nr.
) AS ct (
rn int, id int, timestamp date
, val1 text, val2 text, val3 text, ...);
Closely related:
Postgres - Transpose Rows to Columns
Relevant basics:
PostgreSQL Crosstab Query
Pivot on Multiple Columns using Tablefunc
Alternative 2
Simple, but may be just as fast and preserves original data types:
SELECT id, timestamp
, max(val1) AS val1, max(val2) AS val2, max(val3) AS val3, ...
FROM (
SELECT m.id, v.timestamp
, v.value AS val1, NULL::int AS val2, NULL::numeric AS val3, ...
-- list all values with actual data type
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl1
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, v.value, NULL, ... -- column names & data types defined in first SELECT
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, NULL, v.value, ...
FROM ...
) t
GROUP BY 1, 2
ORDER BY 1, 2;
Aside: Never use basic type names or reserved words (in standard SQL) like timestamp as identifier.

Related

How to return two values from PostgreSQL subquery?

I have a problem where I need to get the last item across various tables in PostgreSQL.
The following code works and returns me the type of the latest update and when it was last updated.
The problem is, this query needs to be used as a subquery, so I want to select both the type and the last updated value from this query and PostgreSQL does not seem to like this... (Subquery must return only one column)
Any suggestions?
SELECT last.type, last.max FROM (
SELECT MAX(a.updated_at), 'a' AS type FROM table_a a WHERE a.ref = 5 UNION
SELECT MAX(b.updated_at), 'b' AS type FROM table_b b WHERE b.ref = 5
) AS last ORDER BY max LIMIT 1
The query is used like this inside of a CTE:
WITH sql_query as (
SELECT id, name, address, (...other columns),
last.type, last.max FROM (
SELECT MAX(a.updated_at), 'a' AS type FROM table_a a WHERE a.ref = 5 UNION
SELECT MAX(b.updated_at), 'b' AS type FROM table_b b WHERE b.ref = 5
) AS last ORDER BY max LIMIT 1
FROM table_c
WHERE table_c.fk_id = 1
)
The inherent problem is that SQL (all SQL, not just Postgres) requires a subquery used within a select clause to return a single value. If you think about that restriction for a while it does make sense. The select clause is returning rows and a certain number of columns, and each row.column location is a single position within a grid. You can bend that rule a bit by putting concatenations into a single position (or a single "complex type" like a JSON value) but it remains a single position in that grid regardless.
Here however you do want 2 separate columns AND you need to return both columns from the same row, so instead of LIMIT 1 I suggest using ROW_NUMBER() to facilitate this:
WITH LastVals as (
SELECT type
, max_date
, row_number() over(order by max_date DESC) as rn
FROM (
SELECT MAX(a.updated_at) AS max_date, 'a' AS type FROM table_a a WHERE a.ref = 5
UNION ALL
SELECT MAX(b.updated_at) AS max_date, 'b' AS type FROM table_b b WHERE b.ref = 5
) AS u -- Postgres before version 16 requires an alias on a derived table
)
, sql_query as (
SELECT id
, name, address, (...other columns)
, (select type from lastVals where rn = 1) as last_type
, (select max_date from lastVals where rn = 1) as last_date
FROM table_c
WHERE table_c.fk_id = 1
)
----
By the way, you should use UNION ALL in your subquery. Because type is a constant like 'a' or 'b', the rows are already unique even if MAX(a.updated_at) is identical for 2 or more tables. UNION will attempt to remove duplicate rows, but here there are none to remove, so avoid that wasted effort by using UNION ALL.
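A small check of that claim, sketched on SQLite through Python's sqlite3 module rather than on Postgres: with identical maximum dates in both tables, UNION and UNION ALL return the same two rows, because the constant type column already makes them distinct. The sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ref INTEGER, updated_at TEXT);
    CREATE TABLE table_b (ref INTEGER, updated_at TEXT);
    -- Both tables end up with the same latest date for ref = 5.
    INSERT INTO table_a VALUES (5, '2020-01-01'), (5, '2020-03-01');
    INSERT INTO table_b VALUES (5, '2020-03-01');
""")

template = """
    SELECT MAX(updated_at) AS max_date, 'a' AS type FROM table_a WHERE ref = 5
    {op}
    SELECT MAX(updated_at) AS max_date, 'b' AS type FROM table_b WHERE ref = 5
"""

rows_union = sorted(conn.execute(template.format(op="UNION")).fetchall())
rows_union_all = sorted(conn.execute(template.format(op="UNION ALL")).fetchall())

print(rows_union)
print(rows_union_all)
```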
----
For another way to skin this cat, consider using a LEFT JOIN instead (note that the WHERE clause has to come after the join):
SELECT id
, name, address, (...other columns)
, LastVals.type
, LastVals.last_date
FROM table_c
LEFT JOIN (
SELECT type
, last_date
, row_number() over(order by last_date DESC) as rn
FROM (
SELECT MAX(a.updated_at) AS last_date, 'a' AS type FROM table_a a WHERE a.ref = 5
UNION ALL
SELECT MAX(b.updated_at) AS last_date, 'b' AS type FROM table_b b WHERE b.ref = 5
) u
) LastVals ON LastVals.rn = 1
WHERE table_c.fk_id = 1
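The join approach can be tried end to end with the sketch below, which uses SQLite via Python's sqlite3 module instead of Postgres. The table contents, the reduced column list of table_c and the aliases c, u and last_vals are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ref INTEGER, updated_at TEXT);
    CREATE TABLE table_b (ref INTEGER, updated_at TEXT);
    CREATE TABLE table_c (id INTEGER, name TEXT, fk_id INTEGER);
    INSERT INTO table_a VALUES (5, '2020-01-01');
    INSERT INTO table_b VALUES (5, '2020-03-01');
    INSERT INTO table_c VALUES (10, 'first', 1), (11, 'second', 1), (12, 'other', 2);
""")

# The derived table yields exactly one row (rn = 1), so joining it
# attaches both the type and the date of the latest update to every row.
rows = conn.execute("""
    SELECT c.id, c.name, last_vals.type, last_vals.last_date
    FROM table_c c
    LEFT JOIN (
        SELECT type, last_date,
               ROW_NUMBER() OVER (ORDER BY last_date DESC) AS rn
        FROM (
            SELECT MAX(updated_at) AS last_date, 'a' AS type FROM table_a WHERE ref = 5
            UNION ALL
            SELECT MAX(updated_at) AS last_date, 'b' AS type FROM table_b WHERE ref = 5
        ) u
    ) last_vals ON last_vals.rn = 1
    WHERE c.fk_id = 1
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Both columns come from the same derived row, which is what the two scalar subqueries in the CTE version have to guarantee separately.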

Oracle Getting latest value from the other table

I have two tables, table1 contains old values and table2 contains latest values, I want to show latest value in table1 but I do not have anything which tells me this is the latest value in table2.
for example
Table1
CID-----PID-----RID
CT1-----C-------R1
CT2-----C-------R2
CT3-----C-------R3
CT4-----C-------R4
Table2
CID-----PID----RID
CT1-----A-------R1
CT1-----C-------R11
CT2-----C-------R2
CT3-----A-------R3
CT4-----A-------R4
The condition is I have to give priority to value C in case both values (A and C) exist also it's RID changes so need to get that also in output table, for the same CID and for unique value I will simple replace it in table1 from table2, so output will be like this
Table3
CID-----PID----RID
CT1-----C-------R11
CT2-----C-------R2
CT3-----A-------R3
CT4-----A-------R4
I may be missing something, but isn't this simply:
select cid, max(pid)
from table2
group by cid;
If you want whole records, use a ranking with ROW_NUMBER instead:
select cid, pid, rid
from
(
select cid, pid, rid, row_number() over (partition by cid order by pid desc) as rn
from table2
)
where rn = 1;
You can also use case expressions for ranking, e.g.:
(partition by cid order by case pid when 'C' then 1 when 'A' then 2 else 3 end) as rn
UPDATE: Now that you've finally explained what you are after ...
You want more or less the second query I gave you above. Only that you want data from both tables, which you can get with UNION ALL. You can easily give each row a rank on the way:
table2 PID C => rank #1
table2 PID A => rank #2
table1 rank #3
Then again take the row with the best rank:
select cid, pid, rid
from
(
select cid, pid, rid, row_number() over (partition by cid order by rnk) as rn
from
(
select cid, pid, rid, case when pid = 'C' then 1 else 2 end as rnk from table2
union all
select cid, pid, rid, 3 as rnk from table1
)
)
where rn = 1;
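The ranking idea can be checked against the sample data from the question. The sketch below runs it on SQLite through Python's sqlite3 module instead of Oracle; the aliases ranked and numbered are added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (cid TEXT, pid TEXT, rid TEXT);
    CREATE TABLE table2 (cid TEXT, pid TEXT, rid TEXT);
    INSERT INTO table1 VALUES
        ('CT1', 'C', 'R1'), ('CT2', 'C', 'R2'), ('CT3', 'C', 'R3'), ('CT4', 'C', 'R4');
    INSERT INTO table2 VALUES
        ('CT1', 'A', 'R1'), ('CT1', 'C', 'R11'), ('CT2', 'C', 'R2'),
        ('CT3', 'A', 'R3'), ('CT4', 'A', 'R4');
""")

# Rank: table2 with pid C first, other table2 rows second, table1 last;
# then keep the best-ranked row per cid.
best_rows = conn.execute("""
    SELECT cid, pid, rid
    FROM (
        SELECT cid, pid, rid,
               ROW_NUMBER() OVER (PARTITION BY cid ORDER BY rnk) AS rn
        FROM (
            SELECT cid, pid, rid, CASE WHEN pid = 'C' THEN 1 ELSE 2 END AS rnk FROM table2
            UNION ALL
            SELECT cid, pid, rid, 3 AS rnk FROM table1
        ) ranked
    ) numbered
    WHERE rn = 1
    ORDER BY cid
""").fetchall()

for row in best_rows:
    print(row)
```

The result matches Table3 from the question.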

SQL Case depending on previous status of record

I have a table containing the status of records. Something like this:
ID STATUS TIMESTAMP
1 I 01-01-2016
1 A 01-03-2016
1 P 01-04-2016
2 I 01-01-2016
2 P 01-02-2016
3 P 01-01-2016
I want to make a case where I take the newest version of each row, and for all P that has at some point been an I, they should be cased as a 'G' instead of P.
When I try to do something like
Select case when ID in (select ID from TABLE where ID = 'I') else ID END as status)
From TABLE
where ID in (select max(ID) from TABLE)
I get an error that this isn't possible using IN when casing.
So my question is, how do I do it then?
Want to end up with:
ID STATUS TIMESTAMP
1 G 01-04-2016
2 G 01-02-2016
3 P 01-01-2016
DBMS is IBM DB2
Have a derived table which returns each id with its newest timestamp. Join with that result:
select t1.ID, t1.STATUS, t1.TIMESTAMP
from tablename t1
join (select id, max(timestamp) as max_timestamp
from tablename
group by id) t2
ON t1.id = t2.id and t1.TIMESTAMP = t2.max_timestamp
Will return both rows in case of a tie (two rows with same newest timestamp.)
Note that ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP".
You can do this by using a common table expression find all IDs that have had a status of 'I', and then using an outer join with your table to determine which IDs have had a status of 'I' at some point.
To get the final result (with only the newest record) you can use the row_number() OLAP function and select only the "newest" record (this is shown in the ranked common table expression below):
with irecs (ID) as (
select distinct
ID
from
TABLE
where
status = 'I'
),
ranked as (
select
rownumber() over (partition by t.ID order by t.timestamp desc) as rn,
t.id,
case when i.id is null then t.status else 'G' end as status,
t.timestamp
from
TABLE t
left outer join irecs i
on t.id = i.id
)
select
id,
status,
timestamp
from
ranked
where
rn = 1;
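Here is the same two-CTE approach as a runnable sketch on SQLite via Python's sqlite3 module rather than DB2. The table name status_log and column name ts are placeholders, and the dates from the question are written in ISO form on the assumption that they are day-month-year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status_log (id INTEGER, status TEXT, ts TEXT);
    INSERT INTO status_log VALUES
        (1, 'I', '2016-01-01'), (1, 'A', '2016-03-01'), (1, 'P', '2016-04-01'),
        (2, 'I', '2016-01-01'), (2, 'P', '2016-02-01'),
        (3, 'P', '2016-01-01');
""")

newest = conn.execute("""
    WITH irecs (id) AS (
        -- every id that has ever had status 'I'
        SELECT DISTINCT id FROM status_log WHERE status = 'I'
    ),
    ranked AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY t.ts DESC) AS rn,
               t.id,
               CASE WHEN i.id IS NULL THEN t.status ELSE 'G' END AS status,
               t.ts
        FROM status_log t
        LEFT OUTER JOIN irecs i ON t.id = i.id
    )
    SELECT id, status, ts FROM ranked WHERE rn = 1 ORDER BY id
""").fetchall()

for row in newest:
    print(row)
```

Note that this CASE relabels whatever the newest status is, not only 'P', for any id that was ever 'I'; add a `t.status = 'P'` condition if only 'P' rows should become 'G'.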
other solution
with youtableranked as (
select f1.id,
case when (select count(*) from yourtable f2 where f2.ID=f1.ID and f2."TIMESTAMP"<f1."TIMESTAMP" and f2.STATUS='I')>0 then 'G' else f1.STATUS end as STATUS,
rownumber() over(partition by f1.id order by f1.TIMESTAMP desc, rrn(f1) desc) rang,
f1."TIMESTAMP"
from yourtable f1
)
select * from youtableranked f0
where f0.rang=1
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"
try this
select distinct f1.id, f4.*
from yourtable f1
inner join lateral
(
select
case when (select count(*) from yourtable f3 where f3.ID=f2.ID and f3."TIMESTAMP"<f2."TIMESTAMP" and f3.STATUS='I')>0 then 'G' else f2.STATUS end as STATUS,
f2."TIMESTAMP"
from yourtable f2 where f2.ID=f1.ID
order by f2."TIMESTAMP" desc, rrn(f2) desc
fetch first rows only
) f4 on 1=1
The rrn(f2) ordering breaks ties when several rows share the same latest timestamp.
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"

Get the maximum values of column B per each distinct value of column A sql

I have this table:
I am trying to pull all records from this table for the max value in the DIST_NO column for every distinct ID in the left most column, but I still want to pull every record for each ID in which there are different Product_ID's as well.
I tried partitioning and using row_number, but I am having trouble at the moment.
Here are my desired results:
This is what my code looks like currently:
select *
from
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DIST_NO DESC) RN
FROM Table) V
WHERE RN<=3
Do you want the max(DIST_NO) for each ID, product_ID?
If so, you can:
SELECT
ID, product_ID, max(DIST_NO)
from table
group by ID, product_ID
If you want the detail rows related to the max row, you just need to join it back to your table:
Select
t.ID, max_dist_no, TRANSaction_ID , LINE_NO , PRODUCT_ID
from
table t inner join
(SELECT
ID, max(DIST_NO) as max_dist_no
from table
group by ID) mx on
t.ID = mx.ID and
t.DIST_NO = max_DIST_NO
Try
SELECT MT.ID
, MT.DIST_NO
, MT.TRANS_ID
, MT.LINE_NO
, MT.PRODUCT_ID
FROM MYTABLE MT
INNER JOIN (
SELECT T.ID, MAX(T.DIST_NO) as DIST_NO FROM MYTABLE T
GROUP BY T.ID
) MAX_MT ON MT.Id = MAX_MT.ID AND MT.DIST_NO = MAX_MT.DIST_NO
The sub query returns each combination of ID and Max value of DIST_NO:
SELECT T.ID, MAX(T.DIST_NO) as DIST_NO FROM MYTABLE T
GROUP BY T.ID
Joining this back to your original table will basically filter your original data-set by only these combinations of values.
Tested on PostgreSQL:
WITH t1 AS (
SELECT id, product_id, MAX(dist_no) AS dist_no
FROM test
GROUP BY 1,2)
SELECT t1.id, t1.dist_no, t2.trans_id, t2.line_no, t1.product_id
FROM test t2, t1
WHERE t1.id=t2.id AND t1.product_id=t2.product_id AND t1.dist_no=t2.dist_no
Use rank() or dense_rank():
select t.*
from (SELECT t.*,
RANK() OVER (PARTITION BY ID ORDER BY DIST_NO DESC) as seqnum
FROM Table t
) t
WHERE seqnum = 1;
This is almost a literal translation of your request:
I am trying to pull all records from this table for the max value in
the DIST_NO column for every distinct ID in the left most column.
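A quick way to try the rank() approach is the sketch below, run on SQLite via Python's sqlite3 module. The table name dist is a placeholder, and the sample rows are the ones used by another answer in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dist (id INTEGER, dist_no INTEGER, trans_id INTEGER,
                       line_no INTEGER, product_id INTEGER);
    INSERT INTO dist VALUES
        (102657, 1, 1105365, 1, 109119),
        (102657, 1, 1105366, 2, 109114),
        (102657, 2, 1105365, 1, 109119),
        (102657, 2, 1105366, 2, 109114),
        (104371, 1, 1190538, 1, 110981),
        (104371, 2, 1190538, 1, 110981);
""")

# rank() gives every row tied on the highest dist_no the value 1,
# so all rows for the maximum are kept, not just one of them.
top_rows = conn.execute("""
    SELECT id, dist_no, trans_id, line_no, product_id
    FROM (
        SELECT d.*,
               RANK() OVER (PARTITION BY id ORDER BY dist_no DESC) AS seqnum
        FROM dist d
    ) ranked
    WHERE seqnum = 1
    ORDER BY id, line_no
""").fetchall()

for row in top_rows:
    print(row)
```

Using row_number() instead of rank() would return only one row per ID, which is why the asker's original attempt dropped the second product.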
You can try something like this one :). (But is your result correct? I think there is a little mistake in TRANS_ID...)
DECLARE @ExampleTable TABLE
(ID INT,
DIST_NO INT,
TRANS_ID INT,
LINE_NO INT,
PRODUCT_ID INT)
INSERT INTO @ExampleTable
( ID, DIST_NO, TRANS_ID,LINE_NO, PRODUCT_ID )
VALUES ( 102657, 1, 1105365, 1, 109119 ),
( 102657, 1, 1105366, 2, 109114 ),
( 102657, 2, 1105365, 1, 109119 ),
( 102657, 2, 1105366, 2, 109114 ),
( 104371, 1, 1190538, 1, 110981 ),
( 104371, 2, 1190538, 1, 110981 )
;WITH CTE AS ( SELECT DISTINCT ID, LINE_NO
FROM @ExampleTable)
SELECT a.ID,
x.DIST_NO,
x.TRANS_ID,
x.LINE_NO,
x.PRODUCT_ID
FROM CTE a
CROSS APPLY (SELECT TOP 1 *
FROM @ExampleTable f
WHERE a.ID = f.ID AND
a.LINE_NO = f.LINE_NO
ORDER BY DIST_NO DESC) x

How to get the first not null value from a column of values in Big Query?

I am trying to extract the first not null value from a column of values based on timestamp. Can somebody share your thoughts on this. Thank you.
What have I tried so far?
FIRST_VALUE( column ) OVER ( PARTITION BY id ORDER BY timestamp)
Input :-
id,column,timestamp
1,NULL,10:30 am
1,NULL,10:31 am
1,'xyz',10:32 am
1,'def',10:33 am
2,NULL,11:30 am
2,'abc',11:31 am
Output(expected) :-
1,'xyz',10:30 am
1,'xyz',10:31 am
1,'xyz',10:32 am
1,'xyz',10:33 am
2,'abc',11:30 am
2,'abc',11:31 am
You can modify your sql like this to get the data you want.
FIRST_VALUE( column )
OVER (
PARTITION BY id
ORDER BY
CASE WHEN column IS NULL then 0 ELSE 1 END DESC,
timestamp
)
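To check that ordering trick against the sample input, here is a sketch on SQLite via Python's sqlite3 module rather than BigQuery. The table name readings and the column names val and ts are stand-ins, since column and timestamp are awkward as identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (id INTEGER, val TEXT, ts TEXT);
    INSERT INTO readings VALUES
        (1, NULL, '10:30'), (1, NULL, '10:31'), (1, 'xyz', '10:32'), (1, 'def', '10:33'),
        (2, NULL, '11:30'), (2, 'abc', '11:31');
""")

# Non-NULL rows sort first (flag 1 before 0), then by time, so the first
# row of every partition is the earliest non-NULL value.
filled = conn.execute("""
    SELECT id,
           FIRST_VALUE(val) OVER (
               PARTITION BY id
               ORDER BY CASE WHEN val IS NULL THEN 0 ELSE 1 END DESC, ts
           ) AS first_val,
           ts
    FROM readings
    ORDER BY id, ts
""").fetchall()

for row in filled:
    print(row)
```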
Try this old trick of string manipulation:
Select
ID,
Column,
ttimestamp,
LTRIM(Right(CColumn,20)) as CColumn,
FROM
(SELECT
ID,
Column,
ttimestamp,
MIN(Concat(RPAD(IF(Column is null, '9999999999999999',STRING(ttimestamp)),20,'0'),LPAD(Column,20,' '))) OVER (Partition by ID) CColumn
FROM (
SELECT
*
FROM (Select 1 as ID, STRING(NULL) as Column, 0.4375 as ttimestamp),
(Select 1 as ID, STRING(NULL) as Column, 0.438194444444444 as ttimestamp),
(Select 1 as ID, 'xyz' as Column, 0.438888888888889 as ttimestamp),
(Select 1 as ID, 'def' as Column, 0.439583333333333 as ttimestamp),
(Select 2 as ID, STRING(NULL) as Column, 0.479166666666667 as ttimestamp),
(Select 2 as ID, 'abc' as Column, 0.479861111111111 as ttimestamp)
))
As far as I know, Big Query has no options like 'IGNORE NULLS' or 'NULLS LAST'. Given that, this is the simplest solution I could come up with. I would like to see even simpler solutions.
Assuming the input data is in table "original_data",
select w2.id, w1.column, w2.timestamp
from
(select id,column,timestamp
from
(select id,column,timestamp, row_number()
over (partition BY id ORDER BY timestamp) position
FROM original_data
where column is not null
)
where position=1
) w1
right outer join
original_data as w2
on w1.id = w2.id
SELECT id,
(SELECT top(1) column FROM test1 where id=1 and column is not null order by autoID desc) as name
,timestamp
FROM yourTable
Output :-
1,'xyz',10:30 am
1,'xyz',10:31 am
1,'xyz',10:32 am
1,'xyz',10:33 am
2,'abc',11:30 am
2,'abc',11:31 am