How to optimize this query? - sql

I have tables:
A (ID_A, VALID_FROM, DATA ...)
B (ID_B, ID, T1, T2, T3, DATE)
Table A can contain historical data (e.g. data valid for a given period).
I need to select records from table B joined with the appropriate records from table A (from table A I need the row where b.id = a.id_a and the record was valid at b.date).
select *
from B, (select * from (select * from A where a.id_a = b.id and a.valid_from <= b.date order by valid_from desc) where rownum = 1)
where b.id = a.id_a

Sounds like you're looking for a JOIN: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/queries006.htm

This isn't much more optimal, but is probably more readable:
select *
from A a, B b
Where
a.id_a = b.id
and a.valid_from = (select max(valid_from)
from A
where id_a = b.id
and valid_from <= b.date)
order by valid_from desc
I've seen this problem before, and the best way I know of to optimise it is to put a valid_to column onto table A.
For the latest record, this should contain the biggest date Oracle can handle.
Whenever you create a newer version of the record, update the previous version's valid_to with the time the new record is created (minus a millisecond to avoid overlaps), so you have something like this:
ID  Valid_from                Valid_to
1   01/01/2011 12.34.56.0000  02/01/2011 12.34.56.0000
1   02/01/2011 12.34.56.0001  03/01/2011 12.34.56.0000
1   03/01/2011 12.34.56.0001  31/12/9999 23.59.59.9999
Then you can query it like this:
select *
from A a, B b
Where
a.id_a = b.id
and b.date between a.valid_from and a.valid_to
order by valid_from desc
With an index on the date columns, the performance should be OK.
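The bookkeeping described above (close the open version, then insert the new one) could look roughly like the sketch below. It uses Python's sqlite3 module only so that it runs anywhere; the helper add_version, the constant OPEN_END, the column name dt (standing in for b.date) and the day-level granularity are all assumptions for illustration, not part of the original schema. In Oracle you would use real DATE/TIMESTAMP columns and a smaller gap.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed sentinel meaning "still valid"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id_a INTEGER, valid_from TEXT, valid_to TEXT, data TEXT);
    CREATE INDEX a_validity ON a (id_a, valid_from, valid_to);
    CREATE TABLE b (id_b INTEGER, id INTEGER, dt TEXT);
""")


def add_version(conn, id_a, valid_from, data):
    """Close the currently open version of id_a, then insert the new one."""
    # The previous version ends one day before the new one starts (no overlap).
    conn.execute(
        "UPDATE a SET valid_to = date(?, '-1 day') WHERE id_a = ? AND valid_to = ?",
        (valid_from, id_a, OPEN_END),
    )
    conn.execute(
        "INSERT INTO a (id_a, valid_from, valid_to, data) VALUES (?, ?, ?, ?)",
        (id_a, valid_from, OPEN_END, data),
    )


for valid_from, data in [("2011-01-01", "v1"), ("2011-02-01", "v2"), ("2011-03-01", "v3")]:
    add_version(conn, 1, valid_from, data)

conn.executemany(
    "INSERT INTO b (id_b, id, dt) VALUES (?, ?, ?)",
    [(10, 1, "2011-01-15"), (11, 1, "2011-02-28"), (12, 1, "2011-06-30")],
)

versions = conn.execute(
    "SELECT valid_from, valid_to FROM a ORDER BY valid_from"
).fetchall()

# Each row of b matches exactly one version of a via the BETWEEN range.
rows = conn.execute("""
    SELECT b.id_b, a.data
    FROM b
    JOIN a ON a.id_a = b.id AND b.dt BETWEEN a.valid_from AND a.valid_to
    ORDER BY b.id_b
""").fetchall()
print(versions)
print(rows)
```

Because the ranges never overlap, the BETWEEN join returns at most one version per row of b, which is what makes the simple range predicate safe.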

I've taken StevieG's answer and expanded on it. Without a valid_to column there are tricky subqueries to write. I would propose using the LEAD analytic function to find the end of the current validity period and work with that. This is an alternative to the subqueries and the valid_to column.
The LEAD analytic function looks over the rows in the current data set and finds the next valid_from date and uses that as the end of the current period.
My query is shown below. It incorporates the sample data you provided, in a with clause.
with table_a as (
select 1 as id, 'XXX1' as data, date '2009-01-01' as valid_from from dual union all
select 1 as id, 'XXX2' as data, date '2009-05-30' as valid_from from dual union all
select 1 as id, 'XXX3' as data, date '2010-01-11' as valid_from from dual union all
select 2 as id, 'YYY' as data, date '1999-01-01' as valid_from from dual
),
table_b as (
select 1 as id, 1 as id_a, date '2009-02-01' as date_col from dual union all
select 2 as id, 2 as id_a, date '2009-09-12' as date_col from dual union all
select 3 as id, 1 as id_a, date '2009-06-30' as date_col from dual
)
select *
from table_b b
join (
select
id,
valid_from,
lead(valid_from, 1, date '9999-12-31') over (partition by a.id order by a.valid_from) as valid_to
from table_a a
) a on (a.id = b.id_a)
where
a.valid_from <= b.date_col and
b.date_col < a.valid_to


Bigquery: WHERE clause using column from outside the subquery

New to Bigquery, and googling could not really point me to the solution of the problem.
I am trying to use a where clause in a subquery to filter and pick the latest row for each other row in the main query. In postgres I'd normally do it like this:
SELECT
*
FROM
table_a AS a
LEFT JOIN LATERAL
(
SELECT
score,
CONCAT( "AB", id ) AS id
FROM
table_b AS b
WHERE
id = a.company_id
and
b.date < a.date
ORDER BY
b.date DESC
LIMIT
1
) ON true
WHERE
id LIKE 'AB%'
ORDER BY
createdAt DESC
so this would essentially run the subquery against each row and pick the latest row from table B based on a given row's date from table A.
So if table A would have a row
id   date
---  ---
12   2021-05-XX
and table B:
id   date        value
---  ---         ---
12   2022-01-XX  99
12   2021-02-XX  98
12   2020-03-XX  97
12   2019-04-XX  96
It would have joined only the row with 2021-02-XX to table a.
In another example, with
Table A:
id   date
---  ---
15   2021-01-XX
Table B:
id   date        value
---  ---         ---
15   2022-01-XX  99
15   2021-02-XX  98
15   2020-03-XX  97
15   2019-04-XX  96
it would join only the row with date: 2020-03-XX, value: 97.
Hope that is clear, not really sure how to write this query to work
Thanks for help!
You can replace some of your correlated sub-select logic with a simple join and a QUALIFY clause.
Try the following:
SELECT *
FROM table_a a
LEFT JOIN table_b b
ON a.id = b.id
WHERE b.date < a.date
QUALIFY ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.date desc) = 1
This should work for both truncated dates (YYYY-MM) as well as full dates (YYYY-MM-DD)
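To check the "latest earlier row" logic against the sample data from the question, here is a small sketch using Python's sqlite3 module. SQLite has no QUALIFY, so the ROW_NUMBER() filter is wrapped in a subquery instead. The XX days are assumed to be 01, the row with id 20 is a made-up extra to show what happens when there is no earlier match, and the date condition is placed in the ON clause (my choice, not the answer's) so that the LEFT JOIN still keeps such rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, date TEXT);
    CREATE TABLE table_b (id INTEGER, date TEXT, value INTEGER);
    -- id 20 is a hypothetical row with no match in table_b
    INSERT INTO table_a VALUES (12, '2021-05-01'), (15, '2021-01-01'), (20, '2021-01-01');
    INSERT INTO table_b VALUES
        (12, '2022-01-01', 99), (12, '2021-02-01', 98),
        (12, '2020-03-01', 97), (12, '2019-04-01', 96),
        (15, '2022-01-01', 99), (15, '2021-02-01', 98),
        (15, '2020-03-01', 97), (15, '2019-04-01', 96);
""")

rows = conn.execute("""
    SELECT id, a_date, b_date, value
    FROM (
        SELECT a.id, a.date AS a_date, b.date AS b_date, b.value,
               -- one partition per row of table_a, newest earlier b row first
               ROW_NUMBER() OVER (
                   PARTITION BY a.id, a.date ORDER BY b.date DESC
               ) AS rn
        FROM table_a AS a
        LEFT JOIN table_b AS b
               ON b.id = a.id AND b.date < a.date
    )
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Partitioning by the columns of table_a (rather than by b.id) keeps one result per row of table_a even if the same id appears there with several dates.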
Something like below should work for your requirements
WITH
latest_record AS (
SELECT
a.id,
value,b.date, a.createdAt
FROM
`gcp-project-name.data-set-name.A` AS a
JOIN
`gcp-project-name.data-set-name.B` b
ON
( a.id = b.id
AND b.date < a.updatedAt )
ORDER BY
b.date DESC
LIMIT
1 )
SELECT
*
FROM
latest_record

Table with 2 dates I want a single list of unique dates

Say for example I have a table with date start and date end
Item 1 10/2/2019 12/2/2019
Item 2 10/2/2019 15/2/2019.
I wish to have a result of
Item 1 10/2/2019
Item 2 12/2/2019
Item 3 15/2/2019
In a single column that I can use for further queries
Can’t think of how to get the desired result
See above
Here is what you asked for if you are using Oracle:
select 'Item ' || ROW_NUMBER() OVER(
ORDER BY date_row
) row_num
, date_row
from (
select start_date as Date_row from table1
union
select end_date as Date_row from table1);
Here is what you asked for if you are using MySQL:
select concat("Item ", ROW_NUMBER() OVER(ORDER BY date_row)) as Item
, date_row
from (
select start_date as Date_row from table1 as t1
union
select end_date as Date_row from table1 as t2
) as test;
Here is what you asked for if you are using Postgres:
select 'Item ' || ROW_NUMBER() OVER(
ORDER BY date_row
) row_num
, date_row
from (
select start_date as Date_row from table1 as t1
union
select end_date as Date_row from table1 as t2) as t3;
Probably this:
SELECT 'startdate' as datekind, startdate
FROM table
UNION
SELECT 'enddate', enddate
FROM table
The kind is optional, but I added it to demo how you would retain knowledge of whether a date was a start or an end. You can add other columns, like ID, in the same way.
If you don't want to squish duplicates, add the word ALL after UNION.
NOTE - the presence of the kind column will influence whether a date is deemed a duplicate of another row or not. This query can still produce repeated dates if one is a start and the other an end. If this is unacceptable, remove the datekind column (and accept that you won't know what they are).
If we're generating a unique list of dates and the lowest item associated:
SELECT x.d, min(x.item) as i
FROM(
SELECT startdate as d, item FROM table
UNION ALL
SELECT enddate, item FROM table
) x
GROUP BY x.d
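The difference between UNION, UNION ALL and the extra kind column can be seen on a tiny data set. This is only a sketch in Python's sqlite3 module; the table name table1 and the columns start_date/end_date follow the earlier answers, the dates are the question's values rewritten in ISO format, and "Item 3" is an invented row added so that one date is both a start and an end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (item TEXT, start_date TEXT, end_date TEXT);
    INSERT INTO table1 VALUES
        ('Item 1', '2019-02-10', '2019-02-12'),
        ('Item 2', '2019-02-10', '2019-02-15'),
        ('Item 3', '2019-02-12', '2019-02-15');  -- hypothetical extra row
""")

# UNION removes duplicate dates.
unique_dates = [r[0] for r in conn.execute("""
    SELECT start_date AS d FROM table1
    UNION
    SELECT end_date FROM table1
    ORDER BY d
""")]

# UNION ALL keeps every occurrence.
all_dates = [r[0] for r in conn.execute("""
    SELECT start_date AS d FROM table1
    UNION ALL
    SELECT end_date FROM table1
    ORDER BY d
""")]

# With the kind column, a date is only a duplicate if the kind matches too.
tagged = conn.execute("""
    SELECT 'startdate' AS datekind, start_date AS d FROM table1
    UNION
    SELECT 'enddate', end_date FROM table1
    ORDER BY d, datekind
""").fetchall()

print(unique_dates)
print(all_dates)
print(tagged)
```

Note how 2019-02-12 shows up twice in the tagged result, once as an end and once as a start, which is the caveat mentioned in the NOTE.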

How to return two values from PostgreSQL subquery?

I have a problem where I need to get the last item across various tables in PostgreSQL.
The following code works and returns me the type of the latest update and when it was last updated.
The problem is, this query needs to be used as a subquery, so I want to select both the type and the last updated value from this query and PostgreSQL does not seem to like this... (Subquery must return only one column)
Any suggestions?
SELECT last.type, last.max FROM (
SELECT MAX(a.updated_at), 'a' AS type FROM table_a a WHERE a.ref = 5 UNION
SELECT MAX(b.updated_at), 'b' AS type FROM table_b b WHERE b.ref = 5
) AS last ORDER BY max LIMIT 1
Query is used like this inside of a CTE;
WITH sql_query as (
SELECT id, name, address, (...other columns),
last.type, last.max FROM (
SELECT MAX(a.updated_at), 'a' AS type FROM table_a a WHERE a.ref = 5 UNION
SELECT MAX(b.updated_at), 'b' AS type FROM table_b b WHERE b.ref = 5
) AS last ORDER BY max LIMIT 1
FROM table_c
WHERE table_c.fk_id = 1
)
The inherent problem is that SQL (all SQL, not just Postgres) requires that a subquery used within a select clause return only a single value. If you think about that restriction for a while, it does make sense. The select clause returns rows with a certain number of columns, and each row.column location is a single position within a grid. You can bend that rule a bit by putting concatenations into a single position (or a single "complex type" like a JSON value), but it remains a single position in that grid regardless.
Here, however, you want 2 separate columns AND you need both columns to come from the same row, so instead of LIMIT 1 I suggest using ROW_NUMBER() to facilitate this:
WITH LastVals as (
SELECT type
, max_date
, row_number() over(order by max_date DESC) as rn
FROM (
SELECT MAX(a.updated_at) AS max_date, 'a' AS type FROM table_a a WHERE a.ref = 5
UNION ALL
SELECT MAX(b.updated_at) AS max_date, 'b' AS type FROM table_b b WHERE b.ref = 5
) AS u
)
, sql_query as (
SELECT id
, name, address, (...other columns)
, (select type from lastVals where rn = 1) as last_type
, (select max_date from lastVals where rn = 1) as last_date
FROM table_c
WHERE table_c.fk_id = 1
)
----
By the way, in your subquery you should use UNION ALL. Because type is a constant like 'a' or 'b', the rows are already unique even if MAX(updated_at) is identical for 2 or more tables. UNION will attempt to remove duplicate rows, but here there are none to remove, so avoid that wasted effort by using UNION ALL.
----
For another way to skin this cat, consider using a LEFT JOIN instead
SELECT id
, name, address, (...other columns)
, LastVals.type
, LastVals.last_date
FROM table_c
LEFT JOIN (
SELECT type
, last_date
, row_number() over(order by last_date DESC) as rn
FROM (
SELECT MAX(a.updated_at) AS last_date, 'a' AS type FROM table_a a WHERE a.ref = 5
UNION ALL
SELECT MAX(b.updated_at) AS last_date, 'b' AS type FROM table_b b WHERE b.ref = 5
) AS u
) LastVals ON LastVals.rn = 1
-- the WHERE clause has to come after the joins
WHERE table_c.fk_id = 1
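For anyone who wants to try the join variant, here is a rough, self-contained sketch run through Python's sqlite3 module. The sample rows, the cut-down table_c (only id, name, fk_id) and the names last_vals and u are assumptions for illustration; the point is just that joining on rn = 1 brings back both type and date from the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
    CREATE TABLE table_a (ref INTEGER, updated_at TEXT);
    CREATE TABLE table_b (ref INTEGER, updated_at TEXT);
    CREATE TABLE table_c (id INTEGER, name TEXT, fk_id INTEGER);
    -- made-up sample rows
    INSERT INTO table_a VALUES (5, '2023-01-10'), (5, '2023-03-01');
    INSERT INTO table_b VALUES (5, '2023-02-20');
    INSERT INTO table_c VALUES (1, 'first', 1), (2, 'second', 1), (3, 'other', 2);
""")

rows = conn.execute("""
    WITH last_vals AS (
        SELECT type, max_date,
               ROW_NUMBER() OVER (ORDER BY max_date DESC) AS rn
        FROM (
            SELECT MAX(updated_at) AS max_date, 'a' AS type FROM table_a WHERE ref = 5
            UNION ALL
            SELECT MAX(updated_at) AS max_date, 'b' AS type FROM table_b WHERE ref = 5
        ) AS u
    )
    SELECT c.id, c.name, lv.type AS last_type, lv.max_date AS last_date
    FROM table_c AS c
    LEFT JOIN last_vals AS lv ON lv.rn = 1   -- only the newest row is joined
    WHERE c.fk_id = 1
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Every selected row of table_c gets the same last_type/last_date pair, since the derived table does not depend on table_c.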

Updating columns in Oracle based on values in other tables

I am fairly new to creating and altering tables in SQL (Oracle) and have a question involving updating one table based on values in others.
Say I have table A:
ID Date Status
--- --- ---
1 1/1/2000 Active
2 5/10/2007 Inactive
2 2/15/2016 Active
3 10/1/2013 Inactive
4 1/11/2004 Inactive
5 4/5/2012 Inactive
5 6/12/2014 Active
and table B:
ID Date Status Number of Records in A
--- --- --- ---
1
2
3
4
5
What is the best way to update table B to get the most recent Date and Status of each item and count of records in A? I know I could join tables but I would like B to exist as its own table.
Oracle lets you assign multiple columns at once in an update statement. So, you can do:
update b
set (dt, status, ct) =
(select max(dt),
max(status) keep (dense_rank first order by dt desc),
count(*)
from a
where a.id = b.id
) ;
You can basically use the subquery -- with a group by -- if you want the results for all ids as a query:
select id, max(dt),
max(status) keep (dense_rank first order by dt desc),
count(*)
from a
group by id;
You can also use create table as or insert into to put the records directly into b, without having to match them up using update.
Something like this. If you already have a table B and you need to populate it with the values from this query, or if you need to create a new table B with these values, adapt as needed. NOTE: I used dt as a column name, since "date" is a reserved word in Oracle. (For the same reason I used "ct" for "count".)
with
table_A ( id, dt, status ) as (
select 1, to_date( '1/1/2000', 'mm/dd/yyyy'), 'Active' from dual union all
select 2, to_date('5/10/2007', 'mm/dd/yyyy'), 'Inactive' from dual union all
select 2, to_date('2/15/2016', 'mm/dd/yyyy'), 'Active' from dual union all
select 3, to_date('10/1/2013', 'mm/dd/yyyy'), 'Inactive' from dual union all
select 4, to_date('1/11/2004', 'mm/dd/yyyy'), 'Inactive' from dual union all
select 5, to_date(' 4/5/2012', 'mm/dd/yyyy'), 'Inactive' from dual union all
select 5, to_date('6/12/2014', 'mm/dd/yyyy'), 'Active' from dual
),
prep ( id, dt, status, rn, ct ) as (
select id, dt, status,
row_number() over (partition by id order by dt desc),
count(*) over (partition by id)
from table_A
)
select id, to_char(dt, 'mm/dd/yyyy') as dt, status, ct
from prep
where rn = 1
;
        ID DT         STATUS           CT
---------- ---------- -------- ----------
         1 01/01/2000 Active            1
         2 02/15/2016 Active            2
         3 10/01/2013 Inactive          1
         4 01/11/2004 Inactive          1
         5 06/12/2014 Active            2
Added: You mentioned you are pretty new at this... so: for example, if you need to create table_B with these results, and table_A already exists and is populated: FIRST, you will not need the "table_A" factored subquery in my solution; and SECOND, you will create table_B with something like
create table table_B as
with
prep ( .....) -- rest of the solution here, up to and including the ;
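As a rough end-to-end illustration of the "create table as" route, the sketch below builds table_B from table_A with Python's sqlite3 module. SQLite is only a stand-in here so the example is runnable: dates are stored as ISO text instead of Oracle DATE values, and the inline subquery plays the role of the prep factored subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
    CREATE TABLE table_A (id INTEGER, dt TEXT, status TEXT);
    INSERT INTO table_A VALUES
        (1, '2000-01-01', 'Active'),
        (2, '2007-05-10', 'Inactive'),
        (2, '2016-02-15', 'Active'),
        (3, '2013-10-01', 'Inactive'),
        (4, '2004-01-11', 'Inactive'),
        (5, '2012-04-05', 'Inactive'),
        (5, '2014-06-12', 'Active');

    -- table_B gets the most recent row per id plus the row count per id
    CREATE TABLE table_B AS
    SELECT id, dt, status, ct
    FROM (
        SELECT id, dt, status,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn,
               COUNT(*)     OVER (PARTITION BY id)                  AS ct
        FROM table_A
    )
    WHERE rn = 1;
""")

rows = conn.execute("SELECT id, dt, status, ct FROM table_B ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Keep in mind that a table created this way is a snapshot: it does not follow later changes to table_A unless you rebuild or update it.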

SQL query: need to get the latest created data in the child records

I have a requirement in which I need to get the latest created data in the child records.
Suppose there are two tables, A and B. A is the parent and B is the child. They have a 1:M relation. Both have some columns, and table B also has a 'created date' column which holds the created date of the record in table B.
Now, I need to write a query which can fetch all records from table A and, for each, its latest created child record from table B. Suppose two child records are created today in table B for a parent record; then the latest one of them should be fetched.
One record of table A could have many children, so how can we achieve this?
Result should be - Columns of tbl A, Columns of tbl B (latest created one)
I hope the 'created date' is a DATETIME column. This would give you the most recent child record. Assuming you have a consistent ID in the parent table with the same ParentID in the child table as a foreign key....
select A.*, B.*
from A
join B on A.ParentID = B.ParentID
join (
select ParentID, max([created date]) as [created date]
from B
group by ParentID
) maxchild on A.ParentID = maxchild.ParentID
where B.ParentID = maxchild.ParentID and B.[created date] = maxchild.[created date]
Below is the query that can help you out.
select x, y from ( select a.coloumn_TAB_A x, b.coloumn_TAB_B y from TableA a ,
TableB b where a.primary_key=b.primary_key
and a.Primary_key ='XYZ' order by b.created_date desc) where rownum < 2
Here we have two tables, A and B, joined on their primary keys and ordered by the created date column of table B in descending order.
Use this output as an inline view for the outer query and select whichever columns you want, like x and y. The condition rownum < 2 keeps only the first row, which is the latest record of table B.
This is not the most efficient but will work (SQL Only):
SELECT [Table_A].[Columns], [Table_B].[Columns]
FROM [Table_A]
LEFT OUTER JOIN [Table_B]
ON [Table_B].ForeignKey = [Table_A].PrimaryKey
AND [Table_B].PrimaryKey = (SELECT TOP 1 [Table_B].PrimaryKey
FROM [Table_B]
WHERE [Table_B].ForeignKey = [Table_A].PrimaryKey
ORDER BY [Table_B].CREATIONDATE DESC)
You can use analytic functions to avoid hitting each table (or specifically B) more than once.
Using CTEs to provide dummy data for A and B you can do this:
with A as (
select 1 as id from dual
union all select 2 from dual
union all select 3 from dual
),
B as (
select 1 as a_id, date '2012-01-01' as created_date, 'First for 1' as value
from dual
union all select 1, date '2012-01-02', 'Second for 1' from dual
union all select 1, date '2012-01-03', 'Third for 1' from dual
union all select 2, date '2012-02-01', 'First for 2' from dual
union all select 2, date '2012-02-03', 'Second for 2' from dual
union all select 3, date '2012-02-01', 'First for 3' from dual
union all select 3, date '2012-02-03', 'Second for 3' from dual
union all select 3, date '2012-02-05', 'Third for 3' from dual
union all select 3, date '2012-02-09', 'Fourth for 3' from dual
)
select id, created_date, value from (
select a.id, b.created_date, b.value,
row_number() over (partition by a.id order by b.created_date desc) as rn
from a
join b on b.a_id = a.id
)
where rn = 1
order by id;
        ID CREATED_D VALUE
---------- --------- ------------
         1 03-JAN-12 Third for 1
         2 03-FEB-12 Second for 2
         3 09-FEB-12 Fourth for 3
You can select any columns you want from A and B, but you'll need to alias them in the subquery if there are any with the same name in both tables.
You may also need to use rank() or dense_rank() instead of row_number() to handle ties appropriately, if you can have child records with the same created date.
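To make the point about ties concrete, here is a small sketch (Python's sqlite3 module, with made-up rows where parent 1 has two children sharing the latest created date). With row_number() one of the tied rows is picked arbitrarily; with rank() both come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, created_date TEXT, value TEXT);
    INSERT INTO a VALUES (1), (2);
    -- hypothetical data: parent 1 has two children created on the same date
    INSERT INTO b VALUES
        (1, '2012-01-02', 'Second for 1'),
        (1, '2012-01-03', 'Third for 1'),
        (1, '2012-01-03', 'Also third for 1'),
        (2, '2012-02-03', 'Second for 2');
""")

# The same query, parameterised only by the ranking function name.
QUERY = """
    SELECT id, value FROM (
        SELECT a.id, b.value,
               {fn}() OVER (PARTITION BY a.id ORDER BY b.created_date DESC) AS rn
        FROM a
        JOIN b ON b.a_id = a.id
    )
    WHERE rn = 1
    ORDER BY id, value
"""

with_row_number = conn.execute(QUERY.format(fn="ROW_NUMBER")).fetchall()
with_rank = conn.execute(QUERY.format(fn="RANK")).fetchall()

print(len(with_row_number), "rows with row_number()")
print(with_rank)
```

If you need exactly one row per parent even with ties, adding a second, unique column to the ORDER BY of row_number() makes the choice deterministic.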