Retrieve records from versioned table - sql

This SQL case has been troubling me for a while, and I wanted to ask here what other folks think.
I have a user table that records who owns which vehicle. The same vehicle may be owned by multiple users over time, so there is a column called effective_date that tells from what day the ownership is effective. Two users never own the same vehicle at the same time, but the records are versioned, meaning we can use effective_date to check who owned a vehicle 2 years ago or 5 years ago.
The table has the following columns:
id, version, name, vehicle_id, effective_date. Every change to this table is versioned.
There is another table called accident, which records which vehicle had an accident and when. It is not versioned.
It has id, description, vehicle_id, acc_date.
Now I am trying to select all accidents along with who caused each one. A plain inner join doesn't work here. What I do now is select all rows from the accident table and run a subquery for each row to find the id and version of the user who was responsible. This will be super slow, and I am looking for a more performant way of organizing the data or constructing the query. Right now it runs a subquery for every row it selects from the accident table, because each row has a different accident date. I am OK with running a few queries if there is no easy way of doing it within a single query.
Example
user table:
id | version | name | vehicle_id | effective_date
1 | 1 | A | 1 | 01/10/2021
1 | 2 | A | 2 | 02/10/2021
2 | 1 | B | 1 | 03/10/2021
2 | 2 | B | 2 | 04/10/2021
accident:
id | description | vehicle_id | acc_date
1 | hit1 | 1 | 03/5/2021
2 | hit2 | 1 | 03/15/2021
Result:
user_id | user_version | acc_id | vehicle_id | acc_date
1 | 1 | 1 | 1 | 03/5/2021
2 | 1 | 2 | 1 | 03/15/2021
thanks for your help

To get the latest user at the time of the accident you can use ROW_NUMBER(), partitioning by accident and sorting by descending effective_date. With this ordering the first user listed for each accident is the responsible one.
For example:
select *
from (
  select u.id as user_id, u.version as user_version,
         a.id as acc_id, a.vehicle_id, a.acc_date,
         row_number() over(partition by a.id
                           order by u.effective_date desc) as rn
  from user u
  join accident a on a.vehicle_id = u.vehicle_id
  where u.effective_date <= a.acc_date
) x
where rn = 1
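The ROW_NUMBER() approach can be checked against the sample data from the question. This is only a sketch under a few assumptions: it uses SQLite through Python's sqlite3 module as a stand-in database (window functions need SQLite 3.25 or later), the table is named users rather than user to avoid a name that is reserved in many databases, and the dates are rewritten as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INT, version INT, name TEXT,
                        vehicle_id INT, effective_date TEXT);
    CREATE TABLE accident (id INT, description TEXT,
                           vehicle_id INT, acc_date TEXT);
    -- Sample rows from the question, dates converted to YYYY-MM-DD.
    INSERT INTO users VALUES
        (1, 1, 'A', 1, '2021-01-10'),
        (1, 2, 'A', 2, '2021-02-10'),
        (2, 1, 'B', 1, '2021-03-10'),
        (2, 2, 'B', 2, '2021-04-10');
    INSERT INTO accident VALUES
        (1, 'hit1', 1, '2021-03-05'),
        (2, 'hit2', 1, '2021-03-15');
""")

query = """
    SELECT user_id, user_version, acc_id, vehicle_id, acc_date
    FROM (
        SELECT u.id AS user_id, u.version AS user_version,
               a.id AS acc_id, a.vehicle_id, a.acc_date,
               -- One partition per accident; newest ownership row first.
               ROW_NUMBER() OVER (PARTITION BY a.id
                                  ORDER BY u.effective_date DESC) AS rn
        FROM accident a
        JOIN users u ON u.vehicle_id = a.vehicle_id
                    AND u.effective_date <= a.acc_date
    ) x
    WHERE rn = 1
    ORDER BY acc_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the query returns one row per accident, matching the expected result in the question. An index on (vehicle_id, effective_date) would probably help on a real table, but that depends on the database.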

select user_id, user_version,
       acc_id, vehicle_id, acc_date
from (
  select row_number() over
           (partition by b.id
            order by a.effective_date desc) as sn,
         a.id as user_id, a.version as user_version,
         b.id as acc_id, a.vehicle_id, b.acc_date
  from user a
  inner join accident b
    on a.vehicle_id = b.vehicle_id
   and a.effective_date <= b.acc_date
) x
where sn = 1

Related

Checking conditions per group, and ranking most recent row?

I'm handling a table like so:
Name | Status | Date
Alfred | 1 | Jan 1 2023
Alfred | 2 | Jan 2 2023
Alfred | 3 | Jan 2 2023
Alfred | 4 | Jan 3 2023
Bob | 1 | Jan 1 2023
Bob | 3 | Jan 2 2023
Carl | 1 | Jan 5 2023
Dan | 1 | Jan 8 2023
Dan | 2 | Jan 9 2023
I'm trying to set up a query so I can handle the following:
I'd like to pull the most recent status per Name:
SELECT MAX(Date), Status, Name
FROM test_table
GROUP BY Status, Name
Additionally, I'd like in the same query to be able to pull whether the user has ever had a status of 2, regardless of whether the most recent one is 2 or not:
WITH has_2_table AS (
SELECT DISTINCT Name, TRUE as has_2
FROM test_table
WHERE Status = 2 )
And then maybe join the above with a left join on Name?
But having these as two separate queries and joining them feels clunky to me, especially since I'd like to add additional columns and other checks. Is there a better way to set this up in a single query, or is this the most efficient way?
You said, "I'd like to add additional columns" so I interpret that to mean you would like to Select the entire most recent record and add an 'ever-2' column.
You can either do this by joining two queries, or use window functions. Not knowing Snowflake Cloud Data, I cannot tell you which is more efficient.
Join 2 Queries
Select A.*, Coalesce(B.Ever2, 'No') as Ever2
From (
  Select * From test_table x
  Where date = (Select max(date) From test_table y
                Where x.name = y.name)
) A Left Outer Join (
  Select name, 'Yes' as Ever2 From test_table
  Where status = 2
  Group By name
) B On A.name = B.name
The first subquery can also be written as an Inner Join if correlated subqueries are implemented badly on your platform.
Use of Window Functions
Select * From (
  Select row_number() Over (Partition By name Order By date desc, status desc) as bestrow,
         A.*,
         Coalesce(max(Case When status = 2 Then 'Yes' End) Over (Partition By name Rows Between Unbounded Preceding And Unbounded Following), 'No') as Ever2
  From test_table A
) ranked
Where bestrow = 1
This second query type always reads and sorts the entire test_table so it might not be the most efficient.
Given that you have a different partitioning on the two aggregations, you could try going with window functions instead:
SELECT DISTINCT Name,
MAX(Date) OVER(
PARTITION BY Name, Status
) AS lastdate,
MAX(CASE WHEN Status = 2 THEN 1 ELSE 0 END) OVER(
PARTITION BY Name
) AS status2
FROM tab
I'd like to pull the most recent status per name […] Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2.
Snowflake has sophisticated aggregate functions.
Using group by, we can get the latest status with arrays and check for a given status with boolean aggregation:
select name, max(date) max_date,
get(array_agg(status) within group (order by date desc), 0) last_status,
boolor_agg(status = 2) has_status2
from mytable
group by name
We could also use window functions and qualify:
select name, date as max_date,
status as last_status,
boolor_agg(status = 2) over(partition by name) has_status2
from mytable
qualify rank() over(partition by name order by date desc) = 1
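For readers without access to Snowflake, the window-function idea from the answers above can be tried on the question's sample data. This is a sketch under stated assumptions: it runs on SQLite via Python's sqlite3 module (3.25 or later), which has neither QUALIFY nor boolor_agg, so a subquery with ROW_NUMBER() and a MAX(CASE ...) window stand in for them. The dates are stored as ISO strings, and ties on the same date are broken by taking the higher status, as one of the answers does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_table (name TEXT, status INT, date TEXT);
    -- Sample rows from the question, dates converted to YYYY-MM-DD.
    INSERT INTO test_table VALUES
        ('Alfred', 1, '2023-01-01'), ('Alfred', 2, '2023-01-02'),
        ('Alfred', 3, '2023-01-02'), ('Alfred', 4, '2023-01-03'),
        ('Bob',    1, '2023-01-01'), ('Bob',    3, '2023-01-02'),
        ('Carl',   1, '2023-01-05'),
        ('Dan',    1, '2023-01-08'), ('Dan',    2, '2023-01-09');
""")

query = """
    SELECT name, status AS last_status, date AS max_date, has_status2
    FROM (
        SELECT t.*,
               -- Rank rows per name, newest first.
               ROW_NUMBER() OVER (PARTITION BY name
                                  ORDER BY date DESC, status DESC) AS rn,
               -- 1 if any row for this name has status 2, else 0.
               MAX(CASE WHEN status = 2 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY name) AS has_status2
        FROM test_table t
    ) ranked
    WHERE rn = 1
    ORDER BY name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both window functions partition by name, so further per-name checks can be added as extra columns in the inner select without another join.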

Retrieve last record in a group based on string - DB2

I have a table with transactional data in a DB2 database, and I want to retrieve the last record per location and product. The date is unfortunately stored as a YYYYMMDD string. There is no transaction id or similar field I can key on, and there is no primary key.
DATE | LOCATION | PRODUCT | QTY
20210105 | A | P1 | 4
20210106 | A | P1 | 3
20210112 | A | P1 | 7
20210104 | B | P1 | 3
20210105 | B | P1 | 1
20210103 | A | P2 | 6
20210105 | A | P2 | 5
I want to retrieve results showing the last transaction per location, per product, so the results should be:
DATE | LOCATION | PRODUCT | QTY
20210112 | A | P1 | 7
20210105 | B | P1 | 1
20210105 | A | P2 | 5
I've looked at answers to similar questions but for some reason can't make the jump from an answer that addresses a similar question to code that works in my environment.
Edit: I've tried the code below, taken from an answer to this question. It returns multiple rows for a single location/part combination. I've tried the other answers in that question too, but have not had luck getting them to execute.
SELECT *
FROM t
WHERE DATE > '20210401' AND DATE in (SELECT max(DATE)
FROM t GROUP BY LOCATION) order by PRODUCT desc
Thank you!
You can use ROW_NUMBER(). For example, if your table is called t you can do:
select *
from (
select t.*,
row_number() over(partition by location, product
order by date desc) as rn
from t
) x
where rn = 1
You can use lead() to get the last row before a change:
select t.*
from (select t.*,
lead(date) over (partition by location, product order by date) as next_lp_date,
lead(date) over (order by date) as next_date
from t
) t
where next_lp_date is null or next_lp_date <> next_date
It looks like you just needed to match your keys within the subselect.
SELECT *
FROM t T1
WHERE DATE > '20210401'
AND DATE in (SELECT max(DATE) FROM t T2 WHERE T2.Location = T1.Location and T2.Product=T1.Product)
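The ROW_NUMBER() answer and the correlated-subquery answer can be compared on the sample data from the question. The sketch below assumes SQLite (via Python's sqlite3, version 3.25 or later) in place of DB2, and it leaves out the DATE > '20210401' filter because that filter would exclude every sample row. Fixed-width YYYYMMDD strings sort in the same order as the dates they represent, so MAX() and ORDER BY work on the string column directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (date TEXT, location TEXT, product TEXT, qty INT);
    INSERT INTO t VALUES
        ('20210105', 'A', 'P1', 4), ('20210106', 'A', 'P1', 3),
        ('20210112', 'A', 'P1', 7), ('20210104', 'B', 'P1', 3),
        ('20210105', 'B', 'P1', 1), ('20210103', 'A', 'P2', 6),
        ('20210105', 'A', 'P2', 5);
""")

# Variant 1: number the rows per (location, product), newest first.
window_query = """
    SELECT date, location, product, qty
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY location, product
                                  ORDER BY date DESC) AS rn
        FROM t
    ) x
    WHERE rn = 1
    ORDER BY location, product
"""

# Variant 2: correlated subquery matching on both grouping keys.
subquery_query = """
    SELECT date, location, product, qty
    FROM t t1
    WHERE date = (SELECT MAX(date) FROM t t2
                  WHERE t2.location = t1.location
                    AND t2.product = t1.product)
    ORDER BY location, product
"""

window_rows = conn.execute(window_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
for row in window_rows:
    print(row)
```

The two variants agree here because no (location, product) pair has two rows on its latest date. If such ties exist, the subquery variant returns all tied rows, while ROW_NUMBER() keeps one of them arbitrarily.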

Updating Data having same id but different Data in Row

I have records with the same ID but different data in each row.
When updating, the final result should be the last record for that ID present in the data.
Example
ID | Name | PermanentAddress | CurrentLocation
1 | R1 | INDIA | USA
1 | R1 | INDIA | UK
For ID 1, the record that should be loaded into the database is
1|R1|INDIA|UK
How can this be done in SQL Server for multiple records?
Please understand that SQL Server does not store or fetch data in order of insertion, so to find the latest/last record you need some way to order the records.
This is typically a timestamp column like last_modified_date. Your current table is a prime candidate for a slowly changing dimension type 2, and you should consider implementing it.
See the explanation on the Kimball Group's site.
If you really don't care about order and just need one row for each ID, you can try the query below.
select
ID,
Name,
PermanentAddress,
CurrentLocation
from
(select
*,
row_number() over(partition by id order by (select null)) r
from yourtable)t
where r=1
You can identify the latest row per ID value by:
SELECT ID, NAME, PERMANENTADDRS, CURRENTLOCATION
FROM
(SELECT ID, NAME, PERMANENTADDRS, CURRENTLOCATION,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS RNUM,
COUNT(*) OVER (PARTITION BY ID) AS LATEST_RNUM
FROM YOUR_TABLE) A
WHERE RNUM = LATEST_RNUM;
This takes the highest-numbered row for a given ID value. Without a real ordering column that row is arbitrary, so if the table has a column that defines which record is latest, it should replace (SELECT NULL) in the ORDER BY.
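To illustrate the point about needing an explicit ordering column, here is a minimal sketch. It assumes a hypothetical load_seq column that records the order in which rows were loaded; the original table has no such column, so it would have to be added during the load. The sketch uses SQLite through Python's sqlite3 module rather than SQL Server, but the ROW_NUMBER() query itself is written the same way there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- load_seq is a hypothetical ordering column; the original table has none.
    CREATE TABLE yourtable (
        load_seq INTEGER PRIMARY KEY,   -- auto-assigned 1, 2, ... on insert
        ID INT, Name TEXT, PermanentAddress TEXT, CurrentLocation TEXT
    );
    INSERT INTO yourtable (ID, Name, PermanentAddress, CurrentLocation) VALUES
        (1, 'R1', 'INDIA', 'USA'),
        (1, 'R1', 'INDIA', 'UK');
""")

query = """
    SELECT ID, Name, PermanentAddress, CurrentLocation
    FROM (
        SELECT t.*,
               -- Newest load first within each ID.
               ROW_NUMBER() OVER (PARTITION BY ID
                                  ORDER BY load_seq DESC) AS r
        FROM yourtable t
    ) ranked
    WHERE r = 1
    ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the ordering column in place, the row kept for each ID is deterministic: the one loaded last.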

Optimize SQL Script: getting range value from another table

I believe my script works, but it is not efficient: it takes so long to run that when I run it at work, the whole session is aborted before it finishes.
I basically have 2 tables.
Table A - contains every transaction a person makes
Person's_ID Transaction TransactionDate
---------------------------------------
123 A 01/01/2017
345 B 04/06/2015
678 C 13/07/2015
123 F 28/10/2016
Table B - contains the person's ID and GraduationDate
What I want to do is check if a person is active.
Active = there is at least 1 transaction done by the person within 1 month before his GraduationDate
The run time is too long because, imagine, I have millions of persons, each person makes multiple transactions, and these transactions are recorded line by line in Table A.
SELECT
PERSON_ID
FROM
(SELECT PERSON_ID, TRANSACTIONDATE FROM TABLE_A) A
LEFT JOIN
(SELECT CIN, GRAD_DATE FROM TABLE_B) B
ON A.PERSON_ID = B.PERSON_ID
AND TRANSACTIONDATE <= GRAD_DATE
WHERE TRANSACTIONDATE BETWEEN GRAD_DATE - INTERVAL '30' DAY AND GRAD_DATE;
*Table A and B are products of joined tables hence they are subqueried.
If you just want active customers, I would try exists:
SELECT PERSON_ID
FROM TABLE_A A
WHERE EXISTS (SELECT 1
FROM TABLE_B B
WHERE A.PERSON_ID = B.PERSON_ID AND
A.TRANSACTIONDATE BETWEEN B.GRAD_DATE - INTERVAL '30' DAY AND B.GRAD_DATE
);
The performance, though, is likely to be similar to your query. If the tables were really tables, I would suggest indexes. In reality, you will probably need to understand the views (so you can create better indexes) or perhaps use temporary tables.
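The EXISTS idea can be tried on a small data set. This is a sketch under several assumptions: the graduation dates below are made up for illustration (the question does not list Table B's rows), SQLite via Python's sqlite3 stands in for the real database, and dates are ISO strings so that SQLite's date() function and string comparison can be used. Unlike the query above, this variant drives the outer query from TABLE_B, which is a design choice so that each person appears at most once in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (person_id INT, txn TEXT, transactiondate TEXT);
    CREATE TABLE table_b (person_id INT, grad_date TEXT);
    -- Transactions from the question (DD/MM/YYYY rewritten as YYYY-MM-DD).
    INSERT INTO table_a VALUES
        (123, 'A', '2017-01-01'),
        (345, 'B', '2015-06-04'),
        (678, 'C', '2015-07-13'),
        (123, 'F', '2016-10-28');
    -- Hypothetical graduation dates, invented for this example.
    INSERT INTO table_b VALUES
        (123, '2017-01-15'),
        (345, '2016-01-01'),
        (678, '2015-07-20');
""")

query = """
    SELECT b.person_id
    FROM table_b b
    WHERE EXISTS (
        SELECT 1
        FROM table_a a
        WHERE a.person_id = b.person_id
          -- Transaction falls in the 30 days up to and including graduation.
          AND a.transactiondate BETWEEN date(b.grad_date, '-30 days')
                                    AND b.grad_date
    )
    ORDER BY b.person_id
"""
active = [row[0] for row in conn.execute(query)]
print(active)
```

Whether this runs faster than the join on the real data depends on the optimizer and on indexes, as the answer above notes.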
A non-equi-join might be quite inefficient (no matter if it's coded as a join or an Exists), but the logic can be rewritten to:
SELECT
PERSON_ID
FROM
( -- combine both Selects
SELECT 0 AS flag, -- indicating source table
PERSON_ID, TRANSACTIONDATE AS dt
FROM TABLE_A
UNION ALL
SELECT 1 AS flag,
PERSON_ID, GRAD_DATE
FROM TABLE_B
) A
QUALIFY
flag = 1 -- only return a row from table B
AND Min(dt) -- if the previous row (from table A) is within 30 days
Over (PARTITION BY PERSON_ID
ORDER BY dt, flag
ROWS BETWEEN 1 Preceding AND 1 Preceding) >= dt - 30
This assumes that there's only one row from table B per person (otherwise the previous row might be another table B row); if that does not hold, the MIN has to be changed to:
AND MAX(CASE WHEN flag = 0 THEN dt END) -- if the latest previous row from table A is within 30 days
Over (PARTITION BY PERSON_ID
ORDER BY dt, flag
ROWS UNBOUNDED Preceding) >= dt - 30

Oracle - With a one to many relationship, select distinct rows based on a min value

This question is the same as In one to many relationship, return distinct rows based on MIN value with the exception that I'd like to see what the answer looks like in other dialects, particularly in Oracle.
Reposting from the original description:
Let's say a patient makes many visits. I want to write a query that returns distinct patient rows based on their earliest visit. For example, consider the following rows.
patients
-------------
id name
1 Bob
2 Jim
3 Mary
visits
-------------
id patient_id visit_date reference_number
1 1 6/29/14 09f3be26
2 1 7/8/14 34c23a9e
3 2 7/10/14 448dd90a
What I want to see returned by the query is:
id name first_visit_date reference_number
1 Bob 6/29/14 09f3be26
2 Jim 7/10/14 448dd90a
In the other question, using postgresql, the best solution seemed to be to use distinct on, but that is not available in other dialects.
Typically, one uses row_number():
select id, name, visit_date as first_visit_date, reference_number
from (select p.id, p.name, v.visit_date, v.reference_number,
             row_number() over (partition by p.id order by v.visit_date) as seqnum
      from visits v join
           patients p
           on v.patient_id = p.id
     ) t
where seqnum = 1;
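The row_number() pattern can be checked against the sample rows from the question. This sketch makes two assumptions: it uses SQLite via Python's sqlite3 (3.25 or later) instead of Oracle, and it stores visit_date as an ISO string so that ascending order is chronological. Ordering ascending gives the earliest visit row number 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INT, name TEXT);
    CREATE TABLE visits (id INT, patient_id INT,
                         visit_date TEXT, reference_number TEXT);
    INSERT INTO patients VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Mary');
    -- Dates from the question rewritten as YYYY-MM-DD.
    INSERT INTO visits VALUES
        (1, 1, '2014-06-29', '09f3be26'),
        (2, 1, '2014-07-08', '34c23a9e'),
        (3, 2, '2014-07-10', '448dd90a');
""")

query = """
    SELECT id, name, visit_date AS first_visit_date, reference_number
    FROM (
        SELECT p.id, p.name, v.visit_date, v.reference_number,
               -- Earliest visit per patient gets seqnum 1.
               ROW_NUMBER() OVER (PARTITION BY p.id
                                  ORDER BY v.visit_date) AS seqnum
        FROM visits v
        JOIN patients p ON v.patient_id = p.id
    ) t
    WHERE seqnum = 1
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Mary has no visits, so the inner join leaves her out, which matches the expected output in the question.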