Strange behavior from lag(), skipping over certain rows - sql

I have a table we'll call service, toy version is:
CREATE TABLE service (
    service_id SERIAL PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES client (client_id),
    service_date TIMESTAMP(0) NOT NULL CHECK (service_date::DATE <= CURRENT_DATE),
    -- system fields
    added_by INTEGER NOT NULL REFERENCES staff (staff_id),
    added_at TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    changed_by INTEGER NOT NULL REFERENCES staff (staff_id),
    changed_at TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    is_deleted BOOLEAN NOT NULL DEFAULT FALSE
);
The last five system fields are for database bookkeeping, and to enable soft (front end) deletion.
There's another table service_log which, together with service allows us to view a full revision history for each record in service. Fields of service_log are the same as service, plus a few other fields that facilitate the revision history. On update or insert to service, the state of the data is recorded as an insert to service_log. Obviously service_log.service_id is not a primary key nor is it unique, since each service record may have multiple revisions.
Our database has security features preventing hard (database-level) deletion of records. However, while investigating a bug I wanted to rule out the possibility that there may have been records hard-deleted from service but with evidence still present in service_log. The following query looking for service_log records without service_id present in service returns ~800 records (about 0.5% of all services since 2020):
WITH services_in_context AS (
    SELECT service_id,
           lag(service_id) OVER (ORDER BY service_id) AS prev_id
    FROM service
    WHERE service_date::DATE >= '1/1/2020'::DATE
)
SELECT DISTINCT log.service_id
FROM service_log log
INNER JOIN services_in_context s
    ON log.service_id BETWEEN s.prev_id + 1 AND s.service_id - 1
ORDER BY log.service_id DESC;
However, the anomaly goes away when I restructure the query to use GENERATE_SERIES(). And the "missing" rows from service are actually there when I query them individually.
All this leads me to believe that lag(service_id) OVER (ORDER BY service_id) is skipping some records in the table, which makes me think the index Postgres created for the primary key may be corrupt. Is this the likely culprit, and if so, what is the best way to fix it? Or is there potentially a different reason lag() is missing some primary keys?

What is the reason for the WHERE clause, and what is the service_date of the records returned? Your WHERE clause can easily create a gap: lag() only sees the rows that survive the filter, so any service_id whose service_date falls before 2020 shows up as a "hole" between two consecutive filtered rows. Try it without the filter:
WITH services_in_context AS (
    SELECT service_id,
           lag(service_id) OVER (ORDER BY service_id) AS prev_id
    FROM service
)
SELECT DISTINCT log.service_id
FROM service_log log
INNER JOIN services_in_context s
    ON log.service_id BETWEEN s.prev_id + 1 AND s.service_id - 1
ORDER BY log.service_id DESC;
Your requirement can also be more easily satisfied by a simple LEFT JOIN.
SELECT sl.*
FROM service_log sl
LEFT JOIN service s ON s.service_id = sl.service_id
WHERE s.service_id IS NULL
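The same anti-join can also be written with NOT EXISTS, which the PostgreSQL planner can usually treat the same way; a sketch using the same tables:
-- Equivalent anti-join with NOT EXISTS; lists log rows with no matching service row
SELECT DISTINCT sl.service_id
FROM service_log sl
WHERE NOT EXISTS (
    SELECT 1
    FROM service s
    WHERE s.service_id = sl.service_id
)
ORDER BY sl.service_id DESC;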

Related

SSIS incremental data load error

I am trying to perform an incremental insert from a staging table (cust_reg_dim_stg) to the warehouse table (dim_cust_reg). This is the query I am using.
insert into dim_cust_reg WITH(TABLOCK)
(
channel_id
,cust_reg_id
,cust_id
,status
,date_created
,date_activated
,date_archived
,custodian_id
,reg_type_id
,reg_flags
,acc_name
,acc_number
,sr_id
,sr_type
,as_of_date
,ins_timestamp
)
select channel_id
,cust_reg_id
,cust_id
,status
,date_created
,date_activated
,date_archived
,custodian_id
,reg_type_id
,reg_flags
,acc_name
,acc_number
,sr_id
,sr_type
,as_of_date
,getdate() ins_timestamp
from umpdwstg..cust_reg_dim_stg stg with(nolock)
join lookup_channel ch with(nolock) on stg.channel_name = ch.channel_name
where not exists
(select * from dim_cust_reg dest
where dest.cust_reg_id=stg.cust_reg_id
and dest.sr_id=stg.sr_id
and dest.channel_id=ch.channel_id )
Here channel_id is not present in the staging table and is taken from a channel lookup table (lookup_channel). On running this query I get the following error.
Violation of PRIMARY KEY constraint 'PK__dim_cust__4A293521A789A5FA'.
Cannot insert duplicate key in object 'dbo.dim_cust_reg'.
What is wrong with the query? channel_id, sr_id, and cust_reg_id form the unique key combination. There seems to be no data error.
There are two areas you will need to troubleshoot.
In this code below:
join lookup_channel ch with(nolock) on stg.channel_name = ch.channel_name
The incoming channel_name in the staging table may differ from the channel name on the record in the destination dimension.
OR
it may be because of this join condition inside the NOT EXISTS condition:
and dest.sr_id=stg.sr_id
and dest.channel_id=ch.channel_id
Here, again, the incoming channel_id may differ between the staged data and the destination. My suggestion is to leave channel_id out of the comparison once and troubleshoot from there. Once the data is loaded in the target you can determine whether the error was because of channel_id.
Happy troubleshooting!
If there are already duplicate entries in the staging table cust_reg_dim_stg, the SELECT query will return both records and try to insert both into dim_cust_reg. Use DISTINCT in the SELECT statement.
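If you need to keep one row per key rather than relying on DISTINCT across every column, here is a hedged sketch using ROW_NUMBER(), assuming cust_reg_id, sr_id, and channel_name identify a logical staging row:
-- keep one arbitrary row per natural key; change the ORDER BY to control which duplicate wins
WITH dedup AS (
    SELECT stg.*,
           ROW_NUMBER() OVER (PARTITION BY stg.cust_reg_id, stg.sr_id, stg.channel_name
                              ORDER BY (SELECT NULL)) AS rn
    FROM umpdwstg..cust_reg_dim_stg stg
)
SELECT *  -- substitute the explicit column list from the INSERT above
FROM dedup
WHERE rn = 1;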

PostgreSQL: Get last updates by joining 2 tables

I have 2 tables that I need to join to get the last/latest update in the 2nd table based on valid rows in the 1st table.
The code below is an example.
Table 1: Registered users
This table contains a list of users registered in the system.
When a user is registered, they are added to this table with a name and a registration time.
A user can get de-registered from the system. When this is done, the de-registration column gets updated to the time that the user was removed. If this value is NULL, it means that the user is still registered.
CREATE TABLE users (
    entry_idx SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    reg_time TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    dereg_time TIMESTAMP WITH TIME ZONE DEFAULT NULL
);
Table 2: User updates
This table contains updates on the users. Each time a user changes a property (for example, position) the change gets stored in this table. No updates may be removed, since there is a requirement to keep history in the table.
CREATE TABLE user_updates (
    entry_idx SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    position INTEGER NOT NULL,
    time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
Required output
So given the above information, I need a result that contains only the last update for each currently registered user.
Test Data
The following data can be used as test data for the above tables:
-- Register 3 users
INSERT INTO users(name) VALUES ('Person1');
INSERT INTO users(name) VALUES ('Person2');
INSERT INTO users(name) VALUES ('Person3');
-- Add some updates for all users
INSERT INTO user_updates(name, position) VALUES ('Person1', 0);
INSERT INTO user_updates(name, position) VALUES ('Person1', 1);
INSERT INTO user_updates(name, position) VALUES ('Person1', 2);
INSERT INTO user_updates(name, position) VALUES ('Person2', 1);
INSERT INTO user_updates(name, position) VALUES ('Person3', 1);
-- Unregister the 2nd user
UPDATE users SET dereg_time = NOW() WHERE name = 'Person2';
From the above, I want the last updates for Person 1 and Person 3.
Failed attempt
I have tried using joins and other methods but the results are not what I am looking for. The question is almost the same as one asked here. I have used the solution in answer 1 and it does give the correct answer, but it takes too long to get to the answer in my system.
Based on the above link I have created the following query that 'works':
SELECT t1.*, t2.*
FROM users t1
JOIN (
    SELECT t.*,
           row_number() OVER (PARTITION BY t.name
                              ORDER BY t.entry_idx DESC) rn
    FROM user_updates t
) t2
    ON t1.name = t2.name
    AND t2.rn = 1
WHERE t1.dereg_time IS NULL;
Problem
The problem with the above query is that it takes very long to complete. Table 1 contains a small list of users, while table 2 contains a huge number of updates. I think the query might be inefficient in the way it handles the two tables (based on my limited understanding of it). From pgAdmin's EXPLAIN, it does a lot of sorting and aggregation on the updates first, before joining with the registered users.
Question
How can I formulate a query to efficiently and fast get the latest updates for registered users?
PostgreSQL has special DISTINCT ON syntax for this type of query:
select distinct on (t1.name)
    -- it's better to specify columns explicitly; * is just for the example
    t1.*, t2.*
from users as t1
left outer join user_updates as t2 on t2.name = t1.name
where t1.dereg_time is null
order by t1.name, t2.entry_idx desc;
sql fiddle demo
You can try it, but your original query should work fine too.
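If performance is still the concern, a composite index matching the ORDER BY lets the DISTINCT ON query read each user's latest update without a full sort. A sketch; the index name is illustrative:
-- supports ORDER BY name, entry_idx DESC in the DISTINCT ON query
create index user_updates_name_entry_idx on user_updates (name, entry_idx desc);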
I am using q1 to get the last update time for each user, joining with users to remove entries that have been deregistered, then joining with q2 to get the rest of the user_updates fields.
select users.*, q2.*
from users
join (select name, max(time) t from user_updates group by name) q1
    on users.name = q1.name
join user_updates q2
    on q1.t = q2.time and q1.name = q2.name
where users.dereg_time is null
(I haven't tested it; I have edited some things.)
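Another option on PostgreSQL 9.3+ is a LATERAL join, which fetches exactly one latest row per registered user. A sketch, not from the original answers:
-- one index-assisted lookup per registered user instead of ranking every update
select u.*, uu.*
from users u
cross join lateral (
    select *
    from user_updates
    where name = u.name
    order by entry_idx desc
    limit 1
) uu
where u.dereg_time is null;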

SQL Database Design for SSIS

OK my first question so here goes.
Currently users are using a huge Access Application. They wanted a web application with some functionality based off of the Access data and with some modifications.
Ok no problem. I used the Access to SQL migration assistant to convert the data over and then wrote some SSIS packages which are executed from the web end to allow the application to be updated as needed. All here is good.
Here is where I am kind of stumped. There are two types of imports, quarterly and yearly. The quarterly is fine, but the yearly import is causing issues. The yearly import can be for an adopted budget or for a proposed budget (each is held in a separate Access db). I have one SSIS package for each type of yearly import. The table where the information goes is as follows:
CREATE TABLE Budget
(
    BudgetID uniqueidentifier NOT NULL,
    ProjectNumber int NOT NULL,
    SubProjectNumber varchar(6) NOT NULL,
    FiscalYearBegin int NOT NULL,
    FiscalYearEnd int NOT NULL,
    Sequence int NULL,
    QuarterImportDate datetime NULL,
    ProposedBudget money NULL,
    AdoptedBudget money NULL,
    CONSTRAINT PK_Budget PRIMARY KEY CLUSTERED
    (
        BudgetID ASC
    ),
    CONSTRAINT uc_Budget UNIQUE NONCLUSTERED
    (
        ProjectNumber ASC,
        SubProjectNumber ASC,
        FiscalYearBegin ASC,
        FiscalYearEnd ASC,
        Sequence ASC
    )
)
Also, there can be multiple versions of the budget for a specific year in terms of Project, SubProject, FiscalYearBegin, and FiscalYearEnd. That is why there is a sequence number.
So the problem becomes: since I have two different SSIS packages, each of which updates one specific column (either ProposedBudget or AdoptedBudget), I have no effective way of keeping track of the correct sequence.
Please let me know if I can make this any clearer, and any advice would be great!
Thanks.
Something like this will get you the next item with an empty AdoptedBudget, but I think you will need a cursor when there are multiple AdoptedBudgets. I was thinking of a nested subquery with an update, but that won't work when there are multiple AdoptedBudgets either. It sounds like in the source application they should select a ProposedBudget whenever they add the AdoptedBudget, so that a relationship can be created; that way it is clear which AdoptedBudget goes with which ProposedBudget, and it would be a simple join. I have almost the same scenario, except that I don't keep all the versions: I only keep the most current ProposedBudget and the most current AdoptedBudget. It's a little more difficult trying to sequence them all.
-- get the smallest SequenceID with an unfilled AdoptedBudget
Select min(SequenceID) as MinSequenceID,
       ProjectNumber,
       FiscalYearBegin,
       SubProjectNumber -- any other fields needed for the join
From Budgets b
Where AdoptedBudget is null
Group By ProjectNumber,
         FiscalYearBegin,
         SubProjectNumber -- any other fields needed for the join
-- This won't work, I don't believe
Update b
Set AdoptedBudget = ab.BudgetAmount
From Budgets b
Inner Join SourceAdoptedBudgets ab
    on  b.ProjectNumber = ab.ProjectNumber
    and b.FiscalYearBegin = ab.FiscalYearBegin
    and b.FiscalYearEnd = ab.FiscalYearEnd
Inner Join
(
    -- get the smallest SequenceID with an unfilled AdoptedBudget
    Select min(SequenceID) as MinSequenceID,
           ProjectNumber,
           FiscalYearBegin,
           SubProjectNumber -- any other fields needed for the join
    From Budgets
    Where AdoptedBudget is null
    Group By ProjectNumber,
             FiscalYearBegin,
             SubProjectNumber -- any other fields needed for the join
) as nextBudgets
    on  -- the join fields again
        b.ProjectNumber = nextBudgets.ProjectNumber
    and b.FiscalYearBegin = nextBudgets.FiscalYearBegin
    and b.SubProjectNumber = nextBudgets.SubProjectNumber
Something like this, using a BudgetType. Of course you'd probably create a code table for these, or an IsAdopted bit field, but you get the idea.
Select
    budgets.*
    ,row_number() over(partition by ProjectNumber
                                    ,SubProjectNumber
                                    ,FiscalYearBegin
                                    ,FiscalYearEnd
                       order by QuarterImportDate) as SequenceNumber
From
(
    Select
        ProjectNumber
        ,SubProjectNumber
        ,FiscalYearBegin
        ,FiscalYearEnd
        ,QuarterImportDate
        ,'Proposed' as BudgetType
        ,ProposedBudget as Budget
    From sourceProposed
    Union
    Select
        ProjectNumber
        ,SubProjectNumber
        ,FiscalYearBegin
        ,FiscalYearEnd
        ,QuarterImportDate
        ,'Adopted' as BudgetType
        ,AdoptedBudget as Budget
    From sourceAdopted
) as budgets
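One small note on the sketch above: because the two branches carry different BudgetType values, their rows can never collide, so UNION ALL would return the same result while skipping the implicit duplicate-elimination sort that UNION performs.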

How can I calculate the top % daily price changes using MySQL?

I have a table called prices which includes the closing price of stocks that I am tracking daily.
Here is the schema:
CREATE TABLE `prices` (
`id` int(21) NOT NULL auto_increment,
`ticker` varchar(21) NOT NULL,
`price` decimal(7,2) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ticker` (`ticker`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=2200 ;
I am trying to calculate the % price drop for anything that has a price value greater than 0 for today and yesterday. Over time, this table will be huge and I am worried about performance. I assume this will have to be done on the MySQL side rather than PHP because LIMIT will be needed here.
How do I take the last 2 dates and do the % drop calculation in MySQL though?
Any advice would be greatly appreciated.
One problem I see right off the bat is using a timestamp data type for the date; this will complicate your SQL query for two reasons. First, you will have to use a range or convert to an actual date in your WHERE clause. More importantly, since you state that you are interested in today's closing price and yesterday's closing price, you will have to keep track of the days when the market is open - so Monday's query is different from the Tuesday-through-Friday query, and any day the market is closed for a holiday will have to be accounted for as well.
I would add a column like mktDay and increment it each day the market is open for business. Another approach might be to include a 'previousClose' column which makes your calculation trivial. I realize this violates normal form, but it saves an expensive self join in your query.
If you cannot change the structure, then you will do a self join to get yesterday's close and you can calculate the % change and order by that % change if you wish.
Below is Eric's code, cleaned up a bit; it executed on my server running MySQL 5.0.27.
select
p_today.`ticker`,
p_today.`date`,
p_yest.price as `open`,
p_today.price as `close`,
((p_today.price - p_yest.price)/p_yest.price) as `change`
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.`date`) = date(p_yest.`date`) + INTERVAL 1 DAY
and p_today.price > 0
and p_yest.price > 0
and date(p_today.`date`) = CURRENT_DATE
order by `change` desc
limit 10
Note the back-ticks as some of your column names and Eric's aliases were reserved words.
Also note that using a WHERE clause for the first table makes for a less expensive query - the WHERE gets executed first, and the self join only has to consider rows with a price greater than zero and today's date:
select
p_today.`ticker`,
p_today.`date`,
p_yest.price as `open`,
p_today.price as `close`,
((p_today.price - p_yest.price)/p_yest.price) as `change`
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.`date`) = date(p_yest.`date`) + INTERVAL 1 DAY
and p_yest.price > 0
where p_today.price > 0
and date(p_today.`date`) = CURRENT_DATE
order by `change` desc
limit 10
Scott brings up a great point about consecutive market days. I recommend handling this with a connector table like:
CREATE TABLE `market_days` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL DEFAULT '0000-00-00',
PRIMARY KEY USING BTREE (`market_day`),
UNIQUE KEY USING BTREE (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
As more market days elapse, just INSERT new date values in the table. market_day will increment accordingly.
When inserting prices data, look up LAST_INSERT_ID() or the market_day value corresponding to a given date for past values.
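A minimal sketch of that bookkeeping (the date literal is illustrative):
-- add today as a trading day; market_day takes the next AUTO_INCREMENT value
INSERT INTO market_days (`date`) VALUES (CURRENT_DATE);
-- resolve the market_day for a given calendar date when loading past prices
SELECT market_day FROM market_days WHERE `date` = '2009-03-02';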
As for the prices table itself, you can make storage, SELECT, and INSERT operations much more efficient with a useful PRIMARY KEY and no AUTO_INCREMENT column. In the schema below, your PRIMARY KEY contains intrinsically useful information and isn't just a convention to identify unique rows. Using MEDIUMINT (3 bytes) instead of INT (4 bytes) saves a byte per row for each column - and, more importantly, 2 bytes per row in the PRIMARY KEY - while still affording over 16 million possible dates and ticker symbols (each).
CREATE TABLE `prices` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`price` decimal (7,2) NOT NULL DEFAULT '00000.00',
PRIMARY KEY USING BTREE (`market_day`,`ticker_id`),
KEY `ticker_id` USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
;
In this schema each row is unique across each pair of market_day and ticker_id. Here ticker_id corresponds to a list of ticker symbols in a tickers table with a similar schema to the market_days table:
CREATE TABLE `tickers` (
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`ticker_symbol` VARCHAR(5),
`company_name` VARCHAR(50),
/* etc */
PRIMARY KEY USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
This yields a similar query to the others proposed, but with two important differences: 1) there's no functional transformation on the date column (which would destroy MySQL's ability to use keys on the join); in the query below MySQL will use part of the PRIMARY KEY to join on market_day. 2) MySQL can only use one key per JOIN or WHERE clause. In this query MySQL will use the full width of the PRIMARY KEY (market_day and ticker_id), whereas in the previous queries it could only use one (MySQL will usually pick the more selective of the two).
SELECT
`market_days`.`date`,
`tickers`.`ticker_symbol`,
`yesterday`.`price` AS `close_yesterday`,
`today`.`price` AS `close_today`,
(`today`.`price` - `yesterday`.`price`) / (`yesterday`.`price`) AS `pct_change`
FROM
`prices` AS `today`
LEFT JOIN
`prices` AS `yesterday`
ON /* uses PRIMARY KEY */
`yesterday`.`market_day` = `today`.`market_day` - 1 /* this will join NULL for `today`.`market_day` = 0 */
AND
`yesterday`.`ticker_id` = `today`.`ticker_id`
INNER JOIN
`market_days` /* uses first 3 bytes of PRIMARY KEY */
ON
`market_days`.`market_day` = `today`.`market_day`
INNER JOIN
`tickers` /* uses KEY (`ticker_id`) */
ON
`tickers`.`ticker_id` = `today`.`ticker_id`
WHERE
`today`.`price` > 0
AND
`yesterday`.`price` > 0
;
A finer point is the need to also join against tickers and market_days in order to display the actual ticker_symbol and date, but these operations are very fast since they make use of keys.
Essentially, you can just join the table to itself to find the given % change. Then, order by change descending to get the largest changers on top. You could even order by abs(change) if you want the largest swings.
select
p_today.ticker,
p_today.date,
p_yest.price as open,
p_today.price as close,
-- Don't have to worry about 0 division here
(p_today.price - p_yest.price)/p_yest.price as change
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.date) = date(date_add(p_yest.date, interval 1 day))
and p_today.price > 0
and p_yest.price > 0
and date(p_today.date) = CURRENT_DATE
order by change desc
limit 10

What's the best way to store (and access) historical 1:M relationships in a relational database?

Hypothetical example:
I have Cars and Owners. Each Car belongs to one (and only one) Owner at a given time, but ownership may be transferred. Owners may, at any time, own zero or more cars. What I want is to store the historical relationships in a MySQL database such that, given an arbitrary time, I can look up the current assignment of Cars to Owners.
I.e. At time X (where X can be now or anytime in the past):
Who owns car Y?
Which cars (if any) does owner Z own?
Creating an M:N table in SQL (with a timestamp) is simple enough, but I'd like to avoid a correlated sub-query as this table will get large (and, hence, performance will suffer). Any ideas? I have a feeling that there's a way to do this by JOINing such a table with itself, but I'm not terribly experienced with databases.
UPDATE: I would like to avoid using both a "start_date" and "end_date" field per row as this would necessitate a (potentially) expensive look-up each time a new row is inserted. (Also, it's redundant).
Make a third table called CarOwner with fields for carid, ownerid, start_date, and end_date.
When a car is bought, fill in the first three and check the table to make sure no one else is listed as the owner. If someone is, update that record with the purchase date as its end_date.
To find current owner:
select carid, ownerid from CarOwner where end_date is null
To find the owner at a point in time, substitute the time of interest for getdate():
select carid, ownerid from CarOwner
where start_date <= getdate()
and (end_date > getdate() or end_date is null)
getdate() is MS SQL Server specific, but every database has some function that returns the current date - just substitute.
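For example, in MySQL (the asker's database), a hedged equivalent of the point-in-time lookup using NOW():
-- MySQL version of the point-in-time lookup; replace NOW() with the time of interest
SELECT carid, ownerid
FROM CarOwner
WHERE start_date <= NOW()
  AND (end_date > NOW() OR end_date IS NULL);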
Of course if you also want additional info from the other tables, you would join to them as well.
select co.carid, co.ownerid, o.owner_name, c.make, c.Model, c.year
from CarOwner co
JOIN Car c on co.carid = c.carid
JOIN Owner o on o.ownerid = co.ownerid
where co.end_date is null
I've found that the best way to handle this sort of requirement is to just maintain a log of VehicleEvents, one of which would be ChangeOwner. In practice, you can derive the answers to all the questions posed here - at least as accurately as you are collecting the events.
Each record would have a timestamp indicating when the event occurred.
One benefit of doing it this way is that the minimum amount of data can be added in each event, but the information about the Vehicle can accumulate and evolve.
Also, with the timestamp, events can be added after the fact (as long as the timestamp accurately reflects when the event occurred).
Trying to maintain historical state for something like this in any other way I've tried leads to madness. (Maybe I'm still recovering. :D)
BTW, the distinguishing characteristic here is probably that it's a Time Series or Event Log, not that it's 1:m.
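A minimal sketch of such an event log, with hypothetical table and column names (not from the original answer):
-- one row per vehicle event; ownership changes are just one event type
CREATE TABLE VehicleEvents (
    event_id     INT NOT NULL AUTO_INCREMENT,
    car_id       INT NOT NULL,
    event_type   VARCHAR(20) NOT NULL,      -- e.g. 'ChangeOwner'
    new_owner_id INT NULL,                  -- filled in for ChangeOwner events
    occurred_at  DATETIME NOT NULL,
    PRIMARY KEY (event_id),
    KEY car_time (car_id, occurred_at)
);
-- who owned car 42 at time X? the latest ChangeOwner event at or before X
SELECT new_owner_id
FROM VehicleEvents
WHERE car_id = 42
  AND event_type = 'ChangeOwner'
  AND occurred_at <= '2009-03-01 00:00:00'
ORDER BY occurred_at DESC
LIMIT 1;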
Given your business rule that each car belongs to at least one owner (i.e. owners exist before they are assigned to a car) and your operational constraint that the table may grow large, I'd design the schema as follows:
(generic SQL-92 syntax:)
CREATE TABLE Owners
(
    OwnerID integer not null default autoincrement,
    OwnerName varchar(100) not null,
    PRIMARY KEY (OwnerID)
)

CREATE TABLE Cars
(
    CarID integer not null default autoincrement,
    OwnerID integer not null,
    CarDescription varchar(100) not null,
    CreatedOn timestamp not null default current timestamp,
    PRIMARY KEY (CarID),
    FOREIGN KEY (OwnerID) REFERENCES Owners(OwnerID)
)

CREATE TABLE HistoricalCarOwners
(
    CarID integer not null,
    OwnerID integer not null,
    OwnedFrom timestamp null,
    OwnedUntil timestamp null,
    PRIMARY KEY (CarID, OwnerID),
    FOREIGN KEY (OwnerID) REFERENCES Owners(OwnerID),
    FOREIGN KEY (CarID) REFERENCES Cars(CarID)
)
I personally would not touch the third table from my client application but would simply let the database do the work - and maintain data integrity - with ON UPDATE and ON DELETE triggers on the Cars table to populate the HistoricalCarOwners table whenever a car changes owners (i.e. whenever an UPDATE is committed on the OwnerID column) or a car is deleted.
With the above schema, selecting the current car owner is trivial, and selecting historical car owners is as simple as
select ownerid, ownername from owners o inner join historicalcarowners hco
on hco.ownerid = o.ownerid
where hco.carid = :arg_id and
:arg_timestamp between ownedfrom and owneduntil
order by ...
HTH, Vince
If you really do not want to have a start and end date you can use just a single date and do a query like the following.
SELECT * FROM CarOwner co
WHERE co.CarId = #CarId
AND co.TransferDate <= #AsOfDate
AND NOT EXISTS (SELECT * FROM CarOwner co2
WHERE co2.CarId = #CarId
AND co2.TransferDate <= #AsOfDate
AND co2.TransferDate > co.TransferDate)
or a slight variation
SELECT * FROM Car ca
JOIN CarOwner co ON ca.Id = co.CarId
AND co.TransferDate = (SELECT MAX(TransferDate)
FROM CarOwner WHERE CarId = #CarId
AND TransferDate < #AsOfDate)
WHERE co.CarId = #CarId
These solutions are functionally equivalent to Javier's suggestion, but depending on the database you are using, one may be faster than the other.
However, depending on your read versus write ratio you may find the performance better if you redundantly update the end date in the associative entity.
Why not have a transaction table? It would contain the car ID, the FROM owner, the TO owner, and the date the transaction occurred.
Then all you do is find the most recent transaction for a car on or before the desired date.
To find candidate transactions for cars owned by Owner 253 on March 1st, start from transfers on or before that date:
SELECT * FROM transactions WHERE ownerToId = 253 AND date <= '2009-03-01'
(this alone also returns cars that were transferred away again before March 1st; the sketch below adds the per-car filter)
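A fuller sketch under the same hypothetical column names (carId, ownerToId, date), filtering to each car's most recent transaction on or before the date:
-- cars owned by owner 253 on 2009-03-01: the latest transfer per car on or
-- before that date must have gone TO owner 253
SELECT t.carId
FROM transactions t
WHERE t.ownerToId = 253
  AND t.date <= '2009-03-01'
  AND NOT EXISTS (
      SELECT 1
      FROM transactions later
      WHERE later.carId = t.carId
        AND later.date <= '2009-03-01'
        AND later.date > t.date
  );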
The cars table can have a column called ownerID. You can then simply:
1. select owners.* from owners inner join cars on cars.ownerID = owners.ownerID where cars.carID = Y
2. select cars.* from cars where ownerID = Z
Not the exact syntax, but simple pseudocode.