What's the best way to store (and access) historical 1:M relationships in a relational database?

Hypothetical example:
I have Cars and Owners. Each Car belongs to one (and only one) Owner at a given time, but ownership may be transferred. Owners may, at any time, own zero or more cars. What I want is to store the historical relationships in a MySQL database such that, given an arbitrary time, I can look up the current assignment of Cars to Owners.
That is, at time X (where X can be now or any time in the past):
Who owns car Y?
Which cars (if any) does owner Z own?
Creating an M:N table in SQL (with a timestamp) is simple enough, but I'd like to avoid a correlated sub-query as this table will get large (and, hence, performance will suffer). Any ideas? I have a feeling that there's a way to do this by JOINing such a table with itself, but I'm not terribly experienced with databases.
UPDATE: I would like to avoid using both a "start_date" and "end_date" field per row, as this would necessitate a (potentially) expensive lookup each time a new row is inserted. (Also, it's redundant.)

Make a third table called CarOwner with the fields carid, ownerid, start_date and end_date.
When a car is bought, fill in the first three and check the table to make sure no one else is listed as the current owner. If someone is, update that record, setting its end_date to the new owner's start_date.
To find current owner:
select carid, ownerid from CarOwner where end_date is null
To find the owner at a point in time (the current date here; substitute any timestamp for getdate()):
select carid, ownerid from CarOwner where start_date <= getdate()
and (end_date > getdate() or end_date is null)
getdate() is MS SQL Server specific, but every database has some function that returns the current date; just substitute it.
Of course if you also want additional info from the other tables, you would join to them as well.
select co.carid, co.ownerid, o.owner_name, c.make, c.Model, c.year
from CarOwner co
JOIN Car c on co.carid = c.carid
JOIN Owner o on o.ownerid = co.ownerid
where co.end_date is null
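A minimal sketch of this start/end-date design, using Python's built-in sqlite3 module as a stand-in for SQL Server or MySQL (dates are stored as ISO-8601 text, which sorts chronologically). The helpers transfer, owner_at and cars_owned_at are hypothetical names. It treats each row as a half-open interval, start_date inclusive and end_date exclusive, with a NULL end_date meaning "still owned", which is why the point-in-time condition needs the IS NULL branch.

```python
import sqlite3

# In-memory stand-in for the CarOwner table: one row per ownership period.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CarOwner (
        carid      INTEGER NOT NULL,
        ownerid    INTEGER NOT NULL,
        start_date TEXT    NOT NULL,  -- ISO-8601 date, inclusive
        end_date   TEXT               -- exclusive, NULL while the owner still has the car
    )""")


def transfer(carid, new_ownerid, when):
    """Close the car's open ownership row (if any) and open a new one atomically."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "UPDATE CarOwner SET end_date = ? WHERE carid = ? AND end_date IS NULL",
            (when, carid))
        conn.execute(
            "INSERT INTO CarOwner (carid, ownerid, start_date) VALUES (?, ?, ?)",
            (carid, new_ownerid, when))


# Condition selecting the rows whose period contains the moment :t.
AT_TIME = "start_date <= :t AND (end_date IS NULL OR end_date > :t)"


def owner_at(carid, t):
    """Who owns car carid at time t? Returns None if nobody does."""
    row = conn.execute(
        f"SELECT ownerid FROM CarOwner WHERE carid = :car AND {AT_TIME}",
        {"car": carid, "t": t}).fetchone()
    return row[0] if row else None


def cars_owned_at(ownerid, t):
    """Which cars does ownerid own at time t?"""
    rows = conn.execute(
        f"SELECT carid FROM CarOwner WHERE ownerid = :owner AND {AT_TIME} ORDER BY carid",
        {"owner": ownerid, "t": t}).fetchall()
    return [r[0] for r in rows]


transfer(1, 10, "2020-01-01")
transfer(1, 20, "2021-06-15")  # car 1 changes hands
transfer(2, 20, "2022-03-01")
```

Closing the old row and opening the new one in a single transaction keeps the table from ever showing two current owners for the same car.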

I've found that the best way to handle this sort of requirement is to just maintain a log of VehicleEvents, one of which would be ChangeOwner. In practice, you can derive the answers to all the questions posed here - at least as accurately as you are collecting the events.
Each record would have a timestamp indicating when the event occurred.
One benefit of doing it this way is that the minimum amount of data can be added in each event, but the information about the Vehicle can accumulate and evolve.
Also, with the timestamp, events can be added after the fact (as long as the timestamp accurately reflects when the event occurred).
Trying to maintain historical state for something like this in any other way I've tried leads to madness. (Maybe I'm still recovering. :D)
BTW, the distinguishing characteristic here is probably that it's a Time Series or Event Log, not that it's 1:m.
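A minimal sketch of the event-log idea, again in Python with sqlite3. The VehicleEvents columns and the record and owner_at helpers are assumptions for illustration; the point is that ownership at time X is derived from the latest ChangeOwner event at or before X, so an event inserted late, with its true timestamp, is picked up automatically.

```python
import sqlite3

# Hypothetical event log: one row per thing that happened to a vehicle.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE VehicleEvents (
        vehicle_id  INTEGER NOT NULL,
        event_type  TEXT    NOT NULL,  -- e.g. 'ChangeOwner', 'Service'
        owner_id    INTEGER,           -- only meaningful for ChangeOwner events
        occurred_at TEXT    NOT NULL   -- when it happened (ISO-8601), not when it was logged
    )""")


def record(vehicle_id, event_type, occurred_at, owner_id=None):
    with conn:
        conn.execute(
            "INSERT INTO VehicleEvents (vehicle_id, event_type, owner_id, occurred_at) "
            "VALUES (?, ?, ?, ?)",
            (vehicle_id, event_type, owner_id, occurred_at))


def owner_at(vehicle_id, t):
    """Owner according to the latest ChangeOwner event at or before time t."""
    row = conn.execute(
        """SELECT owner_id FROM VehicleEvents
           WHERE vehicle_id = ? AND event_type = 'ChangeOwner' AND occurred_at <= ?
           ORDER BY occurred_at DESC
           LIMIT 1""",
        (vehicle_id, t)).fetchone()
    return row[0] if row else None


record(7, "ChangeOwner", "2020-01-01", owner_id=1)
record(7, "Service", "2020-06-01")
record(7, "ChangeOwner", "2022-01-01", owner_id=3)
# Logged after the fact: only its occurred_at timestamp matters.
record(7, "ChangeOwner", "2021-01-01", owner_id=2)
```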

Given your business rule that each car belongs to exactly one owner (i.e., owners exist before they are assigned to a car) and your operational constraint that the table may grow large, I'd design the schema as follows:
(generic SQL-92 style syntax; the autoincrement and current timestamp defaults vary by database:)
CREATE TABLE Owners
(
OwnerID integer not null default autoincrement,
OwnerName varchar(100) not null,
Primary key (OwnerID)
);
CREATE TABLE Cars
(
CarID integer not null default autoincrement,
OwnerID integer not null,
CarDescription varchar(100) not null,
CreatedOn timestamp not null default current timestamp,
Primary key (CarID),
FOREIGN KEY (OwnerID) REFERENCES Owners(OwnerID)
);
CREATE TABLE HistoricalCarOwners
(
CarID integer not null,
OwnerID integer not null,
OwnedFrom timestamp null,
OwnedUntil timestamp null,
primary key (CarID, OwnerID),
FOREIGN KEY (OwnerID) REFERENCES Owners(OwnerID),
FOREIGN KEY (CarID) REFERENCES Cars(CarID)
);
I personally would not touch the third table from my client application. I would simply let the database do the work (and maintain data integrity) with ON UPDATE and ON DELETE triggers on the Cars table that populate HistoricalCarOwners whenever a car changes owners (i.e., whenever an UPDATE to the OwnerID column is committed) or a car is deleted.
With the above schema, selecting the current car owner is trivial, and selecting historical car owners is as simple as:
select o.ownerid, o.ownername from owners o inner join historicalcarowners hco
on hco.ownerid = o.ownerid
where hco.carid = :arg_id and
:arg_timestamp between hco.ownedfrom and hco.owneduntil
order by ...
HTH, Vince
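A sketch of the trigger approach in SQLite via Python's sqlite3 module (trigger syntax differs in other databases). It assumes a hypothetical OwnedSince column on Cars that the application sets in the same UPDATE as OwnerID, because the schema above does not record when the current ownership began. It also keys the history on (CarID, OwnedFrom) rather than (CarID, OwnerID), so that a car can return to a previous owner, and uses half-open intervals instead of BETWEEN, so the transfer instant maps to exactly one owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Owners (
        OwnerID   INTEGER PRIMARY KEY,
        OwnerName TEXT NOT NULL
    );
    CREATE TABLE Cars (
        CarID      INTEGER PRIMARY KEY,
        OwnerID    INTEGER NOT NULL REFERENCES Owners(OwnerID),
        OwnedSince TEXT    NOT NULL          -- hypothetical: start of current ownership
    );
    CREATE TABLE HistoricalCarOwners (
        CarID      INTEGER NOT NULL REFERENCES Cars(CarID),
        OwnerID    INTEGER NOT NULL REFERENCES Owners(OwnerID),
        OwnedFrom  TEXT    NOT NULL,
        OwnedUntil TEXT    NOT NULL,
        PRIMARY KEY (CarID, OwnedFrom)
    );
    -- When OwnerID really changes, archive the previous ownership period.
    CREATE TRIGGER trg_cars_owner_change
    AFTER UPDATE OF OwnerID ON Cars
    WHEN OLD.OwnerID <> NEW.OwnerID
    BEGIN
        INSERT INTO HistoricalCarOwners (CarID, OwnerID, OwnedFrom, OwnedUntil)
        VALUES (OLD.CarID, OLD.OwnerID, OLD.OwnedSince, NEW.OwnedSince);
    END;
""")

with conn:
    conn.executemany("INSERT INTO Owners VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.execute("INSERT INTO Cars VALUES (100, 1, '2020-01-01')")
    # The client only touches Cars, the trigger fills in the history.
    conn.execute("UPDATE Cars SET OwnerID = 2, OwnedSince = '2021-03-01' WHERE CarID = 100")


def owner_at(car_id, t):
    """Current owner if t falls in the current period, otherwise look in the history."""
    row = conn.execute(
        "SELECT OwnerID FROM Cars WHERE CarID = ? AND OwnedSince <= ?", (car_id, t)).fetchone()
    if row is None:
        row = conn.execute(
            """SELECT OwnerID FROM HistoricalCarOwners
               WHERE CarID = ? AND OwnedFrom <= ? AND ? < OwnedUntil""",
            (car_id, t, t)).fetchone()
    return row[0] if row else None
```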

If you really do not want to have a start and end date you can use just a single date and do a query like the following.
SELECT * FROM CarOwner co
WHERE co.CarId = @CarId
AND co.TransferDate <= @AsOfDate
AND NOT EXISTS (SELECT * FROM CarOwner co2
WHERE co2.CarId = @CarId
AND co2.TransferDate <= @AsOfDate
AND co2.TransferDate > co.TransferDate)
or a slight variation:
SELECT * FROM Car ca
JOIN CarOwner co ON ca.Id = co.CarId
AND co.TransferDate = (SELECT MAX(TransferDate)
FROM CarOwner WHERE CarId = @CarId
AND TransferDate <= @AsOfDate)
WHERE co.CarId = @CarId
These solutions are functionally equivalent to Javier's suggestion, but depending on the database you are using, one may be faster than the other.
However, depending on your read versus write ratio you may find the performance better if you redundantly update the end date in the associative entity.
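A runnable comparison of the two single-date queries above, plus the self-join form the question hinted at (LEFT JOIN the table to itself and keep the rows that have no later qualifying transfer), using Python's sqlite3 module with :car and :as_of as named parameters. The sample rows are made up. All three return the same owner; which one is fastest depends on the database and its indexes, so compare execution plans on real data.

```python
import sqlite3

# Single-date design: one row per transfer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CarOwner (
        CarId        INTEGER NOT NULL,
        OwnerId      INTEGER NOT NULL,
        TransferDate TEXT    NOT NULL,
        PRIMARY KEY (CarId, TransferDate)
    );
    INSERT INTO CarOwner VALUES
        (1, 10, '2020-01-01'),
        (1, 20, '2021-06-15'),
        (1, 30, '2023-02-01'),
        (2, 20, '2022-03-01');
""")

NOT_EXISTS = """
    SELECT co.OwnerId FROM CarOwner co
    WHERE co.CarId = :car AND co.TransferDate <= :as_of
      AND NOT EXISTS (SELECT 1 FROM CarOwner co2
                      WHERE co2.CarId = :car
                        AND co2.TransferDate <= :as_of
                        AND co2.TransferDate > co.TransferDate)"""

MAX_SUBQUERY = """
    SELECT co.OwnerId FROM CarOwner co
    WHERE co.CarId = :car
      AND co.TransferDate = (SELECT MAX(TransferDate) FROM CarOwner
                             WHERE CarId = :car AND TransferDate <= :as_of)"""

# Self-join (anti-join): a row qualifies when no later transfer up to :as_of matches it.
SELF_JOIN = """
    SELECT co.OwnerId FROM CarOwner co
    LEFT JOIN CarOwner later
           ON later.CarId = co.CarId
          AND later.TransferDate > co.TransferDate
          AND later.TransferDate <= :as_of
    WHERE co.CarId = :car AND co.TransferDate <= :as_of
      AND later.CarId IS NULL"""


def owner_at(query, car, as_of):
    row = conn.execute(query, {"car": car, "as_of": as_of}).fetchone()
    return row[0] if row else None
```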

Why not have a transaction table? It would contain the car ID, the FROM owner, the TO owner and the date the transaction occurred.
Then all you do is find the most recent transaction for a car on or before the desired date.
To find cars owned by Owner 253 on March 1st:
SELECT t.carId FROM transactions t WHERE t.ownerToId = 253 AND t.date <= '2009-03-01'
AND NOT EXISTS (SELECT * FROM transactions t2 WHERE t2.carId = t.carId
AND t2.date > t.date AND t2.date <= '2009-03-01')
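A small sqlite3 sketch of the transaction-table query, with made-up rows chosen to cover a car that left owner 253 before the date and one that arrived only after it. The ownerFromId column name is an assumption; the answer only says the table holds the FROM owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        carId       INTEGER NOT NULL,
        ownerFromId INTEGER,            -- NULL for the first sale
        ownerToId   INTEGER NOT NULL,
        date        TEXT    NOT NULL
    );
    INSERT INTO transactions VALUES
        (1, NULL, 253, '2008-05-01'),   -- car 1 goes to 253
        (2, NULL, 253, '2008-07-01'),
        (2, 253,  99,  '2009-01-15'),   -- car 2 leaves 253 before March 1st
        (3, NULL, 99,  '2008-01-01'),
        (3, 99,   253, '2009-04-01');   -- car 3 arrives only after March 1st
""")

# A car belongs to :owner on :as_of if its latest transaction up to :as_of went to :owner.
CARS_OWNED_ON = """
    SELECT t.carId FROM transactions t
    WHERE t.ownerToId = :owner AND t.date <= :as_of
      AND NOT EXISTS (SELECT 1 FROM transactions t2
                      WHERE t2.carId = t.carId
                        AND t2.date > t.date AND t2.date <= :as_of)
    ORDER BY t.carId"""


def cars_owned_on(owner, as_of):
    return [r[0] for r in conn.execute(CARS_OWNED_ON, {"owner": owner, "as_of": as_of})]


print(cars_owned_on(253, "2009-03-01"))  # [1]
```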

The cars table can have a column called ownerID. You can then simply:
1. select o.* from cars c inner join owners o on c.ownerid = o.ownerid where c.carid = y
2. select c.* from cars c where c.ownerid = z
Not the exact syntax but simple pseudo code.

Related

exclude return values from select clause

I have the table below, called RESERVES, in MySQL Server 8.0:
CREATE TABLE RESERVES (
res_id INT NOT NULL AUTO_INCREMENT,
product_id INT NOT NULL,
start_date DATE NOT NULL,
finish_date DATE NOT NULL,
PRIMARY KEY(res_id),
FOREIGN KEY(product_id) REFERENCES PRODUCT(product_id) ON UPDATE CASCADE ON DELETE RESTRICT
);
The customer will fill in a form to specify the dates they want it (start_res_date and finish_res_date), so I have to write a query to check whether the product is available in that time period.
I am stuck here, mainly because there can be multiple reservations for this specific product.
So something like this:
SELECT DISTINCT product_id FROM RESERVES
WHERE start_res_date >= finish_date OR finish_res_date <= start_date
won't work, because it will return the product_id if it is OK with the dates of any one of its reservations.
What I want is to reject the product_id if it is unavailable for at least one reservation in the table.
Any ideas on how to approach it? Thanks.
A product is reserved during the time window if any part of its reservation window overlaps with your window. This is easiest to see in graphical form:
[Diagram: green bars represent other reservations that conflict with your booking window; red bars are bookings fully outside your window, so no conflicts occur.]
Related to this graphic is a blog post going into a lot more detail:.
You should use SQL such as the following to check whether a product is available in a given time slot:
select 1 as ProductIsNotAvailable
from RESERVES
where product_id = @productToBeReserved
and start_date < @newReservationFinishDate
and finish_date > @newReservationStartDate
limit 1
That is, only allow the booking if the query returns no rows.
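To list every product that is free for the requested window, the same overlap test can be wrapped in NOT EXISTS against the PRODUCT table. A minimal sketch in Python's sqlite3 (MySQL column types simplified, sample rows invented), treating reservations that merely touch the requested window as compatible, as the question's own condition does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (product_id INTEGER PRIMARY KEY);
    CREATE TABLE RESERVES (
        res_id      INTEGER PRIMARY KEY,
        product_id  INTEGER NOT NULL REFERENCES PRODUCT(product_id),
        start_date  TEXT    NOT NULL,
        finish_date TEXT    NOT NULL
    );
    INSERT INTO PRODUCT VALUES (1), (2), (3);
    INSERT INTO RESERVES (product_id, start_date, finish_date) VALUES
        (1, '2024-05-01', '2024-05-05'),
        (1, '2024-05-20', '2024-05-25'),
        (2, '2024-05-10', '2024-05-12');
""")

# Two ranges overlap when each one starts before the other finishes.
AVAILABLE = """
    SELECT p.product_id FROM PRODUCT p
    WHERE NOT EXISTS (SELECT 1 FROM RESERVES r
                      WHERE r.product_id = p.product_id
                        AND r.start_date  < :finish_res_date
                        AND r.finish_date > :start_res_date)
    ORDER BY p.product_id"""


def available(start_res_date, finish_res_date):
    params = {"start_res_date": start_res_date, "finish_res_date": finish_res_date}
    return [r[0] for r in conn.execute(AVAILABLE, params)]
```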

DB: How to set up a many to many table(s) to handle multiple selectable conditions

I am working on a search filter for a website that will help users find a venue (for get-togethers and ceremonies) that meets their needs. Filters would include such things as style, amenities, event type, etc. Multiple options in a category can apply to a venue, so a user can select multiple options from the style, amenities and event type categories when searching.
My issue is in how I should approach the table design in the database. Currently I have a Venue table with a unique id and basic information, and a number of tables representing each category (style, amenities, etc) where they contain an id and name field.
I know that I need an intermediary table to hold foreign keys, so each option applicable to a category is associated to the venue.
Option 1: Create for each category table a many to many intermediary table with foreign keys to that category and the venue.
Option 2: Create one large intermediary table with foreign keys for every category, as well as the Venue
i.e.
fk_venue
fk_style
fk_amenities
...
I am trying to decide what is more efficient and less of a problem to code for. Option 1 would require a query against each table, which may become complicated to work with, whereas option 2 seems easier to query but might have a much larger number of records to handle a venue with many amenities AND event types, for example.
This doesn't seem like a new problem but I have had trouble finding resources that detail how best to approach this. We are currently using MSSQL for the DB and are building the site using .net core.
Go with option one. Create a join table to record the many-to-many relationships for each available feature of a venue. Option 2 is very wasteful in terms of storage: consider a venue with only one amenity when 50 amenity types are available. Also, as I understand your option 2, you would have to change your database design each time you add an amenity, event_type or style, which would be very difficult to support.
In the case of Option 1, some of the tables would be:
Table Name: venue_amenities
Columns: venue_id, amenity_id
Table Name: venue_event_types
Columns: venue_id, event_type_id
Table Name: venue_styles
Columns: venue_id, style_id
When you query everything with a filter, you could query it like:
select distinct
v.venue_id
from venues v
inner join venue_amenities va on v.venue_id = va.venue_id
inner join venue_event_types vet on v.venue_id = vet.venue_id
inner join venue_styles vs on v.venue_id = vs.venue_id
where va.amenity_id in ([selected amenities])
and vet.event_type_id in ([selected event types])
and vs.style_id in ([selected styles])
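A runnable sketch of option 1 with Python's sqlite3 module; the table contents are invented and the search helper is hypothetical. Only the "?" placeholders are built dynamically, and the selected IDs are passed as bound parameters. As written, the query returns venues that match at least one selected option in every category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venues      (venue_id      INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE amenities   (amenity_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE event_types (event_type_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE styles      (style_id      INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE venue_amenities (
        venue_id   INTEGER NOT NULL REFERENCES venues(venue_id),
        amenity_id INTEGER NOT NULL REFERENCES amenities(amenity_id),
        PRIMARY KEY (venue_id, amenity_id));
    CREATE TABLE venue_event_types (
        venue_id      INTEGER NOT NULL REFERENCES venues(venue_id),
        event_type_id INTEGER NOT NULL REFERENCES event_types(event_type_id),
        PRIMARY KEY (venue_id, event_type_id));
    CREATE TABLE venue_styles (
        venue_id INTEGER NOT NULL REFERENCES venues(venue_id),
        style_id INTEGER NOT NULL REFERENCES styles(style_id),
        PRIMARY KEY (venue_id, style_id));
    INSERT INTO venues VALUES (1, 'Barn'), (2, 'Hall'), (3, 'Garden');
    INSERT INTO amenities VALUES (1, 'Parking'), (2, 'Catering');
    INSERT INTO event_types VALUES (1, 'Wedding'), (2, 'Conference');
    INSERT INTO styles VALUES (1, 'Rustic'), (2, 'Modern');
    INSERT INTO venue_amenities VALUES (1, 1), (1, 2), (2, 2), (3, 1);
    INSERT INTO venue_event_types VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO venue_styles VALUES (1, 1), (2, 2), (3, 2);
""")


def search(amenity_ids, event_type_ids, style_ids):
    """Venues matching at least one selected option per category (lists must be non-empty)."""
    def marks(ids):
        return ", ".join("?" for _ in ids)
    sql = f"""
        SELECT DISTINCT v.venue_id
        FROM venues v
        JOIN venue_amenities   va  ON v.venue_id = va.venue_id
        JOIN venue_event_types vet ON v.venue_id = vet.venue_id
        JOIN venue_styles      vs  ON v.venue_id = vs.venue_id
        WHERE va.amenity_id     IN ({marks(amenity_ids)})
          AND vet.event_type_id IN ({marks(event_type_ids)})
          AND vs.style_id       IN ({marks(style_ids)})
        ORDER BY v.venue_id"""
    params = [*amenity_ids, *event_type_ids, *style_ids]
    return [row[0] for row in conn.execute(sql, params)]
```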
Option 3: You could start out with a metadata design. This would allow you to have multiple attribute records per item or entity.
Often these things evolve as development proceeds: the process changes, you learn the data, and the customer works out finer details over time.
I've seen similar designs for hashtags or whitelists; searching for those might get you closer to what you are looking for. Here is a working example to get you started.
declare @venue as table(
VenueID int identity(1,1) not null primary key clustered
, Name_ nvarchar(255) not null
, Address_ nvarchar(255) null
);
declare @venueType as table (
VenueTypeID int identity(1,1) not null primary key clustered
, VenueType nvarchar(255) not null
);
declare @venueStuff as table (
VenueStuffID int identity(1,1) not null primary key clustered
, VenueID int not null -- constraint back to venueid
, VenueTypeID int not null -- constraint to dim or lookup table for ... attribute types
, AttributeValue nvarchar(255) not null
);
insert into @venue (Name_)
select 'Bob''s Funhouse';
insert into @venueStuff (VenueID, VenueTypeID, AttributeValue)
select 1, 1, 'Scarrrrry' union all
select 1, 2, 'Game tables provided' union all
select 1, 3, 'Food Available' union all
select 1, 4, 'Creepy';
insert into @venueType (VenueType)
select 'Haunted House Theme' union all
select 'Gaming' union all
select 'Concessions' union all
select 'post apocalyptic';
select a.Name_
, b.AttributeValue
, c.VenueType
from @venue a
join @venueStuff b
on a.VenueID = b.VenueID
join @venueType c
on c.VenueTypeID = b.VenueTypeID;

How to design this table schema (and how to query it)

I have a table that stores a list of members - for the sake of simplicity, I will use a simple real-world case that models my use case.
Let's use the analogy of a sports club or gym.
The membership of the gym changes every three months (for example) - with some old members leaving, some new members joining and some members staying unchanged.
I want to run a query on the table - spanning a multi-time period and return the average weight of all of the members in the club.
These are the tables I have come up with so far:
-- A table containing all members the gym has ever had
-- current members have their leave_date field left at NULL
-- departed members have their leave_date field set to the days they left the gym
CREATE TABLE IF NOT EXISTS member (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
join_date DATE NOT NULL,
-- set to NULL if user has not left yet
leave_date DATE DEFAULT NULL
);
-- A table of members weights.
-- This table is populated DAILY,after the weights of CURRENT members
-- has been recorded
CREATE TABLE IF NOT EXISTS current_member_weight (
id INTEGER PRIMARY KEY NOT NULL,
calendar_date DATE NOT NULL,
member_id INTEGER REFERENCES member(id) NOT NULL,
weight REAL NOT NULL
);
-- I want to write a query that returns the AVERAGE daily weight of
-- CURRENT members of the gym. The query should take a starting_date
-- and an ending_date between which to calculate the daily
-- averages. The aver
-- PSEUDO SQL BELOW!
SELECT calendar_date, AVG(weight)
FROM member, current_member_weight
WHERE calendar_date BETWEEN(starting_date, ending_date);
I have two questions:
can the schema above be improved - if yes, please illustrate
How can I write an SQL* query to return the average weights calculated for all members in the gym during a specified period (t1, t2), where (t1,t2) spans a period that members have joined/left the gym?
[[Note about SQL]]
Preferably, any SQL shown would be database agnostic; however, if a particular flavour of SQL is to be used, I'd prefer PostgreSQL, since that is the database I'm using.
The SQL below works as long as the data in the current_member_weight table is consistent with each member's join and leave dates (i.e., for any member, current_member_weight has no rows with a calendar_date earlier than the join date or later than the leave date):
SELECT
gm.calendar_date,
AVG(gm.weight) avg_weight
FROM
member m,
current_member_weight gm
WHERE
m.id = gm.member_id
AND
gm.calendar_date >= DATE '2017-01-01'
AND
gm.calendar_date <= DATE '2017-12-31'
GROUP BY
gm.calendar_date
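If the weight table cannot be trusted to contain only rows for current members, the join and leave dates can be checked in the query itself. A sketch in Python's sqlite3 with invented data, assuming a member still counts on the day recorded as leave_date (adjust the comparison if leave_date means the first day of absence).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        join_date  TEXT NOT NULL,
        leave_date TEXT DEFAULT NULL
    );
    CREATE TABLE current_member_weight (
        id            INTEGER PRIMARY KEY,
        calendar_date TEXT    NOT NULL,
        member_id     INTEGER NOT NULL REFERENCES member(id),
        weight        REAL    NOT NULL
    );
    INSERT INTO member (id, name, join_date, leave_date) VALUES
        (1, 'Ann', '2017-01-01', NULL),
        (2, 'Bob', '2017-01-01', '2017-01-02'),   -- leaves on Jan 2nd
        (3, 'Cid', '2017-01-03', NULL);           -- joins on Jan 3rd
    INSERT INTO current_member_weight (calendar_date, member_id, weight) VALUES
        ('2017-01-01', 1, 60), ('2017-01-01', 2, 80),
        ('2017-01-02', 1, 62), ('2017-01-02', 2, 78),
        ('2017-01-03', 1, 61), ('2017-01-03', 3, 90),
        ('2017-01-03', 2, 999);  -- stray row after Bob left, filtered out below
""")

# Daily average over [t1, t2], counting only members who belonged to the gym that day.
DAILY_AVG = """
    SELECT w.calendar_date, AVG(w.weight) AS avg_weight
    FROM member m
    JOIN current_member_weight w ON w.member_id = m.id
    WHERE w.calendar_date BETWEEN :t1 AND :t2
      AND w.calendar_date >= m.join_date
      AND (m.leave_date IS NULL OR w.calendar_date <= m.leave_date)
    GROUP BY w.calendar_date
    ORDER BY w.calendar_date"""

rows = conn.execute(DAILY_AVG, {"t1": "2017-01-01", "t2": "2017-01-03"}).fetchall()
```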

I want to make SQL tables that are updated daily yet retain every single day's contents for later lookup. What is the best practice for this?

Basically I'm trying to create a database schema based around multiple unrelated tables that will not need to reference each other AFAIK.
Each table will be a different "category" that will have the same columns in each table - name, date, two int values and then a small string value.
My issue is that each one will need to be "updated" daily, but I want to keep a record of the items for every single day.
What's the best way to go about doing this? Would it be to make the composite key the combination of the date and the name? Or use something called a "trigger"?
Sorry I'm somewhat new to database design, I can be more specific if I need to be.
Yes, you have to create a trigger for each category table.
I'm assuming name is the PK of each table? If it isn't, you will need to create a PK.
Let's say you have
table categoryA
name, date, int1, int2, string
table categoryB
name, date, int1, int2, string
You will create another table to store changes log.
table category_history
category_table, name, date, int1, int2, string, changeDate
You create two triggers, one for each category table,
which save which table generated the update and when it was made.
create trigger before update for categoryA
INSERT INTO category_history VALUES
('categoryA', OLD.name, OLD.date, OLD.int1, Old.int2, OLD.string, NOW());
This is pseudocode; you need to write the trigger using your RDBMS's syntax and check how to get the current system date (NOW() here).
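A concrete version of the categoryA trigger in SQLite syntax, run through Python's sqlite3 module (other databases use different trigger syntax, and NOW() becomes CURRENT_TIMESTAMP here). The sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categoryA (
        name TEXT PRIMARY KEY, date TEXT, int1 INTEGER, int2 INTEGER, string TEXT);
    CREATE TABLE category_history (
        category_table TEXT, name TEXT, date TEXT, int1 INTEGER, int2 INTEGER,
        string TEXT, changeDate TEXT);
    -- Copy the OLD row into the history before it is overwritten.
    CREATE TRIGGER trg_categoryA_history
    BEFORE UPDATE ON categoryA
    BEGIN
        INSERT INTO category_history
        VALUES ('categoryA', OLD.name, OLD.date, OLD.int1, OLD.int2, OLD.string,
                CURRENT_TIMESTAMP);
    END;
""")

with conn:
    conn.execute("INSERT INTO categoryA VALUES ('widget', '2024-01-01', 1, 2, 'a')")
    conn.execute("UPDATE categoryA SET date = '2024-01-02', int1 = 5 WHERE name = 'widget'")
    conn.execute("UPDATE categoryA SET date = '2024-01-03', int1 = 7 WHERE name = 'widget'")

history = conn.execute(
    "SELECT name, date, int1 FROM category_history ORDER BY date").fetchall()
current = conn.execute("SELECT name, date, int1 FROM categoryA").fetchall()
```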
As has already been pointed out, it is poor design to have different identical tables for each category. Better would be a Categories table with one entry for each category and then a Dailies table with the daily information.
create table Categories(
ID smallint not null auto_generated,
Name varchar( 20 ) not null,
..., -- other information about each category
constraint UQ_Category_Name unique( Name ),
constraint PK_Categories primary key( ID )
);
create table Dailies(
CatID smallint not null,
UpdDate date not null,
..., -- Daily values
constraint PK_Dailies primary key( CatID, UpdDate ),
constraint FK_Dailies_Category foreign key( CatID )
references Categories( ID )
);
This way, adding a new category involves inserting a row into the Categories table rather than creating an entirely new table.
If the database has a Date type distinct from DateTime (no time part), fine. Otherwise, the time part must be removed, for example with Oracle's trunc function. This allows only one entry per category per day.
Retrieving all the values for all the posted dates is easy:
select C.Name as Category, d.UpdDate, d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID;
This can be made into a view, DailyHistory. To see the complete history for Category Cat1:
select *
from DailyHistory
where Name = 'Cat1';
To see all the category information as it was updated on a specific date:
select *
from DailyHistory
where UpdDate = date '2014-05-06';
Most queries will probably be interested in the current values -- that is, the last update made (assuming some categories are not updated every day). This is a little more complicated but still very fast if you are worried about performance.
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID );
Of course, if every category is updated every day, the query is simplified:
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate = <today's date>;
This can also be made into a view, DailyCurrent. To see today's (or the latest) updates for Category Cat1:
select *
from DailyCurrent
where Name = 'Cat1';
Suppose now that updates are not necessarily made every day. The history view would show all the updates that were actually made. So the query shown for all categories as they were on a particular day would actually show only those categories that were actually updated on that day. What if you wanted to show the data that was "current" as of a particular date, even if the actual update was several days before?
That can be provided with a small change to the "current" query (just the last line added):
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID
and UpdDate <= date '2014-05-06' );
Now this shows all categories with the data updated on that date if it exists otherwise the latest update made previous to that date.
As you can see, this is a very flexible design that lets you access the data in just about any way you want.
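A runnable sketch of the Categories/Dailies design and the "as of a date" query in Python's sqlite3 module. Val is a hypothetical stand-in for the daily values, and the sample dates are invented to show a category that was not updated on the requested day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (
        ID   INTEGER PRIMARY KEY,           -- auto-generated by SQLite
        Name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Dailies (
        CatID   INTEGER NOT NULL REFERENCES Categories(ID),
        UpdDate TEXT    NOT NULL,           -- date only: one row per category per day
        Val     INTEGER NOT NULL,           -- hypothetical daily value
        PRIMARY KEY (CatID, UpdDate)
    );
    INSERT INTO Categories (Name) VALUES ('Cat1'), ('Cat2');
    INSERT INTO Dailies VALUES
        (1, '2014-05-04', 10), (1, '2014-05-06', 11), (1, '2014-05-07', 12),
        (2, '2014-05-03', 20), (2, '2014-05-05', 21);
""")

# Each category's latest update made on or before :as_of.
AS_OF = """
    SELECT C.Name, D.UpdDate, D.Val
    FROM Categories C
    JOIN Dailies D ON D.CatID = C.ID
     AND D.UpdDate = (SELECT MAX(UpdDate) FROM Dailies
                      WHERE CatID = D.CatID AND UpdDate <= :as_of)
    ORDER BY C.Name"""


def as_of(day):
    return conn.execute(AS_OF, {"as_of": day}).fetchall()
```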

How to relate 3 tables depending on event

I have a table that has information about different types of events that can be done by persons in two categories, civil and worker,
so I have a separate table for each of them:
civil{ civil_id, name, age,telephone...} the primary key is civil_id
worker{ worker_id, name, duty, department...} the primary key is worker_id
then the event table has a list of all possible events
event {type_of_event} the primary key is type_of_event
Then I am planning to store the information in another table
with the event type and the person who did the job (worker or civil):
id event_type date person
-----------------------------------
1 type1 12-12-12 x
2 type1 05-12-10 y
3 type2 02-12-12 y
Now, in this design I do not know how to record which person did the job. If I had only one kind of person (e.g. civil), I would simply store the civil_id in the person field of this last table, but how do I know whether it was a civil or a worker? Do I need another intermediate table?
Generally, there are two ways to model this type of situation:
Using Exclusive Foreign Keys
In the table that records events, both civil_id and worker_id are NULL-able, but there is also a constraint ensuring exactly one of them is non-NULL at any given time:
CHECK (
(civil_id IS NOT NULL AND worker_id IS NULL)
OR (civil_id IS NULL AND worker_id IS NOT NULL)
)
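A sketch of the exclusive foreign key approach in SQLite via Python's sqlite3 module. The name event_log for the table that records who did each event is an assumption (the question leaves it unnamed). The CHECK rejects rows with both or neither ID set, the foreign keys reject IDs that do not exist, and the final query shows which kind of person did each event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE civil  (civil_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE worker (worker_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE event  (type_of_event TEXT PRIMARY KEY);
    CREATE TABLE event_log (                       -- hypothetical name
        id         INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL REFERENCES event(type_of_event),
        date       TEXT NOT NULL,
        civil_id   INTEGER REFERENCES civil(civil_id),
        worker_id  INTEGER REFERENCES worker(worker_id),
        CHECK ((civil_id IS NOT NULL AND worker_id IS NULL)
            OR (civil_id IS NULL AND worker_id IS NOT NULL))
    );
    INSERT INTO civil  VALUES (1, 'Carla');
    INSERT INTO worker VALUES (1, 'Walt');
    INSERT INTO event  VALUES ('type1'), ('type2');
""")


def log_event(event_type, date, civil_id=None, worker_id=None):
    """Insert one event, returning False if the CHECK or a foreign key rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO event_log (event_type, date, civil_id, worker_id) "
                "VALUES (?, ?, ?, ?)",
                (event_type, date, civil_id, worker_id))
        return True
    except sqlite3.IntegrityError:
        return False


# Which kind of person did each event, and who was it?
WHO_DID_IT = """
    SELECT e.id, e.event_type,
           CASE WHEN e.civil_id IS NOT NULL THEN 'civil' ELSE 'worker' END AS kind,
           COALESCE(c.name, w.name) AS person
    FROM event_log e
    LEFT JOIN civil  c ON c.civil_id  = e.civil_id
    LEFT JOIN worker w ON w.worker_id = e.worker_id
    ORDER BY e.id"""

log_event("type1", "2012-12-12", civil_id=1)
log_event("type2", "2012-02-12", worker_id=1)
```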
Using Inheritance [1]
For more on inheritance, take a look at the "Subtype Relationships" chapter in the ERwin Methods Guide and at this post.
[1] Also known as category, subtyping, subclassing, generalization hierarchy...
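One common way to implement the inheritance (subtype) approach, sketched in SQLite via Python's sqlite3 module; the person supertype table and its columns are assumptions, not taken from the guide cited above. Each civil or worker row is also a person row, a person_type column plus a composite foreign key keeps a person from being both kinds, and the event log needs only one ordinary foreign key to person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    -- Supertype: one row per person, whatever their kind.
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_type TEXT NOT NULL CHECK (person_type IN ('civil', 'worker')),
        name        TEXT NOT NULL,
        UNIQUE (person_id, person_type)     -- target of the subtype foreign keys
    );
    -- Subtypes share the person's key and may only point at a person of their own kind.
    CREATE TABLE civil (
        person_id   INTEGER PRIMARY KEY,
        person_type TEXT NOT NULL DEFAULT 'civil' CHECK (person_type = 'civil'),
        age         INTEGER,
        telephone   TEXT,
        FOREIGN KEY (person_id, person_type) REFERENCES person (person_id, person_type)
    );
    CREATE TABLE worker (
        person_id   INTEGER PRIMARY KEY,
        person_type TEXT NOT NULL DEFAULT 'worker' CHECK (person_type = 'worker'),
        duty        TEXT,
        department  TEXT,
        FOREIGN KEY (person_id, person_type) REFERENCES person (person_id, person_type)
    );
    -- The event log needs only one ordinary foreign key.
    CREATE TABLE event_log (
        id         INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        date       TEXT NOT NULL,
        person_id  INTEGER NOT NULL REFERENCES person (person_id)
    );
""")

with conn:
    conn.execute("INSERT INTO person VALUES (1, 'civil', 'Carla')")
    conn.execute("INSERT INTO civil (person_id, age) VALUES (1, 30)")
    conn.execute("INSERT INTO person VALUES (2, 'worker', 'Walt')")
    conn.execute("INSERT INTO worker (person_id, duty) VALUES (2, 'inspection')")
    conn.execute("INSERT INTO event_log (event_type, date, person_id) "
                 "VALUES ('type1', '2012-12-12', 1), ('type2', '2012-02-12', 2)")


def add_worker_details(person_id):
    """Try to attach worker details, failing if the person is not of type 'worker'."""
    try:
        with conn:
            conn.execute("INSERT INTO worker (person_id) VALUES (?)", (person_id,))
        return True
    except sqlite3.IntegrityError:
        return False


EVENTS = """
    SELECT e.id, p.name, p.person_type
    FROM event_log e JOIN person p ON p.person_id = e.person_id
    ORDER BY e.id"""
```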
In this case, you cannot set up a foreign key because there are multiple parent tables. To search quickly and avoid a full table scan, define an index on the person column of the event table and join the tables using LEFT JOIN, e.g.:
SELECT ....,
COALESCE(b.name, c.name) AS personname
FROM event a
LEFT JOIN civil b
ON a.person = b.civil_id
LEFT JOIN worker c
ON a.person = c.worker_ID
Adding INDEX
ALTER TABLE event ADD INDEX (person)