I want to make SQL tables that are updated daily yet retain every single day's contents for later lookup. What is the best practice for this?

Basically I'm trying to create a database schema based around multiple unrelated tables that will not need to reference each other AFAIK.
Each table will be a different "category" that will have the same columns in each table - name, date, two int values and then a small string value.
My issue is that each one will need to be "updated" daily, but I want to keep a record of the items for every single day.
What's the best way to go about doing this? Would it be to make the composite key the combination of the date and the name? Or use something called a "trigger"?
Sorry, I'm somewhat new to database design; I can be more specific if I need to be.

Yes, you have to create a trigger for each category table.
I'm assuming name is the PK for each table. If that isn't the case, you will need to create a PK.
Let's say you have:
table categoryA
name, date, int1, int2, string
table categoryB
name, date, int1, int2, string
You then create another table to store the change log:
table category_history
category_table, name, date, int1, int2, string, changeDate
You create two triggers, one for each category table,
where you save which table generated the update and what time it was made.
create trigger categoryA_bu before update on categoryA for each row
INSERT INTO category_history VALUES
('categoryA', OLD.name, OLD.date, OLD.int1, OLD.int2, OLD.string, NOW());
This is pseudocode; you need to write the trigger using your RDBMS's syntax and check which function returns the current system date (NOW() here).
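As one concrete illustration, here is the history-table trigger in SQLite's dialect, run through Python's standard sqlite3 module against an in-memory database. It is a sketch, not the answer's exact code: the trigger name categoryA_bu and the sample row are hypothetical, and SQLite spells the current time datetime('now') rather than NOW().

```python
import sqlite3

# In-memory database; SQLite is used only as one concrete trigger dialect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categoryA (
    name   TEXT PRIMARY KEY,
    date   TEXT,
    int1   INTEGER,
    int2   INTEGER,
    string TEXT
);

CREATE TABLE category_history (
    category_table TEXT,
    name   TEXT,
    date   TEXT,
    int1   INTEGER,
    int2   INTEGER,
    string TEXT,
    changeDate TEXT
);

-- Before each update, copy the OLD row into the history table.
CREATE TRIGGER categoryA_bu BEFORE UPDATE ON categoryA
BEGIN
    INSERT INTO category_history
    VALUES ('categoryA', OLD.name, OLD.date, OLD.int1, OLD.int2, OLD.string,
            datetime('now'));
END;
""")

conn.execute("INSERT INTO categoryA VALUES ('item1', '2014-05-05', 1, 2, 'a')")
conn.execute("UPDATE categoryA SET date = '2014-05-06', int1 = 3 WHERE name = 'item1'")

history = conn.execute(
    "SELECT category_table, name, date, int1 FROM category_history"
).fetchall()
current = conn.execute("SELECT date, int1 FROM categoryA").fetchall()

print(history)  # [('categoryA', 'item1', '2014-05-05', 1)]
print(current)  # [('2014-05-06', 3)]
```

The live table keeps only the latest values, while every previous version ends up in category_history with the time it was replaced.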

As has already been pointed out, it is poor design to have different identical tables for each category. Better would be a Categories table with one entry for each category and then a Dailies table with the daily information.
create table Categories(
ID smallint not null generated always as identity, -- or your DBMS's auto-increment equivalent
Name varchar( 20 ) not null,
..., -- other information about each category
constraint UQ_Category_Name unique( Name ),
constraint PK_Categories primary key( ID )
);
create table Dailies(
CatID smallint not null,
UpdDate date not null,
..., -- Daily values
constraint PK_Dailies primary key( CatID, UpdDate ),
constraint FK_Dailies_Category foreign key( CatID )
references Categories( ID )
);
This way, adding a new category involves inserting a row into the Categories table rather than creating an entirely new table.
If the database has a Date type distinct from DateTime (that is, with no time part), then fine. Otherwise, the time part must be removed, for example with Oracle's trunc function. Together with the primary key, this allows only one entry per category per day.
Retrieving all the values for all the posted dates is easy:
select C.Name as Category, d.UpdDate, d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID;
This can be made into a view, DailyHistory. To see the complete history for Category Cat1:
select *
from DailyHistory
where Category = 'Cat1';
To see all the category information as it was updated on a specific date:
select *
from DailyHistory
where UpdDate = date '2014-05-06';
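A runnable sketch of the Categories/Dailies design and the DailyHistory view, again using SQLite through Python's sqlite3 module. The daily value columns Val1, Val2 and Note stand in for the elided "daily values" and are hypothetical; SQLite has no `date '...'` literal, so dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE Categories (
    ID   INTEGER PRIMARY KEY,        -- SQLite assigns this automatically
    Name TEXT NOT NULL UNIQUE
);

CREATE TABLE Dailies (
    CatID   INTEGER NOT NULL REFERENCES Categories(ID),
    UpdDate TEXT    NOT NULL,        -- ISO 'YYYY-MM-DD', no time part
    Val1    INTEGER,                 -- hypothetical daily values
    Val2    INTEGER,
    Note    TEXT,
    PRIMARY KEY (CatID, UpdDate)     -- one row per category per day
);

CREATE VIEW DailyHistory AS
SELECT C.Name AS Category, D.UpdDate, D.Val1, D.Val2, D.Note
FROM Categories C
JOIN Dailies D ON D.CatID = C.ID;

INSERT INTO Categories (Name) VALUES ('Cat1'), ('Cat2');
INSERT INTO Dailies VALUES
    (1, '2014-05-05', 1, 2, 'a'),
    (1, '2014-05-06', 3, 4, 'b'),
    (2, '2014-05-06', 5, 6, 'c');
""")

# Complete history for Cat1.
cat1_history = conn.execute(
    "SELECT UpdDate, Val1 FROM DailyHistory WHERE Category = 'Cat1' ORDER BY UpdDate"
).fetchall()

# Every category as updated on one specific date.
on_day = conn.execute(
    "SELECT Category FROM DailyHistory WHERE UpdDate = '2014-05-06' ORDER BY Category"
).fetchall()

# A second row for the same category and day violates the primary key.
try:
    conn.execute("INSERT INTO Dailies VALUES (1, '2014-05-06', 0, 0, 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(cat1_history)        # [('2014-05-05', 1), ('2014-05-06', 3)]
print(on_day)              # [('Cat1',), ('Cat2',)]
print(duplicate_rejected)  # True
```

Adding a category here is just another row in Categories; no new table is needed.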
Most queries will probably be interested in the current values, that is, the last update made (assuming some categories are not updated every day). This is a little more complicated, but it should still be fast if you are worried about performance, since the primary key on (CatID, UpdDate) supports the Max( UpdDate ) lookup.
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID );
Of course, if every category is updated every day, the query is simplified:
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate = <today's date>;
This can also be made into a view, DailyCurrent. To see today's (or the latest) updates for Category Cat1:
select *
from DailyCurrent
where Category = 'Cat1';
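Here is a sketch of a DailyCurrent view built on the correlated Max( UpdDate ) subquery, in SQLite via Python's sqlite3 module. The column Val1 and the sample rows are hypothetical; Cat2 deliberately has no row for the latest day, to show that the view still returns its most recent update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Dailies (
    CatID   INTEGER NOT NULL REFERENCES Categories(ID),
    UpdDate TEXT NOT NULL,
    Val1    INTEGER,                 -- hypothetical daily value
    PRIMARY KEY (CatID, UpdDate)
);

-- The latest row per category, found with a correlated MAX() subquery.
CREATE VIEW DailyCurrent AS
SELECT C.Name AS Category, D.UpdDate AS "Date", D.Val1
FROM Categories C
JOIN Dailies D
  ON D.CatID = C.ID
 AND D.UpdDate = (SELECT MAX(UpdDate) FROM Dailies WHERE CatID = D.CatID);

INSERT INTO Categories (Name) VALUES ('Cat1'), ('Cat2');
-- Cat2 was not updated on 2014-05-06.
INSERT INTO Dailies VALUES
    (1, '2014-05-05', 10), (1, '2014-05-06', 11),
    (2, '2014-05-04', 20), (2, '2014-05-05', 21);
""")

current = conn.execute(
    'SELECT Category, "Date", Val1 FROM DailyCurrent ORDER BY Category'
).fetchall()
cat1_now = conn.execute(
    "SELECT Val1 FROM DailyCurrent WHERE Category = 'Cat1'"
).fetchall()

print(current)   # [('Cat1', '2014-05-06', 11), ('Cat2', '2014-05-05', 21)]
print(cat1_now)  # [(11,)]
```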
Suppose now that updates are not necessarily made every day. The history view would show all the updates that were actually made, so the query shown for all categories as they were on a particular day would show only those categories that were updated on that day. What if you wanted to show the data that was "current" as of a particular date, even if the actual update was several days before?
That can be provided with a small change to the "current" query (just the last line added):
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID
and UpdDate <= date '2014-05-06' );
Now this shows, for every category, the data updated on that date if it exists, otherwise the latest update made before that date.
As you can see, this is a very flexible design that allows access to the data in just about any way desired.
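A sketch of the "as of a date" variant, with the date passed as a query parameter, again in SQLite through Python's sqlite3 module. The helper as_of, the column Val1 and the sample data are hypothetical. A category with no update on or before the requested date is left out, because Max( UpdDate ) returns NULL for it and the join condition fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Dailies (
    CatID   INTEGER NOT NULL REFERENCES Categories(ID),
    UpdDate TEXT NOT NULL,
    Val1    INTEGER,                 -- hypothetical daily value
    PRIMARY KEY (CatID, UpdDate)
);
INSERT INTO Categories (Name) VALUES ('Cat1'), ('Cat2'), ('Cat3');
INSERT INTO Dailies VALUES
    (1, '2014-05-05', 10), (1, '2014-05-06', 11),
    (2, '2014-05-04', 20), (2, '2014-05-05', 21),
    (3, '2014-05-06', 30);           -- Cat3 has no data before 2014-05-06
""")

# The "current" query with the extra UpdDate <= ? condition in the subquery.
AS_OF_SQL = """
SELECT C.Name AS Category, D.UpdDate AS "Date", D.Val1
FROM Categories C
JOIN Dailies D
  ON D.CatID = C.ID
 AND D.UpdDate = (SELECT MAX(UpdDate) FROM Dailies
                  WHERE CatID = D.CatID AND UpdDate <= ?)
ORDER BY C.Name
"""

def as_of(day):
    """Each category's values as they stood on `day` (an ISO date string)."""
    return conn.execute(AS_OF_SQL, (day,)).fetchall()

print(as_of("2014-05-05"))
# [('Cat1', '2014-05-05', 10), ('Cat2', '2014-05-05', 21)]
print(as_of("2014-05-06"))
# [('Cat1', '2014-05-06', 11), ('Cat2', '2014-05-05', 21), ('Cat3', '2014-05-06', 30)]
```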

Related

Create a field in Firebird which displays data from another table

I didn't find a working solution for creating a "lookup column" in a Firebird database.
Here is an example:
Table1: Orders
[OrderID] [CustomerID] [CustomerName]
Table2: Customers
[ID] [Name]
When I run SELECT * FROM ORDERS I want to get OrderID, CustomerID and CustomerName, but CustomerName should be computed automatically by looking up the CustomerID in the ID column of the Customers table and returning the content of the Name column.
Firebird has calculated fields (generated always as/computed by), and these allow selecting from other tables (contrary to an earlier version of this answer, which stated that Firebird doesn't support this).
However, I suggest you use a view instead, as I think it performs better (I haven't verified this, so test it if performance is important).
Use a view
The common way would be to define a base table and an accompanying view that gathers the necessary data at query time. Instead of using the base table, people would query from the view.
create view order_with_customer
as
select orders.id, orders.customer_id, customer.name
from orders
inner join customer on customer.id = orders.customer_id;
Or you could just skip the view and use above join in your own queries.
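To show the view approach end to end, here is the same order_with_customer view in SQLite, driven from Python's sqlite3 module (a stand-in for Firebird, which is not assumed here). The sample rows are hypothetical. Because the name is looked up at query time, renaming a customer is visible through the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);

-- The name is not stored in orders; the join supplies it at query time.
CREATE VIEW order_with_customer AS
SELECT orders.id, orders.customer_id, customer.name
FROM orders
INNER JOIN customer ON customer.id = orders.customer_id;

INSERT INTO customer (name) VALUES ('name1'), ('name2');
INSERT INTO orders (customer_id) VALUES (1), (2), (1);
""")

rows = conn.execute("SELECT * FROM order_with_customer ORDER BY id").fetchall()
print(rows)  # [(1, 1, 'name1'), (2, 2, 'name2'), (3, 1, 'name1')]

conn.execute("UPDATE customer SET name = 'renamed' WHERE id = 1")
after = conn.execute("SELECT name FROM order_with_customer WHERE id = 3").fetchone()
print(after)  # ('renamed',)
```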
Alternative: calculated fields
I label this as an alternative and not the main solution, as I think using a view would be the preferable solution.
To use calculated fields, you can use the following syntax (note the double parentheses around the query):
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as ((select name from customer where id = customer_id))
);
Updates to the customer table will be automatically reflected in the orders table.
As far as I'm aware, this option performs worse than a join (as used in the view example), but you might want to test that for yourself.
FB3+ with function
With Firebird 3, you can also create calculated fields using a stored function, which makes the expression itself shorter.
To do this, create a function that selects from the customer table:
create function lookup_customer_name(customer_id integer)
returns varchar(50)
as
begin
return (select name from customer where id = :customer_id);
end
And then create the table as:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as (lookup_customer_name(customer_id))
);
Updates to the customer table will be automatically reflected in the orders table. This solution can be relatively slow when selecting a lot of records, as the function will be executed for each row individually, which is a lot less efficient than performing a join.
Alternative: use a trigger
However, if you want to update the table at insert (or update) time with information from another table, you could use a trigger.
I'll be using Firebird 3 for my answer, but it should translate - with some minor differences - to earlier versions as well.
So assuming a table customer:
create table customer (
id integer generated by default as identity primary key,
name varchar(50) not null
);
with sample data:
insert into customer(name) values ('name1');
insert into customer(name) values ('name2');
And a table orders:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name varchar(50) not null
);
You then define a trigger:
create trigger orders_bi_bu
active before insert or update
on orders
as
begin
new.customer_name = (select name from customer where id = new.customer_id);
end
Now when we use:
insert into orders(customer_id) values (1);
the result is:
id  customer_id  customer_name
1   1            name1
Update:
update orders set customer_id = 2 where id = 1;
Result:
id  customer_id  customer_name
1   2            name2
The downside of a trigger is that updating the name in the customer table will not automatically be reflected in the orders table. You would need to keep track of these dependencies yourself, and create an after update trigger on customer that updates the dependent records, which can lead to update/lock conflicts.
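The after-update trigger on customer mentioned above could look roughly like this. The sketch uses SQLite through Python's sqlite3 module rather than Firebird PSQL, and the trigger name customer_au and the sample rows are hypothetical. Note that every rename now also rewrites the dependent order rows, which is where the update/lock conflicts come from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    customer_name TEXT NOT NULL      -- denormalized copy of customer.name
);

-- Push a renamed customer's new name into the dependent order rows.
CREATE TRIGGER customer_au AFTER UPDATE OF name ON customer
BEGIN
    UPDATE orders SET customer_name = NEW.name WHERE customer_id = NEW.id;
END;

INSERT INTO customer (name) VALUES ('name1'), ('name2');
INSERT INTO orders (customer_id, customer_name) VALUES (1, 'name1'), (2, 'name2');
""")

conn.execute("UPDATE customer SET name = 'name1-renamed' WHERE id = 1")
rows = conn.execute(
    "SELECT id, customer_id, customer_name FROM orders ORDER BY id"
).fetchall()
print(rows)  # [(1, 1, 'name1-renamed'), (2, 2, 'name2')]
```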
There is no need for a complex lookup field here.
No need to add a persistent field [CustomerName] to Table1.
As Gordon said, a simple join is enough:
Select T1.OrderID, T2.ID, T2.Name
From Customers T2
Join Orders T1 On T1.CustomerID = T2.ID
That said, if you want to use lookup fields (as we do on a Dataset) with SQL, you can use something like:
Select T1.OrderID, T2.ID,
( Select T3.YourLookupField From T3 where (T3.ID = T2.ID) )
From Customers T2 Join Orders T1 On T1.CustomerID = T2.ID
Regards.

Inserting multiple records in database table using PK from another table

I have a DB2 table "organization" which holds organization data, including the following columns:
organization_id (PK), name, description
Some organizations have been deleted, so many organization_id values (i.e. rows) no longer exist; the IDs are not continuous like 1, 2, 3, 4, 5, ... but more like 1, 2, 5, 7, 11, 12, 21, ...
Then there is another table "title" with some other data, which contains organization_id from the organization table as an FK.
Now I have to insert some data for all organizations: a title that is going to be shown for all of them in the web app.
In total there are approximately 3000 records to be added.
If I would do it one by one it would look like this:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
VALUES
(
'This is new title',
XXXX,
CURRENT TIMESTAMP,
1,
1,
1
);
where XXXX represents the organization_id, which I should get from the organization table so that the insert is done only for existing organization_id values.
So only organization_id changes, taking each organization_id value from the organization table.
What would be best way to do it?
I checked several similar questions, but none of them seems to match this:
SQL Server 2008 Insert with WHILE LOOP
The WHILE loop answer iterates over continuous IDs; the other answer also assumes that the ID is auto-incremented.
Same here:
How to use a SQL for loop to insert rows into database?
Not sure about this one (as the question itself is not quite clear):
Inserting a multiple records in a table with while loop
Any advice on this? How should I do it?
If you seriously want a row for every organization record in Title with the exact same data something like this should work:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organization o
;
You shouldn't need the column aliases in the SELECT, but I am including them for readability and good measure.
https://www.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafymultrow.htm
And for good measure, in case your process errors out partway through, you can also do something like this to insert a record into title only if that organization_id and title combination does not already exist:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organization o
LEFT JOIN Title t
ON o.organization_id = t.organization_id
AND t.name = 'This is new title'
WHERE
t.organization_id IS NULL
;
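A sketch of the guarded INSERT ... SELECT in SQLite via Python's sqlite3 module, to show that it inserts one row per existing organization_id (gaps and all) and that re-running it adds nothing. The surrogate key title_id and the sample organizations are hypothetical, and SQLite spells the timestamp CURRENT_TIMESTAMP rather than DB2's CURRENT TIMESTAMP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (organization_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE title (
    title_id INTEGER PRIMARY KEY,    -- hypothetical surrogate key
    name TEXT NOT NULL,
    organization_id INTEGER NOT NULL REFERENCES organization(organization_id),
    datetime_added TEXT,
    added_by INTEGER,
    special_fl INTEGER,
    title_type_id INTEGER
);
-- Non-continuous IDs, as in the question.
INSERT INTO organization VALUES (1, 'a'), (2, 'b'), (5, 'c'), (7, 'd'), (11, 'e');
""")

# One row per organization that does not yet have this title (anti-join).
INSERT_MISSING = """
INSERT INTO title (name, organization_id, datetime_added,
                   added_by, special_fl, title_type_id)
SELECT 'This is new title', o.organization_id, CURRENT_TIMESTAMP, 1, 1, 1
FROM organization o
LEFT JOIN title t
  ON o.organization_id = t.organization_id
 AND t.name = 'This is new title'
WHERE t.organization_id IS NULL
"""

first = conn.execute(INSERT_MISSING).rowcount
second = conn.execute(INSERT_MISSING).rowcount  # re-running adds nothing
ids = [r[0] for r in conn.execute(
    "SELECT organization_id FROM title ORDER BY organization_id")]

print(first, second)  # 5 0
print(ids)            # [1, 2, 5, 7, 11]
```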

Underlying rows in Group By

I have a table with a certain number of columns and a primary key column (suppose OriginalKey). I perform a GROUP BY on a certain sub-set of those columns and store them in a temporary table with primary key (suppose GroupKey). At a later stage, I may need to get more details about one or more of those groupings (which can be found in the temporary table) i.e. I need to know which were the rows from the original table that formed that group. Simply put, I need to know the mappings between GroupKey and OriginalKey. What's the best way to do this? Thanks in advance.
Example:
CREATE TABLE Student(
StudentID INT PRIMARY KEY,
Level INT, -- Grade/Class/Level depending on which country you are from
HomeTown TEXT,
Gender CHAR
)
INSERT INTO TempTable SELECT HomeTown, Gender, COUNT(*) AS NumStudents FROM Student GROUP BY HomeTown, Gender
At a later date, I would like to find all towns that have more than 50 male students and see the details of every one of those students.
How about joining the 2 tables using the GroupKey, which, you say, are the same?
Or how about doing:
select * from OriginalTable where
GroupKey in (select GroupKey from my_temp_table)
You'd need to store the fields you grouped on in your temporary table, so you can join back to the original table. e.g. if you grouped on fieldA, fieldB, and fieldC, you'd need something like:
select original.id
from original
inner join temptable on
temptable.fieldA = original.fieldA and
temptable.fieldB = original.fieldB and
temptable.fieldC = original.fieldC
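Applied to the Student example, the join-back looks roughly like this in SQLite (through Python's sqlite3 module). The GroupKey surrogate column and the sample students are hypothetical, and the "more than 50" threshold is lowered to "more than 1" to fit the tiny sample. The result is exactly the GroupKey to StudentID mapping the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    Level INTEGER,
    HomeTown TEXT,
    Gender TEXT
);
CREATE TABLE TempTable (
    GroupKey INTEGER PRIMARY KEY,    -- hypothetical surrogate key per group
    HomeTown TEXT,
    Gender TEXT,
    NumStudents INTEGER
);
INSERT INTO Student VALUES
    (1, 1, 'Springfield', 'M'), (2, 2, 'Springfield', 'M'),
    (3, 1, 'Springfield', 'F'), (4, 3, 'Shelbyville', 'M');

-- Keep the grouping columns so each group can be joined back later.
INSERT INTO TempTable (HomeTown, Gender, NumStudents)
SELECT HomeTown, Gender, COUNT(*) FROM Student GROUP BY HomeTown, Gender;
""")

# Students in groups with more than one male student (threshold lowered
# from 50 so the small sample produces a result).
rows = conn.execute("""
SELECT t.GroupKey, s.StudentID
FROM TempTable t
JOIN Student s
  ON s.HomeTown = t.HomeTown AND s.Gender = t.Gender
WHERE t.Gender = 'M' AND t.NumStudents > 1
ORDER BY s.StudentID
""").fetchall()

print([student for _, student in rows])  # [1, 2]
```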

SQL Query for range of dates

Let's frame the question again:
table1{date, bID, sName, fID}
{11/05,B1,A1,P1}
{12/05,B2,A2,P2}
{13/05,B1,A3,P1}
{15/05,B3,A4,P1}
{16/05,B1,A5,P2}
{19/05,B1,A6,P2}
This is the table, along with the data stored in it.
Now, the query I want is this:
Depending on fID (let's say P1 is selected), it should display the data from the table for a date range, say 11/05 to 17/05, with no date missing. The expected result is:
11/05,B1,A1
12/05,--,--
13/05,B1,A3
14/05,--,--
15/05,B3,A4
16/05,--,--
17/05,--,--
This is the data retrieved for a particular fID (P1). Explaining the result:
1) It displays all data from 11/05 to 17/05 where fID is P1. If a date is not in the database, it should still be displayed with null values (e.g. 14/05 is not in the database, but it is still displayed with null values).
2) If the fID for a particular date is not P1, the result set should also contain null values for that date.
Finally, the data in the result set is processed further.
So I want to write a query for this problem. Is it possible?
No code here, just my thoughts.
You need to create a temporary table with dates ranging from your begin date to your end date, inclusive. Then left join from that date table to table1 on the date column, putting fID = ? in the join condition (a WHERE clause on fID would drop the dates that have no data).
As the other answer here mentions, a table with all the dates in it, and a LEFT JOIN is what you need.
Say you have this table:
CREATE TABLE table1
(
date DATETIME,
bID VARCHAR(10),
sName VARCHAR(10),
fID VARCHAR(10)
)
and then this date-table:
CREATE TABLE dates
(
dt DATETIME
)
and in this table you need to have all the dates for the range you want to display. Usually you populate it with a couple of years in both directions, but that's up to you.
Note: For simplicity, I did not bother with primary keys in either table. You should of course make sure you have a primary key, and in the case of the dates table, it could be the dt column.
Then to display the results you want:
SELECT
dt,
bID,
sName
FROM
dates
LEFT JOIN table1 ON dt = date AND fID = 'P1'
ORDER BY
dt
Note that the selection of only P1 rows is done in the JOIN criteria. If you add a WHERE clause to do the same, you'll lose all dates that have no data.
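A runnable sketch of the dates-table approach in SQLite through Python's sqlite3 module. The question's dates have no year, so 2014 is assumed, and the dates table is filled with a recursive CTE (SQLite 3.8.3 or later) instead of by hand. The fID filter sits in the ON clause, while the range filter on the dates table's own column can safely go in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (date TEXT, bID TEXT, sName TEXT, fID TEXT);
INSERT INTO table1 VALUES
    ('2014-05-11', 'B1', 'A1', 'P1'), ('2014-05-12', 'B2', 'A2', 'P2'),
    ('2014-05-13', 'B1', 'A3', 'P1'), ('2014-05-15', 'B3', 'A4', 'P1'),
    ('2014-05-16', 'B1', 'A5', 'P2'), ('2014-05-19', 'B1', 'A6', 'P2');

CREATE TABLE dates (dt TEXT PRIMARY KEY);
-- Fill the calendar table for May 2014 with a recursive CTE.
WITH RECURSIVE d(dt) AS (
    SELECT '2014-05-01'
    UNION ALL
    SELECT date(dt, '+1 day') FROM d WHERE dt < '2014-05-31'
)
INSERT INTO dates (dt) SELECT dt FROM d;
""")

rows = conn.execute("""
SELECT dt, bID, sName
FROM dates
LEFT JOIN table1 ON dt = table1.date AND fID = ?
WHERE dt BETWEEN ? AND ?
ORDER BY dt
""", ("P1", "2014-05-11", "2014-05-17")).fetchall()

for r in rows:
    print(r)
# ('2014-05-11', 'B1', 'A1')
# ('2014-05-12', None, None)
# ('2014-05-13', 'B1', 'A3')
# ('2014-05-14', None, None)
# ('2014-05-15', 'B3', 'A4')
# ('2014-05-16', None, None)
# ('2014-05-17', None, None)
```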

What's the best way to store (and access) historical 1:M relationships in a relational database?

Hypothetical example:
I have Cars and Owners. Each Car belongs to one (and only one) Owner at a given time, but ownership may be transferred. Owners may, at any time, own zero or more cars. What I want is to store the historical relationships in a MySQL database such that, given an arbitrary time, I can look up the current assignment of Cars to Owners.
I.e. At time X (where X can be now or anytime in the past):
Who owns car Y?
Which cars (if any) does owner Z own?
Creating an M:N table in SQL (with a timestamp) is simple enough, but I'd like to avoid a correlated sub-query as this table will get large (and, hence, performance will suffer). Any ideas? I have a feeling that there's a way to do this by JOINing such a table with itself, but I'm not terribly experienced with databases.
UPDATE: I would like to avoid using both a "start_date" and "end_date" field per row as this would necessitate a (potentially) expensive look-up each time a new row is inserted. (Also, it's redundant).
Make a third table called CarOwner with fields carid, ownerid, start_date and end_date.
When a car is bought, fill in the first three and check the table to make sure no one else is listed as the current owner. If someone is, update that record, setting end_date to the purchase date.
To find current owner:
select carid, ownerid from CarOwner where end_date is null
To find owner at a point in time:
select carid, ownerid from CarOwner where start_date <= getdate()
and (end_date > getdate() or end_date is null)
Replace getdate() with the point in time you are interested in. getdate() is MS SQL Server specific, but every database has some function that returns the current date; just substitute it.
Of course if you also want additional info from the other tables, you would join to them as well.
select co.carid, co.ownerid, o.owner_name, c.make, c.Model, c.year
from CarOwner co
JOIN Car c on co.carid = c.carid
JOIN Owner o on o.ownerid = co.ownerid
where co.end_date is null
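A sketch of the start_date/end_date design in SQLite through Python's sqlite3 module, including the "close the old row, open a new one" step described above. The helper names transfer and owners_at and the sample data are hypothetical. Intervals are treated as half-open: on the transfer date itself, the new owner is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarOwner (
    carid      INTEGER NOT NULL,
    ownerid    INTEGER NOT NULL,
    start_date TEXT NOT NULL,
    end_date   TEXT                  -- NULL while the ownership is current
);
INSERT INTO CarOwner VALUES (1, 100, '2009-01-01', NULL),
                            (2, 100, '2009-01-10', NULL);
""")

def transfer(carid, new_owner, on):
    """Close the current ownership row and open a new one, in one transaction."""
    with conn:
        conn.execute("UPDATE CarOwner SET end_date = ? "
                     "WHERE carid = ? AND end_date IS NULL", (on, carid))
        conn.execute("INSERT INTO CarOwner VALUES (?, ?, ?, NULL)",
                     (carid, new_owner, on))

OWNERS_AT = """
SELECT carid, ownerid FROM CarOwner
WHERE start_date <= :at
  AND (end_date > :at OR end_date IS NULL)
ORDER BY carid
"""

def owners_at(at):
    return conn.execute(OWNERS_AT, {"at": at}).fetchall()

transfer(1, 200, "2009-02-15")
print(owners_at("2009-02-01"))  # [(1, 100), (2, 100)]
print(owners_at("2009-03-01"))  # [(1, 200), (2, 100)]
```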
I've found that the best way to handle this sort of requirement is to just maintain a log of VehicleEvents, one of which would be ChangeOwner. In practice, you can derive the answers to all the questions posed here - at least as accurately as you are collecting the events.
Each record would have a timestamp indicating when the event occurred.
One benefit of doing it this way is that the minimum amount of data can be added in each event, but the information about the Vehicle can accumulate and evolve.
Also, with the timestamp, events can be added after the fact (as long as the timestamp accurately reflects when the event occurred).
Trying to maintain historical state for something like this in any other way I've tried leads to madness. (Maybe I'm still recovering. :D)
BTW, the distinguishing characteristic here is probably that it's a Time Series or Event Log, not that it's 1:m.
Given your business rule that each car belongs to at least one owner (i.e. owners exist before they are assigned to a car) and your operational constraint that the table may grow large, I'd design the schema as follows:
(generic sql 92 syntax:)
CREATE TABLE Cars
(
CarID integer not null default autoincrement,
OwnerID integer not null,
CarDescription varchar(100) not null,
CreatedOn timestamp not null default current timestamp,
Primary key (CarID),
FOREIGN KEY (OwnerID ) REFERENCES Owners(OwnerID )
)
CREATE TABLE Owners
(
OwnerID integer not null default autoincrement,
OwnerName varchar(100) not null,
Primary key(OwnerID )
)
CREATE TABLE HistoricalCarOwners
(
CarID integer not null,
OwnerID integer not null,
OwnedFrom timestamp null,
Owneduntil timestamp null,
primary key (carid, ownerid),
FOREIGN KEY (OwnerID ) REFERENCES Owners(OwnerID ),
FOREIGN KEY (CarID ) REFERENCES Cars(CarID )
)
I personally would not touch the third table from my client application but would simply let the database do the work, and maintain data integrity, with ON UPDATE and ON DELETE triggers on the Cars table to populate the HistoricalCarOwners table whenever a car changes owners (i.e. whenever an UPDATE is committed on the OwnerId column) or a car is deleted.
With the above schema, selecting the current car owner is trivial, and selecting historical car owners is as simple as
select o.ownerid, o.ownername from owners o inner join historicalcarowners hco
on hco.ownerid = o.ownerid
where hco.carid = :arg_id and
:arg_timestamp between ownedfrom and owneduntil
order by ...
HTH, Vince
If you really do not want to have a start and end date you can use just a single date and do a query like the following.
SELECT * FROM CarOwner co
WHERE co.CarId = #CarId
AND co.TransferDate <= #AsOfDate
AND NOT EXISTS (SELECT * FROM CarOwner co2
WHERE co2.CarId = #CarId
AND co2.TransferDate <= #AsOfDate
AND co2.TransferDate > co.Transferdate)
or a slight variation
SELECT * FROM Car ca
JOIN CarOwner co ON ca.Id = co.CarId
AND co.TransferDate = (SELECT MAX(TransferDate)
FROM CarOwner WHERE CarId = #CarId
AND TransferDate <= #AsOfDate)
WHERE co.CarId = #CarId
These solutions are functionally equivalent to Javier's suggestion, but depending on the database you are using, one may be faster than the other.
However, depending on your read versus write ratio you may find the performance better if you redundantly update the end date in the associative entity.
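A sketch of the single-TransferDate design with the NOT EXISTS query, in SQLite through Python's sqlite3 module, with named parameters in place of #CarId and #AsOfDate. The helpers owner_of and cars_of and the sample data are hypothetical; cars_of applies the same NOT EXISTS pattern, correlated on co.CarId, to answer the question's second lookup (which cars an owner had at a given time).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarOwner (
    CarId        INTEGER NOT NULL,
    OwnerId      INTEGER NOT NULL,
    TransferDate TEXT NOT NULL,
    PRIMARY KEY (CarId, TransferDate)
);
INSERT INTO CarOwner VALUES
    (1, 100, '2009-01-01'), (1, 200, '2009-02-15'), (1, 300, '2009-06-01'),
    (2, 100, '2009-01-10');
""")

# Latest transfer of the car on or before the as-of date.
OWNER_AS_OF = """
SELECT co.OwnerId FROM CarOwner co
WHERE co.CarId = :car
  AND co.TransferDate <= :as_of
  AND NOT EXISTS (SELECT * FROM CarOwner co2
                  WHERE co2.CarId = :car
                    AND co2.TransferDate <= :as_of
                    AND co2.TransferDate > co.TransferDate)
"""

# Same pattern, but for every car whose latest transfer went to :owner.
CARS_OF_AS_OF = """
SELECT co.CarId FROM CarOwner co
WHERE co.OwnerId = :owner
  AND co.TransferDate <= :as_of
  AND NOT EXISTS (SELECT * FROM CarOwner co2
                  WHERE co2.CarId = co.CarId
                    AND co2.TransferDate <= :as_of
                    AND co2.TransferDate > co.TransferDate)
ORDER BY co.CarId
"""

def owner_of(car, as_of):
    row = conn.execute(OWNER_AS_OF, {"car": car, "as_of": as_of}).fetchone()
    return row[0] if row else None

def cars_of(owner, as_of):
    return [r[0] for r in conn.execute(CARS_OF_AS_OF,
                                       {"owner": owner, "as_of": as_of})]

print(owner_of(1, "2009-03-01"))   # 200
print(cars_of(100, "2009-02-01"))  # [1, 2]
print(cars_of(100, "2009-03-01"))  # [2]
```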
Why not have a transaction table, which would contain the car ID, the FROM owner, the TO owner and the date the transaction occurred?
Then all you do is find the most recent transaction for a car on or before the desired date.
To find cars owned by Owner 253 on March 1st:
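SELECT t.* FROM transactions t WHERE t.ownerToId = 253 AND t.date <= '2009-03-01'
AND NOT EXISTS (SELECT * FROM transactions t2 WHERE t2.carId = t.carId AND t2.date > t.date AND t2.date <= '2009-03-01')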
The cars table can have a column called ownerID. You can then simply:
1. select owners.* from cars inner join owners on cars.ownerid = owners.ownerid where cars.carid = Y
2. select * from cars where ownerid = Z
Not the exact syntax but simple pseudo code.