How to design this table schema (and how to query it) - sql

I have a table that stores a list of members - for the sake of simplicity, I will use a simple real-world case that models my use case.
Let's use the analogy of a sports club or gym.
The membership of the gym changes every three months (for example) - with some old members leaving, some new members joining and some members staying unchanged.
I want to run a query on the table - spanning a multi-time period and return the average weight of all of the members in the club.
These are the tables I have come up with so far:
-- A table containing all members the gym has ever had
-- current members have their leave_date field left at NULL
-- departed members have their leave_date field set to the days they left the gym
CREATE TABLE IF NOT EXISTS member (
id PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
join_date DATE NOT NULL,
-- set to NULL if user has not left yet
leave_date DATE DEFAULT NULL
);
-- A table of members weights.
-- This table is populated DAILY,after the weights of CURRENT members
-- has been recorded
CREATE TABLE IF NOT EXISTS current_member_weight (
id PRIMARY KEY NOT NULL,
calendar_date DATE NOT NULL,
member_id INTEGER REFERENCES member(id) NOT NULL,
weight REAL NOT NULL
);
-- I want to write a query that returns the AVERAGE daily weight of
-- CURRENT members of the gym. The query should take a starting_date
-- and an ending_date between which to calculate the daily
-- averages. The aver
-- PSEUDO SQL BELOW!
SELECT calendar_date, AVG(weight)
FROM member, current_member_weight
WHERE calendar_date BETWEEN(starting_date, ending_date);
I have two questions:
can the schema above be improved - if yes, please illustrate
How can I write an SQL* query to return the average weights calculated for all members in the gym during a specified period (t1, t2), where (t1,t2) spans a period that members have joined/left the gym?
[[Note about SQL]]
Preferably, any SQL shown would be database anagnostic, however if a particular flavour of SQL is to be used, I'd prefer PostgreSQL, since that this is the database I'm using.

Below SQL would work as long as the data in the gym_member table is consistent with the joining and leaving date of each member (i.e. for any member, the gym_member table should not have rows with calendar_date less his joining date or with calendar_date greater than his leaving date)
SELECT
gm.calendar_date,
AVG(gm.weight) avg_weight
FROM
member m,
gym_member gm
WHERE
m.id = gm.member_id
AND
gm.calendar_date >= '1-Jan-2017'
AND
gm.calendar_date <= '31-Dec-2017'
GROUP BY
gm.calendar_date

Related

due date estimating

my "insurance_pay_dtl"(insurance table) consist 'ins_paid_dt'(insurance paid date) column,
i need to select all members whoever not paid insurance amount before due date,
due date is 1 year(365 days)
how do i do..?
You need to link insurance_pay_dtl table with insurance_farmer_hdr with its primary key, for example:
Select member_id, member_name from insurance_farmer_hdr ifd, insurance_pay_dtl ipd
where ifd.insurance_rec_id = ipd.insruance_rec_id and trunc(sysdate) > ifd.due_date
change the columns in above query as per your table columns and try.

Displaying entire row of max() value from nested table

My table CUSTOMER_TABLE has a nested table of references toward ACCOUNT_TABLE. Each account in ACCOUNT_TABLE has a reference toward a branch: branch_ref.
CREATE TYPE account AS object(
accid integer,
acctype varchar2(15),
balance number,
rate number,
overdraft_limit integer,
branch_ref ref branch,
opendate date
) final;
CREATE TYPE customer as object(
custid integer,
infos ref type_person,
accounts accounts_list
);
create type branch under elementary_infos(
bid integer
) final;
All tables are inherited from these object types.
I want to select the account with the highest balance per branch. I arrived to do that with this query:
select MAX(value(a).balance), value(a).branch_ref.bid
from customer_table c, table(c.accounts) a
group by value(a).branch_ref.bid
order by value(a).branch_ref.bid;
Which returns:
MAX(VALUE(A).BALANCE) VALUE(A).BRANCH_REF.BID
--------------------------------------- ---------------------------------------
176318.88 0
192678.14 1
190488.19 2
196433.93 3
182909.84 4
However, how to select as well others attribues from the max accounts displayed ? I would like to display the name of the owner plus the customer's id. The id is directly an attribute of customer. But the name is stored with a reference toward person_table. So I have to select as well c.id & deref(c.infos).names.surname.
How to select these other attributes with my MAX() query ?
Thank you
I generally use analytic functions to achieve that kind of functionality. With analytic functions, you can add aggregate columns to your query without losing the original rows. In your particular case it would be something like:
select
-- select interesting fields
accid,
acctype,
balance,
rate,
overdraft_limit,
branch_ref,
opendate,
max_balance
from (
select
-- propagate original fields to outer query
value(a).accid accid,
value(a).acctype acctype,
value(a).balance balance,
value(a).rate rate,
value(a).overdraft_limit overdraft_limit,
value(a).branch_ref branch_ref,
value(a).opendate opendate,
-- add max(balance) of our branch_ref to the row
max(value(a).balance) over (partition by value(a).branch_ref.bid) max_balance
from customer_table c, table(c.accounts) a
) data
where
-- we are only interested in rows with balance equal to the max
-- (NOTE: there might be more than one, you should fine tune the filtering!)
data.balance = data.max_balance
-- order by branch
order by data.branch_ref.bid;
I don't have any Oracle instance available right now to test this, but that would be the idea, unless there is some kind of incompatibility between analytic functions and collection columns, you should be able to have that working with little effort.

I want to make SQL tables that are updated daily yet retain every single day's contents for later lookup. What is the best practice for this?

Basically I'm trying to create a database schema based around multiple unrelated tables that will not need to reference each other AFAIK.
Each table will be a different "category" that will have the same columns in each table - name, date, two int values and then a small string value.
My issue is that each one will need to be "updated" daily, but I want to keep a record of the items for every single day.
What's the best way to go about doing this? Would it be to make the composite key the combination of the date and the name? Or use something called a "trigger"?
Sorry I'm somewhat new to database design, I can be more specific if I need to be.
Yes, you have to create a trigger for each category table
I'm assuming name is PK for each table? If isnt the case, you will need create a PK.
Lets say you have
table categoryA
name, date, int1, int2, string
table categoryB
name, date, int1, int2, string
You will create another table to store changes log.
table category_history
category_table, name, date, int1, int2, string, changeDate
You create two trigger, one for each category table
Where you save what table gerate the update and what time was made.
create trigger before update for categoryA
INSERT INTO category_history VALUES
('categoryA', OLD.name, OLD.date, OLD.int1, Old.int2, OLD.string, NOW());
This is pseudo code, you need write trigger using your rdbms syntaxis, and check how get system date now().
As has already been pointed out, it is poor design to have different identical tables for each category. Better would be a Categories table with one entry for each category and then a Dailies table with the daily information.
create table Categories(
ID smallint not null auto_generated,
Name varchar( 20 ) not null,
..., -- other information about each category
constraint UQ_Category_Name unique( Name ),
constraint PK_Categories( ID )
);
create table Dailies(
CatID smallint not null,
UpdDate date not null,
..., -- Daily values
constraint PK_Dailies( CatID, UpdDate ),
constraint FK_Dailies_Category foreign key( CatID )
references Categories( ID )
);
This way, adding a new category involves inserting a row into the Categories table rather than creating an entirely new table.
If the database has a Date type distinct from a DateTime -- no time data -- then fine. Otherwise, the time part must be removed such as by Oracle's trunc function. This allows only one entry for each category per day.
Retrieving all the values for all the posted dates is easy:
select C.Name as Category, d.UpdDate, d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID;
This can be made into a view, DailyHistory. To see the complete history for Category Cat1:
select *
from DailyHistory
where Name = 'Cat1';
To see all the category information as it was updated on a specific date:
select *
from DailyHistory
where UpdDate = date '2014-05-06';
Most queries will probably be interested in the current values -- that is, the last update made (assuming some categories are not updated every day). This is a little more complicated but still very fast if you are worried about performance.
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID );
Of course, if every category is updated every day, the query is simplified:
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate = <today's date>;
This can also be made into a view. To see today's (or the latest) updates for Category Cat1:
select *
from DailyCurrent
where Name = 'Cat1';
Suppose now that updates are not necessarily made every day. The history view would show all the updates that were actually made. So the query shown for all categories as they were on a particular day would actually show only those categories that were actually updated on that day. What if you wanted to show the data that was "current" as of a particular date, even if the actual update was several days before?
That can be provided with a small change to the "current" query (just the last line added):
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID
and UpdDate <= date '2014-05-06' );
Now this shows all categories with the data updated on that date if it exists otherwise the latest update made previous to that date.
As you can see, this is a very flexible design which allows access the data just about any way desired.

SQL getting count in a date range

I'm looking for input on getting a COUNT of records that were 'active' in a certain date range.
CREATE TABLE member {
id int identity,
name varchar,
active bit
}
The scenario is one where "members" number fluctuate over time. So I could have linear growth where I have 10 members at the beginning of the month and 20 at the end. Currently We go off the number of CURRENTLY ACTIVE (as marked by an 'active' flag in the DB) AT THE TIME OF REPORT. - this is hardly accurate and worse, 6 months from now, my "members" figure may be substantially different than now. and Since I'm doing averages per user, if I run a report now, and 6 months from now - the figures will probably be different.
I don't think a simple "dateActive" and "dateInactive" will do the trick... due to members coming and going and coming back etc. so:
JOE may be active 12-1 and deactivated 12-8 and activated 12-20
so JOE counts as being a 'member' for 8 days and then 11 days for a total of 19 days
but the revolving door status of members means keeping a separate table (presumably) of UserId, status, date
CREATE TABLE memberstatus {
member_id int,
status bit, -- 0 for in-active, 1 for active
date date
} (adding this table would make the 'active' field in members obsolete).
In order to get a "good" Average members per month (or date range) - it seems I'd need to get a daily average, and do an average of averages over 'x' days. OR is there some way in SQL to do this already.
This extra "status" table would allow an accurate count going back in time. So in a case where you have a revenue or cost figure, that DOESN'T change or is not aggregate, it's fixed, that when you want cost/members for last June, you certainly don't want to use your current members count, you want last Junes.
Is this how it's done? I know it's one way, but it the 'better' way...
#gordon - I got ya, but I guess I was looking at records like this:
Members
1 Joe
2 Tom
3 Sue
MemberStatus
1 1 '12-01-2014'
1 0 '12-08-2014'
1 1 '12-20-2014'
In this way I only need the last record for a user to get their current status, but I can track back and "know" their status on any give day.
IF I'm understanding your method it might look like this
CREATE TABLE memberstatus {
member_id int,
active_date,
inactive_date
}
so on the 1-7th the record would look like this
1 '12-01-2014' null
and on the 8th it would change to
1 '12-01-2014' '12-08-2014'
the on the 20th
1 '12-01-2014' '12-08-2014'
1 '12-20-2014' null
Although I can get the same data out, it seems more difficult without any benefit - am i missing something?
You could also use a 2 table method to have a one-to-many relationship for working periods. For example you have a User table
User
UserID int, UserName varchar
and an Activity table that holds ranges
Activity
ActivityID int, UserID int, startDate date, (duration int or endDate date)
Then whenever you wanted information you could do something like (for example)...
SELECT User.UserName, count(*) from Activity
LEFT OUTER JOIN User ON User.UserID = Activity.UserID
WHERE startDate >= '2014-01-01' AND startDate < '2015-01-01'
GROUP BY User.UserID, User.UserName
...to get a count grouped by user (and labeled by username) of the times they were became active in 2014
I have used two main ways to accomplish what you want. First would be something like this:
CREATE TABLE [MemberStatus](
[MemberID] [int] NOT NULL,
[ActiveBeginDate] [date] NOT NULL,
[ActiveEndDate] [date] NULL,
CONSTRAINT [PK_MemberStatus] PRIMARY KEY CLUSTERED
(
[MemberID] ASC,
[ActiveBeginDate] ASC
)
Every time a member becomes active, you add an entry, and when they become inactive you update their ActiveEndDate to the current date.
This is easy to maintain, but can be hard to query. Another option is to do basically what you are suggesting. You can create a scheduled job to run at the end of each day to add entries to the table .
I recommend setting up your tables so that you store more data, but in exchange the structure supports much simpler queries to achieve the reporting you require.
-- whenever a user's status changes, we update this table with the new "active"
-- bit, and we set "activeLastModified" to today.
CREATE TABLE member {
id int identity,
name varchar,
active bit,
activeLastModified date
}
-- whenever a user's status changes, we insert a new record here
-- with "startDate" set to the current "activeLastModified" field in member,
-- and "endDate" set to today (date of status change).
CREATE TABLE memberStatusHistory {
member_id int,
status bit, -- 0 for in-active, 1 for active
startDate date,
endDate date,
days int
}
As for the report you're trying to create (average # of actives in a given month), I think you need yet another table. Pure SQL can't calculate that based on these table definitions. Pulling that data from these tables is possible, but it requires programming.
If you ran something like this once-per-day and stored it in a table, then it would be easy to calculate weekly, monthly and yearly averages:
INSERT INTO myStatsTable (date, activeSum, inactiveSum)
SELECT
GETDATE(), -- based on DBMS, eg., "current_date" for Postgres
active.count,
inactive.count
FROM
(SELECT COUNT(id) FROM member WHERE active = true) active
CROSS JOIN
(SELECT COUNT(id) FROM member WHERE active = true) inactive

What's the best way to store (and access) historical 1:M relationships in a relational database?

Hypothetical example:
I have Cars and Owners. Each Car belongs to one (and only one) Owner at a given time, but ownership may be transferred. Owners may, at any time, own zero or more cars. What I want is to store the historical relationships in a MySQL database such that, given an arbitrary time, I can look up the current assignment of Cars to Owners.
I.e. At time X (where X can be now or anytime in the past):
Who owns car Y?
Which cars (if any) does owner Z own?
Creating an M:N table in SQL (with a timestamp) is simple enough, but I'd like to avoid a correlated sub-query as this table will get large (and, hence, performance will suffer). Any ideas? I have a feeling that there's a way to do this by JOINing such a table with itself, but I'm not terribly experienced with databases.
UPDATE: I would like to avoid using both a "start_date" and "end_date" field per row as this would necessitate a (potentially) expensive look-up each time a new row is inserted. (Also, it's redundant).
Make a third table called CarOwners with a field for carid, ownerid and start_date and end_date.
When a car is bought fill in the first three and check the table to make sure no one else is listed as the owner. If there is then update the record with that data as the end_date.
To find current owner:
select carid, ownerid from CarOwner where end_date is null
To find owner at a point in time:
select carid, ownerid from CarOwner where start_date < getdate()
and end_date > getdate()
getdate() is MS SQL Server specific, but every database has some function that returns the current date - just substitute.
Of course if you also want additional info from the other tables, you would join to them as well.
select co.carid, co.ownerid, o.owner_name, c.make, c.Model, c.year
from CarOwner co
JOIN Car c on co.carid = c.carid
JOIN Owner o on o.ownerid = co.ownerid
where co.end_date is null
I've found that the best way to handle this sort of requirement is to just maintain a log of VehicleEvents, one of which would be ChangeOwner. In practice, you can derive the answers to all the questions posed here - at least as accurately as you are collecting the events.
Each record would have a timestamp indicating when the event occurred.
One benefit of doing it this way is that the minimum amount of data can be added in each event, but the information about the Vehicle can accumulate and evolve.
Also, with the timestamp, events can be added after the fact (as long as the timestamp accurately reflects when the event occurred.
Trying to maintain historical state for something like this in any other way I've tried leads to madness. (Maybe I'm still recovering. :D)
BTW, the distinguishing characteristic here is probably that it's a Time Series or Event Log, not that it's 1:m.
Given your business rule that each car belongs to at least one owner (ie. owners exist before they are assigned to a a car) and your operational constraint that the table may grow large, I'd design the schema as follows:
(generic sql 92 syntax:)
CREATE TABLE Cars
(
CarID integer not null default autoincrement,
OwnerID integer not null,
CarDescription varchar(100) not null,
CreatedOn timestamp not null default current timestamp,
Primary key (CarID),
FOREIGN KEY (OwnerID ) REFERENCES Owners(OwnerID )
)
CREATE TABLE Owners
(
OwnerID integer not null default autoincrement,
OwnerName varchar(100) not null,
Primary key(OwnerID )
)
CREATE TABLE HistoricalCarOwners
(
CarID integer not null,
OwnerID integer not null,
OwnedFrom timestamp null,
Owneduntil timestamp null,
primary key (cardid, ownerid),
FOREIGN KEY (OwnerID ) REFERENCES Owners(OwnerID ),
FOREIGN KEY (CarID ) REFERENCES Cars(CarID )
)
I personally would not touch the third table from my client application but would simply let the database do the work - and maintain data integrity - with ON UPDATE AND ON DELETE triggers on the Cars table to populate the HistoricalCarOwners table whenever a car changes owners (i.e whenever an UPDATE is committed on the OwnerId column) or a car is deleted.
With the above schema, selecting the current car owner is trivial and selecting historical car owners is a simple as
select ownerid, ownername from owners o inner join historicalcarowners hco
on hco.ownerid = o.ownerid
where hco.carid = :arg_id and
:arg_timestamp between ownedfrom and owneduntil
order by ...
HTH, Vince
If you really do not want to have a start and end date you can use just a single date and do a query like the following.
SELECT * FROM CarOwner co
WHERE co.CarId = #CarId
AND co.TransferDate <= #AsOfDate
AND NOT EXISTS (SELECT * FROM CarOwner co2
WHERE co2.CarId = #CarId
AND co2.TransferDate <= #AsOfDate
AND co2.TransferDate > co.Transferdate)
or a slight variation
SELECT * FROM Car ca
JOIN CarOwner co ON ca.Id = co.CarId
AND co.TransferDate = (SELECT MAX(TransferDate)
FROM CarOwner WHERE CarId = #CarId
AND TransferDate < #AsOfDate)
WHERE co.CarId = #CarId
These solution are functionally equivalent to Javier's suggestion but depending on the database you are using one solution may be faster than the other.
However, depending on your read versus write ratio you may find the performance better if you redundantly update the end date in the associative entity.
Why not have a transaction table? Which would contain the car ID, the FROM owner, the TO owner and the date the transaction occcured.
Then all you do is find the first transaction for a car before the desired date.
To find cars owned by Owner 253 on March 1st:
SELECT * FROM transactions WHERE ownerToId = 253 AND date > '2009-03-01'
cars table can have an id called ownerID, YOu can then simply
1.select car from cars inner join owners on car.ownerid=owner.ownerid where ownerid=y
2.select car from cars where owner=z
Not the exact syntax but simple pseudo code.