VRP with score depending on more than one job in OptaPlanner

We are trying to solve a VRP with Optaplanner.
The score calculation runs via constraint streams.
Now I have two vehicles (A and B) and want to schedule two jobs (J1 and J2).
The construction heuristic (FIRST_FIT_DECREASING) schedules J1 to A and J2 to B, which is correct so far.
Now the two jobs also have an attribute "customer", and I want to assign a penalty if the customer of the two jobs is the same but the vehicles are different.
For this purpose, I have created a constraint in the ConstraintProvider that filters all jobs via groupBy that have the same customer but different vehicles.
If I now switch on FULL_ASSERT mode, an IllegalStateException occurs after scheduling J2, because the incrementally calculated score differs from the score of the full recalculation.
I suspect this is because the VariableListener, which recalculates the times of the jobs for my shadow variables, only tells the ScoreDirector about a change to job J2, and therefore only the part of the score related to J2 is updated.
How can I tell OptaPlanner that the score for J1 must also be recalculated? I can't reach job J1 from the VariableListener to tell the ScoreDirector that its score has to change.
Or does this problem require a different approach?

This is a problem that is a bit hard to explain fully. TL;DR version: constraint streams only react to changes to objects which come in through from(), join() or ifExists(). Changes to objects not coming through these statements will not be caught, and will therefore cause score corruption. A longer explanation follows.
Consider a hypothetical Constraint Stream like this:
constraintFactory.from(Shift.class)
        .join(Shift.class)
        .filter((shift1, shift2) -> shift1.getEmployee() == shift2.getEmployee())
        ...
This constraint stream will work just fine, because if you change Shift by setting a different employee, the Shifts will be re-evaluated. They enter the stream via from() and join(), which is how CS knows to re-evaluate Shifts when they change.
Now consider this constraint stream instead:
constraintFactory.from(Shift.class)
        .filter(shift -> "Lukas".equals(shift.getEmployee().getName()))
        ...
This constraint stream will be re-evaluated if Shift changes. But when the name of the Employee changes, the constraint stream will not be re-evaluated: Employee comes in through neither from() nor join(), so changes to Employee will not trigger re-evaluation of the constraint stream.
In your particular situation, you need to ensure several things:
Your variable listeners mark as changed everything that actually changes.
If you modify problem facts, you need to make sure your variable listeners handle that too.
The objects you want your constraint stream to react to come in through from() or a join(), as in the sketch below.
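For the original Job/Vehicle question, a minimal sketch (the Job class and its getCustomer(), getVehicle() and getId() getters are assumptions, not your actual model) of a constraint in which both jobs come in through from() and join(), so that a change reported for either job re-evaluates the pair:

Constraint sameCustomerDifferentVehicle(ConstraintFactory constraintFactory) {
    // Both jobs enter the stream via from()/join(), so a notification about
    // either job (genuine or shadow variable change) re-evaluates the pair.
    return constraintFactory.from(Job.class)
            .join(Job.class,
                    Joiners.equal(Job::getCustomer),
                    Joiners.lessThan(Job::getId))   // each unordered pair only once
            .filter((j1, j2) -> j1.getVehicle() != null
                    && j2.getVehicle() != null
                    && !j1.getVehicle().equals(j2.getVehicle()))
            .penalize("Same customer on different vehicles", HardSoftScore.ONE_SOFT);
}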

Related

insertLogical in Optaplanner VRP with tank volumes using Constraint Streams

I'm trying to convert a simple VRP project using drl to constraint streams, and I'm not sure how to replicate the functionality of "insertLogical".
The "demand" for each delivery is determined by the number of days from the previous delivery. (The demand is for liquid in tanks being consumed at a known rate.) In drl I'd pair the deliveries and insertLogical, and also insertLogical for the firstDelivery based on the current state.
Without insertLogical, I'm joining deliveries and using a groupBy for paired deliveries, but I can't see how to do an "Outer Join" to include the first delivery.
I also tried creating a continuous-planning-style "pre-schedule" delivery and then omitting those from the planning value range, which would mean the pairs would always exist, but I ended up with a kludgy mess for preventing a Customer from using a pre-schedule vehicle.
So, is there a way to "insertLogical" or to do an outer join in constraint streams?
No and no.
You could build a constraint collector; see the UniConstraintCollector interface or its bi/tri/... variants. This allows you to implement any custom logic in your groups.
Or you could create a shadow variable that would keep track of the first delivery. (In fact, with the new planning list variable, that may be even easier.)
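As a rough illustration only (the Delivery class, getTank() and the shortfallFor() helper are assumptions, not taken from your project): a built-in collector such as ConstraintCollectors.toList(), combined with a match weight function, can often stand in for a fully custom UniConstraintCollector:

Constraint tankDemand(ConstraintFactory constraintFactory) {
    // Gather all deliveries of a tank into one group, then apply arbitrary
    // pairing logic (including the first delivery) in the weight function.
    return constraintFactory.from(Delivery.class)
            .groupBy(Delivery::getTank, ConstraintCollectors.toList())
            .penalize("Tank demand shortfall", HardSoftScore.ONE_HARD,
                    // shortfallFor() is a hypothetical helper holding that logic
                    (tank, deliveries) -> tank.shortfallFor(deliveries));
}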

OptaPlanner - Multiple planning entities with same blockId - how to "move all" or "chain" or "shadow" to the same planning variable?

I am trying to assign Timeslots to planning entities (containing room, groups, and persons, handled by constraint streams).
Some of these planning entities have a blockId.
When an entity has a blockId, the goal is to share a timeslot with the other entities that have the same blockId.
I defined a constraint for this, but I can see that the solver makes an extremely large number of unnecessary moves.
public Constraint groupBlockConstraint(ConstraintFactory constraintFactory){
    return constraintFactory.forEachUniquePair(Lesson.class,
            Joiners.equal(Lesson::getSequenceGroup),
            Joiners.filtering((a, b) ->
                    !Lesson.withoutBlock(a, b)
                    && !Lesson.sameTimeslot(a, b)))
            .penalize("BlockSequence not in same timeslot", HardSoftScore.ofHard(15));
}
Is there a way to handle this more efficiently?
Constraints do not determine which moves the solver will try. Constraints are only used to score solutions, which happens once moves have already been performed.
Therefore, if you're seeing moves which in your opinion should not be performed, you need to configure your selectors. Using tabu search could perhaps also help here.
That said, without a more detailed question I cannot provide a less generic answer.

OptaPlanner - continuous planning with changing constraints that do not invalidate prior assignments

We are using OptaPlanner 7.0 beta + GraphHopper to calculate shortest paths in a warehouse, where goods have to be collected into boxes by workers (VRPTW). Since the business is about collecting online-ordered goods, approx. 70% of the items to collect are added to the problem during the day. We use ProblemFactChange to add the incoming order items, and already-completed order items in the chain are set to immovable (these 'restarts' are performed every full hour). So far everything works.
The question now is about changing restrictions/conditions that can occur due to unbalanced workload across warehouse zones. The warehouse is logically divided into areas so that not all workers have to serve all areas (I know your opinion about segmentation of planning problems, but this is how the work is currently organised). The restriction of items to the available workers within one zone is currently defined by a hard constraint.
The new requirement we are confronted with is that a worker should be temporarily assigned to a different zone if the workload there is higher than in his current zone. Afterwards he can switch back to his original zone. To my understanding, an update of the constraint condition would result in hard constraint violations for the previously assigned, locked items, which should be avoided. Are there mechanisms to support temporarily changing restrictions, or would a SelectionFilter for items help? (By the way, we are using Drools.)
Hints are welcome, Thank you
Michael
If there are 2 different tenants, each with their own set of employees, tasks, etc and each in their own Solver, then the Borrow Pattern can be used, especially if the employee borrowing involves some human interaction (usually paperwork or a phone call between managers):
Suppose tenant A has an employee called John and tenant B wants to borrow him. Then assign one or more entities from B to John and make them immovable (usually a boolean borrowLocked). Then add the same entities to tenant A. Neither the solver of A nor the solver of B will be able to move them (so they won't change), but both of them will take them into account: tenant A won't give John work while he's working for tenant B, and tenant B will agree that those entities are assigned (and it won't try to assign John to other entities, as he's not in its value range).
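A minimal sketch of the "immovable" part (entity and field names are illustrative; recent OptaPlanner versions provide @PlanningPin for this, while 7.x can use a movableEntitySelectionFilter on @PlanningEntity instead):

@PlanningEntity
public class OrderItem {

    // When true, neither tenant A's nor tenant B's solver may move this entity,
    // but both solvers still take it into account when scoring.
    @PlanningPin
    private boolean borrowLocked;

    @PlanningVariable(valueRangeProviderRefs = "workerRange")
    private Worker worker;

    // getters and setters omitted
}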

Opinions on sensor / reading / alert database design

I've asked a few questions lately regarding database design, probably too many ;-) However, I believe I'm slowly getting to the heart of the matter with my design and am slowly boiling it down. I'm still wrestling with a couple of decisions regarding how "alerts" are stored in the database.
In this system, an alert is an entity that must be acknowledged, acted upon, etc.
Initially I related readings to alerts like this (very cut down) : -
[Location]
    LocationId
[Sensor]
    SensorId
    LocationId
    UpperLimitValue
    LowerLimitValue
[SensorReading]
    SensorReadingId
    Value
    Status
    Timestamp
[SensorAlert]
    SensorAlertId
[SensorAlertReading]
    SensorAlertId
    SensorReadingId
The last table associates readings with the alert, because it is the readings that dictate whether the sensor is in alert or not.
The problem with this design is that it allows readings from many sensors to be associated with a single alert - whereas each alert is for a single sensor only and should only have readings for that sensor associated with it (should I be bothered that the DB allows this though?).
I thought to simplify things, why even bother with the SensorAlertReading table? Instead I could do this:
[Location]
    LocationId
[Sensor]
    SensorId
    LocationId
[SensorReading]
    SensorReadingId
    SensorId
    Value
    Status
    Timestamp
[SensorAlert]
    SensorAlertId
    SensorId
    Timestamp
[SensorAlertEnd]
    SensorAlertId
    Timestamp
Basically I'm not associating readings with the alert now - instead I just know that an alert was active between a start and end time for a particular sensor, and if I want to look up the readings for that alert I can do so.
Obviously the downside is I no longer have any constraint stopping me from deleting readings that occurred during the alert, but I'm not sure that the constraint is necessary.
Now looking in from the outside as a developer / DBA, would that make you want to be sick or does it seem reasonable?
Is there perhaps another way of doing this that I may be missing?
Thanks.
EDIT:
Here's another idea - it works in a different way. It stores each sensor state change (going from normal to alert) in a table, and then readings are simply associated with a particular state. This seems to solve all the problems - what d'ya think? (The only thing I'm not sure about is calling the table "SensorState"; I can't help thinking there's a better name, maybe SensorReadingGroup?): -
[Location]
    LocationId
[Sensor]
    SensorId
    LocationId
[SensorState]
    SensorStateId
    SensorId
    Timestamp
    Status
    IsInAlert
[SensorReading]
    SensorReadingId
    SensorStateId
    Value
    Timestamp
There must be an elegant solution to this!
Revised 01 Jan 11 21:50 UTC
Data Model
I think your Data Model should look like this:▶Sensor Data Model◀. (Page 2 relates to your other question re History).
Readers who are unfamiliar with the Relational Modelling Standard may find ▶IDEF1X Notation◀ useful.
Business (Rules Developed in the Commentary)
I did identify some early Business Rules, which are now obsolete, so I have deleted them.
These can be "read" in the Relations (read adjacent to the Data Model). The Business Rules and all implied Referential and Data Integrity can be implemented in, and thus guaranteed by, RULES, CHECK Constraints, in any ISO SQL database. This is a demonstration of IDEF1X, in the development of both the Relational keys, and the Entities and Relations. Note the Verb Phrases are more than mere flourish.
Apart from three Reference tables, the only static, Identifying entities are Location, NetworkSlave, and User. Sensor is central to the system, so I have given it its own heading.
Location
A Location contains one-to-many Sensors
A Location may have one Logger
NetworkSlave
A NetworkSlave collects Readings for one-to-many NetworkSensors
User
An User may maintain zero-to-many Locations
An User may maintain zero-to-many Sensors
An User may maintain zero-to-many NetworkSlaves
An User may perform zero-to-many Downloads
An User may make zero-to-many Acknowledgements, each on one Alert
An User may take zero-to-many Actions, each of one ActionType
Sensor
A SensorType is installed as zero-to-many Sensors
A Logger (houses and) collects Readings for one LoggerSensor
A Sensor is either one NetworkSensor or one LoggerSensor
A NetworkSensor records Readings collected by one NetworkSlave
.
A Logger is periodically Downloaded one-to-many times
A LoggerSensor records Readings collected by one Logger
.
A Reading may be deemed in Alert, of one AlertType
An AlertType may happen on zero-to-many Readings
.
An Alert may be one Acknowledgement, by one User
.
An Acknowledgement may be closed by one Action, of one ActionType, by one User
An ActionType may be taken on zero-to-many Actions
Responses to Comments
Sticking Id columns on everything that moves interferes with the determination of Identifiers, the natural Relational keys that give your database relational "power". They are Surrogate Keys, which means an additional Key and Index, and it hinders that relational power; which results in more joins than otherwise necessary. Therefore I use them only when the Relational key becomes too cumbersome to migrate to the child tables (and accept the imposed extra join).
Nullable keys are a classic symptom of an Unnormalised database. Nulls in the database are bad news for performance; but Nulls in FKs mean each table is doing too many things, has too many meanings, and result in very poor code. Good for people who like to "refactor" their databases; completely unnecessary for a Relational database.
Resolved: An Alert may be Acknowledged; An Acknowledgement may be Actioned.
The columns above the line are the Primary Key (refer Notation document). SensorNo is a sequential number within LocationId; refer Business Rules, it is meaningless outside a Location; the two columns together form the PK. When you are ready to INSERT a Sensor (after you have checked that the attempt is valid, etc), it is derived as follows. This excludes LoggerSensors, which are zero:
INSERT Sensor VALUES (
    #LocationId,
    SensorNo = ( SELECT ISNULL(MAX(SensorNo), 0) + 1
                   FROM Sensor
                  WHERE LocationId = #LocationId
               ),
    #SensorCode
    )
For accuracy or improved meaning, I have changed NetworkSlave monitors NetworkSensor to NetworkSlave collects Readings from NetworkSensor.
Check Constraints. The NetworkSensor and LoggerSensor are exclusive subtypes of Sensor, and their integrity can be set by CHECK constraints. Alerts, Acknowledgements and Actions are not subtypes, but their integrity is set by the same method, so I will list them together.
Every Relation in the Data Model is implemented as a CONSTRAINT in the child (or subtype) as FOREIGN KEY (child_FK_columns) REFERENCES Parent (PK_columns)
A Discriminator is required to identify which subtype a Sensor is. This is SensorNo = 0 for LoggerSensors; and non-zero for NetworkSensors.
The existence of NetworkSensors and LoggerSensors are constrained by the FK CONSTRAINTS to NetworkSlave and Logger, respectively; as well as to Sensor.
In NetworkSensor, include a CHECK constraint to ensure SensorNo is non-zero
In LoggerSensor, include a CHECK constraint to ensure SensorNo is zero
The existence of Acknowledgements and Actions is constrained by the identified FK CONSTRAINTS (an Acknowledgement cannot exist without an Alert; an Action cannot exist without an Acknowledgement). Conversely, an Alert with no Acknowledgement is in an unacknowledged state; an Alert with an Acknowledgement but no Action is in an acknowledged but un-actioned state.
.
Alerts. The concept in a design for this kind of (live monitoring and alert) application is many small programs, running independently; all using the database as the single version of the truth. Some programs insert rows (Readings, Alerts); other programs poll the db for the existence of such rows (and send SMS messages, etc; or hand-held units pick up Alerts relevant to the unit only). In that sense, the db may be described as a message box (one program puts rows in, which another program reads and actions).
The assumption is that Readings for Sensors are being recorded "live" by the NetworkSlave, and every minute or so a new set of Readings is inserted. A background process executes periodically (every minute or whatever); this is the main "monitor" program, and it will have many functions within its loop. One such function will be to monitor Readings and produce the Alerts that have occurred since the last iteration (of the program loop).
The following code segment will be executed within the loop, one for each AlertType. It is a classic Projection:
-- Assume #LoopDtm contains the DateTime of the last iteration
INSERT Alert
    SELECT s.LocationId,
           s.SensorNo,
           r.ReadingDtm,
           "L"              -- AlertType "Low"
      FROM Sensor s,
           Reading r
     WHERE s.LocationId = r.LocationId
       AND s.SensorNo   = r.SensorNo
       AND r.ReadingDtm > #LoopDtm
       AND r.Value      < s.LowerLimit

INSERT Alert
    SELECT s.LocationId,
           s.SensorNo,
           r.ReadingDtm,
           "H"              -- AlertType "High"
      FROM Sensor s,
           Reading r
     WHERE s.LocationId = r.LocationId
       AND s.SensorNo   = r.SensorNo
       AND r.ReadingDtm > #LoopDtm
       AND r.Value      > s.UpperLimit
So an Alert is definitely a fact that exists as a row in the database. Subsequently it may be Acknowledged by an User (another row/fact), and Actioned with an ActionType by an User.
Other than this (the creation-by-Projection act), ie. the general and unvarying case, I would refer to Alert only as a row in Alert; a static object after creation.
Concerns re Changing Users. That is taken care of already, as follows. At the top of my (revised yesterday) Answer, I state that the major Identifying elements are static. I have re-sequenced the Business Rules to improve clarity.
For the reasons you mention, User.Name is not a good PK for User, although it remains an Alternate Key (Unique) and the one that is used for human interaction.
User.Name cannot be duplicated; there cannot be more than one Fred. There can be, in terms of FirstName-LastName, two Fred Bloggs, but not in terms of User.Name. Our second Fred needs to choose another User.Name. Note the identified Indices.
UserId is the permanent record, and it is already the PK. Never delete User, it has historical significance. In fact the FK constraints will stop you (never use CASCADE in a real database, that is pure insanity). No need for code or triggers, etc.
Alternately (to delete Users who never did anything, and thus release User.Name for use) allow Delete as long as there are no FK violations (ie. UserId is not referenced in Download, Acknowledgement, Action).
To ensure that only Users who are Current perform Actions, add an IsObsolete boolean in User (DM Updated), and check that column when that table is interrogated for any function (except reports). You can implement a View UserCurrent which returns only those Users.
Same goes for Location and NetworkSlave. If you need to differentiate current vs historical, let me know, I will add IsObsolete to them as well.
I don't know: you may purge the database of ancient Historical data periodically, delete rows that are (eg) over 10 years old. That has to be done from the bottom (tables) first, working up the Relations.
Feel free to ask Questions.
Note the IDEF1 Notation document has been expanded.
Here are my two cents on the problem.
AlertType table holds all possible types of alerts. AlertName may be something like high temperature, low pressure, low water level, etc.
AlertSetup table allows for setup of alert thresholds from a sensor for a specific alert type.
For example, TresholdLevel = 100 and TresholdType = 'HI' should trigger an alert for readings over 100.
Reading table holds sensor readings as they are streamed into the server (application).
Alert table holds all alerts. It keeps links to the first reading that triggered the alert and the last one that finished it (FirstReadingId, LastReadingId). IsActive is true if there is an active alert for the (SensorId, AlertTypeId) combination. IsActive can be set to false only by reading going below the alert threshold. IsAcknowledged means that an operator has acknowledged the alert.
The application layer inserts the new reading into the Reading table, captures the ReadingId.
Then application checks the reading against alert setups for each (SensorId, AlertTypeId) combination. At this point a collection of objects {SensorId, AlertTypeId, ReadingId, IsAlert} is created and the IsAlert flag is set for each object.
The Alert table is then checked for active alerts for each object {SensorId, AlertTypeId, ReadingId, IsAlert} from the collection.
If the IsAlert is TRUE and there are no active alerts for the (SensorId, AlertTypeId) combination, a new row is added to the Alert table with the FirstReadingID pointing to the current ReadingId. The IsActive is set to TRUE, the IsAcknowledged to FALSE.
If the IsAlert is TRUE and there is an active alert for the (SensorId, AlertTypeId) combination, that row is updated by setting the LastReadingID pointing to the current ReadingId.
If the IsAlert is FALSE and there is an active alert for the (SensorId, AlertTypeId) combination, that row is updated by setting the IsActive FALSE.
If the IsAlert is FALSE and there are no active alerts for the (SensorId, AlertTypeId) combination, the Alert table is not modified.
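A compact sketch of that decision logic in application code (all type and method names here, such as AlertRepository, are illustrative placeholders, not part of the proposed schema):

// Illustrative only: persistence is hidden behind a hypothetical AlertRepository.
void processReading(Reading reading, AlertSetup setup, AlertRepository repo) {
    boolean isAlert = setup.isThresholdBreached(reading.getValue());
    Alert active = repo.findActiveAlert(reading.getSensorId(), setup.getAlertTypeId());

    if (isAlert && active == null) {
        // New alert: FirstReadingId and LastReadingId both point at the current reading.
        repo.insertAlert(reading.getSensorId(), setup.getAlertTypeId(),
                reading.getId(), reading.getId());
    } else if (isAlert) {
        // Ongoing alert: extend it to the current reading.
        repo.updateLastReading(active.getId(), reading.getId());
    } else if (active != null) {
        // Reading dropped back below the threshold: close the alert.
        repo.deactivate(active.getId());
    }
    // IsAlert false and no active alert: nothing to do.
}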
The main "triangle" you have to deal with here is Sensor, [Sensor]Reading, and Alert. Presuming you have to track activity as it is occuring (as opposed to a "load it all at once" design), your third solution is similar to something we did recently. A few tweaks and it would look like:
[Location]
    LocationId
[Sensor]
    SensorId
    LocationId
    CurrentSensorState -- Denormalized data!
[SensorReading]
    SensorReadingId
    SensorState
    Value
    Timestamp
[SensorStateLog]
    SensorId
    Timestamp
    SensorState
    Status -- Does what?
    IsInAlert
    (Primary key is {SensorId, Timestamp})
"SensorState" could be SensorStateId, with an associated lookup table listing (and constraining) all possible states.
The idea is, your Sensor table contains one row per sensor and shows its current state. SensorReading is updated continuously with sensor readings. If and when a given sensor's current state changes (i.e. a new Reading's state differs from the Sensor's current state), you change the current state and add a row to the SensorStateLog showing the change in state. (Optionally, you could update the "prior" entry for that sensor with a "state ended" timestamp, but that's fussy code to write.)
CurrentSensorState in the Sensor table is denormalized data, but if properly maintained (and if you have millions of rows) it will make querying current state vastly more efficient and so may be worth the effort.
The obvious downside of all this is that Alerts are no longer an entity, and they become that much harder to track and identify. If these must be readily and immediately identifiable and usable, your third scheme won't do what you need it to do.

Optional Database Entities

ORIGINAL (see UPDATED QUESTION below)
I am designing a new laboratory database that performs a wide variety of tests on a wide variety of sample types.
The following list is my current candidate for the list of main entities to best model the laboratory work.
For each entity, a 1-to-many relationship exists from that entity to the entity below. In other words, every entity (except REQ) has at least columns for entity_id and parent_id.
Main Entities:
REQ: Request (a form)
SAM: Sample (the material)
TST: Test (requested procedures)
SUB: ** Sub-Test (part of standard test)
TRI: ** Trial (single instance: usually for mean,range, and stddev)
MEA: Measurement (a measured number)
** Not all tests have subtests, and not all tests have trials.
Sub-tests are a set of tests grouped together by a single name for easy referencing. For example, a lot acceptance test (LAT) for a particular product is defined as the following tests: viscosity, %-nitrogen, pH, and density.
A trial is a single experiment performed multiple times for product assurance. For example, fifty bullets might be shot, and each shot is a trial. The accuracy of each bullet might be required to fall within a certain range, and the average accuracy of all fifty bullets might be required to be in a tighter range.
Question: How should I model cases when sub-tests and/or trials are not needed?
Option 1: Use a "blank" sub-test (or trial) if not needed.
Option 2: Consider sub-tests and trials to be tests (and have a test_id as a parent), so that measurements always have a test as a parent.
Option 3: Optional parents for measurement (trial, sub-test, or test) and trials (sub-test or test).
Option x: Any other option worth considering.
FYI: If required to answer the question, I will be using Oracle.
UPDATED QUESTION
In general, my schema is a hierarchy of entities where each entity (except the top) must have ONE parent and (except the bottom) must have at least one child. What is the best way to handle cases where an internal entity is unnecessary in a certain situation, or what is the benefit/drawback of using a particular option?
Option 1 (Dummy): Use a "dummy" entry to indicate entity does not apply in this case.
Option 2 (Rollup): Roll-up optional entities into next higher parent entity.
Option 3 (Pick-a-Parent): Entity (C) below optional entity (B) with required entity (A) must have ONE parent but the parent can be either the optional entity (B) or the next higher one (A).
Option x: Any other option worth considering.
Addressing your simplified question:
Given a hierarchy as you've described, if I found that some levels in the hierarchy were optional, I would question whether a hierarchy really mapped well to my domain. I would consider drawing my relations differently, or redefining the entities in my schema.
I don't think a more detailed answer to the general question is possible in a short space like this, since figuring out the best representation of a domain is a) hard, and b) very specific to the particular domain.
Use Outer Joins. (RIGHT OUTER JOIN and LEFT OUTER JOIN).
They were made specifically for this.
< Edit > This is my first post. Based on the comments, I'll be adding a second post.
Here's my take on an architectural first pass. This stuff generally requires a LOT of back-and-forth with the subject matter experts to get right.
"Test" means one of:
- Take an action, measure results
- Take several actions (subtests), measure results for each
- Make no tests whatsoever (yet you can still have measurements -- ?)
I'd configure this as a "parent" Test table and a child "SubTest" table, where Test can have 0 or more related SubTests, and every SubTest must be related with one and only one Test. (If a test has only one SubTest, enter it in its own table, don't try and track SubTests in the Test table.)
Trials can only exist if there are SubTests. Therefore, Trials are a child of the SubTest table; SubTests can have zero or more Trials, and Trials must be related with one and only one SubTest.
Measures only exist if there are Trials. Therefore, repeat the above, with Measures as a child of Trials.
Can there be SubTests without Trials (or Tests)? If so, then don't enter any Trials.
Can there be Measures without Trials? If no, you don't need any Trials (or SubTests). If yes (?), once again enter some properly labeled dummy/placeholder SubTests or Trials as necessary.
Again, this is rudimentary, and more interviews with the folks driving the requirements are required.
As others have remarked it is hard for us to give a definitive answer without understanding more about your domain. You have attempted to distill a lot of business rules into a couple of paragraphs but some important information has been lost. Specifically, it is not possible to be sure whether two entities are genuinely distinct without knowing their attributes. Having said all which, let's have a go.
A TEST is a single procedure. Despite containing the word "test" a LAT is not a TEST in its own right but is rather a pre-defined set of such procedures. I would model this scenario as an entity TEST with an optional parent entity, which I would prefer to call TEST_GROUP (as that is what it is) but it is best to use the domain name, SUB_TEST.
A TRIAL appears to be distinct from a TEST, so model it as a separate entity. Therefore you have a choice when it comes to MEASUREMENT: you can have one entity with two optional foreign keys or you can have TEST_MEASUREMENT and TRIAL_MEASUREMENT. Choosing which road to go depends on the characteristics and usage profile.
The following is an initial stab at the entity relationships. This would be the point in the project when the user goes, "Oh no, that is not what I meant at all."
create table sample (
    sample_id      number not null
    , constraint samp_pk primary key (sample_id)
)
/
create table sub_test (
    sub_test_id    number not null
    , sample_id    number not null
    , constraint subt_pk primary key (sub_test_id)
    , constraint subt_samp_fk foreign key (sample_id)
          references sample (sample_id)
)
/
create table test (
    test_id        number not null
    , sample_id    number not null
    , sub_test_id  number
    , constraint tst_pk primary key (test_id)
    , constraint tst_samp_fk foreign key (sample_id)
          references sample (sample_id)
    , constraint tst_subt_fk foreign key (sub_test_id)
          references sub_test (sub_test_id)
)
/
create table trial (
    trial_id       number not null
    , test_id      number not null
    , constraint trl_pk primary key (trial_id)
    , constraint trl_tst_fk foreign key (test_id)
          references test (test_id)
)
/
create table measurement (
    measurement_id number not null
    , trial_id     number
    , test_id      number
    , constraint meas_pk primary key (measurement_id)
    , constraint meas_tst_fk foreign key (test_id)
          references test (test_id)
    , constraint meas_trl_fk foreign key (trial_id)
          references trial (trial_id)
    , constraint measurement_ck check (
          (test_id is not null and trial_id is null)
          or (test_id is null and trial_id is not null)
      )
)
/
Edit
Addressing your more generic question.
Option 1 (Dummy)
Never use a dummy record. It is like using a magic value instead of a null. The solution is worse than the problem it solves.
Option 2 (Rollup)
This can work when the parent and the child have the same attributes. But it is not a viable solution if they have different columns, or if they are different dependencies. Even if they have identical data structures but different business uses it may still be a problem.
Option 3 (Pick-a-Parent)
This would be my preferred solution. The snag is the need for a check constraint to ensure that one (and only one) of the eligible foreign keys has been populated. You also need to guard against allowing too many parents/grandparents/great-grandparents into the mix.
I am not entirely sure I understand the details of your question, but it sounds like you should have the following:
Table Test
    test_id, request, sample, test
Table SubTest
    subtest_id, test_id (foreign key to Test)
Table Trial
    trial_id, trial_name, measurement, subtest_id
So, Test is a collection of subtests (possibly just one subtest), and a subtest is a collection of Trials (possibly just one trial).
I'm not entirely certain I understand your domain, but could you do something like this?
Tests has a parent_test_id column, which can be NULL (when set, this is a subtest).
Trials has a test_id column. (All tests have at least one trial, since you did a thing and had at least one measurement, right?)
Measurements has a trial_id column.
This does seem to violate your premise, since it stipulates that all tests have at least one trial, so it's possible I misunderstand the requirements. How can you have a test with no trials?
Anyway, if necessary, you could put both a trial_id and a test_id on Measurements, possibly with a constraint that one or the other must be NULL (and the other must be set).
I'll take a second stab at this one, based on the feedback from my first post. The key thing to understand is that design and architecture can be highly iterative, and I doubt you'll get the ideal model without a lot of back-and-forth -- something that doesn't play out too well on Stack Overflow. Odds are you'll take the ideas posted (APC has some good ones), bounce them around with the people you work with, and come up with something that'll work.
My goal these days when designing databases is to try and produce a fully normalized model. Once you've got that, if it doesn't seem reasonable or practical you can denormalize for efficiency, expediency, or whatever -- but the key thing is you denormalize after you've found the ideal model. If you stop normalization before you get to fully normalized, you haven't denormalized, you've just got a sloppy model.
Here are the entities I see to date:
What you've labeled as the top-level test, for purposes of clarity here I'm going to call an Exam. You define an exam and all its contents (below), and people contact your laboratory to run these exams on their problems.
For any given exam performed for a customer, you run a bunch of Tests. Any given test may be used by (required by?) any number of exams.
Often, you get a set of Tests that are done together for more than one Exam. If there are properties that apply to the specific set of Tests, you might want to identify each set as its own entity. Call these TestGroups. However, if these are only used to associate a specific set of Tests with one or more Exams, you might not get any particular benefit out of defining them as their own entity. (These are your SubTests.)
So, an Exam "has" or "contains" one or more Tests. Alternatively, Exams are related with one or more TestGroups. However, trying to relate an Exam with zero or more TestGroups and zero or more individual Tests will produce an overly complex model (let alone physical implmentation), and I'd really want to avoid that. Perhaps a TestGroup can contain a single Test, so Exams only reference TestGroups? Maybe an Exam can only be related to one TestGroup -- in which case that'd be the "many to many" table relating Exams with Tests. This depends on further discussion of requirements with the subject matter experts.
So you have Exams -- Exam definitions, really -- related somehow or other with multiple Tests. Next up, you have a "paid instance" of an Exam (customer X comes in and pays you to test his Widgets). Call this a CustomerExam; it has all the contact and billing info, identifies the Exam to be run, and thus is related to the Tests to be performed for the customer. (There's probably a Customer entity out there too...?)
Trials are performed for the Tests that are part of a CustomerExam. They don't relate to the Exam or the Test definitions; they are an instance of the Trial being performed. (It seems safe to assume that the "meaning/definition" of a Trial would actually be part of a Test -- for example, if Test = "Is the gun accurate?", then the work required by a Trial for that Test = fire the gun 50 times and measure.) So, Trials are performed for the Tests of a given CustomerExam. Are they performed once, or more than once? (Is a trial to fire the gun 50 times, or is each shot counted as a trial? What if they do two rounds of 50 shots?) Whatever the case, the attributes of the Trial event are stored here -- when it happened, who did it, special notes/circumstances, whatever.
Measures are produced by (or for?) Trials. The meaning/definition of each measure is actually part of the definition of a Trial (which is part of the definition of a Test); the event of the Trial produces specific values for the defined/anticipated Measures. The assumption is that a Trial will generate zero (?) or more Measures, so Measures are their own entity.
Looking back at this, it seems like there's some form of implicit double structure: a set of tables to define available Exams, Tests, Trials, and Measures (what can be examined, how can it be tested, what shall we measure) and a companion set of tables to track specific instances of each (who wanted it, who did the work, when did they do it, what were the results).
I've got to have way over-analyzed this problem. The key thing here is, as with all design sessions: in posing ideas and asking questions, did they generate your own ideas, questions, or answers?