Optaplanner, by default, will values in PlanningVariable range be repeated - optaplanner

As usual my PlanningSolution class has a list of PlanningEntity, the PlanningEntity class has a Integer PlanningVariable member variable. Now I have 2 elements in the PlanningEntity list, and the PlanningVariable's range can be either 1 or 2. For sure (1,2) and (2,1) are potential solutions, my question is, by default, will (1,1) and (2,2) be potential solutions as well? if they are not, how can I make (1,1) and (2,2) be potential solutions?

Yes, OptaPlanner will use values from a value range repeatedly.
Notice that the requirement to use a value just once is a constraint that makes sense in some use cases but not in all of them. People that need this behavior can express such requirement as a hard constraint using Constraint Streams.
OptaPlanner works with no constraints out of the box so that it can explore the search space freely.
In your case you don't have to take any extra steps to allow repeating values. All of the following are solutions that OptaPlanner will eventually find:
(1, 1)
(1, 2)
(2, 1)
(2, 2)

Related

Two questions in SQL Server

How do I define in SQL Server in datatype field with fixed data (like SELECT in html) of my choice?
How to set column, which does not allow a number greater than 50, in Management Studio??
Thanks!
Even though this post contains two questions (Which is usually a problem in Q & A websites), I believe that the answer to both question can be the same (otherwise I wouldn't post this answer to begin with) - but that's assuming I understand the first question - which is basically how to define a column that can only contain a limited set of options (like a dropdown or an enum).
Based on that assumption, and also assuming we're not talking about a very large list of values, both questions can be answered by using a check constraint.
In SQL Server, a check constraint validates data that is attempted to be entered into a table (either by insert or by update), and throws an exception if the check returns false.
So, Limiting the set of values a column can take in is a good candidate for a check constraint - unless we're talking about a very large list of values, which makes it cumbersome to write and maintain and is usually better covered using a foreign key.
So, assuming I want a table that have two columns - one a string that can only have the values yes, no, black and white, and the other an int that can have values between 0 and 50 (inclusive) I would write it like this:
CREATE TABLE DemoTable
(
Enum varchar(5),
CONSTRAINT CHK_DemoTable_Enum CHECK (Enum IN('Yes', 'No', 'Black', 'White')),
Number tinyint,
CONSTRAINT CHK_DemoTable_Number CHECK (Number >= 0 AND Number <= 50)
);
You can see a live demo on rextster.

Optaplanner: how efficiently enforce one-to-many constraint?

I prepare example with easyScoreCalculatorClass and incrementalScoreCalculatorClass (Java score counting) to solve problem with power consumers and power suppliers (phones and chargers, where voltage must be equal and consumer required ampere must be not greater then supplier provided ampere, and each supplier has cost).
Optaplanner make solution for sample with 2 consumers and one supplier which is not desired as I expected that one supplier can't operate on two consumers in one time and solution must end with at least -1 value fro hard score.
To enforce one-to-namy constraint I see possibility by maintaining map with count how many times suppliers assigned to consumers. Is that one possible solution to enforce constraint in Java code?
Is that possible to organize data or mark fields with annotations so this constraint automatically enforced?
I look to optaplanner: how to enforce planning variables values to be used only once but don't understand how to map Drool code to Java...
Several of the examples have uniqueness constraint that you describe. For example, in the n-queens example all rows must be different. That example has a java score impl too.

What reason to use sequences in databases?

I new to DB/Hibernate and found code:
#SequenceGenerator(name = "entSeq", allocationSize = 5, sequenceName = "CODE_SEQ")
...
#Id
#GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "entSeq")
which set sequence for primary key.
Why was sequences used for values of primary key? Which goals was addressed:
increase performance
add constraints, some checks
limit possible value range of integer values of ID, why to do so??
why to start counting from 1?
I read about syntax and usage in:
http://msdn.microsoft.com/en-us/library/ff878091.aspx
http://www.techonthenet.com/oracle/sequences.php
but doesn't found answer for my question.
UPDATE:
I enjoyed reading:
http://www.oracle.com/technetwork/products/rdb/0307-sequences-130053.pdf
Guide to Using SQL: Sequence Number Generator
where shown that there is problem in DB theory how to get unique ID for primary keys. That mean that I can make insert into table without providing value for primary key from my own:
INSERT INTO suppliers
(supplier_id, supplier_name)
VALUES
(supplier_seq.nextval, 'Kraft Foods');
But I expect that this feature must be present in all DB without forcing me to supply primary key values...
Do I think right?
UPDATE2:
Answer for why use START WITH:
This clause can be useful when adding sequences to existing databases. When an older scheme was in
use by the application and has already consumed some values from the legal range this clause can be
used to skip those consumed values. MINVALUE and MAXVALUE are used to specify the legal range but
START WITH would initiate the sequence usage within that range so that previously generated values
would not reappear.
UPDATE3: *sequences* provide http://en.wikipedia.org/wiki/Surrogate_key
Historically, there were two main reasons.
Avoid performance problems with ON UPDATE CASCADE in big tables.
Avoid performance problems with joins on wide, natural keys.
Oracle doesn't even support ON UPDATE CASCADE, so updating a value that's used in foreign key references is more troublesome than on other platforms.
Those performance "problems" are much less severe nowadays than they were 20 years ago, given tables of the same size. (Hardware's a lot faster now.) But we seem to deal with much bigger tables now than we did 20 years ago.
There are some undesirable side-effects for this kind of performance tuning.
You typically need many more joins than you might with carefully chosen natural keys and ON UPDATE CASCADE. You might need so many that the joins are more costly than the disk read.
It's easier to get lost in the joins when you have 20 or 30 of them.
Rows are harder to quickly understand. (A row that reads {1, 7, 13, 255, 438} is harder to understand than a row that reads {1, library, checkout, 255, 'A book is your friend'}.)
Often, a database designer assigns an ID number as the primary key, but doesn't set any other UNIQUE constraints. That makes the ID number a row identifier, not an identifier for the real-world thing the row represents. That can be a big problem.
You cannot rely on auto generated keys for all the databases. Unlike most other databases, Oracle does not provide an auto-incrementing datatype that can be used to generate a sequential primary key value.
However, the same effect can be achieved with a sequence and a trigger.

Optional Database Entities

ORIGINAL (see UPDATED QUESTION below)
I am designing a new laboratory database that tests a wide variety of tests on a wide variety of sample types.
The following list is my current candidate for the list of main entities to best model the laboratory work.
For each entity, a 1-to-many relationship exists from that entity to the entity below. In other words, every entity (except REQ) has at least columns for entity_id and parent_id.
Main Entities:
REQ: Request (a form)
SAM: Sample (the material)
TST: Test (requested procedures)
SUB: ** Sub-Test (part of standard test)
TRI: ** Trial (single instance: usually for mean,range, and stddev)
MEA: Measurement (a measured number)
** Not all tests have subtests, and not all tests have trials.
Sub-tests are a set of tests grouped together by a single name for easy referencing. For example, a lot acceptance test (LAT) for a particular product is defined as the following tests: viscosity, %-nitrogen, pH, and density.
A trial is a single experiment performed multiple times for product assurance. For example, fifty bullets might be shot, and each shot is a trial. The accuracy of each bullet might be required to fall within a certain range, and the average accuracy of all fifty bullets might be required to be in a tighter range.
Question: How should I model cases when sub-tests and/or trials are not needed?
Option 1: Use a "blank" sub-test (or trial) if not needed.
Option 2: Consider sub-tests and trials to be tests (and have a test_id as a parent), so that measurements always have a test as a parent.
Option 3: Optional parents for measurement (trial, sub-test, or test) and trials (sub-test or test).
Option x: Any other option worth considering.
FYI: If required to answer the question, I will be using Oracle.
UPDATED QUESTION
In general, my schema is a heirarchy of entities where each entity (except top) must have ONE parent and (except bottom) must have at least one child. What is the best way to handle cases where an internal entity is unnecessary in certain situation, or what is the benefit/drawback to using a particular option?
Option 1 (Dummy): Use a "dummy" entry to indicate entity does not apply in this case.
Option 2 (Rollup): Roll-up optional entities into next higher parent entity.
Option 3 (Pick-a-Parent): Entity (C) below optional entity (B) with required entity (A) must have ONE parent but the parent can be either the optional entity (B) or the next higher one (A).
Option x: Any other option worth considering.
Addressing your simplified question:
Given a hierarchy as you've described, if I found that some levels in the hierarchy were optional, I would question whether a hierarchy really mapped well to my domain. I would consider drawing my relations differently, or redefining the entities in my schema.
I don't think a more detailed answer to the general question is possible in a short space like this, since figuring out the best representation of a domain is a) hard, and b) very specific to the particular domain.
Use Outer Joins. (RIGHT OUTER JOIN and LEFT OUTER JOIN).
They were made specifically for this.
< Edit > This is my first post. Based on the comments, I'll be adding a second post.
Here's my take on an architectural first pass. This stuff generally requires a LOT of back-and-forth with the subject matter experts to get right.
"Test" means one of:
- Take an action, measure results
- Take several actions (subtests), measure results for each
- Make no tests whatsoever (yet you can still have measurements -- ?)
I'd configure this as a "parent" Test table and a child "SubTest" table, where Test can have 0 or more related SubTests, and every SubTest must be related with one and only one Test. (If a test has only one SubTest, enter it in its own table, don't try and track SubTests in the Test table.)
Trials can only exist if there are SubTests. Therefore, Trials are a child of the SubTest table; SubTests can have zero or more Trials, and Trials must be related with one and only one SubTest.
Measures only exist if there are Trials. Therefore, repeat the above, with Measures as a child of Trials.
Can there be SubTests without Trials (or Tests)? If so, then don't enter any Trials.
Can there be Measures without Trials? If no, you don't need any Trials (or SubTests). If yes (?), once again enter some properly labeled dummy/placholder SubTests or Trials as necessary.
Again, this is rudimentary, and more interviews with the folks driving requirements is required.
As others have remarked it is hard for us to give a definitive answer without understanding more about your domain. You have attempted to distill a lot of business rules into a couple of paragraphs but some important information has been lost. Specifically, it is not possible to be sure whether two entities are genuinely distinct without knowing their attributes. Having said all which, let's have a go.
A TEST is a single procedure. Despite containing the word "test" a LAT is not a TEST in its own right but is rather a pre-defined set of such procedures. I would model this scenario as an entity TEST with an optional parent entity, which I would prefer to call TEST_GROUP (as that is what it is) but it is best to use the domain name, SUB_TEST.
A TRIAL appears to be distinct from a TEST, so model it as a separate entity. Therefore you have a choice when it comes to MEASUREMENT: you can have one entity with two optional foreign keys or you can have TEST_MEASUREMENT and TRIAL_MEASUREMENT. Choosing which road to go depends on the characteristics and usage profile.
The following is an initial stab at the entity relationships. This would be the point in the project when the user goes, "Oh no, that is not what I meant at all."
create table sample (
sample_id number not null
, constraint samp_pk primary key (sample_id)
)
/
create table sub_test (
sub_test_id number not null
, sample_id number not null
, constraint subt_pk primary key (sub_test_id)
, constraint subt_samp_fk foreign key (sample_id)
references sample (sample_id)
)
/
create table test (
test_id number not null
, sample_id number not null
, sub_test_id number
, constraint tst_pk primary key (test_id)
, constraint tst_samp_fk foreign key (sample_id)
references sample (sample_id)
, constraint tst_subt_fk foreign key (sub_test_id)
references sub_test (sub_test_id)
)
/
create table trial (
trial_id number not null
, test_id number not null
, constraint trl_pk primary key (trial_id)
, constraint trl_tst_fk foreign key (test_id)
references test (test_id)
)
/
create table measurement (
measurement_id number not null
, trial_id number
, test_id number
, constraint meas_pk primary key (measurement_id)
, constraint meas_tst_fk foreign key (test_id)
references test (test_id)
, constraint meas_trl_fk foreign key (trial_id)
references trial (trial_id)
, constraint measurement_ck check (
(test_id is not null and trial_id is null)
or (test_id is null and trial_id is not null)
)
/
Edit
Addressing your more generic question.
Option 1 (Dummy)
Never use a dummy record. It's is like using a magic value instead of a null. The solution is worse than the problem it solves.
Option 2 (Rollup)
This can work when the parent and the child have the same attributes. But it is not a viable solution if they have different columns, or if they are different dependencies. Even if they have identical data structures but different business uses it may still be a problem.
Option 3 (Pick-a-Parent)
This would be my preferred solution. The snag is the need for a check constraint to ensure that one (and only one) of the eligible foreign keys has been populated. You also need to guard against allowing too many parents/grandparents/great-grandparents into the mix.
I am not entirely sure I understand the details of your question, but it sounds like you should have the following:
Table Test
test_id, request, sample, test
Table SubTest
subtest_id, test_id (foreign key to Test)
Table Trial
trial_id, trial_name, measurement, subtest_id
So, Test is a collection of subtests (possibly just one subtest), and a subtest is a collection of Trials (possibly just one trial)
I'm not entirely certain I understand your domain, but could you do something like this?
Tests has a parent_test_id column, which can be NULL (when set, this is a subtest).
Trials has a test_id column. (All tests have at least one trial, since you did a thing and had at least one measurement, right?)
Measurements has a trial_id column.
This does seem to violate your premise, since it stipulates that all tests have at least one trial, so it's possible I misunderstand the requirements. How can you have a test with no trials?
Anyway, if necessary, you could put both a trial_id and a test_id on Measurements, possibly with a constraint that one or the other must be NULL (and the other must be set).
I'll take a second stab at this one, based on the feedback from my first post. The key thing to understand is that design and architecture can be highly iterative, and I doubt you'll get the ideal model without a lot of back-and-forth--something that doesn't play out to well on Stack Overflow. Odds are you'll take the ideas posted (APC has some good ones), bounce them around with the people you work with, and come up with something that'll work.
My goal these days when designing databases is to try and produce a fully normalized model. Once you've got that, if it doesn't seem reasonable or practical you can denormalize for efficiency, expediency, or whatever -- but the key thing is you denormalize after you've found the ideal model. If you stop normalization before you get to fully normalized, you haven't denormalized, you've just got a sloppy model.
Here's the entities I see to-date:
What you've labeled as the top-level test, for purposes of clarity here I'm going to call an Exam. You define an exam and all its contents (below), and people contact your laboratory to run these exams on their problems.
For any given exam performed for a customer, you run a bunch of Tests. Any given test may be used by (required by?) any number of exams.
Often, you get a set of Tests that are done together for more than one Exam. If there are properties that apply to the specific set of Tests, you might want to identify each set as its own entity. Call these TestGroups. However, if these are only used associate a specific set of Tests with one or more Exams, you might not get any particular benefit our of defining them as their own entity. (These are your SubTests.)
So, an Exam "has" or "contains" one or more Tests. Alternatively, Exams are related with one or more TestGroups. However, trying to relate an Exam with zero or more TestGroups and zero or more individual Tests will produce an overly complex model (let alone physical implmentation), and I'd really want to avoid that. Perhaps a TestGroup can contain a single Test, so Exams only reference TestGroups? Maybe an Exam can only be related to one TestGroup -- in which case that'd be the "many to many" table relating Exams with Tests. This depends on further discussion of requirements with the subject matter experts.
So you have Exams -- Exam definitions, really -- related somehow or other with multiple Tests. Next up, you have a "paid instance" of an Exam (customer X comes in and pays you to test his Widgets). Call this a CustomerExam; it has all the contact and billing info, identifies the Exam to be run, and thus is related to the Tests to be performed for the customer. (There's probably a Customer entity out there too...?)
Trials are perfomed for the Tests that are part of a CustomerExam. They don't relate with the Exam or the Test, they are an instance of the Trial being performed. (Seems safe to assume that the "meaning/definition" of a Trial would actually be part of a Test--for example, if Test = Is gun accurate, then the work required by a Trial for that Test = fire gun 50 times and measure). So as Trials are performed for the Tests of a given CustomerExam. Are they performed once, or more than once? (Is a trial to fire the gun 50 times, or is each shot counted as a trial? What if they do two rounds of 50 shots?) Whatever, the attributes of the Trial event are stored here -- when it happened, who did it, special notes/circumstances, whatever.
Measures are produced by (or for?) Trials. The meaning/definition of each measure is actually part of the definition of a Trial (which is part of the definition of a Test); the event of the Trial produces specific values for the defined/anticipated Measures. The assumption is that a Trial will generate zero (?) or more Measures, so Measures are their own entity.
Looking back at this, it seems like there's some form of implicit double stucture: a set of tables to define available Exams, Tests, Trials, and Measures (what can be examined, how can it be tested, what shall we measure) and a companion set of tables to track specific instances of each (who wanted it, who did the work, when did they do it, what were the results)
I've got to have way over-anazled this problem. The key thing here is, as with all design sessions, in posing ideas and asking questions, did they generate your own ideas, questions, or answers?

Should I use an ENUM for primary and foreign keys?

An associate has created a schema that uses an ENUM() column for the primary key on a lookup table. The table turns a product code "FB" into it's name "Foo Bar".
This primary key is then used as a foreign key elsewhere. And at the moment, the FK is also an ENUM().
I think this is not a good idea. This means that to join these two tables, we end up with four lookups. The two tables, plus the two ENUM(). Am I correct?
I'd prefer to have the FKs be CHAR(2) to reduce the lookups. I'd also prefer that the PKs were also CHAR(2) to reduce it completely.
The benefit of the ENUM()s is to get constraints on the values. I wish there was something like: CHAR(2) ALLOW('FB', 'AB', 'CD') that we could use for both the PK and FK columns.
What is: Best PracticeYour preference
This concept is used elsewhere too. What if the ENUM()'s values are longer? ENUM('Ding, dong, dell', 'Baa baa black sheep'). Now the ENUM() is useful from a space point-of-view. Should I only care about this if there are several million rows using the values? In which case, the ENUM() saves storage space.
ENUM should be used to define a possible range of values for a given field. This also implies that you may have multiple rows which have the same value for this perticular field.
I would not recommend using an ENUM for a primary key type of foreign key type.
Using an ENUM for a primary key means that adding a new key would involve modifying the table since the ENUM has to be modified before you can insert a new key.
I am guessing that your associate is trying to limit who can insert a new row and that number of rows is limited. I think that this should be achieved through proper permission settings either at the database level or at the application and not through using an ENUM for the primary key.
IMHO, using an ENUM for the primary key type violates the KISS principle.
but when you only trapped with differently 10 or less rows that wont be a problem
e.g's
CREATE TABLE `grade`(
`grade` ENUM('A','B','C','D','E','F') PRIMARY KEY,
`description` VARCHAR(50) NOT NULL
)
This table it is more than diffecult to get a DML
We've had more discussion about it and here's what we've come up with:
Use CHAR(2) everywhere. For both the PK and FK. Then use mysql's foreign key constraints to disallow creating an FK to a row that doesn't exist in the lookup table.
That way, given the lookup table is L, and two referring tables X and Y, we can join X to Y without any looking up of ENUM()s or table L and can know with certainty that there's a row in L if (when) we need it.
I'm still interested in comments and other thoughts.
Having a lookup table and a enum means you are changing values in two places all the time. Funny... We spent to many years using enums causing issues where we need to recompile to add values. In recent years, we have moved away from enums in many situations an using the values in our lookup tables. The biggest value I like about lookup tables is that you add or change values without needing to compile. Even with millions of rows I would stick to the lookup tables and just be intelligent in your database design