Can anyone supply some tips/pointers/links on how to implement a temporal state-table with NHibernate? I.e., each entity table has start_date and end_date columns that describe the time interval in which this row is considered valid.
When a user inserts a new entity, the start_date receives 'now' and the end_date will be null (or a date far into the future, I haven't decided yet).
When updating, I'd like to change the UPDATE query into the following:
UPDATE end_date for this entity's row, and
INSERT a new row with the current date/time and a null end_date.
I tried using event listeners to manually write an UPDATE query for step 1, but I can't seem to figure out how to issue a new INSERT query for step 2.
Is this the proper way to go? Or am I completely off-mark here?
Actually, we have a working solution where I work, but it effectively disables part of NHibernate's mechanism.
For 'temporal entities' NH acts only as an insert/select engine. Deletes and updates are done by a different utility, where the ORM part of NH comes in handy.
If you only have a handful of temporal entities you may get by with NHibernate alone, but be prepared to write your own code to ensure state relations stay valid.
We went that route on our first try, and once the number of temporal entities started adding up, the mechanism effectively broke down.
Now, inserts need no special tooling: just place the values in the appropriate datetime properties and you're set. We implement selects with Filters (definitely check section 16.1 of the NH reference, which has an example, but note that the condition must not use a BETWEEN), although if you go that way you will have to modify the NHibernate source code to apply filters on all kinds of selects.
Check my post at http://savale.blogspot.com/2010/01/enabling-filters-on-mapped-entities.html for doing that.
It might also work if you specify the "where" clause in the mapping (instead of filters), but I haven't tried or tested that yet, and it is my understanding that the mapped "where" does not support parameters (at least not officially).
As a side note, the reason for using a custom tool for updates/deletes will become clear once you have read Richard Snodgrass's books on temporal databases: http://www.cs.arizona.edu/~rts/publications.html
To directly answer your question, both a NULL _end value and a value far in the future will work (but prefer the NOT-NULL solution; it will make your queries easier, as you will not have to check with ISNULL).
For updates you effectively make a clone of the original entity, set the original entity's _end to now, and then go to the clone, change the relevant properties, and set its _start to now and its _end to the far-in-the-future value.
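The clone-style temporal update described above can be sketched as follows. This is an illustrative sketch using SQLite and a far-future sentinel for "still valid"; the table and column names (product, valid_start, valid_end) are hypothetical, not from the original post.

```python
import sqlite3
from datetime import datetime

FAR_FUTURE = "9999-12-31T00:00:00"  # sentinel meaning "currently valid"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product (
    id INTEGER, name TEXT, price REAL,
    valid_start TEXT, valid_end TEXT)""")

def temporal_update(conn, product_id, new_price):
    """Close the currently valid row, then insert a clone with the new state."""
    now = datetime.now().isoformat()
    row = conn.execute(
        "SELECT name FROM product WHERE id = ? AND valid_end = ?",
        (product_id, FAR_FUTURE)).fetchone()
    # 1. UPDATE: end the currently valid row
    conn.execute(
        "UPDATE product SET valid_end = ? WHERE id = ? AND valid_end = ?",
        (now, product_id, FAR_FUTURE))
    # 2. INSERT: clone carrying the changed property and a fresh interval
    conn.execute(
        "INSERT INTO product VALUES (?, ?, ?, ?, ?)",
        (product_id, row[0], new_price, now, FAR_FUTURE))
    conn.commit()

conn.execute("INSERT INTO product VALUES (1, 'widget', 9.99, '2024-01-01', ?)",
             (FAR_FUTURE,))
temporal_update(conn, 1, 12.5)
rows = conn.execute(
    "SELECT price, valid_end FROM product ORDER BY valid_start").fetchall()
# Two rows now exist: the closed historical one and the current one.
```

The same two-statement shape is what the event-listener approach in the question would need to emit.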
I suggest the excellent timeNarrative from Martin Fowler.
I think the best approach is to have something like a Java map (sorry, I'm a Java programmer) and let NHibernate map that. The map would map something like a Period, with "start" and "end" fields, to a value. You could then write a UserType to map the Period to two different database columns.
While this article is a few years old, the pattern still seems valid. I have not yet used this but I need a similar solution in a current project and will be modeling it the way it is described here.
Scenario:
Entity 1 can have 0 or more Entity 2.
What trying to do:
When a field in Entity 1 is updated, a field in the related Entity 2 records is updated accordingly.
What I'm doing:
Update the field in Entity 1 with an UPDATE statement, then query the related Entity 2 records (using SELECT ATTR FROM ENTITY2 WHERE ENTITY1.ID = ENTITY2.ENT1_ID) just to get the old value of the ENTITY2 attribute before updating those records. The type of update (e.g. subtract or add) on the ENTITY2 records is based on the update value on ENTITY1.
Alternative :
Using triggers to consecutively update these related records.
(I still plan to study triggers, but I am not sure they are worth it. Any help or links on this would also be appreciated.)
Is it better to use triggers, or should I stick with my current solution (which I think is quite slow due to the number of SQL executions, but easier to debug)?
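As an aside, the extra SELECT in the current approach can often be avoided by expressing the change relative to the old value in a single UPDATE per table. A minimal sketch using SQLite; the entity1/entity2 names and the add/subtract rule are hypothetical stand-ins for the schema described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity1 (id INTEGER PRIMARY KEY, qty INTEGER);
CREATE TABLE entity2 (id INTEGER PRIMARY KEY, ent1_id INTEGER, attr INTEGER);
INSERT INTO entity1 VALUES (1, 10);
INSERT INTO entity2 VALUES (1, 1, 100), (2, 1, 200);
""")

def adjust(conn, ent1_id, delta):
    # One statement per table; no SELECT is needed to learn the old values,
    # because the change is expressed relative to them.
    conn.execute("UPDATE entity1 SET qty = qty + ? WHERE id = ?",
                 (delta, ent1_id))
    conn.execute("UPDATE entity2 SET attr = attr + ? WHERE ent1_id = ?",
                 (delta, ent1_id))
    conn.commit()

adjust(conn, 1, 5)
```

Wrapping both statements in one transaction keeps the two tables consistent without triggers.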
There are people, such as Tom Kyte who believe triggers should be used as little as possible, if at all.
There are others, such as Toon Koppelaars who believe they should be used, if their use is considered carefully.
I am of the second camp and believe triggers may be used. However, this use should not be to 'automagically' cause cascading actions such as you are suggesting. Instead, these triggers may be used to enforce integrity constraints that cannot be declared using the standard mechanism of a table constraint clause, i.e. the triggers themselves do no DML other than SELECT from tables.
(note: there are other mechanisms by which these constraints may be enforced, including materialized views or the introduction of additional columns and the use of specific indexing strategies) Therefore, I would suggest another alternative. Create triggers - or use these alternative mechanisms - to ensure no data that breaks your integrity constraints can be committed. Then create APIs, using PL/SQL, that encapsulate the multi-table data amendments that are required to ensure the integrity constraints are not broken and use these as your update path.
In this way you can be assured that no invalid data exists in the database, but also that the actual DML required to achieve this is not hidden across the database in multiple program units and triggers but is stated explicitly in one place.
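The "trigger that does no DML, only rejects bad data" idea can be sketched like this in SQLite (the answer above is Oracle-flavored, but the shape is the same). The table and the non-negativity rule are toy examples for illustration; in practice such triggers enforce rules a plain column CHECK cannot express:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity2 (id INTEGER PRIMARY KEY, ent1_id INTEGER, attr INTEGER);
-- The trigger performs no DML of its own; it only rejects the
-- offending statement, leaving the actual update path to the API layer.
CREATE TRIGGER entity2_attr_check
BEFORE UPDATE ON entity2
WHEN NEW.attr < 0
BEGIN
    SELECT RAISE(ABORT, 'attr must stay non-negative');
END;
INSERT INTO entity2 VALUES (1, 1, 10);
""")

try:
    conn.execute("UPDATE entity2 SET attr = -5 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the trigger blocked the invalid change
```

The multi-table amendments themselves would then live in one explicit API (PL/SQL in the Oracle case), as the answer recommends.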
Tom Kyte is brilliant. But he is, at heart, still just a DBA. Always keep that in mind when considering his advice on table design.
Can triggers be overused? Of course. But here's the rub: anything can be overused. I lean toward triggers because there is just no way to guarantee that all data manipulation will go through your app or any single channel. Or, if possible, define a foreign key relationship and let "cascade update" take care of everything. Tricky, I admit, and could be problematic, but don't reject any solution out of hand.
Having said that, I don't know if a trigger is called for here. I don't know why you are duplicating the data into a field in a different table. Without knowing your overall design and what you are trying to accomplish, there is no way to judge. But consider keeping the data in one field in one table, then using a view to expose that field as part of a second "table." Change the data where it resides and, voilà, it is now changed wherever it appears.
Performance hit? Yes. But keeping duplicate data in different places and keeping them synchronized is a data integrity hit. Only you know (or are in a position to find out) which way this balance tilts.
Oh, can views be overused? Of course. But there's always that rub I mentioned; and besides, views are so chronically underused in most databases that overuse would be a long way away.
I have a field that should contain the place of a seminar. A seminar can be held:
In-house
In another city
In another country
At first, I had only one table, so I used an enum. But now I have three tables which can't be merged, and they all should have this information; the customer also wants this field to be customizable so options can be added or removed in the future. But the number of options will be limited, they say, probably to 5 or so.
Now, my question is, should I use an enum or a table for this field? More importantly, what would be the proper way to decide between an enum or a table?
PS: enum fields are dynamically retrieved from the database, they are not embedded in the code.
As a general rule of thumb, in an application, use an enum if:
Your values change infrequently, and
There are only a few values.
Use a lookup table if:
Values changes frequently, or
Your customer expects to be able to add, change, or delete values in realtime, or
There are a lot of values.
Use both if the prior criteria for an enum are met and:
You need to be able to use a tool external to your application to report on the data, or
You believe you may need to eliminate the enum in favor of a pure lookup table approach later.
You'll need to pick what you feel is a reasonable cutoff for the number of values; I often hear somewhere between 10 and 15 suggested.
If you are using an enum construct provided by your database engine, the first 2 groups of rules still apply.
In your specific case I'd go with the lookup table.
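A lookup-table version of the seminar-place field could look like the sketch below (SQLite; the seminar_place/seminar table names are hypothetical). The point is that the customer's "add an option later" requirement becomes a plain INSERT rather than a schema change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE seminar_place (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE);
CREATE TABLE seminar (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    place_id INTEGER NOT NULL REFERENCES seminar_place(id));
INSERT INTO seminar_place (name) VALUES
    ('In-house'), ('Another city'), ('Another country');
""")

conn.execute("INSERT INTO seminar (title, place_id) VALUES ('SQL 101', 1)")
# The customer can add an option at runtime without any schema change:
conn.execute("INSERT INTO seminar_place (name) VALUES ('Online')")
places = [r[0] for r in
          conn.execute("SELECT name FROM seminar_place ORDER BY id")]
```

The FK also gives you referential integrity for free, which an enum column shared across three tables would not.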
If the elements are fixed, an enum should be used; in the case of variable data, an enum should not be used, and a table is preferable. I think in your case a table should be fine.
I agree with the top answers about enums being lean, slightly faster, and reducing the complexity of your database when used reasonably (that is, without many fast-changing values).
BUT: There are some enormous caveats to consider:
The ENUM data type isn't standard SQL, and beyond MySQL not many other DBMS's have native support for it. PostgreSQL, MariaDB, and Drizzle (the latter two are forks of MySQL anyway), are the only three that I know of. Should you or someone else want to move your database to another system, someone is going to have to add more steps to the migration procedure to deal with all of your clever ENUMs.
(Source and even more points on http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/)
ENUMs aren't supported by most ORMs, such as Doctrine, exactly because they're not standard SQL. So if you ever consider using an ORM in your project, or even something like the Doctrine Migrations Bundle, you'll probably end up writing a complex extension bundle (which I've tried) or using an existing one like this one, which for PostgreSQL, for example, cannot add more than pseudo-support for enums: it treats enums as a string type with a check constraint. An example:
CREATE TABLE test (
    id SERIAL NOT NULL,
    pseudo_enum_type VARCHAR(255)
        CHECK (pseudo_enum_type IN ('value1', 'value2', 'value3')),
    ...
So in any slightly more complex setup, the gain of using enums really is below zero.
Seriously: If I don't absolutely have to (and I don't) I'd always avoid enums in a database.
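The pseudo-enum-via-CHECK pattern mentioned above behaves like this (sketched in SQLite rather than PostgreSQL; the table name follows the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test (
        id INTEGER PRIMARY KEY,
        pseudo_enum_type TEXT
            CHECK (pseudo_enum_type IN ('value1', 'value2', 'value3')))
""")

conn.execute("INSERT INTO test (pseudo_enum_type) VALUES ('value1')")
try:
    # Any value outside the allowed set is rejected by the constraint,
    # just as a native ENUM would reject it.
    conn.execute("INSERT INTO test (pseudo_enum_type) VALUES ('bogus')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Unlike a MySQL ENUM, this is portable standard SQL, though changing the allowed set still requires an ALTER of the constraint.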
I am learning SQL and DB design for a college class. One assignment that was given to us is to create a table with a derived attribute which is the SUM of some child attributes. For example:
ORDERS
orderID {PK}
/orderTotal /* derived from SUM of child itemTotals */
ITEMS
itemNo {PK}
orderID {FK}
itemTotal
Now, I am not even sure this is good practice. From some reading I've done on the web, derived values should not be stored, but rather calculated by user applications. I can understand that perspective, but in this instance my assignment is to store derived values and more importantly to maintain their integrity via triggers, which are relatively new to me so I am enjoying using them. I'd also imagine in some more complex cases that it really would be worth the saved processing time to store derived values. Here are the safeguards I've put in place which are NOT giving me problems:
A trigger which updates parent /orderTotal when new child item is inserted.
A trigger which updates parent /orderTotal when child item is deleted.
A trigger which updates parent /orderTotal when child itemTotal is modified.
However, there is another safeguard I want which I cannot figure out how to accomplish. Since the parent attribute /orderTotal is derived, it should never be manually modified. If somebody does attempt to manually modify it (to an erroneous value which is not actually the correct SUM), I want to either (a) prevent them from doing this or (b) revert it to its old value as soon as they are done.
Which is the better approach, and which is possible (and how)? I am not sure how to accomplish the former, and I tried to accomplish the latter via either a trigger or a constraint, but neither one seemed appropriate. The trigger method kept giving me ORA-04091 error for attempting to mutate the table which fired the trigger. The constraint method, I do not think is appropriate either since I'm not sure how to do such a specific thing inside a constraint check.
I am using Oracle SQL by the way, in SQL Developer.
Thanks!
"Now, I am not even sure thise is good practice."
Your intuition is right: this is bad practice. For some reason, a lot of college professors set their students the task of writing poor code; this wouldn't be so bad if they at least explained that it is bad practice and should never be used in the real world. But then I guess most professors have only a limited grasp on what matters in the real world. sigh.
Anyhoo, to answer your question. There are two approaches to this. One would be to use a trigger to "correct" i.e. swallow the change. This would be wrong because the user trying to modify the value would probably waste a lot of time trying to discover why their change wasn't sticking, without realising they were breaking a business rule. So, it's much better to hurl an exception.
This example uses Oracle syntax, because I'm guessing that's what you're using.
create or replace trigger order_header_trg
    before update on order_header
    for each row
begin
    if :new.order_total != :old.order_total
    then
        raise_application_error
            (-20000, 'You are not allowed to modify the value of ORDER_TOTAL');
    end if;
end;
The only problem with this approach is that it will prevent you from inserting rows into ORDER_LINES and then deriving a new total for ORDER_HEADER.
This is one reason why denormalised totals are Bad Practice.
The error you're getting - ORA-04091 - says "mutating table". This happens when we attempt to write a trigger which selects from the table which owns the trigger. It almost always points to a poor data model, one which is insufficiently normalised. This is obviously the case here.
Given that you are stuck with the data model, the only workaround is a clunky implementation using multiple triggers and a package. The internet offers various slightly different solutions: here is one.
One solution might be to move all the logic for maintaining the orderTotal into an INSTEAD OF UPDATE trigger on the ORDERS table.
The triggers you already have on ITEMS can be simplified to update ORDERS without making a calculation - setting orderTotal to 0 or something like that. The INSTEAD OF trigger will run in place of the update statement.
If an attempt is made to manually update the order total, the INSTEAD OF trigger will fire and re-calculate the existing value.
PS - I don't have an Oracle DB to test on, but from what I can tell, an INSTEAD OF trigger will get around the ORA-04091 error - apologies if this is wrong.
EDIT
Whilst this solution would work in some other RDBMS systems, Oracle only supports INSTEAD OF triggers on views.
EDIT 2
If the assignment allows it, a view could be a suitable solution to this problem, since this would enable the order value column to be calculated.
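The view-based idea can be sketched like this (SQLite syntax; Oracle would be analogous, and SQLite happens to allow INSTEAD OF triggers only on views too). The orders/items names mirror the assignment's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE items (item_no INTEGER PRIMARY KEY,
                    order_id INTEGER, item_total REAL);
-- The derived column lives in a view, so it can never drift out of
-- sync with the item rows and there is nothing to modify by hand.
CREATE VIEW orders_v AS
    SELECT o.order_id,
           COALESCE(SUM(i.item_total), 0) AS order_total
    FROM orders o LEFT JOIN items i ON i.order_id = o.order_id
    GROUP BY o.order_id;
INSERT INTO orders VALUES (1);
INSERT INTO items VALUES (1, 1, 10.0), (2, 1, 2.5);
""")

total = conn.execute(
    "SELECT order_total FROM orders_v WHERE order_id = 1").fetchone()[0]
# order_total is always derived, never stored
```

All three maintenance triggers from the question become unnecessary, because nothing is duplicated.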
What is the best, DBMS-independent way of generating an ID number that will be used immediately in an INSERT statement, keeping the IDs roughly in sequence?
DBMS-independent? That's a problem. The two most common methods are auto-incrementing columns and sequences, and most DBMSes do one or the other but not both. So the database-independent way is to have another table with one column holding one value that you lock, select, update, and unlock.
Usually I say "to hell with DBMS independence" and do it with sequences in PostgreSQL, or autoincrement columns in MySQL. For my purposes, supporting both is better than trying to find out one way that works everywhere.
If you can create a Globally Unique Identifier (GUID) in your chosen programming language - consider that as your id.
They are harder to work with when troubleshooting (it is much easier to type in a where condition that is an INT) but there are also some advantages. By assigning the GUID as your key locally, you can easily build parent-child record relationships without first having to save the parent to the database and retrieve the id. And since the GUID, by definition, is unique, you don't have to worry about incrementing your key on the server.
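The "build parent-child relationships before saving" advantage reads like this in practice; a minimal sketch, with the dict-based records standing in for whatever entity objects your application uses:

```python
import uuid

# Keys are assigned client-side, so parent and children can be linked
# before anything touches the database.
parent_id = uuid.uuid4()
children = [{"id": uuid.uuid4(), "parent_id": parent_id} for _ in range(3)]

# Every child already references its parent; the rows can now be
# inserted in any order, with no round-trip to fetch a generated id.
```

Note that this trades the round-trip for larger, non-sequential keys, which is exactly the troubleshooting drawback mentioned above.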
There is auto-increment, or there are sequences.
What is the point of this? That is the least of your worries.
How will you handle the SQL itself?
MySQL has LIMIT,
SQL Server has TOP,
Oracle has RANK.
Then there are a million other things, like triggers, ALTER TABLE syntax, etc.
Yep, the obvious ways in raw SQL (and in my order of preference) are a) sequences b) auto-increment fields. The better, more modern, more DBMS-independent way is to not touch SQL at all, but to use a (good) ORM.
There's no universal way to do this. If there were, everyone would use it. SQL by definition abhors the idea - it's an antipattern for set-based logic (although a useful one, in many real-world cases).
The biggest problem you'd have trying to interpose an identity value from elsewhere is when a SQL statement involves several records, and several values must be generated simultaneously.
If you need it, then make it part of your selection requirements for a database to use with your application. Any serious DBMS product will provide its own mechanism to use, and it's easy enough to code around the differences in DML. The variations are pretty much all in the DDL.
I'd always go for the DB specific solution, but if you really have to the usual way of doing this is to implement your own sequence. Your RDBMS has to support transactions.
You create a sequence table which contains an int column and seed it with the first number. Your transaction logic then looks something like this:
begin transaction
    update tblSeq set intID = intID + 1
    select @myID = intID from tblSeq
    insert into tblData (intID, ...) values (@myID, ...)
end transaction
The transaction forces a write lock, so the next queued insert cannot update the tblSeq value before the record has been inserted into tblData. As long as all inserts go through this transaction, your generated IDs are in sequence.
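The same sequence-table pattern can be sketched in runnable form with SQLite; tblSeq/tblData are renamed to snake_case but otherwise follow the pseudocode above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_seq (int_id INTEGER NOT NULL);
INSERT INTO tbl_seq VALUES (0);
CREATE TABLE tbl_data (int_id INTEGER PRIMARY KEY, payload TEXT);
""")

def next_id_and_insert(conn, payload):
    # The UPDATE takes a write lock inside the transaction, so concurrent
    # callers serialize here and each one sees a distinct value.
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE tbl_seq SET int_id = int_id + 1")
        new_id = conn.execute("SELECT int_id FROM tbl_seq").fetchone()[0]
        conn.execute("INSERT INTO tbl_data VALUES (?, ?)", (new_id, payload))
    return new_id

ids = [next_id_and_insert(conn, p) for p in ("a", "b", "c")]
```

This works on any transactional RDBMS, which is the whole point of the pattern, at the cost of serializing every insert through one hot row.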
Use an auto-incrementing id column.
Is there really a reason that they have to be in sequence? If you're just using it as an ID, then you should just be able to use part of a UUID or the first couple digits of md5(now()).
You could take the time and massage it. It'd be the equivalent of something like
DateTime.Now.Ticks
So it would be something like YYYYMMDDHHMMSSSS.
It may be a bit of a lateral approach, but a good ORM-type library will probably be able to at least hide the differences. For example, in Ruby there is ActiveRecord (commonly used in, but not exclusively tied to, the Ruby on Rails web framework), which has Migrations. Within a table definition, which is declared in platform-agnostic code, implementation details such as datatypes, sequential id generation and index creation are pushed down below your vision.
I have transparently developed a schema on SQLite, then implemented it on MS SQL Server and later ported it to Oracle, without ever changing the code that generates my schema definition.
As I say, it may not be what you're looking for, but the easiest way to encapsulate what varies is to use a library that has already done the encapsulation for you.
With only SQL, the following could be one approach:
1. Create a table to contain the starting id for your needs.
2. When the application is deployed for the first time, it should read the value into its context.
3. Thereafter, increment the id (in a thread-safe fashion) as required, and either:
3.1 Write the id back to the database (in a thread-safe fashion) so it always holds the updated value, or
3.2 Don't write it to the database; just keep incrementing it in memory (in a thread-safe manner).
4. If for any reason the server is going down, write the current id value to the database.
5. When the server is up again, it will pick up from where it left off last time.
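The in-memory variant (option 3.2) can be sketched as a lock-guarded allocator. The seed would come from the database at startup and be flushed back at shutdown; those two steps are stubbed here:

```python
import threading

class IdAllocator:
    """In-memory id allocator: seeded from the DB at startup, incremented
    under a lock, and (in a real system) flushed back on shutdown."""
    def __init__(self, seed):
        self._next = seed
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:  # thread-safe increment
            value = self._next
            self._next += 1
            return value

alloc = IdAllocator(seed=1000)  # seed read from the DB in a real system
ids = []

def worker():
    for _ in range(100):
        ids.append(alloc.next_id())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 400 ids handed out across 4 threads, all distinct and contiguous
```

The obvious caveat: this only works with a single application server, since the counter lives in one process's memory.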
I recently started working at a company with an enormous "enterprisey" application. At my last job, I designed the database, but here we have a whole Database Architecture department that I'm not part of.
One of the stranger things in their database is that they have a bunch of views which, instead of having the user provide the date ranges they want to see, join with a (global temporary) table "TMP_PARM_RANG" with a start and end date. Every time the main app starts processing a request, the first thing it does is "DELETE FROM TMP_PARM_RANG;" followed by an insert into it.
This seems like a bizarre way of doing things, and not very safe, but everybody else here seems ok with it. Is this normal, or is my uneasiness valid?
Update I should mention that they use transactions and per-client locks, so it is guarded against most concurrency problems. Also, there are literally dozens if not hundreds of views that all depend on TMP_PARM_RANG.
Do I understand this correctly?
There is a view like this:
SELECT * FROM some_table, tmp_parm_rang
WHERE some_table.date_column BETWEEN tmp_parm_rang.start_date AND tmp_parm_rang.end_date;
Then in some frontend a user inputs a date range, and the application does the following:
Deletes all existing rows from TMP_PARM_RANG
Inserts a new row into TMP_PARM_RANG with the user's values
Selects all rows from the view
I wonder if the changes to TMP_PARM_RANG are committed or rolled back, and if so when? Is it a temporary table or a normal table? Basically, depending on the answers to these questions, the process may not be safe for multiple users to execute in parallel. One hopes that if this were the case they would have already discovered that and addressed it, but who knows?
Even if it is done in a thread-safe way, making changes to the database for simple query operations doesn't make a lot of sense. These DELETEs and INSERTs are generating redo/undo (or whatever the equivalent is in a non-Oracle database) which is completely unnecessary.
A simple and more normal way of accomplishing the same goal would be to execute this query, binding the user's inputs to the query parameters:
SELECT * FROM some_table WHERE some_table.date_column BETWEEN ? AND ?;
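The bound-parameter version behaves like this; a minimal SQLite sketch, with some_table and date_column taken from the query above and the dates invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_table (id INTEGER PRIMARY KEY, date_column TEXT);
INSERT INTO some_table (date_column) VALUES
    ('2024-01-15'), ('2024-06-01'), ('2025-03-10');
""")

# Bind the user's range directly: no shared parameter table, no
# DELETE/INSERT per request, no redo/undo, and concurrent users
# cannot trample each other's dates.
rows = conn.execute(
    "SELECT * FROM some_table WHERE date_column BETWEEN ? AND ?",
    ("2024-01-01", "2024-12-31")).fetchall()
```

Each session carries its own parameters in the statement itself, which is exactly the state-sharing problem the TMP_PARM_RANG design works around.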
If the database is oracle, it's possibly a global temporary table; every session sees its own version of the table and inserts/deletes won't affect other users.
There must be some business reason for this table. I've seen views with dates hardcoded that were actually a partitioned view, and they were using dates as the partitioning field. I've also seen joining on a table like this when dealing with daylight saving time: imagine a view that returns all activity which occurred during DST. And none of these things would ever delete from and insert into the table... that's just odd.
So either there is a deeper reason for this that needs to be dug out, or it's just something that at the time seemed like a good idea but why it was done that way has been lost as tribal knowledge.
Personally, I'm guessing that it would be a pretty strange occurrence. And from what you are saying, two methods calling the process at the same time could be very interesting.
Typically date ranges are done as filters on a view, and not driven by outside values stored in other tables.
The only justification I could see for this is if there was a multi-step process, that was only executed once at a time and the dates are needed for multiple operations, across multiple stored procedures.
I suppose it would let them support multiple ranges. For example, they can return all dates between 1/1/2008 and 1/1/2009 AND 1/1/2006 and 1/1/2007 to compare 2006 data to 2008 data. You couldn't do that with a single pair of bound parameters. Also, I don't know how Oracle does its query plan caching for views, but perhaps it has something to do with that? With the date columns being checked as part of the view, the server could cache a plan that always assumes the dates will be checked.
Just throwing out some guesses here :)
Also, you wrote:
I should mention that they use transactions and per-client locks, so it is guarded against most concurrency problems.
While that may guard against data consistency problems due to concurrency, it hurts when it comes to performance problems due to concurrency.
Do they also add one -in the application- to generate the next unique value for the primary key?
It seems that the concept of shared state eludes these folks, or the reason for the shared state eludes us.
That sounds like a pretty weird algorithm to me. I wonder how it handles concurrency - is it wrapped in a transaction?
Sounds to me like someone just wasn't sure how to write their WHERE clause.
The views are probably used as temp tables. In SQL Server we can use a table variable or a temp table (# / ##) for this purpose. Although creating views is not recommended by experts, I have created lots of them for my SSRS projects because the tables I am working with do not reference one another (no FKs, seriously!). I have to work around deficiencies in the database design; that's why I use views a lot.
With the global temporary table (GTT) approach that you say is being used here, the method is certainly safe with regard to a multiuser system, so no problem there. If this is Oracle, then I'd want to check that the system either uses an appropriate level of dynamic sampling so that the GTT is joined appropriately, or makes a call to DBMS_STATS to supply statistics on the GTT.