Best practice: should I use FKs in the DB when using NHibernate/FluentNHibernate?

So far I have always enforced my DB with FK relationships. Things changed yesterday while mapping some classes with FluentNHibernate. My mapping didn't work, and I discovered that the issue was caused by the order in which FluentNHibernate creates the queries.
Now a question arises: should I keep enforcing data integrity with FKs, or is it better to avoid them since I focus on domain classes instead of SQL queries?
Thanks

To my knowledge, it is far better to keep your database consistent: you may not be the only one who works on this DB in the future, and someone else with access to it could do something that corrupts your data's consistency. As a result, your application would no longer behave the way you expect, because conditions it assumed no longer hold.

Letting Fluent/NH create your database during development is fine, but when it goes into production you really should check all the foreign keys, indexes, etc., and thereafter make only scripted changes.
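A scripted change of that sort might look like this (a minimal sketch; the Orders/Customers tables and the constraint names are invented for illustration):

-- Hypothetical example: add the FK that the generated schema was missing.
ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerId) REFERENCES Customers (Id);

-- And an index to support lookups and joins on the FK column.
CREATE INDEX IX_Orders_CustomerId ON Orders (CustomerId);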

Keep your database consistent, maintain referential integrity.
If a tool you are using breaks as a result, there is bound to be a workaround. However, if you give up referential integrity to accommodate NHibernate, what happens if you decide to use a different ORM? You will have a dodgy database, and who's to say the next ORM in line will like that?
It's like a separation-of-concerns question: each chunk of your application should be designed to be robust enough to survive if another chunk is changed or removed, so don't abandon good database practice simply to make a product layered above it play nicely.

Using a domain-driven or model-oriented approach, where the DB is merely seen as an 'implementation detail', does not mean that you should ignore the integrity of your data.
I see no reason why you should drop foreign-key (and other) constraints from your database.
The database is more than just storage for your data. Its task is also to guard the data's integrity.
It is perfectly possible to combine the two worlds (domain-driven design and a relational database) with NHibernate. Make sure that the two areas focus on what they're best at: the database is best at storing data and making sure that the data remains valid and consistent.

Related

Is retrofitting Entity Framework into my app appropriate for my situation?

So, I have an app that uses a SQL Server Express DB. I have about 80-ish tables, all with a primary key but no foreign keys. (The reason we have no foreign keys is how we do our SQL client-to-server replication. It's not true replication but a sync system that was in place when we took over the app. We have no guarantee which records will make it to the database first when a client syncs to the server, so it is possible for a record to arrive with a foreign key that points to a nonexistent record.)
We use a type-per-model convention: for each of our business objects there is a table in the DB. We currently use stored procedures for every database transaction, which means every new class needs at least four new stored procedures (CRUD). We have abstracted our data access layer out of our business objects; each business object has a corresponding businessObjectDAO.
My question is: is Entity Framework feasible for me to move to? With no foreign key relationships I'm going to have to set up every association between tables manually. Is it worth the time to do this?
My biggest hang-up right now is trying to figure out how to map my DAOs to the EF partial classes.
Should I be creating one big .edmx or multiple?
A lot of questions I know. This is my first big architectural type decision and I've been given the go ahead to make the change if I think it is beneficial and feasible.
Maybe I should try LINQ to SQL? NHibernate is out because we're not allowed to use open source products in production (stupid, I know).
Thanks
Cody
My personal recommendation is that if something is working, leave it. I am a big, big fan of both LINQ to SQL and Entity Framework, and I have managed to get my workplace to make use of LINQ to SQL. I realise that if you did bring one of these into your project, maintainability would probably improve, but by the sounds of it the initial work will be more than it is worth in the end.

Is it good to use check constraints for business rules

Currently we are using check constraints to implement business rules, but I want to know whether we should implement business rules in SQL or in the business logic layer (C#). I have searched the net and found that check constraints are considered good to use.
Please let me know if anyone has more detailed information about this. One more thing: data can be pumped into my database by a mobile application as well as by a web application.
YES it is good!
You should really always check your business rules in your app code (in the business layer), and, if at all possible, in your database as well.
Why? Imagine someone manages to submit data to your database without using your app - if your checks exist only in the app, they are not applied.
If you have the checks in the database as well, you can make sure the data in the database conforms to at least those simple rules that can be formulated as SQL CHECK constraints.
Definitely use those! You need to try to keep your data quality as high as possible, and adding referential integrity, check constraints, unique constraints and so forth in the database helps you do that.
Do not rely on your app alone!
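For instance, a rule like "quantity must be positive" can be stated directly as a CHECK constraint (a minimal sketch; the Orders table and its columns are hypothetical):

-- Hypothetical Orders table: the database rejects bad rows even when
-- some other client bypasses the application's business layer.
CREATE TABLE Orders (
    OrderId   INT PRIMARY KEY,
    Quantity  INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    CONSTRAINT CK_Orders_Quantity  CHECK (Quantity > 0),
    CONSTRAINT CK_Orders_UnitPrice CHECK (UnitPrice >= 0)
);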
Yes, check constraints are a valid tool for business rules.
But are you sure you need a check constraint, rather than a supporting table with a foreign key relationship? If you find yourself defining similar check constraints in various places, the answer is yes - this should definitely be a supporting table.
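As a hypothetical sketch of that refactoring (the status values and the Orders table are invented):

-- Instead of repeating CHECK (Status IN ('open','closed','pending'))
-- in several tables, define the allowed values once...
CREATE TABLE OrderStatus (
    Status VARCHAR(20) PRIMARY KEY
);
INSERT INTO OrderStatus (Status) VALUES ('open'), ('closed'), ('pending');

-- ...and reference the lookup table wherever the rule applies.
ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_OrderStatus
    FOREIGN KEY (Status) REFERENCES OrderStatus (Status);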
Data integrity is key; there's not much value in a system that will allow a person to store something that violates the business rules whenever the application is circumvented. It also makes life a lot easier to have the logic in the database for situations where the original app is in C# and the higher-ups decide the market needs a Java/Ruby/Python/etc. version.
You should definitely use CHECK constraints where possible, but I also wouldn't overdo it. If there is no possibility of getting data into your database without using your applications, you can be safe with minimal CHECK constraints and heavy business validation.
It can be fairly difficult to define strict business rules in SQL. Stick to data validation in the database, and actual business rules in your application.
Also, try to arrange your schema in such a way that makes it difficult to enter bad data with foreign keys and the like.
As more "intelligent" is your database, more secure will be the integrity of the data it contains. So, yes, I think this is good and important to implement it.
This puts in lots of advantages: you can ensure that your data will be secure if there are more than one application modifying the data (ex: C# app + Web app + Mobile app ...) and it allow you to make less work in those "secundary" applications. If the database do all the work, apps are only a frontend for the database.
It will be easier in the future to migrate the applications, but will be more dificult to migrate the database. This is an important decision.
It depends on the constraints.
You should also try to avoid (if possible) having the same constraint checked in two places - this would imply there is duplicated code in your system, leading to unnecessary complexity.
There are some constraints that can and should be applied in the database, for example foreign key constraints and uniqueness. The database can apply these quickly and efficiently.
Other, more complex "business" constraints are better applied in the business logic layer. An example might be "customer must have a validated email address before allowing a purchase". These would be complicated and onerous to apply in the database - you'd run the risk of coding your system in SQL, which is A Bad Idea.
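To make the split concrete (a hedged sketch; the Customers/Purchases tables and constraint names are invented):

-- Declarative rules the database applies quickly and efficiently:
ALTER TABLE Customers
    ADD CONSTRAINT UQ_Customers_Email UNIQUE (Email);

ALTER TABLE Purchases
    ADD CONSTRAINT FK_Purchases_Customers
    FOREIGN KEY (CustomerId) REFERENCES Customers (CustomerId);

-- A workflow rule such as "customer must have a validated email address
-- before purchasing" spans application state; expressing it here would
-- amount to coding the system in SQL, so keep it in the business layer.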
C#. It's much easier to reuse logic in C# than in SQL (in my experience), and it's generally easier to maintain.

When is referential integrity not appropriate?

I understand the need for referential integrity to limit the values allowed on entry, or possibly to prevent rows from being removed when deletion is requested. However, I am unclear about valid use cases that would exclude this mechanism from always being used.
I guess this would fall into several sub-questions:
When is referential integrity not appropriate?
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
Thoughts?
When is referential integrity not appropriate?
Referential integrity is typically not used in data warehouses, where the data is a read-only copy of a transactional database. Another example of when you'd not need RI is when you want to log information that includes row IDs; maintaining referential integrity for a read-only log table is a waste of database overhead.
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Sometimes you care more about capturing data than about data quality. Imagine you are aggregating a large amount of data from disparate systems, each of which suffers from data quality issues in its own right. Sometimes, in pursuit of the greater good of data quality, having everything in one place - even with broken keys, etc. - represents a starting point for moving towards true data quality. It's not ideal, but it does happen, as the benefits can outweigh the trade-offs.
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
Everything about systems development is centered around information security, and a key element of that is data integrity. The database structure should lean towards enforcing these things when possible; however, you are often not dealing with modern database systems. Sometimes your data source is an old-school AS400 with long-antiquated apps. Sometimes you have to build a data and business layer that provides for data integrity.
Just my thoughts.
The only case I have heard of is if you are going to load a vast amount of data into your database; in that case, it may make sense to turn referential integrity off, as long as you know for certain that the data is valid. Once your loading/migration is complete, referential integrity should be turned back on.
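In SQL Server, for example, that can look like this (a sketch with a hypothetical Orders table and constraint; WITH CHECK makes the server re-validate existing rows when the constraint is re-enabled):

-- Disable constraint checking for the bulk load...
ALTER TABLE Orders NOCHECK CONSTRAINT FK_Orders_Customers;

-- ...bulk load / migrate the data here...

-- ...then re-enable it. WITH CHECK forces re-validation of existing
-- rows, so the constraint is trusted again afterwards.
ALTER TABLE Orders WITH CHECK CHECK CONSTRAINT FK_Orders_Customers;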
There are arguments about putting data validation rules in programming code vs. the database, and I think it depends on the use cases of your software. If a single application is the only path to the database, you could put validation into the program itself and probably be alright. But if several different programs are using the database at the same time (e.g. your application and your friend's application), you'll want business rules in the database so that your data is always valid.
By 'validation rules', I am talking about rules such as 'items in cart > 0'. You may or may not want validation rules. But I think that primary/foreign keys are always important (or you could find later on that you wish you had them). I think they are required if you want to do replication at some point.
When is referential integrity not appropriate?
Sometimes when you are copying lots of records in bulk, or restoring data from some sort of backup, it is convenient to temporarily turn off the constraints of referential integrity.
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Duplicating data in this way goes against the concept of normalization. There are advantages and disadvantages to this approach.
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
I would consider it a schema design decision. Think about the best way to model your problem in relational terms. Use the database in the way it was intended.
Referential integrity would always be appropriate if it didn't come at the cost of performance, scalability, and/or other features.
In some applications, referential integrity may be traded for something more important than the quality of the data.
Never, though a few people in the NoSQL, multi-value, and OO-DB realms will feel differently. Don't listen to them; they're wrong.
Yes. For example, if a vehicle is identified uniquely as (lotid, vin), then lotid is a foreign key to the lot table. If you want to find all pictures for a lot, you can join the vehicle_pictures table straight to the lot table by using a subset of the vehicle_pictures key (lotid in (lotid, vin)). Or am I not understanding you?
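In SQL that join might look like this (a sketch; the lot and vehicle_pictures tables follow the example above, with invented columns):

SELECT p.*
FROM lot l
JOIN vehicle_pictures p ON p.lotid = l.lotid -- joins on a subset (lotid) of the child key (lotid, vin, ...)
WHERE l.lotid = 42;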
Schema, interface comes second. If the schema is bad, having a nice interface is not a long term goal.

Upgrade strategies for bad DB schema designs

I've shown up at a new job and discovered a database which is in dire need of some help. There are many, many things wrong with it, including:
No foreign keys...anywhere. They're faked by using ints and managing the relationship in code.
Practically every field can be NULL, even though the values aren't really optional
Naming conventions for tables and columns are practically non-existent
Varchars which are storing concatenated strings of relational information
Folks can argue "it works", and it does. But moving forward it's a total pain to manage all of this in code, and it opens us up to bugs, IMO. Basically the DB is being used as a flat file, since it's not doing a whole lot of work.
I want to fix this. The issues I see now are:
We have a lot of data (migration, possibly tricky)
All of the DB logic is in code (so migration will mean big code changes)
I'm also tempted to do something "radical" like moving to a schema-free DB.
What are some good strategies when faced with an existing DB built upon a poorly designed schema?
Enforce Foreign Keys: If a relationship exists in the domain, then it should have a Foreign Key.
Renaming existing tables/columns is fraught with danger, especially if there are many systems accessing the Database directly. Gotchas include tasks that run only periodically; these are often missed.
Of Interest: Scott Ambler's article: Introduction To Database Refactoring
and Catalog of Database Refactorings
Views are commonly used to transition between changing data models because of the encapsulation they provide. A view looks like a table but does not exist as a concrete object in the database - you can change which column is returned for a given column alias as desired. This lets you point your codebase at a view, so you can move from the old table structure to the new one without the application needing to be updated; it just means the view has to return the data in the existing format. For example, suppose your current data model has:
SELECT t.csv_list -- a list of concatenated strings, assuming comma-separated (csv_list is a placeholder column name)
FROM old_table t
...so the first version of the view would be the query above; but once you have created the new table in 3NF, the query for the view would instead be (the item and owner_id columns are hypothetical placeholders for the normalized table's columns):
SELECT GROUP_CONCAT(t.item SEPARATOR ',') AS csv_list -- rebuilds the legacy concatenated value
FROM new_table t
GROUP BY t.owner_id -- one group per legacy row
...and the application code would never know that anything changed.
The problem with MySQL is that its view support is limited - you can't use variables within a view, nor can views contain subqueries.
The reality of the changes you wish to make is effectively a rewrite of the application from the ground up. Moving logic from the codebase into the data model will drastically change how the application gets its data. Model-View-Controller (MVC) is ideal to have in place for changes like these, to minimize the cost of similar changes in the future.
I'd say leave it alone until you really understand it. Then make sure you don't start with one of the Things You Should Never Do.
Read Scott Ambler's book on Refactoring Databases. It covers a good many techniques for how to go about improving a database - including the transitional measures needed to allow both old and new programs to work with the changing design.
Create a completely new schema and make sure that it is fully normalized, contains any unique, check and not-null constraints etc. that are required, and uses appropriate data types.
Prepopulate each table that fills the parent role in a foreign key relationship with a single 'Unknown' record.
Create an ETL (Extract, Transform, Load) process (I can recommend SSIS (SQL Server Integration Services), but there are plenty of others) that you can use to refill the new schema from the existing one on a regular basis. Use the 'Unknown' record as the parent of any orphaned records - there will be plenty ;) (a sketch of this appears after these steps). You will need to put some thought into how you will consolidate duplicate records - this will probably need to be decided on a case-by-case basis.
Use as many iterations as are necessary to refine your new schema (ensure that the ETL Process is maintained and run regularly).
Create views over the new schema that match the existing schema as closely as possible.
Incrementally modify any clients to use the new schema making temporary use of the views where necessary. You should be able to gradually turn off parts of the ETL process and eventually disable it completely.
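As a hypothetical sketch of the 'Unknown' parent technique from the ETL step above (table and column names are invented):

-- Seed each parent table with a sentinel row first...
INSERT INTO NewCustomers (CustomerId, Name) VALUES (0, 'Unknown');

-- ...then, while refilling the new schema, re-parent orphaned child rows
-- to the sentinel instead of violating the foreign key.
INSERT INTO NewOrders (OrderId, CustomerId, Total)
SELECT o.OrderId,
       COALESCE(c.CustomerId, 0), -- orphans map to the 'Unknown' record
       o.Total
FROM OldOrders o
LEFT JOIN NewCustomers c ON c.CustomerId = o.CustomerId;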
First, see how bad the code's coupling to the DB is. If it is all mixed together with no DAO layer, you shouldn't think about a rewrite; but if there is a DAO layer, then it would be time to rewrite that layer, and the DB along with it. If possible, build the migration tool on top of the two DAOs.
But my guess is there is no DAO, so you need to find which areas of the code you are going to be changing and which parts of the DB they relate to; hopefully you can cut it up into smaller parts that can be updated as you maintain the system. The biggest deal is to get FKs in there and to start checking for proper indexes - there is a good chance they aren't being done correctly.
I wouldn't worry too much about naming until the rest of the DB is under control. As for the NULLs: if the program chokes on a value being NULL, don't let it be NULL; but if the program can handle it, I wouldn't worry about it at this point. In the future, if the code substitutes a default value, move that into the DB, but that is way down the line from the sound of things.
Do something about the varchars sooner rather than later. If anything, make that the first pure background fix to the program.
The other thing to do is estimate the effort of each area's change and then add that price to the cost of new development on that section of code. That way you can fix the parts as you add new features.

Linq to Sql and Non-PK, unique-FK relationship issues

I've recently been reading Louis Davidson's book on SQL Server database design and found it quite informative. I've picked up on a lot of concepts that I previously knew little (or nothing) about. Primarily, I picked up on a way to set up database relationships that I hadn't tried before.
Basically, you use a surrogate key as the table's PK (an auto-incremented id field) and then set up one or more alternate keys, each consisting of one or more unique columns. These alternate keys would then be the values used for relationships (or the PK, if that makes more sense for the given relationship); a sketch of the pattern follows.
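In SQL the pattern looks roughly like this (a hypothetical sketch in SQL Server syntax; most engines allow a FOREIGN KEY to reference any UNIQUE key, not just the primary key):

CREATE TABLE Product (
    ProductId INT IDENTITY PRIMARY KEY,     -- surrogate key
    Sku       VARCHAR(20) NOT NULL,
    CONSTRAINT AK_Product_Sku UNIQUE (Sku)  -- alternate key
);

CREATE TABLE OrderLine (
    OrderLineId INT IDENTITY PRIMARY KEY,
    Sku         VARCHAR(20) NOT NULL,
    -- the relationship targets the alternate key rather than the surrogate PK
    CONSTRAINT FK_OrderLine_Product
        FOREIGN KEY (Sku) REFERENCES Product (Sku)
);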
I remodelled an old database that was suffering from data inconsistencies due to poor design to implement this (to me) new way of thinking.
On a database level it works great. The relationships function the way they're supposed to, and the constraints are enforced in a consistent, reliable manner.
HOWEVER
I cannot get it to work properly in either the Entity Framework or LINQ to SQL classes. I read that v1 of EF just flat out won't support this kind of relationship, so I moved to LINQ to SQL to see if things would work out better. They seemingly did, as all the relationships were mapped out automatically when I imported the classes from my database. The problem is that I can't save data to the database because of InvalidCastOperation exceptions as soon as I try to save.
So I have a couple of questions:
1) Is this a limitation in LINQ to SQL?
2) If so, is there a way to work around it? Preferably without implementing sprocs for save, update and delete - type safety is something I would like to keep.
3) Is this way of designing database relationships "correct" and/or good practice?
I hope someone can shed some light on this, as I'm getting quite frustrated. I can't really find any good material on the subject online, so hopefully someone here has an answer or can point me in the right direction.
Thanks a lot!
EDIT - Solution.
What I ended up doing was this: I went back to using the Entity Framework in conjunction with a redesign of the database schema. In most cases I remodeled the relationships to rely on primary keys rather than alternate keys. Where that was not an option, I made some modifications to the EF layout: I implemented the relationship that relied on the AKs, at which point EF complained. To get around that, I had to delete the foreign-key property on the many side of the relationship, at which point EF accepted the relationship.
1) Yes.
2) If you can mark your alternate key as primary in the L2S model, and unmark the real PK as the PK, then it will work.
3) From the DB perspective there's nothing wrong, but as you have noticed it is not supported by L2S or EF. Personally, I prefer to always have FKs pointing to the PK, and to use AKs only for lookups.