I've recently been reading Louis Davidson's book on SQL Server database design and found it quite informative. I've picked up on a lot of concepts that I didn't previously know a lot (or anything) about. Primarily, I picked up on a way to set up database relationships that I hadn't tried before.
Basically, you use a surrogate key as the table's PK (an auto-incremented ID field) and then set up one or more alternate keys, each consisting of one or more unique columns. These alternate keys are then the values used for relationships (or the PK, if that makes more sense for the given relationship).
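For anyone who hasn't seen the pattern, here's a minimal T-SQL sketch of it; the Customer/Orders tables and their names are my own hypothetical example, not from the book:

    -- Surrogate PK plus an alternate key (AK) on the natural identifier
    CREATE TABLE Customer
    (
        CustomerId   int IDENTITY(1,1) NOT NULL,
        CustomerCode varchar(20) NOT NULL,
        CONSTRAINT PK_Customer PRIMARY KEY (CustomerId),
        CONSTRAINT AK_Customer_Code UNIQUE (CustomerCode)
    );

    CREATE TABLE Orders
    (
        OrderId      int IDENTITY(1,1) NOT NULL,
        CustomerCode varchar(20) NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY (OrderId),
        -- the FK references the alternate key rather than the PK
        CONSTRAINT FK_Orders_Customer FOREIGN KEY (CustomerCode)
            REFERENCES Customer (CustomerCode)
    );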
I remodelled an old database that was suffering from some data inconsistencies due to poor design to implement this (to me) new way of thinking.
On a database level it works great. The relationships function the way they're supposed to and the constraints are enforced in a consistent, reliable manner.
HOWEVER
I cannot get it to work properly in either the Entity Framework or in LINQ to SQL classes. I read that V1 of EF just flat out won't support this kind of relationship, so I moved to LINQ to SQL to see if things would work out better. They seemingly did, as all the relationships were automatically mapped out when I imported the classes from my database. The problem is that I can't save data to the database: as soon as I try, I get InvalidCastOperation exceptions.
So I have a couple of questions:
1) Is this a limitation in LINQ to SQL?
2) If so, is there a way to work around it? Preferably without implementing sprocs for save, update and delete; type safety is something I would like to keep.
3) Is this way of designing database relationships "correct" and/or a good practice?
I hope someone can shed some light on this, as I'm getting quite frustrated about it. I can't really find any good material on the subject online - so hopefully someone here has an answer or can point me in the right direction.
Thanks a lot!
EDIT - Solution.
What I ended up doing was this: I went back to using the Entity Framework in conjunction with a redesign of the database schema. I remodeled the relationships to rely on primary keys rather than alternate keys in most cases. Where that was not an option, I made some modifications to the EF layout. I implemented the relationship that relied on the AKs, at which point EF complained. To get around that, I had to delete the foreign key property on the many side of the relationship, after which EF accepted the relationship.
1) Yes.
2) If you can mark your alternate key as primary in the L2S model and unmark the real PK, then it will work.
3) From the db perspective there's nothing wrong, but as you have noticed it is not supported by L2S or EF. Personally I prefer to always have FKs pointing to the PK and only use AKs for lookups.
Related
Is it any better? I've heard about the Code-First extension, but is it ready for prime time? Please share your experience with development, any performance overheads, etc.
I think this is a timely question, as I was wondering the exact same thing. I am trying to create a serious e-commerce model, keeping my POCOs free of persistence concerns while staying true to Domain-Driven Design. So far I am very wary, and I am on the fence about whether I should jump ship to NHibernate. The only thing keeping me from doing so is the assumption that Microsoft will improve things (and quickly).
Some of the biggest problems so far:
Inability to finely control object materialization. EF calls the zero-arg constructor on your POCO, and this is a behavior you cannot change.
No enum support. The community has been screaming -- screaming! -- for this, and it hasn't happened. The workarounds are terrible, and pollute your domain model.
Weird mapping bugs when trying to control column names and relationships in the database. The main ones I can think of are with compound keys and many-to-many relationships. These can be worked around, and I assume these will be fixed by release time, but they are frustrating nonetheless.
Bad SQL. I also do DBA work, and the SQL that EF generates (with or without Code-First) is atrocious.
And this is just the tip of the iceberg: I am only starting to learn EF4 and I'm running into awful roadblocks. As I think of more reasons, I'll add them here. I'm still struggling through it.
(I wonder whether the community will give it another vote of "no confidence.")
More:
To add to the "Weird mapping bugs" problem: You cannot control the name of a column if it participates in a self-referencing relationship (for example, if you have a hierarchy). I assume this will be fixed in the final release.
Lack of batching, resulting in multiple round trips to the database. For example, how do you delete a bunch of items from a collection? Load all entities into memory and delete them one at a time (see the sketch after this list). A smaller gripe is the number of DB hits when inserting into tables that participate in an inheritance relationship.
No intelligent way to deal with model changes. EF Code-First loves to completely drop your entire database if it needs to change the schema.
Few extensibility points. You can literally count on one hand the number of events that EF4 allows you to subscribe to (and Code-First doesn't provide much more).
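To make the batching gripe concrete: the collection delete that EF turns into N round trips is a single set-based statement in plain SQL. A sketch with a hypothetical OrderItem table of my own:

    -- One statement, one round trip; EF instead loads each
    -- entity into memory and deletes them one by one
    DELETE FROM OrderItem
    WHERE OrderId = 42;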
As for me, I prefer EF, but with some enhancements. Basically, EF offers you the following advantages:
Visual Model Editor
Database/Model Update wizard (instead of manual XML changes, which are terrible for me)
Also, I'm using third-party commercial tools based on EF and L2S (LinqConnect) that provide the following features for me:
Geography support
Optimized SQL generation
A product fully integrated into Visual Studio
Smart database update wizard (synchronization mode)
So far I have always enforced my DB with FK relationships. Things changed yesterday while mapping some classes with Fluent NHibernate. My mapping didn't work, and I discovered that the issue was caused by the order in which FN created the queries.
Now a question arises: should I keep enforcing data integrity with FKs, or is it better to avoid them since I focus on domain classes instead of SQL queries?
Thanks
To my knowledge, it is far better to keep your database consistent: you may not be the only one who works on this DB in the future, and someone else with access to it could do something that corrupts your data's consistency. As a result, your application would no longer behave the way you expect, because assumed conditions no longer hold.
Letting Fluent/NHibernate create your database during development is fine, but when it goes into production you really should check all the foreign keys, indexes, etc., and from then on only make scripted changes.
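For instance, a scripted, source-controlled change for production might look like this minimal sketch (table and constraint names are hypothetical):

    -- Scripted change to be reviewed and run against production
    ALTER TABLE dbo.OrderLine
        ADD CONSTRAINT FK_OrderLine_Order
        FOREIGN KEY (OrderId) REFERENCES dbo.Orders (OrderId);

    CREATE NONCLUSTERED INDEX IX_OrderLine_OrderId
        ON dbo.OrderLine (OrderId);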
Keep your database consistent, maintain referential integrity.
If a tool you are using breaks as a result, there is bound to be a workaround. However, if you lose referential integrity to use NHibernate, what happens if you decide to use a different ORM? You will have a dodgy database, and who's to say that the next ORM in line will like that?
It's like a separation-of-concerns question: each chunk of your application should be designed to be robust enough to survive if another chunk is changed or removed, so don't change good database practice simply to make a product that is layered above it play nicely.
Using a domain-driven approach, or a model-oriented approach where the DB is merely seen as an 'implementation detail', does not mean that you should ignore the integrity of your data.
I see no reason why you should drop foreign-key (and other) constraints from your database.
The database is more than just storage for your data. Its task is also to guard the data's integrity.
It is perfectly possible to combine the two worlds (domain-driven and relational database) with NHibernate. Make sure that the two areas focus on what they're best at. The database is best at storing data and making sure that the data remains valid and intact.
So, I have an app that uses a SQL Server Express DB. I have about 80-ish tables, all with a primary key but no foreign keys. (The reason we have no foreign keys is how we do our SQL client-to-server replication. It's not true replication but a sync system that was in place when we took over the app. We have no guarantee which records are going to make it to the database first when a client syncs to the server, so it is possible for a record to make it to the database with a foreign key that points to a nonexistent record.)
We use a type-per-model convention: for each of our business objects there is a table in the DB. We currently use stored procedures for every database transaction, which means for every new class there are at least four new stored procedures (CRUD). We have abstracted our data access layer out from our business objects. Each business object has a corresponding businessObjectDAO.
My question is: is Entity Framework feasible for me to move to? With no foreign key relationships, I'm going to have to set up every association between tables manually. Is it worth the time to do this?
My biggest hang-up right now is trying to figure out how I map my DAOs to the EF partial classes.
Should I be creating one big .edmx or multiple?
A lot of questions I know. This is my first big architectural type decision and I've been given the go ahead to make the change if I think it is beneficial and feasible.
Maybe I should try LINQ to SQL? NHibernate is out because we're not allowed to use open-source products in production (stupid, I know).
Thanks
Cody
My personal recommendation is that if something is working, leave it. I am a big, big fan of both LINQ to SQL and Entity Framework, and have managed to get my workplace to make use of LINQ to SQL. I realise that if you did bring one of these into your project, maintainability would probably be easier, but by the sounds of it the initial work will be more than it's worth in the end.
Consider a database (MSSQL 2005) that consists of 100+ tables which have primary keys defined to a certain degree. There are 'relationships' between tables; however, these are not enforced with foreign key constraints.
Consider the following simplified example of the typical types of tables I am dealing with. There are clear relationships between the User, City and Province tables. However, the key issue is the inconsistent data types in the tables and the naming conventions.
User:
UserRowId [int] PK
Name [varchar(50)]
CityId [smallint]
ProvinceRowId [bigint]
City:
CityRowId [bigint] PK
CityDescription [varchar(100)]
Province:
ProvinceId [int] PK
ProvinceDesc [varchar(50)]
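For what it's worth, SQL Server won't let you create foreign keys across mismatched column types, so any enforcement (or clean ORM mapping) would start with aligning them. A rough sketch of that cleanup, using the columns above (I'm assuming NOT NULL columns; adjust to match the existing definitions, and test on a copy first):

    -- Align the mismatched types with the keys they reference
    ALTER TABLE [User] ALTER COLUMN CityId bigint NOT NULL;      -- match City.CityRowId (bigint)
    ALTER TABLE [User] ALTER COLUMN ProvinceRowId int NOT NULL;  -- match Province.ProvinceId (int)

    -- Then the implicit relationships can actually be enforced
    ALTER TABLE [User] ADD CONSTRAINT FK_User_City
        FOREIGN KEY (CityId) REFERENCES City (CityRowId);
    ALTER TABLE [User] ADD CONSTRAINT FK_User_Province
        FOREIGN KEY (ProvinceRowId) REFERENCES Province (ProvinceId);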
I am considering a rewrite of the application (in ASP.NET MVC) that uses this data source and is similar in design to the MVC Storefront. However, I am going through a proof-of-concept phase and this is one of the stumbling blocks I have come across.
What are my options in terms of ORM choice that can be easily used and why?
Should I even be considering an ORM? (The reason I ask is that most explanations and tutorials work with relatively cleanly designed existing databases, or newly created ones, compared to mine. I am thus having a very hard time trying to find a way forward with this problem.)
There is a huge amount of existing SQL queries; would a data mapper (e.g. iBATIS.NET) be more suitable, since we could easily modify them to work and reuse the investment already made?
I have found this question on SO, which indicates to me that an ORM can be used; however, I get the impression that this is a question of mapping?
Note: at the moment, the object model is not clearly defined, as it was previously non-existent. The existing system did almost everything in SQL, or consisted of overly complicated and numerous queries to complete functionality. I am pretty much a noob with zero experience around ORMs and MVC, so this is an awesome learning curve I am on.
I agree with Ben.
I was in this situation with a LAMP stack. An old, dirty, badly coded website needed bringing up to scratch. It was literally the worst database I have seen, coupled with line after line of blind SQL execution.
The job? Get rid of all that SQL very quickly and replace it with an abstraction. Which ORM? I found that retrofitting an existing ORM over a bad database (most databases, really) is bad news. I think this is a problem with ORMs: they move database/storage concerns closer to the application, not further away.
My solution: a reflective ORM that used only the existing database state to work out what was going on. All selects, inserts, updates and what-not used views/stored procedures to mask the cruddy database. It is powered by a LINQ-esque API to rewrite the grim SQL with. It boiled around 100k lines of SQL statements down to fewer than 2k.
Pros: I can gradually port the database to a better structure behind the views and procedures. IMHO this is how all databases should be organised, taking full advantage of the abstraction that SPs and views provide. I never want to see a single SQL statement (or an ORM masquerading as SQL) directly against a table.
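To sketch the masking idea (my own example, not the poster's actual code): assume a legacy table like tblUsr(usr_rid int IDENTITY, usr_nm varchar(50), cty_id int). A view gives the application a clean, stable shape, and writes go through a procedure so the table can be restructured later without touching callers:

    -- Clean facade over the cruddy table
    CREATE VIEW dbo.Users
    AS
    SELECT usr_rid AS UserId,
           usr_nm  AS Name,
           cty_id  AS CityId
    FROM dbo.tblUsr;
    GO

    -- Writes go through a procedure, never against the table directly
    CREATE PROCEDURE dbo.Users_Insert
        @Name   varchar(50),
        @CityId int
    AS
        INSERT INTO dbo.tblUsr (usr_nm, cty_id)
        VALUES (@Name, @CityId);
    GO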
That's my story. An overengineered way to slot a nice abstraction above an existing and crap database, without rewriting the database first, and without crowbaring an ORM into the mix making things much more complex.
A hack, no doubt, but it works so well I am using it in projects where I can design the database from scratch anyway ;)
The amount of work involved in trying to keep the existing schema and then crowbar it into a much more structured ORM pattern would probably be large and complex. If you are rewriting the whole system and retiring the old one, I would devise my data model, create a new DB and set of classes (maybe using LINQ to SQL), then write a data migration script to move the data from the old schema to the new one. That way all your complex, fiddly code is in the migration, and you don't have to deal with maintaining and managing a complex mapping between a structured class model and a badly designed DB.
We've just faced this problem with an awful schema design (randomly has primary keys, no foreign keys at all, badly designed tables - just a mess).
We had the luxury of technology choice and went with an MVC2 front end (irrelevant to your question), and had two devs split off: one trying to model using NHibernate, the other using Entity Framework 4.
I hasten to add that we had a strong idea of what we wanted from our domain model, and modelled that first (not wanting to be constrained by the database), so our 'User' object actually spanned five tables from a schema point of view. We encapsulated a lot of the business logic so that the domain model wasn't anaemic, and once we were happy with our User object, we started the process of trying to plug in the ORM.
I can say without hesitation that in both cases (NH and EF4) the compromises we had to make to our model in order to shoe-horn the implementation in were phenomenal. I'll give you the examples from EF4, as that's the one I was most closely involved in; others may be able to relate these to other ORMs.
private setters
Nope, not on your life with EF4. Your properties must be public. There are workarounds (for example, creating wrappers around properties that were coming in from your DB).
enums
Again, no. There was a wrapper concept and a 'mapping' to try to get a lookup int out of the DB into the model's enum types.
outcomes
We persevered for a while with both approaches to get to a point where we'd completed the mapping of a user, and the outcome was that we had to compromise our domain model in too many ways.
where did we go after that?
LINQ to SQL with our own mapping layer. And we've never looked back; it's absolutely fantastic. We wrote the mapping layer ourselves: it takes the DTO object down at the DAL layer and maps it (as we specify) into our domain model.
Good luck with any investigation of ORMs. I'd certainly re-investigate them if I had a decent schema to base them on, but as it stood, with an awful schema, it was easier to roll our own.
Cheers,
Terry
I know, I quite dislike catch-all, survey-type questions, but I couldn't think of a better way to find out what I think I need to know. I'm very green in the world of database development, having only worked on a small number of projects that merely interacted with a database rather than requiring me to actually create a new one from scratch. However, things change, and now I am faced with creating my own database.
So far, I have created the tables I need and added the columns that I think I need, including link tables for many-to-many relationships and columns for one-to-many relationships. I have some specific questions on this, but I felt that rather than get just these answered, it would make more sense to ask about things I may not even know about, which I should address now rather than six months from now when we have a populated database and client tools using it.
First the questions on my database which have led me to realise I don't know enough:
How do I ensure my many-to-many link tables and my one-to-many columns are up-to-date when changes are made to the referenced tables? What problems may I encounter?
I am using nvarchar(n) and nvarchar(MAX) for various text fields. Should I use varchar equivalents instead (I had read there may be performance risks in using nvarchar)? Are there any other gotchas regarding the selection of datatypes besides being wary of using fixed length char arrays to hold variable length information? Any rules on how to select the appropriate datatype?
I use int for the ID column of each table, which is my primary key in all but the link tables (where I have two primary keys, the IDs of the referenced table rows). This ID is set as the identity. Are there pitfalls to this approach?
I have created metadata tables for things like unit types and statuses, but I don't know if this was the correct thing to do or not. Should you create new tables for things like enumerated lists or is there a better way?
I understand that databases are complex and the subject of many many worthy tomes, but I suspect many of you have some tips and tricks to augment such reading material (though tips for essential reading would also be welcome).
Community wiki'd due to the rather subjective nature of these kinds of posts. Apologies if this is a duplicate, I've conducted a number of searches for something like this but couldn't find any, though this one is certainly related. Thanks.
Update
I just found this question, which is very similar in a roundabout way.
Not normalising
Not using normalisation
Trying to implement a denormalised schema from the start
Seriously:
1) Foreign keys will disallow deletes or updates from the parent tables, or they can be set to cascade (see the sketch after this list).
2) Keep datatypes as small as possible. See two recent SO questions on datatypes and (n)varchar.
3) It may not be portable, and your "natural key" (say "product name") still needs a unique constraint. Otherwise no, but remember that an IDENTITY column is a "surrogate key". Edit: say you expect to store fruit with columns FruitID and FruitName. You have no way to restrict "Apple" or "Orange" to one occurrence, because although FruitName is your "natural key", you are using a surrogate key (FruitID). So, to maintain integrity, you need a unique constraint on FruitName.
4) Not sure of your meaning, sorry. Edit: don't do it. Ye olde "one true lookup table" idea.
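Pulling 1) and 3) together in one sketch, extending the fruit example above (the Basket tables are hypothetical additions of mine):

    CREATE TABLE Fruit
    (
        FruitId   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        FruitName varchar(50) NOT NULL,
        CONSTRAINT AK_Fruit_Name UNIQUE (FruitName)  -- guards the natural key
    );

    CREATE TABLE Basket
    (
        BasketId int IDENTITY(1,1) NOT NULL PRIMARY KEY
    );

    -- Link table: composite PK, cascading FKs keep it in sync with the parents
    CREATE TABLE BasketFruit
    (
        BasketId int NOT NULL,
        FruitId  int NOT NULL,
        CONSTRAINT PK_BasketFruit PRIMARY KEY (BasketId, FruitId),
        CONSTRAINT FK_BasketFruit_Basket FOREIGN KEY (BasketId)
            REFERENCES Basket (BasketId) ON DELETE CASCADE,
        CONSTRAINT FK_BasketFruit_Fruit FOREIGN KEY (FruitId)
            REFERENCES Fruit (FruitId) ON DELETE CASCADE
    );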
I'll reply to your subjective query with some vague generalities. :)
The most common pitfall of designing a database is the same as for any programming solution: not fully understanding the problem being solved. In the case of a database, that means understanding the nature of the data: how big it is, how it comes and goes, and what business rules it must adhere to.
Here are some questions to ponder.
What is updated the most frequently? Is keeping that table write-locked going to lock up queries? Will it become a hot spot? Even a seemingly well normalized schema can be a poor performer if you don't understand your read versus write ratios.
What are your external interface needs? I've been on projects where the dotted line to "that other system" nearly scuttled the whole project because implementing it was delayed until everything else was in place, that is to say, everything else was inflexible.
Any other unspoken requirements? My favorite is date sensitivity. All the data is there, your reports are beautiful, the boss looks them over and asks, when did that datum change? Who did it and when? Is the database supposed to track itself and its users, or just the data? Will your front end do it for you?
Just some things to think about.
It does sound like you've got a good grasp on what you're meant to be doing, and indeed there isn't "one true path" to doing databases.
Have you set up cascades for your hierarchical objects (i.e., a single delete at the 'head' of your object in the database will delete all entries in tables relating to that entry)?
Your link tables and 1:n columns should be foreign keys, so there isn't much to worry about if the data changes. By "two primary keys" here, did you mean indexes?
As for metadata tables, I've done them in the past, and I've not done them. A single char status with a SQL comment can suffice for a limited set of statuses, but beyond a certain number, or where you can think of adding more in the future, you might want a reference to another table of metadata, or maybe a char(8-ish). E.g., I've seen user tables with "NORMAL", "ADMIN", "SUPER", "GUEST", etc. for user type, which could have been 1, 2, 3, 4, 5 fkeys to a "UserType" table, but with such a restricted enumeration, does it matter? Other people have a table of permissions (booleans of what a user can do) instead. Many ways to skin a cat.
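If you do go the reference-table route, the shape is simple; this sketch assumes a Users table with a UserTypeId column (all names hypothetical):

    CREATE TABLE UserType
    (
        UserTypeId tinyint NOT NULL PRIMARY KEY,
        Name       varchar(20) NOT NULL UNIQUE
    );

    INSERT INTO UserType (UserTypeId, Name) VALUES (1, 'NORMAL');
    INSERT INTO UserType (UserTypeId, Name) VALUES (2, 'ADMIN');
    INSERT INTO UserType (UserTypeId, Name) VALUES (3, 'SUPER');
    INSERT INTO UserType (UserTypeId, Name) VALUES (4, 'GUEST');

    -- Users can then only hold valid types
    ALTER TABLE Users ADD CONSTRAINT FK_Users_UserType
        FOREIGN KEY (UserTypeId) REFERENCES UserType (UserTypeId);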
You might find some usable stuff in these slides:
http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back
I also am a beginner to database design, but I found this online tutorial very, very helpful:
Database design with UML and SQL, 3rd edition
The author explains all the fundamental design aspects of databases, and in a very clear manner. Before I found this online guide I did a lot of Wikipedia reading about normalization. While that helped, this author explains the exact same material (through third normal form, at least) in a much, much easier-to-read way. It pretty much addresses all your questions as well.
I'd suggest a good book. The best IMO is this:
http://www.amazon.com/Server-2005-Database-Design-Optimization/dp/1590595297/ref=ntt_at_ep_dpt_1
In addition to not normalizing, a common problem I see is overindexing, done before there are performance measurements that take into account your in-production mix of reads vs. writes.
It's really, really easy to add an index to speed up a query, and harder to figure out which one to remove when you have several that are getting updated during an INSERT or UPDATE.
The middle ground is to go after obvious secondary indexes (e.g., for common, frequent lookups by name on large tables), deferring other candidate indexes until you have reasonable performance tests in place.
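As a minimal example of an "obvious" secondary index (hypothetical table and columns):

    -- Frequent lookups of customers by name on a large table
    CREATE NONCLUSTERED INDEX IX_Customer_LastName
        ON dbo.Customer (LastName, FirstName);

    -- Measure before adding more: every extra index is maintained
    -- on each INSERT/UPDATE that touches its columns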
Among other things: not using primary keys; not thinking ahead about whether you'll be using indexed views (and designing tables accordingly; I once had to drop and recreate a large table at my site to change its ANSI_NULLS setting to ON so that I could then use it with an indexed view); and not using indices.