Is it beneficial to use multicolumn (composite) primary keys when using Linq to SQL? - sql

Is it beneficial to use multicolumn (composite) primary keys for a many to many relationship table when using Linq to SQL?
Or should I just add an identity column as a non-clustered primary key and index the FK columns appropriately?

Not a LINQ issue. If you need them for your schema, then use them. If you don't, don't. Either way, LINQ will handle your schema just fine.
One area that LINQ to SQL doesn't handle well is multi-column / multi-key mapping tables that connect a many-to-many relationship, but I wouldn't say this strictly falls under the category that your question addresses. You can still perform CRUD operations on a mapping table within LINQ, but LINQ cannot walk the relationship presented by a many-to-many mapping table. (LINQ works fine with one-to-one and one-to-many tables.)
I can't speak to any issue with the Entity Framework but again, I would be very surprised if the EF had any issues with multi-column / multi-key tables.

If it makes sense in your domain to have a multi-column composite key, then use one. Otherwise use the usual identity column as the surrogate primary key.
EDIT: that was general advice that did not take into account the technical aspects of implementing this with LINQ to SQL. These may be of interest:
How to: Handle Composite Keys in Queries (LINQ to SQL)
LINQ To SQL Samples
Linq to SQL DTOs and composite objects

Related

<select> for an entity with composite keys - strategy needed

So say I have database table tours (PK tour_id) holding region independent information and tours_regional_details (PK tour_id, region_id) holding region specific information.
Let's say I want to populate select control with entities from tours_regional_details table (my real scenarios are bit different, just imagine this for the sake of simplicity).
So, how would you tackle this? My gut says to concatenate the PKs into delimited strings, like "pk1|pk2" or "pk1,pk2", and use that as the value of the select control. While it works, it feels dirty and possibly needs additional validation steps before splitting the string again, which again feels dirty.
I don't want to start a composite vs single PK holy war, but might this be a bad database design decision on my part? I always believed identifying relationships and composite keys are there for a reason, but I feel tempted to alter my tables and just stuff them with auto-incremented IDs and unique constraints. I'm just not sure what kind of fresh hell that will introduce.
I am a little bit flabbergasted that I encounter this for the first time now after so many years.
EDIT: Yes, there is a table regions (PK region_id) but is mostly irrelevant for the topic. While in some scenarios two select boxes would make sense, let's say here they don't, let's say I want only one select box and want to select from:
Dummy tour (Region 1)
Dummy tour (Region 2)
Another dummy tour (region 3)
...
Composite primary keys aren't bad database design. In an ideal world, our programming languages and UI libraries would support tuples and relations as first-class values, so you'd be able to assign a pair of values as the value of an option in your dropdown control. However, since they generally only support scalar variables, we're stuck trying to encode or reduce our identifiers.
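As a rough illustration of "encoding" a composite identifier into a scalar, here is a minimal Python sketch. The names RegionalTourKey, encode_key and decode_key and the "|" separator are assumptions made for this example, not part of the question; the point is only that the decode step validates the value before trusting it.

```python
from typing import NamedTuple


class RegionalTourKey(NamedTuple):
    """Composite key of a tours_regional_details row (hypothetical type)."""
    tour_id: int
    region_id: int


SEPARATOR = "|"  # assumed delimiter; unambiguous because both parts are integers


def encode_key(key: RegionalTourKey) -> str:
    """Turn the composite key into a single scalar option value."""
    return f"{key.tour_id}{SEPARATOR}{key.region_id}"


def decode_key(value: str) -> RegionalTourKey:
    """Parse and validate an option value posted back by the client."""
    parts = value.split(SEPARATOR)
    # Exactly two non-empty, all-digit parts are required.
    if len(parts) != 2 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"malformed composite key: {value!r}")
    return RegionalTourKey(int(parts[0]), int(parts[1]))
```

Keeping the encode and decode functions next to each other means the delimiter and the validation rule live in one place, which takes away most of the "dirty" feeling of splitting strings ad hoc.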
You can certainly add surrogate keys / autoincrement columns (and unique constraints on the natural keys where available) to every table. It's a very common pattern; most databases I've seen have at least some tables set up like this. You may be able to keep existing composite foreign keys as is, or you may want/need to change them to reference the surrogate primary keys instead.
The risk with using surrogate keys for foreign keys is that your access paths in the database become fixed. For example, let's assume tours_regional_details had a primary key tours_regional_detail_id that's referenced by a foreign key in another table. Queries against this other table would always need to join with tours_regional_details to obtain the tour_id or region_id. Natural keys allow more flexible access paths since identifiers are reused throughout the database. This becomes significant in deep hierarchies of dependent concepts. These are exactly the scenarios where opponents of composite keys complain about the "explosion" of keys, and I can at least agree that it becomes cumbersome to remember and type out joins on numerous columns when writing queries.
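The access-path point can be sketched with SQLite through Python's sqlite3 module. The child tables departures_surrogate and departures_natural are hypothetical and exist only for this illustration; the same question ("which departures are in region 1?") needs a join when the child references the surrogate key, and no join when it carries the natural key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tours_regional_details (
    tours_regional_detail_id INTEGER PRIMARY KEY,  -- surrogate key
    tour_id   INTEGER NOT NULL,
    region_id INTEGER NOT NULL,
    UNIQUE (tour_id, region_id)                    -- natural key stays unique
);

-- Hypothetical child table that references the surrogate key.
CREATE TABLE departures_surrogate (
    departure_date TEXT NOT NULL,
    tours_regional_detail_id INTEGER NOT NULL
        REFERENCES tours_regional_details (tours_regional_detail_id)
);

-- The same hypothetical child table, referencing the natural key instead.
CREATE TABLE departures_natural (
    departure_date TEXT NOT NULL,
    tour_id   INTEGER NOT NULL,
    region_id INTEGER NOT NULL,
    FOREIGN KEY (tour_id, region_id)
        REFERENCES tours_regional_details (tour_id, region_id)
);

INSERT INTO tours_regional_details VALUES (1, 10, 1), (2, 10, 2), (3, 20, 1);
INSERT INTO departures_surrogate VALUES
    ('2024-05-01', 1), ('2024-05-02', 2), ('2024-05-03', 3);
INSERT INTO departures_natural VALUES
    ('2024-05-01', 10, 1), ('2024-05-02', 10, 2), ('2024-05-03', 20, 1);
""")

# Surrogate FK: region_id is only reachable through the parent table.
via_join = conn.execute("""
    SELECT d.departure_date
    FROM departures_surrogate AS d
    JOIN tours_regional_details AS trd
      ON trd.tours_regional_detail_id = d.tours_regional_detail_id
    WHERE trd.region_id = 1
    ORDER BY d.departure_date
""").fetchall()

# Natural FK: region_id is already present in the child table.
direct = conn.execute("""
    SELECT departure_date
    FROM departures_natural
    WHERE region_id = 1
    ORDER BY departure_date
""").fetchall()
```

Both queries return the same rows; the difference is only in how much of the schema each one has to walk.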
You could duplicate the natural key columns into the referencing tables, but storing redundant information requires additional effort to maintain consistency. I often see this done for performance or convenience reasons where surrogate keys were used as foreign keys, since it allows querying a table without having to do all the joins to dereference the surrogate identifiers. In these cases, it might've been better to reference the natural key instead.
If I'm allowed to return to my ideal world, perhaps DBMSs could allow naming and storing joins.
In practice, surrogate keys help balance the complexity we have to deal with. Use them, but don't worship them.

Entity Framework Indexing ALL foreign key columns

This may be too much of an opinion-based question but here goes:
I've found an interesting quirk with Entity Framework and database migrations. It seems that whenever we create a foreign key it also creates an index on that column.
I read this SO question: Entity Framework Code First Foreign Key adding Index as well and everyone seems to say it's a great, efficient idea but I don't see how; indexing a column is very circumstance-specific. For instance, EF is indexing FKs on my table that are almost never (~1%) used for searches and are also on a source table, meaning that even when I join other tables, I'm searching the FK's linked table using its PK...there's no benefit from having the FK indexed in that scenario (that I'm aware of).
My question:
Am I missing something? Is there some reason why I would want to index a FK column that is never searched and is always on the source table in any joins?
My plan is to remove some of these questionable indexes but I wanted to confirm that there's not some optimization concept that I'm missing.
In EF Code First, the general reason why you would model a foreign key relationship is for navigability between entities. Consider a simple scenario of Country and City, with eager loading defined for the following LINQ statement:
var someQuery =
    db.Countries
        .Include(co => co.City)
        .Where(co => co.Name == "Japan")
        .Select(...);
This would result in a query along the lines of:
SELECT *
FROM Country co
    INNER JOIN City ci
        ON ci.CountryId = co.ID
WHERE co.Name = 'Japan';
Without an Index on the foreign key on City.CountryId, SQL will need to scan the Cities table in order to filter the cities for the Country during a JOIN.
The FK index will also have performance benefits if rows are deleted from the parent Country table, as referential integrity will need to detect the presence of any linked City rows (whether the FK has ON DELETE CASCADE defined or not).
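To see the scan-versus-seek difference in a query plan, here is a small sketch using SQLite via Python's sqlite3 module. It is a stand-in, not SQL Server: the planner and plan wording differ, and the index name IX_City_CountryId is only assumed to resemble what EF would generate. The lookup shown ("find the City rows for one CountryId") is the operation a join or a parent delete check has to perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE City (
    ID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    CountryId INTEGER NOT NULL REFERENCES Country (ID)
);
""")


def child_lookup_plan(connection):
    """Return the plan SQLite picks for finding the cities of one country."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT Name FROM City WHERE CountryId = ?", (1,)
    ).fetchall()
    # The last column of each row is the human-readable plan step.
    return " ".join(row[-1] for row in rows)


plan_without_index = child_lookup_plan(conn)  # a full scan of City
conn.execute("CREATE INDEX IX_City_CountryId ON City (CountryId)")
plan_with_index = child_lookup_plan(conn)     # a search using the new index

print(plan_without_index)
print(plan_with_index)
```

On SQL Server the equivalent check would be to compare the actual execution plans before and after dropping the index.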
TL;DR
Indexes on foreign keys are recommended. Even if you don't filter directly on the foreign key, the index will still be needed in joins. The exceptions to this seem to be quite contrived:
If the selectivity of the foreign key is very low, e.g. in the above scenario, if 50% of ALL rows in the cities table were in Japan, then the index would not be useful.
If you don't actually ever navigate across the relationship.
If you never delete rows from the parent table (or attempt an update on the PK).
One additional optimization consideration is whether to use the foreign key in the clustered index of the child table (i.e. cluster Cities by Country). This is often beneficial in parent : child table relationships where it is commonplace to retrieve all child rows for the parent simultaneously.
Short answer. No.
To expand slightly, at the database create time, entity framework does not know how many records each table or entity will have, nor does it know how the entities will be queried.
In my opinion, the creation of an index on a foreign key is more likely to be right than wrong. I had massive performance issues using a different ORM, which took longer to diagnose because I thought I had read in the documentation that it behaved the same way.
You can check the SQL statement that EF produces and run it manually if you want to double check.
You know your data better than EF does, and it should work just fine if you drop the index manually.
IIRC you can create one-way navigation properties if you use the right naming convention, although this was some time ago, and I never checked whether the index was created.
Change the conflicting FK (foreign key) name in the ApplicationDbContextModelSnapshot file to a different one, then add the migration again. The new migration will overwrite it and should no longer give an error.

Fluent Nhibernate mapping Legacy DB with composite key

I am using Fluent NHibernate (which I am fairly new to) in an application I am developing using a legacy Oracle DB. The DB has composite keys which are comprised of foreign keys and database generated columns. The generated columns are supplied by calling a DB function with the table name, and one of the other foreign key parts. The generated composite key parts are not unique, and I cannot change this. The generated key parts are often used as foreign keys on other tables too.
If I create an entity mapping which specifies the composite key as it is in the database, then we cannot use any identity generation strategies, which breaks unit of work.
If I create entity mapping which specifies only the generated column as the primary key, then I can use trigger-identity to generate the ids, and I get unit of work, but I then have a problem when I want to update, or access a child collection: The other parts of the key are not included in the WHERE statement.
Can anyone give me any advice on how to proceed?
If I stick with mapping composite keys, can I extend nhibernate to output the SQL to use trigger-identity? If so, can you suggest a starting point?
If I map a single column key, can I include other properties in a WHERE clause for HasMany mapping and Updates?
Unfortunately, as you have already found out, there is no support at all for this setup.
My suggestion is to do INSERTS manually (using custom SQL, for example). And yes, this breaks the UoW, but that is true of identity too.

Should I use an index column in a many to many "link" table?

I have two tables, products and categories which have a many to many relationship, so I'm adding a products_categories table which will contain category_id and product_id.
Should I add another (auto-incrementing) index column or use the two existing ones as primary key?
That depends.
Do you see your data more as a set of objects (with the relational database as just
a storage medium), or as a set of facts represented and analyzed natively
by relational algebra?
Some ORMs/Frameworks/Tools don't have good support for multicolumn primary keys.
If you happen to use one of them, you'll need additional id column.
If it's just a many-to-many relationship with no additional data associated with it,
it's better to avoid additional id column and have both columns as a primary key.
If you start adding some additional information to this association, then it may reach a point when it becomes
something more than a many-to-many relationship of two entities.
It becomes an entity in its own right and it'd be more convenient if it had its own id
independent of the entities it connects.
You don't need to add an extra, auto-incrementing index column, but I (perhaps contrary to most others) still recommend that you do. First, it is easier in the application program to refer to a row using a single number, for example when you delete a row. Second, it sometimes turns out to be useful to be able to know the order in which the rows were added.
No, it's not necessary at all, given that these two columns are already executing the function of a primary key.
This third column would just add more space to your table.
But... You could maybe use it to see the order in which your records were added to your table. That's the only function I can see for this column.
You don't need to add an auto-incrementing index column. Standard practice is to use just the two existing columns as your primary key for M:M association tables like you describe.
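Both options from this thread can be tried side by side. The following is a minimal sketch using SQLite through Python's sqlite3 module; the second table name products_categories_with_id and the helper insert_twice are made up for the comparison. In either variant the pair (product_id, category_id) has to stay unique, so the surrogate variant still needs a UNIQUE constraint on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant 1: the two foreign key columns form the primary key.
CREATE TABLE products_categories (
    product_id  INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (product_id, category_id)
);

-- Variant 2: surrogate id, with the pair still constrained to be unique.
CREATE TABLE products_categories_with_id (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id  INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    UNIQUE (product_id, category_id)
);
""")


def insert_twice(table):
    """Insert the same pair twice; return True if the second insert is rejected."""
    # The table name comes from the fixed list below, never from user input.
    conn.execute(f"INSERT INTO {table} (product_id, category_id) VALUES (1, 7)")
    try:
        conn.execute(f"INSERT INTO {table} (product_id, category_id) VALUES (1, 7)")
    except sqlite3.IntegrityError:
        return True
    return False


composite_rejects_duplicate = insert_twice("products_categories")
surrogate_rejects_duplicate = insert_twice("products_categories_with_id")
```

Without the UNIQUE constraint in the second variant, the surrogate id alone would happily accept the same product/category link many times.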
I would make the primary key category_id and product_id. Add an auto increment only if the order will ever be relevant in later uses.
There's a conceptual question - is products_categories an entity or is it simply a table that represents a relationship between two entities? If it's an entity then, even if there are no additional attributes, I'd advocate for a separate ID column for the entity. If it's a relationship, then even if there are additional attributes (say, begin_date, end_date or something like that), I'd advocate for a multi-column primary key.

Why use primary keys?

What are primary keys used for aside from identifying a unique column in a table? Couldn't this be done by simply using an autoincrement constraint on a column? I understand that PK and FK are used to relate different tables, but can't this be done by just using join?
Basically what is the database doing to improve performance when joining using primary keys?
Mostly for referential integrity with foreign keys. When you have a PK, the database will also create an index behind the scenes, so you don't need table scans when looking up values.
RDBMS providers are usually optimized to work with tables that have primary keys. Most store statistics which helps optimize query plans. These statistics are very important to performance especially on larger tables and they are not going to work the same without primary keys, and you end up getting unpredictable query response times.
Most database best-practice books suggest creating all tables with a primary key, with no exceptions; it would be wise to follow this practice. Not many things say junior software dev more than one who builds a database without referential integrity!
Some PKs are simply an auto-incremented column. Also, you typically join USING the PK and FK. There has to be some relationship to do a join. Additionally, most DBMS automatically index PKs by default, which improves join performance as well as querying for a particular record based on ID.
You can join without a primary key within a query, however, you must have a primary key defined to enforce data integrity constraints, at least with SQL Server. (Foreign Keys, etc..)
Also, here is an interesting read for you on Primary Keys.
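The difference between "just joining" and having the database enforce the relationship can be shown with a small sketch. It uses SQLite through Python's sqlite3 module rather than SQL Server, and the customers / orders tables are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
);
INSERT INTO customers VALUES (1, 'Ada');
""")

conn.execute("INSERT INTO orders VALUES (100, 1)")  # parent row exists: accepted

try:
    conn.execute("INSERT INTO orders VALUES (101, 42)")  # there is no customer 42
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

A plain join would silently skip the orphaned order; the foreign key, which needs a key on the parent table to reference, keeps the orphan from being stored in the first place.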
In Microsoft Access, if you have a linked table to, say, SQL Server, the source table must have a primary key in order for the linked table to be writeable. At least, that was the case with Access 2000 and SQL Server 6.5. It may be different with later versions.
Keys are about data integrity as well as identification. The uniqueness of a key is guaranteed by having a constraint in the database to keep out "bad" data that would otherwise violate the key. The fact that data integrity rules are guaranteed in that way is precisely what makes a key usable as an identifier. That goes for any key. One key per table by convention is called a "primary" key but that doesn't make other alternate keys any less important.
In practice we need to be able to enforce uniqueness rules against all types of data (not just numbers) to satisfy the demands of data quality and usability.
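As a small illustration of that last point, the sketch below (SQLite via Python's sqlite3 module, with a made-up currencies table) declares a non-numeric primary key and an alternate key; the database enforces both in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE currencies (
    iso_code TEXT NOT NULL PRIMARY KEY,  -- the "primary" key, non-numeric
    name     TEXT NOT NULL UNIQUE        -- an alternate key, equally enforced
);
INSERT INTO currencies VALUES ('EUR', 'Euro');
""")


def is_rejected(iso_code, name):
    """Try to insert a row; return True if a key constraint keeps it out."""
    try:
        conn.execute("INSERT INTO currencies VALUES (?, ?)", (iso_code, name))
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_primary_rejected = is_rejected("EUR", "Euro (copy)")  # same primary key
duplicate_alternate_rejected = is_rejected("XEU", "Euro")       # same alternate key
new_row_rejected = is_rejected("USD", "US Dollar")              # violates neither key
```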