Identity Column as Primary Key

Could you please opine on whether having an identity column as the primary key is good practice?
For ORM tools, having an identity column on tables helps, but there are side effects, such as accidental duplicate insertion.
Thanks
Nayn

Yes, using an INT (or BIGINT) IDENTITY is very good practice for SQL Server.
SQL Server uses the primary key as its default clustering key, and the clustering key should always have these properties:
narrow
static
unique
ever-increasing
INT IDENTITY fits the bill perfectly!
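Something like this minimal sketch (table and column names are made up for illustration):

```sql
-- Illustrative T-SQL: an INT IDENTITY primary key, which SQL Server
-- uses as the clustered index by default.
CREATE TABLE dbo.Customers
(
    CustomerID INT IDENTITY(1,1) NOT NULL,
    Name       NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (CustomerID)
);
```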
For more background info, and especially some info why a GUID as your primary (and thus clustering key) is a bad idea, see Kimberly Tripp's excellent posts:
GUIDs as PRIMARY KEYs and/or the clustering key
The Clustered Index Debate Continues...
Ever-increasing clustering key - the Clustered Index Debate..........again!
If you have reasons to use a GUID as primary key (e.g. replication), then by all means make sure to have an INT IDENTITY as your clustering key on those tables!
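That combination can look like this (a sketch; all names are illustrative):

```sql
-- Illustrative T-SQL: the GUID stays the (nonclustered) primary key,
-- while a narrow, ever-increasing INT IDENTITY serves as the clustering key.
CREATE TABLE dbo.Orders
(
    OrderGUID UNIQUEIDENTIFIER  NOT NULL DEFAULT NEWID(),
    OrderKey  INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME          NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderGUID)
);

CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderKey ON dbo.Orders (OrderKey);
```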
Marc

IDENTITY keys are a good practice for server-side generated keys, in environments where you don't have replication or heavy data merging. Because the values are generated by the server and backed by the primary key constraint, duplicates in the same table aren't a concern. They also have the advantage of minimizing fragmentation in tables that don't have lots of DELETEs.
GUIDs are the usual alternative. They have the advantage that you can create them at the web tier, without requiring a DB round-trip. However, they're larger than IDENTITIES, and they can cause extreme table fragmentation. Since they're (semi) random, inserts are spread through the entire table, rather than being focused in one page at the end.
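If client-side generation isn't actually needed, sequential GUIDs are a common middle ground; a minimal T-SQL sketch (names illustrative):

```sql
-- Illustrative T-SQL: NEWSEQUENTIALID() generates ever-increasing GUIDs
-- server-side, so inserts land at the end of the table instead of on
-- random pages, at the cost of losing web-tier key generation.
CREATE TABLE dbo.Events
(
    EventID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
    Payload NVARCHAR(MAX)    NULL,
    CONSTRAINT PK_Events PRIMARY KEY CLUSTERED (EventID)
);
```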

I use a GUID because it really helps when I am dealing with distributed applications, especially when all the distributed instances also need to create new data.
Nevertheless, I don't see any problem with auto-increment integer primary keys in simple situations. I would actually prefer them because they are easier to work with in direct SQL queries and easier to remember.

Related

Surrogate vs Natural Primary Keys, *SPECIFICALLY* in a Data Warehouse. Is this debated?

Are Surrogate vs Natural Primary Keys generally debated in the world of data warehouses? To be clear - the natural keys would be there regardless. And by surrogate keys, I mean keys that don't exist in the source system, but are created as part of the ETL of the data warehouse.
Is it debated whether to rely on the source systems' natural keys as primary keys, or to assign surrogate keys as part of ETL?
My (limited) understanding has always been that in operational systems it could go either way depending on the situation/person, but that in a data warehouse setting, surrogate keys were the non-debated norm for primary keys.
Accurate, or is it more debated than that?
Natural keys are virtually essential for almost any practical data warehouse solution (business key or domain key is really a much better term than natural key). The question is whether and when to use surrogate keys as well as, not instead of, some other key. Managing surrogate keys can add a lot of complexity and some significant overhead, so the best answer is "it depends...".
If your warehouse is based on a distributed write-once technology like HDFS, then surrogates would probably make no sense. If you are using some historical data capture mechanism like Microsoft's temporal tables or Oracle's flashback then you'll probably find no need for surrogates. If you are taking a temporal modelling approach based on 5NF or 6NF then you usually won't need surrogates either but you might want to use them for certain tables.
If you are following a template like Data Vault or Kimball's methods then maybe you'll want to use surrogates because that's what it says in somebody's book.
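For reference, the pattern in question looks roughly like this (a hedged, Kimball-style sketch; all names are illustrative):

```sql
-- Illustrative sketch of a dimension table: a warehouse-generated
-- surrogate key as the primary key, with the source system's business
-- key kept alongside it for ETL lookups and auditing.
CREATE TABLE DimCustomer
(
    CustomerSK   INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
    CustomerBK   VARCHAR(20)  NOT NULL, -- business/natural key from source
    CustomerName VARCHAR(100) NOT NULL,
    ValidFrom    DATE         NOT NULL, -- slowly changing dimension bounds
    ValidTo      DATE         NULL
);
```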

Using GUIDs for Custom Tables?

As far as I know, SAP CRM and HANA both utilise GUIDs to uniquely identify records instead of using classic incremented integers. Are there best practices or clear guidelines that cover their use?
Here are some factors I've considered in favour of GUIDs:
Offline creation of objects. IIRC GUIDs are near-guaranteed to be unique in these situations so merging or integration of disparate data sets is not an issue.
Surrogate keys have distinct development advantages. While incrementing integers are a form of surrogate key, use of different number sequences can impose a functional meaning on them.
And some scenarios that favour classic keys:
Users require human-readable keys to identify records in the system. This can be handled in GUID tables by also specifying an external ID with a readable value.
Users want to use number sequences to identify different types of records, similar to sales or purchase documents. Though I actually consider this bad design.
What scenarios for custom development would make you prefer GUIDs over classic keys?
Is blanket-usage of GUIDs for all tables a good idea?
To answer the question at the end: No, it isn’t (at least not in an ABAP environment, and I doubt it’s sensible elsewhere). Using GUIDs for primary keys everywhere makes it awfully hard to maintain and follow complex foreign key relationships at runtime. Just imagine having to debug a program that handles everything using GUIDs instead of the semantic keys you’re used to.
And remember that the total length of the primary key may not exceed 255, and should not exceed 120 if you want to be able to transport table entries using fully qualified keys. Using GUIDs in composite keys blows the keys up unnecessarily, and using them as synthetic keys makes using foreign key relationships virtually impossible. So no, using GUIDs everywhere is not a good idea, especially not for configuration / customizing data.
It is however a good idea to use GUIDs in almost every place where you would have used a number range object in “old-school ABAP development”. GUIDs can be generated by the application server, while number ranges require network communication to the enqueuing server. (Yes, there is some buffering involved, but generally speaking, GUIDs are a lot faster and easier to handle). So unless you need your keys to follow a certain pattern, you should consider using a GUID. Even if you need some kind of sequential number for whatever business reasons, it might be sensible to use a GUID as the primary key and store the sequential number inside an (indexed) attribute to increase flexibility at development time.
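That last suggestion, sketched outside the ABAP dictionary in plain SQL (names and types are illustrative, not an ABAP-specific recipe):

```sql
-- Illustrative sketch: a GUID primary key for cheap, server-independent
-- key generation, plus an indexed sequential document number for the
-- business-facing identifier.
CREATE TABLE SalesDocument
(
    DocGUID   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    DocNumber INT              NOT NULL, -- business-facing sequential number
    DocDate   DATE             NOT NULL
);

CREATE UNIQUE INDEX IX_SalesDocument_DocNumber ON SalesDocument (DocNumber);
```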

Index on every Foreign Key?

Does an index on every foreign key make queries optimized?
Typically it's considered good practice to place indexes on foreign keys. This is done because it helps with join performance when linking the FK table to the table that contains the definition of the key.
This doesn't magically make your entire query optimized, but it will definitely help to improve the join performance between the FK and its primary key counterpart.
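For example (T-SQL; table and column names are made up, and note that SQL Server does not index a foreign key column automatically):

```sql
-- Illustrative T-SQL: add the FK constraint, then index the FK column
-- explicitly to support joins back to the parent table.
ALTER TABLE dbo.InvoiceLines
    ADD CONSTRAINT FK_InvoiceLines_Invoices
    FOREIGN KEY (InvoiceID) REFERENCES dbo.Invoices (InvoiceID);

CREATE NONCLUSTERED INDEX IX_InvoiceLines_InvoiceID
    ON dbo.InvoiceLines (InvoiceID);
```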
It might be seen as good practice to add an index on every foreign key, but be warned that in a large database, the more indexes you have, the heavier your system will become. There is always an extra maintenance and system resource cost when adding an index.
I personally would add indexes only on the foreign keys that are used in queries that need optimization. Be sure to keep your indexes up to date by occasionally running a profiler to monitor your system.
I did a little bit of testing on this, and I didn't find any performance enhancement, but SQLMenace will tell you otherwise. My opinion is to try it and see if it works for you.

Is it good to introduce nHibernate for a legacy database in an ongoing project?

I am working on an ongoing project where there are two instances of the database with different schemas for some of the tables, used for transferring data from one to the other.
The database schema is not well defined. For example:
No Primary key for some of the tables
Primary key as a composite key
Foreign keys in composite primary keys
Foreign key constraint referencing the primary key column of the same table
A composite primary key referenced as a foreign key in another table
More than 400 tables, and the count will grow
Very little OOP implemented in the application; few objects are used at all
So, I am looking for answers on whether introducing NHibernate with the Repository pattern at this particular time would speed up the development process.
Cheers.
I have introduced it into a project that had tons of custom SQL, and it worked quite successfully. The hardest part was mapping the tables to an object model that was at least partly okay-ish. But other than that, it was good: it made things go a lot faster, helped with testing, and got rid of a lot of SQL query issues.

Does introducing foreign keys to MySQL reduce performance

I'm building a Ruby on Rails 2.3.5 app. By default, Ruby on Rails doesn't provide foreign key constraints, so I have to add them manually. I was wondering if introducing foreign keys reduces query performance on the database side enough to make it not worth doing. Performance in this case is my first priority, as I can check for data consistency in code. What is your recommendation in general? Do you recommend using foreign keys? And how do you suggest I measure this?
Assuming:
You are already using a storage engine that supports FKs (i.e. InnoDB)
You already have indexes on the columns involved
Then I would guess that you'll get better performance by having MySQL enforce integrity. Enforcing referential integrity is, after all, something that database engines are optimized to do. Writing your own code to manage integrity in Ruby is going to be slow in comparison.
If you need to move from MyISAM to InnoDB to get the FK functionality, you need to consider the tradeoffs in performance between the two engines.
If you don't already have indexes, you need to decide if you want them. Generally speaking, if you're doing more reads than writes, you want (need, even) the indexes.
Stacking an FK on top of stuff that is currently indexed should cause less of an overall performance hit than implementing those kinds of checks in your application code.
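A minimal MySQL sketch of that setup (illustrative names; InnoDB requires an index on the FK column and will create one if you don't):

```sql
-- Illustrative MySQL DDL: InnoDB tables with an indexed, enforced
-- foreign key so the engine does the integrity checking.
CREATE TABLE authors (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE books (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    author_id INT UNSIGNED NOT NULL,
    title     VARCHAR(200) NOT NULL,
    INDEX idx_books_author (author_id),
    CONSTRAINT fk_books_author FOREIGN KEY (author_id)
        REFERENCES authors (id)
) ENGINE = InnoDB;
```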
Generally speaking, more keys (foreign or otherwise) will reduce INSERT/UPDATE performance and increase SELECT performance.
The added benefit of data integrity is just about always worth the small performance decrease that comes with adding your foreign keys. What good is a fast app if the data within it is junk (missing parts, etc.)?
Found a similar query here: Does Foreign Key improve query performance?
You should define foreign keys. In general (though I do not know the specifics of MySQL), there is no effect on queries (and where there is an optimizer, like the cost-based optimizer in Oracle, they may even have a positive effect, since the optimizer can rely on the foreign key information to choose better access plans).
As for the effect on inserts and updates, there may be an impact, but the benefits you get (referential integrity and data consistency) far outweigh it. Of course, you can design a system that will not perform at all, but the main reason will not be that you added the foreign keys. And the cost of maintaining your own integrity-checking code when you decide to use some other language, or because the business rules have slightly changed, or because a new programmer joins your team, etc., is far higher than the performance impact.
My recommendation, then, is yes, go and define the foreign keys. Your end product will be more robust.
It is a good idea to use foreign keys because they assure data consistency (you do not want orphan rows and other inconsistent data problems).
But at the same time, adding a foreign key does introduce some performance hit. Assuming you are using InnoDB as the storage engine, it uses a clustered index for PKs, where the data is essentially stored along with the PK. Accessing data through a secondary index requires a pass over the secondary index tree (whose nodes contain the PK) and then a second pass over the clustered index to actually fetch the data. So any DML on the parent table which involves the FK in question will require two passes over the index in the child table.
Of course, the size of the performance hit depends on the amount of data, your disk performance, and your memory constraints (data/index caching), so it is best to measure it with your target system in mind. The best way is to use your sample target data, or at least some representative data, and then run benchmarks with and without the FK constraints, using client-side scripts that generate the same load in both cases.
Though if you are manually checking FK constraints anyway, I would recommend that you leave it up to MySQL and let MySQL handle it.
Two points:
1. Are you sure that checking integrity at the application level would be better in terms of performance?
2. Run your own test: checking whether FKs have a positive or negative influence on performance should be almost trivial.
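A hedged sketch of such a test in MySQL (all names are made up; the row-generator source is only a convenience, and real numbers should come from representative data volumes):

```sql
-- Illustrative MySQL benchmark: two child tables identical except for
-- the FK constraint; time the same bulk insert against each.
CREATE TABLE parent (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE = InnoDB;

CREATE TABLE child_fk (
    id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    parent_id INT NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE = InnoDB;

CREATE TABLE child_nofk (
    id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    parent_id INT NOT NULL,
    INDEX (parent_id)  -- same index as the FK version, no constraint
) ENGINE = InnoDB;

INSERT INTO parent VALUES (NULL);  -- one parent row to reference

-- Run each load separately and compare the durations your client reports.
-- (information_schema.columns is only a convenient row generator here.)
INSERT INTO child_fk   (parent_id) SELECT 1 FROM information_schema.columns LIMIT 10000;
INSERT INTO child_nofk (parent_id) SELECT 1 FROM information_schema.columns LIMIT 10000;
```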