In which databases can I safely use a GUID as a primary key, besides SQL Server? - sql

The reason I want to use a GUID is that, in the event that I have to split the database in two, I won't have primary keys that overlap across both databases. So if I use a GUID there won't be any overlap. I also want to use the GUID in the URL, so the GUID will need to be indexed.
I will be using ASP.NET with C# on the web server.

Postgres has a UUID type. MySQL has a UUID function. Oracle has a SYS_GUID function.
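For a rough idea of what that looks like in each system, here are hedged sketches (table and column names are made up, and exact syntax varies by version - e.g. gen_random_uuid() is built into Postgres only from version 13 on, and needs the pgcrypto extension before that):

-- Postgres: native UUID column type
CREATE TABLE accounts (
    id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL
);

-- MySQL: no native type; store the output of the UUID() function
CREATE TABLE accounts (
    id   CHAR(36) PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
INSERT INTO accounts (id, name) VALUES (UUID(), 'example');

-- Oracle: SYS_GUID() returns a RAW(16) value
CREATE TABLE accounts (
    id   RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    name VARCHAR2(100) NOT NULL
);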

As others have said, you can use GUIDs/UUIDs in pretty much any modern DB. The algorithm for generating a GUID is pretty straightforward and you can be reasonably sure that you won't get duplicates; however, there are some considerations.
+) Although GUIDs are generally representations of 128-bit values, the actual format used differs from implementation to implementation - you may want to consider normalizing them by removing non-significant characters (usually dashes or spaces).
+) To absolutely ensure uniqueness you can also append a value to the GUID. For example, if you're worried about MS and Oracle GUIDs colliding, add "MS" to the former and "Or" to the latter - now even if the GUIDs themselves do collide, the keys won't.
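A minimal sketch of both ideas combined, assuming SQL Server's NEWID() and an arbitrary "MS" source tag:

-- Strip the non-significant dashes, then prefix with a source tag so that
-- keys generated by different systems can never collide with each other
SELECT 'MS' + REPLACE(CONVERT(CHAR(36), NEWID()), '-', '') AS portable_key;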
As others have mentioned, however, there is a potentially severe price to pay here: your keys will be large (128 bits) and won't index very well (although this is somewhat dependent on the implementation).
The technique works very well for small databases (especially those where the entire dataset can fit in memory), but as DBs grow you'll definitely have to accept a performance trade-off.
One thing you might consider is a hybrid approach. Without more information it's hard to really know what you're trying to do, so these might not help:
1) Remember that primary keys don't have to be a single column - you can have a simple numeric key to identify your rows and another column, containing a single value, that identifies the database that hosts the data or created the key. Creating the primary key as a composite of both columns lets the index work with smaller, simpler values and should be significantly faster (see the sketch after this list).
2) You can "fake it" by constructing the key as a concatenated field (as in the above idea of appending a DB identifier to the key). So your key would be a simple number followed by some DB identifier (perhaps a GUID for each DB).
Indexing such a value (since the values would still be sequential) should be much faster.
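A hypothetical sketch of idea 1 (all names invented): the numeric part stays small and sequential, and the second column records where the row came from:

CREATE TABLE orders (
    order_id  BIGINT   NOT NULL,  -- simple sequential number, cheap to index
    source_db SMALLINT NOT NULL,  -- identifies the database that created the row
    amount    DECIMAL(10, 2),
    PRIMARY KEY (order_id, source_db)
);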
In both cases you'll have some manual work to do if you ever do split the DB(s) - you'll have to update some keys with a new DB ID - but this would be a one-time, infrequent event. In exchange you can tune your DB much better.
There are definitely other ways to ensure data integrity across multiple databases. Many enterprise DBMSs have built-in tools for clustering data across multiple servers or databases, and some have special tools or design patterns that make it easier.
In short, I would say that GUIDs are nice and simple and do what you want, but you should only consider them if either a) the dataset is small or b) the DBMS has specific features to optimize their use as keys (for example, sequential GUIDs). If the datasets are going to be very large, or if you're trying to limit DBMS-specific dependencies, I would play around more with optimizing a "key + identifier" strategy.
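For reference, SQL Server's sequential GUID feature looks roughly like this (a sketch; note that NEWSEQUENTIALID() may only appear as a column default):

CREATE TABLE events (
    -- values come out roughly ordered, so inserts land at the end of the
    -- clustered index instead of fragmenting it the way random NEWID() does
    id      UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    payload NVARCHAR(MAX)
);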

Most any RDBMS you will use can take any number and type of columns as a PK. So, if you're storing the GUID as a CHAR(n) for some length n, you should be fine. Now, I'm not sure if this is advisable, as I'm guessing indexing on CHARs is not as efficient as on integers.
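A minimal sketch of that (names invented), using the 36-character dashed text form of a GUID:

CREATE TABLE sessions (
    id      CHAR(36) PRIMARY KEY,  -- e.g. '3f2504e0-4f89-11d3-9a0c-0305e82c3301'
    user_id INT NOT NULL
);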
Hope that helps.

I suppose you could store a GUID as an int128 as well.
Both MySQL and Postgres support working with GUIDs (there they're called UUIDs, but it's the same thing) - though strictly speaking only Postgres has a native UUID column type; in MySQL you store the output of the UUID() function in something like a CHAR(36) or BINARY(16) column.

Unless I have completely lost my memory, a properly designed 3rd+ normal form database schema does not rely on unique ints, or by extension GUIDs or UUIDs for primary keys. Nor does it use intermediate lookup tables of ints/GUIDS/UUIDS to relate the tables containing the data.
You should grind your schema until it expresses the relations amongst tables of data in terms of the data in the tables, not auto-generated identifiers that have no intrinsic relationship to the data.
I freely grant that you may just possibly be doing something that really really requires GUIDs (or auto-increment integers) for primary keys. But I seriously doubt that is the case - it almost never is.

You can implement your own membership provider based on whatever database schema you choose to design. It's nowhere near as tricky as it may look at first.
google "roll your own membership provider" for plenty of pointers.

In my theoretical little world, you'd be able to do this with SQLite. You'd generate the Guid from .Net and write it to the SQLite database as a string. You could also index that field.
You do lose some of the index benefits because it'd be stored as a string, but it should be fully backwards compatible so that you could import/export to/from SQL Server.
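A sketch of the SQLite side of that (the GUID string itself would come from Guid.NewGuid().ToString() in .NET; all names invented):

CREATE TABLE documents (
    id         TEXT NOT NULL PRIMARY KEY,  -- GUID string written by the application
    owner_guid TEXT NOT NULL,              -- another GUID, stored the same way
    title      TEXT NOT NULL
);
-- the PRIMARY KEY column is indexed automatically; index other GUID columns explicitly
CREATE INDEX idx_documents_owner ON documents(owner_guid);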

From looking through the comments it looks like you are trying to use a database other than MS SQL Server with the ASP.NET membership provider - as others have mentioned, you could roll your own provider to use a different DB, but a quick Google search turned up a few ready-made options:
MySQL Provider
MySQL Provider 2
SQLite Provider
Hope these help.

If you are using other MS technologies already, you should consider SQL Server Express.
http://www.microsoft.com/express/sql/default.aspx
It is a real implementation of MS SQL Server and it is free. It does have significant limitations, as you might imagine, but if your product can fit inside those you get the support, developer community and stability of SQL Server, and a clear upgrade path if you need to grow.

Related

Using GUIDs for Custom Tables?

As far as I know, SAP CRM and HANA both utilise GUIDs to uniquely identify records instead of using classic incremented integers. Are there best practices or clear guidelines that cover their use?
Here are some factors I've considered in favour of GUIDs:
Offline creation of objects. IIRC GUIDs are near-guaranteed to be unique in these situations so merging or integration of disparate data sets is not an issue.
Surrogate keys have distinct development advantages. While incrementing integers are a form of surrogate key, use of different number sequences can impose a functional meaning on them.
And some scenarios that favour classic keys:
Users require human-readable keys to identify records in the system. This can be handled in GUID tables by also specifying an external ID with a readable value.
Users want to use number sequences to identify different types of records, similar to sales or purchase documents. Though I actually consider this bad design.
What scenarios for custom development would make you prefer GUIDs over classic keys?
Is blanket-usage of GUIDs for all tables a good idea?
To answer the question at the end: No, it isn't (at least not in an ABAP environment, and I doubt it's sensible elsewhere). Using GUIDs for primary keys everywhere makes it awfully hard to maintain and follow complex foreign key relationships at runtime. Just imagine having to debug a program that handles everything using GUIDs instead of the semantic keys you're used to.
And remember that the total length of the primary key may not exceed 255, and should not exceed 120 if you want to be able to transport table entries using fully qualified keys. Using GUIDs in composite keys blows the keys up unnecessarily, and using them as synthetic keys makes using foreign key relationships virtually impossible. So no, using GUIDs everywhere is not a good idea, especially not for configuration / customizing data.
It is however a good idea to use GUIDs in almost every place where you would have used a number range object in “old-school ABAP development”. GUIDs can be generated by the application server, while number ranges require network communication to the enqueuing server. (Yes, there is some buffering involved, but generally speaking, GUIDs are a lot faster and easier to handle). So unless you need your keys to follow a certain pattern, you should consider using a GUID. Even if you need some kind of sequential number for whatever business reasons, it might be sensible to use a GUID as the primary key and store the sequential number inside an (indexed) attribute to increase flexibility at development time.
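Translated into generic SQL, that last suggestion might look like this (a sketch only; in ABAP the key would be a RAW(16) field, and all names here are invented):

CREATE TABLE billing_doc (
    doc_guid   CHAR(32)    NOT NULL PRIMARY KEY,  -- GUID generated by the application server
    doc_number NUMERIC(10) NOT NULL,              -- sequential business number
    doc_date   DATE
);
-- the business number stays available for lookups without being the key
CREATE UNIQUE INDEX billing_doc_num ON billing_doc (doc_number);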

About GUID usage

Wikipedia says a GUID is used to uniquely identify a class or interface - what about an object (an actual instance)?
When working with SQL I also see GUIDs used for ID fields (the user table etc. in the aspnetdb database in the ASP.NET MVC template project).
So I want to clearly understand GUID usage: in which cases should it be used, and is it really unique?
Any explanation appreciated.
Thanks
For a good overview of what a GUID is, check out our good friend Wikipedia: GUID.
and is it really unique
GUIDs generated from the same machine are virtually guaranteed to be unique. You have an infinitesimally small chance of generating the same one twice on the same machine. Arguably you have a tiny chance of generating two GUIDs the same out in the wider world, but that chance is still small and the chances of those two GUIDs ever meeting are also pretty small. In fact you probably have a greater chance of the Large Hadron Collider generating a black hole that swallows the Earth than you would having two identical GUIDs meeting somewhere on a network.
Because of this, some people like to use it as the primary key for database tables. Personally I don't like to do this because:
an auto-incrementing integer gives me enough uniqueness to be able to use it as a primary key
GUIDs are a massive PITA to deal with when you are writing SQL queries.
Wikipedia says a GUID is used to uniquely identify a class or interface
If you need an identifier that is unique across several disparate areas (like hives in a registry), then GUIDs are a good solution. In this particular case they are being used to identify a type. A concrete instance could also internally use a GUID identifier, but this is really only useful for data objects.

Is using MS SQL Identity good practice?

Is using MS SQL identity columns good practice in enterprise applications? Doesn't it create difficulties in building business logic, and in migrating a database from one system to another?
Personally I couldn't live without identity columns and use them everywhere; however, there are some reasons to think about not using them.
Originally the main reason not to use identity columns, AFAIK, was distributed multi-database (disconnected) schemas using replication and/or various middleware components to move data. There was just no distributed synchronization machinery available, and therefore no reliable means to prevent collisions. This has changed significantly, as SQL Server does support distributing IDs. However, their use still may not map onto more complex application-controlled replication schemes.
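The classic sketch of that support is to give each side a different identity seed with a shared increment, so the ranges interleave without colliding (SQL Server syntax; the customer table is hypothetical):

-- Database A hands out the odd numbers...
CREATE TABLE customer (id INT IDENTITY(1, 2) PRIMARY KEY, name VARCHAR(100));
-- ...and Database B the even ones, so merged or replicated rows never clash:
-- CREATE TABLE customer (id INT IDENTITY(2, 2) PRIMARY KEY, name VARCHAR(100));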
They can leak information: account IDs, invoice numbers, etc. If I get an invoice from you every month, I can ballpark the number of invoices you send or customers you have.
I run into issues all the time with merging customer databases and all sides still wanting to keep their old account numbers. This sometimes makes me question my addiction to identity fields :)
Like most things, the ultimate answer is "it depends" - the specifics of a given situation should necessarily carry a lot of weight in your decision.
Yes, they work very well, are reliable, and perform the best. One big benefit of using identity fields over not using them is that they handle all of the complex concurrency issues of multiple callers attempting to reserve new IDs. This may seem like something trivial to code, but it's not.
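For instance, the hand-rolled alternative (SELECT MAX(id) + 1, then INSERT) is a race condition under concurrency; with an identity column the engine reserves the value atomically. A sketch in SQL Server syntax, with a hypothetical customer table:

-- the engine assigns the new id atomically; two concurrent callers
-- can never be handed the same value
INSERT INTO customer (name) VALUES ('Contoso');
-- read back the id generated in this scope, unaffected by other sessions
SELECT SCOPE_IDENTITY() AS new_id;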
These links below offer some interesting information about identity fields and why you should use them whenever possible.
DB: To use identity column or not?
http://www.codeproject.com/KB/database/AgileWareNewGuid.aspx?display=Print
http://www.sqlmag.com/Article/ArticleID/48165/sql_server_48165.html
The question is always:
What are the chances that you're realistically going to migrate from one database to another? If you're building a multi-db app it's a different story, but most apps don't ever get ported over to a new db midstream - especially when they start out with something as robust as SQL Server.
The identity construct is excellent, and there's really very few reasons why you shouldn't use it. If you're interested, I wrote a blog article on some of the common myths surrounding identity values.
The IDENTITY Property: A Much-Maligned Construct in SQL Server
Yes.
They generally work as intended, and you can use the DBCC CHECKIDENT command to inspect and manipulate them.
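For example (the table name is just a placeholder):

-- report the current identity value without changing anything
DBCC CHECKIDENT ('customer', NORESEED);
-- reseed so that the next row inserted typically receives 1001
DBCC CHECKIDENT ('customer', RESEED, 1000);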
The most common idea of an identity is to provide an ordered list of numbers on which to base a primary key.
Edit: I was wrong about the fill factor, I didn't take into account that all of the inserts would happen on one side of the B-tree.
Also, in your revised question, you asked about migrating from one DB to another:
Identities are perfectly fine as long as the migration is a one-way replication. If you have two databases that need to replicate to each other, a UniqueIdentifier column may be your best bet.
See: When are you truly forced to use UUID as part of the design? for a discussion on when to use a UUID in a database.
Good article on identities, http://www.simple-talk.com/sql/t-sql-programming/identity-columns/
IMO, migrating to another RDBMS is rarely needed these days. Even if it is needed, the best way to develop portable applications is to develop a layer of stored procedures isolating your application from proprietary features:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/02/24/writing-ansi-standard-sql-is-not-practical.aspx

When is sqlite's manifest typing useful?

sqlite uses something that the authors call "Manifest Typing", which basically means that sqlite is dynamically typed: you can store a varchar value in an "int" column if you want to.
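For example (typeof() is SQLite's built-in that reports how a value was actually stored):

CREATE TABLE t (x INT);
INSERT INTO t VALUES (42);       -- stored as an integer
INSERT INTO t VALUES ('hello');  -- stored as text, no error raised
SELECT x, typeof(x) FROM t;      -- yields 42|integer and hello|text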
This is an interesting design decision, but whenever I've used sqlite, I've used it like a standard RDBMS and treated the types as if they were static. Indeed, I've never even wished for dynamically typed columns when designing databases in other systems.
So, when is this feature useful? Has anybody found a good use for it in practice that could not have been done just as easily with statically typed columns?
It really just makes types easier to use. You don't need to worry about how big a field needs to be at the database level any more, or how many digits your integers can have. More or less it's a "why not?" thing.
On the other side, static typing in SQL Server allows the system to search and index better - in some cases much better - but for half of all applications I doubt the database performance improvement would matter, or their performance is "poor" for other reasons (temp tables created on every select, exponential selects, etc.).
I use SQLite all the time for my .NET projects as a client cache because it is just too easy to use. Now if they could only get it to handle GUIDs the same way as SQL Server, I would be a happy camper.
Dynamic typing is useful for storing things like configuration settings. Take the Windows Registry, for example. Each key is a lot like having an SQLite table of the form:
CREATE TABLE Settings (Name TEXT PRIMARY KEY, Value);
where Value can be NULL (REG_NONE) or an INTEGER (REG_DWORD/REG_QWORD), TEXT (REG_SZ), or BLOB (REG_BINARY).
Also, I'll have to agree with the Jasons about the usefulness of not enforcing a maximum size for strings. Much of the time those limits are purely arbitrary, and you can count on someday finding a 32-byte string that needs to be stored in your VARCHAR(30).

NHibernate and string primary keys

We have a legacy database that uses strings as primary keys. I want to implement objects on top of that legacy database to better implement some business logic and provide more functionality to the user.
I have read in places that using strings for primary keys on tables is bad. I'm wondering why this is? Is it because of the case-sensitivity issues? character sets?
... why is this particularly bad for NHibernate?
... and following up on that ... if strings do make bad primary keys, is it worth it to replace the primary keys in the database with ints or GUIDs or the like? (we only have about 25-30 tables involved)
Okay, I will have a stab at this. I will give a couple of quick caveats - I am not an expert on databases and my experience is with Hibernate (Java) rather than NHibernate, but here goes.
I think the issue of primary keys as strings is to do with the SQL data type used to represent them in the database. Because the primary key is used all the time when inserting, querying and so on, the database engine has to spend lots of time comparing primary keys. If you are using numbers, these are simply stored as bytes, which computers are really good at working with quickly. As soon as you start using strings, the cost of these operations (comparisons, mainly) goes up significantly. Even if the database engine uses really neat strategies to compare keys, it will still always be faster to compare bytes as bytes rather than strings.
On modern hardware though, this is becoming much less an issue than it used to be, and with indexes the problem almost disappears.
I don't know for sure why this is really bad in Hibernate (and NHibernate), but in my experience, because my application has a complex graph of objects that often have references to other persisted objects, often as lists or sets, the references are all stored using the ID of the other object, and because of the rules I have in place for cascading saves, fetching and so on, this means that the primary keys are being used ALL the time. Hibernate - which I quite like - tends to do exactly what it's told to, and sometimes people (especially me!) tell it to do really dumb things. As a result, even seemingly simple updates or queries end up generating quite complex SQL.
So - in summary - strings as primary keys are bad due to the cost of simple operations on them, and using Hibernate may magnify this. In practice, though, modern database engines have lots of neat strategies to ensure that the performance hit is not that bad. (Postgres - and presumably others - creates indexes for primary keys by default.)
For your follow-up - should you replace your keys? Well, that depends on the performance of your application. If performance is critical, then for a high-volume, very intensive application it may be a good idea; otherwise there will probably be minimal benefit, with the downside of having to spend time changing all your tables. You could expect much better results from refining the strategies you use with NHibernate (i.e. fetching strategies, when you cascade saves, and so on).
Andy K seems to imply that strings are not stored as bytes. That would be funny! In fact it all depends on how long the string PK is and what collation you use. It might even be faster than a bigint or int identity, and will almost certainly be faster than GUIDs. If these strings are something you'd have to search by anyway, then you would need an index (perhaps even a clustered index) on them regardless, so why not make them PKs!
Using strings or chars adds a huge amount of accidental complexity to your system. Consider these questions:
how to handle case sensitivity;
how to handle padding. NHibernate lets you insert a shorter string, and the database will silently pad it, but the padding won't be reflected in your persisted entity; trying to fetch the entity again with the in-memory ID returns null (see the sketch after this list);
how to handle encoding issues. C# uses Unicode strings; your database might not. Can you tell how the conversion will be handled? I don't think so.
synthetic integer keys can be autogenerated by most databases without extra effort. With strings you most probably create them "by hand". Unless you hide them behind a Factory (in the DDD sense), the resulting code will clutter your domain model.
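A sketch of the padding pitfall from the second point (names and values invented; exact behaviour depends on driver and collation):

CREATE TABLE legacy_item (id CHAR(10) PRIMARY KEY, descr VARCHAR(50));
INSERT INTO legacy_item (id, descr) VALUES ('ab12', 'example');
-- the database stores id as 'ab12      ' (padded to 10 characters), while the
-- in-memory entity still holds 'ab12' - a later lookup by that ID can miss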
Though the performance overhead mentioned by Andy K can diminish because of indexing, you still often do ID comparisons in memory (hash maps?), and the DB optimizations do not apply there.
I have been working on a project with a legacy database having string primary keys and no foreign keys at all. We are not allowed to touch the old schema because a legacy app depends on every minor aspect of it. I feel that the string primary keys hurt consistency more than the missing foreign keys, since NHibernate handles the latter quite gracefully.