I have a database table that has a non-null column of type uniqueidentifier. This was put in place for use in the near future, but for now I need to use some placeholder. Can I simply use:
00000000-0000-0000-0000-000000000000
for all the rows until a real GUID is used when new rows are inserted in the future? Does SQL Server enforce uniqueness on this column?
SQL Server will enforce uniqueness if, and only if, you put a unique constraint or unique index on that field. Otherwise, SQL Server will only enforce that the value must be NOT NULL.
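To illustrate, here is a minimal sketch (the table and column names are hypothetical, not from the question): without a unique constraint, SQL Server happily accepts duplicate placeholder GUIDs.

CREATE TABLE dbo.Orders (
    OrderId     int IDENTITY PRIMARY KEY,
    ExternalRef uniqueidentifier NOT NULL
        CONSTRAINT DF_Orders_ExternalRef
        DEFAULT '00000000-0000-0000-0000-000000000000'
);

-- Both rows get the all-zero placeholder; no error is raised.
INSERT INTO dbo.Orders DEFAULT VALUES;
INSERT INTO dbo.Orders DEFAULT VALUES;

-- Only with an explicit constraint would the second insert fail:
-- ALTER TABLE dbo.Orders ADD CONSTRAINT UQ_Orders_ExternalRef UNIQUE (ExternalRef);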
As marc_s says, you can do that, because uniqueness is not enforced for uniqueidentifier values, even within the same column of a table, without an explicitly declared unique index or constraint (after all, two rows can legitimately have the same foreign key).
If this is just a temporary bootstrap, and in the future only real GUIDs (NOT NULL) are going to be allowed, I think this is an acceptable workaround: it avoids generating GUIDs that would only need to be replaced later, and it saves you from keeping a separate partially-initialized flag column, or a table of the temporary rows, just so you can fill in the appropriate GUIDs later.
However, from a design point of view, I'm more concerned about the semantics of this special reserved GUID, and why a special reserved value is acceptable while NULLs are not. As I said, if it's only temporary and in the steady state you will never allow NULLs or this special reserved zero value, that's fine; but if you are going to continue to allow this special reserved GUID in steady-state operations, I think that raises design questions.
Is it meant to be a foreign key? If so, NULLs can be used (but a reserved key value like 0, which is not in the referenced table, cannot). If it's a loose association, storing the GUID in this table might not be a great design.
What exactly does the SORT statement without key specification do when run on a standard internal table? As per the documentation:
If no explicit sort key is entered using the addition BY, the internal table itab is sorted by the primary table key. The priority of the sort is based on the order in which the key fields are specified in the table definition. In standard keys, the sort is prioritized according to the order of the key fields in the row type of the table. If the primary table key of a standard table is empty, no sort takes place. If this is known statically, the syntax check produces a warning.
With the primary table key being defined as:
Each internal table has a primary table key that is either a self-defined key or the standard key. For hashed tables, the primary key is a hash key, for sorted tables, the primary key is a sorted key. Both of these table types are key tables for which key access is optimized and the primary key thus has its own administration. The key fields of these tables are write-protected when you access individual rows. Standard tables also have a primary key, but the corresponding access is not optimized, there is no separate key administration, and the key fields are not write-protected.
And for good measure, the standard key is defined as:
Primary table key of an internal table, whose key fields in a structured row type are all table fields with character-like data types and byte-like data types. If the row type contains substructures, these are broken down into elementary components. The standard key for non-structured row types is the entire table row if the row type itself is not a table type. If there are no corresponding table fields, or the row type itself is a table type, the standard key from standard tables is empty or contains no key fields.
All of which mainly just confuses me as I'm not sure if I can really rely on the basic SORT statement to provide a reliable or safe result. Should I really just avoid it in all situations or does it have a purpose if used properly?
By extension, if I want to run a DELETE ADJACENT DUPLICATES FROM itab COMPARING ALL FIELDS, when would it be safe to do so after a simple SORT itab? Only if I added a key on all fields? Without an explicit key, only if I have an internal table with clike and xsequence columns? If I want to execute that DELETE statement, what is the optimal SORT statement to run on the internal table?
SORT without BY should be avoided in all situations because it "makes the program difficult to understand and possibly unpredictable" (as the ABAP documentation puts it). I think that if you omit BY, a static check in the Code Inspector produces a warning. You should use SORT itab BY table_line, where table_line is a special name ("pseudo-component") meaning "all fields of the line".
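A minimal ABAP sketch of that advice (the row type ty_row is hypothetical):

* Sorting by the whole line makes the result deterministic, regardless
* of how the table's primary key is (or is not) defined.
DATA itab TYPE STANDARD TABLE OF ty_row WITH EMPTY KEY.

SORT itab BY table_line.  "pseudo-component: the entire row
DELETE ADJACENT DUPLICATES FROM itab COMPARING ALL FIELDS.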
Not your question, but you may also define the internal table with primary and secondary keys, so that you don't need to sort explicitly - DELETE ADJACENT DUPLICATES can be used with any of those keys.
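For example, a sketch of that keyed approach (row type and component names are hypothetical):

* A non-unique sorted secondary key over the fields of interest lets
* DELETE ADJACENT DUPLICATES run without any explicit SORT.
DATA itab TYPE STANDARD TABLE OF ty_row WITH EMPTY KEY
     WITH NON-UNIQUE SORTED KEY by_content COMPONENTS comp1 comp2.

DELETE ADJACENT DUPLICATES FROM itab USING KEY by_content.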
Internal tables can have keys that are either inherited from the structures the itab is based on, or specified explicitly. As the documentation says, SORT without BY sorts by the primary key, and that is safe assuming the internal table is defined correctly.
I think this is designed as a dynamic feature to be used with smart table key design. If done correctly, SORT without BY lets your program adapt to table key changes in the future (so if your key changes, the sort will change with it). Problems might arise when the key is modified in an odd way.
As a rule of thumb:
The more specific your program code is, the less prone to errors (and safer) it is.
So SORT itab BY key_id key_date will always produce the same ordering by those two fields.
Dynamic components make an application more flexible, but they tend to produce (often hard-to-notice) bugs when the things they rely on are modified.
So if you take the previous example with two key fields and add one in the middle (say key_is_active between the two existing fields), the sorting results might change in a way you did not expect.
If you had an algorithm that processed rows based on the date, that change might break it.
In your particular case with DELETE ADJACENT DUPLICATES, I would follow Sandra Rossi's advice.
I have 2 tables: Users and Locales.
Locales(id, name, code)
Locales.name is the English name of the locale ('Spanish', for example) and Locales.code is a 5-character locale code ('en_us', for example). Each user should have one locale. I thought about two options:
Users table will have LocaleId column as FK to Locales.id.
Users table will have locale (string) column as FK to Locales.code column
Which approach will be better? Note that I will have to search for users by locale, perform joins with other tables on the locale field, etc.
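For concreteness, a sketch of the two options (the column types are assumptions, not from the question):

-- Option 1: reference the surrogate key
CREATE TABLE Locales (
    id   int         PRIMARY KEY,
    name varchar(50) NOT NULL,
    code char(5)     NOT NULL UNIQUE  -- needed as an FK target for option 2
);

CREATE TABLE Users (
    id       int PRIMARY KEY,
    localeId int NOT NULL REFERENCES Locales (id)
);

-- Option 2: reference the natural key instead, e.g.
-- locale char(5) NOT NULL REFERENCES Locales (code)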
In your table, you have an (I assume) system-generated value in ID. This is your surrogate key. However, you also have a natural key value in Code. Yes, you can define a unique constraint on Code so foreign keys can refer to it. But if you do that, why not just go ahead and make it your primary key in the first place?
There continues to be a very long-standing debate on this issue.
One side comes down on the if-it's-good-in-most-cases-it-must-be-good-in-all-cases side of the issue and demands that all tables have a surrogate key.
I don't agree. Let's see what happens if you chuck the ID field altogether and make Code the PK of the table.
On the plus side:
The key value itself will be meaningful to almost all who see it. So a look at the Users table will be enough to tell what language they prefer. Is there anyone but my Grandmother who doesn't know "en_us" from "en_gb"? So a lot of times, you won't even have to join to the Locales table -- you already have enough information in most cases.
You've eliminated an extra field from your table. Simplification without sacrificing function is always good.
Generating a guaranteed unique value for every insert does require some system overhead. In this case the overhead will not be significant, since once you've populated the Locales table it should be very stable.
On the minus side:
Joining via a numeric value is slightly more efficient than a character string. The difference is minuscule but there.
How much can you rely on Microsoft to hold the current values steady and make no changes, such as renaming "en_gb" to "en_uk" or something like that? Once you've established references to a key value, changing that value can be quite a challenge.
Even if you are certain that existing values won't change, could future values exceed 5 characters? This isn't really an inherent disadvantage of the practice, just your particular implementation. :)
Surrogates (generally integer) tend to be smaller than natural keys (generally strings). While you have one less column in your key table, you may have larger FK columns in referencing tables.
So which is better for your particular situation?
I have no idea.
You and your team will have to weigh the pros and cons to make your own decision.
There are a couple of similar questions already out there and the consensus seemed to be that a primary key should always be created.
But what if you have a single row table for storing settings (and let's not turn this into a discussion about why it might be good/bad to create a single row table please)?
Surely having a primary key on a single row table becomes completely useless?
It may seem completely useless, but it's also completely harmless, and I'd vote for harmless with good design principles vs. useless with no design principles every time.
Other people have commented, rightly, that you don't know how you're going to use the table in a year or five years... what if someone comes along and decides they want to duplicate the configuration -- move it to a distributed environment or add a test environment by using a duplicate configuration string or whatever. Having a field that acts like a primary key means that whenever you query the table, if you use the key, you'll be certain no matter what anyone else may do to your table, that you're getting the correct record.
You're right, there are a million other aspects -- surrogate keys vs. intelligent keys, indexing, partitioning (silly on a single row table, I know), whatever... but without getting into that, I'd vote add the key rather than not add it. You could have added it in the time it took to read this thread.
Short answer: with no key, duplicate records are possible. You're planning a single row now, but what about six months in the future when your single row multiplies? Put a primary key on the table, even for a single row.
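A sketch of one way to do that and also enforce the single row (names are hypothetical):

CREATE TABLE AppSettings (
    LockId   bit PRIMARY KEY DEFAULT 1 CHECK (LockId = 1),  -- at most one row
    SiteName varchar(100) NOT NULL
);

INSERT INTO AppSettings (SiteName) VALUES ('My App');  -- LockId defaults to 1
-- Any second insert now violates the primary key, so the table stays single-row.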
You could always base your primary key on the name of the setting. Then your table would become a key-value store.
But no, in many RDBMS you are not REQUIRED to have a primary key per table.
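A minimal sketch of that key-value layout (names are hypothetical):

CREATE TABLE Settings (
    name  varchar(100) PRIMARY KEY,  -- the setting name is the natural key
    value varchar(max) NOT NULL
);

INSERT INTO Settings (name, value)
VALUES ('site_title', 'My App'), ('items_per_page', '25');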
Declaring a primary key on a single-row table in SQL will ensure that there are no duplicates. Whether that is useless depends on your requirements. Usually it is a good idea to avoid duplicates.
A Unique Constraint can be created upon a column that can contain NULLs. However, at most, only a single row may ever contain a NULL in that column.
I do not understand why this is the case since, by definition, a NULL is not equal to another NULL (since NULL is really an unknown value and one unknown value does not equal another unknown value).
My questions:
1. Why is this so?
2. Is this specific to MsSQL?
I have a hunch that it is because a Unique Constraint can act as a reference field for a Foreign Key, and the FK would otherwise not know which record in the referenced table it was referring to if more than one record with NULL existed. But it is just a hunch.
(Yes, I understand that UCs can be across multiple columns, but that doesn't change the question; rather it just complicates it a bit.)
Yes, it's "specific" to Microsoft SQL Server (in that some other database systems have the opposite approach, the one you expected - and the one defined in the ANSI standard, but I believe there are other database systems that are the same as SQL Server).
If you're working on a version of SQL Server that supports filtered indexes, you can apply one of those:
CREATE UNIQUE INDEX IX_T ON [Table] ([Column]) WHERE [Column] IS NOT NULL
(But note that this index cannot be the target of an FK constraint)
The "Why" of it really just comes down to, that's how it was implemented long ago (possibly pre-standards) and it's one of those awkward situations where to change it now could potentially break a lot of existing systems.
Re: Foreign Keys - you would be correct, if it weren't for the fact that a NULL value in a foreign key column causes the foreign key not to be checked - there's no way (in SQL Server) to use NULL as an actual key.
Yes it's a SQL Server feature (and a feature of a few other DBMSs) that is contrary to the ISO SQL standard. It perhaps doesn't make much sense given the logic applied to nulls in other places in SQL - but then the ISO SQL Standard isn't very consistent about its treatment of nulls either. The behaviour of nullable uniqueness constraints in Standard SQL is not very helpful. Such constraints aren't necessarily "unique" at all because they permit duplicate rows. E.g., the constraint UNIQUE(foo,bar) permits the following rows to exist simultaneously in a table:
foo bar
------ ------
999 NULL
999 NULL
(!)
Avoid nullable uniqueness constraints. It's usually straightforward to move the columns to a new table as non-nullable columns and put the uniqueness constraint there. The information that would have been represented by populating those columns with nulls can (presumably) be represented by simply not populating those columns in the new table at all.
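A sketch of that refactoring (hypothetical tables; the badge number plays the role of the nullable unique column):

CREATE TABLE Employee (
    EmployeeId int PRIMARY KEY
);

-- The optional, unique value lives in its own table, where it is NOT NULL.
CREATE TABLE EmployeeBadge (
    EmployeeId int PRIMARY KEY REFERENCES Employee (EmployeeId),
    BadgeNo    varchar(20) NOT NULL UNIQUE
);
-- An employee without a badge simply has no row in EmployeeBadge.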
In most database design scenarios, it's habitual to set the primary key as an integer type serving as a unique identifier in the table. Why not use a string or a float for primary keys? Does this affect the accessibility of values, or in plain words, retrieval speed? Are there any specific reasons?
An integer will use less disk space than a string, thus giving you a smaller index file to search through. This is important for large tables where you want to have as much of the index as possible cached in RAM.
Also, they can be autoincremented so you don't need to write your own routines to generate keys.
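For example (SQL Server syntax; the table is hypothetical):

CREATE TABLE Customer (
    CustomerId int IDENTITY(1,1) PRIMARY KEY,  -- auto-incremented integer key
    Name       varchar(100) NOT NULL
);

-- No key value is supplied; the engine generates 1, 2, 3, ...
INSERT INTO Customer (Name) VALUES ('Acme Ltd');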
You often want to have a technical key (also called a surrogate key), a key that is only used to identify the row and not used for anything else. Most data may change sooner or later for reasons you can't control and you don't want to update it everywhere. Even such seemingly static data as a nation-assigned personal id number can change (if you get a new identity) or there may be laws prohibiting their use. A key generated by you, however, is in your own control. For such surrogate keys it's useful to have a small key that is easily generated.
As for "floats as primary keys": Don't do this. A primary key should uniquely identify a row. Floats have no equality relation, which means you cannot safely compare two float values for equality. This is an inherent shortcoming of floating-point values. If you need decimals, use a fixed-point number type instead.
The primary key is supposed to be an index that provides a unique way to access a specific row in a table. Primary keys can be of most data types (in practical applications, float/double won't work too well), and they can also be compound keys (composed of several columns).
If you carefully examine the data in the table, you might be able to find a data item that will be unique for every row in the table, thereby eliminating the requirement that you fabricate a key like the autoincrement integer that you find in some schemas.
If you're in a manufacturing environment it might be an alphanumeric field like part number or assembly identifier. Retail or warehousing applications might have a stock number or combination of stock number/shipment/manufacturer.
Generally, if some data in your table is supposed to be a unique identifier, it will probably serve well as a primary key for your table.
Using data that exists in the table already completely eliminates the requirement to "make up" a value (such as the autoincrement column) and use it as the primary key. This saves space since it's one less column in the table and one less index on the table.
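A sketch of such a natural (here compound) key, using a hypothetical warehousing table:

CREATE TABLE StockItem (
    StockNo    varchar(20) NOT NULL,
    ShipmentNo varchar(20) NOT NULL,
    Qty        int         NOT NULL,
    PRIMARY KEY (StockNo, ShipmentNo)  -- no fabricated autoincrement column
);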
Yes, in my experience integer keys are almost always faster, since it's more efficient for the database engine to compare integers than strings. Depending on the "uniqueness" of the data (technically called cardinality, http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)), the effect of character vs. integer keys may be nominal.
Character keys may degrade performance depending on the number of characters the database needs to compare to determine whether keys are equal or not. In the pathological case, imagine hundred-character fields that differ only at the right-hand end: one key has 100 A's, and we must compare it to a key with 99 A's and a B as the last character. Conceptually, databases compare character fields just like strcmp() (strncmp() if you prefer), from left to right.
good luck!
The only reason is for performance.
A logical database design should specify which "real" columns are unique, but when the logical design is transformed into a physical design, it is traditional to not use any of these "natural" keys as the primary key; instead, a meaningless integer column is added for this purpose - called a "surrogate key".
Normally the designer will add further unique constraints for the "real" uniqueness business rules as specified in the logical design.
This is because most DBMS's have trouble updating a primary key (e.g. due to performance issues when cascading the update to child tables). Some DBMS's might not be able to support non-integer primary keys at all.
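A sketch of that pattern (hypothetical table): the surrogate is the primary key, while the natural key keeps its own unique constraint:

CREATE TABLE Product (
    ProductId int IDENTITY PRIMARY KEY,     -- meaningless surrogate key
    PartNo    varchar(30) NOT NULL UNIQUE   -- "real" uniqueness business rule
);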
Some side notes:
There's no theoretical reason why primary keys should be immutable.
This is nothing to do with normalization, which happens in the logical model (which should never have surrogate keys).
Also, note that the idea of a "primary" key is not a relational concept - it is simply a way of denoting the "preferred" uniqueness constraint, perhaps for relational integrity - but there's nothing in the RM that says that you must use the same key for each child table.
I've created natural keys as "Primary Keys" in Oracle databases before, albeit rarely. I've even had them used for foreign key constraints. Admittedly, they were either immutable, or I hand-wrote the update-cascade code; and I had trouble with one front-end application where the PK included a date column.
Bottom line: there is no theoretical requirement for surrogate keys, but they're much more practical than the alternative.
I suspect that it is because we can auto-increment integer values so it's easy to generate a new unique key for every insert.
Many common ORM (Object Relational Mapping) tools either force you to use, or at least recommend using, an integer as the primary key.
An integer primary key also saves space compared to a string, and in some cases it is also faster. Sequences or auto-increment fields make integer primary key generation easy, at least if you do not work with distributed databases.
These are some of the main reasons why I think we have integers/numbers as primary keys:
1. Primary keys should uniquely define a row and should be immutable. One of the problems with using real attributes (name, etc.) is that they can change over time. Maintaining relational integrity in such a case would be very difficult, as the change would need to cascade to all the child records.
2. The size of the table, and thereby of the index, would be smaller if we use a number as the key for the table.
3. Since these are automatically generated using a sequence, we can be sure that the values will be unique under all circumstances.
Check this.
http://forums.oracle.com/forums/thread.jspa?messageID=3916511