How do I create a unique public id column which is not primary key - sql

I read somewhere that it is bad to use your db table's primary key as a public identifier online. However, I would like my users to link to a specific object in the table.
How do I add a unique identifier column to my table that is unrelated to the primary key (which is an auto-increment integer)?
My initial idea is to use a PHP script to generate random hexadecimal values of suitable length (there will be about 100 000-200 000 items in the table at most, I think) and then insert them. But then I don't know if they would be unique...

You can use a GUID (Globally Unique IDentifier) to uniquely identify a record. The number of possible GUIDs is so high that the chance of duplicating one is next to nothing. Similarly, the chance of someone guessing a GUID is so low that they are generally safe to display to the user (for example www.yoursite.com?id=21EC20203AEA1069A2DD08002B30309D).
If you're using PHP, you can use the com_create_guid function. *Note: This function is only supported in PHP 5. For PHP 4, look at uniqid.
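The asker's worry about whether random hexadecimal values would be unique can be handled by letting the database enforce uniqueness and retrying on the rare collision. Below is a minimal sketch of that idea in Python with SQLite rather than PHP; the table `items`, the column `public_id`, and the helper `insert_item` are hypothetical names chosen for illustration.

```python
import secrets
import sqlite3


def create_schema(conn):
    # The integer primary key stays internal; public_id is what appears in URLs.
    conn.execute(
        """
        CREATE TABLE items (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            public_id TEXT NOT NULL UNIQUE,
            title     TEXT NOT NULL
        )
        """
    )


def insert_item(conn, title, max_attempts=5):
    """Insert a row with a random 32-hex-digit public id, retrying on a collision."""
    for _ in range(max_attempts):
        public_id = secrets.token_hex(16)  # 128 random bits as 32 hex characters
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (public_id, title) VALUES (?, ?)",
                    (public_id, title),
                )
            return public_id
        except sqlite3.IntegrityError:
            continue  # the UNIQUE constraint rejected a duplicate; try another value
    raise RuntimeError("could not generate a unique public id")


conn = sqlite3.connect(":memory:")
create_schema(conn)
first = insert_item(conn, "first item")
second = insert_item(conn, "second item")
```

With 128 random bits and a few hundred thousand rows, a collision is extremely unlikely, so the retry loop is mostly a safety net; the UNIQUE constraint is what actually guarantees uniqueness.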

Related

Behavior of a SORT without BY on standard internal tables? Is it safe?

What exactly does the SORT statement without key specification do when run on a standard internal table? As per the documentation:
If no explicit sort key is entered using the addition BY, the internal table itab is sorted by the primary table key. The priority of the sort is based on the order in which the key fields are specified in the table definition. In standard keys, the sort is prioritized according to the order of the key fields in the row type of the table. If the primary table key of a standard table is empty, no sort takes place. If this is known statically, the syntax check produces a warning.
With the primary table key being defined as:
Each internal table has a primary table key that is either a self-defined key or the standard key. For hashed tables, the primary key is a hash key, for sorted tables, the primary key is a sorted key. Both of these table types are key tables for which key access is optimized and the primary key thus has its own administration. The key fields of these tables are write-protected when you access individual rows. Standard tables also have a primary key, but the corresponding access is not optimized, there is no separate key administration, and the key fields are not write-protected.
And for good measure, the standard key is defined as:
Primary table key of an internal table, whose key fields in a structured row type are all table fields with character-like data types and byte-like data types. If the row type contains substructures, these are broken down into elementary components. The standard key for non-structured row types is the entire table row if the row type itself is not a table type. If there are no corresponding table fields, or the row type itself is a table type, the standard key from standard tables is empty or contains no key fields.
All of which mainly just confuses me as I'm not sure if I can really rely on the basic SORT statement to provide a reliable or safe result. Should I really just avoid it in all situations or does it have a purpose if used properly?
By extension, if I want to run a DELETE ADJACENT DUPLICATES FROM itab COMPARING ALL FIELDS, when would it be safe to do so after a simple SORT itab.? Only if I added a key on all fields? Without an explicit key only if I have an internal table with clike and xsequence columns? If I want to execute that DELETE statement, what is the most optimal SORT statement to run on the internal table?
SORT without BY should be avoided in all situations because it "makes the program difficult to understand and possibly unpredictable" (according to the ABAP documentation). I think that if you don't specify BY, a static check in the Code Inspector issues a warning. You should use SORT itab BY table_line, where table_line is a special name ("pseudo-component") meaning "all fields of the line".
Not your question, but you may also define the internal table with primary and secondary keys, so that you don't need to sort explicitly - DELETE ADJACENT DUPLICATES can be used with any of those keys.
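To see why the sort key matters before deleting adjacent duplicates, here is a small analogy in Python (not ABAP). It assumes rows of the form (name, quantity), where only `name` is character-like and would therefore be part of a standard key as described in the quoted documentation; `delete_adjacent_duplicates` is a hypothetical helper that mimics comparing all fields of neighbouring rows.

```python
def delete_adjacent_duplicates(rows):
    """Drop a row when it equals the row directly before it (all fields compared)."""
    result = []
    for row in rows:
        if not result or result[-1] != row:
            result.append(row)
    return result


# Each row is (name, quantity); only `name` is character-like.
rows = [("bolt", 1), ("bolt", 2), ("bolt", 1), ("nut", 5)]

# Sorting by the character-like field only can leave the two ("bolt", 1) rows apart.
by_name_only = sorted(rows, key=lambda row: row[0])
# Sorting by the whole row brings identical rows next to each other.
by_whole_row = sorted(rows)

partial = delete_adjacent_duplicates(by_name_only)
complete = delete_adjacent_duplicates(by_whole_row)
```

In this sketch `partial` still contains both ("bolt", 1) rows, while `complete` contains each distinct row once, which is the behaviour sorting by all fields is meant to secure.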
Internal tables can have keys that are either inherited from the structures the itab is based on or specified explicitly. As the documentation says, SORT without BY sorts by the primary key, and that is safe assuming the internal table is implemented correctly.
I think this feature is designed as a dynamic feature to be used with smart table key design. If done correctly, SORT without BY lets your program adapt to table key changes in the future (so if your key changes, the sort will change with it). Problems might arise when the key is modified in an odd way.
As a rule of thumb:
The more specific your program code is, the less prone to errors (and safer) it is.
So sort by key_id, key_date will always produce the same sort by those 2 fields.
Dynamic components in an application make it more flexible, but tend to produce (often hard to notice) bugs when the things they rely on are modified.
So if you take the previous example with 2 key fields and add 1 in the middle (let's say key_is_active between the 2 existing fields), sorting results might change in a way you did not expect.
If you had an algorithm that processes based on date, your algorithm might be broken by that change.
In your particular case with delete adjacent I would follow Sandra Rossi's advice.

Can I set this column as the primary key?

I'm new to SQL Server and would really appreciate it if you could help me out here.
We are a healthcare provider, and internally we assign an ID to each patient (for example, 1234). I'm currently constructing another database, and I wonder whether I can use our internal IDs as the primary key, given that they are unique. If so, since I am not going to do any calculation on the primary key, can I use a string/char data type for it?
In short, yes you can but it is not recommended at all!
To give you some heads up:
Primary keys should never change
You should not use a natural key or a key from another system
They cannot have any formula
Use short but suitable key type
If you have an external key that you want to use to find some patients, create another column for it and add a UNIQUE constraint to it.
Just don't forget to add an index for that column.
Read this post of mine for more information:
http://pilpag.blogspot.dk/2016/06/relational-database-designsimple-rules.html
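The advice above (a generated primary key plus a separate column with a UNIQUE constraint for the existing patient ID) might look roughly like the following. This is a sketch in Python with SQLite rather than SQL Server, and the table names `patients` and `visits`, the column `patient_number`, and both helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE patients (
        id             INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        patient_number TEXT NOT NULL UNIQUE,               -- existing internal ID
        full_name      TEXT NOT NULL
    );
    CREATE TABLE visits (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        patient_id INTEGER NOT NULL REFERENCES patients (id),
        visited_on TEXT NOT NULL
    );
    """
)


def add_patient(conn, patient_number, full_name):
    """Insert a patient and return the generated surrogate key."""
    cursor = conn.execute(
        "INSERT INTO patients (patient_number, full_name) VALUES (?, ?)",
        (patient_number, full_name),
    )
    return cursor.lastrowid


def duplicate_rejected(conn, patient_number):
    """Return True if the UNIQUE constraint refuses a second row with this number."""
    try:
        add_patient(conn, patient_number, "Someone Else")
    except sqlite3.IntegrityError:
        return True
    return False


patient_id = add_patient(conn, "1234", "Jane Doe")
# Other tables refer to the surrogate key, not to the patient number.
conn.execute(
    "INSERT INTO visits (patient_id, visited_on) VALUES (?, ?)",
    (patient_id, "2016-06-01"),
)
```

In SQLite, and in SQL Server as well, declaring a UNIQUE constraint creates a unique index behind the scenes, so lookups by patient number are indexed without a separate step.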
The conditions for a primary key are that the key is unique in the table and never NULL.
Your patient id would appear to have these characteristics.
That said, there are good reasons for developing a synthetic primary key (auto-incremented/identity/serial depending on the database). More importantly, the actual patient ID may be sensitive information. For instance, patients might use the id when logging in or it might be printed on invoices.
It might not be a good idea to have sensitive information repeated throughout the database. For this reason, an "internal" id would be used to refer to patients in other tables, and all the sensitive information would be contained in one or a handful of tables.
This would perhaps be more obvious if the "patient id" were a government id ("social security number") or email address.
Yes, but the ID can also be numeric and a primary key - it doesn't have to be a string. As long as the ID is unique, you should be fine.
Yes, you can use your internal IDs if they are unique. The PK limit is 900 bytes for char/varchar data types, so if your IDs are int that is fine. But if your IDs can change over time or can be reused for more than one patient, I strongly recommend not using them, to avoid chaos. I prefer a surrogate key, like an identity column.
If I understand correctly, you are assigning each patient a number so as to uniquely identify them. So a report would contain the patient number rather than only a patient's name which can be ambiguous. You won't ever change the patient numbers, because then you'd have to change this in all databases and would have to re-print all documents on the patient that are still needed. This makes this number a perfect primary key for a patient table in any of your databases.
You could use a generated technical ID instead as the table's primary key and have the patient number only as another field in the table (which would still have a unique constraint of course, because it is still the business key uniquely identifying a patient). Whether to do this or not is mainly a matter of personal preference and experience. I prefer natural keys over IDs (so I would make the patient number the primary key). This stems from having worked with rather large databases with thousands of tables and much hierarchy where the natural keys proved to result in faster queries, enhanced data consistency and easier maintenance. Others may have different experience, though.
So yes, the patient number seems to be the perfect natural primary key in my opinion.

Confusing t-sql exam answer about sequence or uniqueidentifier

I found a t-sql question and its answer. It is too confusing. I could use a little help.
The question is:
You develop a database application. You create four tables. Each table stores different categories of products. You create a Primary Key field on each table.
You need to ensure that the following requirements are met:
The fields must use the minimum amount of space.
The fields must be an incrementing series of values.
The values must be unique among the four tables.
What should you do?
A. Create a ROWVERSION column.
B. Create a SEQUENCE object that uses the INTEGER data type.
C. Use the INTEGER data type along with IDENTITY
D. Use the UNIQUEIDENTIFIER data type along with NEWSEQUENTIALID()
E. Create a TIMESTAMP column.
The said answer is D. But, I think the more suitable answer is B. Because sequence will use less space than GUID and it satisfies all the requirements.
D is a wrong answer, because NEWSEQUENTIALID doesn't guarantee "an incrementing series of values" (second requirement).
NEWSEQUENTIALID()
Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique.
I'd say that B (sequence) is the correct answer. At least, you can use a sequence to fulfil all three requirements, if you don't restart/recycle it manually. I think it is the easiest way to meet all three requirements.
Among the choices provided, B is the correct answer, since it meets all requirements:
ROWVERSION is a bad choice for a primary key, as stated in MSDN:
Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column. This property makes a rowversion column a poor candidate for keys, especially primary keys. Any update made to the row changes the rowversion value and, therefore, changes the key value. If the column is in a primary key, the old key value is no longer valid, and foreign keys referencing the old value are no longer valid.
TIMESTAMP is deprecated, as stated in that same page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
An IDENTITY column does not guarantee uniqueness unless all its values are only ever generated automatically (you can use SET IDENTITY_INSERT to insert values manually), nor does it guarantee uniqueness between tables for any value.
A GUID is practically guaranteed to be unique per system, so if a GUID is the primary key for all 4 tables it ensures uniqueness across all tables. The one requirement it doesn't fulfill is storage size: its storage size is quadruple that of int (16 bytes instead of 4).
A SEQUENCE, when not declared with CYCLE, guarantees uniqueness and has the lowest storage size.
The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted.
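The idea behind option B is that one generator feeds the primary keys of all four tables, so a value handed out once is never handed out again. SQLite has no SEQUENCE object, so the sketch below uses a Python `itertools.count` as a stand-in for a non-cycling integer sequence; the table names and the helper `insert_product` are hypothetical.

```python
import itertools
import sqlite3

TABLES = ("books", "toys", "tools", "games")

conn = sqlite3.connect(":memory:")
for table in TABLES:
    # No per-table auto-increment: every id comes from the shared generator below.
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Stand-in for a sequence that starts at 1, increments by 1, and never cycles.
product_id_sequence = itertools.count(start=1)


def insert_product(conn, table, name):
    """Take the next shared value and use it as the row's primary key."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    new_id = next(product_id_sequence)
    conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (new_id, name))
    return new_id


ids = [
    insert_product(conn, "books", "Atlas"),
    insert_product(conn, "toys", "Kite"),
    insert_product(conn, "tools", "Hammer"),
    insert_product(conn, "books", "Novel"),
]
```

Each table sees an increasing series of integers with gaps, and no value appears in more than one table as long as every insert goes through the shared generator.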
However,
I would actually probably choose a different option altogether: create a base table with a single identity column and link it with a 1:1 relationship to all the category tables. Then use an INSTEAD OF INSERT trigger on each category table that first inserts a record into the base table and then uses SCOPE_IDENTITY() to get the value and insert it as the primary key for the category table.
This will enforce uniqueness as well as make it possible to use a single foreign key reference between the categories and products.
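The base-table approach can be sketched as follows, in Python with SQLite rather than T-SQL. Instead of an INSTEAD OF INSERT trigger and SCOPE_IDENTITY(), this sketch uses a helper function and `lastrowid`, which is a substitution made for brevity; the names `product_base`, `books`, `toys`, and `insert_product` are hypothetical.

```python
import sqlite3

CATEGORY_TABLES = ("books", "toys")

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# The base table does nothing except hand out ids.
conn.execute("CREATE TABLE product_base (id INTEGER PRIMARY KEY AUTOINCREMENT)")
for table in CATEGORY_TABLES:
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY REFERENCES product_base (id), "
        "name TEXT NOT NULL)"
    )


def insert_product(conn, table, name):
    """Create a base row first, then reuse its id as the category row's key."""
    if table not in CATEGORY_TABLES:
        raise ValueError(f"unknown table: {table}")
    with conn:  # both inserts succeed or fail together
        base_id = conn.execute("INSERT INTO product_base DEFAULT VALUES").lastrowid
        conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (base_id, name))
    return base_id


first_id = insert_product(conn, "books", "Atlas")
second_id = insert_product(conn, "toys", "Kite")
```

In this sketch, uniqueness across the category tables relies on every insert going through `insert_product`; the trigger described above plays that gatekeeper role inside the database.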
The issue has been discussed extensively in the past, in general:
http://blog.codinghorror.com/primary-keys-ids-versus-guids/
The constraint #3 is why a SEQUENCE could run into issues as there is a higher risk of collision/lowered number of possible rows in each table.

sql primary key auto increment

Is having a primary key that auto-increments on each new row necessary? For me this number is getting quite long and I'm not even using it for anything.
I can imagine that with gradual user activity on my site new rows will be added (I am only testing at the moment with just 2 alpha test users and already the number has auto-incremented to over 100). Eventually this number could reach silly proportions (example: 10029379000577352881086) and not only slow the site down (affecting user experience) but also push my site over its quota (exceeding its allowed size, in layman's terms).
Is this really needed?
If you have some field/column (or combination of columns) which can be a primary key, use that; why use auto-increment? There are schools of thought that favor using a mix of both. You could search for surrogate keys, and you may find this answer interesting: Surrogate vs. natural/business keys
For size quota problem, practically I don't think the maximum auto increment value would cause your site to go over data limit. If it is of int type it will take 4 bytes, regardless of the value inside. For SQL server int type could contain values ranging from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647).
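The point that an int takes 4 bytes regardless of its value can be checked with a quick sketch. The helpers `packed_size` and `fits_in_int` are hypothetical names, and Python's `struct` module is used here only to mimic a signed 32-bit integer.

```python
import struct

INT_MIN = -2**31      # -2,147,483,648
INT_MAX = 2**31 - 1   # 2,147,483,647


def packed_size(value):
    """Number of bytes a signed 32-bit integer occupies, whatever its value."""
    return len(struct.pack("<i", value))


def fits_in_int(value):
    """True if the value is within the signed 32-bit range."""
    return INT_MIN <= value <= INT_MAX
```

Under these assumptions, `packed_size(100)` and `packed_size(INT_MAX)` are both 4, and the example value from the question, 10029379000577352881086, does not fit in an int at all, so an int counter would run out of values long before the stored number grew any longer.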
You need a way to uniquely identify each record in your table.
If you have that already -- say a user-ID or email-address -- then you don't necessarily need that auto-incrementing field.
Note: If you don't already have a unique constraint on that field, you should add one so that duplicate data cannot be entered into the table.
Warning: If you decide to get rid of it, be sure that no other tables are using it.
Can't you use multiple columns to get a composite key instead of that?
Just a hint.
You do need a key that identifies every row. But a key doesn't have to be a number that "auto-increments" for every row. The fact that a few people seem to think incrementing numbers are always a good idea for keys is probably a consequence either of carelessness or a lack of appreciation of database fundamentals, sound design and data integrity.
A primary key is not always necessary for a table. For your question, check my answer:
when and when not primary key should use

Is ID column required in SQL?

Traditionally I have always used an ID column in SQL (mostly mysql and postgresql).
However, I am wondering if it is really necessary when the rest of the columns in each row make it unique. In my latest project I have the "ID" column set as my primary key, but I never call it or use it in any way, as the data in the row makes it unique and is much more useful to me.
So, if every row in a SQL table is unique, does it need a primary key ID column, and are there any performance changes with or without one?
Thanks!
EDIT/Additional info:
The specific example that made me ask this question is a table I am using for a many-to-many-to-many-to-many table (if we still call it that at that point). It has 4 columns (plus ID), each of which represents an ID of an external table, and each row will always be numeric and unique. Only one of the columns is allowed to be null.
I understand that for normal tables an ID primary key column is a VERY good thing to have. But I get the feeling on this particular table it just wastes space and slows down adding new rows.
If you really do have some pre-existing column in your data set that already does uniquely identify your row - then no, there's no need for an extra ID column. The primary key however must be unique (in ALL circumstances) and cannot be empty (must be NOT NULL).
In my 20+ years of experience in database design, however, this is almost never truly the case. Most "natural" ID's that appear to be unique aren't - ultimately. US Social Security Numbers aren't guaranteed to be unique, and most other "natural" keys end up being almost unique - and that's just not good enough for a database system.
So if you really do have a proper, unique key in your data already - use it! But most of the time, it's easier and more convenient to have just a single surrogate ID that you can guarantee will be unique over all rows.
Don't confuse the logical model with the implementation.
The logical model shows a candidate key (all columns) which could make your primary key.
Great. However...
In practice, having a multi column primary key has downsides: it's wide, not good when clustered etc. There is plenty of information out there and in the "related" questions list on the right
So, you'd typically
add a surrogate key (ID column)
add a unique constraint to keep the other columns unique
the ID column will be the clustered key (can be only one per table)
You can make either key the primary key now
The main exception is link or many-to-many tables that link 2 ID columns: a surrogate isn't needed (unless you have a braindead ORM)
Edit, a link: "What should I choose for my primary key?"
Edit2
For many-many tables: SQL: Do you need an auto-incremental primary key for Many-Many tables?
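For the link-table exception mentioned above, a composite primary key over the two ID columns is enough to keep each pairing unique without a surrogate column. The sketch below uses Python with SQLite; the tables `students`, `courses`, and `enrolments` and the helper `enrol` are hypothetical examples, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Link table: the pair of foreign keys is the primary key.
    CREATE TABLE enrolments (
        student_id INTEGER NOT NULL REFERENCES students (id),
        course_id  INTEGER NOT NULL REFERENCES courses (id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Ann');
    INSERT INTO courses  VALUES (10, 'Databases');
    """
)


def enrol(conn, student_id, course_id):
    """Link a student to a course; return False if the database rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO enrolments (student_id, course_id) VALUES (?, ?)",
                (student_id, course_id),
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

Calling `enrol(conn, 1, 10)` twice stores the pairing once; the second call is rejected by the composite key.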
Yes, you could have many attributes (values) in a record (row) that you could use to make a record unique. This would be called a composite primary key.
However it will be much slower in general because the construction of the primary index will be much more expensive. The primary index is used by relational database management systems (RDBMS) not only to determine uniqueness, but also in how they order and structure records on disk.
A simple primary key of one incrementing value is generally the most performant and the easiest solution for the RDBMS to manage.
You should have one column in every table that is unique.
EDITED...
This is one of the fundamentals of database table design. It's the row identifier - the identifier identifies which row(s) are being acted upon (updated/deleted etc). Relying on column combinations that are "unique", eg (first_name, last_name, city), as your key can quickly lead to problems when two John Smiths exist, or worse when John Smith moves city and you get a collision.
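The John Smith problem can be reproduced in a few lines. This is a sketch in Python with SQLite; the table `people`, the city names, and the helper `move_city` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "first_name TEXT NOT NULL, last_name TEXT NOT NULL, city TEXT NOT NULL, "
    "PRIMARY KEY (first_name, last_name, city))"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("John", "Smith", "Leeds"), ("John", "Smith", "York")],
)
conn.commit()


def move_city(conn, first_name, last_name, old_city, new_city):
    """Try to change the city; return False if the new key collides with another row."""
    try:
        with conn:
            conn.execute(
                "UPDATE people SET city = ? "
                "WHERE first_name = ? AND last_name = ? AND city = ?",
                (new_city, first_name, last_name, old_city),
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

Moving the York John Smith to Leeds fails because that key already exists, even though the two rows describe different people; with a separate id column the move would be an ordinary update.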
In most cases, it's best to use an artificial key that's guaranteed to be unique, like an auto-increment integer. That's why they are so popular: they're needed. Commonly, the key column is simply called id, or sometimes <tablename>_id. (I prefer id)
If natural data is available that is unique and present for every row (perhaps retinal scan data for people), you can use that, but all too often such data isn't available for every row.
Ideally, you should have only one unique column. That is, there should only be one key.
Using IDs to key tables means you can change the content as needed without having to repoint things
Ex. if every row points to a unique user, what would happen if he/she changed his/her name to, let's say, John Blblblbe, which was already in the db? And then again, what would happen if your software wants to pick up John Blblblbe's details: whose details would be picked up, the old John's or those of the one who has changed his name? Well, if the answer to both questions is 'nothing special is going to happen' then, yep, you don't really need an "ID" column :]
Important:
Also, having a numeric ID column is much faster when you're looking for an exact row, even when the table hasn't got any other indexed keys or has more than one unique column.
If you are sure that any other column is going to have unique data for every row and isn't going to have NULL at any time then there is no need of separate ID column to distinguish each row from others, you can make that existing column primary key for your table.
No, single-attribute keys are not essential and nor are surrogate keys. Keys should have as many attributes as are necessary for data integrity: to ensure that uniqueness is maintained, to represent accurately the universe of discourse and to allow users to identify the data of interest to them. If you have already identified a suitable key and if you don't find any real need to create another one then it would make no sense to add redundant attributes and indexes to your table.
An ID can be more meaningful; for example, an employee ID can encode which department the employee is from, the year they joined, and so on. Apart from that, RDBMSs support lots of operations with IDs.