Stop MySQL Reusing AUTO_INCREMENT IDs

I have a table with an AUTO_INCREMENT primary key. If the last row in the table is deleted, the next-inserted row will take the same ID.
Is there a way of getting MySQL to behave like T-SQL and not reuse the ID? Then, if the deleted row is erroneously referenced from something external to the database, no rows will be returned, highlighting the error.

In this case, you probably should not be using AUTO_INCREMENT indices in publicly accessible places.
Either derive a key field from other data, or use a different mechanism to create your IDs. One approach I've used before, although you need to be aware of the (potentially severe) performance implications, is a "keys" table that tracks the last-used key per table, which you increment yourself.
That way, you can use any type of key you want, even non-numeric, and increment them using your own algorithm.
I have used 6-character alpha-numeric keys in the past:
CREATE TABLE `TableKeys` (
  `table_name` VARCHAR(8) NOT NULL,
  `last_key`   VARCHAR(6) NOT NULL,
  PRIMARY KEY (`table_name`)
);

SELECT * FROM `TableKeys`;

table_name | last_key
-----------+---------
users      | 0003A2
articles   | 00166D
products   | 00009G
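If you go this route, make sure two sessions can't read the same key at once. Here is a minimal sketch of the locking pattern (MySQL/InnoDB; the 'users' row and the next key value are just illustrative):

START TRANSACTION;

-- Lock the counter row until COMMIT so no other session can read it:
SELECT `last_key` FROM `TableKeys`
 WHERE `table_name` = 'users'
   FOR UPDATE;

-- Compute the next key in application code (e.g. '0003A2' -> '0003A3'),
-- then write it back:
UPDATE `TableKeys`
   SET `last_key` = '0003A3'
 WHERE `table_name` = 'users';

-- ...insert the new row using the reserved key...

COMMIT;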

As of MySQL 8.0, the current AUTO_INCREMENT counter value is persisted across server restarts, so MySQL no longer re-uses AUTO_INCREMENT ID values; this fixed the long-standing (opened in 2003!) bug #199.
For more info, see this blog post by MySQL Community Manager lefred: https://lefred.be/content/bye-bye-bug-199/

That's not the way our MySQL databases work: when a record is deleted, the next record inserted gets the next number, not the one that was deleted.

As I understand it, there is no way of doing this. You might consider working around it by adding a deleted flag, and then setting the flag instead of removing the row.
The "right" answer is that once a row is deleted, you shouldn't be referencing it. You can add foreign keys to make sure the DB will not allow rows to be deleted while they are referenced elsewhere in the database.

The MySQL manual says:
In this case (when the AUTO_INCREMENT column is part of a multiple-column index), AUTO_INCREMENT values are reused if you delete the row with the biggest AUTO_INCREMENT value in any group. This happens even for MyISAM tables, for which AUTO_INCREMENT values normally are not reused.
So it seems this reuse behavior is possible for engines other than MyISAM as well, when the AUTO_INCREMENT column is part of a multiple-column index.
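The caveat is easy to reproduce with a composite key where AUTO_INCREMENT is the second column (a sketch in the style of the manual's examples; the table is illustrative):

CREATE TABLE `animals` (
  `grp` CHAR(1) NOT NULL,
  `id`  INT NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`grp`, `id`)
) ENGINE=MyISAM;

INSERT INTO `animals` (`grp`) VALUES ('a'), ('a'), ('b');
-- rows: ('a',1), ('a',2), ('b',1)

DELETE FROM `animals` WHERE `grp` = 'a' AND `id` = 2;
INSERT INTO `animals` (`grp`) VALUES ('a');
-- the new row gets id 2 again: the biggest value in group 'a' is reused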

Related

Appending Rows into an SQLite Database Where Primary Key May Already Exist

I'm trying to merge a few pairs of SQLite3 databases that have the same tables (and schemas). Some of the tables are pretty simple and just have rows of plain data, but some of the tables have primary keys. Some of the keys are unique like a URL (e.g. url LONGVARCHAR PRIMARY KEY), and some of them are just simple integer indexes, but NOT set to auto-increment (e.g. id INTEGER PRIMARY KEY).
I've found several topics on merging databases (and I had already manually merged one pair of non-primary-key databases without effort), but am concerned about the ones with keys which may already exist in both.
My question is: what happens if a row is inserted into a database where a row with the same key already exists? It should overwrite the row that has that key, right? I was hoping that it would append them to the table and update the key, but that only works if the key has a numeric component that is set to auto-increment, correct?
Can anyone confirm my suppositions, and if possible, offer a suggestion on the easiest way to append such rows?
Thanks a lot.
You should have no problems if you set the primary key in the destination table to auto-increment.
Then, when you run your bulk insert command (or whatever you are using to insert values into the new table), you simply do not supply input for the primary key field, and there will NEVER be a duplicate.
Columns:
ID, Name
Just don't provide the ID field, i.e.:
INSERT INTO tableName (Name) VALUES ('Synetech');
The insert will just add the row with the next available ID in the table.
Good Luck!
If you try to INSERT a duplicate primary key, it will give you an error and not allow the insert. SQLite also supports the 'REPLACE INTO' syntax, which will update on a duplicate primary key.
If you want to append on duplicates, you will have to check whether a field with that key already exists, and if so then change the key to some new value. The correct way to do this likely depends on your application. For integer keys you could just take the max+1, but for the url keys it's not clear what the correct behavior should be.
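A quick sketch of those behaviors in SQLite (the table and values are made up):

CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT);

INSERT INTO pages VALUES ('http://a.example/', 'A');   -- ok
INSERT INTO pages VALUES ('http://a.example/', 'B');   -- error: constraint violation
REPLACE INTO pages VALUES ('http://a.example/', 'B');  -- replaces the existing row
INSERT OR IGNORE INTO pages
  VALUES ('http://a.example/', 'C');                   -- silently skipped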

Should I have a primary ID? I am indexing another field

Using sqlite3, I need a table to hold a BLOB storing an MD5 hash and a 4-byte int. I plan to index the int, but this value will not be unique.
Do I need a primary key for this table? And is there an issue with indexing a non-unique value? (I assume there is no issue, or any reason against it.)
Personally, I like to have a unique primary id on all tables. It makes finding unique records for updating/deleting easier.
How are you going to reference rows in a SELECT * FROM Table WHERE ... or an UPDATE ... WHERE ...? Are you sure you want every matching row each time?
You already have one.
SQLite automatically creates an integer ROWID column for every row of every table. This can function as a primary key if you don't declare your own.
In general it's a good idea to declare your own primary key column. In the particular instance you mentioned, ROWID will probably be fine for you.
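For example (the table is hypothetical, matching the question's description):

CREATE TABLE hashes (md5 BLOB, val INT);      -- no declared primary key
CREATE INDEX idx_hashes_val ON hashes (val);  -- a non-unique index is fine

INSERT INTO hashes VALUES (x'd41d8cd98f00b204e9800998ecf8427e', 42);
SELECT rowid, val FROM hashes;                -- rowid was assigned automatically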
My advice is to go with a primary key if you want referential integrity. There is no issue with indexing a non-unique value; the only thing is that performance will degrade a little.
What are the consequences of letting two identical rows somehow get into this table?
One consequence is, of course, wasted space. But I'm talking about something more fundamental, here. There are times when duplicate rows in data give you wrong results. For example, if you grouped by the int column (field), and listed the count of rows in each group, a duplicate row (record) might throw you off, depending on what you are really looking for.
Relational databases work better if they are based on relations. Relations are always in first normal form. The primary reason for declaring a primary key is to prevent the table from getting out of first normal form, and thus not representing a relation.

Database Design and the use of non-numeric Primary Keys

I'm currently in the process of designing the database tables for a customer & website management application. My question is in regards to the use of primary keys as functional parts of a table (and not assigning "ID" numbers to every table just because).
For example, here are four related tables from the database so far, one of which uses the traditional primary key number, the others which use unique names as the primary key:
--
-- website
--
CREATE TABLE IF NOT EXISTS `website` (
  `name` varchar(126) NOT NULL,
  `client_id` int(11) NOT NULL,
  `date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `notes` text NOT NULL,
  `website_status` varchar(26) NOT NULL,
  PRIMARY KEY (`name`),
  KEY `client_id` (`client_id`),
  KEY `website_status` (`website_status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--
-- website_status
--
CREATE TABLE IF NOT EXISTS `website_status` (
  `name` varchar(26) NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `website_status` (`name`) VALUES
('demo'),
('disabled'),
('live'),
('purchased'),
('transfered');
--
-- client
--
CREATE TABLE IF NOT EXISTS `client` (
  `id` int(11) NOT NULL auto_increment,
  `date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `client_status` varchar(26) NOT NULL,
  `firstname` varchar(26) NOT NULL,
  `lastname` varchar(46) NOT NULL,
  `address` varchar(78) NOT NULL,
  `city` varchar(56) NOT NULL,
  `state` varchar(2) NOT NULL,
  `zip` int(11) NOT NULL,
  `country` varchar(3) NOT NULL,
  `phone` text NOT NULL,
  `email` varchar(78) NOT NULL,
  `notes` text NOT NULL,
  PRIMARY KEY (`id`),
  KEY `client_status` (`client_status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=4;
--
-- client_status
--
CREATE TABLE IF NOT EXISTS `client_status` (
  `name` varchar(26) NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `client_status` (`name`) VALUES
('affiliate'),
('customer'),
('demo'),
('disabled'),
('reseller');
As you can see, 3 of the 4 tables use their 'name' as the primary key. I know that these will always be unique. In 2 of the cases (the *_status tables) I am basically using a dynamic replacement for ENUM, since status options could change in the future, and for the 'website' table, I know that the 'name' of the website will always be unique.
I'm wondering if this is sound logic, getting rid of table ID's when I know the name is always going to be a unique identifier, or a recipe for disaster? I'm not a seasoned DBA so any feedback, critique, etc. would be extremely helpful.
Thanks for taking the time to read this!
There are two reasons I would always add an ID number to a lookup / ENUM table:

1. If you are referencing a single-column table by the name itself, you may be better served by using a constraint.
2. What happens if you want to rename one of the client_status entries? E.g. if you wanted to change the name from 'affiliate' to 'affiliate user', you would need to update the client table, which should not be necessary. The ID number serves as the reference and the name is the description.
In the website table, if you are confident that the name will be unique then it is fine to use as a primary key. Personally I would still assign a numeric ID as it reduces the space used in foreign key tables and I find it easier to manage.
EDIT:
As stated above, you will run into problems if the website name is renamed. By making this the primary key you will be making it very difficult if not impossible for this to be changed at a later date.
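A sketch of the ID-plus-description version of that lookup table (the unique index name and the foreign-key wiring are assumptions, not part of the question's schema):

CREATE TABLE IF NOT EXISTS `client_status` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(26) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_client_status_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- `client` would then store a `client_status_id` int instead of the name,
-- so a rename touches exactly one row:
UPDATE `client_status` SET `name` = 'affiliate user' WHERE `name` = 'affiliate';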
When making natural PRIMARY KEYs, make sure their uniqueness is under your control.
If you're absolutely sure you will never ever have a uniqueness violation, then it's OK to use these values as PRIMARY KEYs.
Since website_status and client_status seem to be generated and used by you and only you, it's acceptable to use them as PRIMARY KEYs, though having a long key may impact performance.
The website name seems to be under the control of the outside world, which is why I'd make it a plain field. What if they want to rename their website?
The counterexamples would be SSNs and ZIP codes: it's not you who generates them, and there is no guarantee they won't ever be duplicated.
Kimberly Tripp has an excellent series of blog articles (GUIDs as PRIMARY KEYs and/or the clustering key and The Clustered Index Debate Continues) on the issue of creating clustered indexes and choosing the primary key (related issues, but not always exactly the same). Her recommendation is that a clustered index/primary key should be:
Unique (otherwise useless as a key)
Narrow (the key is used in all non-clustered indexes, and in foreign-key relationships)
Static (you don't want to have to change all related records)
Always Increasing (so new records always get added to the end of the table, and don't have to be inserted in the middle)
Using "Name" as your key, while it seems to satisfy #1, doesn't satisfy ANY of the other three.
Even for your "lookup" table, what if your boss decides to change all affiliates to partners instead? You'll have to modify all rows in the database that use this value.
From a performance perspective, I'm probably most concerned that a key be narrow. If your website name is actually a long URL, then that could really bloat the size of any non-clustered indexes, and all tables that use it as a foreign key.
Besides all the other excellent points that have already been made, I would add one more word of caution against using large fields as clustering keys in SQL Server (if you're not using SQL Server, then this probably doesn't apply to you).
I add this because in SQL Server, the primary key on a table by default also is the clustering key (you can change that, if you want to and know about it, but most of the cases, it's not done).
The clustering key that determines the physical ordering of the SQL Server table is also being added to every single non-clustered index on that table. If you have only a few hundred to a few thousand rows and one or two indices, that's not a big deal. But if you have really large tables with millions of rows, and potentially lots of indices to speed up the queries, this will indeed cause a lot of disk space and server memory to be wasted unnecessarily.
E.g. if your table has 10 million rows and 10 non-clustered indices, and your clustering key is 26 bytes instead of 4 (for an INT), then you're wasting 10 million x 10 x 22 bytes, for a total of 2.2 billion bytes (roughly 2.2 GB) - that's not peanuts anymore!
Again - this only applies to SQL Server, and only if you have really large tables with lots of non-clustered indices on them.
Marc
"If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's."
If you're absolutely sure you will never ever have uniqueness violation, then don't bother to define the key.
Personally, I think you will run into trouble using this idea. As you end up with more parent-child relationships, you end up with a huge amount of work when the names change (as they always will, sooner or later). There can be a big performance hit when a child table with thousands of rows has to be updated because the name of the website changed, and you have to plan for how to make sure those changes happen. Otherwise, website name changes (oops, we let the name expire and someone else bought it) either break because of the foreign key constraint, or you need an automated way (cascading updates) to propagate the change through the system. If you use cascading updates, you can suddenly bring your system to a dead halt while a large change is processed. This is not considered a good thing. It really is more effective and efficient to use IDs for relationships and then put unique indexes on the name fields to ensure they stay unique. Database design needs to consider maintenance of data integrity and how that will affect performance.

Another thing to consider is that website names tend to be longer than a few characters, so the performance difference between using an ID field for joins and using the name could be quite significant. You have to think of these things at the design phase; it is too late to change to an ID when you have a production system with millions of records that is timing out and the fix is to completely restructure the database and rewrite all of the SQL code. Not something you can fix in fifteen minutes to get the site working again.
This just seems like a really bad idea. What if you need to change the value of the enum? The idea is to make it a relational database, not a set of flat files. At that point, why have the client_status table at all? Moreover, if you are using the data in an application, then by using a type like a GUID or INT you can validate the type and avoid bad data (in so far as validating the type goes). Thus, it is one more line of defense against bad input.
I would argue that a database that is resistant to corruption, even if it runs a little slower, is better than one that isn’t.
In general, surrogate keys (such as arbitrary numeric identifiers) undermine the integrity of the database. Primary keys are the main way of identifying rows in the database; if the primary key values are not meaningful, the constraint is not meaningful. Any foreign keys that refer to surrogate primary keys are therefore also suspect. Whenever you have to retrieve, update or delete individual rows (and be guaranteed of affecting only one), the primary key (or another candidate key) is what you must use; having to work out what a surrogate key value is when there is a meaningful alternative key is a redundant and potentially dangerous step for users and applications.
Even if it means using a composite key to ensure uniqueness, I would advocate using a meaningful, natural set of attributes as the primary key, whenever possible. If you need to record the attributes anyway, why add another one? That said, surrogate keys are fine when there is no natural, stable, concise, guaranteed-to-be-unique key (e.g. for people).
You could also consider using index key compression, if your DBMS supports it. This can be very effective, especially for indexes on composite keys (think trie data structures), and especially if the least selective attributes can appear first in the index.
I think I am in agreement with cheduardo. It has been 25 years since I took a course in database design, but I recall being told that database engines can manage and load indexes that use character keys quite efficiently. The comments about the database having to update thousands of records when a key is changed, and about all the added space taken up by longer keys that then has to be transferred across systems, assume that the key is actually stored in the records and that it has to be transferred across systems anyway. If you create an index on a column(s) of a table, I do not think the value is stored in the records of the table (unless you set some option to do so).
If you have a natural key for a table, even if it is changed occasionally, creating another key introduces a redundancy that could result in data integrity issues, and actually creates even more information that needs to be stored and transferred across systems. I work for a team that decided to store the local application settings in the database. They have an identity column for each setting, a section name, a key name, and a key value. They have a stored procedure (another holy war) to save a setting that ensures it does not appear twice. I have yet to find a case where I would use a setting's ID. I have, however, ended up with multiple records with the same section and key name that caused my application to fail. And yes, I know that could have been avoided by defining a constraint on the columns.
A few points should be considered before deciding on keys for a table:

1. A numeric key is more suitable when you use references (foreign keys); since you are not using foreign keys, it's OK in your case to use a non-numeric key.
2. Non-numeric keys use more space than numeric keys, which can decrease performance.
3. Numeric keys make the DB look simpler to understand (you can easily tell the number of rows just by looking at the last row).
You NEVER know when the company you work for will suddenly explode in growth and you have to hire 5 developers overnight. Your best bet is to use numeric (integer) primary keys because they will be much easier for the entire team to work with AND will help your performance if and when the database grows. If you have to break records out and partition them, you might want to use the primary key. If you are adding records with a datetime stamp (as every table should), and there is an error somewhere in the code that updates that field incorrectly, the only way to confirm whether the record was entered in the proper sequence is to check the primary keys. There are probably 10 more TSQL or debugging reasons to use INT primary keys, not the least of which is writing a simple query to select the last 5 records entered into the table.

Can you use auto-increment in MySQL without it being the primary key

I am using GUIDs as the primary key for all my other tables, but I have a requirement that needs an incrementing number. I tried to create a field in the table with auto-increment, but MySQL complained that it needed to be the primary key.
My application uses MySQL 5, with NHibernate as the ORM.
Possible solutions I have thought of are:
change the primary key to the auto-increment field but still have the Id as a GUID so the rest of my app is consistent.
create a composite key with both the GUID and the auto-increment field.
My thoughts at the moment are leaning towards the composite key idea.
EDIT: The row ID (primary key) is currently the GUID. I would like to add an INT field that is auto-incremented so that it is human-readable. I just didn't want to move away from the app's current standard of having GUIDs as primary keys.
A GUID value is intended to be unique across tables and even databases, so make the auto_increment column the primary key and put a UNIQUE index on the GUID.
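A sketch of that layout (the table name is made up):

CREATE TABLE `orders` (
  `seq`  INT NOT NULL AUTO_INCREMENT,  -- human-readable incrementing number
  `guid` CHAR(36) NOT NULL,            -- identifier the rest of the app keys on
  PRIMARY KEY (`seq`),
  UNIQUE KEY `uq_orders_guid` (`guid`)
) ENGINE=InnoDB;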
I would lean the other way.
Why? Because creating a composite key gives the impression to the next guy who comes along that it's OK to have the same GUID in the table twice but with different sequence numbers.
A couple of thoughts:
If your GUID is auto-incremental and unique, why not let it be the actual primary key?
On the other hand, you should never make semantic decisions based on programmatic problems: you have a problem with MySQL, not with the design of your DB.
So, a couple of workarounds here:
Creating a trigger that would set the GUID to the proper value once it's inserted. That's a MySQL solution to a MySQL problem, without altering semantics for your schema.
Before inserting, start a transaction (make sure auto commit is set to false), find out the latest GUID, increment and insert with the new value. In other words, auto-increment not automatically :P
GUIDs are not intended to be orderable, which is why AUTO_INCREMENT on them does not make sense.
You may, though, use an AUTO_INCREMENT for the second column of a composite primary key in MyISAM tables: create a composite key over (GUID, INT) columns and make the second column AUTO_INCREMENT.
To generate a new GUID, just call UUID() in an INSERT statement or in a trigger.
No, only the primary key can have auto_increment as its value.
If, for some reason, you can't change the identity column to be a primary key, what about manually generating the auto-increment via some kind of SEQUENCE table, plus a trigger that queries the SEQUENCE table for the next value to use and assigns it to the destination table? Same effect. The only question I would have is whether the auto-incremented value will make it back through NHibernate without a re-select of the table.
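A rough sketch of that sequence-table idea (all names are hypothetical, and the code is a sketch rather than a tested implementation):

CREATE TABLE `sequences` (
  `name` VARCHAR(32) NOT NULL PRIMARY KEY,
  `val`  INT NOT NULL
);
INSERT INTO `sequences` VALUES ('orders_seq', 0);

DELIMITER //
CREATE TRIGGER `orders_bi`
BEFORE INSERT ON `orders`
FOR EACH ROW
BEGIN
  -- With InnoDB, the UPDATE locks the counter row until the transaction
  -- ends, so two sessions can't grab the same value.
  UPDATE `sequences` SET `val` = `val` + 1 WHERE `name` = 'orders_seq';
  SET NEW.`seq` = (SELECT `val` FROM `sequences` WHERE `name` = 'orders_seq');
END//
DELIMITER ;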

Any way to enforce numeric primary key size limit in sql?

I'd like to create a table which has an integer primary key limited between 000 and 999. Is there any way to enforce this 3 digit limit within the sql?
I'm using sqlite3.
Thanks.
SQLite supports two ways of doing this:
Define a CHECK constraint on the primary key column:
CREATE TABLE mytable (
  mytable_id INT PRIMARY KEY CHECK (mytable_id BETWEEN 0 AND 999)
);
Create a trigger on the table that aborts any INSERT or UPDATE that attempts to set the primary key column to a value you don't want.
CREATE TRIGGER mytable_pk_enforcement
BEFORE INSERT ON mytable
FOR EACH ROW
WHEN NEW.mytable_id NOT BETWEEN 0 AND 999
BEGIN
  SELECT RAISE(ABORT, 'primary key out of range');
END;
If you use an auto-assigned primary key, as shown above, you may need to run the trigger AFTER INSERT instead of before insert. The primary key value may not be generated yet at the time the BEFORE trigger executes.
You may also need to write a trigger on UPDATE to prevent people from changing the value outside the range. Basically, the CHECK constraint is preferable if you use SQLite 3.3 or later.
note: I have not tested the code above.
You may be able to do so using a CHECK constraint.
But,
CHECK constraints are supported as of version 3.3.0. Prior to version 3.3.0, CHECK constraints were parsed but not enforced.
(from here)
So unless your SQLite 3 is at least version 3.3, this probably won't work.
jmisso, I would not recommend reusing primary keys that have been deleted. You can create data integrity problems that way if rows referencing that key still exist in other tables (one reason to always enforce foreign key relationships in a database: to prevent orphaned data like this). Do not do this unless you are positive that you have no orphaned data that might get attached to the new record.
Why would you even want to limit the primary key to 1000 possible values? What happens when you need 1500 records in the table? This doesn't strike me as a very good thing to even be trying to do.
What about pre-populating the table with all 1,000 rows at the start? Toggle the available rows with some kind of 1/0 column like Is_Available or similar. Then don't allow inserts or deletes, only updates. Under this scenario your app only has to be coded for updates.
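A sketch of that idea in SQLite (names are invented; the recursive CTE needs SQLite 3.8.3 or later):

CREATE TABLE slots (
  id INTEGER PRIMARY KEY CHECK (id BETWEEN 0 AND 999),
  is_available INTEGER NOT NULL DEFAULT 1,
  payload TEXT
);

-- Pre-populate all 1,000 rows:
WITH RECURSIVE n(i) AS (SELECT 0 UNION ALL SELECT i + 1 FROM n WHERE i < 999)
INSERT INTO slots (id) SELECT i FROM n;

-- "Insert" by claiming the lowest free slot:
UPDATE slots
   SET is_available = 0, payload = 'some data'
 WHERE id = (SELECT MIN(id) FROM slots WHERE is_available = 1);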