(related to Finding the lowest unused unique id in a list and Getting unused unique values on a SQL table)
Suppose I have a table containing an id column and some others (they don't make any difference here):
+-----+-----+
| id |other|
+-----+-----+
The id has a numerically increasing value. My goal is to get the lowest unused id and create that row. So of course the first time I run it, it will return 0 and the row with this id will be created. After a few executions it will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
+-----+-----+
Fairly often some of these rows might get deleted. Let's assume the rows with the ids 1 and 3 are removed. Now the table will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
If I now run the query again, I would like to get back the id 1, and this row should be created:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
The next times the query runs, it should return the ids 3, 5, 6, etc.
What's the most effective way to run this kind of query, given that I need to execute it many times per second (it is fair to assume that the ids are the only purpose of the table)? Is it possible to get the next unused row with one query? Or is it easier and faster to introduce another table which keeps track of the unused ids?
If it is significantly faster, it would also be acceptable to reuse any hole in the table (not necessarily the lowest), provided that all numbers get reused at some time.
Bonus question: I plan to use SQLite for storing this kind of information, as I don't need a database except for storing these ids. Is there any other free (as in speech) server which can do this job significantly faster?
I think I'd create a trigger on delete, and insert the old.id in a separate table.
Then you can select min(id) from that table to get the lowest id.
Disclaimer: I don't know which database engine you use, so I don't know if triggers are available to you.
Like Dennis Haarbrink said: a trigger on delete and another on insert.
The trigger on delete would take the deleted id and insert it into an id pool table (with only one column, id).
The trigger on before insert would check whether an id value is provided; if not, it would query the id pool table (e.g. SELECT MIN(id) FROM id_pool_table), assign that id to the new row, and delete it from id_pool_table.
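The id pool idea above can be sketched with Python's built-in sqlite3 module, since the question mentions SQLite. This is only a sketch under assumptions: the table names items and id_pool, the trigger name items_free_id, and the helper allocate are all made up for illustration. As far as I know, an SQLite BEFORE INSERT trigger cannot rewrite the incoming id, so the "insert" half is done in application code instead of a second trigger. It also assumes a single writer; the MAX(id) + 1 fallback is not safe with concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, other TEXT);
    CREATE TABLE id_pool (id INTEGER PRIMARY KEY);

    -- Delete trigger: remember every freed id in the pool.
    CREATE TRIGGER items_free_id AFTER DELETE ON items
    BEGIN
        INSERT INTO id_pool (id) VALUES (OLD.id);
    END;
""")


def allocate(conn, other=None):
    """Create a row with the lowest free id and return that id (hypothetical helper)."""
    with conn:  # commit on success, roll back on error
        (free_id,) = conn.execute("SELECT MIN(id) FROM id_pool").fetchone()
        if free_id is not None:
            # Reuse a hole: take the id out of the pool.
            conn.execute("DELETE FROM id_pool WHERE id = ?", (free_id,))
        else:
            # Pool is empty: use one past the current maximum (0 for an empty table).
            (free_id,) = conn.execute(
                "SELECT COALESCE(MAX(id) + 1, 0) FROM items"
            ).fetchone()
        conn.execute(
            "INSERT INTO items (id, other) VALUES (?, ?)", (free_id, other)
        )
    return free_id


first_ids = [allocate(conn) for _ in range(5)]
conn.execute("DELETE FROM items WHERE id IN (1, 3)")
next_ids = [allocate(conn) for _ in range(3)]
print(first_ids, next_ids)
```

Because id_pool.id is the primary key, SELECT MIN(id) is an index lookup rather than a scan, which is the main attraction of keeping a separate pool table.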
Normally you'd let the database handle assigning the ids. Is there a particular reason you need to have the id's sequential rather than unique? Can you, instead, timestamp them, and just number them when you display them? Or make a separate column for the sequential id, and renumber them?
Alternatively, you could not delete the rows themselves, but rather, mark them as deleted with a flag in a column, and then re-use the id's of the marked rows by finding the lowest numbered 'deleted' row, and reusing that id.
The database doesn't care if the values are sequential, only that they are unique. The desire to have your id values sequential is purely cosmetic, and if you are exposing this value to users -- it should not be your primary key, nor should there be any referential integrity based on the value because a client could change the format if desired.
The fastest and safest way to deal with id value generation is to rely on native functionality that gives you a unique integer value (e.g. SQLite's AUTOINCREMENT). Using triggers only adds overhead, and using MAX(id) + 1 is extremely risky, because two concurrent writers can read the same maximum and then try to insert the same id.
Summary
Ideally, use the native unique integer generator (SQLite/MySQL auto_increment, Oracle/PostgreSQL sequences, SQL Server IDENTITY) for the primary key. If you want a value that is always sequential, add an additional column to store that sequential value and maintain it as necessary. MySQL/SQLite/SQL Server unique integer generation only allows one such column per table; sequences are more flexible.
I have a table place2022 which has a very long CHAR column
timestamp | user_id | pixel_color | coordinate
-----------------+------------------------------------------------------------------------------------------+-------------+------------
17:38:20.021+00 | p0sXpmkcmg1KLiCdK5e4xKdudb1f8cjscGs35082sKpGBfQIw92nZ7yGvWbQ/ggB1+kkRBaYu1zy6n16yL/yjA== | #FF4500 | 371,488
17:38:20.024+00 | Ctar52ln5JEpXT+tVVc8BtQwm1tPjRwPZmPvuamzsZDlFDkeo3+ItUW89J1rXDDeho6A4zCob1MKmJrzYAjipg== | #51E9F4 | 457,493
17:38:20.025+00 | rNMF5wpFYT2RAItySLf9IcFZwOhczQhkRhmTD4gv0K78DpieXrVUw8T/MBAZjj2BIS8h5exPISQ4vlyzLzad5w== | #000000 | 65,986
17:38:20.025+00 | u0a7l8hHVvncqYmav27EARAE6ciLtpUTPXMI33lDrUmtj5Ei3ixlfRuG28KUvs7r5LpeiE/iOKPALVjkILhrYg== | #3690EA | 73,961
The user_ids are already hashes, so all I really care about here is having some sort of id column which is 1-1 with the user_id.
I've counted the number of unique user_ids, which is 10381163, which fits into 24 bits. Therefore, I can compress the id field down to a 32-bit integer using the obvious scheme of "Assign 1 to the first new user_id you see, 2 to the second new user_id you see", etc. I don't even care that the user_id's are mapped in the order that they're seen: I just need them to be mapped in an invertible manner to 32-bit ints somehow. I'd also like to persist this mapping somewhere so that, if I want to, I can go backwards.
What would be the best way to achieve this? I imagine that we could create a new table (create table place2022_user_ids as select distinct(user_id) from place2022;?) and then reverse-lookup the user_id column in that table, but I don't know quite how to formulate the queries and also make sure that I'm not doing something ridiculously slow.
I am using postgresql, if it matters.
If you have a recent (>8) version of Postgres you can add an auto increment id column to an existing table.
ALTER TABLE place2022
ADD COLUMN id SERIAL PRIMARY KEY;
NB: If the table already has a PRIMARY KEY, you will need to drop that constraint first.
See drop primary key constraint in postgresql by knowing schema and table name only
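The separate lookup table proposed in the question can be sketched as follows. This is a minimal illustration, not a tuned PostgreSQL solution: it runs on an in-memory SQLite database through Python's sqlite3 so that it is self-contained, the user_id values are shortened placeholders, and the name place2022_user_ids is taken from the question while the helper names are hypothetical. In PostgreSQL the id column would presumably be a serial or identity column instead of SQLite's INTEGER PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place2022 (user_id TEXT, pixel_color TEXT, coordinate TEXT);
    INSERT INTO place2022 VALUES
        ('hashA', '#FF4500', '371,488'),   -- placeholder hashes, not real data
        ('hashB', '#51E9F4', '457,493'),
        ('hashA', '#000000', '65,986');

    -- One small integer per distinct user_id; UNIQUE keeps the mapping 1-1
    -- and gives the reverse lookup an index.
    CREATE TABLE place2022_user_ids (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL UNIQUE
    );
    INSERT INTO place2022_user_ids (user_id)
        SELECT DISTINCT user_id FROM place2022;
""")


def compact_rows(conn):
    """Return the rows with user_id replaced by its integer id."""
    return conn.execute("""
        SELECT m.id, p.pixel_color, p.coordinate
        FROM place2022 AS p
        JOIN place2022_user_ids AS m ON m.user_id = p.user_id
        ORDER BY p.rowid
    """).fetchall()


def original_user_id(conn, small_id):
    """Go backwards: integer id to the original hash."""
    row = conn.execute(
        "SELECT user_id FROM place2022_user_ids WHERE id = ?", (small_id,)
    ).fetchone()
    return row[0] if row else None


rows = compact_rows(conn)
print(rows)
```

Keeping the mapping in its own table is what makes the scheme invertible: the wide hash is stored once per user, and every other table only needs the integer.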
Is there a way to tell NHibernate to remove the duplicate value from a row's uniquely constrained column when another row is updated to that same value?
For example (OtherId and Animal have a composite unique constraint):
Id | OtherId | Animal
------------
1 | 1 | Dog
2 | 1 | Cat
3 | 1 | Bear
4 | 2 | Dog
Updating Id 3 to Dog, should result in this
Id | OtherId | Animal
1 | 1 | NULL
2 | 1 | Cat
3 | 1 | Dog
4 | 2 | Dog
EDIT:
I was able to solve my problem by creating an unique index in my table
CREATE UNIQUE INDEX [Id_OtherId_Animal_Index]
ON [dbo].[Animals] (OtherId, Animal)
WHERE Animal IS NOT NULL;
This way, I prevent insertion of duplicate (1, Dog) and still allow (2, Dog). This will also allow multiple (1, NULL) to be inserted.
Next, based on Frédéric's suggestion below, I edited my service layer to check BEFORE the insert or update whether it would create a duplicate. If it would, I first set the Animal column of the existing conflicting row to NULL.
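The combination described in this edit (a filtered unique index plus a service-layer check) can be tried out with the sketch below. It is an assumption-laden illustration: it uses SQLite via Python's sqlite3 rather than SQL Server and NHibernate, relying on SQLite's support for partial indexes with the same WHERE clause, and set_animal is a hypothetical helper standing in for the service layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Animals (Id INTEGER PRIMARY KEY, OtherId INTEGER, Animal TEXT);

    -- Uniqueness is enforced only for rows where Animal is not NULL.
    CREATE UNIQUE INDEX Id_OtherId_Animal_Index
        ON Animals (OtherId, Animal)
        WHERE Animal IS NOT NULL;

    INSERT INTO Animals VALUES
        (1, 1, 'Dog'), (2, 1, 'Cat'), (3, 1, 'Bear'), (4, 2, 'Dog');
""")


def set_animal(conn, row_id, animal):
    """Hypothetical service-layer update: clear the conflicting row first."""
    with conn:  # both statements succeed or fail together
        (other_id,) = conn.execute(
            "SELECT OtherId FROM Animals WHERE Id = ?", (row_id,)
        ).fetchone()
        # NULL out any other row in the same OtherId group holding this value.
        conn.execute(
            "UPDATE Animals SET Animal = NULL "
            "WHERE OtherId = ? AND Animal = ? AND Id <> ?",
            (other_id, animal, row_id),
        )
        conn.execute(
            "UPDATE Animals SET Animal = ? WHERE Id = ?", (animal, row_id)
        )


set_animal(conn, 3, "Dog")
rows = conn.execute(
    "SELECT Id, OtherId, Animal FROM Animals ORDER BY Id"
).fetchall()
print(rows)
```

Doing both updates in one transaction matters here: if the second update fails for any reason, the first row keeps its original value.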
This answer has been outdated by substantial changes in OP question
I am quite sure there is no such feature in NHibernate, or any other ORM.
By the way, what should updating Id 3 to Cat yield, after having updated it to Dog?
Id | Animal
1 |
2 |
3 | Cat
If that means that Ids 1 and 2 now have the empty string value, that will be a unique constraint violation too.
If they have the null value, it then depends on whether the db engine is ANSI null compliant (null not considered equal to null). SQL Server is not, in any version I know of, at least for unique indexes. (I have not tested the unique constraint case.)
Anyway, this kind of logic, updating a row resulting in an additional update on some other rows, has to be handled explicitly.
You have many options for that:
Before each assignment to the Animal property, query the db to find out whether another row has that name, and take appropriate action on that other row. Take care to flush right after handling the other row, to ensure it is handled before the actual update of the first one.
Or inject an event or an interceptor in NHibernate for catching any update on any entities, and add there your check for duplicates. Stack Overflow has examples of NHibernate events or interceptors, like this one.
But your case will probably be a bit tough, since flushing some other changes while already flushing a session will probably cause trouble. You may have to tinker directly with the SQL statement, for example with IInterceptor.OnPrepareStatement, to inject your other update ahead of it.
Or handle that with some trigger in DB.
Or detect a failed flush due to a unique constraint, analyze it and take appropriate action.
The third option is very likely easier and more robust than the others.
I create the following table
CREATE TABLE dogs (
    id serial,
    name VARCHAR(15),
    age integer
);
I have the table looking like this
Table "public.birds"
Column | Type | Modifiers
---------+-----------------------+-------------------------------------
id | integer | not null default nextval('birds_id_seq'::regclass)
name | character varying(25) |
age | integer |
I insert two rows
INSERT INTO dogs (name, age)
VALUES ('puffy', 13),
('fluffy', 15);
The table now looks like this
id | name | age
----+--------+-----
1 | puffy | 13
2 | fluffy | 15
(2 rows)
Then I delete the row with id = 2
DELETE FROM dogs WHERE id = 2;
And add another row instead
INSERT INTO dogs (name, age) VALUES('mimi', 20);
The table is
id | name | age
----+-------+-----
1 | puffy | 13
3 | mimi | 20
(2 rows)
My question is: why is the id of the new second row 3 and not 2? I guess that something underneath stores the last value in memory, and it doesn't matter that the row with that id was deleted. I know I can insert a value for id explicitly if I need to, but I would like to understand why it happens in this case.
And what functionality or feature is responsible for that? How does it work?
PostgreSQL makes no effort to keep track of deleted sequence IDs. It just uses a counter to get the next ID to generate.
Gaps will also appear if you generate values then ROLLBACK a transaction, the client connection crashes before committing, or the server crashes.
The only property you can rely on from generated IDs is uniqueness. You cannot even rely on them appearing in the table in the same order they're generated, since commit order isn't necessarily the same as ID allocation order.
If you need gapless sequences there are ways to implement them, but they have terrible performance in concurrent write loads. That's why PostgreSQL does things the way it does.
For more info, Google "gapless sequence postgresql" and read the documentation chapter on sequences and the "nextval" function.
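The counter behaviour described above can be reproduced without a PostgreSQL server. The sketch below is an analogue, not PostgreSQL itself: it uses SQLite's AUTOINCREMENT through Python's sqlite3, which also keeps a separate counter and does not reuse ids of deleted rows. The other gap sources mentioned (rollbacks, crashes) are specific to PostgreSQL sequences and are not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- AUTOINCREMENT makes SQLite track the highest id ever handed out,
    -- playing the role of the sequence behind PostgreSQL's serial.
    CREATE TABLE dogs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(15),
        age INTEGER
    );
    INSERT INTO dogs (name, age) VALUES ('puffy', 13), ('fluffy', 15);
    DELETE FROM dogs WHERE id = 2;
    INSERT INTO dogs (name, age) VALUES ('mimi', 20);
""")

ids = [row[0] for row in conn.execute("SELECT id FROM dogs ORDER BY id")]
print(ids)
```

The delete only touches the table; the counter is stored elsewhere and keeps moving forward, which is why the freed id 2 is skipped.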
I'm designing a db in PostgreSQL that primarily stores info about different people. I'd like to associate a log with each person, consisting of the date and a text entry. Logs can have arbitrary numbers of entries. Here are the ideas I've toyed with:
What I think I want is a log_table like this:
person_id | row_num | row_date | row_text
-----------------------------------------
1 | 1 | 01/01/12 | Blah...
2 | 1 | 01/02/12 | Foo...
1 | 2 | 01/04/12 | Bar...
But I don't know how to get row_num to increment properly; it should default to one more than the largest current row_num for that person_id. In other words, the row_nums for a given person_id should be sequential.
Or I can just have row_num increment regardless of person_id so that every log entry has a distinct row number. But it doesn't seem very satisfying to have person_id 1's log jump from row 1 to row 3, and this could also make errors hard to spot.
My last idea is to include the log directly in the person table, by making a composite type log_entry = (date, text). Then a column log in the person table can store an array:
person_id | name | log
----------------------
1 | Bob | {(01/01/12, Blah...), (01/04/12, Bar...)}
But this seems cumbersome.
So my questions are, a) which solution if any is good design; b) any way to solve the auto-incrementing problem for solution 1? If it matters, this is a small db for personal use; I want good structure but it's highly likely I'll be the only user. Thanks so much for any help!
Why don't you use a timestamp to store the time when the row has been inserted?
That way you don't need the extra row_num column in the table, and you can always "calculate" it on the fly:
SELECT person_id,
row_number() over (partition by person_id order by row_timestamp) as row_num,
row_timestamp,
row_text
FROM log_table
Of course, if there is a chance that a user generates more than one entry per microsecond, you might wind up with log entries with exactly the same timestamp.
But even in a busy system this is quite unlikely (though not impossible).
If you can't (or don't want to) use a timestamp, you can always use a sequence that increments for all users and then use the row_number() function to generate a gapless row number during retrieval (as shown above, just order by the column populated by the sequence).
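The sequence variant from the last paragraph can be sketched like this. It is an illustration under assumptions: it runs on SQLite (3.25 or newer, for window functions) through Python's sqlite3 instead of PostgreSQL, and the column entry_id and the helper numbered_log are names invented here. The dates are kept as the plain strings from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_table (
        entry_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- global counter, gaps allowed
        person_id INTEGER NOT NULL,
        row_date TEXT,
        row_text TEXT
    );
    INSERT INTO log_table (person_id, row_date, row_text) VALUES
        (1, '01/01/12', 'Blah...'),
        (2, '01/02/12', 'Foo...'),
        (1, '01/04/12', 'Bar...');
""")


def numbered_log(conn):
    """Compute a gapless per-person row_num at query time."""
    return conn.execute("""
        SELECT person_id,
               row_number() OVER (
                   PARTITION BY person_id ORDER BY entry_id
               ) AS row_num,
               row_date,
               row_text
        FROM log_table
        ORDER BY person_id, row_num
    """).fetchall()


rows = numbered_log(conn)
print(rows)
```

Since row_num is never stored, deleting an entry cannot leave a hole: the remaining entries are simply renumbered the next time the query runs.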
Suppose I have a table containing the following data:
Name | Things
-------------
Foo | 5
Bar | 3
Baz | 8
If I want to insert a row, so that the final state of the table is:
Name | Things
-------------
Foo | 5
Qux | 6
Bar | 3
Baz | 8
Is this possible?
I understand we don't typically rely on the order of rows in a table, but I've inherited some code that does precisely that. If I can insert to a location, I can avoid a significant refactor.
As you say, you can't rely on the order of rows in a table (without an ORDER BY).
I would probably refactor the code anyway - there's a chance it will break with no warning at some point in the future - surely better to deal with it now under controlled circumstances?
I would add a FLOAT column to the table. If you wanted to insert a row between the rows whose values in that column were 7.00000 and 8.00000 respectively, your new row would get the value 7.50000. If you then wanted to insert a row between 7.00000 and 7.50000, the new row would get 7.25000, and so on. Then, when you order by that column, you get the rows in the desired order. Fairly easy to retrofit, but not as robust as one likes things to be. You should revoke all update/insert permissions on the table and handle I/O via a stored procedure.
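The midpoint scheme can be sketched as below, using an in-memory SQLite database through Python's sqlite3. The table name items, the column position and the helper insert_between are hypothetical names chosen for this example, and the helper stands in for the stored procedure suggested above. One known limit of the approach: repeated halving between the same two neighbours eventually exhausts floating-point precision, at which point the positions need to be renumbered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (name TEXT PRIMARY KEY, things INTEGER, position REAL);
    INSERT INTO items VALUES ('Foo', 5, 1.0), ('Bar', 3, 2.0), ('Baz', 8, 3.0);
""")


def insert_between(conn, name, things, before_name, after_name):
    """Insert a row whose position is the midpoint of its two neighbours."""
    with conn:
        (lo,) = conn.execute(
            "SELECT position FROM items WHERE name = ?", (before_name,)
        ).fetchone()
        (hi,) = conn.execute(
            "SELECT position FROM items WHERE name = ?", (after_name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO items (name, things, position) VALUES (?, ?, ?)",
            (name, things, (lo + hi) / 2),
        )


insert_between(conn, "Qux", 6, "Foo", "Bar")
names = [r[0] for r in conn.execute("SELECT name FROM items ORDER BY position")]
print(names)
```

Every reader of the table still has to use ORDER BY position; the column only makes the intended order explicit, it does not change how rows are physically stored.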