I have a question about the insert mechanism in different databases. Suppose a table has a single-column primary key that is automatically generated (like an identity column): will the entire table become locked when inserting a new record? And if the insert takes a long time, will other transactions have to wait longer?
By default Oracle uses row-level locks.
These locks block only writers (UPDATE, DELETE, INSERT, etc.). That means SELECT keeps working all the time, even while the table is being heavily updated or deleted from.
For example, take tableA(col1 number, col2 number) with this data in it:
col1 | col2
1 | 10
2 | 20
3 | 30
If user John issues at time1:
update tableA set col2=11 where col1=1;
it will lock row 1.
At time2 user Mark issues:
update tableA set col2=22 where col1=2;
The update will work, because row 2 is not locked.
Now the table looks like this in the database:
col1 | col2
1 | 11 --locked by John
2 | 22 --locked by Mark
3 | 30
For Mark the table is (he does not see John's uncommitted changes):
col1 | col2
1 | 10
2 | 22
3 | 30
For John the table is (he does not see Mark's uncommitted changes):
col1 | col2
1 | 11
2 | 20
3 | 30
If Mark tries at time3:
update tableA set col2=12 where col1=1;
his session will hang until time4, when John issues a commit. (A rollback would also unlock the rows, but the changes would be lost.)
The table is (in the database, at time4):
col1 | col2
1 | 11
2 | 22 --locked by Mark
3 | 30
Immediately after John's commit, row 1 is unlocked and Mark's update does its job:
col1 | col2
1 | 12 --locked by Mark
2 | 22 --locked by Mark
3 | 30
Let's say Mark issues a rollback at time5:
col1 | col2
1 | 11
2 | 20
3 | 30
The insert case is simpler: inserted rows are locked, but they are also not visible to other users because they are not committed. When the user commits, he also releases the locks, so other users can then see those rows, update them, or delete them.
EDIT: As Jeffrey Kemp explained, when you have a PK (implemented in Oracle with a unique index), if two users try to insert the same value (which would be a duplicate), the locking happens in the index. The second session is blocked until the first session ends, because it tries to write to the same place. If the first session commits, the second throws a "primary key violated" exception and fails to change the database. If the first session rolls back, the second succeeds (if no other problem appears).
(NB: In this explanation, by "user John" I mean a session started by user John.)
Inserting will not lock the table. The inserted records will not be visible to other sessions until you commit.
Your question is relevant to any case where you are inserting into a table with any unique constraint. If there were no index, and you inserted a row into the table, you'd expect the database to need to lock the entire table - otherwise duplicates might be inserted in a multi-user system.
However, Oracle always polices unique constraints with an index. This means that the data for the column is always sorted, and it can quickly and easily determine whether a conflicting row already exists. To protect against multiple sessions trying to insert the same value at the same time, Oracle will just lock the block in the index for that value - in this way, you won't get contention for the whole table, only for the particular value you're inserting. And since an index lookup is typically very fast, the lock will only need to be held for a very small period of time.
(But now, you might ask, what if a session inserts a value but doesn't commit straight away? What if another session tries to insert the same value? The answer is, the second session will wait. This is because it will request a lock on the same index block, but since the first session hasn't committed yet, the block will still be locked. It must wait because it cannot know if the first session will commit or rollback.)
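A minimal sketch of that two-session scenario (the table name t and its columns are made up for illustration, not taken from the question):

-- Session 1
CREATE TABLE t (id NUMBER PRIMARY KEY, val VARCHAR2(10));
INSERT INTO t (id, val) VALUES (1, 'first');    -- not committed yet

-- Session 2, in a separate connection
INSERT INTO t (id, val) VALUES (1, 'second');   -- blocks: same PK value as session 1
-- If session 1 commits, session 2 fails with ORA-00001 (unique constraint violated).
-- If session 1 rolls back, session 2's insert succeeds.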
Is there a way to tell NHibernate, when updating a row with a value that would duplicate another row's uniquely constrained column, to remove (NULL out) that duplicate value on the other row?
For example (OtherId and Animal have a composite unique constraint):
Id | OtherId | Animal
---+---------+-------
 1 | 1       | Dog
 2 | 1       | Cat
 3 | 1       | Bear
 4 | 2       | Dog
Updating Id 3 to Dog should result in this:
Id | OtherId | Animal
---+---------+-------
 1 | 1       | NULL
 2 | 1       | Cat
 3 | 1       | Dog
 4 | 2       | Dog
EDIT:
I was able to solve my problem by creating a unique filtered index on my table:
CREATE UNIQUE INDEX [Id_OtherId_Animal_Index]
ON [dbo].[Animals] (OtherId, Animal)
WHERE Animal IS NOT NULL;
This way I prevent insertion of a duplicate (1, Dog) while still allowing (2, Dog). It also allows multiple (1, NULL) rows to be inserted.
Next, based on Frédéric's suggestion below, I edited my service layer to check BEFORE insertion whether it would create a duplicate. If it would, the Animal column that would violate the unique constraint is set to NULL.
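A rough sketch of that service-layer check as plain SQL (the @ parameters are hypothetical values supplied by the service layer; this is not the OP's actual code):

-- Clear the Animal value on any other row that would collide with the incoming values,
-- then perform the intended INSERT/UPDATE with (@OtherId, @Animal).
UPDATE dbo.Animals
SET    Animal = NULL
WHERE  OtherId = @OtherId
  AND  Animal  = @Animal
  AND  Id     <> @Id;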
This answer has been outdated by substantial changes to the OP's question.
I am quite sure there is no such feature in NHibernate, or any other ORM.
By the way, what should updating Id 3 to Cat yield, after it has already been updated to Dog?
Id | Animal
1 |
2 |
3 | Cat
If that means Id 1 and 2 now have the empty string value, that will be a unique constraint violation too.
If they have the NULL value, it then depends on whether the db engine is ANSI-null compliant (NULL not considered equal to NULL). That is not the case for SQL Server, in any version I know of, at least for unique indexes. (I have not tested the unique constraint case.)
Anyway, this kind of logic, where updating one row results in an additional update to some other rows, has to be handled explicitly.
You have many options for that:
Before each assignment to the Animal property, query the db to find out whether another entity already has that value, and take appropriate action on that other entity. Take care to flush right after handling it, to ensure it gets handled before the actual update of the first one.
Or inject an event or an interceptor into NHibernate to catch any update on any entity, and add your duplicate check there. Stack Overflow has examples of NHibernate events and interceptors, like this one.
But your case will probably be a bit tough, since flushing other changes while a session is already flushing will probably cause trouble. You may have to tinker directly with the SQL statement, for example with IInterceptor.OnPrepareStatement, to inject your other update ahead of it.
Or handle that with some trigger in the DB (a rough sketch is given after this list).
Or detect a flush that failed due to a unique constraint violation, analyze it and take appropriate action.
The third option is very likely easier and more robust than the others.
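For illustration, here is a rough T-SQL sketch of the trigger option (my own outline, assuming the dbo.Animals table from the question and that Id itself is never updated; a sketch, not a tested implementation):

CREATE TRIGGER trg_Animals_DedupOnUpdate
ON dbo.Animals
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- NULL out the Animal value on rows that would conflict with the incoming values
    UPDATE a
    SET    a.Animal = NULL
    FROM   dbo.Animals a
    JOIN   inserted i
      ON   a.OtherId = i.OtherId
     AND   a.Animal  = i.Animal
     AND   a.Id     <> i.Id;

    -- Then apply the original update
    UPDATE a
    SET    a.OtherId = i.OtherId,
           a.Animal  = i.Animal
    FROM   dbo.Animals a
    JOIN   inserted i ON a.Id = i.Id;
END;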
I create the following table:
CREATE TABLE dogs (
    id serial,
    name VARCHAR(15),
    age integer
);
I have the table looking like this
Table "public.birds"
Column | Type | Modifiers
---------+-----------------------+-------------------------------------
id | integer | not null default nextval('birds_id_seq'::regclass)
name | character varying(25) |
age | integer |
I insert two rows
INSERT INTO dogs (name, age)
VALUES ('puffy', 13),
('fluffy', 15);
The table now looks like this
id | name | age
----+--------+-----
1 | puffy | 13
2 | fluffy | 15
(2 rows)
Then I delete the row with id = 2
DELETE FROM dogs WHERE id = 2;
And add another row instead
INSERT INTO dogs (name, age) VALUES('mimi', 20);
The table is
id | name | age
----+-------+-----
1 | puffy | 13
3 | mimi | 20
(2 rows)
My question is: why is the id in the new row 3 and not 2? I guess that somewhere underneath something stores the last value in memory, regardless of whether the row with that id was deleted. I know I can insert a value for id explicitly if I need to, but I would like to understand why this happens.
What functionality or feature is responsible for this? How does it work?
PostgreSQL makes no effort to keep track of deleted sequence IDs. It just uses a counter to get the next ID to generate.
Gaps will also appear if you generate values then ROLLBACK a transaction, the client connection crashes before committing, or the server crashes.
The only property you can rely on from generated IDs is uniqueness. You cannot even rely on them appearing in the table in the same order they're generated, since commit order isn't necessarily the same as ID allocation order.
If you need gapless sequences there are ways to implement them, but they have terrible performance in concurrent write loads. That's why PostgreSQL does things the way it does.
For more info, Google "gapless sequence postgresql" and read the documentation chapter on sequences and the "nextval" function.
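For example, the rollback case can be seen directly with the dogs table above (the exact numbers depend on where the sequence currently stands):

BEGIN;
INSERT INTO dogs (name, age) VALUES ('rex', 4);  -- draws the next value (say 4) from dogs_id_seq
ROLLBACK;                                        -- the row is discarded, but the sequence is not rewound

INSERT INTO dogs (name, age) VALUES ('rex', 4);  -- this row gets id 5, leaving a gap at 4
SELECT currval('dogs_id_seq');                   -- 5: the last value generated in this session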
I have a table which is basically a 'metadata table': it records changes made to the data of another table. The caveat, due to a design flaw, is that we can't do SQL UPDATE operations and have to delete then re-insert rows when just one field of a row changes, and each field is recorded as its own row. For example:
Table Customers:
Customer ID | Customer Name | Customer Address
------------+---------------+------------------
001         | John F        | 213 Privet Drive
002         | Kyle A        | 16 Gammon Road
Table Customers-History:
TIMESTAMP | OPERATION | FIELD NAME | FIELD VALUE
1-Dec-2010 19:54:1232| INSERT | CUSTOMER ID | 001
1-Dec-2010 19:54:1232| INSERT | CUSTOMER NAME | Kyle A
1-Dec-2010 19:54:1500| INSERT | CUSTOMER ADDRESS | 10 Gammon Road
2-Dec-2010 09:54:9432| DELETE | CUSTOMER ID | 001
2-Dec-2010 09:54:9500| DELETE | CUSTOMER NAME | Kyle A
2-Dec-2010 09:54:9600| DELETE | CUSTOMER ADDRESS | 10 Gammon Road
2-Dec-2010 09:54:9800| INSERT | CUSTOMER ID | 001
2-Dec-2010 09:54:9900| INSERT | CUSTOMER NAME | Kyle A
2-Dec-2010 09:54:9600| INSERT | CUSTOMER ADDRESS | 16 Gammon Road
2-Dec-2010 09:55:9921| DELETE | CUSTOMER NAME | Josh C
2-Dec-2010 09:55:9925| DELETE | CUSTOMER ADDRESS | 2 Agin Court
So from the example above, we see that a customer called Kyle A who lives at 10 Gammon Road was inserted, and the next day the address was updated to 16 Gammon Road. A while later, customer Josh C is deleted. You will notice that although only the customer address was edited, because the whole row was removed the customer name was also registered as removed and re-inserted. The name field therefore looks as if it was updated, but it actually was not - it was just part of the edit to the customer address.
I want to group the delete-insert operation pairs as updates, based on the timestamp of the operation and the field name, show the user that it was actually an update operation, and maybe only show the fields which were really updated - in this case, hiding the results for customer address.
My question is: is this even possible at the SQL level (since that returns results the quickest)? If not, what strategies could I explore in my SQL query to return the smallest relevant result set possible, which could then be passed on to another component for processing?
If I understood correctly, an INSERT or DELETE is considered to be an UPDATE if and only if:
[1] INSERT as UPDATE - the record was INSERTed after it was DELETEd
[2] DELETE as UPDATE - the record was DELETEd before it was INSERTed
[3] Under all other circumstances, you would want to present the operation as a standalone INSERT or DELETE, whichever the case may be.
While [3] is straightforward, [1] and [2] pose a problem, as you need a key to identify whether the same record got inserted after it was deleted, so that you can mark it as an update. The "same" part can only be ensured if there is a key that does not change as part of the INSERT or DELETE. Due to the nature of this problem, I think the key has to be NATURAL rather than SURROGATE.
The following strategy can be used in such a scenario:
Step 1: Query all PK rows together with DELETE-PRECEDING and INSERT-FOLLOWING flag data
Step 2: Mark as UPDATE every INSERT or DELETE record that satisfies our criteria
Step 3: Flatten UPDATE records that currently span two rows [due to Step 2] into a single record
Step 4: Use UNION to retrieve the other fields (Customer Address in your example) based on the timestamp of the operation in the result set of Step 3.
Assuming Customer Name is that key, for lack of a better column in your example, the following pseudo-SQL outlines the method for the core of the problem [Steps 1 & 2 only; Steps 3 and 4 should be simple enough and can be added in outer queries]:
SELECT /* 2nd step: change OPERATION to 'UPDATE' if this INSERT follows a DELETE, or this DELETE is followed by an INSERT */
CASE
WHEN
OPERATION='INSERT' AND IS_DELETE_PRECEDING_FLAG=1
THEN
'UPDATE'
WHEN
OPERATION='DELETE' AND IS_INSERT_FOLLOWING_FLAG=1
THEN
'UPDATE'
ELSE OPERATION
END AS OPERATION,
FIELDNAME,
FIELDVAL,
OPDATE
FROM
(
SELECT /* 1st step: Query all PK records and also the flag information commented below */
OPERATION,
FIELDNAME,
FIELDVAL,
OPDATE,
IS_DELETE_PRECEDING_FLAG, --Pseudo Column using Oracle analytics, grouped using the customer name. Is 1 when there is a DELETE preceding, 0 otherwise
IS_INSERT_FOLLOWING_FLAG --Pseudo column using Oracle analytics, grouped using the customer name. Is 1 when there is an INSERT following, 0 otherwise
FROM
CUST_TAB_HIST
WHERE
FIELDNAME='CUSTOMER NAME' --Assuming this is the primary key
)
;
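One possible way (my assumption, not part of the original answer) to compute the two flag pseudo-columns is with Oracle's LAG/LEAD analytic functions, partitioned by the key value:

SELECT OPERATION,
       FIELDNAME,
       FIELDVAL,
       OPDATE,
       CASE WHEN LAG(OPERATION)  OVER (PARTITION BY FIELDVAL ORDER BY OPDATE) = 'DELETE'
            THEN 1 ELSE 0 END AS IS_DELETE_PRECEDING_FLAG,
       CASE WHEN LEAD(OPERATION) OVER (PARTITION BY FIELDVAL ORDER BY OPDATE) = 'INSERT'
            THEN 1 ELSE 0 END AS IS_INSERT_FOLLOWING_FLAG
FROM   CUST_TAB_HIST
WHERE  FIELDNAME = 'CUSTOMER NAME';  -- assuming this is the natural key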
Suppose I have a table containing the following data:
Name | Things
-------------
Foo | 5
Bar | 3
Baz | 8
If I want to insert a row, so that the final state of the table is:
Name | Things
-------------
Foo | 5
Qux | 6
Bar | 3
Baz | 8
Is this possible?
I understand we don't typically rely on the order of rows in a table, but I've inherited some code that does precisely that. If I can insert at a specific location, I can avoid a significant refactor.
As you say, you can't rely on the order of rows in a table (without an ORDER BY).
I would probably refactor the code anyway - there's a chance it will break with no warning at some point in the future - surely better to deal with it now under controlled circumstances?
I would add a FLOAT column to the table. If you wanted to insert a row between the rows whose values in that column were 7.00000 and 8.00000 respectively, your new row would get the value 7.50000. If you then wanted to insert a row between 7.00000 and 7.50000, the new row would get 7.25000, and so on. Then, when you order by that column, you get the rows in the desired order. This is fairly easy to retrofit, but not as robust as one likes things to be, so you should revoke all update/insert permissions on the table and handle I/O via a stored procedure.
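A minimal sketch of the idea, using the data from the question (the table name items and column name sort_key are made up for illustration):

ALTER TABLE items ADD sort_key FLOAT;
-- Existing rows get 1.0, 2.0, 3.0 for Foo, Bar, Baz.

-- To place Qux "between" Foo (1.0) and Bar (2.0):
INSERT INTO items (Name, Things, sort_key) VALUES ('Qux', 6, 1.5);

-- Read back in the desired order:
SELECT Name, Things FROM items ORDER BY sort_key;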
(related to Finding the lowest unused unique id in a list and Getting unused unique values on a SQL table)
Suppose I have a table containing an id column and some others (they don't make any difference here):
+-----+-----+
| id |other|
+-----+-----+
The id is a numerically increasing value. My goal is to get the lowest unused id and create a row with it. So of course the first time I run it, it will return 0 and that row will be created. After a few executions it will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
+-----+-----+
Fairly often some of these rows will get deleted. Let's assume the rows with ids 1 and 3 are removed. Now the table will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
If I now run the query again, it should return id 1, and that row should be created:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
The next times the query runs, it should return ids 3, 5, 6, etc.
What's the most effective way to run this kind of query, given that I need to execute it several times per second (it is fair to assume the ids are the only purpose of the table)? Is it possible to get the next unused id and create the row with one query? Or is it easier and faster to introduce another table which keeps track of the unused ids?
If it is significantly faster, it is also acceptable to reuse any hole in the table, provided that all numbers get reused at some point.
Bonus question: I plan to use SQLite for storing this kind of information, as I don't need a database for anything except storing these ids. Is there any other free (as in speech) server which can do this job significantly faster?
I think I'd create a trigger on delete, and insert the old.id into a separate table.
Then you can SELECT MIN(id) from that table to get the lowest id.
Disclaimer: I don't know what database engine you use, so I don't know if triggers are available to you.
Like Dennis Haarbrink said: a trigger on delete and another on insert:
The trigger on delete would take the deleted id and insert it into an id pool table (a single id column).
The before-insert trigger would check whether an id value is provided; otherwise it queries the id pool table (e.g. SELECT MIN(id) FROM id_pool_table), assigns that id, and deletes it from the id_pool_table. A rough sketch of this follows.
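A rough SQLite sketch of that id-pool idea (the table names my_table and id_pool are illustrative; note that SQLite triggers cannot rewrite the NEW values of an INSERT, so the id lookup is shown as application-side SQL):

CREATE TABLE id_pool (id INTEGER PRIMARY KEY);

-- When a row is deleted, park its id in the pool
CREATE TRIGGER recycle_id AFTER DELETE ON my_table
BEGIN
    INSERT INTO id_pool (id) VALUES (old.id);
END;

-- To insert a new row: pick the lowest pooled id, or max(id)+1 if the pool is empty
SELECT COALESCE((SELECT MIN(id) FROM id_pool),
                (SELECT COALESCE(MAX(id), -1) + 1 FROM my_table));
-- ...insert the row with that id, then remove it from the pool:
-- DELETE FROM id_pool WHERE id = <chosen id>;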
Normally you'd let the database handle assigning the ids. Is there a particular reason you need the ids to be sequential rather than just unique? Could you instead timestamp them and just number them when you display them? Or add a separate column for the sequential id and renumber them?
Alternatively, you could avoid deleting the rows themselves and instead mark them as deleted with a flag column, then reuse the ids of the marked rows by finding the lowest-numbered 'deleted' row and reusing its id.
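If the sequential numbers are only needed for display, they can also be computed at read time rather than stored; a sketch using a window function (supported by SQLite 3.25+ among others; the table name my_table is illustrative):

SELECT ROW_NUMBER() OVER (ORDER BY id) AS display_no,
       id,
       other
FROM   my_table;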
The database doesn't care whether the values are sequential, only that they are unique. The desire to have your id values sequential is purely cosmetic, and if you are exposing this value to users, it should not be your primary key, nor should any referential integrity be based on it, because a client could change the format if desired.
The fastest and safest way to deal with id value generation is to rely on native functionality that gives you a unique integer value (i.e. SQLite's AUTOINCREMENT). Using triggers only adds overhead, and using MAX(id) + 1 is extremely risky...
Summary
Ideally, use the native unique integer generator (SQLite/MySQL auto_increment, Oracle/PostgreSQL sequences, SQL Server IDENTITY) for the primary key. If you want a value that is always sequential, add an additional column to store that sequential value and maintain it as necessary. MySQL/SQLite/SQL Server unique integer generation only allows one such column per table - sequences are more flexible.