How to prevent concurrent inserts in a table - SQL

I have a database where users book classes.
There is a Bookings table where, let's say, we want only 5 rows for 5 students.
When a student tries to book the class, I first check how many rows are in the table, and if there are fewer than 5, I do the insert.
The problem is that when there are concurrent bookings within the same second, I end up with more than 5 records in the table.
Before every insert I check the number of rows, but when the inserts happen at the same time, the returned count is the same for both and does not increase.
How can I avoid these concurrent inserts and keep the table at 5 rows?

This sounds like the job for a TRIGGER!
create trigger LimitTable
on YourTableToLimit
after insert
as
    declare @tableCount int
    select @tableCount = Count(*)
    from YourTableToLimit

    if @tableCount > 5
    begin
        rollback
    end
go
To be clear, and you probably already know this... the inserts themselves are never concurrent; the concurrency comes from the calling code.
I might suggest that you have the wrong data structure if you need something like this, though. I personally dislike relying on triggers like this.
Without knowing the full use case, it's hard to offer a complete solution.

You can use a unique constraint on (student_id, class_id, seat_number) and set seat_number to 1, 2, 3, 4 or 5 (the result of the count); a sketch follows below. On delete, however, you must renumber the seats for all remaining bookings in the class.
You can also use a queue to serialize the inserts and prevent them from running concurrently.
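A minimal sketch of the constraint idea, assuming SQL Server and hypothetical Bookings column names, with the limiting constraint placed on (class_id, seat_number) so that a class can never hold more than five seats:
alter table Bookings
    add constraint CK_Bookings_SeatRange check (seat_number between 1 and 5);

alter table Bookings
    add constraint UQ_Bookings_ClassSeat unique (class_id, seat_number);
With this in place, a sixth concurrent insert fails with a constraint violation instead of silently creating a sixth row, and the losing caller can retry with a different seat number.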

Related

Mass Updating a single column based on ID

I keep track of users and their activity, assigning a numerical value for what they do, storing it in a cache, and updating the DB every 2 hours to log their activity.
I usually have about 10,000 users during this period, all with different activity points - so, for example, I would have to update the activity column for 10,000 rows in the users table, matched on user_id, every 2 hours, with something simple like activity = activity + 500 per row.
What would be an effective way to do so? Obviously it would be really slow to send a separate query for each user. One method I researched was using CASE, but 10,000 CASE branches would also take a long time and be inefficient. I'm sure there's a good method I haven't seen yet.
You can use a VALUES list to create a virtual user-supplied table, and then do an update with a join to that table.
update users
set activity = activity + t.y
from (values (1,5), (2,9), (3,19) /*, ...*/ ) t(id, y)
where users.user_id = t.id;
First, do you need this optimization? 10,000 users over 2 hours is fewer than 2 updates per second on average. Consider instead simply inserting activity rows into a user_activity table as needed. Inserting into a separate table, rather than updating users, avoids taking write locks on user rows, and such an "insert-only" table should perform well.
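A minimal sketch of that insert-only approach, assuming Postgres and hypothetical table and column names; the current total is computed by aggregating the activity rows:
create table user_activity (
    user_id   bigint      not null,
    points    integer     not null,
    logged_at timestamptz not null default now()
);

insert into user_activity (user_id, points) values (42, 500);

-- Current total for one user (or roll these up periodically in the background):
select user_id, sum(points) as activity
from user_activity
where user_id = 42
group by user_id;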
Second, 2 hours between updates seems excessive. The win of caching is to avoid a flurry of update queries per second, but the benefits rapidly drop off. Try 1 minute or even less. This will reduce the size of the update, greatly simplify the update process, and avoid possibly locking a bunch of rows.
If you do need this optimization, you can do it by updating from a temp table.
Make a temp table with user ID and activity count.
Copy your cached user IDs and activity counts into the temp table.
Update from the temp table.
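A sketch of the first two steps, assuming Postgres; the table and column names are placeholders chosen to match the update below:
create temporary table tmp_user_activity (
    user_id  bigint  primary key,
    activity integer not null
);

-- Load the cached counts in one round trip; a multi-row insert is shown here,
-- COPY would be faster for very large batches. The values are placeholders.
insert into tmp_user_activity (user_id, activity)
values (1, 500), (2, 750), (3, 125) /*, ... */;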
The update would look something like this...
update users u
set activity = u.activity + tmp.activity
from tmp_user_activity tmp
where tmp.user_id = u.user_id;

Limiting the number of records in a Sqlite DB

What I'm trying to implement here is a condition wherein a SQLite database holds only the most recent 1000 records. I have timestamps with each record.
One obviously inefficient approach is to check the total number of records and, if they exceed 1000, simply delete the ones that fall outside that window.
However, I would have to do this check on every INSERT, which makes things highly inefficient.
What could be a better approach? Can we do something with triggers?
Some related questions that follow the same logic I thought of are posted on SO:
Delete oldest records from database
SQL Query to delete records older than two years
You can use the implicit "rowid" column for that.
Assuming you don't delete rows manually in other ways:
DELETE FROM yourtable WHERE rowid < (last_row_id - 1000)
You can obtain the last rowid using an API function or as max(rowid).
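Combining the two, a cleanup that keeps roughly the newest 1000 rows could look like this (a sketch, reusing the table name from above):
DELETE FROM yourtable
WHERE rowid <= (SELECT MAX(rowid) FROM yourtable) - 1000;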
If you don't need exactly 1000 records (e.g. you just want to clean up old records), it is not necessary to do this on every insert. Add a counter in your program and run the cleanup, for instance, once every 100 inserts.
UPDATE:
Anyway, you pay the performance cost either on each insert or on each select, so the choice depends on which you have more of: INSERTs or SELECTs.
If you don't have so many inserts that performance is a concern, you can use the following trigger to keep no more than 1000 records:
CREATE TRIGGER triggername AFTER INSERT ON tablename
BEGIN
    DELETE FROM tablename WHERE timestamp <
        (SELECT MIN(timestamp)
           FROM (SELECT timestamp FROM tablename ORDER BY timestamp DESC LIMIT 1000));
END;
Creating a unique index on the timestamp column should be a good idea too (in case it isn't the PK already). Also note that SQLite supports only FOR EACH ROW triggers, so when you bulk-insert many records it is worth temporarily disabling the trigger (in SQLite that means dropping and recreating it).
If there are too many INSERTs, there isn't much you can do on the database side. You can make the trigger fire less often by adding a trigger condition like AFTER INSERT ... WHEN NEW.rowid % 100 = 0, as sketched below. For selects, just use LIMIT 1000 (or create an appropriate view).
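A sketch of that conditional trigger, reusing the cleanup from above; between cleanups the table may briefly hold up to roughly 1100 rows:
CREATE TRIGGER triggername AFTER INSERT ON tablename
WHEN NEW.rowid % 100 = 0
BEGIN
    DELETE FROM tablename WHERE timestamp <
        (SELECT MIN(timestamp)
           FROM (SELECT timestamp FROM tablename ORDER BY timestamp DESC LIMIT 1000));
END;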
I can't predict how much faster that would be. The best way is simply to measure how much performance you gain in your particular case.

How to convert a loop in SQL to Set-based logic

I have spent a good portion of today and yesterday trying to decide whether to use a loop or cursor in SQL, or to figure out how to use set-based logic to solve the problem. I am not new to set logic, but this problem seems particularly complex.
The Problem
The idea is that if I have a list of all transactions (tens or hundreds of millions) and the date they occurred, I can start combining some of that data into a daily totals table so that it is more rapidly viewable by reporting and analytics systems. The pseudocode for this is as follows:
foreach( row in transactions_table )
    if( row in totals_table already exists )
        update totals_table, add my totals to the totals row
    else
        insert into totals_table with my row as the base values
    delete ( or archive ) row
As you can tell, the body of the loop is relatively trivial to implement, as is the cursor/looping iteration. However, the execution time is quite slow and unwieldy, and my question is: is there a non-iterative way to perform such a task, or is this one of the rare exceptions where I just have to "suck it up" and use a cursor?
There have been a few discussions on the topic, some of which seem to be similar, but not usable due to the if/else statement and the operations on another table, for instance:
How to merge rows of SQL data on column-based logic? This question doesn't seem to be applicable because it simply returns a view of all the sums, and doesn't actually make logical decisions about additions or updates to another table.
SQL Looping seems to have a few ideas about selection with a couple of CASE statements, which seems possible, but there are two operations that I need done depending on the status of another table, so this solution does not seem to fit.
SQL Call Stored Procedure for each Row without using a cursor This solution seems to be the closest to what I need to do, in that it can handle arbitrary numbers of operations on each row, but there doesn't seem to be a consensus among that group.
Any advice how to tackle this frustrating problem?
Notes
I am using SQL Server 2008
The schema setup is as follows:
Totals: (id int pk, totals_date date, store_id int fk, machine_id int fk, total_in, total_out)
Transactions: (transaction_id int pk, transaction_date datetime, store_id int fk, machine_id int fk, transaction_type (IN or OUT), transaction_amount decimal)
The totals should be computed by store, by machine, and by date, and should total all of the IN transactions into total_in and the OUT transactions into total_out. The goal is to get a pseudo data cube going.
You would do this in two set-based statements:
BEGIN TRANSACTION;

DECLARE @keys TABLE(some_key INT);

UPDATE tot
    SET tot.amount += tx.amount
    OUTPUT inserted.some_key -- key values updated
    INTO @keys
    FROM dbo.totals_table AS tot WITH (UPDLOCK, HOLDLOCK)
    INNER JOIN
    (
        SELECT t.some_key, amount = SUM(amount)
        FROM dbo.transactions_table AS t WITH (HOLDLOCK)
        INNER JOIN dbo.totals_table AS tot
            ON t.some_key = tot.some_key
        GROUP BY t.some_key
    ) AS tx
    ON tot.some_key = tx.some_key;

INSERT dbo.totals_table(some_key, amount)
    OUTPUT inserted.some_key INTO @keys
SELECT some_key, SUM(amount)
FROM dbo.transactions_table AS tx
WHERE NOT EXISTS
(
    SELECT 1 FROM dbo.totals_table
    WHERE some_key = tx.some_key
)
GROUP BY some_key;

DELETE dbo.transactions_table
WHERE some_key IN (SELECT some_key FROM @keys);

COMMIT TRANSACTION;
(Error handling, applicable isolation level, rollback conditions etc. omitted for brevity.)
You do the update first so that you don't insert new rows and then immediately update them, doing the work twice and possibly double-counting. You could use OUTPUT in both cases into a temp table, perhaps, and then archive/delete the rows from the transactions table.
I'd caution you not to get too excited about MERGE until they've resolved some of its known bugs and you have read enough about it to be sure you're not lulled into any false confidence about how much "better" it is for concurrency and atomicity without additional hints. The race conditions you can work around; the bugs you can't.
Another alternative, from Nikola's comment:
CREATE VIEW dbo.TotalsView
WITH SCHEMABINDING
AS
    SELECT some_key_column(s),
           SUM(amount)  AS total_amount,
           COUNT_BIG(*) AS row_count
    FROM dbo.Transaction_Table
    GROUP BY some_key_column(s);
GO
CREATE UNIQUE CLUSTERED INDEX some_key ON dbo.TotalsView(some_key_column(s));
GO
Now if you want to write queries that grab the totals, you can reference the view directly or - depending on query and edition - the view may automatically be matched even if you reference the base table.
Note: if you are not on Enterprise Edition, you may have to use the NOEXPAND hint to take advantage of the pre-aggregated values materialized by the view.
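For example, a query against the view might look like this (a sketch, reusing the placeholder key column from the view definition and the aggregate aliases total_amount and row_count added above):
SELECT some_key_column(s), total_amount, row_count
FROM dbo.TotalsView WITH (NOEXPAND)
WHERE some_key_column(s) = @some_key; -- placeholder predicate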
I do not think you need the loop.
You can just:
Update all rows/sums that match your filters/groups, then archive/delete the corresponding source rows.
Insert all rows that do not match your filters/groups, then archive/delete the corresponding source rows.
SQL is meant to work on sets of data, not on rows one by one.

Have "select for update" block on nonrexisting rows

We have some persistent data in an application that is queried from a server and then stored in a database so we can keep track of additional information. Because we do not want to query the server again while an object is in use in memory, we do a SELECT FOR UPDATE so that other threads that want the same data are blocked.
I am not sure how SELECT FOR UPDATE handles non-existing rows. If the row does not exist and another thread tries another SELECT FOR UPDATE on the same row, will that thread be blocked until the first transaction finishes, or will it also get an empty result set? If it only gets an empty result set, is there any way to make it block as well, for example by immediately inserting the missing row?
EDIT:
Because there was a remark that we might be locking too much, here are some more details on the concrete usage in our case. In reduced pseudocode, our program flow looks like this:
d = queue.fetch();
r = SELECT * FROM table WHERE key = d.key() FOR UPDATE;
if r.empty() then
    r = get_data_from_somewhere_else();
new_r = process_stuff( r );
if Data was present then
    update row to new_r
else
    insert new_r
This code runs in multiple threads, and the data fetched from the queue might concern the same row in the database (hence the lock). If multiple threads are working on data that needs the same row, those threads need to be serialized (the order does not matter). This serialization fails, however, if the row is not present, because then we do not get a lock.
EDIT:
For now I have the following solution, which seems like an ugly hack to me.
select the data for update
if zero rows match then
    insert some dummy data // this will block if multiple transactions try to insert
    if insertion failed then
        // somebody beat us at the race
        select the data for update
do processing
if data was changed then
    update the old or dummy data
else
    rollback the whole transaction
However, I am not 100% sure that this actually solves the problem, nor does the solution seem like good style. So if anybody has something more usable to offer, that would be great.
I am not sure how select for update handles non-existing rows.
It doesn't.
The best you can do is to use an advisory lock if you know something unique about the new row. (Use hashtext() if needed, and the table's oid to lock it.)
The next best thing is a table lock.
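A sketch of that fallback, assuming Postgres and the table t from the example below; SHARE ROW EXCLUSIVE conflicts with itself and with writers, so two sessions running this check-then-insert are serialized while plain readers stay unblocked:
BEGIN;
LOCK TABLE t IN SHARE ROW EXCLUSIVE MODE;
-- check for the row, insert it if it is missing, then do the work
COMMIT;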
That being said, your question makes it sound like you're locking far more than you should. Only lock rows when you actually need to, i.e. for write operations.
Example solution (I haven't found a better one :/)
Thread A:
BEGIN;
SELECT pg_advisory_xact_lock(42); -- database semaphore arbitrary ID
SELECT * FROM t WHERE id = 1;
DELETE FROM t WHERE id = 1;
INSERT INTO t (id, value) VALUES (1, 'thread A');
SELECT 1 FROM pg_sleep(10); -- only for race condition simulation
COMMIT;
Thread B:
BEGIN;
SELECT pg_advisory_xact_lock(42); -- database semaphore arbitrary ID
SELECT * FROM t WHERE id = 1;
DELETE FROM t WHERE id = 1;
INSERT INTO t (id, value) VALUES (1, 'thread B');
SELECT 1 FROM pg_sleep(10); -- only for race condition simulation
COMMIT;
This always enforces a correct ordering of the transactions.
Looking at the code added in the second edit, it looks right.
As for it looking like a hack, there are a couple of options - basically it's all about moving the database logic into the database.
One is simply to put the whole select-for-update, insert-if-not-exists logic in a function, and do select get_object(key1, key2, etc) instead; a sketch follows below.
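A sketch of such a function in PL/pgSQL, assuming a table t (id, value) as in the example above with a nullable value column; the placeholder row and the conflict handling are assumptions, not a fixed recipe:
CREATE FUNCTION get_object(p_id integer) RETURNS t AS $$
DECLARE
    r t;
BEGIN
    SELECT * INTO r FROM t WHERE id = p_id FOR UPDATE;
    IF NOT FOUND THEN
        BEGIN
            -- Insert a placeholder row so later callers have something to lock on.
            INSERT INTO t (id, value) VALUES (p_id, NULL) RETURNING * INTO r;
        EXCEPTION WHEN unique_violation THEN
            -- Somebody beat us to the insert; lock their row instead.
            SELECT * INTO r FROM t WHERE id = p_id FOR UPDATE;
        END;
    END IF;
    RETURN r;
END;
$$ LANGUAGE plpgsql;
Callers then run SELECT * FROM get_object(1) inside their transaction, and the row lock is held until commit.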
Alternatively, you could make an insert trigger that will ignore attempts to add an entry if it already exists, and simply do an insert before you do the select for update. This does have more potential to interfere with other code already in place, though.
(If I remember to, I'll edit and add example code later on when I'm in a position to check what I'm doing.)

SQL trigger for deleting old results

We have a database that we are using to store test results for an embedded device. There's a table with columns for different types of failures (details not relevant), along with a primary key 'keynum' and a 'NUM_FAILURES' column that lists the number of failures. We store passes and failures, so a pass has a '0' in 'NUM_FAILURES'.
In order to keep the database from growing without bounds, we want to keep the last 1000 results, plus any of the last 50 failures that fall outside of the 1000. So, worst case, the table could have 1050 entries in it. I'm trying to find the most efficient SQL insert trigger to remove extra entries. I'll give what I have so far as an answer, but I'm looking to see if anyone can come up with something better, since SQL isn't something I do very often.
We are using SQLite3 on a non-Windows platform, if that's relevant.
EDIT: To clarify, the part that I am having problems with is the DELETE, and specifically the part related to the last 50 failures.
The reason you want to remove these entries is to keep the database from growing too big, not to keep it in some special state. For that I would really not use triggers; instead, set up a job that runs at some interval and cleans up the table.
So far, I have ended up using a View combined with a Trigger, but I'm not sure it's going to work for other reasons.
CREATE VIEW tablename_view AS
    SELECT keynum FROM tablename WHERE NUM_FAILURES != '0'
    ORDER BY keynum DESC LIMIT 50;

CREATE TRIGGER tablename_trig
AFTER INSERT ON tablename WHEN (((SELECT COUNT(*) FROM tablename) >= 1000) OR
    ((SELECT COUNT(NUM_FAILURES) FROM tablename WHERE NUM_FAILURES != '0') >= 50))
BEGIN
    DELETE FROM tablename WHERE ((((SELECT MAX(keynum) FROM tablename) - keynum) >= 1000)
        AND
        ((NUM_FAILURES == '0') OR ((SELECT MIN(keynum) FROM tablename_view) > keynum)));
END;
I think you may be using the wrong data structure. Instead, I'd create two tables and pre-populate one with 1000 rows (successes) and the other with 50 (failures). Put a primary ID on each. Then, when you record a result, instead of inserting a new row, find the ID+1 of the most recently timestamped record (looping back to 0 if that exceeds the max ID in the table) and update it with your new values.
This has the advantage of pre-allocating your storage, not requiring a trigger, and being internally consistent. You can also adjust the size of the log very simply by pre-populating more records, rather than having to change program logic.
There are several variations you can use on this, but the idea of a closed-loop structure rather than an open list would appear to match the problem domain more closely; a sketch follows below.
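A minimal sketch of that circular-buffer update for the successes table, assuming SQLite, hypothetical column names, and IDs pre-populated as 0 through 999:
UPDATE successes
SET recorded_at = :now,
    num_failures = 0
    /* , other result columns ... */
WHERE id = (
    SELECT (id + 1) % 1000
    FROM successes
    ORDER BY recorded_at DESC
    LIMIT 1
);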
How about this:
DELETE
FROM table
WHERE ( id < ( SELECT max(id) - 1000 FROM table )
        AND num_failures = 0
      )
   OR id < ( SELECT max(id) - 1050 FROM table )
If performance is a concern, it might be better to delete on a periodic basis, rather than on each insert.