SELECT, check and INSERT if condition is met - sql

I have the following tables: Tour, Customer, TourCustomerXRef. Assume every Tour has a capacity Cap. Now a request arrives at the backend for a booking.
What I need to do is:
SELECT and count() all of the entries in TourCustomerXRef where tourid = 123
In the program code (or the query?): If count() < Cap
True: INSERT into TourCustomerXRef, return success
False: return an error
However, the backend API might be called concurrently. This could result in the SELECT statement returning a number under the maximum capacity both times, so the INSERT is executed multiple times (even though there is just one place left).
What would be the best way to prevent the above case? SET TRANSACTION ISOLATION LEVEL SERIALIZABLE? Or REPEATABLE READ?
I'm worried that the application logic (even though it's just a simple if) could hurt performance, since read and write locks would keep the API from executing queries that just want to select the data without inserting anything, right?
(Using MariaDB.)
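To make the race concrete, here is the naive sequence (the customer IDs and exact column names are only illustrative):
-- request A and request B both execute:
SELECT COUNT(*) FROM TourCustomerXRef WHERE tourid = 123;            -- both see Cap - 1
-- both pass the "count < Cap" check in application code, then both execute:
INSERT INTO TourCustomerXRef (tourid, customerid) VALUES (123, 456); -- booking A
INSERT INTO TourCustomerXRef (tourid, customerid) VALUES (123, 789); -- booking B
-- the tour is now overbooked by one seat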

You can try using the locking clause FOR UPDATE in the SELECT (MySQL/MariaDB example):
START TRANSACTION;
-- FOR UPDATE goes at the end of the statement, after the WHERE clause:
SELECT * FROM TourCustomerXRef WHERE tourid = 123 FOR UPDATE;
-- check the returned row count against Cap in application code, then:
INSERT INTO TourCustomerXRef (tourid, customerid) VALUES (123, 456);  -- column names illustrative
COMMIT;
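An alternative sketch is to serialize bookings per tour by locking the parent Tour row instead (this assumes Tour has columns tourid and cap; the names are illustrative). Plain SELECTs from other sessions are not blocked; only other locking reads and writes on that Tour row wait:
START TRANSACTION;
-- lock the parent row so concurrent bookings for tour 123 queue behind this one
SELECT cap FROM Tour WHERE tourid = 123 FOR UPDATE;
-- count current bookings while holding the lock
SELECT COUNT(*) FROM TourCustomerXRef WHERE tourid = 123;
-- application code: if count < cap, book the seat; otherwise ROLLBACK and report "tour full"
INSERT INTO TourCustomerXRef (tourid, customerid) VALUES (123, 456);
COMMIT;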

Why can "select * from table* cause a transaction to be non-serializable (postgres)?

I run two transactions with isolation level SERIALIZABLE in parallel. These contain the same statements:
select * from table;
insert into table values ... ;
(I took the exact case from this video https://youtu.be/4EajrPgJAk0?t=1472, at 24:34.)
I can reproduce the same error as in the video, but removing the select statement makes the inserts pass. If I remove the inserts, the selects pass.
Now my question(s):
Why does the select cause the transactions to fail? It's just a select; it performs no insert or update. The inserts alone do not fail. Logically, this makes no sense to me.
According to an assignment I saw, it's apparently possible to reproduce an error in SERIALIZABLE mode, make the transactions pass with READ COMMITTED, and all that with a single statement per transaction.
My understanding after watching the video above doesn't allow for this. What obvious thing am I misunderstanding?
As described in detail in the documentation [1], Postgres determines whether data modified by one transaction is read by the other. If you remove the SELECTs, that presumably no longer happens.
[1] https://www.postgresql.org/docs/12/transaction-iso.html#XACT-SERIALIZABLE
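A minimal sketch of the interleaving (two psql sessions; table t with a single integer column is illustrative):
-- Session 1                                 -- Session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;          BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM t;                             SELECT * FROM t;
INSERT INTO t VALUES (1);                    INSERT INTO t VALUES (2);
COMMIT;                                      COMMIT;  -- one of the two fails to serialize
Each SELECT takes a predicate (SIREAD) lock on what it read; each INSERT then writes a row the other transaction's SELECT would have had to see, so there is a read/write dependency in both directions and one transaction is canceled. Remove the SELECTs and that dependency disappears.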

How to implement read-only-once within SAP HANA?

Context: I am a long-time MSSQL developer... What I would like to know is how to implement a read-only-once select from SAP HANA.
High-level pseudo-code:
Collect request via db proc (query)
Call API with request
Store results of the request (response)
I have a table (A) that is the source of inputs to a process. Once a process has completed it will write results to another table (B).
Perhaps this is all solved if I just add a column to table A to prevent concurrent processors from selecting the same records from A?
I am wondering how to do this without adding the column to source table A.
What I have tried is a left outer join between tables A and B to get rows from A that have no corresponding rows (yet) in B. This doesn't work, or at least I haven't implemented it such that rows are processed only once across all processors.
I have a stored proc to handle batch selection:
/*
* getBatch.sql
*
* SYNOPSIS: Retrieve the next set of criteria to be used in a search
* request. Use left outer join between input source table
* and results table to determine the next set of inputs, and
* provide support so that concurrent processes may call this
* proc and get their inputs exclusively.
*/
alter procedure "ACOX"."getBatch" (
    in in_limit int
    ,in in_run_group_id varchar(36)
    ,out ot_result table (
        id bigint
        ,runGroupId varchar(36)
        ,sourceTableRefId integer
        ,name nvarchar(22)
        ,location nvarchar(13)
        ,regionCode nvarchar(3)
        ,countryCode nvarchar(3)
    )
) language sqlscript sql security definer as
begin
    -- insert new records:
    insert into "ACOX"."search_result_v4" (
        "RUN_GROUP_ID"
        ,"BEGIN_DATE_TS"
        ,"SOURCE_TABLE"
        ,"SOURCE_TABLE_REFID"
    )
    select
        in_run_group_id as "RUN_GROUP_ID"
        ,CURRENT_TIMESTAMP as "BEGIN_DATE_TS"
        ,'acox.searchCriteria' as "SOURCE_TABLE"
        ,fp.descriptor_id as "SOURCE_TABLE_REFID"
    from
        acox.searchCriteria fp
        left join "ACOX"."us_state_codes" st
            on trim(fp.region) = trim(st.usps)
        left outer join "ACOX"."search_result_v4" r
            on fp.descriptor_id = r.source_table_refid
    where
        st.usps is not null
        and r.BEGIN_DATE_TS is null
    limit :in_limit;

    -- select records inserted for return:
    ot_result =
        select
            r.ID id
            ,r.RUN_GROUP_ID runGroupId
            ,fp.descriptor_id sourceTableRefId
            ,fp.merch_name name
            ,fp.Location location
            ,st.usps regionCode
            ,'USA' countryCode
        from
            acox.searchCriteria fp
            left join "ACOX"."us_state_codes" st
                on trim(fp.region) = trim(st.usps)
            inner join "ACOX"."search_result_v4" r
                on fp.descriptor_id = r.source_table_refid
                and r.COMPLETE_DATE_TS is null
                and r.RUN_GROUP_ID = in_run_group_id
        where
            st.usps is not null
        limit :in_limit;
end;
When running 7 concurrent processors, I get a 35% overlap. That is to say that out of 5,000 input rows, the resulting row count is 6,755. Running time is about 7 mins.
Currently my solution includes adding a column to the source table. I wanted to avoid that, but it seems to make for a simpler implementation. I will update the code shortly; it includes an update statement prior to the insert.
Useful references:
SAP HANA Concurrency Control
Exactly-Once Semantics Are Possible: Here’s How Kafka Does It
First off: there is no "read-only-once" in any RDBMS, including MS SQL.
Literally, this would mean that a given record can only be read once and would then "disappear" for all subsequent reads. (That's effectively what a queue does, or the well-known special case of a queue: the pipe.)
I assume that is not what you are looking for.
Instead, I believe you want to implement processing semantics analogous to "once-and-only-once", aka "exactly-once", message delivery. While this is impossible to achieve over potentially partitioned networks, it is possible within the transaction context of a database.
This is a common requirement, e.g. with batch data loading jobs that should only load data that has not been loaded so far (i.e. the new data that was created after the last batch load job began).
Sorry for the long preamble, but any solution for this depends on being clear about what we actually want to achieve. I will get to an approach now.
The major RDBMSs have long figured out that blocking readers is generally a terrible idea if the goal is to enable high transaction throughput. Consequently, HANA does not block readers - ever (OK, not ever-ever, but at least in the normal operation setup).
The main issue with the "exactly-once" processing requirement really is not the reading of the records, but the possibility of processing them more than once or not at all.
Both of these potential issues can be addressed with the following approach:
SELECT ... FOR UPDATE ... the records that should be processed (based on e.g. unprocessed records, up to N records, even-odd-IDs, zip-code, ...). With this, the current session has an UPDATE TRANSACTION context and exclusive locks on the selected records. Other transactions can still read those records, but no other transaction can lock those records - neither for UPDATE, DELETE, nor for SELECT ... FOR UPDATE ... .
Now you do your processing - whatever this involves: merging, inserting, updating other tables, writing log-entries...
As the final step of the processing, you want to "mark" the records as processed. How exactly this is implemented does not really matter.
One could create a processed-column in the table and set it to TRUE when records have been processed. Or one could have a separate table that contains the primary keys of the processed records (and maybe a load-job-id to keep track of multiple load jobs).
In whatever way this is implemented, this is the point in time where the processed status needs to be captured.
COMMIT or ROLLBACK (in case something went wrong). This commits the records written to the target table and the processed-status information, and it releases the exclusive locks on the source table.
As you see, Step 1 takes care of the issue that records may be missed by selecting all wanted records that can be processed (i.e. they are not exclusively locked by any other process).
Step 3 takes care of the issue of records potentially being processed more than once by keeping track of the processed records. Obviously, this tracking has to be checked in Step 1 - both steps are interconnected, which is why I point them out explicitly. Finally, all the processing occurs within the same DB transaction context, allowing for a guaranteed COMMIT or ROLLBACK across the whole transaction. That means no "record marker" will ever be lost once the processing of the records has been committed.
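As a minimal sketch of those steps (HANA SQL run with autocommit off; table and column names are purely illustrative, and the processed marker could just as well live in a separate tracking table):
-- 1) claim the currently unprocessed rows exclusively; plain readers are not blocked
SELECT id FROM source_requests WHERE processed_ts IS NULL FOR UPDATE;
-- 2) do the processing, e.g. write results for the claimed rows
INSERT INTO search_results (source_id, begin_ts)
    SELECT id, CURRENT_TIMESTAMP FROM source_requests WHERE processed_ts IS NULL;
-- 3) mark the claimed rows as processed inside the same transaction
UPDATE source_requests SET processed_ts = CURRENT_TIMESTAMP WHERE processed_ts IS NULL;
-- 4) publish results and the processed marker atomically, releasing the locks
COMMIT;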
Now, why is this approach preferable to making records "un-readable"?
Because of the other processes in the system.
Maybe the source records are still read by the transaction system but never updated. This transaction system should not have to wait for the data load to finish.
Or maybe, somebody wants to do some analytics on the source data and also needs to read those records.
Or maybe you want to parallelise the data loading: it's easily possible to skip locked records and only work on the ones that are "available for update" right now. See e.g. Load balancing SQL reads while batch-processing? for that.
Ok, I guess you were hoping for something easier to consume; alas, that's my approach to this sort of requirement as I understood it.

Get exact ID within thousands of concurrent transactions

I have a very busy table with thousands of logs. Every time a new log is inserted into the database, I need to update some tables using the ID of the new log. So I get the last ID using these two lines of code:
objcon.execute "insert into logs (member,refer) values (12,12345)"
objcon.execute "select top 1 id from logs order by id desc"
I am afraid the second line may get another ID from a more recent insert, because there are thousands of new logs per second.
This is a sample scenario, and I know there are built-in methods to get the ID of a recently inserted row. But my exact question is whether there is a guaranteed logical order of transactions on the server (both IIS and SQL Server), or whether it is possible that a new transaction finishes before an old one, so that the second line gets the ID of another log?
It is definitely possible that your second query will get the id from another transaction. I strongly suggest that you use SCOPE_IDENTITY(). These kinds of methods are provided by the DBMS for exactly this scenario, where you insert a row and then select the last row from that table, but in between those two operations other connections might have inserted new rows.
Yes. Concurrent transactions can cause problems with what you are trying to do.
The right solution is the output clause. The code looks like this:
declare @ids table (id int);
insert into logs (member, refer)
output inserted.id into @ids
values (12, 12345);
select *
from @ids;
You can find multiple discussions on the web about why OUTPUT is better. Here are some reasons:
You can return multiple columns, not just the identity.
It is session- and table-safe.
It handles multiple rows.
It is the same syntax for SELECT, UPDATE, and DELETE.
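For example, here is a sketch of OUTPUT capturing the ids of a multi-row insert (same illustrative columns as above):
declare @ids table (id int);
insert into logs (member, refer)
output inserted.id into @ids
values (12, 12345), (13, 12346), (14, 12347);
-- every id generated by this statement, unaffected by other sessions:
select id from @ids;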
If you don't specify a WHERE clause on the SELECT query, you would need to execute both statements in a single transaction under the SNAPSHOT isolation level, before committing the changes. That way, concurrent changes made by other transactions are not visible, only those made by the current transaction.
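A hedged sketch of that variant (SNAPSHOT must be enabled on the database via ALLOW_SNAPSHOT_ISOLATION):
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
insert into logs (member, refer) values (12, 12345);
-- sees data as of the transaction's snapshot plus its own insert,
-- so the top id is the one this transaction just generated:
select top 1 id from logs order by id desc;
COMMIT;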
It would be better to use SCOPE_IDENTITY() to return the last identity value generated in the current scope of the current connection. This differs from @@IDENTITY in that the value is not affected by triggers that might also generate identity values.
objcon.execute "insert into logs (member,refer) values (12,12345)"
objcon.execute "select SCOPE_IDENTITY() AS id;"

Isolation level for two statements on the same transaction vs. single statement

I have the following table:
DROP TABLE IF EXISTS event;
CREATE TABLE event(
    kind VARCHAR NOT NULL,
    num INTEGER NOT NULL
);
ALTER TABLE event ADD PRIMARY KEY (kind, num);
The idea is that I want to use the num column to maintain separate increment counters for each kind of event. So num is like a dedicated auto-increment sequence for each different kind.
Assuming multiple clients/threads writing events (potentially of the same kind) into that table concurrently, is there any difference in terms of the required transaction isolation level between: (a) executing the following block:
BEGIN TRANSACTION;
DO
$do$
DECLARE
    nextNum INTEGER;
BEGIN
    SELECT COALESCE(MAX(num), -1) + 1 INTO nextNum FROM event WHERE kind = 'A';
    INSERT INTO event(kind, num) VALUES ('A', nextNum);
END;
$do$;
COMMIT;
... and (b) combining the select and insert into a single statement:
INSERT INTO event(kind, num)
(SELECT 'A', COALESCE(MAX(num),-1)+1 FROM event WHERE kind='A');
From some tests I ran, it seems that in both cases I need the SERIALIZABLE transaction isolation level. What's more, even with SERIALIZABLE, my code has to be prepared to retry due to the following error in highly concurrent situations:
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
In other words, merging the select into the insert does not seem to confer any benefit in terms of atomicity or allow any lower/more lenient transaction isolation level to be set. Am I missing anything?
(This question is broadly related in that it asks for a PostgreSQL pattern to facilitate the generation of multiple sequences inside a table. So just to be clear, I am not asking for the right pattern to do that sort of thing; I just want to understand whether the block of two statements is in any way different from a single merged INSERT/SELECT statement.)
The problem with the task is possible concurrent write access. Merging SELECT and INSERT into one statement reduces the time frame for possible conflicts to a minimum and is the superior approach in any case. The potential for conflicts is still there. And yes, serializable transaction isolation is one possible (if expensive) solution.
Typically (but that's not what you are asking), the best solution is not to try what you are trying. Gapless sequential numbers are a pain in databases with concurrent write access. If possible, use a serial column instead, which gives you unique ascending numbers - with possible gaps. You can eliminate gaps later, or dynamically in a VIEW. Details:
Serial numbers per group of rows for compound key
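A minimal sketch of that alternative (unique, ascending ids with possible gaps; the gapless per-kind number is derived in a view rather than stored):
CREATE TABLE event (
    id   bigserial PRIMARY KEY,  -- unique and ascending, gaps possible
    kind varchar NOT NULL
);
CREATE VIEW event_numbered AS
SELECT kind,
       row_number() OVER (PARTITION BY kind ORDER BY id) - 1 AS num,
       id
FROM   event;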
Aside: you don't need parentheses around the SELECT:
INSERT INTO event(kind, num)
SELECT 'A', COALESCE(MAX(num) + 1, 0) FROM event WHERE kind='A';

Prevent other sessions from reading data until I'm finished

I have a table that holds customers from different companies, something like:
CUSTOMER
CUSTOMER_ID
COMPANY_ID
CUSTOMER_NAME
FOO_CODE
When I insert or update a customer I need to calculate a FOO_CODE based on existing ones (within the company).
If I simply do this:
SELECT MAX(FOO_CODE) AS GREATEST_CODE_SO_FAR
FROM CUSTOMER
WHERE COMPANY_ID=:company_id
... then generate the code in the client language (PHP) and finally issue the INSERT/UPDATE, I understand I can face a race condition if another program instance fetches the same GREATEST_CODE_SO_FAR.
Is it possible to issue a row-level lock on the table so other sessions that attempt to read the FOO_CODE column of any customer that belongs to a given company are delayed until I commit or rollback my transaction?
My failed attempts:
This:
SELECT MAX(FOO_CODE)
FROM CUSTOMER
WHERE COMPANY_ID=:company_id
FOR UPDATE
... triggers:
ORA-01786: FOR UPDATE of this query expression is not allowed
This:
SELECT FOO_CODE
FROM CUSTOMER
WHERE COMPANY_ID=:company_id
FOR UPDATE
... retrieves all company rows and does not even prevent other sessions from reading data.
LOCK TABLE... well, the documentation barely has any examples and I can't figure out the syntax.
P.S. It is not an incrementing number; it's an alphanumeric string.
You can't block another session from reading data, as far as I'm aware. One of the differences between Oracle and some other databases is that writers don't block readers.
I'd probably look at this slightly differently. I'm assuming the way you generate the next foo_code is deterministic. If you add a unique index on (company_id, foo_code), then you can have your application attempt the insert in a loop:
get your current max value
calculate your new code
do the insert
if you don't get a constraint violation, break out of the loop
otherwise continue to the next iteration of the loop and repeat the process
If two sessions attempt this at the same time then the second one will attempt to insert the same foo_code and will get a unique constraint violation. That is trapped and handled nicely and it just tries again; potentially multiple times until it gets a clean insert.
You could have a DB procedure that attempts the insert in a loop, but since you want to generate the new value in PHP then it would make sense for the loop to be in PHP too, attempting a simple insert.
This doesn't necessarily scale well if you have a high volume of inserts and clashes are likely. But if you're expecting simultaneous inserts for the same company to be rare, and just have to handle the odd occasion when it does happen, this won't add much overhead.
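For illustration, a hedged PL/SQL sketch of that loop (it assumes a unique index on (COMPANY_ID, FOO_CODE) and a hypothetical next_foo_code function standing in for whatever deterministic generation you do in PHP; other mandatory columns are omitted):
DECLARE
    v_code CUSTOMER.FOO_CODE%TYPE;
BEGIN
    LOOP
        SELECT MAX(FOO_CODE) INTO v_code
          FROM CUSTOMER
         WHERE COMPANY_ID = :company_id;
        v_code := next_foo_code(v_code);  -- hypothetical deterministic generator
        BEGIN
            INSERT INTO CUSTOMER (COMPANY_ID, CUSTOMER_NAME, FOO_CODE)
            VALUES (:company_id, :customer_name, v_code);
            EXIT;                         -- clean insert: leave the loop
        EXCEPTION
            WHEN DUP_VAL_ON_INDEX THEN
                NULL;                     -- another session took that code: try again
        END;
    END LOOP;
END;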