Concurrency Spanner java stacktrace on `bq query` update column of partitioned table - google-bigquery

I've created sql that does an update of all values in one column
UPDATE `Blackout_POC2.measurements_2020`
SET visitor.customerId_enc = enc.encrypted
FROM `Blackout_POC2.encrypted` AS enc
WHERE dateAmsterdam="2020-01-05"
AND session.visitId = enc.visitId
AND visitor.customerId = enc.plain
where dateAmsterdam is the partition key of the measurements_2020 table, and encrypted is a non-partitioned table that holds visitId, plain and encrypted fields. The code sets all values in the customerId_enc column with values from the encrypted table.
The code works perfectly fine when I run it one day at a time, but when I run days in parallel, I occassionally (1% or so) get a stacktrace from my bq query <sql> (see below).
I thought that I could modify partitioned tables in parallel within each partition, but that seems to occassionally not be the case. Could someone point me to where this would be documented, and preferably how to avoid this?
I can probably just rerun that query again, since it is idempotent, but I would like to know why this happens.
Thanks
Bart van Deenen, data-engineer Bol.com
Error in query string: Error processing job 'bolcom-dev-blackout-339:bqjob_r131fa5b3dfd24829_0000016faec5e5da_1': domain: "cloud.helix.ErrorDomain"
code: "QUERY_ERROR" argument: "Could not serialize access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update"
debug_info: "[CONCURRENT_UPDATE] Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1579185217979. Storage set job_uuid:
03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786, Reason: code=CONCURRENT_UPDATE message=Could not serialize
access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update debug=Table modified by concurrent UPDATE/DELETE/MERGE
DML or truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786
errorProto=domain: \"cloud.helix.ErrorDomain\"\ncode: \"QUERY_ERROR\"\nargument: \"Could not serialize access to table bolcom-dev-
blackout-339:Blackout_POC2.measurements_2020 due to concurrent update\"\ndebug_info: \"Table modified by concurrent UPDATE/DELETE/MERGE DML or
truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786\"\n\n\tat
com.google.cloud.helix.common.Exceptions$Public.concurrentUpdate(Exceptions.java:381)\n\tat
com.google.cloud.helix.common.Exceptions$Public.concurrentUpdate(Exceptions.java:373)\n\tat
com.google.cloud.helix.server.metadata.StorageTrackerData.verifyStorageSetUpdate(StorageTrackerData.java:224)\n\tat
com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.validateUpdates(AtomicStorageTrackerSpanner.java:1133)\n\tat
com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.updateStorageSets(AtomicStorageTrackerSpanner.java:1310)\n\tat
com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.updateStorageSets(AtomicStorageTrackerSpanner.java:1293)\n\tat
com.google.cloud.helix.server.metadata.MetaTableTracker.updateStorageSets(MetaTableTracker.java:2274)\n\tat
com.google.cloud.helix.server.job.StorageSideEffects$1.update(StorageSideEffects.java:1123)\n\tat
com.google.cloud.helix.server.job.StorageSideEffects$1.update(StorageSideEffects.java:976)\n\tat
com.google.cloud.helix.server.metadata.MetaTableTracker$1.update(MetaTableTracker.java:2510)\n\tat
com.google.cloud.helix.server.metadata.StorageTrackerSpanner.lambda$atomicUpdate$7(StorageTrackerSpanner.java:165)\n\tat
com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory$1.run(AtomicStorageTrackerSpanner.java:3775)\n\tat com.google.cloud.helix.se
rver.metadata.AtomicStorageTrackerSpanner$Factory.lambda$performJobWithCommitResult$0(AtomicStorageTrackerSpanner.java:3792)\n\tat
com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$RetryCountingWork.run(SpannerTransactionContext.java:1002)\n\tat com.googl
e.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeWithResultInternal(SpannerTransactionContext.java:840)\n\tat com.goo
gle.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeOptimisticWithResultInternal(SpannerTransactionContext.java:722)\n
\tat com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.lambda$executeOptimisticWithResult$1(SpannerTransactionContex
t.java:716)\n\tat
com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeWithMonitoring(SpannerTransactionContext.java:942)\n\tat co
m.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeOptimisticWithResult(SpannerTransactionContext.java:715)\n\ta
t com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.performJobWithCommitResult(AtomicStorageTrackerSpanner.java:3792)\n\tat
com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.performJobWithCommitResult(AtomicStorageTrackerSpanner.java:3720)\n\tat
com.google.cloud.helix.server.metadata.StorageTrackerSpanner.atomicUpdate(StorageTrackerSpanner.java:159)\n\tat
com.google.cloud.helix.server.metadata.MetaTableTracker.atomicUpdate(MetaTableTracker.java:2521)\n\tat com.google.cloud.helix.server.metadata.StatsRequ
estLoggingTrackers$LoggingStorageTracker.lambda$atomicUpdate$8(StatsRequestLoggingTrackers.java:494)\n\tat
com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.record(StatsRequestLoggingTrackers.java:181)\n\tat
com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.record(StatsRequestLoggingTrackers.java:158)\n\tat
com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.access$500(StatsRequestLoggingTrackers.java:123)\n\tat
com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$LoggingStorageTracker.atomicUpdate(StatsRequestLoggingTrackers.java:493)\n\tat
com.google.cloud.helix.server.job.StorageSideEffects.apply(StorageSideEffects.java:1238)\n\tat
com.google.cloud.helix.server.rosy.MergeStorageImpl.commitChanges(MergeStorageImpl.java:936)\n\tat
com.google.cloud.helix.server.rosy.MergeStorageImpl.merge(MergeStorageImpl.java:729)\n\tat
com.google.cloud.helix.server.rosy.StorageStubby.mergeStorage(StorageStubby.java:937)\n\tat
com.google.cloud.helix.proto2.Storage$ServiceParameters$21.handleBlockingRequest(Storage.java:2100)\n\tat
com.google.cloud.helix.proto2.Storage$ServiceParameters$21.handleBlockingRequest(Storage.java:2098)\n\tat
com.google.net.rpc3.impl.server.RpcBlockingApplicationHandler.handleRequest(RpcBlockingApplicationHandler.java:28)\n\tat
....

BigQuery DML operations doesn't have support for multi-statement transactions; nevertheless, you can execute some concurrent statements:
UPDATE and INSERT
DELETE and INSERT
INSERT and INSERT
For example, you execute two UPDATES statements simultaneously against the table then only one of them will succeed.
Keeping this in mind, due you can execute concurrently UPDATE and INSERT statements, another possible cause is if you are executing multiple UPDATE statements simultaneously.
You could try using the Scripting feature to manage the execution flow to prevent DML concurrency.

Related

GCP BigQuery - Transaction is aborted due to concurrent update (though no concurrent update)

I have something like this:
BEGIN transaction ;
UPDATE `$BQ_DATA_PROJECT`.`$DWDB`.Table_a AS ic ;
update `$BQ_DATA_PROJECT`.`$DWDB`.Table_a AS ic ;
Insert into `$BQ_DATA_PROJECT`.`$DWDB`.Table_a ;
Commit transaction;
Error:
Transaction is aborted due to concurrent update against table
`$BQ_DATA_PROJECT`.`$DWDB`.Table_a
The GCP big query documentation says we can have 20 DML operations for a table but it fails just for this.
And this is random, sometimes it fails, sometimes it doesn't. If I run after sometime it succeeds.
One thing that works is not giving them in a BT/CT block but Is there anything else that I can try to avoid getting
the concurrent update error
This error occurs due to concurrency of mutating DML (UPDATE, DELETE and MERGE) statements which possibly causes conflict as whenever a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are canceled. As BigQuery uses optimistic concurrency control means when committing a job, BigQuery checks if mutations generated by the job conflict with any other mutations performed on the table while the job runs and if any conflict is found then commit operation is aborted.
To avoid this error, I would suggest space out the DML operations such that they operate on the same table but at different times. If your application is more transaction specific then it would be better to use Cloud SQL rather than Bigquery. You can follow the best practices to avoid any errors in future.

Understanding locks and query status in Snowflake (multiple updates to a single table)

While using the python connector for snowflake with queries of the form
UPDATE X.TABLEY SET STATUS = %(status)s, STATUS_DETAILS = %(status_details)s WHERE ID = %(entry_id)s
, sometimes I get the following message:
(snowflake.connector.errors.ProgrammingError) 000625 (57014): Statement 'X' has locked table 'XX' in transaction 1588294931722 and this lock has not yet been released.
and soon after that
Your statement X' was aborted because the number of waiters for this lock exceeds the 20 statements limit
This usually happens when multiple queries are trying to update a single table. What I don't understand is that when I see the query history in Snowflake, it says the query finished successfully (Succeded Status) but in reality, the Update never happened, because the table did not alter.
So according to https://community.snowflake.com/s/article/how-to-resolve-blocked-queries I used
SELECT SYSTEM$ABORT_TRANSACTION(<transaction_id>);
to release the lock, but still, nothing happened and even with the succeed status the query seems to not have executed at all. So my question is, how does this really work and how can a lock be released without losing the execution of the query (also, what happens to the other 20+ queries that are queued because of the lock, sometimes it seems that when the lock is released the next one takes the lock and have to be aborted as well).
I would appreciate it if you could help me. Thanks!
Not sure if Sergio got an answer to this. The problem in this case is not with the table. Based on my experience with snowflake below is my understanding.
In snowflake, every table operations also involves a change in the meta table which keeps track of micro partitions, min and max. This meta table supports only 20 concurrent DML statements by default. So if a table is continuously getting updated and getting hit at the same partition, there is a chance that this limit will exceed. In this case, we should look at redesigning the table updation/insertion logic. In one of our use cases, we increased the limit to 50 after speaking to snowflake support team
UPDATE, DELETE, and MERGE cannot run concurrently on a single table; they will be serialized as only one can take a lock on a table at at a time. Others will queue up in the "blocked" state until it is their turn to take the lock. There is a limit on the number of queries that can be waiting on a single lock.
If you see an update finish successfully but don't see the updated data in the table, then you are most likely not COMMITting your transactions. Make sure you run COMMIT after an update so that the new data is committed to the table and the lock is released.
Alternatively, you can make sure AUTOCOMMIT is enabled so that DML will commit automatically after completion. You can enable it with ALTER SESSION SET AUTOCOMMIT=TRUE; in any sessions that are going to run an UPDATE.

Redshift: Serializable isolation violation on table

I have a very large Redshift database that contains billions of rows of HTTP request data.
I have a table called requests which has a few important fields:
ip_address
city
state
country
I have a Python process running once per day, which grabs all distinct rows which have not yet been geocoded (do not have any city / state / country information), and then attempts to geocode each IP address via Google's Geocoding API.
This process (pseudocode) looks like this:
for ip_address in ips_to_geocode:
country, state, city = geocode_ip_address(ip_address)
execute_transaction('''
UPDATE requests
SET ip_country = %s, ip_state = %s, ip_city = %s
WHERE ip_address = %s
''')
When running this code, I often receive errors like the following:
psycopg2.InternalError: 1023
DETAIL: Serializable isolation violation on table - 108263, transactions forming the cycle are: 647671, 647682 (pid:23880)
I'm assuming this is because I have other processes constantly logging HTTP requests into my table, so when I attempt to execute my UPDATE statement, it is unable to select all rows with the ip address I'd like to update.
My question is this: what can I do to update these records in a sane way that will stop failing regularly?
Your code is violating the serializable isolation level of Redshift. You need to make sure that your code is not trying to open multiple transactions on the same table before closing all open transactions.
You can achieve this by locking the table in each transaction so that no other transaction can access the table for updates until the open transaction gets closed. Not sure how your code is architected (synchronous or asynchronous), but this will increase the run time as each lock will force others to wait till the transaction gets over.
Refer: http://docs.aws.amazon.com/redshift/latest/dg/r_LOCK.html
Just got the same issue on my code, and this is how I fixed it:
First things first, it is good to know that this error code means you are trying to do concurrent operations in redshift. When you do a second query to a table before the first query you did moments ago was done, for example, is a case where you would get this kind of error (that was my case).
Good news is: there is a simple way to serialize redshift operations! You just need to use the LOCK command. Here is the Amazon documentation for the redshift LOCK command. It works basically making the next operation wait until the previous one is closed. Note that, using this command your script will naturally get a little bit slower.
In the end, the practical solution for me was: I inserted the LOCK command before the query messages (in the same string, separated by a ';'). Something like this:
LOCK table_name; SELECT * from ...
And you should be good to go! I hope it helps you.
Since you are doing a point update in your geo codes update process, while the other processes are writing to the table, you can intermittently get the Serializable isolation violation error depending on how and when the other process does its write to the same table.
Suggestions
One way is to use a table lock like Marcus Vinicius Melo has suggested in his answer.
Another approach is to catch the error and re run the transaction.
For any serializable transaction, it is said that the code initiating the transaction should be ready to retry the transaction in the face of this error. Since all transactions in Redshift are strictly serializable, all code initiating transactions in Redshift should be ready to retry them in the face of this error.
Explanations
The typical cause of this error is that two transactions started and proceeded in their operations in such a way that at least one of them cannot be completed as if they executed one after the other. So the db system chooses to abort one of them by throwing this error. This essentially gives control back to the transaction initiating code to take an appropriate course of action. Retry being one of them.
One way to prevent such a conflicting sequence of operations is to use a lock. But then it restricts many of the cases from executing concurrently which would not have resulted in a conflicting sequence of operations. The lock will ensure that the error will not occur but will also be concurrency restricting. The retry approach lets concurrency have its chance and handles the case when a conflict does occur.
Recommendation
That said, I would still recommend that you don't update Redshift in this manner, like point updates. The geo codes update process should write to a staging table, and once all records are processed, perform one single bulk update, followed by a vacuum if required.
Either you start a new session when you do second update on the same table or you have to 'commit' once you transaction is complete.
You can write set autocommit=on before you start updating.

How to do a massive insert

I have an application in which I have to insert in a database, with SQL Server 2008, in groups of N tuples and all the tuples have to be inserted to be a success insert, my question is how I insert these tuples and in the case that someone of this fail, I do a rollback to eliminate all the tuples than was inserted correctly.
Thanks
On SQL Server you might consider doing a bulk insert.
From .NET, you can use SQLBulkCopy.
Table-valued parameters (TVPs) are a second route. In your insert statement, use WITH (TABLOCK) on the target table for minimal logging. eg:
INSERT Table1 WITH (TABLOCK) (Col1, Col2....)
SELECT Col1, Col1, .... FROM #tvp
Wrap it in a stored procedure that exposes #tvp as parameter, add some transaction handling, and call this procedure from your app.
You might even try passing the data as XML if it has a nested structure, and shredding it to tables on the database side.
You should look into transactions. This is a good intro article that discusses rolling back and such.
If you are inserting the data directly from the program, it seems like what you need are transactions. You can start a transaction directly in a stored procedure or from a data adapter written in whatever language you are using (for instance, in C# you might be using ADO.NET).
Once all the data has been inserted, you can commit the transaction or do a rollback if there was an error.
See Scott Mitchell's "Managing Transactions in SQL Server Stored Procedures for some details on creating, committing, and rolling back transactions.
For MySQL, look into LOAD DATA INFILE which lets you insert from a disk file.
Also see the general MySQL discussion on Speed of INSERT Statements.
For a more detailed answer please provide some hints as to the software stack you are using, and perhaps some source code.
You have two competing interests, doing a large transaction (which will have poor performance, high risk of failure), or doing a rapid import (which is best not to do all in one transaction).
If you are adding rows to a table, then don't run in a transaction. You should be able to identify which rows are new and delete them should you not like how the look on the first round.
If the transaction is complicated (each row affects dozens of tables, etc) then run them in transactions in small batches.
If you absolutely have to run a huge data import in one transaction, consider doing it when the database is in single user mode and consider using the checkpoint keyword.

Sybase ASE: "Your server command encountered a deadlock situation"

When running a stored procedure (from a .NET application) that does an INSERT and an UPDATE, I sometimes (but not that often, really) and randomly get this error:
ERROR [40001] [DataDirect][ODBC Sybase Wire Protocol driver][SQL Server]Your server command (family id #0, process id #46) encountered a deadlock situation. Please re-run your command.
How can I fix this?
Thanks.
Your best bet for solving you deadlocking issue is to set "print deadlock information" to on using
sp_configure "print deadlock information", 1
Everytime there is a deadlock this will print information about what processes were involved and what sql they were running at the time of the dead lock.
If your tables are using allpages locking. It can reduce deadlocks to switch to datarows or datapages locking. If you do this make sure to gather new stats on the tables and recreate indexes, views, stored procedures and triggers that access the tables that are changed. If you don't you will either get errors or not see the full benefits of the change depending on which ones are not recreated.
I have a set of long term apps which occasionally over lap table access and sybase will throw this error. If you check the sybase server log it will give you the complete info on why it happened. Like: The sql that was involved the two processes trying to get a lock. Usually one trying to read and the other doing something like a delete. In my case the apps are running in separate JVMs, so can't sychronize just have to clean up periodically.
Assuming that your tables are properly indexed (and that you are actually using those indexes - always worth checking via the query plan) you could try breaking the component parts of the SP down and wrapping them in separate transactions so that each unit of work is completed before the next one starts.
begin transaction
update mytable1
set mycolumn = "test"
where ID=1
commit transaction
go
begin transaction
insert into mytable2 (mycolumn) select mycolumn from mytable1 where ID = 1
commit transaction
go