Batching heterogeneous SQL prepared statements with Groovy? - sql

Batch insert using groovy Sql? discusses how to execute multiple prepared statements in a batch. But all the statements must have the same structure (passed in as a top-level argument to withBatch).
Is there a way to batch up heterogeneous prepared statements like:
sql.withBatch { ps ->
    ps.addBatch("insert into t1 values(:a, :b)", [a:1, b:2])
    ps.addBatch("insert into t2 values(:c)", [c:3])
}
(This throws an exception because addBatch doesn't have that signature.)

As described in Oracle's documentation:
Prepared statements:
The same statement is repeated with different bind variables.
Batch updates:
You can reduce the number of round-trips to the database, thereby improving application performance, by grouping multiple UPDATE, DELETE, or INSERT statements into a single batch and having the whole batch sent to the database and processed in one trip. This is especially useful in combination with a prepared statement.
As described in IBM's documentation:
The JDBC drivers that support JDBC 2.0 and above support batch
updates. With batch updates, instead of updating rows of a DB2(R)
table one at a time, you can direct JDBC to execute a group of updates
at the same time. Statements that can be included in the same batch of
updates are known as batchable statements.
If a statement has input parameters or host expressions, you can
include that statement only in a batch that has other instances of the
same statement. This type of batch is known as a homogeneous batch. If
a statement has no input parameters, you can include that statement in
a batch only if the other statements in the batch have no input
parameters or host expressions. This type of batch is known as a
heterogeneous batch. Two statements that can be included in the same
batch are known as batch compatible.
This means that what you are asking for is not possible. The only advantage you can get is the performance improvement of batching statements of the same type AND preparing them only once:
When you execute a single SQL statement the database performs the following actions:
prepare the statement
bind the parameters
execute the statement
When you use batch commands the following happens:
prepare the statement once
for each of the identical statements that follow (all received by the database in a single transmission):
bind the parameters
execute the statement
Since the preparation is performed only once, you save time.
But you can sort and split the commands:
sql.withBatch(20, "insert into t1 values(:a, :b)") { ps ->
    ps.addBatch(a: 1, b: 2)
    // ... more t1 rows
}
sql.withBatch(20, "insert into t2 values(:c)") { ps ->
    ps.addBatch(c: 3)
    // ... more t2 rows
}
BTW, what will compile is
sql.withBatch { ps ->
    ps.addBatch("insert into t1 values(1, 2)")
    ps.addBatch("insert into t2 values(3)")
}
But in this case I am curious what will happen: I expect that the JDBC driver will simply not use batching. (Per the IBM documentation quoted above, these two parameterless statements are at least batch compatible, so they could form a heterogeneous batch.)

For this example, consider writing a stored procedure for your database (docs) that takes three parameters and inserts both records. Your application can call the procedure with a single prepared statement, and the statement could be batched.
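A minimal sketch of that idea, shown here in PostgreSQL syntax (the procedure name is hypothetical; adjust for your database):
CREATE PROCEDURE insert_t1_t2(a integer, b integer, c integer)
LANGUAGE SQL
AS $$
    INSERT INTO t1 VALUES (a, b);
    INSERT INTO t2 VALUES (c);
$$;

-- the application can then prepare and batch one homogeneous statement:
--   call insert_t1_t2(?, ?, ?)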

Related

Concurrency Spanner java stacktrace on `bq query` update column of partitioned table

I've created SQL that updates all the values in one column:
UPDATE `Blackout_POC2.measurements_2020`
SET visitor.customerId_enc = enc.encrypted
FROM `Blackout_POC2.encrypted` AS enc
WHERE dateAmsterdam="2020-01-05"
AND session.visitId = enc.visitId
AND visitor.customerId = enc.plain
where dateAmsterdam is the partition key of the measurements_2020 table, and encrypted is a non-partitioned table that holds visitId, plain and encrypted fields. The code sets all values in the customerId_enc column with values from the encrypted table.
The code works perfectly fine when I run it one day at a time, but when I run days in parallel, I occasionally (1% or so) get a stacktrace from my bq query <sql> (see below).
I thought that I could modify a partitioned table in parallel as long as each job touches a different partition, but that occasionally seems not to be the case. Could someone point me to where this is documented, and preferably to how to avoid it?
I can probably just rerun that query again, since it is idempotent, but I would like to know why this happens.
Thanks
Bart van Deenen, data-engineer Bol.com
Error in query string: Error processing job 'bolcom-dev-blackout-339:bqjob_r131fa5b3dfd24829_0000016faec5e5da_1':
domain: "cloud.helix.ErrorDomain"
code: "QUERY_ERROR"
argument: "Could not serialize access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update"
debug_info: "[CONCURRENT_UPDATE] Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786, Reason: code=CONCURRENT_UPDATE message=Could not serialize access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update debug=Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786
errorProto=domain: "cloud.helix.ErrorDomain"
code: "QUERY_ERROR"
argument: "Could not serialize access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update"
debug_info: "Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786"

at com.google.cloud.helix.common.Exceptions$Public.concurrentUpdate(Exceptions.java:381)
at com.google.cloud.helix.common.Exceptions$Public.concurrentUpdate(Exceptions.java:373)
at com.google.cloud.helix.server.metadata.StorageTrackerData.verifyStorageSetUpdate(StorageTrackerData.java:224)
at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.validateUpdates(AtomicStorageTrackerSpanner.java:1133)
at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.updateStorageSets(AtomicStorageTrackerSpanner.java:1310)
at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.updateStorageSets(AtomicStorageTrackerSpanner.java:1293)
at com.google.cloud.helix.server.metadata.MetaTableTracker.updateStorageSets(MetaTableTracker.java:2274)
at com.google.cloud.helix.server.job.StorageSideEffects$1.update(StorageSideEffects.java:1123)
at com.google.cloud.helix.server.job.StorageSideEffects$1.update(StorageSideEffects.java:976)
at com.google.cloud.helix.server.metadata.MetaTableTracker$1.update(MetaTableTracker.java:2510)
at com.google.cloud.helix.server.metadata.StorageTrackerSpanner.lambda$atomicUpdate$7(StorageTrackerSpanner.java:165)
at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory$1.run(AtomicStorageTrackerSpanner.java:3775)
at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.lambda$performJobWithCommitResult$0(AtomicStorageTrackerSpanner.java:3792)
at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$RetryCountingWork.run(SpannerTransactionContext.java:1002)
at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeWithResultInternal(SpannerTransactionContext.java:840)
at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeOptimisticWithResultInternal(SpannerTransactionContext.java:722)
at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.lambda$executeOptimisticWithResult$1(SpannerTransactionContext.java:716)
at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeWithMonitoring(SpannerTransactionContext.java:942)
at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeOptimisticWithResult(SpannerTransactionContext.java:715)
at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.performJobWithCommitResult(AtomicStorageTrackerSpanner.java:3792)
at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.performJobWithCommitResult(AtomicStorageTrackerSpanner.java:3720)
at com.google.cloud.helix.server.metadata.StorageTrackerSpanner.atomicUpdate(StorageTrackerSpanner.java:159)
at com.google.cloud.helix.server.metadata.MetaTableTracker.atomicUpdate(MetaTableTracker.java:2521)
at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$LoggingStorageTracker.lambda$atomicUpdate$8(StatsRequestLoggingTrackers.java:494)
at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.record(StatsRequestLoggingTrackers.java:181)
at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.record(StatsRequestLoggingTrackers.java:158)
at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.access$500(StatsRequestLoggingTrackers.java:123)
at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$LoggingStorageTracker.atomicUpdate(StatsRequestLoggingTrackers.java:493)
at com.google.cloud.helix.server.job.StorageSideEffects.apply(StorageSideEffects.java:1238)
at com.google.cloud.helix.server.rosy.MergeStorageImpl.commitChanges(MergeStorageImpl.java:936)
at com.google.cloud.helix.server.rosy.MergeStorageImpl.merge(MergeStorageImpl.java:729)
at com.google.cloud.helix.server.rosy.StorageStubby.mergeStorage(StorageStubby.java:937)
at com.google.cloud.helix.proto2.Storage$ServiceParameters$21.handleBlockingRequest(Storage.java:2100)
at com.google.cloud.helix.proto2.Storage$ServiceParameters$21.handleBlockingRequest(Storage.java:2098)
at com.google.net.rpc3.impl.server.RpcBlockingApplicationHandler.handleRequest(RpcBlockingApplicationHandler.java:28)
....
BigQuery DML operations don't support multi-statement transactions; nevertheless, you can execute some statements concurrently:
UPDATE and INSERT
DELETE and INSERT
INSERT and INSERT
For example, if you execute two UPDATE statements simultaneously against the same table, only one of them will succeed.
Keeping this in mind: since UPDATE and INSERT statements can run concurrently, another possible cause is that you are executing multiple UPDATE statements simultaneously.
You could try using the Scripting feature to manage the execution flow and prevent DML concurrency, as sketched below.
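For instance, a minimal scripting sketch that runs the question's UPDATE one day at a time instead of in parallel (the list of dates is illustrative):
DECLARE days ARRAY<STRING> DEFAULT ['2020-01-05', '2020-01-06', '2020-01-07'];
DECLARE i INT64 DEFAULT 0;
-- each UPDATE finishes before the next starts, so no two DML statements
-- touch the table concurrently
WHILE i < ARRAY_LENGTH(days) DO
  UPDATE `Blackout_POC2.measurements_2020`
  SET visitor.customerId_enc = enc.encrypted
  FROM `Blackout_POC2.encrypted` AS enc
  WHERE dateAmsterdam = days[OFFSET(i)]
    AND session.visitId = enc.visitId
    AND visitor.customerId = enc.plain;
  SET i = i + 1;
END WHILE;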

The real function of the "GO" Statement in SQL?

I heard that the GO statement separates command batches in SQL, and that a CREATE statement should be the only statement in a batch.
But when I try:
Create database dbTest
Create table tbSomething(ID int primary key,Name varchar(30))
GO
The output is still SUCCESS.
So how does the GO statement affect SQL batches?
GO is used to divide a script into multiple batches.
The word GO is not a SQL statement. It is understood by the SQL batch processor (for example, SSMS), not by SQL Server.
Simply put, if GO appears on a line on its own, SSMS sends each section delimited by GO as a separate batch. SQL Server never sees the GO lines, only the SQL between them.
Because SQL Server has a syntactic rule that stored procedures must be defined in a batch of their own, you will often find database creation scripts which use GO to delimit the batches so that multiple stored procedures can be created from one script; a sketch follows below. However, it is the client software which understands GO and divides the batches, not SQL Server.
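For example, a deployment script along these lines works only because each GO starts a fresh batch (the procedure name is hypothetical):
CREATE DATABASE dbTest;
GO
USE dbTest;
GO
CREATE TABLE tbSomething (ID int PRIMARY KEY, Name varchar(30));
GO
-- CREATE PROCEDURE must be the first statement in its batch,
-- which is why the preceding GO is required
CREATE PROCEDURE uspGetSomething AS
    SELECT ID, Name FROM tbSomething;
GO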
The GO keyword in SQL Server simply signals the client to send the current batch of SQL statements for execution.
To put it in simple words, it works like a delimiter: it marks the end of one batch of SQL statements that needs to be executed.

Correct behavior of libpq's PQexecPrepared and SQL EXECUTE

It looks like with Postgres there are two ways to prepare and execute prepared statements. You can use the functions PQprepare and PQexecPrepared directly from libpq, or you can issue the SQL statements PREPARE and EXECUTE. The statement names are shared across the two methods, so you can use PQprepare to prepare a statement and then execute it by issuing an EXECUTE query (or use a PREPARE query and then execute it with PQexecPrepared).
So the two approaches (library functions vs. SQL queries) are equivalent. However, it looks like when you use PQexecPrepared, the query column of pg_stat_activity shows the underlying prepared statement with its placeholders, something like:
SELECT * from users where name in ($1, $2, $3);
But when you use an EXECUTE query, pg_stat_activity contains the SQL of the EXECUTE, like:
EXECUTE user_query('joe', 'bob', 'sally');
Questions
Is there a way to get the same output for the two different ways of executing prepared statements?
Is there a way to see both the query, and the bound parameters when executing prepared statements?
You are right that both ways to execute a prepared statement do the same thing under the hood, but since they are called in different ways on the SQL level, they look different in pg_stat_activity. There is no way to change that.
To get the statement and the parameters you must resort to the log file.
In the case of PQexecPrepared, you will see the statement as a LOG message and the parameters as its DETAIL if you turn on statement logging.
With PREPARE and EXECUTE, you have no choice but to find the PREPARE earlier in the session (both have the same session identifier, that is %c in log_line_prefix).
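For the PQexecPrepared case, the log output looks roughly like this, using the question's statement name (the exact formatting depends on your logging configuration, e.g. log_line_prefix):
-- enable statement logging (usually set in postgresql.conf; SET requires superuser)
SET log_statement = 'all';

-- a PQexecPrepared call then appears in the log along these lines:
-- LOG:  execute user_query: SELECT * FROM users WHERE name IN ($1, $2, $3)
-- DETAIL:  parameters: $1 = 'joe', $2 = 'bob', $3 = 'sally'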

SQL Server - SKIP DML statements from a given SQL Script file

I have a huge SQL Server script which has a mix of DDL and DML operations, and there is a requirement to create a clean DB structure (with no data). Is it possible to skip the DML statements through some parameter or in some other way?
Thanks in advance.
Arun
Out of the box: no. But you can wrap the DML statements in your script with something like:
if ($(RUNDML) = 1)
begin
--your dml here
end
Where RUNDML is a sqlcmd variable. You'd invoke your script with differing values of RUNDML based on whether or not you wanted data in the database being built.
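Invocation would then look something like this (server, database, and file names are hypothetical):
:: build structure and data
sqlcmd -S myserver -d mydb -i build.sql -v RUNDML=1

:: build structure only
sqlcmd -S myserver -d mydb -i build.sql -v RUNDML=0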
Alternatively, separate the DML out into another script (or scripts) so you can choose whether to run the data portion of the build or not.

Do prepared SQL statements in a stored procedure make the performance better?

I know there have already been lots of questions about stored procedures vs. prepared SQL statements, but I want to find out something different: whether the prepared statements inside a procedure contribute to the performance of that stored procedure, i.e. make it better.
I have this question because I came across the following points while reading introductions to these two techniques.
Stored procedure will store and compile your series of statements in
db, which will reduce the overhead of transferring & compiling.
Prepare statements will be compiled and cached in db for multiple
access which lead to less overhead.
I am puzzled by these terms 'compile', 'store', and 'overhead'; they are a little abstract.
I use prepared statements to avoid re-parsing statements that will be executed frequently.
However, should I use prepared statements (to cache and compile) inside a procedure? Since my procedure has already been stored and compiled in the DB, preparing something inside it seems meaningless. (Compile what was already compiled?)
edit with sample code:
Create or Replace procedure MY_PROCEDURE
Begin
    -- totally meaningless here?
    declare sqlStmt varchar(300);
    declare stmt statement;
    set sqlStmt = 'update MY_TABLE set NY_COLUMN=? where NY_COLUMN=?';
    prepare stmt from sqlStmt;
    execute stmt using 2, 1;
    execute stmt using 4, 3;
    ..............
END
Is the above one better than the one below, since it only parses the statement once? Or are they the same, because the statements in a procedure have already been pre-compiled?
Create or Replace procedure MY_PROCEDURE
Begin
    update MY_TABLE set NY_COLUMN=2 where NY_COLUMN=1;
    update MY_TABLE set NY_COLUMN=4 where NY_COLUMN=3;
    ..............
END
When you first run a stored procedure the database engine parses the procedure and works out the optimal query plan to use when executing it - it then stores this query plan so that every time you run the procedure it doesn't have to recalculate it.
You can see this yourself in Management Studio. If you CREATE or ALTER the stored procedure in question, then open a new query window and use:
SET STATISTICS TIME ON
In that same query window run the stored procedure. In the messages tab of the result the first message will be something like:
SQL Server parse and compile time:
CPU time = 1038 ms, elapsed time = 1058 ms.
This is the overhead. Execute the query again and you will see that the parse and compile time is now 0.
When you prepare a statement in code you get to take advantage of the same benefit. If your query is built as 'SELECT * FROM table WHERE col1 = ' + $var, SQL Server has to parse it and calculate an execution plan every time you run it. If you instead use a prepared statement, SELECT * FROM table WHERE col1 = ?, SQL Server calculates the optimal execution plan the first time you run it and can then reuse that plan, just as with a stored procedure. The same goes if the statement you are executing is 'EXEC dbo.myProc @var = ' + $var: SQL Server would still have to parse this statement each time, so a prepared statement should be used here too.
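In SQL Server, the usual way to get a parameterized (prepared) statement from ad-hoc SQL is sp_executesql; a minimal sketch, with hypothetical table and parameter names:
-- parsed and compiled once; the cached plan is reused by subsequent
-- calls that pass different values
EXEC sp_executesql
    N'SELECT * FROM dbo.Users WHERE Name = @name',
    N'@name nvarchar(50)',
    @name = N'joe';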
You do not need to prepare statements that you write inside stored procedures because they are already compiled as shown above - they are prepared statements in themselves.
One thing you should be aware of when using stored procedures and prepared statements is parameter sniffing.
SQL Server calculates and stores the optimal execution plan for the first parameter values used; if you happen to execute the stored procedure with unusual values on the first run, the stored execution plan may be completely suboptimal for the values you typically use.
If you find you can execute a stored procedure from Management Studio and it takes, say, 2 seconds to execute, but performing the same action from your application takes 20 seconds, it's probably a result of parameter sniffing.
In DB2, the opposite may actually be true. Statements in an SQL routine are prepared when the routine is compiled; dynamic SQL statements, as in your example, are prepared while the routine runs.
As a consequence, the preparation of dynamic statements will take into account the most current table and index statistics and other compilation environment settings, such as isolation level, while static statements will use the statistics that were in effect during the routine compilation or the latest bind.
If you want stable execution plans, use static SQL. If your statistics change frequently, you may want to use dynamic SQL (or make sure you rebind your routines' packages accordingly).
The same logic applies to Oracle PL/SQL routines, although the way to recompile static SQL differs -- you'll need to invalidate the corresponding routines.
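For completeness, the DB2 rebind mentioned above can be done with something like the following, using the procedure name from the question (the exact procedure and options vary by DB2 version, so treat this as an assumption to check against your documentation):
-- rebind the package behind the SQL procedure so its static SQL
-- picks up current statistics (DB2 LUW; assumed invocation)
CALL SYSPROC.REBIND_ROUTINE_PACKAGE('P', 'MY_PROCEDURE', 'ANY');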