Snowflake Query Killed: "SQL execution canceled"

I've got a Talend job with a couple of dataflows running in parallel against a Snowflake database. An UPDATE statement against Table A is causing an UPDATE on Table B to fail with the following error:
Transaction 'uuid-of-transaction', id 'a-very-long-integer-id', is being committed, SQL execution canceled.
Call END_OPERATION(999,'String1','String2','String3','String4','Success','0')
UPDATE TableB SET BATCH_KEY = 1234, LOAD_DT = current_timestamp::timestamp_ntz, KEY_HASH = MD5(TO_VARCHAR(ARRAY_CONSTRUCT(col1))), ROW_HASH = MD5(TO_VARCHAR(ARRAY_CONSTRUCT(col2, col3))) WHERE BATCH_KEY = -1 OR BATCH_KEY IS NULL;
The code for END_OPERATION is here:
var cmd = "CALL END_OPERATION(:1,:2,:3,:4,:5,:6,null);";
try {
    snowflake.execute({
        sqlText: cmd,
        binds: [BATCH_KEY, ENTITY, LAYER, SRC, OPERATION, OPERATION_STATUS]
            .map(function (param) { return param === undefined ? null : param; })
    });
    return "Succeeded.";
}
catch (err) {
    return "Failed: " + err;
}

var cmd = "UPDATE TableA SET OPERATION_STATUS=:6,END_DT=current_timestamp,ROW_COUNT=IFNULL(:7,ROW_COUNT) WHERE BATCH_KEY=:1 AND ENTITY_NAME=:2 AND LAYER_NAME=:3 AND SRC=:4 AND OPERATION_NAME=:5";
try {
    snowflake.execute({
        sqlText: cmd,
        binds: [BATCH_KEY, ENTITY, LAYER, SRC, OPERATION, OPERATION_STATUS, ROW_COUNT]
            .map(function (param) { return param === undefined ? null : param; })
    });
    return "Succeeded.";
}
catch (err) {
    return "Failed: " + err;
}
I'm failing to understand why the UPDATE statement against TableB is getting killed; it gets canceled almost immediately.

To troubleshoot this, we need to review the flow of all SQL statements coming from the Talend job in the session where the failing command runs, as well as the statements coming from the other parallel dataflow.
From the Query History we can get the SessionID of the failing query's session. In the History section of the Snowflake UI we can then filter on that SessionID, which lists all the commands run through that particular session.
Reviewing those commands in chronological order (sorting on the start time column) lets us observe the exact sequence of SQL statements.
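If it is more convenient than the UI, the same list can be pulled with the QUERY_HISTORY_BY_SESSION table function; a minimal sketch, with a placeholder session id:
-- All statements run in the session, oldest first (the session id is a placeholder)
SELECT start_time, query_id, query_type, execution_status, query_text
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(
       SESSION_ID   => 1234567890,
       RESULT_LIMIT => 1000))
ORDER BY start_time;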
Your point is valid that an UPDATE on TableA should not, by itself, affect an UPDATE on TableB. However, after reviewing all the statements of both sessions (the question notes the Talend job runs a couple of dataflows in parallel), we may find a statement in one session that took a lock on TableB before the UPDATE against it was submitted from the other session.
Another thing to review is how the workflow manages transactions. Within the same list of SQL queries for the session, check for any statement that sets the AUTOCOMMIT parameter at the session level. If AUTOCOMMIT is set to FALSE at the start of the session, the session will not release any of its table locks until an explicit COMMIT is submitted.
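Both of these can also be checked directly while the job is running; for example (the account-level lock listing assumes sufficient privileges):
-- Was autocommit turned off for the session?
SHOW PARAMETERS LIKE 'AUTOCOMMIT' IN SESSION;

-- Open transactions and the locks currently held or waited on
SHOW TRANSACTIONS;
SHOW LOCKS IN ACCOUNT;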
Since the situation sounds a bit unusual and complex, we may have to dig deeper into the execution logs of both queries, and for that we may have to contact Snowflake Support.

Related

Is there a Denodo 8 VQL function or line of VQL for throwing an error in a VDP scheduler job?

My goal is to load a cache when there is new data available. Data is loaded into the source table once a day but at an unpredictable time.
I've been trying to set up a data-availability trigger VDP scheduler job as described in this Denodo community post:
https://community.denodo.com/answers/question/details?questionId=9060g0000004FOtAAM&title=Run+Scheduler+Job+Based+on+Value+from+a+Query
The post describes creating a scheduler job that fails whenever the condition is not satisfied. The only way I've found to force an error on certain conditions is to use (1/0), and this doesn't always work for some reason. I was wondering if there is a way to do this with a function, as in normal SQL, but I couldn't find anything in the Denodo documentation.
This is what my code currently looks like:
--Trigger job
SELECT CASE
WHEN (
data_in_cache = current_data
)
THEN 1 % 0
ELSE 1
END
FROM database.table;
The cache job waits for the trigger job to succeed, so the cache only loads when the data in the cache is outdated. This doesn't always work, even though I feel it should.
I'm hoping someone has a function or line of VQL that makes a Denodo VDP scheduler job result in an error.
This is easy to do by creating a custom function that, when executed, just throws an exception. It doesn't have to be a plain Exception; you could define your own exception type so it is easier to spot in the error trace. In any case, it could be something like this:
@CustomElement(type = CustomElementType.VDPFUNCTION, name = "ERROR_SAMPLE_FUNCTION")
public class ErrorSampleVdpFunction {

    @CustomExecutor
    public CustomArrayValue errorSampleFunction() throws Exception {
        throw new Exception("This is an error");
    }
}
So you will use it like:
--Trigger job
SELECT CASE
WHEN (
data_in_cache = current_data
)
THEN errorSampleFunction()
ELSE 1
END
FROM database.table;

DbUpdateConcurrencyException on inserting a new row in SQL Server using EF Core (expected to affect 1 row(s) but actually affected 0 row(s))

I am trying to insert data into one table using EF Core 5 with the repository pattern and unit of work.
Code sample:
var stateData = new State
{
    StateId = state.StateId,
    Action = state.Action,
    Event = state.Event,
    ExecutedOn = DateTime.Now
};
_unitOfWork.GetRepository<State>().Add(stateData);
var result = _unitOfWork.Commit();
The GetRepository method used to get the respective repository:
{
    return (IRepository<TEntity>)GetOrAddRepository(typeof(TEntity),
        new Repository<TEntity>(Context));
}
Commit method:
{
    return Context.SaveChanges();
}
I am inserting data into the State table, which has Id as its primary key and identity column. The other columns are StateId, Action, Event, and ExecutedOn (datatype: datetime2).
The application runs on multiple nodes, so there will be multiple insert requests at the same time from different nodes, each with different data.
I am getting DbUpdateConcurrencyException frequently while inserting the state records into the DB. Sometimes it works, but most of the time I get DbUpdateConcurrencyException with the message "Database operation expected to affect 1 row(s) but actually affected 0 row(s). Data may have been modified or deleted since entities were loaded. See http://go.microsoft.com/fwlink/?LinkId=527962 for information on understanding and handling optimistic concurrency exceptions".
There is no update operation, yet I am still getting a concurrency exception.
I have tried the solutions from similar questions, but no luck.
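For reference, the State table as described would look roughly like this (the non-key column types are assumptions):
CREATE TABLE State (
    Id         INT IDENTITY(1,1) PRIMARY KEY,  -- identity primary key as described
    StateId    INT            NOT NULL,
    Action     NVARCHAR(100)  NULL,
    Event      NVARCHAR(100)  NULL,
    ExecutedOn DATETIME2      NOT NULL
);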

Doctrine deadlock with ORM updates

I'm trying to figure out what is causing deadlocks in my Symfony 2 application. I'm running a cronjob that does batch-updates on a fairly large dataset and one part of it causes this error:
Doctrine\DBAL\DBALException: An exception occurred while executing
'UPDATE SpotEvent SET ts = ?, current = ? WHERE id = ?' with params
["2015-12-28 00:35:27", 1, 39316]:
SQLSTATE[40P01]: Deadlock detected: 7 ERROR: deadlock detected
DETAIL: Process 32030 waits for ShareLock on transaction 2130787; blocked by process 32029.
Process 32029 waits for ShareLock on transaction 2130786; blocked by process 32030.
HINT: See server log for query details.
CONTEXT: while updating tuple (105,68) in relation "spotevent" (uncaught exception)
at /home/maf/symfony/vendor/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php line 91
while running console command
The code causing it is basically this:
check event
if (already in database) {
update timestamp
} else {
create new
}
From what I see in the error, the first branch causes the deadlock, but from what I read about deadlocks, the second should be more likely. In any case I don't understand why I have a deadlock at all.
I should say I am running this job in 6 parallel processes. However, there is no overlap between them (i.e. job 1 checks records 1-200, job 2 checks 201-400, etc.).
I'm using PostgreSQL as the database backend. My "check event" step is done using DQL, everything else is pure ORM.
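For what it's worth, the textbook way a 40P01 deadlock arises is two transactions updating the same pair of rows in opposite order; a minimal illustration with hypothetical ids (each labelled block runs in its own session):
-- Session A
BEGIN;
UPDATE SpotEvent SET current = true WHERE id = 1;  -- A locks row 1

-- Session B
BEGIN;
UPDATE SpotEvent SET current = true WHERE id = 2;  -- B locks row 2

-- Session A
UPDATE SpotEvent SET current = true WHERE id = 2;  -- A now waits for B

-- Session B
UPDATE SpotEvent SET current = true WHERE id = 1;  -- B waits for A: PostgreSQL detects the cycle and aborts one transaction
If the six processes really touch disjoint ids, the server log (per the HINT in the error) is the place to see which rows the two processes actually collided on.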

Redshift/Java: SQL execute hangs and never returns

The application that I'm working on runs a sequence of queries on AWS Redshift. Some of the queries take longer to execute due to the data volume.
The queries seem to finish on Redshift when I check the execution details on the server. However, the Java application hangs indefinitely without throwing any exception or terminating.
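For context, that server-side check can be done with Redshift's system tables; a small sketch using only the standard views:
-- Statements still executing
SELECT pid, user_name, starttime, duration, TRIM(query) AS sql_text
FROM stv_recents
WHERE status = 'Running';

-- Recently completed statements
SELECT query, starttime, endtime, aborted, TRIM(querytxt) AS sql_text
FROM stl_query
ORDER BY starttime DESC
LIMIT 20;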
Here's the code that executes the query.
private void execSQLStrings(String[] queries, String dataset, String dbType) throws Exception {
    Connection conn = null;
    if (dbType.equals("redshift")) {
        conn = getRedshiftConnection();
    } else if (dbType.equals("rds")) {
        conn = getMySQLConnection();
    }

    Statement stmt = conn.createStatement();
    String qry = null;
    debug("Query Length: " + queries.length);

    for (int ii = 0; ii < queries.length; ++ii) {
        qry = queries[ii];
        if (dataset != null) {
            qry = qry.replaceAll("DATASET", dataset);
        }
        debug(qry);
        stmt.execute(qry);
    }

    stmt.close();
    conn.close();
}
I can't post the query that I'm running at the moment, but it has multiple table joins and GROUP BY conditions, and it runs against an 800M-row table. The query takes about 7-8 minutes to complete on the server.
You need to update the DSN Timeout and/or KeepAlive settings to make sure that your connections stay alive.
Refer: http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-firewall-guidance.html

Using SQL dB column as a lock for concurrent operations in Entity Framework

We have a long-running user operation that is handled by a pool of worker processes. Data input and output is from Azure SQL.
The master Azure SQL table structure columns are approximated to
[UserId, col1, col2, ... , col N, beingProcessed, lastTimeProcessed ]
beingProcessed is a boolean and lastTimeProcessed is a DateTime. The logic in every worker role is shown below; with multiple workers processing (each with their own Entity Framework layer), beingProcessed is in essence being used as a lock for mutual-exclusion purposes.
Question: how can I deal with concurrency issues on the beingProcessed "lock" itself under the above load? I think the read-modify-write operation on beingProcessed needs to be atomic, but I'm open to other strategies, and to other code refinements too.
[Update]: I wonder if TransactionScope is what's needed here ... http://msdn.microsoft.com/en-US/library/system.transactions.transactionscope(v=vs.110).aspx
Code:
public void WorkerRoleMain()
{
    while (true)
    {
        try
        {
            dbContext db = new dbContext();

            // Read
            foreach (UserProfile user in db.UserProfile
                         .Where(u => DateTime.UtcNow.Subtract(u.lastTimeProcessed) > TimeSpan.FromHours(24)
                                     & u.beingProcessed == false))
            {
                user.beingProcessed = true; // Modify
                db.SaveChanges();           // Write

                // Do some long drawn processing here
                ...
                ...
                ...

                user.lastTimeProcessed = DateTime.UtcNow;
                user.beingProcessed = false;
                db.SaveChanges();
            }
        }
        catch (Exception ex)
        {
            LogException(ex);
            Sleep(TimeSpan.FromMinutes(5));
        }
    } // while ()
}
What we usually do is this:
At the beginning of a long operation we start a transaction:
BEGIN TRANSACTION
Then we select a row from the table we would like to update/delete using these hints:
SELECT * FROM Table WITH (ROWLOCK, NOWAIT) Where ID = 123;
Then we check that we got the row. If the row is locked by another process there will be a SQL error; in this case we roll back the transaction and advise the user.
If the record is not locked, we process it and apply the required updates, using the same transaction we used to lock the record:
UPDATE Table SET Col1='value' WHERE ID = 123;
Then we COMMIT the transaction.
COMMIT;
This is just pseudo-code for the process; you will have to implement it in your program.
One small note regarding the above process: when you lock the record in SQL Server (or Azure SQL), use the primary key in your WHERE clause, otherwise SQL Server may decide to take a page lock or table lock instead.
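Putting those steps together, a T-SQL sketch of the whole flow (table and column names are the placeholders used above; UPDLOCK is added here, on top of the hints shown, so the row lock is actually held until COMMIT):
BEGIN TRY
    BEGIN TRANSACTION;

    -- Try to claim the row: NOWAIT fails immediately with error 1222 if another
    -- process holds a conflicting lock; UPDLOCK keeps our lock until COMMIT
    SELECT *
    FROM [Table] WITH (UPDLOCK, ROWLOCK, NOWAIT)
    WHERE ID = 123;

    -- ... long-running processing happens here ...

    UPDATE [Table] SET Col1 = 'value' WHERE ID = 123;

    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;  -- surface the "row is locked" error to the caller
END CATCH;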