Using result from jedis transaction response before executing transaction - redis

Suppose I have a transaction t. I want to get something from the database that would have been inserted previously in this transaction and use that value in a subsequent operation in the same transaction as so:
byte[] data = t.get(key).get();
t.set(other_key, data);
However, in Jedis, when I try this, I get a JedisDataException telling me that I need to execute the transaction before calling get() on the transaction response. Is there a way I can use the result of the query in the same transaction without executing it first?

No, you can't with a Redis transaction. A Redis transaction basically means you send all the commands to Redis in one go; Redis executes them while commands from other connections wait, and only after it finishes does it return the results to you. Note that it does not return results in the middle of the transaction, unlike MySQL, so you cannot read the result of a query mid-transaction.
To do what you want, you'll need a Lua script (it takes about ten minutes to learn enough to write what you're asking for). It runs on the server side, so there is no need to return the result to the client in the middle of the operation.
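As a sketch of the Lua approach (shown here with the redis-py client for brevity; Jedis exposes the same functionality through Jedis.eval), the read-then-write moves into a script that Redis runs atomically on the server:

```python
# A Lua script that reads one key and writes its value to another key.
# Redis executes the whole script atomically on the server, so the GET
# result never has to travel back to the client mid-transaction.
COPY_SCRIPT = """
local data = redis.call('GET', KEYS[1])
redis.call('SET', KEYS[2], data)
return data
"""

# Hypothetical invocation (requires a running Redis server and redis-py;
# key names are placeholders):
# import redis
# r = redis.Redis()
# data = r.eval(COPY_SCRIPT, 2, "key", "other_key")
```

In Jedis the equivalent call would be jedis.eval(script, keyCount, params), outside of any MULTI/EXEC block.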
You might want to read these:
Redis Transaction
Redis Lua Script

Related

Retrieving ids of batch inserted rows in SQLite

I'm using SQLite's last_insert_rowid() to grab the last inserted row ID following a batch insert. Is there any risk of race conditions that could cause this value to not return the last id of the batch insert? For example, is it possible that in between the completion of the insert and the calling of last_insert_rowid() some other process may have written to the table again?
last_insert_rowid() is connection-dependent, so there is a risk only when multiple threads use the same connection without SQLite switched to Serialized threading mode.
last_insert_rowid() returns information about the last insert done in this specific connection; it cannot return a value written by some other process.
To ensure that the returned value corresponds to the current state of the database, take advantage of SQLite's ACID guarantees (here: atomicity): wrap the batch inserts, the last_insert_rowid() call, and whatever you're doing with the ID inside a single transaction.
In any case, the return value of last_insert_rowid() changes only when an insert is done through this connection, so you should never access the same connection from multiple threads; if you really must, manually serialize entire transactions.
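The transaction-wrapping advice above can be shown with Python's built-in sqlite3 module (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Wrap the batch insert and the last_insert_rowid() call in one transaction;
# the value returned is always the last rowid inserted on THIS connection.
with conn:  # commits on success, rolls back on exception
    conn.executemany(
        "INSERT INTO items (name) VALUES (?)",
        [("a",), ("b",), ("c",)],
    )
    last_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]

print(last_id)  # → 3
```

Because last_insert_rowid() is scoped to the connection, inserts made by other processes between the batch and the call cannot change this value; the transaction additionally guarantees the batch and whatever you do with the ID commit or roll back together.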

Redshift: Serializable isolation violation on table

I have a very large Redshift database that contains billions of rows of HTTP request data.
I have a table called requests which has a few important fields:
ip_address
city
state
country
I have a Python process running once per day, which grabs all distinct rows which have not yet been geocoded (do not have any city / state / country information), and then attempts to geocode each IP address via Google's Geocoding API.
This process (pseudocode) looks like this:
for ip_address in ips_to_geocode:
    country, state, city = geocode_ip_address(ip_address)
    execute_transaction('''
        UPDATE requests
        SET ip_country = %s, ip_state = %s, ip_city = %s
        WHERE ip_address = %s
    ''', (country, state, city, ip_address))
When running this code, I often receive errors like the following:
psycopg2.InternalError: 1023
DETAIL: Serializable isolation violation on table - 108263, transactions forming the cycle are: 647671, 647682 (pid:23880)
I'm assuming this is because I have other processes constantly logging HTTP requests into my table, so when I attempt to execute my UPDATE statement, it is unable to select all rows with the ip address I'd like to update.
My question is this: what can I do to update these records in a sane way that will stop failing regularly?
Your code is violating the serializable isolation level of Redshift. You need to make sure that your code does not open a new transaction on the same table before all open transactions on it have been closed.
You can achieve this by locking the table in each transaction so that no other transaction can access the table for updates until the open transaction gets closed. Not sure how your code is architected (synchronous or asynchronous), but this will increase the run time as each lock will force others to wait till the transaction gets over.
Refer: http://docs.aws.amazon.com/redshift/latest/dg/r_LOCK.html
Just got the same issue on my code, and this is how I fixed it:
First things first: it is good to know that this error code means you are trying to do concurrent operations in Redshift. Issuing a second query against a table before a query you sent moments ago has finished is, for example, one case where you would get this kind of error (that was my case).
Good news is: there is a simple way to serialize Redshift operations! You just need to use the LOCK command. Here is the Amazon documentation for the Redshift LOCK command. It basically works by making the next operation wait until the previous one is closed. Note that using this command will naturally make your script a little slower.
In the end, the practical solution for me was to insert the LOCK command before the query (in the same string, separated by a ';'). Something like this:
LOCK table_name; SELECT * from ...
And you should be good to go! I hope it helps you.
Since you are doing a point update in your geo codes update process, while the other processes are writing to the table, you can intermittently get the Serializable isolation violation error depending on how and when the other process does its write to the same table.
Suggestions
One way is to use a table lock like Marcus Vinicius Melo has suggested in his answer.
Another approach is to catch the error and re run the transaction.
Any code that initiates a serializable transaction should be prepared to retry it when it fails with this error. Since all transactions in Redshift are strictly serializable, all code initiating transactions in Redshift should be ready to retry them.
Explanations
The typical cause of this error is that two transactions started and proceeded in such a way that at least one of them cannot be completed as if they had executed one after the other. The database system then chooses to abort one of them by throwing this error, handing control back to the transaction-initiating code to take an appropriate course of action, retrying being one of them.
One way to prevent such a conflicting sequence of operations is to use a lock, but a lock also blocks many interleavings that would not have produced a conflict. It guarantees the error will not occur, at the cost of concurrency. The retry approach lets concurrency have its chance and handles the case when a conflict does occur.
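A retry loop for this can be sketched as follows (the exception class is a stand-in for whatever your driver raises on a serialization failure, e.g. the psycopg2 error shown in the question; the backoff parameters are arbitrary):

```python
import random
import time

class SerializationError(Exception):
    """Stand-in for the driver error raised on a serializable isolation violation."""

def run_with_retry(transaction, max_attempts=5, base_delay=0.1):
    # Retry the transaction a bounded number of times, backing off between
    # attempts so the conflicting transaction has a chance to finish.
    for attempt in range(max_attempts):
        try:
            return transaction()
        except SerializationError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.05))

# Demo: a transaction that fails twice with a serialization error, then succeeds.
attempts = {"n": 0}
def flaky_update():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationError("Serializable isolation violation")
    return "ok"

print(run_with_retry(flaky_update))  # → ok
```

In the real code, flaky_update would open a transaction, run the UPDATE, and commit; the whole unit must be retried, not just the last statement.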
Recommendation
That said, I would still recommend that you don't update Redshift with point updates like this. The geocode update process should write to a staging table and, once all records are processed, perform one single bulk update, followed by a vacuum if required.
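The staging-table pattern can be sketched like this, simulated with SQLite so it runs stand-alone (in Redshift you would COPY the geocoded results into the staging table and express the bulk update with its UPDATE ... FROM syntax; table and column names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (ip_address TEXT, ip_country TEXT);
CREATE TABLE geo_staging (ip_address TEXT, country TEXT);
INSERT INTO requests VALUES ('1.2.3.4', NULL), ('5.6.7.8', NULL);
INSERT INTO geo_staging VALUES ('1.2.3.4', 'US'), ('5.6.7.8', 'DE');
""")

# One bulk update from the staging table instead of one transaction per IP:
# a single short transaction gives the concurrent writers far fewer chances
# to form a conflicting cycle with this process.
with conn:
    conn.execute("""
        UPDATE requests
        SET ip_country = (SELECT country FROM geo_staging
                          WHERE geo_staging.ip_address = requests.ip_address)
        WHERE ip_address IN (SELECT ip_address FROM geo_staging)
    """)

print(conn.execute(
    "SELECT ip_country FROM requests ORDER BY ip_address").fetchall())
# → [('US',), ('DE',)]
```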
Either start a new session when you do the second update on the same table, or commit once your transaction is complete.
You can run set autocommit=on before you start updating.

What is the difference between sql batch and transaction in orientdb?

I have read through the documentation, and it seems that a SQL BATCH command and a transaction accomplish the same purpose, that is committing all statements as an all-or-nothing transaction.
Is this correct, or am I missing something?
I am using Orient through the PhpOrient language binding, and see that it supports both transactions and batches, but I am using SQL exclusively and would like to perform transactions using SQL only. It seems the same from my testing, but I wanted to confirm.
SQL Batch
a) A SQL batch is just that: a collection of commands that are executed without any guarantee that they all succeed or all fail.
b) Batch processing means things are put into a queue and processed when a certain number of items is reached, or when a certain period has passed. You can do undo/rollback in this.
In BATCH PROCESSING, the bank would just queue xyz's request to deposit an amount: it puts your request in the queue with all the other requests and processes them at the end of the day, or when they reach a certain number.
SQL Transaction
a) A SQL transaction is a collection of commands that are guaranteed to succeed or fail as a whole. Transactions won't complete half the commands and then fail on the rest; if one fails, they all fail.
b) Transaction is like real time processing that allows you to rollback/undo changes.
In TRANSACTIONS, it's just like the batch, but you have the option to "cancel" it.
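The all-or-nothing behaviour can be demonstrated with SQLite, used here only because it ships with Python; the semantics being illustrated are the same for an OrientDB transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

# A transfer inside a transaction: if anything fails partway through,
# every change made so far is rolled back, not just the failing statement.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        raise RuntimeError("simulated failure before the second update")
except RuntimeError:
    pass

# Alice's debit was undone along with everything else in the transaction.
print(conn.execute("SELECT balance FROM accounts WHERE name = 'alice'")
      .fetchone())  # → (100,)
```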
transaction
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
Database transactions, as implemented by InnoDB, have properties that are collectively known by the acronym ACID, for atomicity, consistency, isolation, and durability.
Mysql Manual

Call commit on autoCommit=false connection for SELECT statements JDBC?

I have a webapp written in Java on Tomcat; all connections are autoCommit=false by default. Now, if I run only SELECT statements in a transaction, do I still need to call commit(), or is it sufficient just to close the connection?
For what it's worth: I am on Oracle 11.2.
There is a similar question, but it does not actually give an answer for this case.
It is sufficient to close the connection, no need to call commit or rollback.
But according to the documentation for Connection.close(), it is recommended to call either commit or rollback first.
SELECT statements do not disturb the underlying model or the data contained within it. It is safe to close the connection without calling any transaction-related commands (like commit).
Actually, strike that. I had not considered adjacent selects against the same model in my first answer. Say you execute select id from users where age > 20 and follow it up with select id from users where age = 20; any update made between these queries could affect the consistency of the selects, for example returning the same id in both result sets. To guarantee consistent results, you would need to wrap both selects in the same transaction, ended with a commit().
So yes, it makes sense to commit your selects.

BeginTransaction not calling BEGIN TRANSACTION

I've got a simple bit of code that uses BeginTransaction(). The resulting transaction is assigned to the connection that I'm using for some sql commands.
When I profile the resulting sql, I don't see a BEGIN TRANSACTION at any point. What might be happening that would prevent the transaction from being used?
Transactions are handled at a lower level when using ADO.NET. There are no "BEGIN TRANSACTION" statements sent to the server.
You need to ensure that you not only set the transaction on the connection object, but also assign the transaction to the SqlCommand.
See this codeproject article for an example.
To reiterate Philippe's statement:
Transactions are handled at a lower level when using ADO.NET. There are no "BEGIN TRANSACTION" statements sent to the server.
At some point SQL has to be converted into actual calls. Most ADO.NET providers (all that I've worked with) send a database-specific command to BEGIN, COMMIT, and ROLLBACK transactions, since sending plain ASCII (or whatever else) would be less efficient than a command the server does not have to parse.
This is why sending parameterised queries is often faster than pure SQL-based ones: the library can send specific commands, which results in less parsing and probably less data validation.
HTH!