I have a web app written in Java running on Tomcat; all connections should be autoCommit=false by default. If I run only SELECT statements in a transaction, do I still need to call commit(), or is it sufficient to just close the connection?
For what it's worth: I am on Oracle 11.2.
There is a similar question, but it does not actually answer this case.
It is sufficient to close the connection, no need to call commit or rollback.
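But according to the documentation for Connection.close(), it is strongly recommended to explicitly commit or roll back an active transaction before closing; otherwise the result is implementation-defined.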
Select statements do not disturb the underlying model or the data contained within the model. It is safe to close the connection without calling any commands related to transactions (like commit).
Actually, strike that. I had not considered adjacent selects in my first answer. Say you execute select id from users where age > 20 and follow it up with select id from users where age = 20. Any update committed between these two queries could break the consistency of the selects: a row whose age changed in between could show up in both result sets, or in neither. To guarantee consistent results you would need to run both selects in the same transaction and end it with a commit().
So yes, it makes sense to commit your selects.
Related
Using MS SQL Server, a Trigger calls a Stored Procedure which internally makes a select, will the return values be the new or old ones?
I know that inside the trigger I can access them by FROM INSERTED i inner join DELETED, but in this case I want to reuse (cannot change it) an existing Stored Procedure that internally makes a select on the triggered table and processes some logic with them. I just want to know if I can be sure that the existing logic will work or not (by accessing the NEW values).
I can simply try to simulate it with one update... But maybe there are other cases (for example, using transactions or something else) that I may not be aware of and would never test, and that could produce a different result.
I decided to ask someone else that might know better. Thank you.
AFTER triggers (the default) fire after the DML action. When the proc is called within the trigger, the tables will reflect changes made by the statement that fired the trigger as well as changes made within the trigger before calling the proc.
Note that the changes stay uncommitted until the trigger completes or, if there is an explicit transaction, until that transaction is later committed.
Since the procedure is running in the same transaction as the (presumably, "after") trigger, it will see the uncommitted data.
I hope you see the implications of that: the trigger is executing as part of the transaction started by the DML statement that caused it to fire, so the stored procedure is part of the same transaction, so a "complicated" stored procedure means that transaction stays open longer, holding locks longer, making responses back to users slower, etc etc.
Also, you said
internally makes a select on the triggered table and processes some logic with them.
If you just mean that the procedure is selecting the data in order to do some complex processing and then write it somewhere else inside the database, OK, that's not great (for the reasons given above), but it will "work".
But just in case you mean you are doing some work on the data in the procedure and then returning that to the client application, don't do that:
The ability to return results from triggers will be removed in a future version of SQL Server. Triggers that return result sets may cause unexpected behavior in applications that aren't designed to work with them. Avoid returning result sets from triggers in new development work, and plan to modify applications that currently do. To prevent triggers from returning result sets, set the disallow results from triggers option to 1.
So, I can successfully run any SELECT statement, but any UPDATE statement just hangs until it eventually times out. The same happens when I try to execute any stored procedure. Other users who connect to the database can run anything without running into this problem.
Is there a cache per user that I can dump or something along those lines? I usually get sick of waiting and cancel the operation, so I don't know if that has contributed to the problem or not.
Just for reference, it's things as simple as these:
UPDATE SOME_TABLE
SET SOME_COLUMN = 'TEST';
EXECUTE SOME_PROCEDURE(1234);
But this works:
SELECT * FROM SOME_TABLE; -- various WHERE clauses don't cause any problems.
UPDATE:
Probably a little disappointing for anyone who came here looking for an answer to a similar problem, but the issue ended up being twofold: The DBA didn't think it was important to give me many details, but there were limitations on the Oracle server that were intentionally set for procedures in general (temp space issues, and things of that ilk). And second, there was an update to the procedure that I wasn't aware of that'd run a sub-query for every record that's pulled in the query (thousands of records). That was removed and now it's running as expected.
In my experience this happens most often because there is another uncommitted operation on the table. For example: User 1 successfully issues an update but does not commit it or roll it back. User 2 (or even another session of User 1) issues another update which just hangs until the other pending update is committed or rolled back. You say that "other users" don't have the same problem, which makes me wonder if they are committing their changes. And if so, if they are updating the same table or a different one.
I have a very large Redshift database that contains billions of rows of HTTP request data.
I have a table called requests which has a few important fields:
ip_address
city
state
country
I have a Python process running once per day, which grabs all distinct rows which have not yet been geocoded (do not have any city / state / country information), and then attempts to geocode each IP address via Google's Geocoding API.
This process (pseudocode) looks like this:
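for ip_address in ips_to_geocode:
    country, state, city = geocode_ip_address(ip_address)
    # Pass the values for the %s placeholders along with the statement.
    execute_transaction('''
        UPDATE requests
        SET ip_country = %s, ip_state = %s, ip_city = %s
        WHERE ip_address = %s
    ''', (country, state, city, ip_address))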
When running this code, I often receive errors like the following:
psycopg2.InternalError: 1023
DETAIL: Serializable isolation violation on table - 108263, transactions forming the cycle are: 647671, 647682 (pid:23880)
I'm assuming this is because I have other processes constantly logging HTTP requests into my table, so when I attempt to execute my UPDATE statement, it is unable to select all rows with the ip address I'd like to update.
My question is this: what can I do to update these records in a sane way that will stop failing regularly?
Your code is violating the serializable isolation level of Redshift. You need to make sure that your code is not trying to open multiple transactions on the same table before closing all open transactions.
You can achieve this by locking the table in each transaction so that no other transaction can access the table for updates until the open transaction is closed. I'm not sure how your code is architected (synchronous or asynchronous), but this will increase the run time, as each lock forces the others to wait until the transaction holding it finishes.
Refer: http://docs.aws.amazon.com/redshift/latest/dg/r_LOCK.html
Just got the same issue on my code, and this is how I fixed it:
First things first: it is good to know that this error means you are running concurrent operations in Redshift. For example, you can get it when you run a second query against a table before the first query you issued moments ago has finished (that was my case).
The good news is that there is a simple way to serialize Redshift operations: the LOCK command. Here is the Amazon documentation for the Redshift LOCK command. It works by making the next operation wait until the previous one has finished. Note that with this command your script will naturally get a little slower.
In the end, the practical solution for me was: I inserted the LOCK command before the query messages (in the same string, separated by a ';'). Something like this:
LOCK table_name; SELECT * from ...
And you should be good to go! I hope it helps you.
Since you are doing a point update in your geo codes update process, while the other processes are writing to the table, you can intermittently get the Serializable isolation violation error depending on how and when the other process does its write to the same table.
Suggestions
One way is to use a table lock like Marcus Vinicius Melo has suggested in his answer.
Another approach is to catch the error and re run the transaction.
For any serializable transaction, it is said that the code initiating the transaction should be ready to retry the transaction in the face of this error. Since all transactions in Redshift are strictly serializable, all code initiating transactions in Redshift should be ready to retry them in the face of this error.
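A minimal sketch of the retry approach in Python, for what it's worth. SerializationConflict and run_with_retry are hypothetical names, not part of any driver; with psycopg2 you would instead catch the driver's own error (the traceback in the question shows psycopg2.InternalError) and check that it really is the serializable isolation violation before retrying. The attempt count and backoff values are arbitrary assumptions.

```python
import random
import time


class SerializationConflict(Exception):
    """Hypothetical stand-in for the driver error raised on a
    serializable isolation violation."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call transaction() until it succeeds or max_attempts is reached.

    transaction must run the whole unit of work (begin, statements,
    commit) so that every retry starts from a clean state.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationConflict:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            # Exponential backoff with jitter so that two conflicting
            # processes do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5)
            sleep(delay)
```

Each attempt should rerun the entire transaction from the start, not just the statement that failed.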
Explanations
The typical cause of this error is that two transactions started and proceeded in their operations in such a way that at least one of them cannot be completed as if they executed one after the other. So the db system chooses to abort one of them by throwing this error. This essentially gives control back to the transaction initiating code to take an appropriate course of action. Retry being one of them.
One way to prevent such a conflicting sequence of operations is to use a lock. But then it restricts many of the cases from executing concurrently which would not have resulted in a conflicting sequence of operations. The lock will ensure that the error will not occur but will also be concurrency restricting. The retry approach lets concurrency have its chance and handles the case when a conflict does occur.
Recommendation
That said, I would still recommend that you don't update Redshift in this manner, like point updates. The geo codes update process should write to a staging table, and once all records are processed, perform one single bulk update, followed by a vacuum if required.
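A rough sketch of the staging-table idea, using Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; it is not Redshift code. The geo_staging table and the bulk_apply_geocodes function are made-up names, and the ip_country / ip_state / ip_city columns are taken from the UPDATE in the question. On Redshift you would go through your usual driver, and Redshift's UPDATE also accepts a FROM clause that expresses the join more directly.

```python
import sqlite3


def bulk_apply_geocodes(conn, geocoded):
    """Stage (ip_address, country, state, city) rows, then update
    the requests table with a single statement."""
    cur = conn.cursor()
    # Hypothetical staging table: one row per geocoded IP address.
    cur.execute(
        "CREATE TEMP TABLE geo_staging ("
        "ip_address TEXT PRIMARY KEY, country TEXT, state TEXT, city TEXT)"
    )
    cur.executemany("INSERT INTO geo_staging VALUES (?, ?, ?, ?)", geocoded)
    # One bulk UPDATE instead of one UPDATE per IP address.
    cur.execute(
        """
        UPDATE requests
        SET ip_country = (SELECT s.country FROM geo_staging AS s
                          WHERE s.ip_address = requests.ip_address),
            ip_state   = (SELECT s.state FROM geo_staging AS s
                          WHERE s.ip_address = requests.ip_address),
            ip_city    = (SELECT s.city FROM geo_staging AS s
                          WHERE s.ip_address = requests.ip_address)
        WHERE ip_address IN (SELECT ip_address FROM geo_staging)
        """
    )
    updated = cur.rowcount
    conn.commit()
    cur.execute("DROP TABLE geo_staging")
    return updated
```

The point is only the shape: the geocoding loop fills the staging table, and the busy table is touched by one short transaction at the end.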
Either you start a new session when you do a second update on the same table, or you have to 'commit' once your transaction is complete.
You can write set autocommit=on before you start updating.
I have a vendor reporting product executing queries to pull report data, no inserts, no updates just reading data.
We have doubled our heap size 3 times and are now at 1024 4k pages. The app runs fine for a week, then we begin to see DB2 SQL error: SQLCODE: -954, SQLSTATE: 57011 indicating the transaction log is not able to accommodate the request.
It's not the size of the reports, since they run fine after a recycle. I spoke with another DBA about this. He believes the problem comes from a difference between ORACLE and DB2: the vendor code is crappy and does not issue commits on the selects. This causes the references not to be cleaned up, and they slowly accumulate as garbage in the heap.
I wanted to know if this is accurate as I thought only inserts and updates needed to have commits included. Is there any IBM documentation on this?
We are currently recycling on a weekly basis to alleviate the problem but I would like to have a good handle on the issue before going back to the vendor asking them to alter their code.
Any transaction needs to be properly terminated -- why did you think that only applies to inserts and updates? Consider running transactionally a "select a from b where c > 12" and then "select a from b where c <= 12"; within a transaction the DB has to guarantee that every a gets returned exactly once either from the first or second select, not both (assuming c is never null;-). Without transactionality, some a's might fall between the cracks or be returned twice if their corresponding c was changed by a different transaction, and that's just not ACID!-)
So when you do not need separate SELECT queries to be transactional wrt each other, tell the DB! And the way you tell, is by terminating the transaction after each select (normally commit is what you use for the purpose, though I guess you could, indifferently, choose to use rollback here;-).
Per Alex's response, the first SQL activity after any CONNECT, COMMIT, or ROLLBACK initiates a transaction.
To get a handle on your resource issue (transaction logs full), you should investigate your application that issues the reports - ensure that transactions are being closed out explicitly in code. I've seen cases where application developers rely upon the Garbage Collector to clean up database objects - while those objects are waiting for cleanup, the database resources (transactions) are held open.
It's always good practice to explicitly COMMIT or ROLLBACK your transactions as soon as you are done with the data - regardless of the programming methodology you use.
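To illustrate "close out transactions explicitly in code" rather than leaving it to the garbage collector, here is a small sketch in Python. It assumes only a DB-API style connection object with commit(), rollback() and close(); the transaction helper and its connect argument are hypothetical names, and the vendor's actual code will of course look different.

```python
from contextlib import contextmanager


@contextmanager
def transaction(connect):
    """Open a connection, end its transaction explicitly, then close it.

    connect is any zero-argument callable returning a DB-API style
    connection (an assumption made for this sketch).
    """
    conn = connect()
    try:
        yield conn
        conn.commit()    # terminate the transaction, even after a pure SELECT
    except Exception:
        conn.rollback()  # terminate it on failure as well
        raise
    finally:
        conn.close()     # never leave the cleanup to the garbage collector
```

Used as "with transaction(connect) as conn: ...", the transaction is ended and the connection released as soon as the block exits, whether the report query succeeded or not.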
I get this error when committing a transaction on a SELECT query, but despite the error it does return a result set that includes the queried data.
tran.Commit();
error [hy011] [ibm] cli0126e the operation is invalid sqlstate=hy011
I changed my code to tran.Rollback(); and the error disappeared.
Can anyone explain this behavior?
In Firebird 2.0, is using an explicit transaction faster on a SELECT command than executing the command with an implicit one?
All SQL commands (SELECT, INSERT, UPDATE etc.) can be executed ONLY within some transaction. You cannot run a command without a transaction having been started prior to it.
Explicit and Implicit transaction are a feature of the component set you're using to access the database, not a feature of Firebird itself. As mentioned before, Firebird always does everything within a transaction. This has a couple of implications for you:
Using an "Implicit" transaction can't be faster than using an "Explicit" transaction because, from Firebird's point of view, a transaction is a transaction; it doesn't matter who started it.
Getting the best performance sometimes requires fine control over "Commits". While the "Implicit" transaction can't be faster than the "Explicit" transaction, the Explicit one might be faster because you can control your StartTransactions and Commits. While you usually want to do all updates to a database within one transaction (so they all succeed or fail as a set), you sometimes want to split operations into multiple groups: if you need to bulk-insert a great many records, you probably want to commit once every 1000 records or so.
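The "commit once every 1000 records or so" idea could look roughly like this in Python. It is only a sketch against a generic DB-API style connection (cursor(), execute(), commit()); bulk_insert, insert_sql and batch_size are illustrative names, and no particular Firebird driver is assumed.

```python
def bulk_insert(conn, insert_sql, rows, batch_size=1000):
    """Insert rows one by one, committing once per batch_size rows.

    Returns the number of commits issued.
    """
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute(insert_sql, row)
        pending += 1
        if pending >= batch_size:
            conn.commit()  # close this batch's transaction
            commits += 1
            pending = 0
    if pending:
        conn.commit()      # commit the final, partial batch
        commits += 1
    return commits
```

The trade-off is the one described above: each batch succeeds or fails as a set, but the load as a whole is no longer atomic.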
Firebird cannot execute SQL commands without a transaction.
PS: You get the best performance results if you commit transactions, rather than rolling them back. Even if you only called SELECT and changed nothing.
Besides what was already said, take into account that the transaction can be:
Read-Write
Read-Only
For a SELECT it would be best to use a Read-Only transaction
PS: There are other types of transactions, but these two are the important ones for this topic.
Usually a transaction adds some overhead. However, you should be careful if you do not have some default transaction started when you connect to Firebird.
In my experience the implicit transactions tend to default to Auto commit Retaining, so they should be slower. You can always change the default behaviour.
But I would recommend using explicit transactions as Commit Retaining may cause you grief further down the line if it blocks too many transactions. If it does then access to Firebird can slow down dramatically as it traverses through all the held-up/blocked transactions to determine the correct value of the data.
Here are some discussions on it
http://forums.devshed.com/firebird-sql-development-61/difference-active-transaction-863103.html
http://www.slideshare.net/ibsurgeon/3-how-transactionswork