What is the difference between SQL BATCH and a transaction in OrientDB?

I have read through the documentation, and it seems that a SQL BATCH command and a transaction accomplish the same purpose, that is, committing all statements as an all-or-nothing unit.
Is this correct, or am I missing something?
I am using Orient through the PhpOrient language binding, and see that it supports both transactions and batches, but I am using SQL exclusively and would like to perform transactions using SQL only. It seems the same from my testing, but I wanted to confirm.

SQL Batch
a) A SQL batch is just that: a collection of commands to be executed together, with no guarantee that they all succeed or all fail.
b) Batch processing means requests are put into a queue and processed when a certain number of items has accumulated, or when a certain period has passed. You cannot undo/rollback an individual request once it is queued.
In BATCH PROCESSING, the bank would just queue xyz's request to deposit an amount. The bank would put your request in the queue with all the other requests and process them at the end of the day, or when they reach a certain volume.
SQL Transaction
a) A SQL transaction is a collection of commands that is guaranteed to succeed or fail as a whole. A transaction won't complete half the commands and then fail on the rest; if one fails, they all fail.
b) A transaction is like real-time processing that allows you to rollback/undo changes.
In TRANSACTIONS, it's just like the batch, but you have the option to "cancel" it.
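For the OrientDB question specifically: an SQL batch script can contain its own BEGIN/COMMIT, and then it does behave as an all-or-nothing transaction. A minimal sketch, along the lines of the example in the OrientDB SQL Batch documentation (the Account and City classes are illustrative):
begin
let account = create vertex Account set name = 'Luke'
let city = select from City where name = 'London'
let e = create edge Lives from $account to $city
commit retry 100
return $e
The commit retry clause re-runs the batch if a concurrent modification causes a conflict; without the begin/commit pair, the statements simply run in sequence rather than being grouped into a single transaction.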

Transaction
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
Database transactions, as implemented by InnoDB, have properties that are collectively known by the acronym ACID, for atomicity, consistency, isolation, and durability.
MySQL Manual

Related

Do default transactions in PostgreSQL provide any benefit when only the last statement is writing?

I just learned that in PostgreSQL the default transaction isolation level is READ COMMITTED. I'm very used to MySQL's REPEATABLE READ isolation level. By my understanding, in PostgreSQL this means that in a default transaction "two successive SELECT commands can see different data". With that in mind, is there any benefit to transactions when only the last statement in the transaction is writing?
The transaction does not prevent the data from changing between statements; the only benefit I see is rolling the transaction back on failure. But if only one writing statement exists, at the end, then that would happen anyway.
To make it a bit clearer what I'm referring to, let's take a generic, simple sequence of (pseudo) queries against a table:
BEGIN TRANSACTION
SELECT userId FROM users WHERE username = 'the provided username'
INSERT INTO activities (activity, user_fk) VALUES ('posted on SO', userId)
COMMIT
In this sequence, and in any general sequence of statements where only the last statement is writing, is there a benefit in PostgreSQL to using a transaction with the default isolation level?
Bonus question, is there any overhead from it?
The difference between READ COMMITTED and REPEATABLE READ is that the former takes a new database snapshot for each statement, while the latter takes a snapshot only for the first SQL statement and uses that snapshot for the whole transaction. This implies that REPEATABLE READ actually performs better than READ COMMITTED, since it takes fewer snapshots.
The disadvantage of REPEATABLE READ is that you can get serialization errors. That does not affect your example, but if you had an UPDATE instead of an INSERT, it could be that the row you are trying to update has been modified by a concurrent transaction since the snapshot was taken. The serialization error this causes means that you have to repeat the transaction. Another disadvantage of REPEATABLE READ transactions is that a long-running read-only transaction can hinder the progress of VACUUM, which it wouldn't do in READ COMMITTED mode.
For read-only transactions or transactions like the one you are showing, REPEATABLE READ is often the better isolation level. The nice thing about READ COMMITTED is that you can get no serialization errors apart from deadlocks.
To explicitly answer your question: there is no advantage to running the statement from your example in a single transaction. You may as well use the default autocommit mode to run them in separate transactions.
Incidentally, the SQL standard decrees that the default transaction isolation level be SERIALIZABLE, but I don't know any database that implements that.
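A quick way to see the snapshot behavior described above (the users table is illustrative; run the two sessions side by side in psql):
-- Session A, default READ COMMITTED: each statement gets a fresh snapshot
BEGIN;
SELECT count(*) FROM users;
-- Session B inserts a row and commits here
SELECT count(*) FROM users;   -- sees session B's new row
COMMIT;

-- Session A, REPEATABLE READ: one snapshot for the whole transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM users;   -- the snapshot is taken at this first statement
-- Session B inserts a row and commits here
SELECT count(*) FROM users;   -- still returns the old count
COMMIT;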

Delete and Insert Inside one Transaction SQL

I just want to ask: will the first query always be executed first when encapsulated in a transaction? For example, I have 500k records to be deleted and 500k to be inserted; is there a possibility of locking?
I have already tested this query and it works fine, but I want to make sure my assumption is correct.
Note: this will delete and insert the same records, with possible updates to other columns.
BEGIN TRAN;
DELETE FROM [OUTPUT TABLE] WHERE ID IN (1, 2, 3, 4 /* etc. */);
INSERT INTO [OUTPUT TABLE] VALUES (1, 2, 3, 4 /* etc. */);
COMMIT TRAN;
Within a transaction, all write locks (all locks acquired for modifications) must obey the strict two-phase locking rule. One consequence is that a write (X) lock acquired in a transaction cannot be released until the transaction commits. So yes, the DELETE and INSERT will execute sequentially, and all locks acquired during the DELETE will be retained while executing the INSERT.
Keep in mind that deleting 500k rows in a transaction will escalate the locks to one table lock; see Lock Escalation.
Deleting 500k rows and inserting 500k rows in a single transaction, while perhaps correct, is a bad idea. You should avoid such large units of work (long transactions) if possible. Long transactions pin the log in place, create blocking and contention, increase recovery and DB startup time, and increase SQL Server resource consumption (locks require memory).
You should consider doing the operation in small batches (perhaps 10,000 rows at a time), using MERGE instead of DELETE/INSERT (if possible) and, last but not least, considering a partitioned sliding-window implementation; see How to Implement an Automatic Sliding Window in a Partitioned Table.
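As a rough sketch of the small-batch approach (illustrative only; #ids_to_replace is a made-up temp table holding the affected IDs):
DECLARE @batch int = 10000;
WHILE 1 = 1
BEGIN
    DELETE TOP (@batch) FROM [OUTPUT TABLE]
    WHERE ID IN (SELECT ID FROM #ids_to_replace);
    IF @@ROWCOUNT < @batch BREAK;  -- nothing (or only a last partial batch) left
END;
In autocommit mode each iteration commits on its own, so the per-transaction lock and log footprint stays small and lock escalation is far less likely.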
From the documentation on TRANSACTION (emphasis mine):
BEGIN TRANSACTION represents a point at which the data referenced by a connection is logically and physically consistent. If errors are encountered, all data modifications made after the BEGIN TRANSACTION can be rolled back to return the data to this known state of consistency. Each transaction lasts until either it completes without errors and COMMIT TRANSACTION is issued to make the modifications a permanent part of the database, or errors are encountered and all modifications are erased with a ROLLBACK TRANSACTION statement.

BEGIN TRANSACTION starts a local transaction for the connection issuing the statement. Depending on the current transaction isolation level settings, many resources acquired to support the Transact-SQL statements issued by the connection are locked by the transaction until it is completed with either a COMMIT TRANSACTION or ROLLBACK TRANSACTION statement. Transactions left outstanding for long periods of time can prevent other users from accessing these locked resources, and also can prevent log truncation.

Although BEGIN TRANSACTION starts a local transaction, it is not recorded in the transaction log until the application subsequently performs an action that must be recorded in the log, such as executing an INSERT, UPDATE, or DELETE statement. An application can perform actions such as acquiring locks to protect the transaction isolation level of SELECT statements, but nothing is recorded in the log until the application performs a modification action.

Batch procedure: when to commit transactions?

I'm pretty new to PL/SQL, although I've got lots of DB experience with other RDBMSs. Here's my current issue.
procedure CreateWorkUnit
is
begin
  update workunit
  set workunitstatus = 2 -- workunit loaded
  where SYSDATE between START_DATE and END_DATE
  and workunitstatus = 1; -- workunit created
  -- commit here?
  loader; -- loads records based on status; will have a commit of its own
  update workunit wu
  set workunititemcount = (select count(*) from workunititems wui where wui.wuid = wu.wuid)
  where workunitstatus = 2;
end CreateWorkUnit;
The behaviour I'm seeing, with or without commit statements, is that I have to execute the procedure twice: once to flip the statuses, and then the loader runs on the second execution. I'd like it all to run in one go.
I'd appreciate any words of oracle wisdom.
Thanks!
When to commit transactions in a batch procedure? It is a good question, although it only seems vaguely related to the problems with the code you posted. But let's answer it anyway.
We need to commit when the PL/SQL procedure has completed a unit of work. A unit of work is a business transaction. This would normally be at the end of the program, the last statement before the EXCEPTION section.
Sometimes not even then. The decision to commit or roll back properly lies with the top of the calling stack. If our PL/SQL is being called from a client (maybe a user clicking a button on a screen), then perhaps the client should issue the commit.
But it is not unreasonable for a batch process to manage its own commit (and rollback in the case of errors). The main point is that only the topmost procedure should issue a COMMIT. If a procedure calls other procedures, those called programs should not issue commits or rollbacks; they should handle any errors (logging etc.) and re-raise them to the calling program, and let it decide whether to roll back. Because all the called procedures run in the same session, and hence in the same transaction, a rollback in a called program would revert all the changes in the batch process. That's not right. The same reasoning applies to commits.
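A minimal sketch of that shape (the procedure names are made up):
create or replace procedure run_batch
is
begin
  step_one;   -- no COMMIT inside
  step_two;   -- no COMMIT inside
  commit;     -- one commit for the whole unit of work
exception
  when others then
    rollback;
    raise;    -- let the caller (or the scheduler) see the failure
end run_batch;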
You will sometimes read advice about using intermittent commits to break up long-running processes into smaller units, e.g. committing every 1000 inserts. This is bad advice for several reasons, not all of them related to transactions. The pertinent ones are:
Issuing a commit releases locks and, more importantly, undo that other statements may still need for read consistency. Committing intermittently is the classic cause of ORA-01555 "snapshot too old" errors.
It also breaks read consistency, which applies at the statement and/or transaction level. Committing while fetching across an open cursor is the cause of ORA-01002 "fetch out of sequence" errors.
It affects restartability. If the program fails having processed 30% of the records, can we be confident it will process only the remaining 70% when we re-run the batch?
Once we commit records, other sessions can see those changes: does it make sense for other users to see a partially changed view of the data?
So, the words of "Oracle wisdom" are: always align the database transaction with the business transaction, with a single commit per unit of work.
Somebody mentioned autonomous transactions as a way of issuing commits in sub-processes. This is usually a bad idea. Changes made in an autonomous transaction are visible to other sessions but not to our own. That very rarely makes sense. It also creates the same problems with restartability which I discussed earlier.
The only acceptable use for autonomous transactions is recording activity (error logs, traces, audit records). We need that data to persist regardless of what happens in the wider transaction. Any other use of the pragma is almost certainly a workaround for a poor design, and actually just makes the problem worse.
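For completeness, the logging use case typically looks something like this (the error_log table and column names are illustrative):
create or replace procedure log_error(p_msg in varchar2)
is
  pragma autonomous_transaction;
begin
  insert into error_log (logged_at, message)
  values (systimestamp, p_msg);
  commit;  -- persists only the log row, independently of the caller's transaction
end log_error;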
You may not need to commit in the PL/SQL procedure at all. Procedures that you call from another procedure use the same session, so you don't need to commit in them. In any case, a procedure's uncommitted changes are completely rolled back if its session rolls back or hits an unhandled exception.
I mis-classified my problem. I thought this was a transaction problem, but really it was one of my flags not being set as expected: a number field was NULL when I was expecting 0.
Sorry for that.
Josh Robinson

With T-SQL SNAPSHOT ISOLATION, is it possible to update only non-locked rows?

A large SQL Server 2008 table is normally updated in (relatively) small chunks using SNAPSHOT ISOLATION transactions. Snapshot works very well for those updates, since the chunks never overlap. These updates aren't a single long-running operation, but many small one-row inserts/updates grouped by transaction.
I would like a lower-priority transaction to update all the rows which aren't currently locked. Does anyone know how I can get this behavior? Will another SNAPSHOT ISOLATION transaction fail as soon as a row clashes, or will it update everything it can before failing?
Could SET DEADLOCK_PRIORITY LOW with a try-catch be of any help? Maybe in a retry loop with a WHERE which targets only rows which haven't been updated?
Snapshot isolation doesn't really work that way; the optimistic locking model means it won't check for locks or conflicts until it's ready to write/commit. You also can't set query 'priority' per se, nor can you use the READPAST hint on an update.
Each update is an implicit atomic transaction so if 1 update out of 10 fails (in a single transaction) they all roll back.
SET DEADLOCK_PRIORITY only sets a preference for which transaction is rolled back in the event of a deadlock (otherwise the 'cheapest' rollback is selected).
A try-catch is pretty much a requirement if you're expecting regular collisions.
The retry loop would work, as would using a different locking model with the NOWAIT hint, which makes a statement fail immediately (rather than wait) when it hits a locked row, so the application can skip it and retry; see the sketch below.
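A rough sketch of the retry loop under snapshot isolation (dbo.BigTable and the flag column are made up; 3960 is the snapshot update-conflict error, 1205 the deadlock error):
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
DECLARE @retries int = 0;
WHILE @retries < 5
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE dbo.BigTable
        SET low_priority_flag = 1
        WHERE low_priority_flag = 0;   -- targets only rows not yet updated
        COMMIT TRAN;
        BREAK;                         -- success, leave the loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;
        SET @retries += 1;             -- e.g. update conflict (3960) or deadlock (1205)
    END CATCH;
END;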
A SNAPSHOT ISOLATION transaction fails as soon as it encounters an update conflict. However, I would use some queue outside the database to prioritize the updates.

When is a row actually inserted into DB?

When is a row actually inserted into the database? Is it when the INSERT statement finishes, or when the COMMIT statement that follows the INSERT finishes?
Later than you think. The principles here apply generally.
The whole point of the transaction log is to ensure that ACID holds even if a power failure occurs just as the INSERT finishes. The INSERT will be rolled forward or rolled back as part of the recovery phase (in most RDBMSs).
So it's more important that the transaction log entry is acknowledged as stored on the media; then the INSERT can commit.
The data page containing the changed row will end up on disk eventually (checkpoint etc) but not necessarily at the point of successful commit.
However, the data page is in memory and available for use.
Note, an INSERT could cause a page split, indexes to be updated, triggers to fire etc so what I've said is simplified.
And it doesn't matter one way or the other when the data ends up on disk, as long as I can get the data and it's safe in case of, say, power failure.
An oldie but still relevant for SQL Server: SQL Server 2000 I/O Basics
What I've summarized here is Write-Ahead Logging.
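To see the visibility side of this, a minimal illustration (SQL Server syntax; dbo.Demo is a made-up table):
BEGIN TRAN;
INSERT INTO dbo.Demo (id) VALUES (1);
SELECT * FROM dbo.Demo;   -- this session already sees the new row
ROLLBACK TRAN;
SELECT * FROM dbo.Demo;   -- the row is gone; it was never durable
Only after COMMIT TRAN, once the log records are hardened to disk, does the row survive a crash.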
If you are running inside a transaction, when the transaction is committed. Otherwise, immediately.
Depends on the database/table implementation. It might be only when the transaction log entry is written, until which time the row exists only in the transaction log and in memory.