Should I run VACUUM in transaction or after? - sql

I have a mobile application sync process. The transaction does a lot of modification on the database. Since this is done on mobile I need to issue a VACUUM to compact the database.
I am wondering when should I issue a VACUUM
in the transaction, as final statement
or after the transaction?
I am currently looking for SQLite, but if it's different for other engines, let me know in the answers (PostgreSQL, MySQL, Oracle, SQLServer)

Want it or not when using PostgreSQL you can't run VACUUM in transaction as stated in the manual:
VACUUM cannot be executed inside a transaction block.

I would say outside of the transaction. Certainly in PostgreSQL, VACUUM is designed to remove the "dead" tuples (i.e. the old row when a record has been changed or deleted.)
If you're running VACUUM in a transaction that has modified records, these dead rows won't have been marked for deletion.
Depending on which type of VACUUM you're doing, it may also require a table lock which will block if there are other transactions running, so you could potentially end up in a deadlock situation (transaction 1 is blocked waiting for a table lock to do its VACUUM, transaction 2 gets blocked waiting for a row to be released that transaction 1 has locked.)
I'd also recommend that this isn't done in an application (perhaps as a scheduled task) as it can take a while to complete and can negatively affect speed of other queries.
As for SQL Server, there is no VACUUM - what you're looking for is shrink. You can turn on auto shrink in 2005 which will automatically reclaim space when it the server decides, or issue a DBCC statement to shrink the database and log file, but this depends on your backup routine and strategy on a per-database level.

Vacuum is like defrag, it's good to do if youve recently deleted a lot of stuff, or maybe after youve inserted a lot of stuff, but by no means should you do it in every transaction. It's slower than almost any other database command and is more of a maintenance task.
We sometimes add/remove the majority of our db file, so then a vacuum would be a good idea, but I still would not consider it a part of the same transaction that did the work.

How frequently is the transaction run?
It's really a daily sort of process not a query by query process, but if you use it without full then it can be used in a transaction since it doesn't acquire a lock.
If your going to do it then it should be outside the transaction, since it is independent of the transactions data integrity.

Related

Kill delete sessions in informix

By mistake, I performed this query in informix using dbaccess session.
Delete from table #without where condition
Realizing my mistake, that I should have used TRUNCATE, I did another foolishness.
I killed the dbaccess session. But the table is exclusively locked and I am not able to do any action on that table.
What are the steps I can do to remove the lock and truncate the table.
1) Restart Informix server
2) onmode -z <sessionid> # Does not work.
I see hell lot of sessions created for the delete query
Is there any other easy way to fix this issue?
Assuming that you are not using Informix SE...
Is the database logged? If so, did you run the statement inside an explicit (BEGIN WORK) transaction?
Analysis
If you've got an unlogged database, then each row that the server's deleted is gone. If you stop the DELETE, it will not undo the partially complete changes. Using an unlogged database means that you do not want guaranteed statement level recovery.
If you've got a regular logged database and no explicit transaction, then the statement is probably still running after the DB-Access session is terminated. Because it is running as a singleton statement, it will complete and commit. Until it does that, if you forcibly take the server down, then fast recovery will rollback the statement (transaction). Given that I see '5 hours ago', I fear your chances of taking the server down in time now are limited.
If you've got a logged database with an explicit transaction, or a MODE ANSI database (where you're always in a transaction), then when the DELETE statement completes, the server will wait for the COMMIT, realize that the session connection is terminated, and will rollback the uncommitted work.
Recovery
If you've got an unlogged database, you can only recover to your last archive. Because it is unlogged, you can't recover it from the logical logs (but other databases in the same instance that are logged can be recovered up to the last logical log).
If you've got a logged database and you can take the server down — preferably under control, but crashing it if necessary — before the DELETE statement completes, then fast recovery will deal with the issue.
If the DELETE has completed and committed and you have good backups, you can consider a point-in-time restore of the database. It will take it offline while you do that (but if the data from the table is all missing, your DB is not going to be functional for a while).
If none of these scenarios applies, then you should contact IBM Technical Support, who may be able to perform minor (and not so minor) miracles.
But, as you may have noticed, a lot depends on the type of database (unlogged, logged, MODE ANSI) and whether there was an explicit transaction in effect when you ran the statement.
The trouble with DBMS is that they're trusting creatures. If you're authorized to do an operation, they assume that you intend to do what you say you want to do, and they go ahead and do it to the best of their ability. When you don't ask it to do what you intended to request, life gets tricky; the DBMS still trusts you and does what you actually asked it to do.

Firebird backup restore is frustrating, is there a way to avoid it?

I am using Firebird, but lately the database grows really seriously.
There is really a lot of delete statements running, as well update/inserts, and the database file size grows really fast.
After tons of deleting records the database size doesn't decrease, and even worse, i have the feeling that actually the query getting slowed down a bit.
In order to fix this a daily backup/restore process have been involved, but because of it's time to complete - i could say that it is really frustrating to use Firebird.
Any ideas on workarounds or solution on this will be welcome.
As well, I am considering switching to Interbase because I heard from a friend that it is not having this issue - it is so ?
We have a lot of huge databases on Firebird in production but never had an issue with a database growth. Yes, every time a record being deleted or updated an old version of it will be kept in the file. But sooner or later a garbage collector will sweap it away. Once both processes will balance each other the database file will grow only for the size of new data and indices.
As general precaution to prevent an enormous database growth try to make your transactions as short as possible. In our applications we use one READ ONLY transaction for reading all the data. This transaction is open through whole application life time. For every batch of insert/update/delete statements we use short separate transactions.
Slowing of database operations could be resulted from obsolete indices stats. Here you can find an example of how to recalculate statistics for all indices: http://www.firebirdfaq.org/faq167/
Check if you have unfinished transactions in your applications. If transaction is started but not committed or rolled back, database will have own revision for each transaction after the oldest active transaction.
You can check the database statistics (gstat or external tool), there's oldest transaction and the next transaction. If the difference between those numbers keeps growing, you have the stuck transaction problem.
There are also monitoring tools the check situation, one I've used is Sinatica Monitor for Firebird.
Edit: Also, database file doesn't shrink automatically ever. Parts of it get marked as unused (after sweep operation) and will be reused. http://www.firebirdfaq.org/faq41/
The space occupied by deleted records will be re-used as soon as it is garbage collected by Firebird.
If GC is not happening (transaction problems?), DB will keep growing, until GC can do its job.
Also, there is a problem when you do a massive delete in a table (ex: millions of records), the next select in that table will "trigger" the garbage collection, and the performance will drop until GC finishes. The only way to workaround this would be to do the massive deletes in a time when the server is not very used, and run a sweep after that, making sure that there are no stuck transactions.
Also, keep in mind that if you are using "standard" tables to hold temporary data (ie: info is inserted and delete several times), you can get corrupted database in some circumstances. I strongly suggest you to start using Global Temporary Tables feature.

SQL Server 2005 becomes blocked with no locked or locking processes

We have a database (let's call it database A) which becomes unusable every some days and we have to restart it. When I say unusable means all applications using it just block there waiting for the database to respond but it never does.
By luck it was noticed that executing a SELECT statement against a specific table using the SQL Server Management Studio seems to bring some records but at some point it blocks.
The funny thing is that there are no LOCKED or LOCKING processes on the specific database. I found out that the application uses the following transaction isolation:
ALLOW_SNAPSHOT_ISOLATION ON
which explains why we can't see Locked or Locking processes right?
We have another database (let's call it database B) which actually has the same schema and we never had this issue. The only difference between these databases is the isolation I mentioned earlier. This one uses the default transaction isolation and we never had this odd thing of the database blocking. But also database A has a lot more transactions opening per day; much much more. So what I can think of is that the SNAPSHOT ISOLATION should be avoided for a big number of concurrent transactions in this case.
Can someone confirm that most probably it's the SNAPSHOT ISOLATION causing the problems?
I mean we have no locks and we just have a database blocking with no actual exceptions or something that will help us detect the root cause of the problem.
Are my assumptions right? I surely hope so.
Have you tried to monitor your tempdb usage ? (AFAIK, ALLOWSNAPSHOT_ISOLATION ON relies heavily on tempdb, which isn't the case for standard locking strategies)
This MS technet page gives some tips on how to do this (see the section 'Monitoring space')
you can also use this quick query to check your tempdb isn't full :
use tempdb
exec sp_spaceused

How can I get dead lock in this situation?

In my client application I have a method like this (in practice it's more complex, but I've left the main part):
public void btnUpdate_Click(...)
{
...
dataAdapter.Update(...);
...
dataAdapter.Fill(...); // here I got exception one time
}
The exception I found in logs says "Deadlock found when trying to get lock; try restarting transaction". I met this exception only time, so it wasn't repeated.
As I understand, DataAdapter.Fill() method executes only select query. I don't make an explicit transaction and I have autocommit enabled.
So how can I get dead lock on a simple select query which is not a part of bigger transaction?
As I understand, to get a dead lock, two transactions should wait for each other. How is that possible with a single select not inside a transaction? Maybe it's a bug in MySql?
Thank you in advance.
You are right it takes two transactions to make a deadlock. That is to say, No statement or statements within a single transaction can deadlock with other statements within the same transaction.
But it only take one transaction to notice a report of a deadlock. How do you know that the transaction you are seeing the deadlock reported in is the only transaction being executed in the database? Isn't there other activity going on in this database?
Also. your statement "I don't make an explicit transaction", and "... which is not a part of bigger transaction" implies that you do not understand that every SQL statement executed is always in an implicit transaction, even if you do not explicitly start one.
Most databases have reporting mechanisms specifically designed to track, report and/or log instances of deadlocks for diagnostic purposes. In SQL server there is a trace flag that causes a log entry with much detail about each deadlock that occurs, including details about each of the two transactions involved, like what sql statements were being executed, what objects in the database were being locked, and why the lock could not be obtained. I'd guess mySQL has similar disgnostic tool. Find out what it is and turn it on so that the next time this occurs you can look and find out exactly what happened.
You can deadlock a simple SELECT against other statements, like an UPDATE. On my blog I have an example explaining a deadlock between two well tunned statements: Read/Write deadlock. While the example is SQL Server specific, the principle is generic. I don't have enough knowledge of MySQL to claim this is necessarily the case or not, specially in the light of various engines MySQL can deploy, but none the less a simple SELECT can be the victim of a deadlock.
I haven't looked into how MySQL transaction works, but this is based on how MSSQL transactions work:
If you are not using a transaction, each query has a transaction by itself. Otherwise you would get a mess every time an update failed in the middle.
The reason for the deadlock might be lock escalation. The database tries to lock as little as possible for each query, so it starts out by locking only the single rows affected. When most of the rows in a page is locked by the query it might decide that escalating the lock into locking the entire page would be better, which may have the side effect of locking some rows not otherwise affected by the query.
If a select query and an update query are trying to escalate locks on the same table, they may cause a deadlock eventhough only a single table is involved.
I agree that in this particular issue this is unlikely to be the issue but this is supplemental to the other answers in terms of limiting their scope, recorded for posterity in case someone finds it useful.
MySQL can in rare cases have single statements periodically deadlock against themselves. This seems to happen particularly on bulk inserts and the issues are almost certainly a deadlock between different threads relating to the operation. I would expect bulk updates to have the same problem. In the past when faced with this sort of issue I have generally just cut down on the number of rows being inserted (or updated) in a single statement. You won't usually get a deadlock when trying to obtain the lock in this case but other messages.
A colleague of mine and I were discussing similar problems in MS SQL Server (so this is not unique to MySQL!) and he pointed out that the solution there is to tell the server not to parallelize the insert or update. The problems here are spinlock-related deadlocks, not logical lock deadlocks in the RDBMS.

Is there a difference between commit and rollback in a transaction only having selects?

The in-house application framework we use at my company makes it necessary to put every SQL query into transactions, even though if I know that none of the commands will make changes in the database. At the end of the session, before closing the connection, I commit the transaction to close it properly. I wonder if there were any particular difference if I rolled it back, especially in terms of speed.
Please note that I am using Oracle, but I guess other databases have similar behaviour. Also, I can't do anything about the requirement to begin the transaction, that part of the codebase is out of my hands.
Databases often preserve either a before-image journal (what it was before the transaction) or an after-image journal (what it will be when the transaction completes.) If it keeps a before-image, that has to be restored on a rollback. If it keeps an after-image, that has to replace data in the event of a commit.
Oracle has both a journal and rollback space. The transaction journal accumulates blocks which are later written by DB writers. Since these are asychronous, almost nothing DB writer related has any impact on your transaction (if the queue fills up, then you might have to wait.)
Even for a query-only transaction, I'd be willing to bet that there's some little bit of transactional record-keeping in Oracle's rollback areas. I suspect that a rollback requires some work on Oracle's part before it determines there's nothing to actually roll back. And I think this is synchronous with your transaction. You can't really release any locks until the rollback is completed. [Yes, I know you aren't using any in your transaction, but the locking issue is why I think a rollback has to be fully released then all the locks can be released, then your rollback is finished.]
On the other hand, the commit is more-or-less the expected outcome, and I suspect that discarding the rollback area might be slightly faster. You created no transaction entries, so the db writer will never even wake up to check and discover that there was nothing to do.
I also expect that while commit may be faster, the differences will be minor. So minor, that you might not be able to even measure them in a side-by-side comparison.
I agree with the previous answers that there's no difference between COMMIT and ROLLBACK in this case. There might be a negligible difference in the CPU time needed to determine that there's nothing to COMMIT versus the CPU time needed to determine that there's nothing to ROLLBACK. But, if it's a negligible difference, we can safely forget about about it.
However, it's worth pointing out that there's a difference between a session that does a bunch of queries in the context of a single transaction and a session that does the same queries in the context of a series of transactions.
If a client starts a transaction, performs a query, performs a COMMITor ROLLBACK, then starts a second transaction and performs a second query, there's no guarantee that the second query will observe the same database state as the first query. Sometimes, maintaining a single consistent view of the data is of the essence. Sometimes, getting a more current view of the data is of the essence. It depends on what you are doing.
I know, I know, the OP didn't ask this question. But some readers may be asking it in the back of their minds.
In general a COMMIT is much faster than a ROLLBACK, but in the case where you have done nothing they are effectively the same.
The documentation states that:
Oracle recommends that you explicitly end every transaction in your application programs with a COMMIT or ROLLBACK statement, including the last transaction, before disconnecting from Oracle Database. If you do not explicitly commit the transaction and the program terminates abnormally, then the last uncommitted transaction is automatically rolled back. A normal exit from most Oracle utilities and tools causes the current transaction to be committed. A normal exit from an Oracle precompiler program does not commit the transaction and relies on Oracle Database to roll back the current transaction.
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_4010.htm#SQLRF01110
If you want o choose to do one or the other then you might as well do the one that is the same as doing nothing, and just commit it.
Well, we must take into account what an SELECT returns in Oracle. There are two modes. By default an SELECT returns data as that data looked in the very moment the SELECT statement started executing (this is default behavior in READ COMMITTED isolation mode, the default transactional mode). So if an UPDATE/INSERT was executed after SELECT was issued that won't be visible in result set.
This can be a problem if you need to compare two result sets (for example debta and credit sides of an general ledger app). For that we have a second mode. In that mode SELECT returns data as it looked at the moment the current transaction began (default behavior in READ ONLY and SERIALIZABLE isolation levels).
So, at least sometimes it is necessary to execute SELECTs in transaction.
Since you've not done any DML, I suspect there'd be no difference between a COMMIT and ROLLBACK in Oracle. Either way there's nothing to do.
I'd think a Commit would be more efficient; since generally you'd expect most DB transactions to be committed; so you would think the DB optimizes for this case (as opposed to trying to be more efficient for a rollback).