I was asked this question in an interview. Why is it important to close a database connection? Is it just good practice because it might be wasting resources, or is there something more to it?
You already mentioned the first reason: resource leaks. This means that the usage of memory, sockets and file descriptors on your system keeps increasing until your program or the database crashes, gets killed, or brings the operating system to its knees. Even before that happens, your system would likely become unresponsive, slow and prone to various timeouts, network disconnects and so on.
If your code depends on implicit commits (which is a bad idea anyway), you would be losing the data that your application writes to the database.
Not closing a connection could also leave locks and transactions in the database, which would mean that other connections get stuck while waiting on a lock held by the zombie connection. For example, if you have an external reporting system, it might stop working. Database backups might also stop working, leaving you vulnerable to loss of data.
Depending on circumstances, unfinished transactions could also fill up database transaction logs and/or temporary space, potentially bringing the database offline in a state that requires manual intervention.
If you are using connection pools, not closing a connection could prevent it from being returned to the pool. This means the pool would eventually be depleted, preventing your program from opening new connections.
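To make the pooling point concrete, here is a minimal Go sketch using database/sql; the driver choice, DSN and the users table are placeholders for this illustration, not something from the question. The *sql.DB is opened once and each query hands its connection back to the pool via rows.Close(); forgetting that call (or the defer) is exactly the kind of leak that eventually depletes the pool.

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // hypothetical driver choice
)

func main() {
	// One *sql.DB for the whole program; it manages a pool of connections.
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close() // releases all pooled connections on shutdown

	if err := printNames(db); err != nil {
		log.Fatal(err)
	}
}

func printNames(db *sql.DB) error {
	rows, err := db.Query("SELECT name FROM users") // "users" is a made-up table
	if err != nil {
		return err
	}
	// Without this Close, the connection serving the query is never
	// returned to the pool -- the leak described above.
	defer rows.Close()

	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return err
		}
		fmt.Println(name)
	}
	return rows.Err()
}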
On a single page load I see several connections open. Although there are at least 20 calls to the database, I see around 8 connections open before they slowly drop off. Each call is wrapped in a using statement, uses OpenStatelessSession and a singleton NHibernate session factory. Shouldn't I only see a single connection open, or is this normal behavior? I'm concerned because this is a high-traffic site.
Each session always runs on a separate connection. If your site is exhausting the connection pool under load, you can switch to a session-per-request architecture, but that will of course consume more memory and CPU cycles due to the increased size of the first level cache.
In Go, when using a SQL database, does one need to close the DB (db.Close) before closing an application? Will the DB automatically detect that the connection has died?
The DB will do its best to detect a dead connection, but it may not always be able to. It is better to release what you acquire as soon as possible.
There are a few scenarios in which the server cannot immediately tell that the client is gone:
A power failure, a network issue or an abrupt exit happens without resources being properly released; only the TCP keepalive mechanism will eventually kick in and try to detect that the connection is dead.
The server's send() system call waits on the TCP connection to send data, but the client never receives anything.
The client process is paused and doesn't receive any data; in this case send() will block.
As a result, a zombie connection may prevent:
Graceful shutdown of the cluster.
Advancing the event (transaction) horizon if it was holding exclusive locks as part of a transaction, which blocks things such as autovacuum in PostgreSQL.
The server's keepalive configuration can be shortened to detect this earlier (for example, the ~2h 12m default in PostgreSQL will be far too long for many workloads).
There may also be a hard limit on the maximum number of open connections; until detection, some connections will be zombies: still there, unusable, but counting against the limit.
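On the client side, a complementary mitigation (my addition, not part of the answer above) is to bound the pool and recycle connections so that a leaked or zombie connection cannot hold a slot forever. A fragment to drop into your program, assuming db is the *sql.DB you already opened; the numbers are illustrative:

import (
	"database/sql"
	"time"
)

// tunePool bounds the client-side pool so that leaked or zombie
// connections cannot pile up without limit.
func tunePool(db *sql.DB) {
	db.SetMaxOpenConns(20)                  // hard cap on connections this process holds
	db.SetMaxIdleConns(10)                  // keep a few warm, close the rest
	db.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections
	db.SetConnMaxIdleTime(5 * time.Minute)  // drop idle connections early (Go 1.15+)
}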
The database will notice that the connection has died and take appropriate actions: for instance, all uncommitted transactions active on that connection will be rolled back and the user session will be terminated.
But note that this is a "recovery" scenario from the point of view of the database engine: it cannot simply give up when a client disconnects; it has to take explicit actions to end up in a consistent state.
On the other hand, shutting down properly when the program goes down "the normal way" (that is, not because of a panic or log.Fatal()) is really not that hard. And since the sql.DB instance is usually a program-wide global variable, it's even simpler: just close it in main(), as Matt suggested.
If you're initialising a connection in any function, you're normally better off deferring the call to close immediately, i.e.
db, err := sql.Open("postgres", dsn) // for example
if err != nil { log.Fatal(err) }
defer db.Close()
This will close the database handle (and the connections it manages) once the enclosing function exits.
This is handy in a main function, since the call to Close() will happen when the program exits.
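A slightly fuller sketch of the "close it in main()" approach, for a long-running program that should also close the handle on Ctrl-C/SIGTERM. The driver and DSN are placeholders, and signal.NotifyContext needs Go 1.16+:

package main

import (
	"context"
	"database/sql"
	"log"
	"os/signal"
	"syscall"

	_ "github.com/lib/pq" // placeholder driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close() // runs on a normal return from main

	// Turn SIGINT/SIGTERM into context cancellation so the program can
	// fall through to the deferred Close instead of being killed mid-flight.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	<-ctx.Done() // ...run the real work here instead of just waiting...
	log.Println("shutting down, closing database handle")
}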
While working on my current development product I have setup SQL server mirroring between the primary data center and the secondary data center. In the primary data center the SQL .mdf and .ldf files are stored on the SAN.
Now admittedly it should be very unlikely for us to lose the SAN, but suppose, for example, that the connection to the SAN was lost and database integrity was compromised. Would the mirroring still happen? I.e. would SQL now mirror the broken database so that both copies are equally broken?
From googling it's not clear when mirroring will and will not happen, so I was hoping the community may be able to share some of their experiences.
I also have backup schedules set up, which would be a final fail-safe, but realistically I would hope that the mirrored database would be our quickest way to bring everything back online.
At present there is no witness server in the mirroring setup, although given the benefits of automatic failover I am thinking of adding one.
Thanks
As far as mirroring corruption between PRIMARY and SECONDARY goes: unfortunately, it depends. If the corruption is immediate and physical, then not normally -- the corruption is typically picked up by checks done at the end of the transaction and rolled back.
However, a database can exist in a corrupted state for some time before anything realises it is corrupted. If the underlying data pages are not touched, the engine never has cause to check them. So it is possible that underlying storage issues may mean that either database can become corrupted and you won't know until you attempt to access the affected pages. Traditionally, this would be a write operation, since your client connection will only read from the current active database (and not the partner).
This is why it is important to perform regular maintenance checks on your databases (e.g. DBCC CHECKDB). This becomes harder in a mirrored environment because only PRIMARY can typically be checked, so you really have to induce a manual failover to test your SECONDARY (unless you are running Enterprise, where you might be able to snapshot the mirror and check that -- I've not tried).
Starting with SQL Server 2008, the engine will attempt something called Automatic Page Repair, where it tries to automatically recover corrupted pages it encounters during the mirroring process. You should probably keep an eye on sys.dm_db_mirroring_auto_page_repair if this is something you are worried about.
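If you want to keep an eye on that DMV from code rather than from SSMS, a rough Go sketch might look like the following. This is purely my illustration: the go-mssqldb driver and connection string are assumptions, and it selects * so as not to hard-code the view's column list. Any row returned means the engine has attempted an automatic page repair.

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/denisenkom/go-mssqldb" // assumed SQL Server driver
)

func main() {
	db, err := sql.Open("sqlserver", "sqlserver://user:pass@primary-host?database=master")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query("SELECT * FROM sys.dm_db_mirroring_auto_page_repair")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		log.Fatal(err)
	}
	for rows.Next() {
		// Scan every column generically and print it.
		vals := make([]interface{}, len(cols))
		ptrs := make([]interface{}, len(cols))
		for i := range vals {
			ptrs[i] = &vals[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			log.Fatal(err)
		}
		for i, c := range cols {
			fmt.Printf("%s=%v ", c, vals[i])
		}
		fmt.Println()
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}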
If it is logical corruption, where the wrong data is entered, this will push across to SECONDARY without any means of stopping it.
However, I should point out that your approach might leave you with other issues. Mirroring isn't backup. And mirroring isn't great over WAN links.
In synchronous mode, it receives the client request, then writes to PRIMARY, then writes to SECONDARY, gets the OK back from SECONDARY and then sends an OK back to the client. If it can't write to SECONDARY, or doesn't get the response from SECONDARY, it rolls back the operation on PRIMARY (even though it was successful) and sends a failure back to the client.
A failing WAN link (even temporarily) can cause PRIMARY to choose not to accept connections (because it can't see SECONDARY). A failover mid-connection can leave you in an invalid logical data state, so make sure your transactions are sound.
With a WITNESS server, this can be a little more robust -- placing the witness server alongside PRIMARY in the same LAN allows WITNESS and PRIMARY to form quorum and agree that PRIMARY is still working, even though it can't see SECONDARY (thus not locking you out of a perfectly functioning database).
Instead, over my slower site-to-site links, I prefer to use log shipping between PRIMARY and SECONDARY. With a bit of effort I can control the transport between sites so as to rate-limit over the WAN link, and it is possible to keep the log-shipped SECONDARY in a single-user standby mode. This allows me to run the standard DBCC CHECKDB commands against SECONDARY, as well as querying SECONDARY for data reconciliation purposes. I can also put a delay on the restores, so I have some leeway to fail over before a major logical data error reaches SECONDARY (although that really depends on the RDO).
If I have a high-availability requirement, I might put in mirroring at the main site only -- i.e. two servers + witness. The relatively-quick few-second automatic failover time provided by the witnessed environment has saved me a few late-night calls, in the past.
Hope this helps.
J.
Here's the sequence of events my hypothetical program makes...
Open a connection to server.
Run an UPDATE command.
Go off and do something that might take a significant amount of time.
Run another UPDATE that reverses the change in step 2.
Close connection.
But oh-no! During step 3, the machine running this program literally exploded. Other machines querying the same database will now think that the exploded machine is still working and doing something.
What I'd like to do is, just as the connection is opened but before any changes have been made, tell the server that, should this connection close for whatever reason, it should run some SQL. That way, I can be sure that if something goes wrong, the closing update will run.
(To pre-empt the answer, I'm not looking for table/record locks or transactions. I'm not doing resource claims here.)
Many thanks, billpg.
I'm not sure there's anything built in, so I think you'll have to do some bespoke stuff...
This is totally hypothetical and straight off the top of my head, but:
Take the SPID of the connection you opened and store it in some temp table, along with the text of the reversal update.
Use a background process (either SSIS or something else) to monitor the temp table and check that the SPID is still present as an open connection.
If the connection dies, the background process can execute the stored revert command.
If the connection completes properly, the SPID can be removed from the temp table so that the background process no longer reverts it when the connection closes. (A rough sketch of such a monitor follows below.)
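Here is what that background monitor could look like in Go. It is purely illustrative: the dbo.pending_reverts table, its columns, the driver and the connection string are all made up for this example; sys.dm_exec_sessions is the DMV used to check whether the SPID still has a live session.

package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/denisenkom/go-mssqldb" // assumed SQL Server driver
)

func main() {
	db, err := sql.Open("sqlserver", "sqlserver://user:pass@host?database=app")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	for {
		if err := revertOrphans(db); err != nil {
			log.Println("monitor pass failed:", err)
		}
		time.Sleep(10 * time.Second)
	}
}

// revertOrphans looks for registered SPIDs that no longer have a live
// session and runs their stored reversal statement.
func revertOrphans(db *sql.DB) error {
	rows, err := db.Query("SELECT spid, revert_sql FROM dbo.pending_reverts")
	if err != nil {
		return err
	}
	defer rows.Close()

	type entry struct {
		spid      int
		revertSQL string
	}
	var entries []entry
	for rows.Next() {
		var e entry
		if err := rows.Scan(&e.spid, &e.revertSQL); err != nil {
			return err
		}
		entries = append(entries, e)
	}
	if err := rows.Err(); err != nil {
		return err
	}

	for _, e := range entries {
		var alive int
		err := db.QueryRow(
			"SELECT COUNT(*) FROM sys.dm_exec_sessions WHERE session_id = @p1", e.spid,
		).Scan(&alive)
		if err != nil {
			return err
		}
		if alive == 0 {
			// The registered connection is gone: run the stored revert
			// and remove the registration.
			if _, err := db.Exec(e.revertSQL); err != nil {
				return err
			}
			if _, err := db.Exec("DELETE FROM dbo.pending_reverts WHERE spid = @p1", e.spid); err != nil {
				return err
			}
		}
	}
	return nil
}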
Comments or improvements welcome!
I'll expand on my comment. In general, I think you should reconsider your approach. All database access code should open a connection, execute a query, then close the connection, relying on connection pooling to mitigate the expense of opening lots of database connections.
If we are talking about a single SQL command whose rows should not change while it operates, that is a problem that should be handled by the transaction isolation level. For that, you might investigate the Snapshot isolation level in SQL Server 2005+.
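For what it's worth, if the client happens to be written in Go, requesting snapshot isolation per transaction looks roughly like the fragment below (sql.LevelSnapshot is a standard database/sql isolation level; the database itself must have ALLOW_SNAPSHOT_ISOLATION enabled, driver support varies, and the widgets table, db, ctx and id are made-up names for this sketch):

// Fragment: assumes db is a *sql.DB, ctx a context.Context and id a value
// bound to the (made-up) widgets table; needs "database/sql" imported.
tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSnapshot})
if err != nil {
	return err
}
defer tx.Rollback() // harmless no-op once Commit has succeeded

if _, err := tx.Exec("UPDATE widgets SET state = 'busy' WHERE id = @p1", id); err != nil {
	return err
}
return tx.Commit()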
If we are talking about a series of queries that are part of a long running transaction, that is more complicated and can be handled via storage of a transaction state which other connections read in order to determine whether they can proceed. Going down this road, you need to provide users with tools where they can cancel a long running transaction that might no longer be applicable.
Assuming it's even possible... this will only help you if the client machine explodes during the transaction. Also, there's a risk of false positives - the connection might get dropped for a few seconds due to network noise.
The approach that I'd take is to start a process on another machine that periodically pings the first one to check if it's still on-line, then takes action if it becomes unreachable.
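A bare-bones version of that watcher in Go might look like this; it is entirely illustrative, with the target address, failure threshold and takeAction standing in for whatever "takes action" means in your setup:

package main

import (
	"log"
	"net"
	"time"
)

func main() {
	const target = "app-machine:12345" // hypothetical port the client machine listens on
	failures := 0

	for {
		conn, err := net.DialTimeout("tcp", target, 3*time.Second)
		if err != nil {
			failures++
			log.Printf("ping failed (%d in a row): %v", failures, err)
		} else {
			conn.Close()
			failures = 0
		}

		// Require several consecutive failures to avoid reacting to the
		// brief network noise mentioned above.
		if failures >= 5 {
			takeAction()
			failures = 0
		}
		time.Sleep(10 * time.Second)
	}
}

// takeAction is where you would run the compensating SQL or alert an operator.
func takeAction() {
	log.Println("machine unreachable: running the revert / alerting an operator")
}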
I am working on a VB.NET project that grabs data from an Access DB file. All the code snippets I have come across open the DB, do stuff and close it for each operation. I currently have the DB open for the entire time the application is running and only close it when the application exits.
My question is: Is there a benefit to opening the connection to the DB file for each operation instead of keeping it open for the duration the application is running?
In many database systems it is good practice to keep connections open only when they are in use, since an open connection consumes resources in the database. It is also considered good practice for your code to have as little knowledge as possible about the concrete database in use (for instance by programming against interfaces such as IDbConnection rather than concrete types such as OleDbConnection).
For this reason, it could be a good idea to follow the practice of keeping the connection open as little as possible, regardless of whether it matters for the particular database you use. It simply makes your code more portable, and it reduces your chance of getting it wrong if your next project happens to target a system where keeping connections open is a bad thing to do.
So, your question should really be reversed: is there anything to gain by keeping the connection open?
There is no benefit to opening and closing the connection per operation with the Jet/ACE database engine. The cost of creating the LDB file (the record-locking file) is very high. You could perhaps avoid that by opening the file exclusively (if it's a single user), but my sense is that opening exclusive is slower than opening multi-user.
The advice for opening and closing connections is based on the assumption of a database server being on the other end of the connection. If you consider how that works, the cost of opening and closing the connection is very slight, as the database daemon has the data files already open and handles locking on the fly via structures in memory (and perhaps on disk -- I really don't know how it's implemented in any particular server database) that already exist once the server is up and running.
With Jet/ACE, all users are contending for two files, the data file and the locking file, and setting that up is much more expensive than the incremental cost of creating a new connection to a server database.
Now, in situations where you're aiming for high concurrency with a Jet/ACE data store, there might have to be a trade-off among these factors, and you might get higher concurrency by being much more miserly with your connections. But I would say that if you're into that realm with Jet/ACE, you should probably be contemplating upsizing to a server-based back end in the first place, rather than wasting time on optimizing Jet/ACE for an environment it was not designed for.