I'm making a stateless web service which uses Microsoft SQL Server to read data (and only to read), without using transactions. What will be the best of the following:
Starting SqlConnection at each call to the web service, or
Storing SqlConnection in a static field.
IMHO, the first approach will cost too much resources (if ten clients make ten requests to the web service, it opens one hundred times the database...), but the second one has maybe some problems? Maybe race condition? Or security issues?
Adding a max/min pool size allows SQL server to pool connections so even though you create a new SqlConnection object, the db can pool connections for you.
(MSDN, min pool size)
eg. ConnectionString="Catalog=MyDb; MinPoolSize=10; MaxPoolSize=10..."
Personally, by default I would open the connection with each call, and rely on connection pooling to sort it out for me.
I wouldn't worry about the connection cost unless you are finding that to be problematic for some reason. Connection Pooling will make most of this less of a concern.
If you are working heavily with the database, or retrieving a large amount of static data you might want to consider implementing some kind of caching to reduce the overall cost of contacting the database period. The more data you can keep in the application cache the better, even with stateless web services.
Related
I am having an application where different users may connect to different databases (those can be either MySQL or Postgres), what might be the best way to cache those connections across different databases? I saw some connection pools but seems like they are more for one db multiple connections than for multiple db multiple connections.
PS:
For adding more context, I am designing a multi tenant architecture where each tenant connects to one or multiple databases, I have an option for using map[string]*sql.DB where the key is the url of the database, but it can be hardly scaled when we have numerous number of databases. Or should we have a sharding layer for each incoming request sharded by connection url, so each machine will contain just the right amount of database connections in the form of map[string]*sql.DB?
An example for the software that I want to build is https://www.sigmacomputing.com/ where the user can connects to multiple databases for working with different tables.
Both MySQL and Postgres do not allow to connection sharing between multiple database users, single database user is specified in connection credentials. If you mean that your different users have their own database credentials, then it is not possible to share connections between them.
If by "different users" you mean your application users and if they share single database user to access DB deeper in the app, then you don't need to do anything particular to "cache" connections. sql.DB keeps and reuses open connections in its pool by default.
Go automatically opens, closes and reuses DB connections with a *database/sql.DB. By default it keeps up to 2 connections open (idle) and opens unlimited number of new connections under concurrency when all opened connections are already busy.
If you need some fine tuning on pool efficiency vs database load, you may want to alter sql.DB config with .Set* methods, for example SetMaxOpenConns.
You seem to have to many unknowns. In cases like this I would apply good, old agile and start with prototype of what you want to achieve with tools that you already know and then benchmark the performance. I think you might be surprised how much go can handle.
Since you understand how to use map[string]*sql.DB for that purpose I would go with that. You reach some limits? Add another machine behind haproxy. Solving scaling problem doesn't necessary mean writing new db pool in go. Obviously if you need this kind of power you can always do it - pgx postgres driver has it's own pool implementation so you can get your inspiration there. However doing this right now seems to be pre-mature optimization - solving problem you don't have yet. Building prototype with map[string]*sql.DB is easy, test it, benchmark it, you will see if you need more.
p.s. BTW you will most likely hit first file descriptor limit before you will be able to exhaust memory.
Assuming you have multiple users with multiple databases with an N to N relation, you could have a map of a database URL to database details (explained below).
The fact that which users have access to which databases should be handled anyway using configmap or a core database; For Database Details, we could have a struct like this:
type DBDetail {
sync.RWMutex
connection *sql.DB
}
The map would be database URL to database's details (dbDetail) and if a user is write it calls this:
dbDetail.Lock()
defer dbDetail.Unock()
and for reads instead of above just use RLock.
As said by vearutop the connections could be a pain but using this you could have a single connection or set the limit with increment and decrement of another variable after Lock.
There isn’t necessarily a correct architectural answer here. It depends on some of the constraints of the system.
I have an option for using map[string]*sql.DB where the key is the url of the database, but it can be hardly scaled when we have numerous number of databases.
Whether this will scale sufficiently depends on the expectation of how numerous the databases will be. If there are expected to be tens or hundreds of concurrent users in the near future, is probably sufficient. Often a good next step after using a map is to transition over to a more full featured cache (for example https://github.com/dgraph-io/ristretto).
A factor in the decision of whether to use a map or cache is how you imagine the lifecycle of a database connection. Once a connection is opened, can that connection remain opened for the remainder of the lifetime of the process or do connections need to be closed after minutes of no use to free up resources.
Should we have a sharding layer for each incoming request sharded by connection url, so each machine will contain just the right amount of database connections in the form of map[string]*sql.DB?
The right answer here depends on how many processing nodes are expected and whether there will be gain additional benefits from routing requests to specific machines. For example, row-level caching and isolating users from each other’s requests is an advantage that would be gained by sharing users across the pool. But a disadvantage is that you might end up with “hot” nodes because a single user might generate a majority of the traffic.
Usually, a good strategy for situations like this is to be really explicit about the constraints of the problem. A rule of thumb was coined by Jeff Dean for situations like this:
Ensure your design works if scale changes by 10X or 20X but the right solution for X [is] often not optimal for 100X
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
So, if in the near future, the system needs to support tens of concurrent users. The simplest that will support tens to hundreds of concurrent users (probably a map or cache with no user sharding is sufficient). That design will have to change before the system can support thousands of concurrent users. Scaling a system is often a good problem to have because it usually indicates a successful project.
I'm new to Azure SQL Database as this is my first project to migrate from a on premise setup to everything on Azure. So the first thing that got my concern is that there is a limit on concurrent login to the Azure SQL Database, and if it exist that number, then it'll start dropping the subsequent request. For the current service tier (S0), it caps at 60 concurrent logins, which I have already reached multiple times as I've encountered a few SQL failures from looking at my application log.
So my question is:
Is it normal to exceed that number of concurrent login? I'd like to get an idea of whether or not my application is having some issue, or my current service tier is too low.
I've looked into our database code to make sure we are not leaving database connection open. We use Enterprise Library, every use of DBCommand and IDataReader are being wrapped within a using block, thus they will get disposed once it runs out of scope.
Note: my web application consists of a front end app with multiple web services supporting the underlying feature, and each service will connect to the same database for specific collection of data, which makes me think hitting 60 concurrent login might be normal since a page or an action might involve multiple calls behind the scene, thus multiple connection to the database from a few api, and if there are more than one user on the application, then 60 is really easy to reach.
Again, in the past with on prem setup, I never notice this kind of limitation.
Thanks.
To answer your question, the limit is 60 on an S(0)
http://gavinstevens.com/2016/11/30/sql-server-vs-azure-sql/
I would like to keep a min number of connections alive in the pool for performance reasons.
The DB is read only and exposed via a stateless WCF service, so I think setting MinPoolSize from a default of 0 makes sense in this scenario.
At the same time I don't want to set the value too high unnecessarily wasting memory by keeping a large number of idle connections.
What would be the best practice or recommended setting a custom MinPoolSize in this kind of scenario?
The minimum number of connections in a connection pool cannot be less than two
I have to work with a Project made by another developer. A project Win-Form with Visual-Basic code, with MS-Access as db and some OleDbConnections. There is a bug: sometimes the application can't open the OleDbConnection because the max number of connections has been reached on the db. I know the best way to use the connections is this:
Using cn As New OleDbConnction(s)
...
cn.Close()
End Using
But in the project there are many classes to work with the db, and in many of these classes there are OleDbConnections with "Friend" visibility, that are opened and closed in different times. For this reason it's impossible to put all the OleDbConnections in a Using construct, and it's very very hard to find what operation "forgets" to close one of these OleDbConnection.
A possible solution could be to use only one unique public OleDbConnection, and to check, before opening it, if it isn't already opened.
But someone have told me it's a very bad practice. I suppose he told me this about the performance, but I don't know it exactly.
Can you tell me why one unique public OleDbConnection is so deprecated?
Have you got, for me, an "easy" solution for my problem?
Thank you,
Pileggi
From your description, I see a couple of possible issues that could result in your problem:
nested connections:
You open multiple connections within each-other
open/release connections too fast:
As David-W-Fenton mentionned, with access, every time you open/close a single connection, the lock file will be created/removed. This operation is quite slow and if you quickly open/close the database within you application (execute lots of atomic queries), you may get this issue.
A few possible ways to investigate and solve the issue:
Trace all open/close calls
Add some debug traces that show every time you open and close a connection.
It will allow you to detect nested connections and where your connection pool is being wasted.
Force connection polling
An easy 'fix' may be to explicitely set connection pooling in your connection string. It should be the default behaviour, so maybe it won't do anything to solve your problem, but it's so simple that there is no reason not to try it:
OLE DB Services=-1
Use a connection manager class to create/release connections for you.
Replace all the explicit creations of new OleDbConnection and close operations by your own code.
This would allow you to always re-use a single existing connection throughout your application and allow you to quickly make tweaks for the whole of your app by centralising the behaviour in a single place.
So why holding a single connection is generally deprecated?
Generally, you should not keep connections open throughout your application as they force the database server to keep resources available for you, and it decreases the number of client that can connect (there is always a limited number of connections available).
For Access though -a file-based database without server part- keeping a single connection open is actually preferable because of the delay associated with opening new connections (creation of the lock file). Since Access is not meant to be used with a large number of concurrent users, the resource cost of keeping the connection open is not significant enough to be an issue.
From simple tests, it can be shown that keeping a connection always open allows subsequent connections to open about 10x faster!
The OleDb driver does connection pooling for you, so it is able to re-use connections when they are freed.
By keeping your connections and database operations small and contained, you would be less likely to run into concurrency issues when using threads. Keeping a global connection may become an issue if you are executing multiple operations using the same pipeline to the database.
Just adding some information that works for years successfully for me (it is somewhat similar to what David-W-Fenton suggests)
First, an OleDbConnection to Microsoft Access (MDB, JET) is not using connection pooling. As Microsoft states in KB191572:
Connections that use the Jet OLE DB providers and ODBC drivers are not
pooled because those providers and drivers do not support pooling.
Regarding connection pooling, there is also this blog post from Ivan Mitev that states:
So what does this mean? It is apparent that that the presence of an
actively opened connection made the test with multiple connection
closing and opening finish a lot faster (2-3 times). The only possible
explanation for me is that the connection pool is released each time
there are no active connections. I have to make further investigations
and read something like Pooling in the Microsoft Data Access
Components. Or maybe hold a single opened connection just for the sake
of keeping the pool alive. This would be ugly, but still it is a good
enough workaround! If anyone has a better idea, please share it.
And Microsoft notes in MSDN:
The ADO Connection object implicitly uses IDataInitialize. However,
this means your application needs to keep at least one instance of a
Connection object instantiated for each unique user—at all times.
Otherwise, the pool will be destroyed when the last Connection object
for that string is closed.
Based on all this and my own tests, my solution to "simulate" connection pooling even with Microsoft Access databases roughly follows these steps:
Open one OleDbConnection to the Access database as early as possible in application lifecycle.
Do your normal SQL queries, disposing OleDbConnections as early as possible, just like recommended.
Dispose that one always-open OleDbConnection as late as possible in application lifecycle.
This sped up my applications (mostly WinForms) tremendously.
Please note that this also works for Sqlite which seems to not support connection pooling, too.
I'm using PHP's PDO layer for data access in a project, and I've been reading up on it and seeing that it has good innate support for persistent DB connections. I'm wondering when/if I should use them. Would I see performance benefits in a CRUD-heavy app? Are there downsides to consider, perhaps related to security?
If it matters to you, I'm using MySQL 5.x.
You could use this as a rough "ruleset":
YES, use persistent connections, if:
There are only few applications/users accessing the database, i.e. you will not result in 200 open (but probably idle) connections, because there are 200 different users shared on the same host.
The database is running on another server that you are accessing over the network
An (one) application accesses the database very often
NO, don't use persistent connections, if:
Your application only needs to access the database 100 times an hour.
You have many webservers accessing one database server
You're using Apache in prefork mode. It uses one connection for each child process, which can ramp up fairly quickly. (via #Powerlord in the comments)
Using persistent connections is considerable faster, especially if you are accessing the database over a network. It doesn't make so much difference if the database is running on the same machine, but it is still a little bit faster. However - as the name says - the connection is persistent, i.e. it stays open, even if it is not used.
The problem with that is, that in "default configuration", MySQL only allows 1000 parallel "open channels". After that, new connections are refused (You can tweak this setting). So if you have - say - 20 Webservers with each 100 Clients on them, and every one of them has just one page access per hour, simple math will show you that you'll need 2000 parallel connections to the database. That won't work.
Ergo: Only use it for applications with lots of requests.
In brief, my experience says that persistent connections should be avoided as far as possible.
Note that mysql_close is a no-operation (no-op) for connections that are created using mysql_pconnect. This means persistent connection cannot be closed by client at will. Such connection will be closed by mysqldb server when no activity occurs on the connection for duration more than wait_timeout. If wait_timeout is large value (say 30 min) then mysql db server can easily reach max_connections limit. In such case, mysql db will not accept any future connection request. This is when your pager starts beeping.
In order to avoid reaching max_connections limit, use of Persistent connection need careful balancing of following variables...
Number of apache processes on one host
Total number of hosts running apache
wait_timout variable in mysql db server
max_connections variable in mysql db server
Number of requests served by one apache process before it is re-spawned
So, pl use persistent connection after enough deliberation. You may not want to invite complex runtime issues for a small gain that you get from persistent connection.
Creating connections to the database is a fairly expensive operation. Persistent connections are a good idea. In the ASP.Net and Java world, we have "connection pooling", which is roughly the same thing, and also a good idea.
IMO, The real answer to this question is whatever works best for you app. I would recommend you benchmark your app using both persistent and non-persistent connections.
Maggie Nelson # Objectively Oriented posted about this in August and Robert Swarthout made an accompanying post with some hard numbers. Both are pretty good reads.
In my humble opinion:
When using PHP for web development, most of your connection will only "live" for the life of the page executing. A persistant connection is going to cost you a lot of overhead as you'll have to put it in the session or some such thing.
99% of the time a single non-persistant connection that dies at the end of the page execution will work just fine.
The other 1% of the time, you probably should not be using PHP for the app, and there is no perfect solution for you.
In general, you'll need to use non-persistent connections sometimes, and it's nice to have a single pattern to apply to db connection design (as long as there's relatively little upside to using persistent connections in your context.)
I was going to ask this same question but rather than ask the same question again I'll just add some information that I've found.
Are PHP persistent connections evil ?
Persistent Database Connections
It is also worth noting that the newer mysqli extension does not even include the option to use persistent database connections.
I'm still using persitent connections at the moment but plan to switch to non-persistent in the near future.