Performance difference between a global database connection and opening a connection every time in Golang - sql

In my current project I was opening a new database connection every time a user makes a request. For example:
func login(w http.ResponseWriter, r *http.Request) {
    ...
    db, err := sqlx.Connect("postgres", "user=postgres password=*** dbname=postgres")
    if err != nil {
        ErrorWithJSON(w, err.Error(), http.StatusBadRequest)
        return
    }
    db.SetMaxIdleConns(0)
    db.SetConnMaxLifetime(time.Second * 30)
    user, err := loginManager(db, m)
    ...
    err = db.Close()
}
When I looked at other people's code, I saw that most developers create a global variable for the database connection, set it in main, and use that variable across the entire project.
I was wondering, is there any difference between these approaches? If I use a global variable, will there be any latency when 5 different users make requests for register/login, etc.? If there will be latency, should I create multiple database connections, store them in a slice, and pick one at random when a user makes a request, like a simple load balancer? I don't know.
Sorry for multiple questions. Thank you!

Yes, there might be a huge performance difference (possibly several orders of magnitude, depending on the nature of the queries you run and on the system and server configuration).
The sqlx.DB type wraps (embeds) an sql.DB type, which manages a pool of connections:
DB is a database handle representing a pool of zero or more underlying connections. It's safe for concurrent use by multiple goroutines.
The sql package creates and frees connections automatically; it also maintains a free pool of idle connections. If the database has a concept of per-connection state, such state can only be reliably observed within a transaction.
Every time you open a new connection, a lot of things have to happen in the background: the connection string has to be parsed, a TCP connection has to be established, authentication / authorization must be performed, resources must be allocated on both sides (client and server), etc. These are just the main, obvious things. Even though some of these steps may be optimized or cached, there is still significant overhead compared to having a single DB instance with multiple established, authenticated connections sitting ready in a pool, waiting to be used.
Also quoting from sql.Open():
The returned DB is safe for concurrent use by multiple goroutines and maintains its own pool of idle connections. Thus, the Open function should be called just once. It is rarely necessary to close a DB.
sqlx.Connect(), which you used, calls sqlx.Open(), which is "the same as sql.Open, but returns an *sqlx.DB instead".
So all in all, use a single, global sqlx.DB or sql.DB instance and share / use it everywhere. It provides automatic connection and connection-pool management, and it will give you the best performance. You may fine-tune the pool with the DB.SetConnMaxLifetime(), DB.SetMaxIdleConns() and DB.SetMaxOpenConns() methods.
Idle connections (DB.SetMaxIdleConns()) are connections that are not currently in use but are sitting in the pool, waiting for someone to pick them up. You should definitely have some of these, e.g. 5 or 10 of them, or even more. DB.SetConnMaxLifetime() controls how long a connection may be used; once it grows older than this, it will be closed (and a new one will be opened if needed). You shouldn't change this; the default behavior is to never expire connections. Basically all the defaults are sensible; you should only play with them if you experience performance problems. Also, read the docs of these methods to get a clear picture.
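For illustration, a minimal sketch of this single, shared-handle approach might look like the following (the handler body, the pool numbers, and the "users" table are placeholder assumptions, not taken from the question):

package main

import (
    "log"
    "net/http"
    "time"

    "github.com/jmoiron/sqlx"
    _ "github.com/lib/pq" // Postgres driver
)

// db is the single, shared handle; *sqlx.DB is safe for concurrent use.
var db *sqlx.DB

func main() {
    var err error
    db, err = sqlx.Connect("postgres", "user=postgres password=*** dbname=postgres")
    if err != nil {
        log.Fatal(err)
    }
    // Optional pool tuning; the defaults are usually sensible.
    db.SetMaxOpenConns(20)
    db.SetMaxIdleConns(10)
    db.SetConnMaxLifetime(30 * time.Minute)

    http.HandleFunc("/login", login)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

func login(w http.ResponseWriter, r *http.Request) {
    // Reuse the shared db; do not open or close it per request.
    var n int
    // Hypothetical query; the "users" table is an assumption for the sketch.
    if err := db.Get(&n, "SELECT COUNT(*) FROM users"); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    // ... proceed with login logic
}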
See this similar, possibly duplicate question:
mgo - query performance seems consistently slow (500-650ms)

Related

Pooling, Client Checked out, idleTimeoutMillis

This is my understanding after reading the docs:
Pooling: like many other DBs, we have only a limited number of allowed connections, so everyone lines up and waits for a free connection to be returned to the pool (a connection is like a token, in a sense).
At any given time, the number of active and/or available connections is controlled in the range 0-max.
idleTimeoutMillis is said to be "milliseconds a client must sit idle in the pool and not be checked out before it is disconnected from the backend and discarded." I'm not clear on this. I generally assumed that when a client (say, a web app) has finished its CRUD but has not voluntarily returned the connection, the connection is considered idle; node-postgres starts the clock and, once that many milliseconds have elapsed, takes the connection back into the pool for the next client. So what does "not be checked out before it is disconnected from the backend and discarded" mean?
Say idleTimeoutMillis: 100 - does it mean this connection will be literally disconnected (logged out) after being idle for 100 milliseconds? If so, it is not returned to the pool, which would result in frequent reconnects, as the doc says below:
Connecting a new client to the PostgreSQL server requires a handshake
which can take 20-30 milliseconds. During this time passwords are
negotiated, SSL may be established, and configuration information is
shared with the client & server. Incurring this cost every time we
want to execute a query would substantially slow down our application.
Thanks in advance for the stupid questions.
Sorry this question was not answered for so long, but I recently came across a bug which made me question my understanding of this library too.
Essentially, when you're pooling, you're telling the library it can have a maximum of X connections to the database open simultaneously. So every request that comes into a CRUD API, for example, takes a connection, and you can have at most X requests in flight, one connection per request. That means as soon as a request comes in, it "checks out" a connection from the pool; it also means another request cannot use that connection, as it is currently held by the first request.
So, in order to "reuse" the same connection, when one request is done with that connection you have to release it back to the pool, marking it as ready to be checked out again. When another request comes in, it can then use this connection to run its query.
idleTimeoutMillis: this variable was very confusing to me and took a while to get my head around. When an open connection has been released back to the pool, it is in an IDLE state, which means anyone wanting to make a request can use it, as it is not currently in use. This variable says how long we wait, while a connection sits in that IDLE state, before we close it. It can be useful for various things: open DB connections consume memory and other resources, so closing them can be beneficial. Also, when autoscaling - say you've been at max requests per second and you're using all DB connections - it's useful to keep IDLE connections open for a bit. However, if the timeout is too long and you scale down, each IDLE connection will keep holding memory longer than necessary.
The benefit of all this is that when you already have an open connection and just send a query over it, you don't need to re-authenticate with the DB; the connection is already authenticated and ready to go.
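As a side note for the Go questions above: the standard database/sql pool exposes an analogous idle-timeout knob. A minimal, illustrative sketch (the DSN and durations are arbitrary and not part of this answer):

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // Postgres driver
)

func main() {
    // Sketch only: the DSN is a placeholder.
    db, err := sql.Open("postgres", "user=postgres dbname=postgres sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    db.SetMaxIdleConns(10)                  // how many idle connections may wait in the pool
    db.SetConnMaxIdleTime(30 * time.Second) // close connections idle longer than this (Go 1.15+),
                                            // roughly the counterpart of idleTimeoutMillis
}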

How to cache connections to different Postgres/MySQL databases in Golang?

I have an application where different users may connect to different databases (either MySQL or Postgres). What might be the best way to cache those connections across different databases? I saw some connection pools, but it seems like they are designed for one DB with multiple connections rather than multiple DBs with multiple connections.
PS:
For more context, I am designing a multi-tenant architecture where each tenant connects to one or more databases. I have the option of using a map[string]*sql.DB where the key is the URL of the database, but that hardly scales when we have a large number of databases. Or should we have a sharding layer that shards each incoming request by connection URL, so each machine contains just the right amount of database connections in the form of a map[string]*sql.DB?
An example of the kind of software I want to build is https://www.sigmacomputing.com/, where the user can connect to multiple databases to work with different tables.
Both MySQL and Postgres do not allow connection sharing between multiple database users; a single database user is specified in the connection credentials. If you mean that your different users have their own database credentials, then it is not possible to share connections between them.
If by "different users" you mean your application users, and they share a single database user to access the DB deeper in the app, then you don't need to do anything particular to "cache" connections. sql.DB keeps and reuses open connections in its pool by default.
Go automatically opens, closes and reuses DB connections through a *database/sql.DB. By default it keeps up to 2 idle connections open and opens an unlimited number of new connections under concurrency when all open connections are already busy.
If you need some fine-tuning of pool efficiency vs. database load, you may want to alter the sql.DB config with its .Set* methods, for example SetMaxOpenConns.
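A minimal sketch of such tuning, plus a way to observe the pool at runtime via db.Stats() - the DSN and the numbers are placeholders, not a recommendation:

package main

import (
    "database/sql"
    "log"

    _ "github.com/go-sql-driver/mysql" // or a Postgres driver
)

func main() {
    // Placeholder DSN; sql.Open does not connect yet, it only prepares the pool.
    db, err := sql.Open("mysql", "user:pass@tcp(localhost:3306)/app")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    db.SetMaxOpenConns(50) // cap concurrent connections to protect the DB
    db.SetMaxIdleConns(10) // keep some warm connections around

    // db.Stats() lets you watch how the pool behaves under load.
    s := db.Stats()
    log.Printf("open=%d in-use=%d idle=%d", s.OpenConnections, s.InUse, s.Idle)
}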
You seem to have too many unknowns. In cases like this I would apply good, old agile: start with a prototype of what you want to achieve, using tools that you already know, and then benchmark the performance. I think you might be surprised how much Go can handle.
Since you understand how to use map[string]*sql.DB for that purpose, I would go with that. You reach some limits? Add another machine behind haproxy. Solving a scaling problem doesn't necessarily mean writing a new DB pool in Go. Obviously, if you need that kind of power you can always do it - the pgx Postgres driver has its own pool implementation, so you can get your inspiration there. However, doing this right now seems to be premature optimization - solving a problem you don't have yet. Building a prototype with map[string]*sql.DB is easy; test it, benchmark it, and you will see if you need more.
p.s. BTW, you will most likely hit the file descriptor limit before you are able to exhaust memory.
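A minimal sketch of such a map[string]*sql.DB prototype, just to make the idea concrete (getDB is a made-up helper name and error handling is simplified; each cached *sql.DB manages its own connection pool):

package main

import (
    "database/sql"
    "log"
    "sync"

    _ "github.com/lib/pq" // and/or a MySQL driver, depending on the tenant
)

var (
    mu  sync.Mutex
    dbs = map[string]*sql.DB{} // key: database URL / DSN
)

// getDB returns the cached handle for a DSN, opening it on first use.
func getDB(driver, dsn string) (*sql.DB, error) {
    mu.Lock()
    defer mu.Unlock()
    if db, ok := dbs[dsn]; ok {
        return db, nil
    }
    db, err := sql.Open(driver, dsn)
    if err != nil {
        return nil, err
    }
    dbs[dsn] = db
    return db, nil
}

func main() {
    // Placeholder DSN for one tenant's database.
    db, err := getDB("postgres", "postgres://user:pass@tenant-a-host/app")
    if err != nil {
        log.Fatal(err)
    }
    _ = db // use db.Query / db.Exec as usual
}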
Assuming you have multiple users with multiple databases with an N to N relation, you could have a map of a database URL to database details (explained below).
The fact that which users have access to which databases should be handled anyway using configmap or a core database; For Database Details, we could have a struct like this:
type DBDetail struct {
    sync.RWMutex          // guards access to the connection
    connection *sql.DB    // handle for one database URL
}
The map would be from the database URL to the database's details (dbDetail), and if a user wants to write, it calls this:
dbDetail.Lock()
defer dbDetail.Unlock()
and for reads, instead of the above, just use RLock / RUnlock.
As said by vearutop, the connections could be a pain, but using this you could keep a single connection per database, or set a limit by incrementing and decrementing another counter variable after taking the Lock.
There isn’t necessarily a correct architectural answer here. It depends on some of the constraints of the system.
I have the option of using a map[string]*sql.DB where the key is the URL of the database, but that hardly scales when we have a large number of databases.
Whether this will scale sufficiently depends on how numerous the databases are expected to be. If there are expected to be tens or hundreds of concurrent users in the near future, a map is probably sufficient. Often a good next step after using a map is to transition to a more full-featured cache (for example https://github.com/dgraph-io/ristretto).
A factor in the decision of whether to use a map or a cache is how you imagine the lifecycle of a database connection. Once a connection is opened, can it remain open for the remainder of the lifetime of the process, or do connections need to be closed after minutes of no use to free up resources?
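If connections do need to be closed after a period of no use, a hypothetical sketch of that lifecycle, assuming a DSN-keyed cache like the one sketched earlier (cachedDB and evictIdle are made-up names, and nothing here populates the cache; it only shows the janitor side):

package main

import (
    "database/sql"
    "log"
    "sync"
    "time"
)

type cachedDB struct {
    db       *sql.DB
    lastUsed time.Time
}

var (
    cacheMu sync.Mutex
    cache   = map[string]*cachedDB{} // key: database URL / DSN
)

// evictIdle closes and drops handles that have not been used for maxIdle.
func evictIdle(maxIdle time.Duration) {
    cacheMu.Lock()
    defer cacheMu.Unlock()
    for dsn, c := range cache {
        if time.Since(c.lastUsed) > maxIdle {
            c.db.Close()
            delete(cache, dsn)
        }
    }
}

func main() {
    // Run the eviction periodically in the background.
    go func() {
        for range time.Tick(time.Minute) {
            evictIdle(10 * time.Minute)
        }
    }()
    log.Println("cache janitor started")
    select {} // block forever in this sketch
}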
Should we have a sharding layer that shards each incoming request by connection URL, so each machine contains just the right amount of database connections in the form of a map[string]*sql.DB?
The right answer here depends on how many processing nodes are expected and whether there are additional benefits to be gained from routing requests to specific machines. For example, row-level caching and isolating users from each other's requests are advantages gained by sharding users across the pool. But a disadvantage is that you might end up with "hot" nodes, because a single user might generate a majority of the traffic.
Usually, a good strategy for situations like this is to be really explicit about the constraints of the problem. A rule of thumb was coined by Jeff Dean for situations like this:
Ensure your design works if scale changes by 10X or 20X but the right solution for X [is] often not optimal for 100X
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
So, if in the near future the system needs to support tens of concurrent users, the simplest design that supports tens to hundreds of concurrent users (probably a map or cache with no user sharding) is sufficient. That design will have to change before the system can support thousands of concurrent users. Scaling a system is often a good problem to have, because it usually indicates a successful project.

Multiple open sql connections

On a single page load I see several connections open. Although there are at least 20 calls to the database, I see around 8 connections open before slowly dropping off. Each call is wrapped in a using statement, uses OpenStatelessSession, and uses a singleton NHibernate session factory. Shouldn't I only see a single connection open, or is this normal behavior? I'm concerned because this is a high-traffic site.
Each session always runs on a separate connection. If your site is exhausting the connection pool under load, you can switch to a session-per-request architecture, but that will of course consume more memory and CPU cycles due to the increased size of the first level cache.

Why is one public OleDbConnection deprecated? Alternative to solve the bug: too many connections opened

I have to work with a project made by another developer: a WinForms project with Visual Basic code, MS Access as the DB, and some OleDbConnections. There is a bug: sometimes the application can't open an OleDbConnection because the maximum number of connections on the DB has been reached. I know the best way to use connections is this:
Using cn As New OleDbConnection(s)
    ...
    cn.Close()
End Using
But in the project there are many classes that work with the DB, and in many of these classes there are OleDbConnections with "Friend" visibility that are opened and closed at different times. For this reason it's impossible to put all the OleDbConnections in a Using construct, and it's very, very hard to find which operation "forgets" to close one of these OleDbConnections.
A possible solution could be to use only one public OleDbConnection and to check, before opening it, whether it is already open.
But someone told me that is a very bad practice. I suppose they told me this because of performance, but I don't know exactly.
Can you tell me why a single public OleDbConnection is so deprecated?
Do you have an "easy" solution for my problem?
Thank you,
Pileggi
From your description, I see a couple of possible issues that could result in your problem:
Nested connections:
You open multiple connections within each other.
Opening/releasing connections too fast:
As David-W-Fenton mentioned, with Access, every time you open/close a single connection, the lock file will be created/removed. This operation is quite slow, and if you quickly open/close the database within your application (execute lots of atomic queries), you may get this issue.
A few possible ways to investigate and solve the issue:
Trace all open/close calls
Add some debug traces that show every time you open and close a connection.
It will allow you to detect nested connections and where your connection pool is being wasted.
Force connection pooling
An easy 'fix' may be to explicitly set connection pooling in your connection string. It should be the default behaviour, so maybe it won't do anything to solve your problem, but it's so simple that there is no reason not to try it:
OLE DB Services=-1
Use a connection manager class to create/release connections for you.
Replace all the explicit creations of new OleDbConnection and Close operations with your own code.
This would allow you to always re-use a single existing connection throughout your application and to quickly make tweaks for the whole of your app by centralising the behaviour in a single place.
So why is holding a single connection generally deprecated?
Generally, you should not keep connections open throughout your application, as they force the database server to keep resources available for you, and they decrease the number of clients that can connect (there is always a limited number of connections available).
For Access though - a file-based database without a server part - keeping a single connection open is actually preferable, because of the delay associated with opening new connections (creation of the lock file). Since Access is not meant to be used with a large number of concurrent users, the resource cost of keeping the connection open is not significant enough to be an issue.
From simple tests, it can be shown that keeping a connection always open allows subsequent connections to open about 10x faster!
The OleDb driver does connection pooling for you, so it is able to re-use connections when they are freed.
By keeping your connections and database operations small and contained, you would be less likely to run into concurrency issues when using threads. Keeping a global connection may become an issue if you are executing multiple operations using the same pipeline to the database.
Just adding some information that has worked successfully for me for years (it is somewhat similar to what David-W-Fenton suggests).
First, an OleDbConnection to Microsoft Access (MDB, Jet) does not use connection pooling. As Microsoft states in KB191572:
Connections that use the Jet OLE DB providers and ODBC drivers are not
pooled because those providers and drivers do not support pooling.
Regarding connection pooling, there is also this blog post from Ivan Mitev that states:
So what does this mean? It is apparent that that the presence of an
actively opened connection made the test with multiple connection
closing and opening finish a lot faster (2-3 times). The only possible
explanation for me is that the connection pool is released each time
there are no active connections. I have to make further investigations
and read something like Pooling in the Microsoft Data Access
Components. Or maybe hold a single opened connection just for the sake
of keeping the pool alive. This would be ugly, but still it is a good
enough workaround! If anyone has a better idea, please share it.
And Microsoft notes in MSDN:
The ADO Connection object implicitly uses IDataInitialize. However,
this means your application needs to keep at least one instance of a
Connection object instantiated for each unique user—at all times.
Otherwise, the pool will be destroyed when the last Connection object
for that string is closed.
Based on all this and my own tests, my solution to "simulate" connection pooling even with Microsoft Access databases roughly follows these steps:
Open one OleDbConnection to the Access database as early as possible in application lifecycle.
Do your normal SQL queries, disposing OleDbConnections as early as possible, just like recommended.
Dispose that one always-open OleDbConnection as late as possible in application lifecycle.
This sped up my applications (mostly WinForms) tremendously.
Please note that this also works for SQLite, which does not seem to support connection pooling either.

Persistent DB Connections - Yea or Nay?

I'm using PHP's PDO layer for data access in a project, and I've been reading up on it and seeing that it has good innate support for persistent DB connections. I'm wondering when/if I should use them. Would I see performance benefits in a CRUD-heavy app? Are there downsides to consider, perhaps related to security?
If it matters to you, I'm using MySQL 5.x.
You could use this as a rough "ruleset":
YES, use persistent connections, if:
There are only a few applications/users accessing the database, i.e. you will not end up with 200 open (but mostly idle) connections because 200 different users share the same host.
The database is running on another server that you are accessing over the network
An (one) application accesses the database very often
NO, don't use persistent connections, if:
Your application only needs to access the database 100 times an hour.
You have many webservers accessing one database server
You're using Apache in prefork mode. It uses one connection for each child process, which can ramp up fairly quickly. (via #Powerlord in the comments)
Using persistent connections is considerably faster, especially if you are accessing the database over a network. It doesn't make as much difference if the database is running on the same machine, but it is still a little bit faster. However - as the name says - the connection is persistent, i.e. it stays open, even if it is not used.
The problem with that is that in the "default configuration", MySQL only allows 1000 parallel "open channels". After that, new connections are refused (you can tweak this setting). So if you have, say, 20 webservers with 100 clients each, and every one of them has just one page access per hour, simple math will show you that you'll need 2000 parallel connections to the database. That won't work.
Ergo: Only use it for applications with lots of requests.
In brief, my experience says that persistent connections should be avoided as far as possible.
Note that mysql_close is a no-operation (no-op) for connections that are created using mysql_pconnect. This means a persistent connection cannot be closed by the client at will. Such a connection will be closed by the MySQL server when no activity occurs on it for longer than wait_timeout. If wait_timeout is a large value (say 30 min), the MySQL server can easily reach the max_connections limit. In that case, MySQL will not accept any further connection requests. This is when your pager starts beeping.
In order to avoid reaching the max_connections limit, using persistent connections needs careful balancing of the following variables...
Number of apache processes on one host
Total number of hosts running apache
wait_timeout variable in mysql db server
max_connections variable in mysql db server
Number of requests served by one apache process before it is re-spawned
So, please use persistent connections only after enough deliberation. You may not want to invite complex runtime issues for the small gain you get from persistent connections.
Creating connections to the database is a fairly expensive operation. Persistent connections are a good idea. In the ASP.Net and Java world, we have "connection pooling", which is roughly the same thing, and also a good idea.
IMO, the real answer to this question is whatever works best for your app. I would recommend you benchmark your app using both persistent and non-persistent connections.
Maggie Nelson # Objectively Oriented posted about this in August and Robert Swarthout made an accompanying post with some hard numbers. Both are pretty good reads.
In my humble opinion:
When using PHP for web development, most of your connections will only "live" for the life of the page execution. A persistent connection is going to cost you a lot of overhead, as you'll have to put it in the session or some such thing.
99% of the time a single non-persistent connection that dies at the end of the page execution will work just fine.
The other 1% of the time, you probably should not be using PHP for the app, and there is no perfect solution for you.
In general, you'll need to use non-persistent connections sometimes, and it's nice to have a single pattern to apply to db connection design (as long as there's relatively little upside to using persistent connections in your context.)
I was going to ask this same question but rather than ask the same question again I'll just add some information that I've found.
Are PHP persistent connections evil ?
Persistent Database Connections
It is also worth noting that the newer mysqli extension does not even include the option to use persistent database connections.
I'm still using persistent connections at the moment but plan to switch to non-persistent in the near future.