I have heard in the past that persistent connections are not good to use on a high traffic web server. Is this true, or does it only apply to apache's prefork mode? Would CGI mode have this problem?
This involves PHP, Apache, and Postgresql.
Are PHP persistent connections evil ? -- in context of PHP and MySQL.
The reason behind using persistent connections is of course reducing number of connects which are rather expensive, even though they are much faster with MySQL than with most other databases.
The first problem with persistent connections...
If you’re establishing thousands of connections per second you normally do not keep it open for long time, but Operation System does. According to TCP/IP protocol Ports can’t be recycled instantly and have to spend some time in “FIN” stage waiting before they can be recycled.
The second problem... using too many MySQL server connections.
Some people simply do not realize you can increase max_connections variable and get over 100 concurrent connections with MySQL others were beaten by older Linux problems of not being able to have more than 1024 connections with MySQL.
Lets talk now about why Persistent connections were disabled in mysqli extension. Even though you could misuse persistent connections and get poor performance that was not the reason. The real reason is – you could get much more problems with it.
Persistent connections were added to PHP during times of MySQL 3.22/3.23 when MySQL was simple enough so you could recycle connections easily without any problems. In later versions number of problems however arose – If you recycle connection which has uncommitted transactions you run into trouble. If you happen to recycle connections with custom character set settings you’re in trouble back again, not to mention about possibly changed per session variables.
One problem with using persistent connections is that it doesn't really scale that well. If you have 5000 people connected, you need 5000 persistent connections. If you take away the need for persistence, you might be able to serve 10000 people with the same number of connections because they're able to share those connections when they're not using them.
Related
I am having an application where different users may connect to different databases (those can be either MySQL or Postgres), what might be the best way to cache those connections across different databases? I saw some connection pools but seems like they are more for one db multiple connections than for multiple db multiple connections.
PS:
For adding more context, I am designing a multi tenant architecture where each tenant connects to one or multiple databases, I have an option for using map[string]*sql.DB where the key is the url of the database, but it can be hardly scaled when we have numerous number of databases. Or should we have a sharding layer for each incoming request sharded by connection url, so each machine will contain just the right amount of database connections in the form of map[string]*sql.DB?
An example for the software that I want to build is https://www.sigmacomputing.com/ where the user can connects to multiple databases for working with different tables.
Both MySQL and Postgres do not allow to connection sharing between multiple database users, single database user is specified in connection credentials. If you mean that your different users have their own database credentials, then it is not possible to share connections between them.
If by "different users" you mean your application users and if they share single database user to access DB deeper in the app, then you don't need to do anything particular to "cache" connections. sql.DB keeps and reuses open connections in its pool by default.
Go automatically opens, closes and reuses DB connections with a *database/sql.DB. By default it keeps up to 2 connections open (idle) and opens unlimited number of new connections under concurrency when all opened connections are already busy.
If you need some fine tuning on pool efficiency vs database load, you may want to alter sql.DB config with .Set* methods, for example SetMaxOpenConns.
You seem to have to many unknowns. In cases like this I would apply good, old agile and start with prototype of what you want to achieve with tools that you already know and then benchmark the performance. I think you might be surprised how much go can handle.
Since you understand how to use map[string]*sql.DB for that purpose I would go with that. You reach some limits? Add another machine behind haproxy. Solving scaling problem doesn't necessary mean writing new db pool in go. Obviously if you need this kind of power you can always do it - pgx postgres driver has it's own pool implementation so you can get your inspiration there. However doing this right now seems to be pre-mature optimization - solving problem you don't have yet. Building prototype with map[string]*sql.DB is easy, test it, benchmark it, you will see if you need more.
p.s. BTW you will most likely hit first file descriptor limit before you will be able to exhaust memory.
Assuming you have multiple users with multiple databases with an N to N relation, you could have a map of a database URL to database details (explained below).
The fact that which users have access to which databases should be handled anyway using configmap or a core database; For Database Details, we could have a struct like this:
type DBDetail {
sync.RWMutex
connection *sql.DB
}
The map would be database URL to database's details (dbDetail) and if a user is write it calls this:
dbDetail.Lock()
defer dbDetail.Unock()
and for reads instead of above just use RLock.
As said by vearutop the connections could be a pain but using this you could have a single connection or set the limit with increment and decrement of another variable after Lock.
There isn’t necessarily a correct architectural answer here. It depends on some of the constraints of the system.
I have an option for using map[string]*sql.DB where the key is the url of the database, but it can be hardly scaled when we have numerous number of databases.
Whether this will scale sufficiently depends on the expectation of how numerous the databases will be. If there are expected to be tens or hundreds of concurrent users in the near future, is probably sufficient. Often a good next step after using a map is to transition over to a more full featured cache (for example https://github.com/dgraph-io/ristretto).
A factor in the decision of whether to use a map or cache is how you imagine the lifecycle of a database connection. Once a connection is opened, can that connection remain opened for the remainder of the lifetime of the process or do connections need to be closed after minutes of no use to free up resources.
Should we have a sharding layer for each incoming request sharded by connection url, so each machine will contain just the right amount of database connections in the form of map[string]*sql.DB?
The right answer here depends on how many processing nodes are expected and whether there will be gain additional benefits from routing requests to specific machines. For example, row-level caching and isolating users from each other’s requests is an advantage that would be gained by sharing users across the pool. But a disadvantage is that you might end up with “hot” nodes because a single user might generate a majority of the traffic.
Usually, a good strategy for situations like this is to be really explicit about the constraints of the problem. A rule of thumb was coined by Jeff Dean for situations like this:
Ensure your design works if scale changes by 10X or 20X but the right solution for X [is] often not optimal for 100X
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
So, if in the near future, the system needs to support tens of concurrent users. The simplest that will support tens to hundreds of concurrent users (probably a map or cache with no user sharding is sufficient). That design will have to change before the system can support thousands of concurrent users. Scaling a system is often a good problem to have because it usually indicates a successful project.
On a single page load I see several connections open. Although there are a at least 20 calls to the database, I see around 8 connections open before slowly dropping off. Each call is wrapped in a using statement, uses OpenStatelessSession and a singleton factory for nhibernate object. Shouldn't I only see a single connection open or is this normal behavior? I'm concerned because this is a high traffic site.
Each session always runs on a separate connection. If your site is exhausting the connection pool under load, you can switch to a session-per-request architecture, but that will of course consume more memory and CPU cycles due to the increased size of the first level cache.
I have the need to access a sybase database (12.5) from oversea. The high latency is definitely a problem.
I already optimized the connection parameters to make better use of the network and achieved a 20x performance increase, but it's still not enough : 1 minute to get 3Mb of data.
We need another 10x or 20x increase for our application.
Technical data :
the data are flowing through a single TCP connection using the TDS protocol
the client app is an excel sheet with macros, using the default Sybase driver
the corporate environment makes it difficult to push big changes in the 10+ years architecture, so solutions need to be the least intrusive. But some changes may be bargained due to the importance of this project.
Can anyone give me pointers ?
I already thought of :
splitting SQL requests over several concurrent connections to the database. The problem is data consistency : what if records are modified at the same time since requests will not be exactly executed at the same time ? Is there an existing mechanism to spread a request over several calls on different connections ?
using some kind of database "cache" or "local replication" oversea, but I don't know what is possible.
Thanks.
Try to install local database (ASE or ASA) and synchronize this databases with Sybase Mobilink (or Sybase Replication Server if you need small replication latency and you have a lot of money).
(I know I answer to my own question)
Eventually, we settled to designing our own database remote access protocol. It's not complicated since we are only using a basic subset of SQL (SELECT and UPDATE), and the protocol doesn't have to understand SQL anyway.
By using our own protocol, we'll be able to use compression, make the client able to use several TCP links at the same time, maximize network utilisation and add some functionnal caching secific to our application.
The client will be our app and the server will be a "proxy" to the real database, sitting next to it (like #Tim suggested in the comments).
It's not the only solution, but we feel that it's a good balance between enormous replication price, development complexity and expected benefits.
Is there a limit with the number of SSL connections?
We are trying to connect through SSL with 2000 sessions. We have tried it a couple of times but it always dies at 1062nd. Is there a limit?
Your operating system will have a limit on the number of open files if you are on linux
ulimit -a will show your various limits.
I imagine yours is set to 1024 and some of the sessions just happened to have closed allow the figure of 1062 (this last bit is a guess)
Yes, everything has a limit. As far as I'm aware, there is no inherit limit with "SSL".. it is after all just a protocol.
But, there is a limited amount of memory, ports, CPU on the machine you are connected to, from and every single one in between.
The actually server you are connected to may have an arbitrary limit set too.
This question doesn't have enough information to answer beyond "YES".
SSL itself doesn't have any limitations, but there are some practical limits you may be running into:
SSL connections require more resources on both ends of the connection, so you may be hitting some built-in server limit.
TCP/IP uses a 16-bit port number to identify connections, only some of which (around 16,000) are used for dynamic client connections. This would limit the number of active connections a single client could make to the same server.
On Linux, each process has a maximum number of file descriptors that it can have open, and each network connection uses one file descriptor. I imagine Windows has a similar limit.
I'm using PHP's PDO layer for data access in a project, and I've been reading up on it and seeing that it has good innate support for persistent DB connections. I'm wondering when/if I should use them. Would I see performance benefits in a CRUD-heavy app? Are there downsides to consider, perhaps related to security?
If it matters to you, I'm using MySQL 5.x.
You could use this as a rough "ruleset":
YES, use persistent connections, if:
There are only few applications/users accessing the database, i.e. you will not result in 200 open (but probably idle) connections, because there are 200 different users shared on the same host.
The database is running on another server that you are accessing over the network
An (one) application accesses the database very often
NO, don't use persistent connections, if:
Your application only needs to access the database 100 times an hour.
You have many webservers accessing one database server
You're using Apache in prefork mode. It uses one connection for each child process, which can ramp up fairly quickly. (via #Powerlord in the comments)
Using persistent connections is considerable faster, especially if you are accessing the database over a network. It doesn't make so much difference if the database is running on the same machine, but it is still a little bit faster. However - as the name says - the connection is persistent, i.e. it stays open, even if it is not used.
The problem with that is, that in "default configuration", MySQL only allows 1000 parallel "open channels". After that, new connections are refused (You can tweak this setting). So if you have - say - 20 Webservers with each 100 Clients on them, and every one of them has just one page access per hour, simple math will show you that you'll need 2000 parallel connections to the database. That won't work.
Ergo: Only use it for applications with lots of requests.
In brief, my experience says that persistent connections should be avoided as far as possible.
Note that mysql_close is a no-operation (no-op) for connections that are created using mysql_pconnect. This means persistent connection cannot be closed by client at will. Such connection will be closed by mysqldb server when no activity occurs on the connection for duration more than wait_timeout. If wait_timeout is large value (say 30 min) then mysql db server can easily reach max_connections limit. In such case, mysql db will not accept any future connection request. This is when your pager starts beeping.
In order to avoid reaching max_connections limit, use of Persistent connection need careful balancing of following variables...
Number of apache processes on one host
Total number of hosts running apache
wait_timout variable in mysql db server
max_connections variable in mysql db server
Number of requests served by one apache process before it is re-spawned
So, pl use persistent connection after enough deliberation. You may not want to invite complex runtime issues for a small gain that you get from persistent connection.
Creating connections to the database is a fairly expensive operation. Persistent connections are a good idea. In the ASP.Net and Java world, we have "connection pooling", which is roughly the same thing, and also a good idea.
IMO, The real answer to this question is whatever works best for you app. I would recommend you benchmark your app using both persistent and non-persistent connections.
Maggie Nelson # Objectively Oriented posted about this in August and Robert Swarthout made an accompanying post with some hard numbers. Both are pretty good reads.
In my humble opinion:
When using PHP for web development, most of your connection will only "live" for the life of the page executing. A persistant connection is going to cost you a lot of overhead as you'll have to put it in the session or some such thing.
99% of the time a single non-persistant connection that dies at the end of the page execution will work just fine.
The other 1% of the time, you probably should not be using PHP for the app, and there is no perfect solution for you.
In general, you'll need to use non-persistent connections sometimes, and it's nice to have a single pattern to apply to db connection design (as long as there's relatively little upside to using persistent connections in your context.)
I was going to ask this same question but rather than ask the same question again I'll just add some information that I've found.
Are PHP persistent connections evil ?
Persistent Database Connections
It is also worth noting that the newer mysqli extension does not even include the option to use persistent database connections.
I'm still using persitent connections at the moment but plan to switch to non-persistent in the near future.