Multiple open sql connections

Multiple open sql connections - nhibernate

On a single page load I see several connections open. Although there are a at least 20 calls to the database, I see around 8 connections open before slowly dropping off. Each call is wrapped in a using statement, uses OpenStatelessSession and a singleton factory for nhibernate object. Shouldn't I only see a single connection open or is this normal behavior? I'm concerned because this is a high traffic site.

Each session always runs on a separate connection. If your site is exhausting the connection pool under load, you can switch to a session-per-request architecture, but that will of course consume more memory and CPU cycles due to the increased size of the first level cache.

Related

Why is there more DB connection pool than main thread in Webflux?

I use Webflux R2DBC.
I know, the default value creates 1 thread per CPU core.
However, I know that the default value of R2DBC's connection pool is 10.
I can't understood that....
DB connections operate asynchronously. Then, DB connection don't need many thread pool.
Because the DB connection only sends and receives, connection can do other work(other sending or receiving) while waiting for it.
Rather, since the main thread(1 per CPU core) performs various calculations, the number of main threads should be large.
What am I missing?

I think you kind of have it backwards.
since the main thread(1 per CPU core) performs various calculations, the number of main threads should be large.
A (logical) CPU can perform only one task at a time and since threads in a reactive application should never just wait, since all calls are non-blocking, it doesn't make sense to have more threads than CPU's.
DB connections operate asynchronously. Then, DB connection don't need many thread pool.
Correct, you don't need a thread pool for your database connectivity. But a connection pool isn't a thread pool. I holds connections. Database connections are still expensive to create, so you want to reuse them and database transactions are bound to the database connection, so you need multiple connections in order to process different requests in separate transactions.
Of course how big a connection pool should be is a completely different and rather complex question.

Performance difference between global database connection and opening connection everytime on Golang

In my current project I was opening a new database connection every time when user makes request. For example:
func login(w http.ResponseWriter, r *http.Request) {
...
db, err := sqlx.Connect("postgres", "user=postgres password=*** dbname=postgres")
if err != nil {
ErrorWithJSON(w, err.Error(), http.StatusBadRequest)
return
}
db.SetMaxIdleConns(0)
db.SetConnMaxLifetime(time.Second * 30)
user, err := loginManager(db, m)
...
err = db.Close()
}
When I searched for other people's code, I've seen that most of the developers create a global variable for database connection, set it on the main and use this variable on entire project.
I was wondering is there any difference between these approaches? If I use global variable will there be any latency when 5 different users makes requests for register/login etc. If there will be latency, should I create multiple database connections and store them in a slice for future requests so I can pick randomly when users make request. Like a simple load balancer, I don't know?
Sorry for multiple questions. Thank you!

Yes, there might be a huge performance difference (might be several order of magnitude depending on the nature of queries you run and on system and server configuration).
The sqlx.DB type wraps (embeds) an sql.DB type, which manages a pool of connections:
DB is a database handle representing a pool of zero or more underlying connections. It's safe for concurrent use by multiple goroutines.
The sql package creates and frees connections automatically; it also maintains a free pool of idle connections. If the database has a concept of per-connection state, such state can only be reliably observed within a transaction.
Every time you open a new connection, a lot of things have to happen in the "background": connection string has to be parsed, a TCP connection has to be estabilished, authentication / authorization must be performed, resources must be allocated at both sides (client and server) etc. These are just the main, obvious things. Even though some of these may be provided / implemented optimized, cached, there is still a significant overhead compared to having a single DB instance which might have multiple established, authenticated connections ready in a pool, waiting to be used / utilized.
Also quoting from sql.Open():
The returned DB is safe for concurrent use by multiple goroutines and maintains its own pool of idle connections. Thus, the Open function should be called just once. It is rarely necessary to close a DB.
sqlx.Connect() which you used calls sqlx.Open() which is "the same as sql.Open, but returns an *sqlx.DB instead".
So all in all, use a single, global sqlx.DB or sql.DB instance, and share / use that everywhere. It provides you automatic connection- and connection pool management. This will provide you the best performance. You may fine-tune the connection pool with the DB.SetConnMaxLifetime(), DB.SetMaxIdleConns() and DB.SetMaxOpenConns() methods.
Idle connections (DB.SetMaxIdleConns()) are those that are not in-use currently, but sitting in the pool, waiting for someone to pick them up. You should definitely have some of these, e.g. 5 or 10 of them, or even more. DB.SetConnMaxLifetime() controls how long a new connection may be used. Once it grows older than this, it will be closed (and a new one will be opened if needed). You shouldn't change this, default behavior is never to expire connections. Basically all defaults are sensible, you should only play with them if you experience performance problems. Also, read docs of these methods to have a clear picture.
See this similar, possible duplicate question:
mgo - query performance seems consistently slow (500-650ms)

What is application state?

This is a very general question. I am a bit confused with the term state. I would like to know what do people mean by "state of an application"? Why do they call webserver as "stateless" and database as "stateful"?
How is the state of an application (in a VM) transferred, when the VM memory is moved from one machine to another during live migration.
Is transferring the memory, caches and register values of a system enough to transfer the state of the running application?

You've definitely asked a mouthful -- it's unfortunate that the word state is used in so many different contexts, but each one is a valid use of the word.
State of an application
An application's state is roughly the entire contents of its memory. This can be a difficult concept to get behind until you've seen something like Erlang's server loops, which explicitly pass all the state of the application in a variable from one invocation of the function to the next. In more "normal" programming languages, the "state" of the program is all its global variables, static variables, objects allocated on the heap, objects allocated on the stack, registers, open file descriptors and file offsets, open network sockets and associated kernel buffers, and so forth.
You can actually save that state and resume execution of the process elsewhere. The BLCR checkpoint tools for Linux do exactly this. (Though it is an extremely uncommon task to perform.)
State of a protocol
The state of a protocol is a different sort of meaning -- the statelessness of HTTP requests means that every web browser communication with webservers essentially starts over, from scratch -- every cookie is re-transmitted in both directions to try to "fake" some amount of a "session" for the user's sake. The servers don't hold any resources open for any given client across requests -- each one starts from scratch.
Networked filesystems might also be stateless (earlier versions of NFS) or stateful (newer versions of NFS). The earlier versions assumed every individual packet of reading, writing, or metadata control would be committed as it arrived, and every time a specific byte was needed from a file, it would be re-requested. This allowed the servers to be very simple -- they would do what the client packets told them to do and no effort was required to bring servers and clients back to consistency if a server rebooted or routers disappeared. However, this was bad for performance -- every client requested static data hundreds or thousands of times each day. So newer versions of NFS allowed some amount of data caching on the clients, and persistent file handles between servers and clients, and the servers had to keep track of the state of the clients that were connected -- and vice versa: the clients also had to know what promises they had made to the servers.
A stateful firewall will keep track of active TCP sessions. It knows which sessions the system administrators want to allow through, so it looks for those initial packets specifically. Once the session is set up, it then tracks the established connections as entities in their own rights. (This was a real advancement upon previous stateless firewalls which considered packets in isolation -- the rulesets on previous firewalls were much more permissive to achieve the same levels of functionality, but allowed through far too many malicious packets that pretended a session was already active.)

An application state is simply the state at which an application resides with regards to where in a program is being executed and the memory that is stored for the application. The web is "stateless," meaning everytime you reload a page, no information remains from the previous version of the page. All information must be resent from the server in order to display the page.
Technically, browsers get around the statelessness of the web by utilizing techniques like caching and cookies.

Application state is a data repository available to all classes. Application state is stored in memory on the server and is faster than storing and retrieving information in a database. Unlike session state, which is specific to a single user session, application state applies to all users and sessions. Therefore, application state is a useful place to store small amounts of often-used data that does not change from one user to another.
Resource:http://msdn.microsoft.com/en-us/library/ms178594.aspx

Is transferring the memory, caches and register values of a system enough to transfer the state of the running application?
Does the application have a file open, positioned at byte 225? If so, that file is part of the application's state because the next byte written should go to position 226.
Has the application authenticated itself to a secure server with a time-based key? Then that connection is part of the application's state, because if the application were to be suspended for 24 hours after saving memory, cache, and register values, when it resumes it will no longer have a valid connection to the secure server because it will have timed out.
Things which make an application stateful are easy to overlook.

Downside to using persistent connections?

I have heard in the past that persistent connections are not good to use on a high traffic web server. Is this true, or does it only apply to apache's prefork mode? Would CGI mode have this problem?
This involves PHP, Apache, and Postgresql.

Are PHP persistent connections evil ? -- in context of PHP and MySQL.
The reason behind using persistent connections is of course reducing number of connects which are rather expensive, even though they are much faster with MySQL than with most other databases.
The first problem with persistent connections...
If you’re establishing thousands of connections per second you normally do not keep it open for long time, but Operation System does. According to TCP/IP protocol Ports can’t be recycled instantly and have to spend some time in “FIN” stage waiting before they can be recycled.
The second problem... using too many MySQL server connections.
Some people simply do not realize you can increase max_connections variable and get over 100 concurrent connections with MySQL others were beaten by older Linux problems of not being able to have more than 1024 connections with MySQL.
Lets talk now about why Persistent connections were disabled in mysqli extension. Even though you could misuse persistent connections and get poor performance that was not the reason. The real reason is – you could get much more problems with it.
Persistent connections were added to PHP during times of MySQL 3.22/3.23 when MySQL was simple enough so you could recycle connections easily without any problems. In later versions number of problems however arose – If you recycle connection which has uncommitted transactions you run into trouble. If you happen to recycle connections with custom character set settings you’re in trouble back again, not to mention about possibly changed per session variables.

One problem with using persistent connections is that it doesn't really scale that well. If you have 5000 people connected, you need 5000 persistent connections. If you take away the need for persistence, you might be able to serve 10000 people with the same number of connections because they're able to share those connections when they're not using them.

Persistent DB Connections - Yea or Nay?

I'm using PHP's PDO layer for data access in a project, and I've been reading up on it and seeing that it has good innate support for persistent DB connections. I'm wondering when/if I should use them. Would I see performance benefits in a CRUD-heavy app? Are there downsides to consider, perhaps related to security?
If it matters to you, I'm using MySQL 5.x.

You could use this as a rough "ruleset":
YES, use persistent connections, if:
There are only few applications/users accessing the database, i.e. you will not result in 200 open (but probably idle) connections, because there are 200 different users shared on the same host.
The database is running on another server that you are accessing over the network
An (one) application accesses the database very often
NO, don't use persistent connections, if:
Your application only needs to access the database 100 times an hour.
You have many webservers accessing one database server
You're using Apache in prefork mode. It uses one connection for each child process, which can ramp up fairly quickly. (via #Powerlord in the comments)
Using persistent connections is considerable faster, especially if you are accessing the database over a network. It doesn't make so much difference if the database is running on the same machine, but it is still a little bit faster. However - as the name says - the connection is persistent, i.e. it stays open, even if it is not used.
The problem with that is, that in "default configuration", MySQL only allows 1000 parallel "open channels". After that, new connections are refused (You can tweak this setting). So if you have - say - 20 Webservers with each 100 Clients on them, and every one of them has just one page access per hour, simple math will show you that you'll need 2000 parallel connections to the database. That won't work.
Ergo: Only use it for applications with lots of requests.

In brief, my experience says that persistent connections should be avoided as far as possible.
Note that mysql_close is a no-operation (no-op) for connections that are created using mysql_pconnect. This means persistent connection cannot be closed by client at will. Such connection will be closed by mysqldb server when no activity occurs on the connection for duration more than wait_timeout. If wait_timeout is large value (say 30 min) then mysql db server can easily reach max_connections limit. In such case, mysql db will not accept any future connection request. This is when your pager starts beeping.
In order to avoid reaching max_connections limit, use of Persistent connection need careful balancing of following variables...
Number of apache processes on one host
Total number of hosts running apache
wait_timout variable in mysql db server
max_connections variable in mysql db server
Number of requests served by one apache process before it is re-spawned
So, pl use persistent connection after enough deliberation. You may not want to invite complex runtime issues for a small gain that you get from persistent connection.

Creating connections to the database is a fairly expensive operation. Persistent connections are a good idea. In the ASP.Net and Java world, we have "connection pooling", which is roughly the same thing, and also a good idea.

IMO, The real answer to this question is whatever works best for you app. I would recommend you benchmark your app using both persistent and non-persistent connections.
Maggie Nelson # Objectively Oriented posted about this in August and Robert Swarthout made an accompanying post with some hard numbers. Both are pretty good reads.

In my humble opinion:
When using PHP for web development, most of your connection will only "live" for the life of the page executing. A persistant connection is going to cost you a lot of overhead as you'll have to put it in the session or some such thing.
99% of the time a single non-persistant connection that dies at the end of the page execution will work just fine.
The other 1% of the time, you probably should not be using PHP for the app, and there is no perfect solution for you.

In general, you'll need to use non-persistent connections sometimes, and it's nice to have a single pattern to apply to db connection design (as long as there's relatively little upside to using persistent connections in your context.)

I was going to ask this same question but rather than ask the same question again I'll just add some information that I've found.
Are PHP persistent connections evil ?
Persistent Database Connections
It is also worth noting that the newer mysqli extension does not even include the option to use persistent database connections.
I'm still using persitent connections at the moment but plan to switch to non-persistent in the near future.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas