In a WAS Liberty connection pool, can I validate connections on borrow?

We are currently migrating an application to run on a Liberty server (8.5.5.9). We have found that connections between the app server and the database are occasionally terminated by the firewall for being idle for an extended period of time. When this happens, the next HTTP request causes the application to receive one of these broken connections.
Previously, we had been using Apache Commons DBCP to manage the connection pool. One of the configuration parameters in a DBCP connection pool is testOnBorrow, which prevents the application from being handed one of these bad connections.
Is there such a configuration parameter in a Liberty-managed datasource?
So far, we have configured our datasource like this:
<dataSource jndiName="jdbc/ora" type="javax.sql.DataSource">
    <properties.oracle
        user="example" password="{xor}AbCdEfGh123="
        URL="jdbc:oracle:thin:@example.com:1521:mydb"
    />
    <connectionManager
        minPoolSize="3" maxPoolSize="10" maxIdleTime="10m"
        purgePolicy="ValidateAllConnections"
    />
    <jdbcDriver id="oracle-driver" libraryRef="oracle-libs"/>
</dataSource>
The purgePolicy is currently set to validate all connections if one bad one is found (e.g., overnight, when all connections have been idle for a long time). But all this does is prevent multiple bad connections from being handed to the application one after another.
One option in the connectionManager would be to set agedTimeout="20m" to automatically remove connections that are old enough to have already been terminated by the firewall. However, this would also remove connections that have been used recently (and which the firewall therefore would not have broken).
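The difference between the two settings can be made concrete: an aged timeout looks at how long ago a connection was created, while an idle timeout looks at how long ago it was last used. A minimal Python sketch with hypothetical helper names and timestamps in minutes (it models the semantics described here, not Liberty's code):

```python
from dataclasses import dataclass


@dataclass
class PooledConn:
    name: str
    created_at: float  # minutes
    last_used_at: float  # minutes


def expired_by_age(conn, now, aged_timeout):
    # agedTimeout-style: removed once old enough, even if used a moment ago
    return now - conn.created_at >= aged_timeout


def expired_by_idle(conn, now, max_idle_time):
    # maxIdleTime-style: removed only if nobody has used it for a while
    return now - conn.last_used_at >= max_idle_time


now = 30.0
busy = PooledConn("busy", created_at=0.0, last_used_at=29.0)
stale = PooledConn("stale", created_at=0.0, last_used_at=5.0)

for c in (busy, stale):
    print(c.name, expired_by_age(c, now, 20), expired_by_idle(c, now, 10))
# busy True False
# stale True True
```

The "busy" connection is exactly the case the question worries about: an aged timeout would discard it even though recent use keeps the firewall from cutting it.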
Am I missing something obvious here?
Thanks!

In this scenario I would recommend using maxIdleTime, which you are already using, but reducing your minPoolSize to 0 (or removing it, since the default value is 0).
Per the maxIdleTime doc:
maxIdleTime: Amount of time after which an unused or idle connection can be discarded during pool maintenance, if doing so does not reduce the pool below the minimum size.
Since you have minPoolSize=3, pool maintenance won't kick in if, for example, there are only 3 bad connections in the pool, because according to the doc the maintenance thread won't take the pool size below the minimum. Setting minPoolSize=0 should therefore allow maxIdleTime to clean up all of the bad connections, as you would expect in this scenario.
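A simplified model of a single maintenance run shows why the minimum matters. This is a hypothetical Python sketch, not Liberty's implementation; idle times are in minutes, and the 10-minute threshold matches the maxIdleTime in the question's configuration:

```python
def maintenance_pass(idle_times, min_pool_size, max_idle_time):
    """Return the idle times of connections that survive one maintenance run.

    Connections idle for at least max_idle_time are discarded, stalest first,
    but the pool is never shrunk below min_pool_size.
    """
    survivors = sorted(idle_times)  # freshest first, stalest last
    while (survivors
           and survivors[-1] >= max_idle_time
           and len(survivors) > min_pool_size):
        survivors.pop()  # discard the stalest eligible connection
    return survivors


overnight = [600, 610, 620]  # three connections idle for about 10 hours
print(maintenance_pass(overnight, min_pool_size=3, max_idle_time=10))  # [600, 610, 620]
print(maintenance_pass(overnight, min_pool_size=0, max_idle_time=10))  # []
```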
So here is the final configuration that I would suggest for you:
<dataSource jndiName="jdbc/ora" type="javax.sql.DataSource">
    <properties.oracle user="example" password="{xor}AbCdEfGh123="
        URL="jdbc:oracle:thin:@example.com:1521:mydb"/>
    <connectionManager maxPoolSize="10" maxIdleTime="18m"/>
    <jdbcDriver id="oracle-driver" libraryRef="oracle-libs"/>
</dataSource>
This value of maxIdleTime assumes that your firewall kills connections after 20 minutes; triggering cleanup after 18 minutes gives the cleanup thread a 2-minute window to remove the soon-to-be-bad connections.
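One detail worth checking: a connection that crosses maxIdleTime is only removed on the next run of the pool maintenance thread, so the worst-case removal time is maxIdleTime plus one maintenance interval (in Liberty, that interval is the connectionManager reapTime setting). A quick Python check under the answer's assumption of a 20-minute firewall timeout; the interval values are illustrative:

```python
def worst_case_removal(max_idle_time, reap_interval):
    # A connection becomes eligible for removal at max_idle_time, but it may
    # have just missed a maintenance run, so it can linger one more interval.
    return max_idle_time + reap_interval


FIREWALL_TIMEOUT = 20  # minutes, as assumed in the answer
MAX_IDLE_TIME = 18  # minutes, from the suggested configuration

for reap in (1, 2, 3):
    removed_by = worst_case_removal(MAX_IDLE_TIME, reap)
    print(f"maintenance every {reap} min: removed by minute {removed_by}, "
          f"before firewall cut: {removed_by <= FIREWALL_TIMEOUT}")
```

With a 3-minute interval, a connection could still be in the pool at minute 21, so the interval matters as much as maxIdleTime itself.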

It's an old question, but this may be useful to someone else:
You can use the validationTimeout property of dataSource. According to the documentation, "when specified, pooled connections are validated before being reused from the connection pool."
This will not close connections as soon as the firewall cuts them, but it will prevent the application from failing because of a stale connection.
You can then combine this with purgePolicy="ValidateAllConnections" to revalidate all connections as soon as one is detected as stale.
Reference : https://openliberty.io/docs/21.0.0.1/reference/config/dataSource.html#dataSource
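Roughly, the combination behaves like this simplified Python model: validate a connection before reuse, and when one turns out to be stale, re-check the remaining idle connections too. The `Conn` and `Pool` classes are hypothetical and only model the behaviour described in this answer, not Liberty's implementation:

```python
class Conn:
    def __init__(self, name, alive):
        self.name = name
        self.alive = alive  # False: silently cut by the firewall


class Pool:
    def __init__(self, conns):
        self.free = list(conns)

    def _validate_all(self):
        # purgePolicy="ValidateAllConnections"-style: drop every stale connection
        self.free = [c for c in self.free if c.alive]

    def get(self):
        while self.free:
            conn = self.free.pop(0)
            if conn.alive:  # validate before reuse
                return conn
            self._validate_all()  # one stale connection found: check the others too
        return Conn("fresh", alive=True)  # nothing usable left: open a new one


pool = Pool([Conn("a", False), Conn("b", False), Conn("c", True)])
conn = pool.get()
print(conn.name, [c.name for c in pool.free])  # c []
```

The application never sees "a" or "b": validation catches the first stale connection, and the purge removes the second without it ever being handed out.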

Related

Pooling, Client Checked out, idleTimeoutMillis

This is my understanding after reading the Documents:
Pooling: like many other DBs, we have only a limited number of allowed connections, so you guys all line up and wait for a free connection to be returned to the pool (a connection is like a token, in a sense).
At any given time, the number of active and/or available connections is kept in the range 0 to max.
idleTimeoutMillis is described as the "milliseconds a client must sit idle in the pool and not be checked out before it is disconnected from the backend and discarded." I'm not clear on this. I assumed that when a client, say a web app, has done its CRUD but has not voluntarily returned the connection, it is considered idle: node-postgres starts the clock, and once it reaches that number of milliseconds, it takes the connection back to the pool for the next client. So what does "not be checked out before it is disconnected from the backend and discarded" mean?
Say idleTimeoutMillis is 100: does it mean this connection will literally be disconnected (logged out) after being idle for 100 milliseconds? If so, it's not returning to the pool, which will result in frequent logins, as the doc says below:
Connecting a new client to the PostgreSQL server requires a handshake which can take 20-30 milliseconds. During this time passwords are negotiated, SSL may be established, and configuration information is shared with the client & server. Incurring this cost every time we want to execute a query would substantially slow down our application.
Thanks in advance for the stupid questions.
Sorry this question went unanswered for so long, but I recently came across a bug that challenged my understanding of this library too.
Essentially, when you're pooling, you're telling the library it can have a maximum of X connections to the database open simultaneously. In a CRUD API, for example, every request that comes in takes a connection, so at most X requests can be holding a connection at the same time. As soon as a request comes in, it 'checks out' a connection from the pool, which means no other request can use that connection while this request holds it.
So to 'reuse' a connection, a request that is done with it has to release it, marking it as ready to use again (checking it back in). When another request comes in, it can then check out that connection and run its query.
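The check-out/release cycle can be modelled with a toy Python pool. The class and method names are hypothetical, not node-postgres APIs (in node-postgres the corresponding calls are `pool.connect()` and `client.release()`):

```python
class SimplePool:
    """Hypothetical pool: at most max_size connections, reused after release."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.idle = []  # released connections, ready for reuse
        self.in_use = set()
        self.connects = 0  # how many real connections were ever opened

    def checkout(self):
        if self.idle:
            conn = self.idle.pop()
        elif len(self.in_use) < self.max_size:
            self.connects += 1
            conn = f"conn{self.connects}"  # stands in for a real connect + auth
        else:
            return None  # a real pool would make the caller wait here
        self.in_use.add(conn)
        return conn

    def release(self, conn):
        self.in_use.remove(conn)
        self.idle.append(conn)


pool = SimplePool(max_size=2)
a = pool.checkout()
b = pool.checkout()
print(pool.checkout())  # None: both connections are checked out
pool.release(a)
c = pool.checkout()
print(c, pool.connects)  # conn1 2: reused, no new connect
```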
idleTimeoutMillis was very confusing to me and took a while to get my head around. When an open connection to the DB has been released (checked back in, not checked out), it is in an IDLE state, which means anyone wanting to make a request can use it, since nothing else is using it. This variable says how long a connection may stay in that IDLE state before it is closed. This is useful for several reasons. Open DB connections require memory and so forth, so closing unused ones can be beneficial. Also, when autoscaling, say you have been at maximum requests per second and are using all DB connections, it is useful to keep IDLE connections open for a bit. However, if the timeout is too long and you scale down, you can end up holding memory for longer than necessary, since each IDLE connection requires some memory.
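In other words, the timeout only applies to connections sitting unused in the pool; a client that is checked out is never affected by it. A toy Python model with an injectable clock (all names are hypothetical; node-postgres's real option is `idleTimeoutMillis`):

```python
import time


class IdleReapingPool:
    """Hypothetical pool that closes connections idle longer than idle_timeout_ms."""

    def __init__(self, idle_timeout_ms, clock=time.monotonic):
        self.idle_timeout_ms = idle_timeout_ms
        self.clock = clock  # returns seconds
        self.idle = {}  # connection name -> time it was released

    def release(self, conn):
        self.idle[conn] = self.clock()  # the idle clock starts at release

    def checkout(self, conn):
        self.idle.pop(conn)  # checked-out connections are never reaped

    def reap(self):
        now = self.clock()
        expired = [c for c, t in self.idle.items()
                   if (now - t) * 1000 >= self.idle_timeout_ms]
        for c in expired:
            del self.idle[c]  # a real pool would disconnect from the backend here
        return expired


fake_now = [0.0]
pool = IdleReapingPool(idle_timeout_ms=100, clock=lambda: fake_now[0])
pool.release("c1")
fake_now[0] = 0.05
pool.release("c2")
fake_now[0] = 0.12
expired = pool.reap()
print(expired, sorted(pool.idle))  # ['c1'] ['c2']
```

Only "c1" has sat idle for 100 ms or more; "c2" stays pooled and can be handed out again without a new handshake.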
The benefit of keeping a connection open is that you can just send a query with it: you don't need to re-authenticate with the DB, because it's already authenticated and ready to go.

JDBC Connection Pooling in a Tomcat Cluster Environment

I'm relatively new to this, but I have a Tomcat cluster set up (using mod_proxy from httpd) with session replication (a separate Redis server) for fault tolerance.
I have a couple of questions about this setup:
My application (Spring/Hibernate) has a different database per user. The problem here is that the data source (using Spring along with Hibernate for persistence) is created at the Tomcat level, so whatever connection pooling I do will be at the server level.
As per the cluster configuration, each Tomcat instance will create its own connection pool.
I'd like to know if connection pooling is possible at a cluster level using Tomcat i.e. is there a way to make sure that all the servers in the cluster are using the shared Connection Pool?
I do not want to configure a DataSource on every Tomcat instance because of performance issues. Before the cluster setup, the application was deployed on a single server and the DataSource was configured such that it allowed only a few (50) connections in a connection pool per DataSource.
Now, in a clustered environment, I cannot afford to create that number of connections on every Tomcat instance or to split them across instances, and dynamic registration of nodes will create further problems. I'd also like to know whether there is an alternative solution to this problem if connection pooling is not possible or is inefficient.
I'm going to handle your questions in reverse order, since the second one is simpler.
Database connection pooling in Tomcat cannot be configured cluster-wide: you have to configure a separate pool for each node in the cluster. But this doesn't have to be bad news... there's nothing wrong with configuring a node to have 5 or 10 or 100 connections in the connection pool on each node.
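If the database itself can only accept a fixed number of connections (such as the 50 mentioned in the question), one way to size the per-node pools is to divide that global budget by the number of nodes. A small Python sketch of that arithmetic, assuming every node gets an equal share:

```python
def per_node_pool_size(db_connection_budget, nodes):
    # Each Tomcat node has its own pool, so choose the per-node maximum
    # such that nodes * maximum never exceeds what the database accepts.
    if nodes <= 0:
        raise ValueError("need at least one node")
    return db_connection_budget // nodes


for nodes in (1, 2, 3, 5):
    size = per_node_pool_size(50, nodes)
    print(nodes, size, nodes * size)  # node count, per-node max, total
```

The integer division rounds down, so the cluster total never exceeds the budget. When nodes are added dynamically, the per-node maximum has to be recomputed or the budget can be overrun.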
It's true that you might end up with too many users connecting to the database at the same time and overwhelming it, but that could happen with a single node as well. There isn't anything conceptually different about multiple nodes that wouldn't also be true for a single node.
The key is to make sure that your cluster balances users appropriately. Suppose you have a limit of, e.g., 5 database connections per node: you don't want 100 users to end up on one node while the other nodes only have 5 users each. In that case, the popular node's 100 users have to share those 5 connections, while on the other nodes each user gets a connection all to themselves.
Back to your first item, which is more complicated. If you have a separate database per user, then connection pooling is impossible to accomplish, because you will absolutely have to establish a new connection for every user every time. Those connections aren't poolable, at least not without being quite careful about it. It sounds like you have an architectural issue that you might have to solve before you can identify a technical solution.

Multiple open sql connections

On a single page load I see several connections open. Although there are at least 20 calls to the database, I see around 8 connections open before they slowly drop off. Each call is wrapped in a using statement and uses OpenStatelessSession with a singleton NHibernate session factory. Shouldn't I see only a single connection open, or is this normal behavior? I'm concerned because this is a high-traffic site.
Each session always runs on a separate connection. If your site is exhausting the connection pool under load, you can switch to a session-per-request architecture, but that will of course consume more memory and CPU cycles due to the increased size of the first level cache.

If you have two distinct Data Source Connections in ColdFusion with the same settings do they share the same pool?

I have created 2 distinct data source connections (to MS SQL Server 2008) in the ColdFusion Administrator that have exactly the same settings except for the actual name of the connection. My question is will this create two distinct connection pools or will they share one?
They will have different pools. The pools are defined at the data source level and you have two distinct data sources as far as ColdFusion is concerned. Why would you have two different data sources with the exact same settings? I guess if you wanted to force them to use different connection pools. I can't think of a reason why though.
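The point is that a pool is looked up by data source, not by the data source's settings. A Python sketch of that keying, using hypothetical classes (it illustrates the answer's claim, not ColdFusion internals):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceSettings:
    server: str
    database: str
    max_connections: int


class PoolRegistry:
    """Hypothetical registry: one pool per data source *name*, not per settings."""

    def __init__(self):
        self.pools = {}

    def pool_for(self, dsn_name, settings):
        # Keyed by name, so identical settings under two names give two pools.
        return self.pools.setdefault(dsn_name, {"settings": settings, "connections": []})


settings = DataSourceSettings("sqlhost", "appdb", 10)
registry = PoolRegistry()
p1 = registry.pool_for("dsnA", settings)
p2 = registry.pool_for("dsnB", settings)
print(p1 is p2, len(registry.pools))  # False 2
```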
I found this page that documents how database connections are handled in ColdFusion. Note that the "Maintain Database Connections" setting is set for each data source.
Here is the section related to connection pooling from that page (in case it goes away):
If the "Maintain Database Connections" is set for a data source, how does ColdFusion Server maintain the connection pool?
When "Maintain Database Connections" is set for a data source, ColdFusion keeps the connection open after its first connection to the database. It does not log out of the database after this first connection. You can change this setting according to the instructions in step d above. Another setting in the ColdFusion Administrator, called "Limit cached database connection inactive time to X minutes," closes a "maintained" database connection after X inactive minutes. This setting is server wide and determines when a "maintained" connection is finally closed. You can modify this setting by going to the "Caching" tab within the ColdFusion Administrator. The interface for modifying the "Limit cached database connection inactive time to X minutes" looks like the following:
If a request is using a data source connection that is already open, and another request to the data source comes in, a new connection is established. Since only one request can use a connection at any time, the simultaneous request will open up a new connection because no idle cached connections are available. The connection pool can increase up to the simultaneous connections limit, which is set for each data source. This setting, called "Limit Connections," is in the ColdFusion Administrator. Click on one of the data source tabs and then click on one of your data sources. Click on "CF Settings", put a check next to "Limit Connections", and enter a number in the sentence "Enable the limit of X simultaneous connections." Please note that if you do not set this under the data source setting, ColdFusion Server will use the server-wide "Simultaneous Requests" setting.
At this point, there is a pool of two database connections that ColdFusion Server maintains. Each connection remains in the pool until either the "Connection Timeout" period is reached or its inactivity time is exceeded. If neither of these two options is configured, the connections remain in the pool until ColdFusion is restarted.
The "Connection Timeout" setting closes the connection and eliminates it from the pool whether it has been active or inactive. If the process is active, it will not terminate the connection. You can change this setting by going to "CF Settings" for your data source in the ColdFusion Administrator. Note: Only the "Cached database connection inactive time" setting will end the connection and eliminate it from the pool if it hasn't been used. You can also use the "Connection Timeout" to override the "Cached database connection inactive" setting, as it applies only to a single data source, not all data sources.
They have different pools. Pooling is implemented in ColdFusion's Java code (or was that part of the JRun code?). It doesn't use any JDBC-based pooling. CF10 could have switched to JDBC-based pooling, but I doubt it.
As a test:
Set the 'verify connection' SQL on both data sources to WAITFOR DELAY '00:01:00' or similar (wait for 1 minute). Because pool access is single-threaded for each pool, including the time taken to run the verify, create 2 pages that each access a different data source, and request both. If both complete after 1 minute, there are 2 pools; if one page takes 1 minute and the other takes 2 minutes, it's one pool.
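The expected timings follow directly from the single-threaded assumption. A Python sketch that computes them (a model of the test, not of ColdFusion itself; the function name is hypothetical):

```python
def completion_times(verify_seconds, pools_for_pages):
    """Finish time of each page when each pool runs its verify one at a time.

    pools_for_pages: the pool id used by each page; all pages start at t=0.
    Assumes, as in the answer, that access to a single pool is serialized,
    including the verify query.
    """
    busy_until = {}  # pool id -> time its current verify finishes
    finish = []
    for pool in pools_for_pages:
        start = busy_until.get(pool, 0)
        busy_until[pool] = start + verify_seconds
        finish.append(busy_until[pool])
    return finish


print(completion_times(60, ["dsnA", "dsnB"]))  # [60, 60]: two separate pools
print(completion_times(60, ["shared", "shared"]))  # [60, 120]: one pool
```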
As a side note, if you yank out the network cable during this 1-minute verify (leaving the JDBC socket open forever, waiting for a response), your thread pool is now dead and you need to restart CF.
Try to create a temporary table with the same name through two different data sources: if you get an error on the second query, they use the same pool; if it runs fine, they use different pools.

Downside to using persistent connections?

I have heard in the past that persistent connections are not good to use on a high-traffic web server. Is this true, or does it only apply to Apache's prefork mode? Would CGI mode have this problem?
This involves PHP, Apache, and Postgresql.
Are PHP persistent connections evil? -- in the context of PHP and MySQL.
The reason behind using persistent connections is, of course, reducing the number of connects, which are rather expensive, even though they are much faster with MySQL than with most other databases.
The first problem with persistent connections...
If you’re establishing thousands of connections per second, you normally do not keep each one open for long, but the operating system does. According to the TCP/IP protocol, ports can’t be recycled instantly and have to spend some time waiting in the “FIN” stage (TIME_WAIT) before they can be recycled.
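To see why this bites at high connection rates, multiply the connection rate by the TIME_WAIT duration: that is roughly how many local ports are tied up at once for connections from one client to one database host and port. A Python sketch, assuming typical Linux defaults (a 60-second TIME_WAIT and an ephemeral port range of 32768 to 60999); both values vary by system:

```python
def ports_in_time_wait(connects_per_second, time_wait_seconds):
    # Each connection closed by the client holds its local port
    # for the whole TIME_WAIT period.
    return connects_per_second * time_wait_seconds


EPHEMERAL_PORTS = 60999 - 32768 + 1  # assumed Linux default ip_local_port_range
TIME_WAIT_SECONDS = 60  # assumed Linux TIME_WAIT length

for rate in (100, 470, 1000):
    held = ports_in_time_wait(rate, TIME_WAIT_SECONDS)
    print(rate, held, held > EPHEMERAL_PORTS)  # rate, ports held, exhausted?
```

Under these assumptions, about 470 new connections per second already consume nearly the whole range, which is the cost persistent connections avoid.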
The second problem... using too many MySQL server connections.
Some people simply do not realize you can increase the max_connections variable and get over 100 concurrent connections with MySQL; others were beaten by older Linux problems of not being able to have more than 1024 connections with MySQL.
Let's now talk about why persistent connections were disabled in the mysqli extension. Even though you could misuse persistent connections and get poor performance, that was not the reason. The real reason is that you could get many more problems with them.
Persistent connections were added to PHP during the times of MySQL 3.22/3.23, when MySQL was simple enough that you could recycle connections easily without any problems. In later versions, however, a number of problems arose: if you recycle a connection that has uncommitted transactions, you run into trouble. If you happen to recycle connections with custom character set settings, you’re in trouble again, not to mention possibly changed per-session variables.
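The recycling problem can be illustrated with a toy connection object that carries per-session state. Everything here is hypothetical (no real MySQL driver is involved); it only shows state from one script leaking into the next when a connection is reused without being reset:

```python
class FakeSession:
    """Hypothetical connection carrying per-session state, like a MySQL session."""

    def __init__(self):
        self.charset = "latin1"
        self.in_transaction = False


def handle_request(conn, charset=None, begin=False):
    if charset:
        conn.charset = charset  # e.g. SET NAMES
    if begin:
        conn.in_transaction = True  # e.g. START TRANSACTION, never committed
    return conn


def reset(conn):
    # What a careful pool would have to do before reuse: roll back, restore defaults.
    conn.charset = "latin1"
    conn.in_transaction = False
    return conn


persistent = FakeSession()
handle_request(persistent, charset="utf8", begin=True)  # script 1 exits early
leaked = handle_request(persistent)  # script 2 gets the same connection
print(leaked.charset, leaked.in_transaction)  # utf8 True
print(reset(persistent).charset)  # latin1
```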
One problem with using persistent connections is that they don't really scale that well. If you have 5000 people connected, you need 5000 persistent connections. If you take away the need for persistence, you might be able to serve 10000 people with the same number of connections, because they're able to share those connections when they're not using them.
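The arithmetic behind this claim, as a Python sketch. The 50% active fraction is a hypothetical assumption chosen to reproduce the 5000 vs 10000 figures; real traffic will differ:

```python
import math


def connections_needed(users, active_fraction, persistent):
    # Persistent: every connected user pins a connection.
    # Shared: only users currently running a query need one.
    if persistent:
        return users
    return math.ceil(users * active_fraction)


print(connections_needed(5000, 0.5, persistent=True))   # 5000
print(connections_needed(10000, 0.5, persistent=False))  # 5000
```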