How to establish a persistent connection with the log4php LoggerAppenderPDO?

Hi guys, I'm new here and this is my first post. I have a project using log4php and I can't get a persistent connection to work.
Is it possible to establish a persistent connection (pconnect) with LoggerAppenderPDO through the log4php configuration? Is it possible at all?
I've tried adding 'options' to the following configuration, but it doesn't work:
array(
    'appenders' => array(
        'default' => array(
            'class' => 'LoggerAppenderPDO',
            'params' => array(
                'dsn' => "mysql:host=localhost;dbname=logdb",
                'user' => 'logger',
                'password' => 'admin',
                'options' => array(PDO::ATTR_PERSISTENT => true),
                'table' => 'log4php'
            )
        )
    ),
    'rootLogger' => array(
        'appenders' => array('default'),
    ),
);
Can somebody help me out? I would really appreciate it.

The question is: "Why do you want a persistent connection?"
Log4PHP only attempts to create ONE connection for logging during a request, so a persistent connection wouldn't make any difference within that request.
The second thing is: persistent connections can only survive between requests if the PHP process itself survives between requests, e.g. when PHP runs as an Apache module or as a FastCGI/FPM worker. In plain CGI mode, where every request starts a fresh process, they cannot persist.
Third: persistent connections do more harm than good if not used correctly. One aspect is that every Apache child that eventually runs needs its own permanent connection on the database side. So if Apache is allowed 100 children, your database must support at least 100 concurrent connections just for logging, and you probably need more, because you don't only do logging but also some useful work with the database. Another point is that if your script crashes, the connection stays open, which might mean a lock is not released or a transaction is not rolled back. And last but not least: a re-used persistent connection comes back in whatever state it was left in, which might not be the state you expect.
Fourth: Persistent connections to MySQL are not that much faster than regular connections, if at all. MySQL is optimized to support quick connection times (which might not be true for other databases).
So in the end, there isn't much incentive left to use persistent connections. If you insist on using them for vague "performance reasons", you should prove they are worth the hassle by measuring their impact on your application's performance.
Update: I'll add that this performance comparison states that using persistent connections does NOT improve performance in any way. Your mileage may still vary.
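If you do want to measure it yourself, here is a minimal sketch of such a comparison (the DSN and credentials are placeholders, and it talks to PDO directly, since the stock LoggerAppenderPDO configuration does not appear to forward a PDO driver-options array such as PDO::ATTR_PERSISTENT):
<?php
// Minimal timing sketch: persistent vs. non-persistent PDO connects.
// DSN and credentials are placeholders - adjust to your environment.
$dsn  = 'mysql:host=localhost;dbname=logdb';
$user = 'logger';
$pass = 'admin';

function timeConnects($dsn, $user, $pass, array $options, $n)
{
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $pdo = new PDO($dsn, $user, $pass, $options);
        $pdo = null; // drop the handle; a persistent link stays open in the pool
    }
    return microtime(true) - $start;
}

$n = 1000;
printf("non-persistent: %.3fs\n", timeConnects($dsn, $user, $pass, array(), $n));
printf("persistent:     %.3fs\n", timeConnects($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true), $n));
Note that a single CLI run only measures reconnection cost within one process; under a web SAPI such as mod_php or FPM, the persistent link is reused across requests, which is where any benefit would show up.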

Related

Performance difference between a global database connection and opening a connection every time in Go

In my current project I was opening a new database connection every time a user makes a request. For example:
func login(w http.ResponseWriter, r *http.Request) {
    ...
    db, err := sqlx.Connect("postgres", "user=postgres password=*** dbname=postgres")
    if err != nil {
        ErrorWithJSON(w, err.Error(), http.StatusBadRequest)
        return
    }
    db.SetMaxIdleConns(0)
    db.SetConnMaxLifetime(time.Second * 30)
    user, err := loginManager(db, m)
    ...
    err = db.Close()
}
When I looked at other people's code, I saw that most developers create a global variable for the database connection, set it in main, and use that variable across the entire project.
I was wondering, is there any difference between these approaches? If I use a global variable, will there be any latency when 5 different users make requests to register/login etc.? If there is latency, should I create multiple database connections and store them in a slice so I can pick one at random when a user makes a request, like a simple load balancer? I don't know.
Sorry for multiple questions. Thank you!
Yes, there might be a huge performance difference (it might be several orders of magnitude, depending on the nature of the queries you run and on your system and server configuration).
The sqlx.DB type wraps (embeds) an sql.DB type, which manages a pool of connections:
DB is a database handle representing a pool of zero or more underlying connections. It's safe for concurrent use by multiple goroutines.
The sql package creates and frees connections automatically; it also maintains a free pool of idle connections. If the database has a concept of per-connection state, such state can only be reliably observed within a transaction.
Every time you open a new connection, a lot of things have to happen in the background: the connection string has to be parsed, a TCP connection has to be established, authentication / authorization must be performed, resources must be allocated on both sides (client and server), etc. These are just the main, obvious steps. Even though some of them may be optimized or cached, there is still significant overhead compared to having a single DB instance that holds multiple established, authenticated connections ready in a pool, waiting to be used.
Also quoting from sql.Open():
The returned DB is safe for concurrent use by multiple goroutines and maintains its own pool of idle connections. Thus, the Open function should be called just once. It is rarely necessary to close a DB.
sqlx.Connect() which you used calls sqlx.Open() which is "the same as sql.Open, but returns an *sqlx.DB instead".
So all in all, use a single, global sqlx.DB or sql.DB instance, and share / use it everywhere. It gives you automatic connection and connection-pool management, and it will give you the best performance. You may fine-tune the pool with the DB.SetConnMaxLifetime(), DB.SetMaxIdleConns() and DB.SetMaxOpenConns() methods.
Idle connections (DB.SetMaxIdleConns()) are those that are not currently in use but are sitting in the pool, waiting for someone to pick them up. You should definitely keep some of these, e.g. 5 or 10, or even more. DB.SetConnMaxLifetime() controls how long a connection may be reused; once it grows older than this, it will be closed (and a new one opened if needed). You usually shouldn't need to change this: the default is to never expire connections. Basically all the defaults are sensible; only play with them if you experience performance problems. Also, read the docs of these methods to get a clear picture.
See this similar, possible duplicate question:
mgo - query performance seems consistently slow (500-650ms)

JDBC Connection Pooling in a Tomcat Cluster Environment

I'm relatively new to this, but I have a Tomcat cluster set up (using mod_proxy from httpd) with session replication (separate Redis server) for fault tolerance.
I have a couple of questions about this setup:
My application (spring/hibernate) has a different database per user. So the problem here is that the data source (using spring along with hibernate for persistence) is created at Tomcat level. Thus, whatever connection pooling I do will be at server level.
As per the cluster configuration the Tomcat instances will create their own Connection Pool.
I'd like to know if connection pooling is possible at a cluster level using Tomcat i.e. is there a way to make sure that all the servers in the cluster are using the shared Connection Pool?
I do not want to configure a DataSource on every Tomcat instance because of performance issues. Before the cluster setup, the application was deployed on a single server and the DataSource was configured such that it allowed only a few (50) connections in a connection pool per DataSource.
Now in a clustered environment, I cannot afford to create or split that number of connections on every Tomcat instance, and dynamic registration of nodes will create further problems. I'd also like to know: is there an alternative solution to this problem if connection pooling is not possible or is inefficient?
I'm going to handle your questions in reverse order, since the second one is more simple.
Database connection pooling in Tomcat cannot be configured cluster-wide: you have to configure a separate pool for each node in the cluster. But this doesn't have to be bad news... there's nothing wrong with giving each node its own pool of 5 or 10 or 100 connections.
It's true, you might end up with a situation where too many users connect to the database at once and overwhelm it, but that could happen with a single node as well. There isn't anything conceptually different about multiple nodes that wouldn't also be true for a single node.
The key is to make sure your cluster balances users appropriately, so that you don't end up with a limit of e.g. 5 database connections per node, but 100 users on one node while the other nodes have only 5 users each. In that case, the popular node (100 users) has to share those 5 connections, while on the other nodes each user gets a connection all to themselves.
Back to your first item, which is more complicated. If you have a separate database per user, then connection pooling is essentially impossible to accomplish, because you have to establish a new connection for every user every time. Those connections aren't poolable, at least not without being quite careful about it. It sounds like you have an architectural issue that you might have to solve before you can identify a technical solution to it.

Multiple Redis Instances

Most folks seem to recommend running separate Redis instances on different ports (6379 and 6380). Why is this more commonly recommended than creating a second database? I'm not completely through the documentation yet, but most examples don't really mention selecting a Redis database when connecting. An example from the PHP client nrk/predis's README:
$redis = new Predis\Client(array(
    'scheme' => 'tcp',
    'host'   => '10.0.0.1',
    'port'   => 6379,
));
We currently run Hubot in our office with Campfire, and I'm working on a second one for GTalk, since each Hubot instance can only use a single adapter. So I'm considering creating a second Redis database or instance so that the data of the two Hubots is isolated. But before I go much further, I wanted to understand why you would use separate instances instead of just creating a second database.
Two main reasons:
Using multiple databases within one instance is generally considered bad practice and may be deprecated some day, and they carry some performance penalties, though pretty minor ones.
The main reason is that Redis is single-threaded: if you need two different data sources, a second Redis instance will improve performance, since it can utilize another CPU core you probably have, whereas a single instance will always use just one.
Also, different Redis instances can have distinct persistence settings. For example, one instance can keep data only in memory while another persists to disk.
Redis Persistence
Then there are other advantages, such as separate auth passwords, LRU/eviction strategies, etc., which can only be set at the instance level.
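To make the two options concrete on the client side, here is a rough sketch with Predis; the second port (6380) and the 'database' index are illustrative, and they assume you have actually started a second instance on that port and that your Predis version accepts a 'database' connection parameter (which issues a SELECT on connect):
<?php
// Option 1: two logical databases inside one Redis instance
// (shares one CPU core, one persistence config, one auth password).
$campfireBot = new Predis\Client(array('host' => '10.0.0.1', 'port' => 6379, 'database' => 0));
$gtalkBot    = new Predis\Client(array('host' => '10.0.0.1', 'port' => 6379, 'database' => 1));

// Option 2: two separate Redis instances
// (isolated CPU usage, persistence, eviction policy and credentials).
$campfireBot = new Predis\Client(array('host' => '10.0.0.1', 'port' => 6379));
$gtalkBot    = new Predis\Client(array('host' => '10.0.0.1', 'port' => 6380));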

Downside to using persistent connections?

I have heard in the past that persistent connections are not good to use on a high-traffic web server. Is this true, or does it only apply to Apache's prefork mode? Would CGI mode have this problem?
This involves PHP, Apache, and Postgresql.
Are PHP persistent connections evil? -- in the context of PHP and MySQL.
The reason for using persistent connections is, of course, to reduce the number of connects, which are rather expensive, even though they are much faster with MySQL than with most other databases.
The first problem with persistent connections...
If you're establishing thousands of connections per second, you normally do not keep them open for a long time, but the operating system does: according to the TCP/IP protocol, ports can't be recycled instantly and have to spend some time in the FIN_WAIT/TIME_WAIT states before they can be recycled.
The second problem... using too many MySQL server connections.
Some people simply do not realize that you can increase the max_connections variable and get well over 100 concurrent connections with MySQL; others were bitten by older Linux problems that made it impossible to have more than 1024 connections with MySQL.
Let's talk now about why persistent connections were disabled in the mysqli extension. Even though you could misuse persistent connections and get poor performance, that was not the reason. The real reason is that you could run into many more problems with them.
Persistent connections were added to PHP in the days of MySQL 3.22/3.23, when MySQL was simple enough that you could recycle connections easily without any problems. In later versions a number of problems arose: if you recycle a connection that has uncommitted transactions, you run into trouble. If you happen to recycle a connection with custom character set settings, you're in trouble again, not to mention possibly changed per-session variables.
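A small sketch of that "recycled state" problem; the DSN, credentials and the @leak variable are placeholders, and whether the underlying link is actually reused depends on the SAPI and on the connection parameters matching exactly:
<?php
// First request (or first use of the persistent link): set some per-session state.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
               array(PDO::ATTR_PERSISTENT => true));
$pdo->exec("SET @leak = 'value from a previous request'");
$pdo = null; // the handle is released, but the persistent link stays open

// A later request: the same persistent link may be handed back, state and all.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
               array(PDO::ATTR_PERSISTENT => true));
var_dump($pdo->query("SELECT @leak")->fetchColumn());
// prints "value from a previous request" if the link was reused;
// uncommitted transactions and SET NAMES changes survive the same way.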
One problem with using persistent connections is that they don't really scale that well. If you have 5000 people connected, you need 5000 persistent connections. If you take away the need for persistence, you might be able to serve 10000 people with the same number of connections, because they can share those connections when they're not using them.

Persistent DB Connections - Yea or Nay?

I'm using PHP's PDO layer for data access in a project, and I've been reading up on it and seeing that it has good built-in support for persistent DB connections. I'm wondering when/if I should use them. Would I see performance benefits in a CRUD-heavy app? Are there downsides to consider, perhaps related to security?
If it matters to you, I'm using MySQL 5.x.
You could use this as a rough "ruleset":
YES, use persistent connections, if:
There are only a few applications/users accessing the database, i.e. you will not end up with 200 open (but probably idle) connections because 200 different users share the same host.
The database is running on another server that you are accessing over the network
One application accesses the database very often
NO, don't use persistent connections, if:
Your application only needs to access the database 100 times an hour.
You have many webservers accessing one database server
You're using Apache in prefork mode. It uses one connection for each child process, which can ramp up fairly quickly. (via @Powerlord in the comments)
Using persistent connections is considerably faster, especially if you are accessing the database over a network. It doesn't make as much difference if the database is running on the same machine, but it is still a little bit faster. However, as the name says, the connection is persistent, i.e. it stays open even when it is not being used.
The problem with that is that, in its default configuration, MySQL only allows a limited number of parallel connections (the max_connections setting, roughly 100-151 by default depending on the version); after that, new connections are refused (you can tweak this setting). So if you have, say, 20 web servers with 100 clients each, and every one of them has just one page access per hour, simple math shows that you'll need 2000 parallel connections to the database. That won't work.
Ergo: Only use it for applications with lots of requests.
In brief, my experience says that persistent connections should be avoided as far as possible.
Note that mysql_close is a no-operation (no-op) for connections that were created using mysql_pconnect. This means a persistent connection cannot be closed by the client at will. Such a connection is only closed by the MySQL server when no activity occurs on it for longer than wait_timeout. If wait_timeout is a large value (say 30 min), the MySQL server can easily reach the max_connections limit. In that case, MySQL will not accept any further connection requests. This is when your pager starts beeping.
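A tiny sketch of that no-op behaviour with the legacy mysql extension (removed in PHP 7, shown only because it is what the paragraph above describes; host and credentials are placeholders):
<?php
// Legacy mysql_* API (PHP 5.x only).
$link = mysql_pconnect('localhost', 'user', 'pass'); // reuses or creates a persistent link
mysql_select_db('app', $link);

mysql_close($link); // returns true, but the persistent link is NOT closed;
                    // only the server reaps it, after wait_timeout seconds of inactivity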
In order to avoid reaching the max_connections limit, using persistent connections needs careful balancing of the following variables...
Number of Apache processes on one host
Total number of hosts running Apache
The wait_timeout variable on the MySQL server
The max_connections variable on the MySQL server
Number of requests served by one Apache process before it is re-spawned
So please use persistent connections only after enough deliberation. You may not want to invite complex runtime issues for the small gain you get from persistent connections.
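If you want to see where a given server currently stands on the two MySQL-side variables in that list, here is a quick sketch (DSN and credentials are placeholders):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Server-side limits relevant to persistent connections.
foreach (array('max_connections', 'wait_timeout') as $name) {
    $row = $pdo->query("SHOW VARIABLES LIKE '$name'")->fetch(PDO::FETCH_ASSOC);
    echo $row['Variable_name'], ' = ', $row['Value'], PHP_EOL;
}

// How many connections are open right now.
$row = $pdo->query("SHOW STATUS LIKE 'Threads_connected'")->fetch(PDO::FETCH_ASSOC);
echo $row['Variable_name'], ' = ', $row['Value'], PHP_EOL;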
Creating connections to the database is a fairly expensive operation. Persistent connections are a good idea. In the ASP.Net and Java world, we have "connection pooling", which is roughly the same thing, and also a good idea.
IMO, the real answer to this question is whatever works best for your app. I would recommend benchmarking your app using both persistent and non-persistent connections.
Maggie Nelson at Objectively Oriented posted about this in August, and Robert Swarthout made an accompanying post with some hard numbers. Both are pretty good reads.
In my humble opinion:
When using PHP for web development, most of your connections will only "live" for the life of the page being executed. A persistent connection is going to cost you a lot of overhead, as you'll have to put it in the session or some such thing.
99% of the time a single non-persistent connection that dies at the end of the page execution will work just fine.
The other 1% of the time, you probably should not be using PHP for the app, and there is no perfect solution for you.
In general, you'll need to use non-persistent connections sometimes anyway, and it's nice to have a single pattern to apply to your DB connection design (as long as there's relatively little upside to using persistent connections in your context).
I was going to ask this same question but rather than ask the same question again I'll just add some information that I've found.
Are PHP persistent connections evil?
Persistent Database Connections
It is also worth noting that the mysqli extension initially did not include the option to use persistent database connections; support was only added later, in PHP 5.3, via the "p:" host prefix.
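For completeness, a tiny sketch of that later mysqli syntax (host, credentials and database name are placeholders):
<?php
// PHP 5.3+: prefixing the host with "p:" requests a persistent mysqli connection.
$persistent = new mysqli('p:localhost', 'user', 'pass', 'app');

// The same call without the prefix opens a regular, per-request connection.
$regular = new mysqli('localhost', 'user', 'pass', 'app');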
I'm still using persistent connections at the moment but plan to switch to non-persistent in the near future.