Database temporarily disconnected after a lot of transactions by pgbench - sql

I am using PostgreSQL 9.2.1 and testing the database with pgbench.
pgbench -h 192.168.39.38 -p 5433 -t 1000 -c 40 -j 8 -C -U admin testdb
When I use the -C parameter (establish a new connection for each transaction), the transactions are always lost after the 16381st transaction.
Connection to database "testdb" failed
could not connect to server: Can't assign requested address
Is the server running on host "192.168.39.38" and accepting
TCP/IP connections on port 5433?
Client 19 aborted in establishing connection.
Connection to database "testdb" failed
could not connect to server: Can't assign requested address
Is the server running on host "192.168.39.38" and accepting
TCP/IP connections on port 5433?
Client 19 aborted in establishing connection.
....
transaction type: TPC-B (sort of)
scaling factor: 30
query mode: simple
number of clients: 40
number of threads: 8
number of transactions per client: 1000
number of transactions actually processed: 16381/40000
tps = 1665.221801 (including connections establishing)
tps = 9487.779510 (excluding connections establishing)
And the number of transactions actually processed is always 16381 in each test.
However, pgbench succeeds and all transactions are processed when
-C is not used
or
the total number of transactions is less than 16381.
After these transactions are dropped, the database can accept connections again within a few seconds.
I wonder if I am missing some PostgreSQL configuration.
Thanks
Edit: I found that the client is blocked from connecting for a few seconds, but the others can still access the database. Does that mean the same client cannot send too many transactions in a short time?

I found the reason why it loses connections after about 16000 transactions: TCP TIME_WAIT is to blame. The following command shows the status of TCP connections:
$ netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
However, it does NOT show TIME_WAIT on Mac OS X, so I missed it. After I reduced the TCP MSL (which controls how long sockets linger in TIME_WAIT) with the following command, pgbench works properly.
$ sudo sysctl -w net.inet.tcp.msl=1500
net.inet.tcp.msl: 15000 -> 1500
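Since TIME_WAIT lasts 2 x MSL, this drops the lingering time from roughly 30 seconds to roughly 3 seconds, so ephemeral ports are freed much sooner. You can read the value back (and restore the default later) like this:
# read the current value (milliseconds)
$ sysctl net.inet.tcp.msl
# restore the default afterwards if you want
$ sudo sysctl -w net.inet.tcp.msl=15000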
Thanks for helping.

There is indeed a limit on the maximum number of connections imposed by the OS. Read up on max_connections in the documentation (relevant parts in bold):
Determines the maximum number of concurrent connections to the database server. The default is typically 100 connections, but might be less if your kernel settings will not support it (as determined during initdb). This parameter can only be set at server start.
Increasing this parameter might cause PostgreSQL to request more System V shared memory or semaphores than your operating system's default configuration allows. See Section 17.4.1 for information on how to adjust those parameters, if necessary.
That you can open only 16381 connections can be explained by there being a maximum of 2^14 (= 16384) possible connections, minus the 3 connections reserved by default for superuser connections (see documentation).
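For reference (illustrative only; the exact value depends on your postgresql.conf and kernel limits), you can check the configured limit from psql and, if needed, raise it and restart the server:
# show the current limit (the default is typically 100)
$ psql -h 192.168.39.38 -p 5433 -U admin -d testdb -c "SHOW max_connections;"
# raising it means editing postgresql.conf and restarting, e.g.:
#   max_connections = 200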

It's interesting that 16381 is so close to a power of 2.
This is largely speculation:
I'm wondering whether it's an OS thing. Looking at the TPS figures, is a new connection being created for every transaction? [Edit: yes, now that I read your question properly.]
Perhaps the OS has only so many connection resources it can use, and it cannot immediately create a new connection after having made 16381 (plus a few additional ones) in the recent past?
There may be an OS setting for specifying the number of connection resources to make available, which could allow more connections to be used. Can you add some OS details to the question?
In particular, I would suspect that the source port you connect from keeps increasing and you're hitting a limit. Try "lsof -i", see if you can catch a connection as it happens, and check whether the port number is going up.
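A concrete way to do that (a sketch; port 5433 is taken from the question) is to run lsof against the PostgreSQL port while the benchmark is going:
# repeat a few times; the local (ephemeral) port of new connections should keep climbing
$ lsof -n -i tcp:5433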

I solved it by adding the following to /etc/sysctl.conf:
net.ipv4.ip_local_port_range = 32768 65000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 10
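To apply these without a reboot on a typical Linux box (note that tcp_tw_recycle no longer exists on recent kernels):
# reload /etc/sysctl.conf
$ sudo sysctl -p
# spot-check one of the values
$ sysctl net.ipv4.ip_local_port_range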

Related

JMeter - Execute SSH Commands in parallel

I need to simulate the below:
1. SSH (only once)
2. Execute a command on all the rows in a csv file at once.
The number of rows in the CSV file is dynamic. If there are 10 rows, the command needs to be executed on all 10 rows in parallel.
I am not sure about using the SSH Command Sampler here, since the SSH connection and the command are entered in the same sampler. How do I separate these, i.e. SSH only once and then execute the commands in parallel? Which JMeter components do I use here?
Note: Increasing the number of threads is not an efficient option. Doing so creates many sessions, which in turn hangs the terminal. This option works fine up to 10 users; I am not sure if there is a limit on the number of sessions.
Thanks for your support.
Regards,
Ajith
Why do you think that increasing the number of threads is not an efficient option?
I would suggest moving the "SSH (only once)" part to a setUp Thread Group and putting the "Execute a command on all the rows in a csv file at once" bit under the normal Thread Group.
If the number of rows in the CSV file is dynamic - you can make the number of threads dynamic as well using __groovy() function like:
${__groovy(new File('/path/to/your/file.csv').readLines().size,)}
If you want to execute all 10 requests (or however many lines there are) at exactly the same moment, you can add a Synchronizing Timer.

How to log in to a PostgreSQL db after session kill (for copying a database)

I tried to copy a database within the same PostgreSQL server using the query below:
CREATE DATABASE newdb WITH TEMPLATE originaldb OWNER dbuser;
And got the error below:
ERROR: source database "originaldb" is being accessed by 1 other user
So, I executed the below command
SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'originaldb' AND pid <> pg_backend_pid();
Now none of us are able to login/connect back to the database.
When I provide the below command
psql -h 192.xx.xx.x -p 9763 -d originaldb -U postgres
It prompts for a password, and after I enter the password it doesn't return any response.
Why does this happen? How can I connect back to the db? How do I restart or otherwise get the system to let us log back in?
Can someone help us with this?
It sounds like something is holding an access exclusive lock on a shared catalog, such as pg_database. If that is the case, no one will be able to log in until that lock gets released. I wouldn't think the session-killing code you ran would cause such a situation, though. Maybe it was just a coincidence.
If you can't find an active session, you can try using system tools to figure out what is going on, like ps -efl|fgrep postgre. Or you can just restart the whole database instance, using whatever method you would usually use to do that, like pg_ctl restart -D <data_directory> or sudo service postgresql restart or some GUI method if you are on an OS that does that.
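If you can still connect to a different database on the same instance (for example the postgres database), a quick way to look for ungranted locks is to query pg_locks; this is only a sketch, with a minimal column list:
$ psql -h 192.xx.xx.x -p 9763 -d postgres -U postgres \
    -c "SELECT locktype, database, relation::regclass, mode, pid FROM pg_locks WHERE NOT granted;"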

ERROR : FAILED: Error in acquiring locks: Error communicating with the metastore org.apache.hadoop.hive.ql.lockmgr.LockException

I am getting the "Error in acquiring locks" error when trying to run count(*) on partitioned tables.
The table has 365 partitions; when filtering on <= 350 partitions, the queries work fine.
When I try to include more partitions in the query, it fails with the error.
I am working with Hive-managed ACID tables, with the following default values:
hive.support.concurrency=true // cannot set it to false; it throws "<table> is missing from the ValidWriteIdList config: null"; it should be true for ACID reads and writes.
hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.txn.strict.locking.mode=false
hive.exec.dynamic.partition.mode=nonstrict
I tried increasing/decreasing the values of the following within a beeline session:
hive.lock.numretries
hive.unlock.numretries
hive.lock.sleep.between.retries
hive.metastore.batch.retrieve.max={default 300} //changed to 10000
hive.metastore.server.max.message.size={default 104857600} // changed to 10485760000
hive.metastore.limit.partition.request={default -1} //did not change as -1 is unlimited
hive.lock.query.string.max.length={default 10000} //changed to higher value
I am using the HDI-4.0 interactive-query-llap cluster; the metastore is backed by the default SQL server provided with it.
The problem is NOT due to the service tier of the Hive metastore database.
Based on the symptom, it is most probably due to too many partitions in one query.
I have met the same issue several times.
In hivemetastore.log, you should be able to see an error like this:
metastore.RetryingHMSHandler: MetaException(message:Unable to update transaction database com.microsoft.sqlserver.jdbc.SQLServerException: The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:578)
This is because, in the Hive metastore, each partition involved in the Hive query requires at most 8 parameters to acquire a lock, so a query spanning a few hundred partitions can exceed SQL Server's 2100-parameter limit.
Some possible workarounds:
Decompose the query into multiple sub-queries that each read from fewer partitions (see the sketch after this list).
Reduce the number of partitions by setting different partition keys.
Remove partitioning if partition keys don't have any filters.
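As a rough sketch of the first workaround (the table name, partition column, date ranges and the $JDBC_URL placeholder are all made up), you could split one large scan into a couple of beeline runs that each touch a partition subset and add the partial counts yourself:
# hypothetical: split a 365-partition scan into two ranges, each well under the failing partition count
$ beeline -u "$JDBC_URL" -e "SELECT COUNT(*) FROM mydb.mytable WHERE part_date BETWEEN '2020-01-01' AND '2020-06-30';"
$ beeline -u "$JDBC_URL" -e "SELECT COUNT(*) FROM mydb.mytable WHERE part_date BETWEEN '2020-07-01' AND '2020-12-31';"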
The following parameters manage the batch size for the INSERT queries generated by direct SQL. Their default value is 1000. Set both of them to 100 (as a good starting point) in the Custom hive-site section of Hive configs via Ambari and restart ALL Hive-related components (including the Hive metastore).
hive.direct.sql.max.elements.values.clause=100
hive.direct.sql.max.elements.in.clause=100
We also faced the same error in HDInsight, and after making many configuration changes similar to what you have done, the only thing that worked was scaling up our Hive metastore SQL DB server.
We had to scale it all the way to a P2 tier with 250 DTUs for our workloads to work without these lock exceptions. As you may know, as the tier and DTU count increase, the SQL server's IOPS and response time improve, so we suspected that metastore performance was the root cause of these lock exceptions as the workload increased.
Following link provides information about the DTU based performance variation in SQL servers in Azure.
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu
Additionally, as far as I know, the default Hive metastore that gets provisioned when you opt not to provide an external DB at cluster creation is just an S1 tier DB. This would not be suitable for any high-capacity workload. At the same time, as a best practice, always provision your metastore external to the cluster and attach it at cluster provisioning time, as this gives you the flexibility to connect the same metastore to multiple clusters (so that your Hive layer schema can be shared across multiple clusters, e.g. Hadoop for ETL and Spark for processing / machine learning), and you have full control to scale your metastore up or down as needed at any time.
The only way to scale the default metastore is by engaging the Microsoft support.
We faced the same issue in HDInsight and solved it by upgrading the metastore.
The default metastore had only 5 DTUs, which is not recommended for production environments. So we migrated to a custom metastore, spun up an Azure SQL Server (P2, above 250 DTUs), and set the properties below:
hive.direct.sql.max.elements.values.clause=200
hive.direct.sql.max.elements.in.clause=200
The above values are set because SQL Server cannot process more than 2100 parameters. With more than 348 partitions you hit this issue, since 1 partition can create 8 parameters for the metastore (8 × 348 = 2784, which exceeds the limit).

Redis mass insertions on remote server

I have a remote server running Redis where I want to push a lot of data from a Java application. Until now I used Webdis to push one command at a time, which is not efficient, but I did not have any security issues because I could define which IPs were accepted as connections and which command authorizations applied, while Redis was not accepting requests from outside (protected mode).
I want to try Jedis (the Java API) and its pipeline implementation for faster insertion, but that means I have to open my Redis to accept requests from outside.
My question is this: is it possible to use Webdis in a similar way (pipelined mass insertion)? And if not, what security configuration do I need in order to use something like Jedis over the internet?
Thanks in advance for any answer
IMO, how you set up the security should be transparent to the Redis driver. No driver or password protection will be as secure as protocols or technologies specifically designed for this.
The simplest way I'd handle the security is to let Redis listen on 127.0.0.1:<some port> and use an SSH tunnel to the machine. At least this way you can test the performance against your current scenario.
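A minimal sketch of that tunnel (host name and ports are placeholders): forward a local port to the Redis port on the remote machine and point Jedis at localhost.
# keep Redis bound to 127.0.0.1 on the server and tunnel to it
$ ssh -N -L 6379:127.0.0.1:6379 user@redis-host
# Jedis/Webdis clients then connect to localhost:6379 as if Redis were local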
You can also use IPsec or OpenVPN afterwards to set up a private network that can communicate with the Redis server.
This question is almost 4 years old so I hope its author has moved on by now, but in case someone else has the same issue I thought I might suggest a way to send data to Webdis more efficiently.
You can indeed make data ingest faster by batching your inserts, meaning you can use MSET to insert multiple keys in a single request (or HMSET for hashes, etc).
As an example, here's ApacheBench (ab) inserting one key 100,000 times using 100 clients:
$ ab -c 100 -n 100000 -k 'http://127.0.0.1:7379/SET/foo/bar'
[...]
Requests per second: 82235.15 [#/sec] (mean)
We're measuring 82,235 single-key inserts per second. Keep in mind that there's a lot more to HTTP benchmarking than just looking at averages (the latency distribution is still important, etc.) but this example is only about showing the difference that batching can make.
You can send commands to Webdis in one of three ways (documented here):
GET /COMMAND/arg0/.../argN
POST / with COMMAND/arg0/.../argN in the HTTP body (demonstrated below)
PUT /COMMAND/arg0.../argN-1 with argN in the HTTP body
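For example, the second form can be tried out with curl (a quick sketch, assuming Webdis is listening on 127.0.0.1:7379 as in the benchmarks below):
# send one MSET carrying two keys in the POST body
$ curl -s -d 'MSET/key-1/value-1/key-2/value-2' http://127.0.0.1:7379/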
If instead of inserting one key per request we create a file containing the MSET command to write 100 keys in a single request, we can significantly increase the write rate.
# first showing what the command looks like for 3 keys
$ echo -n 'MSET' ; for i in $(seq 1 3); do echo -n "/key-${i}/value-${i}"; done
MSET/key-1/value-1/key-2/value-2/key-3/value-3
# then saving the command to write 100 keys to a file:
$ (echo -n 'MSET' ; for i in $(seq 1 100); do echo -n "/key-${i}/value-${i}"; done) > batch-contents.txt
With this file, we can use ab to send this multi-insert file as a POST request (-p) to Webdis:
$ ab -c 100 -n 10000 -k -p ./batch-contents.txt -T 'application/x-www-form-urlencoded' 'http://127.0.0.1:7379/'
[...]
Requests per second: 18762.82 [#/sec] (mean)
This is showing 18,762 requests per second… with each request performing 100 inserts, for a total of 1,876,282 actual key inserts per second.
If you track the CPU usage of Redis while ab is running, you'll find that the MSET use case pegs it at 100% CPU while sending individual SET does not.
Once again keep in mind that this is a rough benchmark, just enough to show that there is a significant difference when you batch inserts. This is true regardless of whether Webdis is used, by the way: batching inserts from a client connecting directly to Redis should also be much faster than individual inserts.
Note: (I am the author of Webdis)

2 SQL Servers but different tempdb IO pattern on one: spikes up and down between 5 MB/sec and 0.2 MB/sec

I have 2 MSSQL servers (let's call them SQL1 and SQL2) running a total of 1866 databases:
SQL1 has 993 databases (993203 registered users)
SQL2 has 873 databases (931259 registered users)
Each SQL server has a copy of an InternalMaster database (for some shared table data) and then multiple customer databases, 1 per customer (customer/client, not registered user).
At the time of writing this we had just over 10,000 users online using our software.
SQL2 behaves as expected: database I/O is generally 0.2 MB/sec and goes up and down in a normal flow, with I/O going up on certain reports, queries and so on in a random fashion.
However, SQL1 has a constant pattern, almost like a life support machine.
I don't understand why two servers with the same infrastructure work so differently. The spike starts at around 2 MB/sec and then increases to a max of around 6 MB/sec. Both servers have identical IOPS provisioned for the data, log and transaction partitions and identical AWS specs. The data file I/O shows that tempdb is the culprit of this spike.
Any advice would be great, as I just can't get my head around how one tempdb would act differently from another when running the same software and setup on both servers.
Regards
Liam
Liam,
Please see this website, which explains how to configure tempdb. Looking at the image, you only have one data file for the tempdb database.
http://www.brentozar.com/sql/tempdb-performance-and-configuration/
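For illustration only (the file name, path and sizes are made-up placeholders, and the right number of data files depends on your core count, per the article above), adding an extra tempdb data file looks roughly like this:
$ sqlcmd -S SQL1 -Q "ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb\tempdev2.ndf', SIZE = 8192MB, FILEGROWTH = 512MB);"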
Hope this helps