Aerospike read times out before max retries is reached

I have the following config for aerospike read policy:
clientPolicy.timeout = 200; // timeout for refreshing cluster status, shouldn't affect reads
clientPolicy.readPolicyDefault.socketTimeout = 30;
clientPolicy.readPolicyDefault.totalTimeout = 110;
clientPolicy.readPolicyDefault.maxRetries = 2;
clientPolicy.readPolicyDefault.sleepBetweenRetries = 0;
According to the Aerospike docs, this should result in 3 read attempts (1 initial + 2 retries) of at most 30 ms each, which is 90 ms in total and less than the total timeout of 110 ms.
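The arithmetic above can be written as a small Java helper. This is only a sketch of the reasoning in the question: it assumes each attempt is bounded solely by socketTimeout and that nothing else (connection setup, node lookup, queuing) is charged to totalTimeout. The class and method names are hypothetical.

```java
/**
 * Hypothetical helper reproducing the back-of-the-envelope retry budget.
 * Assumes every attempt uses its full socketTimeout and nothing else counts.
 */
final class RetryBudget {
    private RetryBudget() {}

    /** (maxRetries + 1) attempts, plus one sleep between each pair of consecutive attempts. */
    static int worstCaseMs(int socketTimeoutMs, int maxRetries, int sleepBetweenRetriesMs) {
        int attempts = maxRetries + 1;
        return attempts * socketTimeoutMs + maxRetries * sleepBetweenRetriesMs;
    }

    /** True if that worst case fits inside the total timeout. */
    static boolean fitsWithinTotal(int socketTimeoutMs, int maxRetries,
                                   int sleepBetweenRetriesMs, int totalTimeoutMs) {
        return worstCaseMs(socketTimeoutMs, maxRetries, sleepBetweenRetriesMs) <= totalTimeoutMs;
    }
}
```

With the settings above, worstCaseMs(30, 2, 0) is 90 and fitsWithinTotal(30, 2, 0, 110) is true, which is why the early timeouts are surprising.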
But in application logs I see timeout exceptions after 1 retry:
org.springframework.dao.QueryTimeoutException: Client timeout: iteration=1 connect=0 socket=30 total=110 maxRetries=2 node= inDoubt=false;
nested exception is com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=1 connect=0 socket=30 total=110 maxRetries=2 node= inDoubt=false
...
Caused by: com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=1 connect=0 socket=30 total=110 maxRetries=2 node= inDoubt=false
Is there anything I'm missing? Maybe there are more actions that occur and are included in this total timeout?

It seems to me that you don't have a connection to a node. Try clientPolicy.timeout = 1000 (the default). You may have timed out while trying to establish an initial connection to a node.
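One way to see how the log can show iteration=1 with maxRetries=2 is a toy model in which each attempt pays some extra time (for example, establishing a connection, as this answer suspects) on top of its socket wait, all charged to totalTimeout. This is a hypothetical model, not the Aerospike client's retry code; the overhead value is an assumption, and iteration is treated as a 0-based attempt index, assumed to match the iteration field in the messages.

```java
/**
 * Toy model (not the Aerospike client's code) of a retry loop guarded by a total deadline.
 * Each attempt is assumed to cost a fixed overhead plus a full socket timeout.
 */
final class DeadlineRetryModel {
    private DeadlineRetryModel() {}

    /**
     * Returns the 0-based index of the attempt during which the total deadline is reached,
     * or -1 if all maxRetries + 1 attempts are used up before the deadline.
     */
    static int timedOutAtIteration(int socketTimeoutMs, int totalTimeoutMs,
                                   int maxRetries, int overheadPerAttemptMs) {
        int elapsedMs = 0;
        for (int iteration = 0; iteration <= maxRetries; iteration++) {
            // This attempt pays its overhead and then waits out the full socket timeout.
            elapsedMs += overheadPerAttemptMs + socketTimeoutMs;
            if (elapsedMs >= totalTimeoutMs) {
                return iteration; // total deadline hit: no more retries, whatever maxRetries says
            }
        }
        return -1;
    }
}
```

With no overhead, timedOutAtIteration(30, 110, 2, 0) returns -1 (all three attempts fit in 90 ms), but with a hypothetical 25 ms overhead per attempt the second attempt ends at 110 ms and the model reports iteration 1, the same shape as the exception in the question.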

Related

aerospike connect timeout works incorrectly?

I'm using the Aerospike Java client v6.0.1 with the following read policy settings:
clientPolicy.readPolicyDefault.connectTimeout = 1000;
clientPolicy.readPolicyDefault.socketTimeout = 30;
clientPolicy.readPolicyDefault.totalTimeout = 110;
clientPolicy.readPolicyDefault.maxRetries = 2;
clientPolicy.readPolicyDefault.sleepBetweenRetries = 0;
but from time to time I get the following errors, which indicate that the timeout occurred before all retries were used:
org.springframework.dao.QueryTimeoutException: Client timeout: iteration=0 connect=1000 socket=30 total=110 maxRetries=2 node=null inDoubt=false; nested exception is com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=0 connect=1000 socket=30 total=110 maxRetries=2 node=null inDoubt=false
org.springframework.dao.QueryTimeoutException: Client timeout: iteration=1 connect=1000 socket=30 total=110 maxRetries=2 node=A2 node_ip 3000 inDoubt=false; nested exception is com.aerospike.client.AerospikeException$Timeout: Client timeout: iteration=1 connect=1000 socket=30 total=110 maxRetries=2 node=A2 node_ip 3000 inDoubt=false
Does this mean that the total operation timeout also includes connecting to the Aerospike node? The Aerospike docs state that the total timeout starts after the connect timeout finishes:
If connectTimeout is greater than zero, it will be applied to creating a connection plus optional user authentication and TLS handshake. When the connect completes, socketTimeout/totalTimeout is then applied. In this case, totalTimeout starts after the connection completes. see https://discuss.aerospike.com/t/understanding-timeout-and-retry-policies/2852
99% of all my requests to Aerospike take less than 20 ms, so it doesn't make sense for me to increase the total timeout.
Originally I had a 200-300 ms connect timeout and increased it to 1000 ms, but it didn't help much.
Transactions can sometimes time out before the transaction has started. For example, async transactions can be throttled and can sit in the delay queue for longer than totalTimeout. If this occurs, a timeout exception is generated with iteration=0.
Anytime totalTimeout is reached, the transaction is cancelled regardless of the number of retries.
If connectTimeout is used and a new connection is required (no available connections in the pool) for the transaction, the connectTimeout is applied to connection creation and the totalTimeout stopwatch does not start until the new connection is created.
If connectTimeout is used and an existing connection is available from the pool, the connectTimeout is not applicable and the totalTimeout stopwatch starts from the beginning of the transaction.
Since most transactions are able to obtain connections from the pool, it's not surprising that increasing connectTimeout has little effect.
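The stopwatch rules in this answer can be condensed into a small model of when the client gives up, measured from the moment the call is made. It is a sketch that assumes connectTimeout > 0 (as in the question) and ignores async queuing; the class and parameter names are hypothetical.

```java
/**
 * Hypothetical model of the totalTimeout stopwatch rules described in the answer,
 * assuming connectTimeout > 0. All times are in milliseconds from the start of the call.
 */
final class TotalTimeoutDeadline {
    private TotalTimeoutDeadline() {}

    /**
     * @param totalTimeoutMs            the policy's totalTimeout
     * @param pooledConnectionAvailable true if an existing connection could be taken from the pool
     * @param connectMs                 how long creating a new connection took (it must stay below
     *                                  connectTimeout, otherwise the connect itself fails)
     * @return the point in time at which the transaction is cancelled
     */
    static long deadlineMs(long totalTimeoutMs, boolean pooledConnectionAvailable, long connectMs) {
        // Pooled connection: connectTimeout does not apply and the stopwatch starts immediately.
        // New connection: the stopwatch starts only after the connection has been created.
        long stopwatchStartMs = pooledConnectionAvailable ? 0 : connectMs;
        return stopwatchStartMs + totalTimeoutMs;
    }
}
```

connectTimeout itself never enters the formula; it only caps connectMs. For the common pooled case the deadline stays at totalTimeout (110 ms here) no matter how large connectTimeout is, which is consistent with raising it from 200-300 ms to 1000 ms not helping much.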

JedisConnectionException Read timed out intermittently

My application runs on an ECS cluster, and Redis runs as part of a Docker container on ECS.
The application runs fine for a week or more, but then it suddenly starts throwing timeout exceptions.
The issue is reported for the following query:
api.query("MATCH (ag:dGrp{v:" + rec.DocGroupId + "}),(pg:resUGrp{v:" + rec.userGroupUID + "}) CREATE (pg)-[:dgE{ppv:" + ppv + ","+IdentifierFlag+":1}]->(ag)");
Full stack trace:
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:205)
at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43)
at redis.clients.jedis.Protocol.process(Protocol.java:155)
at redis.clients.jedis.Protocol.read(Protocol.java:220)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:283)
at redis.clients.jedis.Connection.getOne(Connection.java:261)
at redis.clients.jedis.Jedis.sendCommand(Jedis.java:4119)
at com.redislabs.redisgraph.impl.api.ContextedRedisGraph.sendQuery(ContextedRedisGraph.java:52)
at com.redislabs.redisgraph.impl.api.RedisGraph.sendQuery(RedisGraph.java:68)
at com.redislabs.redisgraph.impl.api.AbstractRedisGraph.query(AbstractRedisGraph.java:46)
And this:
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:127)
at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:199)
When we restart our ECS task, the problem disappears, but it comes back after about a week.
We have increased the maximum number of connections to 160.
We tried to reproduce this issue in a lower environment by sending heavy bulk requests for a week, but we were not able to reproduce it.
We are using Redis version 3.5.1.
Jedis Redis client jar (version 3.5.1)
Redis timeout configuration set in the redis.conf file:
timeout 0
bind 127.0.0.1
tcp-backlog 511
tcp-keepalive 300
lua-time-limit 5000
loadmodule /usr/lib64/redis/modules/redisgraph.so
Socket timeout in my code:
socket.setSoTimeout(60000);//keep connection for max 60s when idle
This timeout is set on a java.net.Socket (import java.net.Socket;).
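Note that Socket.setSoTimeout does not limit how long an idle connection is kept: it limits how long a single read() on the socket's input stream may block, and a read that exceeds it fails with java.net.SocketTimeoutException, the "Read timed out" exception in the stack trace. The JDK-only sketch below shows this on a loopback connection. It does not use Jedis, and whether this particular socket is the one Jedis reads from is not clear from the question (Jedis normally creates its own sockets and applies its own timeout). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** JDK-only illustration of what SO_TIMEOUT actually bounds. */
final class SoTimeoutSemantics {
    private SoTimeoutSemantics() {}

    /**
     * Opens a loopback connection, leaves it idle for idleMs with no read pending, optionally
     * has the server send one byte, then performs a single read().
     * Returns "data" if a byte arrived and "timeout" if read() gave up.
     */
    static String readAfterIdle(int soTimeoutMs, long idleMs, boolean serverSendsByte) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(soTimeoutMs);   // bounds how long ONE read() may block
            Thread.sleep(idleMs);               // idle time alone never triggers SO_TIMEOUT
            if (serverSendsByte) {
                accepted.getOutputStream().write('x');
                accepted.getOutputStream().flush();
            }
            try {
                client.getInputStream().read();
                return "data";
            } catch (SocketTimeoutException e) {
                return "timeout";               // surfaces as "Read timed out"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

readAfterIdle(100, 300, true) returns "data": the connection sat idle three times longer than SO_TIMEOUT and was still usable. readAfterIdle(100, 0, false) returns "timeout" after about 100 ms, because a read was actually blocked with nothing to receive.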
RedisGraph query:
At the RedisGraph query level, we use this method, which does not take a timeout parameter, so I assume it uses the default:
/**
 * Execute a Cypher query.
 * @param graphId a graph to perform the query on
 * @param query Cypher query
 * @return a result set
 */
public ResultSet query(String graphId, String query) {
    return sendQuery(graphId, query);
}
But there is also a method that takes a timeout parameter:
/**
 * Execute a Cypher query with timeout.
 * @param graphId a graph to perform the query on
 * @param timeout
 * @param query Cypher query
 * @return a result set
 */
@Override
public ResultSet query(String graphId, String query, long timeout) {
    return sendQuery(graphId, query, timeout);
}
I see you are running a graph query, which can take longer than ordinary Redis commands. For this, you can try increasing the socket timeout.
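The point of this answer, that the read timeout must exceed the time the server needs to produce a reply, can be illustrated with a JDK-only sketch in which a loopback server answers after a configurable delay, standing in for a slow graph query. It is not Redis or RedisGraph; with Jedis 3.x the closest equivalent is the timeout argument of the Jedis or JedisPool constructors. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** JDK-only illustration: a reply arrives only if the read timeout exceeds the server's processing time. */
final class SlowReplyDemo {
    private SlowReplyDemo() {}

    /** Returns true if a client with the given read timeout receives a reply sent after replyDelayMs. */
    static boolean replyReceived(int replyDelayMs, int soTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            Thread replier = new Thread(() -> {
                try {
                    Thread.sleep(replyDelayMs);          // simulated query execution time
                    OutputStream out = accepted.getOutputStream();
                    out.write('+');
                    out.flush();
                } catch (InterruptedException | IOException ignored) {
                    // the client gave up first and the connection is being closed
                }
            });
            replier.setDaemon(true);
            replier.start();
            client.setSoTimeout(soTimeoutMs);
            try {
                return client.getInputStream().read() == '+';
            } catch (SocketTimeoutException e) {
                replier.interrupt();                     // abandon the pending reply
                return false;                            // "Read timed out"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

replyReceived(300, 1000) returns true, while replyReceived(300, 100) returns false.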

Some issues with AspNet Core SignalR KeepAlive timeout

In our project, I have configured SignalR as follows:
services.AddSignalR()
.AddHubOptions<NotificationHub>(options =>
{
const int keepAliveIntervalInSeconds=60;
options.EnableDetailedErrors=true;
options.ClientTimeoutInterval = TimeSpan.FromSeconds(2 * keepAliveIntervalInSeconds);
options.HandshakeTimeout = TimeSpan.FromSeconds(keepAliveIntervalInSeconds);
options.KeepAliveInterval = TimeSpan.FromSeconds(keepAliveIntervalInSeconds);
});
but it is not working as it is supposed to. I am getting an error on the client that says:
[2020-06-03T09:48:44.367Z] Error: Connection disconnected with error 'Error: Server timeout elapsed without receiving a message from the server.'.
Is there anything that I am doing wrong here ?
Error: Connection disconnected with error 'Error: Server timeout elapsed without receiving a message from the server.'
In the "Configure server options" section of this doc, we can find:
The default value of KeepAliveInterval is 15 seconds. When changing KeepAliveInterval, we need to change the ServerTimeout/serverTimeoutInMilliseconds setting on the client side too. And the recommended ServerTimeout/serverTimeoutInMilliseconds value is double the KeepAliveInterval value.
The default value of serverTimeoutInMilliseconds is 30,000 milliseconds (30 seconds). If you only update the KeepAliveInterval of your SignalR hub to 60 seconds without changing serverTimeoutInMilliseconds on the client side, the client stops waiting before the next keep-alive message arrives, which causes the error above.
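The rule from the documentation can be written as a small check. This is a sketch in Java with hypothetical names: the server only sends a keep-alive every KeepAliveInterval, so a client whose server timeout is not longer than that interval gives up on an otherwise idle connection before the next keep-alive arrives.

```java
import java.time.Duration;

/** Hypothetical helper expressing the KeepAliveInterval / client server-timeout rule. */
final class SignalRTimeouts {
    private SignalRTimeouts() {}

    /** Client server timeout recommended by the docs: double the hub's KeepAliveInterval. */
    static Duration recommendedServerTimeout(Duration keepAliveInterval) {
        return keepAliveInterval.multipliedBy(2);
    }

    /**
     * True if an idle connection is at risk: the client waits no longer than the gap
     * between keep-alives, so "Server timeout elapsed" can fire before the next one arrives.
     */
    static boolean idleClientTimesOut(Duration keepAliveInterval, Duration clientServerTimeout) {
        return clientServerTimeout.compareTo(keepAliveInterval) <= 0;
    }
}
```

With KeepAliveInterval = 60 s and the client default of 30 s, idleClientTimesOut returns true; the recommended client value is 120 s.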

ElastiCache - Redis Cluster mode enabled fails to write data in-between

I am using a Redis cluster with a single node of type m5.4xlarge to cache some results. Writes to this Redis node are very frequent, and I can see that writes to the cluster fail intermittently.
Below is the stack trace we see in the logs for the failure:
org.redisson.client.WriteRedisConnectionException: Unable to send command! Node source: NodeSource[slot=null,addr=null,redisClient=null,redirect=null,entry=org.redisson.connection.MasterSlaveEntry@6608962a], connection: [id: 0xbad70cba, L:0.0.0.0/0.0.0.0:47904], command: (EVAL),params: [local insertable = false; local value = redis.call('hget',KEYS[1], ARGV[5]); local t, val;if value ..., 8, SEARCH_CACHE, redisson__timeout__set:{SEARCH_CACHE}, redisson__idle__set:{SEARCH_CACHE}, redisson_map_cache_created:{SEARCH_CACHE}, redisson_map_cache_updated:{SEARCH_CACHE}, redisson__map_cache__last_access__set:{SEARCH_CACHE}, redisson_map_cache_removed:{SEARCH_CACHE}, {SEARCH_CACHE}:redisson_options, ...]
at org.redisson.command.CommandAsyncService.checkWriteFuture(CommandAsyncService.java:675)
at org.redisson.command.CommandAsyncService.access$100(CommandAsyncService.java:84)
at org.redisson.command.CommandAsyncService$9$1.operationComplete(CommandAsyncService.java:638)
at org.redisson.command.CommandAsyncService$9$1.operationComplete(CommandAsyncService.java:635)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:485)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:869)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1371)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
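The root cause at the bottom of the trace is java.nio.channels.ClosedChannelException, which the JDK throws when an I/O operation is attempted on a channel that has already been closed. The JDK-only sketch below reproduces that condition on a loopback connection. It says nothing about why the connection to the ElastiCache node was closed; it only shows what the exception means. The class name and payload are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

/** JDK-only illustration of the ClosedChannelException at the bottom of the trace. */
final class ClosedChannelDemo {
    private ClosedChannelDemo() {}

    /** Returns true if writing to an already-closed SocketChannel fails with ClosedChannelException. */
    static boolean writeAfterCloseFails() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketChannel channel = SocketChannel.open(server.getLocalAddress());
            channel.close(); // the connection is gone before the command is written
            try {
                channel.write(ByteBuffer.wrap("PING\r\n".getBytes(StandardCharsets.US_ASCII)));
                return false;
            } catch (ClosedChannelException e) {
                return true; // same exception type as "Caused by:" in the trace
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

writeAfterCloseFails() returns true.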
I am using Redisson client version 3.6.5.
Can someone please help me identify the issue?
Below is the configuration I have set up for the Redis cluster connection:
idleConnectionTimeout: 1000
pingTimeout: 10000
connectTimeout: 10000
timeout: 30000
retryAttempts: 3
retryInterval: 1500
reconnectionTimeout: 30000
failedAttempts: 3
subscriptionsPerConnection: 5
slaveSubscriptionConnectionMinimumIdleSize: 1
slaveSubscriptionConnectionPoolSize: 50
slaveConnectionPoolSize: 250
masterConnectionMinimumIdleSize: 5
masterConnectionPoolSize: 250

JAXWS and Http Post Timeout in GlassFish v3.0.1

I am trying to set connect and request timeouts for JAX-WS and HTTP POST calls.
My code works, but only up to a maximum of 20 seconds.
That is, if I change the timeout value to 5 seconds or 2 seconds, it works, but setting it to 30 seconds still times out at 20 seconds, and setting it to 60 seconds also times out at 20 seconds.
Does anybody know where that maximum of 20 seconds is set?
For JAX-WS:
//This works, timed out in 10 seconds
((BindingProvider) soapPort).getRequestContext().put(JAXWSProperties.CONNECT_TIMEOUT, 10000);
// This would time out in 20 seconds!!!
((BindingProvider) soapPort).getRequestContext().put(JAXWSProperties.CONNECT_TIMEOUT, 60000);
For HTTP:
// This works, timed out in 10 seconds
HttpConnectionParams.setConnectionTimeout(params, 10000);
// This would time out in 20 seconds!!!
HttpConnectionParams.setConnectionTimeout(params, 50000);
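A connection timeout only bounds establishing the TCP connection; once connected, a slow response is governed by a separate read (socket) timeout. The sketch below shows the two settings side by side using the JDK's java.net.HttpURLConnection as a stand-in for the HttpClient used above (with Apache HttpClient 4.x, the counterpart of setReadTimeout is HttpConnectionParams.setSoTimeout). It does not explain where the 20-second ceiling comes from; the class name and URL are placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper showing connect and read timeouts as two separate settings. */
final class HttpTimeouts {
    private HttpTimeouts() {}

    /**
     * Prepares (but does not open) an HTTP POST connection.
     * connectTimeoutMs bounds TCP connection establishment;
     * readTimeoutMs bounds each wait for response data once connected.
     */
    static HttpURLConnection prepare(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, prepare("http://localhost:8080/service", 10_000, 60_000) gives a 10-second connect timeout and a 60-second read timeout; no network traffic happens until the connection is used.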
The default JAX-WS runtime for GlassFish is Metro 2.0. According to section 5.6, "HTTP Timeouts", of the Metro guide, the timeouts are set as follows:
// setConnectTimeout()
int timeout = ...;
Map<String, Object> ctxt = ((BindingProvider)proxy).getRequestContext();
ctxt.put(JAXWSProperties.CONNECT_TIMEOUT, timeout);
// setReadTimeout()
int timeout = ...;
Map<String, Object> ctxt = ((BindingProvider)proxy).getRequestContext();
ctxt.put("com.sun.xml.ws.request.timeout", timeout);
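Both Metro properties can be applied together with a small helper. The request context is modeled as a plain Map<String, Object> so the sketch compiles without JAX-WS on the classpath; in real code you would pass ((BindingProvider) proxy).getRequestContext(). The string used for the connect timeout is assumed to be the value behind JAXWSProperties.CONNECT_TIMEOUT, and the helper name is hypothetical.

```java
import java.util.Map;

/** Hypothetical helper that sets both Metro timeouts on a JAX-WS request context. */
final class MetroTimeouts {
    private MetroTimeouts() {}

    // Assumed to equal JAXWSProperties.CONNECT_TIMEOUT.
    static final String CONNECT_TIMEOUT = "com.sun.xml.ws.connect.timeout";
    // Read/request timeout property name, as used in the snippet above.
    static final String REQUEST_TIMEOUT = "com.sun.xml.ws.request.timeout";

    /** Puts both timeouts (in milliseconds) into the given request context. */
    static void apply(Map<String, Object> requestContext, int connectTimeoutMs, int requestTimeoutMs) {
        requestContext.put(CONNECT_TIMEOUT, connectTimeoutMs);
        requestContext.put(REQUEST_TIMEOUT, requestTimeoutMs);
    }
}
```

For example, MetroTimeouts.apply(((BindingProvider) proxy).getRequestContext(), 10_000, 60_000) would set a 10-second connect timeout and a 60-second request timeout.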
Just as a guide, WebSphere has three such parameters (Metro 2.0 has only two):
CONNECTION_TIMEOUT: The amount of time the WebSphere JAX-WS client would wait to establish an HTTP/HTTPS connection (default is 180 seconds)
WRITE_TIMEOUT: The amount of time the client would wait to finish sending the request (default is 300 seconds)
RESPONSE_TIMEOUT: The amount of time the client would wait to finish receiving the response (default is 300 seconds)