Regular Redis timeouts on Docker

We are using the latest Redis Docker image and are seeing periodic timeouts (12 in the last 4 hours).
The errors look like:
Timeout performing PING, inst: 51, mgr: ExecuteSelect, queue: 3, qu=0, qs=3, qc=0, wr=0/0, in=0/1
Timeout performing PING, inst: 51, mgr: ExecuteSelect, queue: 4, qu=0, qs=4, qc=0, wr=0/0, in=28/0
Timeout performing GET Prod.Cust.Agent.118, inst: 100, mgr: ProcessReadQueue, queue: 3, qu=0, qs=3, qc=0, wr=0/0, in=630/1
Looking at the Redis latency documentation (http://redis.io/topics/latency), I'm wondering what settings, if any, need to be changed on:
a. the Docker container
b. the server running Docker
For instance, the documentation suggests:
Transparent huge pages must be disabled from your kernel. Use echo never > /sys/kernel/mm/transparent_hugepage/enabled to disable them, and restart your Redis process.
Is that something that should be done in the Docker container or on the server?
When I check the value on the server I get:
jhilden@Omega:~$ more /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
Thanks in advance for the advice.
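One note on where this change belongs: a container shares the host's kernel, so transparent huge pages are a host-level setting; changing them inside the container image would not help. A minimal sketch of applying the command from the redis.io page on the host (as root), shown only as an illustration:
# On the host (not inside the container), as root:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Verify; the brackets should now surround "never":
cat /sys/kernel/mm/transparent_hugepage/enabled
# always madvise [never]
Note that this does not survive a reboot, so it usually also needs to be applied from a boot-time script.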

Related

Why does DistributedCache SessionHandler throw connection issues?

I have a .NET Core app running on VMs in Azure, where I use Redis as the implementation for DistributedCache. This way user sessions are stored in Redis and can be shared across the web farm. We only use Redis for storing sessions. We are using Azure Cache for Redis with a normal instance. Both the VMs and Redis are in the same region.
In Startup we add:
services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = configuration["RedisCache:ConnectionString"];
});
In the web app we are having intermittent problems with Redis closing connections. All calls to Redis go through session async methods like the one below.
public static async Task<T> Get<T>(this ISession session, string key)
{
    if (!session.IsAvailable)
        await session.LoadAsync();
    var value = session.GetString(key);
    return value == null ? default(T) : JsonConvert.DeserializeObject<T>(value);
}
The errors we are seeing are:
StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: EVAL; An existing connection was forcibly closed by the remote host.; IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=3,Free=32764,Min=512,Max=32767), Local-CPU: n/a
---> StackExchange.Redis.RedisConnectionException: SocketFailure on myredis.redis.cache.windows.net:6380/Interactive, Idle/Faulted, last: EVAL, origin: ReadFromPipe, outstanding: 1, last-read: 34s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.593.37019
---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
And
StackExchange.Redis.RedisConnectionException: SocketFailure on myredis.redis.cache.windows.net:6380/Interactive, Idle/Faulted, last: EXPIRE, origin: ReadFromPipe, outstanding: 1, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.593.37019
---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
We are not experiencing traffic spikes during the timeouts, and the Redis instance is not under any heavy load.
I have no idea how to troubleshoot this further. Any ideas?
The connection might be closed by the Redis server because it has been idle for too long.
In the Azure Cache for Redis portal you can find the configuration for the Redis server; see if you can find a timeout setting there.
If you can issue commands through the command line, you can also run this command:
CONFIG GET timeout
If it's zero, it means there is no server-side timeout.
In that case the issue is with your Redis client. I'm not familiar with .NET, but whatever client you're using to connect to the Redis server, check its timeout options, or search for "(name of the client) timeout" and see if you can find any useful information.
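If the server-side timeout cannot be changed, another option is to keep the connection from looking idle on the client side. A minimal sketch for the StackExchange.Redis setup shown above, assuming the same configuration key; the keepAlive value of 30 seconds is an arbitrary example:
// Sketch; requires: using StackExchange.Redis;
var redisOptions = ConfigurationOptions.Parse(configuration["RedisCache:ConnectionString"]);
redisOptions.KeepAlive = 30;              // send a keep-alive ping every 30 seconds while otherwise idle
redisOptions.AbortOnConnectFail = false;  // reconnect in the background instead of throwing

services.AddStackExchangeRedisCache(options =>
{
    options.ConfigurationOptions = redisOptions;
});
The same effect can be achieved by appending keepAlive=30 to the connection string.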

Can not run Redis commands using StackExchangeRedis

Hello, I am trying to connect to a Redis database from an ASP.NET Core 3.1 application, and I keep getting this error when I issue a command.
> 'No connection is active/available to service this operation: SET a; A
> blocking operation was interrupted by a call to
> WSACancelBlockingCall., mc: 1/1/0, mgr: 10 of 10 available,
> clientName: [ClientName], IOCP: (Busy=2,Free=998,Min=8,Max=1000),
> WORKER:
I think it has something to do with the StackExchange.Redis library, since it worked until it stopped working randomly. I have updated to the latest version, restarted the PC, and nothing changed.
I can connect to my local Redis and issue commands with both redis-cli and telnet 127.0.0.1 6379, which is why I think the culprit is the library.
ConnectionString
localhost:6379,ssl=True,allowAdmin=True,abortConnect=False,defaultDatabase=0
How I use it:
var con = ConnectionMultiplexer.Connect(connectionString); // passes
con.GetDatabase().StringSet("a", "a"); // throws
If you are just using it for localhost development purposes, you can try disabling SSL: localhost:6379,ssl=false,allowAdmin=True,abortConnect=False,defaultDatabase=0
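The same thing can be expressed with ConfigurationOptions instead of the raw string; a minimal sketch, assuming a plain (non-TLS) redis-server on localhost:6379:
// Sketch; requires: using StackExchange.Redis;
var options = ConfigurationOptions.Parse("localhost:6379,allowAdmin=True,abortConnect=False,defaultDatabase=0");
options.Ssl = false;  // a default local redis-server does not speak TLS on 6379

var con = ConnectionMultiplexer.Connect(options);
con.GetDatabase().StringSet("a", "a");  // should now pass against a local instance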

Error while running query on Impala with Superset

I'm trying to connect Impala to Superset. When I test the connection it prints "Seems OK!", and when I browse the Impala databases with the SQL Editor on the left side, it shows all databases without problems.
Preview of Databases/Tables
But when I write a query and click "Run Query", it gives the error: "Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Ticket expired)'"
Error running query
I'm running Superset with SSL in production mode (with Gunicorn), and Impala with SSL in a Kerberized Hadoop cluster. My Impala database config is:
Impala Config
And in the extras I put:
{
  "metadata_params": {},
  "engine_params": {
    "connect_args": {
      "port": 21050,
      "use_ssl": "True",
      "ca_cert": "path/to/my/ca_cert.pem",
      "auth_mechanism": "GSSAPI"
    }
  },
  "metadata_cache_timeout": {},
  "schemas_allowed_for_csv_upload": []
}
How can I solve this error? In my superset log it only shows:
Triggering query_id: 65
INFO:superset.views.core:Triggering query_id: 65
Query 65: Running query on a Celery worker
INFO:superset.views.core:Query 65: Running query on a Celery worker
Versions: Superset 0.36.0, Impyla 0.16.2
I was able to fix this error with these steps:
1 - Created a service user for the celery worker, created a Kerberos ticket for it, and set up a crontab entry to renew the ticket (see the sketch below).
2 - Ran the celery worker as this service user instead of running it as root.
3 - Killed a celery worker that was running on another machine in my cluster.
4 - Restarted Impala and Superset.
I think this error occurred because some queries, instead of using the celery worker on my Superset machine, were using the celery worker on another machine that had no valid Kerberos ticket. I was able to track this down because the celery worker log showed a failed connection to the worker on the other machine while a query was running.
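For step 1, the crontab entry boils down to a periodic kinit from a keytab, run as the celery service user; a hypothetical sketch (the schedule, keytab path, and principal are placeholders, not values from the original setup):
# Renew the Kerberos ticket every 8 hours (run from the service user's crontab).
0 */8 * * * /usr/bin/kinit -kt /etc/security/keytabs/celery.keytab celery@EXAMPLE.COM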

RabbitMQ Ack Timeout

I'm using the RPC pattern for processing my objects with RabbitMQ.
As you might suspect, I have an object, I want its processing to finish, and only after that should the ack be sent to the RPC client.
By default the ack has a timeout of about 3 minutes.
My process takes a long time.
How can I change this ack timeout for each object, or what should I do to handle processes like these?
Modern versions of RabbitMQ have a delivery acknowledgement timeout:
In modern RabbitMQ versions, a timeout is enforced on consumer delivery acknowledgement. This helps detect buggy (stuck) consumers that never acknowledge deliveries. Such consumers can affect node's on disk data compaction and potentially drive nodes out of disk space.
If a consumer does not ack its delivery for more than the timeout value (30 minutes by default), its channel will be closed with a PRECONDITION_FAILED channel exception. The error will be logged by the node that the consumer was connected to.
The error message will be:
Channel error on connection <####> :
operation none caused a channel exception precondition_failed: consumer ack timed out on channel 1
The timeout is 30 minutes (1,800,000 ms) by default [note 1] and is configured by the consumer_timeout parameter in rabbitmq.conf.
[note 1]: The timeout was 15 minutes (900,000 ms) before RabbitMQ 3.8.17.
If you run RabbitMQ in Docker, you can mount a volume with a rabbitmq.conf file, then create this file inside the volume and set consumer_timeout there.
For example:
docker-compose.yml:
version: "2.4"
services:
  rabbitmq:
    image: rabbitmq:3.9.13-management-alpine
    network_mode: host
    container_name: 'your-name'
    ports:
      - 5672:5672
      - 15672:15672   # only needed if you use the management GUI for RabbitMQ
    volumes:
      - /etc/rabbitmq/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
And you need to create the rabbitmq.conf file on your server in /etc/rabbitmq/.
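For example, a minimal rabbitmq.conf that only raises the delivery acknowledgement timeout (the one-hour value below is just an illustration; the unit is milliseconds):
# /etc/rabbitmq/rabbitmq.conf
# consumer_timeout is in milliseconds; 3600000 ms = 1 hour (example value)
consumer_timeout = 3600000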
Documentation with the available parameters: https://github.com/rabbitmq/rabbitmq-server/blob/v3.8.x/deps/rabbit/docs/rabbitmq.conf.example

CumulocityLongPollingTransport - canceling the long poll request because of inactivity

I am using the Cumulocity Java agent (7.38.0), and it apparently lost communication with the server somehow and never recovered. The admin interface says:
LAST COMMUNICATION
November 22, 2016 2:25 AM
and the last Cumulocity record in the device syslog was:
Nov 22 01:25:47 localhost root: 01:25:47.166 [CumulocityLongPollingTransport-scheduler-2] WARN c.c.s.c.n.ConnectionHeartBeatWatcher - canceling the long poll request because of inactivity
(there was a 1-hour time difference due to a device configuration problem)
The process appears to be running anyway:
ps -ef | grep -i c8y
root 1341 1257 0 Nov19 ? 00:00:00 /bin/sh ./c8y-agent.sh
root 1342 1341 0 Nov19 ? 00:00:00 /bin/sh ./c8y-agent.sh
root 1344 1342 0 Nov19 ? 00:25:39 java -cp cfg/*:lib/* -Dlogback.configurationFile=cfg/logback.xml c8y.lx.agent.Agent
Has anyone seen this problem before?
We had it once or twice when people were connecting to Cumulocity via a firewall or VPN. The result was exactly as you described: the polling gets stuck after some time, as if connections were blocked. In other words, I would suspect that it's a proxy that's blocking the reconnect.