Redis Exceptions with ServiceStack - redis

I periodically I get these exceptions:
RedisResponseException
Unexpected reply: +OK, sPort: 60957, LastCommand:
It seems to happen when lots of activity occurs simultaneously. Using even the latest Amazon ElastiCache server, as well as local Mac & Ubuntu flavors.
Other errors occur to but this is the most common. Is there some gotcha with Redis in terms of config settings etc?

Are you using PooledRedisClientManager or BasicRedisClientManager?
I got a lot of Unexpected reply whith BasicRedisClientManager

I had the same error (Unexpected reply) on a multi-threaded Azure Worker Role.
Solution:
The ServiceStack.Redis.RedisClient is not Thread safe.
On multithreaded applications you need to create a new insance of the RedisClient for each thread!
var myRedisClient = new RedisClient(accessKey);
...or use the client manager as fvoncina said. But I didn´t use it so far.

Related

WCF InstancePersistenceCommand Exception

I have a WCF application which consists in some async communications with ecternal services. When we start a new expedient, a new instance is created; it process data and send an xml to a external service and waits for the response. This response requires that a person review the xml and send the response so it usually it is delayed for a long time. For this reason, the workflow go to idle and we use persistence with AppFabric.
The fact is that sometime, when we receive the response, the next exception is raised:
The execution of the InstancePersistenceCommand named {urn:schemas-microsoft-com:System.Activities.Persistence/command}LoadWorkflowByInstanceKey was interrupted by an error.
Normally this error does not occur, it can occur very sporadically. However, we are trying to update the app to include a new functionality (it does not modify the workflow) but when the application is deployed to the server, the instances that were created with the old deployment and were waiting for the response, throw this exception when they receive the response from the external service. However, the instances initiated with the new deployment process the response without problem.
I have been looking for information about this problem but I haven't found much. Anybody can help me?
SOLUTION:
Thanks a lot for your answer, it may be helpful for me in the future. In this case, the problem was that I was updating an assembly version of one of the implicated project (to upload a nuget package) and for a reason that I don’t understand, the instances created with an old version raised this exception when the service with the new version had to manipulate the mentioned instances.
If I change the assembly version to upload the nuget and then set the original version and deploy with this version, everything works ok. Anybody knows what is the reason?
Thanks a lot.
This may be because there is a program running in the background and trying to extend the lock on the instance store every 30 seconds, and it seems that whenever the connection to the SQL service fails, it marks the instance store as invalid.
You can try <workflowIdle timeToUnload="0"/>, if it doesn't work you can look at the methods provided by other links.
Windows workflow 4.0 InstancePersistenceCommand Error
Why do I get exception "The execution of the InstancePersistenceCommand named LoadWorkflowByInstanceKey was interrupted by an error"
WF4 InstancePersistenceCommand interrupted

Google cloud run redis client loses connection to the instance

I use Google Cloud Run with Google Memorystore Redis.
My application is written on Node.js (Express) and connected to the Redis Instance in this way:
const asyncRedis = require("async-redis");
const redisClient = asyncRedis.createClient(process.env.REDISPORT, String(process.env.REDISHOST));
redisClient.on('error', (err) => console.error('ERR:REDIS:', err));
.....
const value = await redisClient.get("Code");
Every half an hour the application loses its connection to the Redis and I receive the error:
AbortError: Redis connection lost and command aborted. It might have been processed.
at RedisClient.flush_and_error (/usr/src/app/node_modules/redis/index.js:362)
at RedisClient.connection_gone (/usr/src/app/node_modules/redis/index.js:664)
at RedisClient.on_error (/usr/src/app/node_modules/redis/index.js:410)
at Socket.<anonymous> (/usr/src/app/node_modules/redis/index.js:279)
at Socket.emit (events.js:315)
at emitErrorNT (internal/streams/destroy.js:92)
at emitErrorAndCloseNT (internal/streams/destroy.js:60)
at processTicksAndRejections (internal/process/task_queues.js:84)
In two or three minutes the connection returns and the application works correctly during half an hour until next disconnect.
Any idea?
According to the official GCP documentation there can be several scenarios that could cause the connectivity issues with your Redis instance. However, as the connectivity issues you are experiencing are intermittent and normally you connect to the instance successfully, none of the mentioned scenarios seems to be applicable to your situation.
There is a similar issue opened on GitHub, where some users where experiencing it when reaching the maximum number of clients for their Redis instance, however, I would recommend you to contact the GCP technical support, so that they could review your particular configuration and inspect your instance with the internal tools.

Why is "await Publish<T>" hanging / not completing / not finishing

The following piece of code has been working for some time and it has suddenly stopped returning:
await availableChangedPublishEndpoint
.Publish<IAvailableStockChanged>(
AvailableStockCounter.ConvertSkuQtyToAvailableStockChangedEvent(
newAvailable,
absMessage.Warehouse)
);
There is nothing clever in ConvertSkuQtyToAvailableStockChangedEvent - it just maps one simple class to another.
We added logs before and after this code and it's definitely just stopping at this point. Other systems are publishing fine, other messages are being sent from this application (for e.g. logs are actually sent via RabbitMQ). We have redeployed and we have upgraded to latest MassTransit version. We are seeing that the messages are being published - possibly multiple times, but this Publish method never returns.
We had a broken RabbitMQ node and a clean service restart on one node fixed it. I appreciate there might be other reasons for this behaviour, but this was our problem.
systemctl restart rabbitmq-server
Looking further into RabbitMQ we saw that some of the empty queues that were connected to this exchange were not synchronized (see below) and when we tried to synchronize them that wouldn't work.
We also couldn't delete some of these unsynchronized queues.
We believe an unexpected shutdown of one of the nodes had caused this problem - but it left most queues / exchanges completely OK.

Topshelf Windows Service times out Error 7000 7009

I have a windows service programmed in vb.NET, using Topshelf as Service Host.
Once in a while the service doesn't start. On the event log, the SCM writes errors 7000 and 7009 (service did not respond in a timely fashion). I know this is a common issue, but I (think) I have tried everything with no result.
The service only relies in WMI, and has no time-consuming operations.
I read this question (Error 1053: the service did not respond to the start or control request in a timely fashion), but none of the answers worked for me.
I Tried:
Set topshelf's start timeout.
Request additional time in the first line of "OnStart" method.
Set a periodic timer wich request additional time to the SCM.
Remove TopShelf and make the service with the Visual Studio Service Template.
Move the initialization code and "OnStart" code to a new thread to return inmediately.
Build in RELEASE mode.
Set GeneratePublisherEvidence = false in the app.config file (per application).
Unchecked "Check for publisher’s certificate revocation" in the internet settings (per machine).
Deleted all Alternate Streams (in case some dll was marked as web and blocked).
Removed any "Debug code"
Increased Window's general service timeout to 120000ms.
Also:
The service doesn't try to communicate with the user's desktop in any way.
The UAC is disabled.
The Service Runs on LOCAL SYSTEM ACCOUNT.
I believe that the code of the service itself is not the problem because:
It has been on production for over two years.
Usually the service starts fine.
There is no exception logged in the Event Log.
The "On Error" options for the service dosn't get called (since the service doesn't actually fails, just doesn't respond to the SCM)
I've commented out almost everything on it, pursuing this error! ;-)
Any help is welcome since i'm completely out of ideas, and i've been strugling with this for over 15 days...
For me the 7009 error was produced by my NET core app because I was using this construct:
var builder = new ConfigurationBuilder()
.SetBasePath(Directory.GetCurrentDirectory())
.AddJsonFile("appsettings.json");
and appsettings.json file obviously couldn't be found in C:\WINDOWS\system32.. anyway, changing it to Path.Combine(AppContext.BaseDirectory, "appsettings.json") solved the issue.
More general help - for Topshelf you can add custom exception handling where I finally found some meaningfull error info, unlike event viewer:
HostFactory.Run(x => {
...
x.OnException(e =>
{
using (var fs = new StreamWriter(#"C:\log.txt"))
{
fs.WriteLine(e.ToString());
}
});
});
I've hit the 7000 and 7009 issue, which fails straight away (even though the error message says A timeout was reached (30000 milliseconds)) because of misconfiguration between TopShelf and what the service gets installed as.
The bottom line - what you pass in HostConfigurator.SetServiceName(name) needs to match exactly the SERVICE_NAME of the Windows service which gets installed.
If they don't match it'll fail straight away and you get the two event log messages.
I had this start happening to a service after Windows Creator's Edition update installed. Basically it made the whole computer slower, which is what I think triggered the problem. Even one of the Windows services had a timeout issue.
What I learned online is that the constructor for the service needs to be fast, but OnStart has more leeway with the SCM. My service had a C# wrapper and it included an InitializeComponent() that was called in the constructor. I moved that call to OnStart and the problem went away.

ServiceStack.Redis: Unable to Connect: sPort: 50071

I'm using the ServiceStack Redis Client and I was hoping that I could get a clarification on what might cause the following error ... "Unable to Connect: sPort: 50071"? I'm using the "PooledRedisClientManager" object for connections. Thanks for any assistance.
IF YOU ARE USING A SELF HOSTED REDIS SERVER AND USING THE Service Stack Redis Client THEN BUYER BEWARE
As of 9/23/2015
Service Stack does license validation in the client code (rather than the server). If you are ripping through a lot of messages 6000+ an hour you will get. The resulting error is
Unable to Connect: sPort:
However, it is not handling their custom LicenseExpection and exposing the error correctly. The error would be something like this:
The free-quota limit on '6000 Redis requests per hour' has been reached. Please see https://servicestack.net to upgrade to a commercial license or visit https://github.com/ServiceStackV3/ServiceStackV3 to revert back to the free ServiceStack v3.
I doubt you have imposed such a limit on your server :-)
This could be a time out issue, try increasing it:
pooledRedisClientManager.ConnectTimeout = 1000
You need to check that you are not creating a new PooledRedisClientManager for each request / usage. You will quickly run out of ports. Use a singleton approach in a web environment.