We are using the StackExchange.Redis ConnectionMultiplexer class as follows:
private void InitializeConnection()
{
    _logger.Info("Initializing a connection to the Redis cluster.");
    bool isReconnectionAttempt = false;
    if (_connectionMultiplexer != null)
    {
        Debug.WriteLine("disposing " + _connectionMultiplexer.GetHashCode());
        _connectionMultiplexer.ConnectionFailed -= HandleConnectionFailedEvent;
        _connectionMultiplexer.Close();
        isReconnectionAttempt = true;
        _logger.Info("This is a reconnection attempt to the Redis cluster.");
    }
    _connectionMultiplexer = ConnectionMultiplexer.Connect(_connectionString);
    _needConnect = !_connectionMultiplexer.IsConnected;
    _connectionMultiplexer.ConnectionFailed += HandleConnectionFailedEvent;
}
When I simulate a network issue, the ConnectionFailed event fires as expected. When this happens, we attempt to dispose of the old object and create a new one. However, even after the previous _connectionMultiplexer instance has been closed/disposed, we still get several ConnectionFailed events fired by that previous instance.
The documentation, however, indicates that we should only get a single ConnectionFailed event when the network goes down, and that the next such event fires only when the network goes down again. That is not what I am experiencing. Ideas?
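For context, the fuller teardown being attempted looks roughly like this (a sketch only; whether Dispose rather than Close changes anything here is an open assumption, not a confirmed fix):
var oldMultiplexer = _connectionMultiplexer;
if (oldMultiplexer != null)
{
    // detach every handler before shutting down, so the stale instance
    // has nothing left to call back into
    oldMultiplexer.ConnectionFailed -= HandleConnectionFailedEvent;
    oldMultiplexer.Dispose(); // trying Dispose instead of Close; both should shut the instance down
}
_connectionMultiplexer = ConnectionMultiplexer.Connect(_connectionString);
_connectionMultiplexer.ConnectionFailed += HandleConnectionFailedEvent;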
I have an API in ASP.NET Core 3.1 which uses Azure.Messaging.EventHubs and Azure.Messaging.EventHubs.Processor, where I get events from a consumer group and then send them to a SignalR hub. The processor runs only while there are users connected to the hub and stops when the last one disconnects, updating its checkpoint in Blob Storage.
The current processing logic for each event is: if the time difference in minutes between DateTime.UtcNow and the event timestamp is less than 2, the event is sent to the SignalR hub, and nothing more.
The problem is as follows: sometimes there is a long period during which the EventProcessorClient is stopped and many events are retained in the Event Hub, so the processor has a really long wait while it slowly catches up to the most recent events before the SignalR hub starts receiving them again. The time the processor needs to catch up is far too long, especially considering it receives hundreds of events per minute.
Is there a way to, for example, manually move the checkpoint before starting the processor? Or to get only the events of the last X minutes? Or maybe another idea/solution?
PS: I don't care about events older than 2 to 5 minutes for this consumer group.
PS2: The retention time configured in the Event Hub is 1 day.
The code:
/* properties and stuff */
// Constructor
public BusEventHub(ILogger<BusEventHub> logger, IConfiguration configuration, IHubContext<BusHub> hubContext) {
    _logger = logger;
    Configuration = configuration;
    _busExcessHub = hubContext;

    /* Connection strings and stuff */

    // Create a blob container client that the event processor will use
    storageClient = new BlobContainerClient(this.blobStorageConnectionString, this.blobContainerName);

    // Create an event processor client to process events in the event hub
    processor = new EventProcessorClient(storageClient, consumerGroup, this.ehubNamespaceConnectionString, this.eventHubName);

    // Register handlers for processing events and handling errors
    processor.ProcessEventAsync += ProcessEventHandler;
    processor.ProcessErrorAsync += ProcessErrorHandler;
}
public async Task Start() {
    _logger.LogInformation($"Starting event processing for EventHub {eventHubName}");
    await processor.StartProcessingAsync();
}

public async Task Stop() {
    if (BusHubUserHandler.ConnectedIds.Count < 2) {
        _logger.LogInformation($"Stopping event processing for EventHub {eventHubName}");
        await processor.StopProcessingAsync();
    } else {
        _logger.LogDebug("There are still other users connected");
    }
}
private async Task ProcessEventHandler(ProcessEventArgs eventArgs) {
    try {
        string receivedEvent = Encoding.UTF8.GetString(eventArgs.Data.Body.ToArray());
        _logger.LogDebug($"Received event: {receivedEvent}");

        BusExcessMinified busExcess = BusExcessMinified.FromJson(receivedEvent);
        double timeDiff = (DateTime.UtcNow - busExcess.Timestamp).TotalMinutes;

        if (timeDiff < 2) {
            string responseEvent = busExcess.ToJson();
            _logger.LogDebug($"Sending message to BusExcess Hub: {responseEvent}");
            await _busExcessHub.Clients.All.SendAsync("UpdateBuses", responseEvent);
        }

        _logger.LogDebug("Update checkpoint in the blob storage"); // so that the service receives only new events the next time it runs
        await eventArgs.UpdateCheckpointAsync(eventArgs.CancellationToken);
    } catch (TaskCanceledException) {
        _logger.LogInformation("The EventHub event processing was stopped");
    } catch (Exception e) {
        _logger.LogError($"Exception: {e}");
    }
}
/* ProcessErrorHandler */
It is possible to request an initial position for partitions as they're initialized, which allows you to specify an enqueue time as your starting point. This sample illustrates the details. The caveat is that the initial position is only used when there is no checkpoint for a partition; checkpoints always take precedence.
From the scenario that you're describing, it sounds as if checkpoints aren't useful to you and are getting in the way of your preferred usage pattern. If there aren't other mitigating factors, I'd recommend never checkpointing and instead overriding the default starting position to dynamically reset to the time that you're interested in.
If you do, for some reason, need to checkpoint in addition to this, then your best bet will be deleting the checkpoint data, since checkpoints are based on the offset and won't recognize an enqueue time for positioning.
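As a minimal sketch of that approach (assuming the current Azure.Messaging.EventHubs.Processor API; the 2-minute window is taken from the question), the starting position can be set per partition as it initializes:
processor.PartitionInitializingAsync += args =>
{
    // Used only when no checkpoint exists for this partition;
    // checkpoints always take precedence.
    args.DefaultStartingPosition = EventPosition.FromEnqueuedTime(
        DateTimeOffset.UtcNow.AddMinutes(-2));

    return Task.CompletedTask;
};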
I'm attempting to implement a distributed lock manager (DLM) using the locking mechanisms provided by the ServiceStack-Redis library and described here, but I'm finding that the API seems to present a race condition which will sometimes grant the same lock to multiple clients.
BasicRedisClientManager mgr = new BasicRedisClientManager(redisConnStr);

using (var client = mgr.GetClient())
{
    client.Remove("touchcount");
    client.Increment("touchcount", 0);
}

Random rng = new Random();

Action<object> simulatedDistributedClientCode = (clientId) =>
{
    using (var redisClient = mgr.GetClient())
    {
        using (var mylock = redisClient.AcquireLock("mutex", TimeSpan.FromSeconds(2)))
        {
            long touches = redisClient.Get<long>("touchcount");
            Debug.WriteLine("client{0}: I acquired the lock! (touched: {1}x)", clientId, touches);
            if (touches > 0)
            {
                Debug.WriteLine("client{0}: Oh, but I see you've already been here. I'll release it.", clientId);
                return;
            }
            int arbitraryDurationOfExecutingCode = rng.Next(100, 2500);
            Thread.Sleep(arbitraryDurationOfExecutingCode); // do some work of arbitrary duration
            redisClient.Increment("touchcount", 1);
        }
        Debug.WriteLine("client{0}: Okay, I released my lock, your turn now.", clientId);
    }
};

Action<Task> exceptionWriter = (t) => { if (t.IsFaulted) Debug.WriteLine(t.Exception.InnerExceptions.First()); };

int arbitraryDelayBetweenClients = rng.Next(5, 500);
var clientWorker1 = new Task(simulatedDistributedClientCode, 1);
var clientWorker2 = new Task(simulatedDistributedClientCode, 2);

clientWorker1.Start();
Thread.Sleep(arbitraryDelayBetweenClients);
clientWorker2.Start();

Task.WaitAll(
    clientWorker1.ContinueWith(exceptionWriter),
    clientWorker2.ContinueWith(exceptionWriter)
);

using (var client = mgr.GetClient())
{
    var finaltouch = client.Get<long>("touchcount");
    Console.WriteLine("Touched a total of {0}x.", finaltouch);
}

mgr.Dispose();
When running the above code to simulate two clients attempting the same operation in short succession, there are three possible outputs. The first is the optimal case, where the mutex works properly and the clients proceed in the proper order. The second is when the second client times out waiting to acquire a lock; also an acceptable outcome. The problem, however, is that as arbitraryDurationOfExecutingCode approaches or exceeds the timeout for acquiring the lock, it is quite easy to reproduce a situation where the second client is granted the lock BEFORE the first client releases it, producing output like this:
client1: I acquired the lock! (touched: 0x)
client2: I acquired the lock! (touched: 0x)
client1: Okay, I released my lock, your turn now.
client2: Okay, I released my lock, your turn now.
Touched a total of 2x.
My understanding of the API and its documentation is that the timeOut argument when acquiring a lock is meant to be just that -- the timeout for getting the lock. If I have to guess at a timeOut value high enough to always exceed the duration of my executing code just to prevent this condition, that seems pretty error prone. Does anyone have a workaround other than passing null to wait on locks forever? I definitely don't want to do that, or I know I'll end up with ghost locks from crashed workers.
The answer from mythz (thanks for the prompt response!) confirms that the built-in AcquireLock method in ServiceStack.Redis doesn't draw a distinction between the lock acquisition period and the lock expiration period. For our purposes, we have existing code that expected the distributed locking mechanism to fail quickly if the lock was taken, but to allow long-running processes within the lock scope. To accommodate these requirements, I derived this variation on the ServiceStack RedisLock that distinguishes between the two periods.
// based on ServiceStack.Redis.RedisLock
// https://github.com/ServiceStack/ServiceStack.Redis/blob/master/src/ServiceStack.Redis/RedisLock.cs
internal class RedisDlmLock : IDisposable
{
    public static readonly TimeSpan DefaultLockAcquisitionTimeout = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan DefaultLockMaxAge = TimeSpan.FromHours(2);
    public const string LockPrefix = ""; // namespace lock keys if desired

    private readonly IRedisClient _client; // note that the held reference to client means lock scope should always be within client scope
    private readonly string _lockKey;
    private string _lockValue;

    /// <summary>
    /// Acquires a distributed lock on the specified key.
    /// </summary>
    /// <param name="redisClient">The client to use to acquire the lock.</param>
    /// <param name="key">The key to acquire the lock on.</param>
    /// <param name="acquisitionTimeOut">The amount of time to wait while trying to acquire the lock. Defaults to <see cref="DefaultLockAcquisitionTimeout"/>.</param>
    /// <param name="lockMaxAge">After this amount of time expires, the lock will be invalidated and other clients will be allowed to establish a new lock on the same key. Defaults to <see cref="DefaultLockMaxAge"/>.</param>
    public RedisDlmLock(IRedisClient redisClient, string key, TimeSpan? acquisitionTimeOut = null, TimeSpan? lockMaxAge = null)
    {
        _client = redisClient;
        _lockKey = LockPrefix + key;

        ExecExtensions.RetryUntilTrue(
            () =>
            {
                //Modified from ServiceStack.Redis.RedisLock
                //This pattern is taken from the redis command for SETNX http://redis.io/commands/setnx

                //Calculate a unix time for when the lock should expire
                lockMaxAge = lockMaxAge ?? DefaultLockMaxAge; // hold the lock for the default amount of time if not specified.
                DateTime expireTime = DateTime.UtcNow.Add(lockMaxAge.Value);
                _lockValue = (expireTime.ToUnixTimeMs() + 1).ToString(CultureInfo.InvariantCulture);

                //Try to set the lock, if it does not exist this will succeed and the lock is obtained
                var nx = redisClient.SetEntryIfNotExists(_lockKey, _lockValue);
                if (nx)
                    return true;

                //If we've gotten here then a key for the lock is present. This could be because the lock is
                //correctly acquired or it could be because a client that had acquired the lock crashed (or didn't release it properly).
                //Therefore we need to get the value of the lock to see when it should expire
                string existingLockValue = redisClient.Get<string>(_lockKey);
                long lockExpireTime;
                if (!long.TryParse(existingLockValue, out lockExpireTime))
                    return false;

                //If the expire time is greater than the current time then we can't let the lock go yet
                if (lockExpireTime > DateTime.UtcNow.ToUnixTimeMs())
                    return false;

                //If the expire time is less than the current time then it wasn't released properly and we can attempt to
                //acquire the lock. This is done by setting the lock to our timeout string AND checking to make sure
                //that what is returned is the old timeout string in order to account for a possible race condition.
                return redisClient.GetAndSetEntry(_lockKey, _lockValue) == existingLockValue;
            },
            acquisitionTimeOut ?? DefaultLockAcquisitionTimeout // loop attempting to get the lock for this amount of time.
        );
    }

    public override string ToString()
    {
        return String.Format("RedisDlmLock:{0}:{1}", _lockKey, _lockValue);
    }

    public void Dispose()
    {
        try
        {
            // only remove the entry if it still contains OUR value
            _client.Watch(_lockKey);
            var currentValue = _client.Get<string>(_lockKey);
            if (currentValue != _lockValue)
            {
                _client.UnWatch();
                return;
            }

            using (var tx = _client.CreateTransaction())
            {
                tx.QueueCommand(r => r.Remove(_lockKey));
                tx.Commit();
            }
        }
        catch (Exception)
        {
            // log but don't throw
        }
    }
}
To simplify use as much as possible, I also expose some extension methods for IRedisClient to parallel the AcquireLock method, along these lines:
internal static class RedisClientLockExtensions
{
    public static IDisposable AcquireDlmLock(this IRedisClient client, string key, TimeSpan timeOut, TimeSpan maxAge)
    {
        return new RedisDlmLock(client, key, timeOut, maxAge);
    }
}
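For example, mirroring the scenario above (fail fast if the lock is taken, but allow long-running work inside the lock scope), usage might look like this:
using (redisClient.AcquireDlmLock("mutex", TimeSpan.FromSeconds(2), TimeSpan.FromHours(2)))
{
    // long-running work; other clients give up after 2 seconds,
    // but the lock itself stays valid for up to 2 hours
}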
Your question highlights the behavior of distributed locking in ServiceStack.Redis: if the specified timeout is exceeded, the timed-out client treats it as an invalid lock and will attempt to auto-recover it. If there were no auto-recovery behavior, a crashed client would never release the lock, and no further operations waiting on that lock would be allowed through.
The locking behavior for AcquireLock is encapsulated in the RedisLock class:
public IDisposable AcquireLock(string key, TimeSpan timeOut)
{
    return new RedisLock(this, key, timeOut);
}
You can take a copy of it and modify it to suit the behavior you'd prefer:
using (new MyRedisLock(client, key, timeout))
{
    //...
}
If I'm connected to RabbitMQ and listening for events using an EventingBasicConsumer, how can I tell if I've been disconnected from the server?
I know there is a Shutdown event, but it doesn't fire if I unplug my network cable to simulate a failure.
I've also tried the ModelShutdown event and CallbackException on the model, but neither seems to work.
EDIT:
The one I marked as the answer is correct, but it was only part of the solution for me. There is also HeartBeat functionality built into RabbitMQ. The server specifies it in the configuration file. It defaults to 10 minutes but of course you can change that.
The client can also request a different interval for the heartbeat by setting the RequestedHeartbeat value on the ConnectionFactory instance.
I'm guessing that you're using the C# library? (but even so I think the others have a similar event).
You can do the following:
public class MyRabbitConsumer
{
    private IConnection connection;

    public void Connect()
    {
        connection = CreateAndOpenConnection();
        connection.ConnectionShutdown += connection_ConnectionShutdown;
    }

    public IConnection CreateAndOpenConnection() { ... }

    private void connection_ConnectionShutdown(IConnection connection, ShutdownEventArgs reason)
    {
    }
}
This is an example of it, but the marked answer is what led me to this.
var factory = new ConnectionFactory
{
    HostName = "MY_HOST_NAME",
    UserName = "USERNAME",
    Password = "PASSWORD",
    RequestedHeartbeat = 30
};

using (var connection = factory.CreateConnection())
{
    connection.ConnectionShutdown += (o, e) =>
    {
        // handle disconnect
    };

    using (var model = connection.CreateModel())
    {
        model.ExchangeDeclare(EXCHANGE_NAME, "topic");
        var queueName = model.QueueDeclare();
        model.QueueBind(queueName, EXCHANGE_NAME, "#");

        var consumer = new QueueingBasicConsumer(model);
        model.BasicConsume(queueName, true, consumer);

        while (!stop)
        {
            BasicDeliverEventArgs args;
            consumer.Queue.Dequeue(5000, out args);
            if (stop) return;
            if (args == null) continue;
            if (args.Body.Length == 0) continue;

            Task.Factory.StartNew(() =>
            {
                // do work here on a different thread than this one
            }, TaskCreationOptions.PreferFairness);
        }
    }
}
A few things to note about this.
I'm using # for the topic. This grabs everything. Usually you want to limit by a topic.
I'm setting a variable called "stop" to determine when the process should end. You'll notice the loop runs forever until that variable is true.
The Dequeue waits 5 seconds then leaves without getting data if there is no new message. This is to ensure we listen for that stop variable and actually quit at some point. Change the value to your liking.
When a message comes in, I spawn the handling code on a new thread. The current thread is reserved for just listening to the RabbitMQ messages, and if a handler takes too long to process, I don't want it slowing down the other messages. You may or may not need this depending on your implementation. Be careful, however, writing the code that handles the messages. If it takes a minute to run and you're getting messages at sub-second rates, you will run out of memory or at least run into severe performance issues.
I have a handler similar to the following, which essentially responds to a command and sends a whole bunch of commands to a different queue.
public void Handle(ISomeCommand message)
{
    int i = 0;
    while (i < 10000)
    {
        var command = Bus.CreateInstance<IAnotherCommand>();
        command.Id = i;
        Bus.Send("target.queue#d1555", command);
        i++;
    }
}
The issue with this block is that until the loop has fully completed, none of the messages appear in the target queue or in the outgoing queue. Can someone help me understand this behavior?
Also, if I send messages from worker threads within the handler, as below, the messages appear immediately. So, two questions on this:
What's the explanation for thread- or Task-based sends going through immediately?
Are there any ramifications of using Tasks within message handlers?
public void Handle(ISomeCommand message)
{
    int i = 0;
    while (i < 10000)
    {
        int id = i; // capture a copy so the work item doesn't share the loop variable
        System.Threading.ThreadPool.QueueUserWorkItem((args) =>
        {
            var command = Bus.CreateInstance<IAnotherCommand>();
            command.Id = id;
            Bus.Send("target.queue#d1555", command);
        });
        i++;
    }
}
Your time is much appreciated!
First question: picking a message from a queue, running all the registered message handlers for it, AND any other transactional action (like writing new messages or writes against a database) are performed in ONE transaction. Either it all completes or none of it does. So what you are seeing is the expected behavior: picking the message from the queue, handling ISomeCommand, and writing 10000 new IAnotherCommands is either done completely or not at all. To avoid this behavior you can do one of the following:
Configure your NServiceBus endpoint to not be transactional
public class EndpointConfig : IConfigureThisEndpoint, AsA_Publisher, IWantCustomInitialization
{
    public void Init()
    {
        Configure.With()
            .DefaultBuilder()
            .XmlSerializer()
            .MsmqTransport()
            .IsTransactional(false)
            .UnicastBus();
    }
}
Wrap the sending of IAnotherCommand in a transaction scope that suppresses the ambient transaction.
public void Handle(ISomeCommand message)
{
    using (new TransactionScope(TransactionScopeOption.Suppress))
    {
        int i = 0;
        while (i < 10000)
        {
            var command = Bus.CreateInstance<IAnotherCommand>();
            command.Id = i;
            Bus.Send("target.queue#d1555", command);
            i++;
        }
    }
}
Issue the Bus.Send on another thread, by either starting a new thread yourself, using System.Threading.ThreadPool.QueueUserWorkItem, or using the Task classes (a small sketch follows). This works because an ambient transaction is not automatically carried over to a new thread.
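For illustration, a minimal sketch of the Task-based variant (reusing the Bus and IAnotherCommand from the question):
public void Handle(ISomeCommand message)
{
    // Runs on a thread-pool thread, so the handler's ambient
    // transaction does not flow to the sends.
    Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < 10000; i++)
        {
            var command = Bus.CreateInstance<IAnotherCommand>();
            command.Id = i;
            Bus.Send("target.queue#d1555", command);
        }
    });
}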
Second question: the ramification of using Tasks, or any of the other methods I mentioned, is that you have no transactional guarantee for the whole thing.
How do you handle the case where you have generated 5000 IAnotherCommands and the power suddenly goes out?
If you use 2) or 3), the original ISomeCommand will not complete and will be retried automatically by NServiceBus when you start up the endpoint again. End result: 5000 + 10000 IAnotherCommands.
If you use 1), you will lose the original ISomeCommand completely and end up with only 5000 IAnotherCommands.
Using the recommended transactional way, the initial 5000 IAnotherCommands would be discarded, the original ISomeCommand comes back on the queue, and it is retried when the endpoint starts up again. Net result: 10000 IAnotherCommands.
If memory serves, NServiceBus wraps the calls to the message handlers in a TransactionScope if the transactional option is used, and TransactionScope needs some help to be cross-thread friendly:
TransactionScope and multi-threading
If you are trying to reduce overhead, you can also bundle your messages. The signature for the send is Bus.Send(IMessage[] messages). If you can guarantee that you don't blow the size limit for MSMQ, you could Send() all the messages at once (a sketch follows). If the size limit is an issue, you can chunk them up or use the DataBus.
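A minimal chunking sketch inside the handler, assuming the params-array Send overload, that IAnotherCommand implements IMessage, and an illustrative batch size of 1000:
const int batchSize = 1000;
var batch = new List<IMessage>(batchSize);

for (int i = 0; i < 10000; i++)
{
    var command = Bus.CreateInstance<IAnotherCommand>();
    command.Id = i;
    batch.Add(command);

    if (batch.Count == batchSize)
    {
        Bus.Send("target.queue#d1555", batch.ToArray()); // one transport message per batch
        batch.Clear();
    }
}

if (batch.Count > 0)
    Bus.Send("target.queue#d1555", batch.ToArray());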
We have more than a dozen WCF services being called over TCP binding. There are a lot of calls to the same WCF service at various places in the code.
AdminServiceClient client = FactoryS.AdminServiceClient(); // this takes significant time
client.GetSomeThing(param1);
client.Close();
I want to cache the client or produce it from a singleton, so that I can save some time. Is this possible?
Thanks
Yes, this is possible. You can make the proxy object visible to the entire application, or wrap it in a singleton class for neatness (my preferred option). However, if you are going to reuse a proxy for a service, you will have to handle channel faults.
First create your singleton class / cache / global variable that holds an instance of the proxy (or proxies) that you want to reuse.
When you create the proxy, you need to subscribe to the Faulted event on the inner channel:
proxyInstance.InnerChannel.Faulted += new EventHandler(ProxyFaulted);
and then put some reconnect code inside the ProxyFaulted event handler. The Faulted event will fire if the service drops, or the connection times out because it was idle. The Faulted event will only fire if you have reliableSession enabled on your binding in the config file (note that on netTcpBinding a reliable session is not enabled unless you specify it).
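For reference, a sketch of enabling the reliable session in code rather than config (the timeout value is illustrative):
var binding = new NetTcpBinding();
binding.ReliableSession.Enabled = true; // required for Faulted to fire on idle/drop
binding.ReliableSession.InactivityTimeout = TimeSpan.FromMinutes(10); // how long an idle session survives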
Edit: If you don't want to keep your proxy channel open all the time, you will have to test the state of the channel before every time you use it, and recreate the proxy if it is faulted. Once the channel has faulted there is no option but to create a new one.
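That state test might look roughly like this (a sketch, using the CupidClientServiceClient type from the sample below):
// recreate the proxy if the cached channel has faulted
if (proxyInstance == null || proxyInstance.State == CommunicationState.Faulted)
{
    proxyInstance = new CupidClientServiceClient();
    proxyInstance.Open();
}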
Edit2: The only real difference in load between keeping the channel open and closing it every time is a keep-alive packet being sent to the service and acknowledged every so often (which is what is behind your channel fault event). With 100 users I don't think this will be a problem.
The other option is to put your proxy creation inside a using block where it will be closed / disposed at the end of the block (which is considered bad practice). Closing the channel after a call may result in your application hanging because the service is not yet finished processing. In fact, even if your call to the service was async or the service contract for the method was one-way, the channel close code will block until the service is finished.
Here is a simple singleton class that should have the bare bones of what you need:
public static class SingletonProxy
{
    private static readonly object SyncRoot = new object();
    private static CupidClientServiceClient proxyInstance = null;

    public static CupidClientServiceClient ProxyInstance
    {
        get
        {
            if (proxyInstance == null)
            {
                AttemptToConnect();
            }
            return proxyInstance;
        }
    }

    private static void ProxyChannelFaulted(object sender, EventArgs e)
    {
        bool connected = false;
        while (!connected)
        {
            // you may want to put timer code around this, or
            // other code to limit the number of retries if
            // the connection keeps failing
            connected = AttemptToConnect();
        }
    }

    public static bool AttemptToConnect()
    {
        // this whole process needs to be thread safe; lock on a dedicated
        // object, because locking on proxyInstance would throw while it is null
        lock (SyncRoot)
        {
            try
            {
                if (proxyInstance != null)
                {
                    // deregister the event handler from the old instance
                    proxyInstance.InnerChannel.Faulted -= ProxyChannelFaulted;
                }

                // (re)create the instance
                proxyInstance = new CupidClientServiceClient();
                // always open the connection
                proxyInstance.Open();

                // add the event handler for the new instance;
                // attach it after the Open call so a failed open doesn't
                // immediately start raising Faulted events
                proxyInstance.InnerChannel.Faulted += ProxyChannelFaulted;
                return true;
            }
            catch (EndpointNotFoundException)
            {
                // do something here (log, show user message etc.)
                return false;
            }
            catch (TimeoutException)
            {
                // do something here (log, show user message etc.)
                return false;
            }
        }
    }
}
I hope that helps :)
In my experience, creating/closing the channel on a per call basis adds very little overhead. Take a look at this Stackoverflow question. It's not a Singleton question per se, but related to your issue. Typically you don't want to leave the channel open once you're finished with it.
I would encourage you to use a reusable ChannelFactory implementation if you're not already, and see if you still have performance problems (a sketch follows).
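As a rough sketch of that approach (the IAdminService contract and the endpoint configuration name here are assumptions for illustration): cache the ChannelFactory, which is the expensive part to build, and create a cheap channel per call.
public static class AdminServiceChannel
{
    // building the factory is costly, so do it once and reuse it
    private static readonly ChannelFactory<IAdminService> Factory =
        new ChannelFactory<IAdminService>("NetTcpBinding_IAdminService");

    public static TResult Call<TResult>(Func<IAdminService, TResult> operation)
    {
        IAdminService channel = Factory.CreateChannel();
        var clientChannel = (IClientChannel)channel;
        try
        {
            TResult result = operation(channel);
            clientChannel.Close();
            return result;
        }
        catch
        {
            clientChannel.Abort(); // never Close a faulted channel
            throw;
        }
    }
}
Usage would then look something like: var thing = AdminServiceChannel.Call(svc => svc.GetSomeThing(param1));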