Redis connection settings for app "surviving" redis connectivity issues - redis

I'm using Azure Redis Cache for certain performance monitoring services. Basically, when events like page loads etc. occur, I send a fire-and-forget command to Redis to record the event. My goal is for my app to function fine whether or not it can contact the Redis server. I'm looking for a best practice for this scenario. I would be OK with losing some events if necessary. I've been finding that even though I'm using fire and forget, the app stalls when the web server runs into high latency or connectivity issues with the Redis server.
I'm using StackExchange.Redis. Any best practice configuration options/programming practices for this scenario?

The way I was implementing a singleton pattern on the connection turned out to be blocking requests. Once I fixed this my app behaves as I want (e.g. it still functions when redis connection dies).
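The goal described above (never block a request, accept losing some events) can be sketched in a language-neutral way. The following is a minimal Python illustration, not StackExchange.Redis code: `EventRecorder` and the `send` callable are hypothetical names, and `send` stands in for whatever actually writes one event to Redis.

```python
import queue
import threading


class EventRecorder:
    """Buffers events in a bounded queue; a background thread ships them."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # hypothetical callable that writes one event to Redis
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0   # approximate count of events that were lost
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def record(self, event):
        """Never blocks the caller; drops the event if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                self.dropped += 1  # losing an event is acceptable here
            finally:
                self._queue.task_done()

    def flush(self):
        """Wait until every queued event has been attempted (useful in tests)."""
        self._queue.join()
```

The request thread only ever touches the in-memory queue, so a slow or dead Redis connection costs dropped events rather than stalled page loads.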

Related

Monitoring Yarn/Cloudera application logs in production

I am NOT talking about Cloudera or Yarn system level logs. I am talking about applications running on Cloudera/Yarn infrastructure.
We have tens of Java and Python applications running on our Cloudera infrastructure, and all of them generate application logs. I am looking for the best way to monitor these logs for errors and warnings. For a pure standalone Java application, we would traditionally use one of the log scraper tools that send emails based on expression matching (to detect an error, a warning, or any other special situation). I am looking for something similar that can monitor our application logs and email us in real time, for better production application support.
If thinking about this like a traditional application log monitoring is not the right way, then I am happy to know if there are any better industry standard approaches. Thanks!
I guess the Elastic Stack (https://www.elastic.co/de/) could be one approach to solve this. You could use Filebeat to send your application logs to Logstash, which forwards them to Elasticsearch. You could then create a Watcher in Kibana which sends, e.g., emails based on some triggering condition (we use a webhook to send notifications into an MS Teams channel).
This solution should work at least in near-realtime (~1-2 minutes delay, but this also depends on your watcher configuration).
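The expression-matching step that a traditional log scraper performs, and that a Watcher condition plays the role of in the setup above, can be sketched in plain Python. This is only an illustration of the idea, not Elastic Stack configuration; the pattern, the function name `find_alerts` and the sample log lines are all made up.

```python
import re

# Assumed pattern: treat these level keywords as alert-worthy.
ALERT_PATTERN = re.compile(r"\b(ERROR|WARN(?:ING)?|FATAL)\b")


def find_alerts(lines, pattern=ALERT_PATTERN):
    """Return (line_number, level, text) for every line matching the pattern."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        match = pattern.search(line)
        if match:
            alerts.append((number, match.group(1), line.rstrip("\n")))
    return alerts


# Made-up application log lines for demonstration.
sample = [
    "2024-01-01 10:00:00 INFO job started\n",
    "2024-01-01 10:00:05 WARN slow response from datanode\n",
    "2024-01-01 10:00:09 ERROR task failed: out of memory\n",
]

for number, level, text in find_alerts(sample):
    print(f"line {number} [{level}]: {text}")
```

In a real deployment the matching lines would be handed to a mailer or webhook instead of being printed.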

How Can I use Apache to load balance Marklogic Cluster

Hi, I am new to MarkLogic and Apache. I have been given the task of using Apache as a load balancer for our MarkLogic cluster of 3 machines. The MarkLogic cluster is currently running on Linux servers.
How can we achieve this? Any information regarding this would be helpful.
You could use mod_proxy_balancer. How you configure it depends what MarkLogic client you would like to use. If you would like to use the Java Client API, please follow the second example here to allow apache to generate stickiness cookies. If you would like to use XCC, please configure it to use the ML-Server-generated or backend-generated "SessionID" cookie.
The difference here is that XCC uses sessions whereas the Java Client API builds on the REST API which is stateless, so there are no sessions. However, even in the Java Client API when you use multi-request transactions, that imposes state for the duration of that transaction so the load balancer needs a way to route requests during that transaction to the correct node in the MarkLogic cluster. The stickiness cookie will be resent by the Java Client API with every request that uses a Transaction so the load balancer can maintain that stickiness for requests related to that transaction.
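To make the role of the stickiness cookie concrete, here is a toy Python model of cookie-based routing. It is not Apache configuration and not a MarkLogic API: the class `StickyBalancer`, the cookie name `ROUTEID` and the node names are assumptions chosen for illustration.

```python
import itertools


class StickyBalancer:
    """Toy model: round-robin for new clients, cookie-pinned for returning ones."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = itertools.cycle(self._nodes)

    def route(self, cookies):
        """Return (node, cookies_to_set) for a request carrying `cookies`."""
        node = cookies.get("ROUTEID")
        if node in self._nodes:
            return node, {}             # honour the stickiness cookie
        node = next(self._next)         # no valid cookie: pick the next node
        return node, {"ROUTEID": node}  # ask the client to resend this cookie


balancer = StickyBalancer(["ml1", "ml2", "ml3"])  # hypothetical node names
first_node, cookie = balancer.route({})           # first request of a transaction
later_node, _ = balancer.route(cookie)            # later request resends the cookie
print(first_node == later_node)
```

As long as the client resends the cookie with every request in the transaction, each of those requests lands on the node that holds the transaction state.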
As always, do some testing of your configuration to make sure you got it right. Properly configuring Apache plugins is an advanced skill. Since you are new to Apache, your best hope of ensuring you got it right is to use a network monitoring tool like Wireshark to look at the HTTP traffic from your application to MarkLogic Server and make sure requests are going to the correct node in the cluster as expected.
Note that even with the client APIs (Java, Node.js) it's not always obvious or explicit at the language API layer what might cause a session to be created. Explicitly creating multi-statement transactions definitely will, but other operations may do so as well. If you are using the same connection for UI (browser) and API (REST or XCC) then the browser app is likely to be doing things that create session state.
The safest, but least flexible, configuration is "TCP Session Affinity". If it is supported, it will eliminate most concerns related to load balancing. Cookie Session Affinity relies on guaranteeing that the load balancer uses the correct cookie. Not all code is equal. I have had cases where the load balancer didn't always use the cookie provided. Changing the configuration to "Load Balancer provided Cookie Affinity" fixed that.
None of this is needed if all your communications are stateless at the TCP layer, the HTTP layer and the app layer. The latter cannot be inferred by the server.
Another concern is if your app or middle tier is co-resident with other apps, or with the same app, connecting to the same load balancer and port. It can be difficult to make sure there are no 'crossed wires'. When ML gets a request it associates its identity with the client IP and port. Even without load balancers, most modern HTTP and TCP client libraries implement socket caching. That is a great performance win, but a hidden source of subtle, random, severe errors if the library or app is sharing "cookie jars" (not uncommon). A TCP and cookie jar cache used by different application contexts can end up sending state information from one app to another, unrelated app in the same process. Mostly this happens in middle tier app servers that simply pass on requests from the first tier without domain knowledge, relying on the low level TCP libraries to "do the right thing". They are doing the right thing for the use case the library programmers had in mind; don't assume that your case is the one the library authors assumed. The symptoms tend to be very rare but catastrophic: transaction failures, possibly data corruption, and security problems (at an application layer), because the server cannot tell the difference between 2 connections from the same middle tier.
Sometimes a better strategy is to load balance between the first tier and the middle tier, and connect directly from the middle tier to MarkLogic, especially if caching is done at the load balancer. It's more common for caching to be useful between the middle tier and the client than between the middle tier and the server. This is also more analogous to the classic 3 tier architecture used with RDBMSs, where load balancing is between the client and business logic tiers, not between business logic and database.

How best to manage Redis connections using ServiceStack?

I work on a few .NET web apps that use Redis heavily for caching along with ServiceStack's Redis client. In all cases I've got Redis running on the same machine. I've used both BasicRedisClientManager and PooledRedisClientManager (always implemented as singletons) and have had some issues with both approaches.
With BasicRedisClientManager, things would work fine for a while, but eventually Redis would start refusing connections. Using netstat we discovered that thousands of TCP connections to the default Redis port were hanging around in TIME_WAIT status.
We then switched to PooledRedisClientManager, which seemed to fix the problem immediately. However, not long after, we started noticing occasional CPU spikes that we narrowed down to thread waiting (System.Threading.Monitor.Wait calls) caused by PooledRedisClientManager.GetClient.
In code, we use a get-in-get-out approach (using ServiceStack's handy ExecAs shortcuts) so in general connections are acquired very frequently but held as briefly as possible.
We get a modest amount of traffic but we're no StackExchange, and I can't help but think the ServiceStack client is up to the job and we're just doing something wrong. Is PooledRedisClientManager the correct approach here? Would it be advisable to simply increase the pool size? Or is that likely just masking a problem with our code?
Just looking for general guidance here, I don't have specific code I need help with at this point. Thanks in advance.
Are you absolutely sure all Redis connections are being disposed?
With ServiceStack, the Redis property on Service and ViewPageBase (if you're using SS Razor) do dispose themselves, but any time you request a connection from the pool yourself you must dispose it yourself.
However, despite this, we recently had issues with our pool being exhausted of all connections, too. One of my colleagues discovered that there wasn't proper clean up for Razor pages and made a pull request here. This means that there has only been correct disposal on Razor pages since ServiceStack v4.0.21. I have not checked if that fix has been back-ported to the v3 branch.
My colleague also added TrackingRedisClientsManager that may help you track down the improper disposal. See here
You can also check the stats of a PooledRedisClientManager by using this helper method. We threw it on a little Razor page to check the stats whenever we feel it is appropriate, but you could write better code around this to monitor the pool health of specific nodes, too.
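The disposal rule described in this answer can be illustrated with a toy pool. The sketch below is Python, not ServiceStack code: `ClientPool` and its methods are hypothetical, and plain strings stand in for real connections. It shows why a client that is always released keeps the pool healthy, while a client that is never returned eventually leaves other callers waiting.

```python
import queue
from contextlib import contextmanager


class ClientPool:
    """Toy fixed-size pool; strings stand in for real Redis connections."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for n in range(size):
            self._idle.put(f"client-{n}")

    def acquire(self, timeout=0.1):
        """Take a client; raises queue.Empty if none is free within `timeout`."""
        return self._idle.get(timeout=timeout)

    def release(self, client):
        self._idle.put(client)

    @contextmanager
    def client(self, timeout=0.1):
        """Get-in-get-out usage: the client is always returned to the pool."""
        c = self.acquire(timeout)
        try:
            yield c
        finally:
            self.release(c)

    @property
    def idle_count(self):
        return self._idle.qsize()


pool = ClientPool(size=2)
for _ in range(100):
    with pool.client():
        pass  # do a quick cache read or write here
print(pool.idle_count)  # both clients are back in the pool
```

Calling `acquire` directly without a matching `release` is the analogue of requesting a client from the pool and forgetting to dispose it.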

Simulating a transient error for Service Bus

I'm writing an application which will use the Azure Service Bus. For local development I'm using Windows Server Service Bus to provide the same services (the code to use either is identical).
I want to write the application to be tolerant of transient errors when sending or receiving messages. To that end, I want to be able to test the fault-handling code can deal with the local Service Bus instance suddenly being unavailable during execution of various operations.
Ideally, I'd want to write some automated integration tests around these scenarios, but I appreciate that may not be practically achieved.
What can I do to simulate transient errors on my local Service Bus?
One easy thing would be to call the stop-sbservice (affects one node) or stop-sbfarm (affects the entire farm) cmdlets. This would let you simulate a Service Bus outage locally. You can then call start-sbservice or start-sbfarm to bring the service back and validate that your code recovers properly. This approach also has the added benefit that you control when the service returns (compared to just crashing the process). This page has information on the available cmdlets.
If that's not enough, another approach that I've used in the past is to shut down the network interface, or, if the server is in another machine, put up a firewall on the ports used to communicate to service bus.
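For the automated tests mentioned in the question, one complementary option is to exercise the fault-handling code against a test double that fails on demand, instead of taking the real service down. The Python sketch below only illustrates that idea; it does not use any Service Bus API, and `TransientError`, `FlakySender` and `send_with_retry` are hypothetical names.

```python
import time


class TransientError(Exception):
    """Stand-in for whatever the real client raises on a transient fault."""


class FlakySender:
    """Test double that fails the first `failures` sends, then succeeds."""

    def __init__(self, failures):
        self.failures_left = failures
        self.sent = []

    def send(self, message):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TransientError("simulated outage")
        self.sent.append(message)


def send_with_retry(sender, message, attempts=3, delay=0.0):
    """Try to send up to `attempts` times; return the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            sender.send(message)
            return attempt
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the fault to the caller
            time.sleep(delay)


sender = FlakySender(failures=2)
print(send_with_retry(sender, "hello"))  # succeeds on the third attempt
```

A double like this covers the retry logic deterministically; stopping the service or blocking the ports remains the way to check behaviour against the real transport.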

Redis clients broadcast problems (in the context of Socket.IO)

So I've read some articles about scaling Socket.IO. For various reasons I don't want to use the built-in Socket.IO scaling mechanism (mostly because it seems to be inefficient, since from my point of view it publishes a lot more stuff to Redis than required).
So I came up with this simple idea:
Each Socket.IO server creates Redis pub/sub/store clients, connects to Redis and subscribes to a channel. Now, when I want to broadcast data I just publish it to Redis and all other Socket.IO servers get it and push it to users.
There is a problem, though (which I think is also a problem for the Socket.IO built-in mechanism). Let's say I want to know the number of all connected users. There are at least two ways of doing that:
1. Server A publishes give_me_clients to Redis. Then each Socket.IO server counts connections and publishes number_of_clients. Server A grabs this data, combines it and sends it to the client.
2. Each server updates number_of_clients_for::ID_HERE in Redis whenever a user connects/disconnects to the server. Then Server A just fetches the data and combines it. Might be more efficient.
There are problems with these solutions though:
1. Server A is not aware of other servers. Therefore it does not know when it should stop listening to number_of_clients. One could fix it by making Server A aware of other servers: whenever a server connects to Redis it publishes new_server (Server A grabs the data and stores it in memory). But what to do when the Redis - Socket.IO connection breaks? Is there a way for Redis to notify clients that one of the clients has disconnected?
2. Actually the same as above. When a Socket.IO server crashes, how do I clear the number_of_clients data?
So the real question is: can Redis notify clients (publish to a channel) that the connection with one of them has just ended?
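The second approach, and the stale-counter problem it runs into, can be sketched with an in-memory dict standing in for Redis. This is Python for illustration only; the helper names are hypothetical and no real Redis client is involved.

```python
PREFIX = "number_of_clients_for::"


def report_count(store, server_id, count):
    """Called by a server whenever a user connects or disconnects."""
    store[PREFIX + server_id] = count


def total_clients(store):
    """What Server A does: fetch every per-server counter and combine them."""
    return sum(value for key, value in store.items() if key.startswith(PREFIX))


store = {}  # plain dict standing in for Redis
report_count(store, "A", 10)
report_count(store, "B", 7)
print(total_clients(store))

# If server B now crashes, nothing removes its key, so its 7 users
# keep being counted until something else clears the entry.
```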
After a lot of testing it seems that Redis does not have such functionality. I've also found out that scaling Socket.IO is really a pain.
So I've switched from Socket.IO to WS (see this link). It is low level (but perfect for my use) and it only supports WebSockets (in all major versions). But then again I only want to support WebSockets and FlashSocket (which I have to implement manually, but that's fine).
The advantage is that I can easily create a cluster with such servers. HAProxy works with such servers almost out of the box (some minor tuning). Servers can easily communicate on a local net (with UDP, or a central TCP server if the cluster is big).
The disadvantage is that one has to manually implement some cool features like heartbeats, broadcasting, rooms, etc. Also you won't have a long-polling fallback, but that's fine in my case. Scaling is still more important, imho.