I recently did a production release for one of my clients on Azure PaaS – Azure Web App (formerly Azure Websites). We configured the site for autoscale with a minimum of 2 instances. Soon we started getting complaints from users about intermittent issues that could only happen if they were losing their session variables while actively using the site.
We wondered how that could happen, since we were maintaining sessions in-proc. We didn't go for a separate out-of-proc session manager (Redis cache / DB based) on day one because, according to the documentation, Azure Web App runs behind a sticky-session load balancer by default (when running with multiple instances). It injects an ARR affinity cookie into the HTTP response, which redirects a user to the same instance that the session was first established with.
To debug the issue, we started printing the actual session ID on the page. After many attempts, we reproduced it – surprisingly, sessions were getting swapped. Let's say my session ID is “1eocgtmwwwwvs1cxksyofne4”; after a page refresh it changed to “5p1hsxszq2mcqmt5i5ytqg12”, along with all the other information managed in session variables. Scary… isn't it?
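For illustration, here is a minimal sketch of the kind of diagnostic we put on the page (the controller and action names are made up; ARRAffinity is the default name of the affinity cookie):

using System.Web;
using System.Web.Mvc;

public class DiagnosticsController : Controller
{
    // Print the session id and the ARR affinity cookie so that a page
    // refresh makes any session/instance swap immediately visible.
    public ContentResult Index()
    {
        HttpCookie arr = Request.Cookies["ARRAffinity"];
        string body = "SessionId: " + Session.SessionID +
                      "\nARRAffinity: " + (arr != null ? arr.Value : "(none)");
        return Content(body, "text/plain");
    }
}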
We reached out to support with an emergency ticket – the response was:
"Azure Web Apps is a stateless platform and our recommendations is to implement a Session Management solution that best works for your environment and avoid the dependency on In-memory session state management especially in cases where your Web App is hosted on multiple servers . In your case as you are considering to implement a Caching solution for session management in near future , our recommendation is to move to a single instance as a workaround for now . We will assist you in ensuring that the app is working as expected on a single instance.”
I completely understand that – but why would the affinity cookie fail? I know the affinity cookie can be disabled, and I hadn't disabled it. I thought I'd share this story; it could be useful for anyone else relying on an affinity-cookie-based sticky load balancer.
BTW: the Redis session provider is very easy to set up – it took just a few minutes to implement.
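For the curious, here is a sketch of what that setup looks like with Microsoft's RedisSessionStateProvider package (the host name and access key below are placeholders):

<!-- web.config: swap InProc session state for the Redis-backed provider -->
<sessionState mode="Custom" customProvider="RedisSessionStateStore">
  <providers>
    <add name="RedisSessionStateStore"
         type="Microsoft.Web.Redis.RedisSessionStateProvider"
         host="mycache.redis.cache.windows.net"
         accessKey="placeholder-access-key"
         ssl="true" />
  </providers>
</sessionState>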
Doing a postmortem now – what went wrong? I didn't consider browsers that don't support cookies. But then why would it fail intermittently even in my browser, where cookies are enabled? PaaS resources are constantly being moved / swapped around… so there is really no way we can rely on in-proc session… etc.
Related
I'm using Azure Redis Cache for certain performance-monitoring services. Basically, when events like page loads occur, I send a fire-and-forget command to Redis to record the event. My goal is for my app to function fine whether or not it can contact the Redis server. I'm looking for a best practice for this scenario; I would be OK with losing some events if necessary. I've been finding that even though I'm using fire-and-forget, the app stalls when the web server runs into high latency or connectivity issues with the Redis server.
I'm using StackExchange.Redis. Any best practice configuration options/programming practices for this scenario?
The way I was implementing the singleton pattern on the connection turned out to be blocking requests. Once I fixed this, my app behaves as I want (e.g. it still functions when the Redis connection dies).
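For anyone hitting the same thing, here is a sketch of the non-blocking pattern (the connection string and key names are placeholders): the multiplexer is created lazily exactly once, AbortOnConnectFail is disabled so reconnects happen in the background, and writes use FireAndForget so a dead Redis only costs you the event, not the request.

using System;
using StackExchange.Redis;

public static class RedisConnection
{
    // Lazy<T> creates the multiplexer exactly once; only the first caller
    // pays the connection cost, and request threads are never blocked by it.
    private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
        new Lazy<ConnectionMultiplexer>(() =>
        {
            var options = ConfigurationOptions.Parse("myredis.example.com:6380,ssl=true,password=placeholder");
            options.AbortOnConnectFail = false; // reconnect in the background instead of throwing
            options.ConnectTimeout = 2000;      // fail fast when the server is unreachable
            return ConnectionMultiplexer.Connect(options);
        });

    public static ConnectionMultiplexer Instance => LazyConnection.Value;
}

public static class Metrics
{
    // Record an event without blocking the request: FireAndForget queues the
    // command and returns immediately rather than waiting for a reply.
    public static void RecordPageLoad(string page)
    {
        try
        {
            IDatabase db = RedisConnection.Instance.GetDatabase();
            db.StringIncrement("pageloads:" + page, flags: CommandFlags.FireAndForget);
        }
        catch (RedisConnectionException)
        {
            // Losing a metric is acceptable here; failing the page is not.
        }
    }
}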
I am using Azure Container Service with the Kubernetes orchestrator and have an app deployed on a cluster with 3 nodes. It has 5 replicas. How can I verify load balancing in action, e.g. see that every time I hit the external IP I am routed to a potentially different node? Thanks.
The simplest solution is to connect (over SSH, for example) to the 3 nodes and run WinDump there. If everything is working properly, you will be able to see traffic arriving on every node.
Also, here is the Microsoft documentation for testing a load balancer:
https://learn.microsoft.com/en-us/azure/virtual-machines/windows/tutorial-load-balancer#test-load-balancer
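Alternatively, if your app can report which pod served a request (the /whoami endpoint below is a made-up example that returns the host name), a quick client-side probe makes the rotation visible:

using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hits the service's external IP repeatedly and prints which backend answered.
// The IP and endpoint are placeholders; substitute your service's values.
class LbProbe
{
    static async Task Main()
    {
        using (var http = new HttpClient())
        {
            // Force a new connection per request; keep-alive could otherwise
            // pin every request to the same backend.
            http.DefaultRequestHeaders.ConnectionClose = true;
            for (int i = 0; i < 20; i++)
            {
                string body = await http.GetStringAsync("http://203.0.113.10/whoami");
                Console.WriteLine($"request {i}: served by {body.Trim()}");
            }
        }
    }
}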
The default load balancer available to your Windows Azure Web and Worker roles is a software load balancer and not really configurable; however, it does work in a round-robin fashion. If you want to test this behavior, this is what you need to do:
Create two (or more) instances of your service with RDP access enabled, so you can RDP into both instances.
RDP into both instances and run NETMON or any other network-monitoring solution on each.
Now access your Windows Azure web application from your desktop. You need to understand that when a network connection is made from your desktop, the connection stays alive based on network settings (60 seconds by default), so you need to wait until that default timeout has passed before accessing your Windows Azure web application again.
When you access your Windows Azure web application again, you can verify that the second request went to the next instance. Be sure to wait out the connection timeout, otherwise your request will keep being handled by the same instance.
Note: If you don't want to use RDP, you can also create a test ASP.NET page with some code that identifies the specific instance serving it. The best way to do this is to read the instance ID, as below:
string instanceId = RoleEnvironment.CurrentRoleInstance.Id; // Id is a string, not an int
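A sketch of a complete test page built around that (the handler name and output format are illustrative):

using System.Web;
using Microsoft.WindowsAzure.ServiceRuntime;

// Stamps every response with the role instance that served it, so
// refreshing the page (after the connection timeout) shows the rotation.
public class WhoAmIHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        // Each role instance has a unique Id, e.g. "MyWebRole_IN_0", "MyWebRole_IN_1".
        context.Response.Write("Served by: " + RoleEnvironment.CurrentRoleInstance.Id);
    }
}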
If you want more control over Windows Azure load balancing, I would suggest using Windows Azure Traffic Manager, which can route traffic to your site in round-robin, performance-based, or failover scenarios. More info on using Traffic Manager is in this article.
Hi, I am new to MarkLogic and Apache. I have been given the task of using Apache as a load balancer for our MarkLogic cluster of 3 machines. The MarkLogic cluster is currently running on Linux servers.
How can we achieve this? Any information regarding this would be helpful.
You could use mod_proxy_balancer. How you configure it depends on which MarkLogic client you would like to use. If you want to use the Java Client API, please follow the second example here to allow Apache to generate stickiness cookies. If you want to use XCC, please configure it to use the ML-Server-generated or backend-generated "SessionID" cookie.
The difference here is that XCC uses sessions, whereas the Java Client API builds on the REST API, which is stateless, so there are no sessions. However, even in the Java Client API, using multi-request transactions imposes state for the duration of the transaction, so the load balancer needs a way to route requests during that transaction to the correct node in the MarkLogic cluster. The stickiness cookie is resent by the Java Client API with every request that uses a Transaction, so the load balancer can maintain stickiness for requests related to that transaction.
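As a rough illustration, a sticky mod_proxy_balancer setup might look like the following (the host names are placeholders, and this uses the balancer-generated ROUTEID cookie pattern from the Apache docs rather than anything MarkLogic-specific):

# Balance three MarkLogic nodes with a balancer-generated sticky cookie
<Proxy "balancer://mlcluster">
    BalancerMember "http://ml1.example.com:8000" route=ml1
    BalancerMember "http://ml2.example.com:8000" route=ml2
    BalancerMember "http://ml3.example.com:8000" route=ml3
    ProxySet stickysession=ROUTEID
</Proxy>

# Issue the ROUTEID cookie whenever the balancer picks (or changes) a worker
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

ProxyPass        "/" "balancer://mlcluster/"
ProxyPassReverse "/" "balancer://mlcluster/"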
As always, do some testing of your configuration to make sure you got it right. Properly configuring Apache plugins is an advanced skill. Since you are new to Apache, your best hope of ensuring you got it right is checking with a traffic-monitoring tool like Wireshark to look at the HTTP traffic from your application to MarkLogic Server and make sure requests are going to the correct node in the cluster as expected.
Note that even with the client APIs (Java, Node.js) it's not always obvious or explicit at the language API layer what might cause a session to be created. Explicitly creating multi-statement transactions definitely will, but other operations may do so as well. If you are using the same connection for UI (browser) and API (REST or XCC), then the browser app is likely doing things that create session state.
The safest, but least flexible, configuration is "TCP session affinity". If it is supported, it will eliminate most concerns related to load balancing. Cookie session affinity relies on guaranteeing that the load balancer uses the correct cookie. Not all code is equal: I have had cases where the load balancer didn't always use the cookie provided. Changing the configuration to "load-balancer-provided cookie affinity" fixed that.
None of this is needed if all your communications are stateless at the TCP layer, the HTTP layer, and the app layer. The latter cannot be inferred by the server.
Another concern is if your app or middle tier is co-resident with other apps, or with the same app, connecting to the same load balancer and port. It can be difficult to make sure there are no 'crossed wires'. When MarkLogic gets a request, it associates its identity with the client IP and port. Even without load balancers, most modern HTTP and TCP client libraries implement socket caching. That is a great performance win, but a hidden source of subtle, random, severe errors if the library or app is sharing "cookie jars" (not uncommon). A TCP and cookie-jar cache used by different application contexts can end up sending state information from one unrelated app in the same process to another. Mostly this happens in middle-tier app servers that simply pass on requests from the first tier without domain knowledge, relying on the low-level TCP libraries to "do the right thing". They are doing the right thing – for the use case the library programmers had in mind – but don't assume that your case is the one the library authors assumed. The symptoms tend to be very rare but catastrophic: transaction failures, possibly data corruption, and security problems (at the application layer), because the server cannot tell the difference between two connections from the same middle tier.
Sometimes a better strategy is to load balance between the first tier and the middle tier, and directly connect from the middle tier to MarkLogic.
Especially if caching is done at the load balancer: it's more common for caching to be useful between the middle tier and the client than between the middle tier and the server. This is also more analogous to the classic 3-tier architecture used with RDBMSs, where load balancing sits between the client and business-logic tiers, not between business logic and the database.
I am developing an MVC4 application. We have hosted it on the Windows Azure IaaS model. Right now we have configured 2 virtual machines and everything is working well, but we have an issue with maintaining user login state.
If I log in on virtual machine 1, the login is not carried over when the next request is served by virtual machine 2. We have the two virtual machines mapped behind a load balancer.
Should I look into cache solutions? Any input would be greatly helpful.
Thanks,
Jaswanth
You're hitting two completely separate VMs (yes, load-balanced, but separate). This means you need to store any type of session data external to the VMs (or you need to sync the session content and keep it identical on both VMs).
Azure doesn't do anything to sync session data for you. That's on you to build into your app's architecture. You mentioned caching, which is certainly a viable solution (which one you pick, though, is up to you). There are other solutions too, such as database-based session storage. Again, that's up to you.
But bottom line: if you're going to scale an app beyond a single server (a VM in this case) in a load-balanced way, you cannot store session data on a specific VM.
Use a durable session-state store (Redis, SQL Server, etc.), or put your state in a cookie and read/write it on each request. If the cookie includes sensitive content, encrypt it.
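A minimal sketch of the cookie approach in classic ASP.NET (the class and cookie names are made up; note that MachineKey requires every load-balanced VM to share the same <machineKey> element in web.config so any instance can decrypt what another wrote):

using System;
using System.Security.Cryptography;
using System.Text;
using System.Web;
using System.Web.Security;

// Round-trips a small piece of state in an encrypted, signed cookie so that
// any VM behind the load balancer can read it.
public static class StateCookie
{
    private const string CookieName = "app-state"; // illustrative name

    public static void Write(HttpResponseBase response, string state)
    {
        byte[] protectedBytes = MachineKey.Protect(Encoding.UTF8.GetBytes(state), "StateCookie");
        response.Cookies.Add(new HttpCookie(CookieName, Convert.ToBase64String(protectedBytes))
        {
            HttpOnly = true, // keep it away from client-side script
            Secure = true    // only send over HTTPS
        });
    }

    public static string Read(HttpRequestBase request)
    {
        HttpCookie cookie = request.Cookies[CookieName];
        if (cookie == null) return null;
        try
        {
            byte[] raw = MachineKey.Unprotect(Convert.FromBase64String(cookie.Value), "StateCookie");
            return Encoding.UTF8.GetString(raw);
        }
        catch (CryptographicException)
        {
            return null; // tampered with, or encrypted under a different machine key
        }
    }
}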
In the context of Tomcat, can session replication take place without enabling sticky sessions?
I understand that the purpose of sticky sessions is to have the client 'stick' to one server throughout the session. With session replication, the client's session is replicated throughout the cluster (many web servers).
In the above case, can session replication take place? I.e. the client's session is spread throughout the web servers, and each interaction with any one web server is replicated across them, allowing seamless interaction.
AFAIK, Tomcat clustering does not support non-sticky sessions. From the Tomcat docs:
"Make sure that your loadbalancer is configured for sticky session mode."
But there's a solution (which I created, so you know I'm biased :-)) called memcached-session-manager (msm) that also supports non-sticky sessions. msm uses memcached (or any backend speaking the memcached protocol) as the backend for session backup/storage.
In non-sticky mode, sessions are stored only in memcached and no longer in Tomcat, as with non-sticky sessions the session store must be external (to avoid stale data).
It also supports session locking: with non-sticky sessions, multiple parallel requests may hit different Tomcats and could modify the session concurrently, so some session changes might get overwritten by others. Session locking synchronizes parallel requests going to different Tomcats.
The msm home page mainly describes the sticky-session approach (as it started with that only); for details regarding non-sticky sessions you might search or ask on the mailing list.
Details and examples regarding the configuration can be found in the msm wiki (SetupAndConfiguration).
Just to give you an idea of the complexity: all you need is one or more memcached servers running (or something similar that speaks the memcached protocol) and an updated Tomcat context.xml like this:
<Context>
...
<Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
memcachedNodes="n1:host1.domain.com:11211,n2:host2.domain.com:11211"
sticky="false"
sessionBackupAsync="false"
lockingMode="auto"
requestUriIgnorePattern=".*\.(ico|png|gif|jpg|css|js)$"
/>
</Context>
Your load-balancer does not need special configuration, so once you have these things in place you can start testing your application.
An excellent article on this topic is here:
http://www.mulesoft.com/tomcat-cluster
The Terracotta product they mention has a simple tutorial here:
http://www.terracotta.org/documentation/web-sessions/get-started
Terracotta works with Tomcat, but you must pay attention to which bits of Terracotta are free and which are commercial. A couple of years ago their redundant store was a paid feature, and I don't remember this solution being a separate product.
I have actually found the inverse of this question. For me (Tomcat 7), session replication with only the OOTB options works properly WITHOUT sticky sessions. After turning the logging up, I found that with jvmRoute enabled my session ID goes from A123456789 to A123456789.01 – suffixed with the jvmRoute. That session is successfully replicated from Node 01 in the cluster to Node 02, but it keeps the same ID (A123456789.01). When I take Node 01 out of the cluster, the traffic starts to stick on Node 02 – which now looks for a session A123456789.02, which of course does not exist. The .01 session is there, but it basically sits idle until it expires. If I bring the other server back up so the sessions are replicated, and then take 02 down, I get even odder behaviour, because the session is picked up where it left off.
For me, so far, session replication without sticky sessions (just regular round-robin amongst the nodes in the cluster) is the only thing that has worked.
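For context, the jvmRoute that produces the .01 suffix is set on the Engine element in each node's server.xml (the node name below is illustrative); removing it on every node gives the un-suffixed, non-sticky behaviour described above:

<!-- server.xml on node 01: the jvmRoute value gets appended to session ids -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node01">
  <!-- ... Host, Cluster, and other elements ... -->
</Engine>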