Using RedisCloud as a datastore for a ServiceStack-based, AppHarbor-hosted app.
The RedisCloud .NET client documentation says not to use the ServiceStack.Redis connection managers:
Note: the ServiceStack.Redis client connection managers (BasicRedisClientManager and PooledRedisClientManager) should be disabled when working with the Garantia Data Redis Cloud. Use the single DNS provided upon DB creation to access your Redis DB. The Garantia Data Redis Cloud distributes your dataset across multiple shards and efficiently balances the load between these shards.
Why would they suggest that? Because they are doing fancy load balancing stuff in their 'Garantia Data' layer and don't want to handle unnecessary connections? The RedisClient class is not thread-safe, which makes things much more difficult from the application-programming perspective.
Should I just ignore their instructions and use a PooledRedisClientManager? How would I configure it with the single uri that RedisCloud provides?
Or will I need to write a basic RedisClient pool wrapper that just creates new RedisClient connections as needed to handle concurrent access (i.e. ignores all read/write pooling specifics, hopefully delegating all that upstream to the RedisCloud layer)?
Why would they suggest that? Because they are doing fancy load balancing stuff in their 'Garantia Data' layer and don't want to handle unnecessary connections?
I think you could be right. To my knowledge these classes simply wrap creating/retrieving instances of RedisClient (though I think the Basic manager always creates a new RedisClient). Looking over their site, I didn't see anything about a maximum number of connections to the Redis server(s); the previous Redis vendor for AppHarbor (MyRedis) had plans that listed the maximum connections allowed per plan, but RedisCloud's site doesn't seem to mention connection limits/handling at all.
Should I just ignore their instructions and use a PooledRedisClientManager? How would I configure it with the single uri that RedisCloud provides?
Well, if you do ignore their instructions my guess is you could eventually run into a 'max number of connections exceeded' error. That would make it difficult to get to your Redis Server(s). I think you could still use the BasicRedisClientManager because when you call GetClient() it always 'news up' a RedisClient in the same way shown in their example.
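If you do go the client-manager route despite their note, a minimal sketch might look like the following; the host, port, and password are placeholders for whatever your RedisCloud dashboard shows, and swapping in PooledRedisClientManager would be a one-line change:

```csharp
// Minimal sketch (not RedisCloud's official guidance): pointing ServiceStack.Redis
// at the single endpoint RedisCloud provides. Host, port, and password are placeholders.
using System;
using ServiceStack.Redis;

class RedisCloudExample
{
    // ServiceStack.Redis accepts "password@host:port" style connection strings.
    const string ConnectionString =
        "myPassword@pub-redis-12345.us-east-1.garantiadata.com:12345";

    static void Main()
    {
        // BasicRedisClientManager news up a RedisClient on each GetClient() call,
        // much like the single-client example in their docs; PooledRedisClientManager
        // takes the same host string if you decide to pool instead.
        var manager = new BasicRedisClientManager(ConnectionString);

        using (var redis = manager.GetClient())   // one client per unit of work; not thread-safe
        {
            redis.SetValue("greeting", "hello");
            Console.WriteLine(redis.GetValue("greeting"));
        }
    }
}
```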
I wanted to understand the pros/cons of using a client node within a cluster vs. an external thin client. Of course the thin client will be less chatty than a client node and hence involve fewer network interactions. Changes in the cluster topology (nodes being added/removed) would not affect the thin client, while they directly affect the client node.
All this makes me wonder: will a thin client always be the better option, or are there other cases where having a client node makes much more sense?
If Apache Ignite/GridGain has any documentation/links around this, that would do too.
TIA
I think there won't be any thick client in future major releases; it will be superseded by the thin one, thanks to its single protocol and lightweight design.
At the moment, a thick client still has some feature advantages:
faster and better discovery and communication (topology changes)
peer class loading
near caching
advanced compute capabilities
events listening
full data structures support/distributed locking
etc
The feature-parity gap is constantly shrinking, but it's also worth mentioning that some features might be available for a particular platform only; for example, in the .NET thin client but not in the Java one.
You have mentioned the cons already: being a cluster-wide citizen implies the same obligations a server node has, i.e. a good network channel and participation in all global events.
That means in some cases a thick client might not deploy and work as expected. Usually it's a matter of NAT, private networks, firewalls, and so on.
In general, I'd say: if your task can be implemented with a thin client, use it. If a required feature/API is not yet available, consider using a thick one. For example, if you need something like a health check for your application running every minute, you definitely want a thin client for that task, so as not to trigger a partition map exchange (PME).
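As a rough illustration (Ignite.NET API; the endpoints are placeholders and 10800 is the default thin-client port), such a periodic health check with a thin client could look like the sketch below. The thin client only opens sockets to the listed endpoints and never joins the topology, so connecting and disconnecting does not trigger a PME:

```csharp
// Hedged sketch: a lightweight "health check" using the Ignite/GridGain thin client.
// Endpoints are placeholders; 10800 is the default thin-client port.
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Client;

class ThinClientHealthCheck
{
    static void Main()
    {
        var cfg = new IgniteClientConfiguration
        {
            // The thin client only opens sockets to these endpoints; it never joins
            // the cluster topology, so connecting/disconnecting does not trigger a PME.
            Endpoints = new[] { "ignite-node-1:10800", "ignite-node-2:10800" }
        };

        using (IIgniteClient client = Ignition.StartClient(cfg))
        {
            // A cheap round trip over the binary protocol: list the existing caches.
            foreach (var name in client.GetCacheNames())
                Console.WriteLine($"cache: {name}");
        }
    }
}
```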
Thick clients are aware of all nodes and of data distribution, and are more efficient in most cases; use them if your deployment allows for it. Plus, thick clients support all of the GridGain APIs.
Thin clients are lightweight (similar to a JDBC driver), connect to the cluster via a binary protocol with a well-defined message format, support a limited set of APIs, and allow for support of multiple languages: Java, .NET, C++, Python, Node.js, and PHP are supported out of the box.
See the docs on thin/thick client differences.
Also take a look at the capabilities of thin clients.
This section explains how to choose a client.
For example, a thick client serves as a reducer for queries, so you avoid an extra hop (from server to thin client) and lessen the cluster load when executing a query on a partitioned cache.
A thick client can also participate directly in compute jobs, usually as a reducer, whereas a thin client just submits a job to the cluster.
A thick client can also receive event notifications.
Thick clients can reconnect more reliably (because they know the current cluster state) if the cluster topology changes.
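To make the compute/reducer point above concrete, here is a hedged sketch (Ignite.NET; the class names and configuration are placeholder assumptions, and it presumes server nodes are reachable via the default discovery settings) of a thick client-mode node broadcasting a closure to the servers and reducing the results locally, whereas a thin client would only submit a job for the cluster to run:

```csharp
// Hedged sketch: a thick (client-mode) node participating directly in compute.
// Assumes server nodes are reachable via the default discovery settings.
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Compute;

class ThickClientCompute
{
    [Serializable]
    class HostNameFunc : IComputeFunc<string>
    {
        // Executes on whichever server node the closure is sent to.
        public string Invoke() => Environment.MachineName;
    }

    static void Main()
    {
        // ClientMode = true: the process joins the topology like a server node
        // (discovery, events, topology awareness) but does not store cache data.
        var cfg = new IgniteConfiguration { ClientMode = true };

        using (IIgnite ignite = Ignition.Start(cfg))
        {
            // Broadcast the closure to the server nodes and collect the results here,
            // i.e. the thick client acts as the reducer for this computation.
            foreach (string host in ignite.GetCompute().Broadcast(new HostNameFunc()))
                Console.WriteLine(host);
        }
    }
}
```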
FYI: This will be my first real foray into Async/Await; for too long I've been settling for the familiar territory of BackgroundWorker. It's time to move on.
I wish to build a WCF service, self-hosted in a Windows service running on a remote machine in the same LAN, that does this:
Accepts a request for a single .ZIP archive
Creates the archive and packages several files
Returns the archive as its response to the request
I have to support archives as large as 10GB. Needless to say, this scenario isn't covered by basic WCF designs; we must take additional steps to meet the requirement. We must eliminate timeouts while the archive is building and memory errors while it's being sent. Both of these occur under basic WCF designs, depending on the size of the file returned.
My plan is to proceed using task-based asynchronous WCF calls and streaming mode.
I have two concerns:
Is this the proper approach to the problem?
Microsoft has done a nice job at abstracting all of this, but what of the underlying protocols? What goes on 'under the hood?' Does the server keep the connection alive while the archive is building (could be several minutes) or instead does it close the connection and initiate a new one once the operation is complete, thereby requiring me to properly route the request through the client machine firewall?
For #2, clearly I'm hoping for the former (keep-alive). But after some searching I'm not easily finding an answer. Perhaps you know.
You need streaming for big payloads. That is the right approach. This has nothing at all to do with asynchronous IO. The two are independent. The client cannot even tell that the server is async internally.
I'll add my standard answers for whether to use async IO or not:
https://stackoverflow.com/a/25087273/122718 Why does the EF 6 tutorial use asynchronous calls?
https://stackoverflow.com/a/12796711/122718 Should we switch to use async I/O by default?
Each request runs over a single connection that is kept alive. This goes for both streaming large amounts of data and for long initial delays. I'm not sure why you are concerned about routing. Does your router kill such connections? If so, that's a problem.
Regarding keep-alive, there is nothing that needs to go over the wire to maintain it. TCP sessions can stay open indefinitely without any kind of wire traffic.
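For reference, here is a hedged sketch of that streamed setup (the contract, names, address, and limits are placeholders rather than a drop-in configuration): the operation returns a Stream and the binding uses TransferMode.Streamed, so the multi-gigabyte body is pulled from the stream over the same kept-alive connection instead of being buffered:

```csharp
// Hedged sketch: a streamed WCF operation returning a large archive over net.tcp.
// Contract/class names, the address, and the limits are placeholders, not a drop-in config.
using System;
using System.IO;
using System.ServiceModel;

[ServiceContract]
public interface IArchiveService
{
    // A Stream return type plus TransferMode.Streamed on the binding means the response
    // body is read from this stream in chunks instead of being buffered in memory.
    [OperationContract]
    Stream GetArchive(string archiveName);
}

public class ArchiveService : IArchiveService
{
    public Stream GetArchive(string archiveName)
    {
        // Building the .zip may take minutes; the request's connection simply stays open
        // while this runs. Whether the server uses async IO internally is invisible here.
        string path = BuildArchive(archiveName);
        return File.OpenRead(path);   // WCF reads and closes this stream for us
    }

    static string BuildArchive(string archiveName)
    {
        // Placeholder: zip the requested files to disk and return the resulting path.
        return Path.Combine(Path.GetTempPath(), archiveName + ".zip");
    }
}

static class ArchiveHost
{
    public static ServiceHost Open()
    {
        var binding = new NetTcpBinding
        {
            TransferMode = TransferMode.Streamed,                // don't buffer the whole body
            MaxReceivedMessageSize = 10L * 1024 * 1024 * 1024,   // allow ~10 GB payloads
            SendTimeout = TimeSpan.FromHours(2),                 // generous; transfers are slow
        };

        var host = new ServiceHost(typeof(ArchiveService));
        host.AddServiceEndpoint(typeof(IArchiveService), binding,
            "net.tcp://localhost:9000/archive");
        host.Open();
        return host;   // the client needs a matching streamed binding on its side
    }
}
```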
I've got a working Silverlight/WCF application that I need to start thinking about scaling. An obvious target for scaling, of course, is Azure.
The key architectural feature of the application is that 2-10 Silverlight clients will join a given "room" (using a duplex Net.TCP connection), and any of those clients can then send a message (for instance, a chat message), which then needs to be pushed in real-time to every other client connected to the same room, using the underlying duplex WCF connection.
Right now, the way the WCF service works is basically to keep in-memory a list of sessions and the rooms that they're associated with, so that when a message from one session comes in, it can automatically send the message to every other session in the room.
This works fine for a single WCF server instance, but it gets complicated if you need to scale it so that multiple WCF instances are in play. If you use network-layer load balancing, of course, you would typically find that only some of the members of your room are on the same server you're on, which means that when it comes time to push out messages to all those members, only some of them would actually get notified.
Apart from Azure, I had been thinking that I would handle it via some sort of application-layer load balancing. For instance, the web server that each client downloads the Silverlight application from might do a primitive round-robin sort of load-balancing, i.e., "OK, everyone in room x, you use WCF instance 1. Everyone in room y, you use WCF instance 2." That sort of thing.
So I have two questions:
(1) Is there any other, better way to architect this, so as to be able to use network-layer load balancing rather than needing to make the application aware of the underlying infrastructure?
(2) If I have to do the application-layer load balancing, what's the best way to handle this in Azure? Do I have to use IaaS (full VMs), or is there a way to do this using PaaS (worker roles)? My understanding is that it's not possible to independently address worker roles, which would make a roles-based approach difficult, if not impossible.
SignalR, powered by the Azure Service Bus, may work for you.
http://vasters.com/clemensv/2012/02/13/SignalR+Powered+By+Service+Bus.aspx
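For what it's worth, a rough sketch of that idea (SignalR 2.x style; the hub, its method names, and the topic prefix are placeholders): the Service Bus backplane republishes each message to every role instance, so members of a room (a SignalR group) receive it no matter which instance they are connected to:

```csharp
// Hedged sketch: SignalR "rooms" scaled out with the Azure Service Bus backplane.
// Hub/method names and the "chat" topic prefix are placeholders.
using System.Threading.Tasks;
using Microsoft.AspNet.SignalR;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        string connectionString = "<your Azure Service Bus connection string>";

        // The backplane republishes every message through Service Bus topics, so each
        // role instance sees it and can push it to its own connected clients.
        GlobalHost.DependencyResolver.UseServiceBus(connectionString, "chat");

        app.MapSignalR();
    }
}

public class RoomHub : Hub
{
    public Task JoinRoom(string room)
    {
        // SignalR groups play the role of the in-memory "room -> sessions" list.
        return Groups.Add(Context.ConnectionId, room);
    }

    public void SendToRoom(string room, string message)
    {
        // Reaches every member of the room, regardless of which instance they hit.
        Clients.Group(room).receiveMessage(message);
    }
}
```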
We have an existing system that connects to the back end via HTTP (Apache/SSL) and polls the server for new messages; needless to say, we have scalability issues.
I'm researching removing this polling and have come across BOSH/XMPP, but I'm not sure how we should adopt the BOSH technique (using long-lived HTTP connections).
I've seen there are a few libraries available, but the whole thing seems bloated since we do not need buddy lists etc. and simply want to notify the clients of available messages.
The client is written in C/C++ and works across most OSes, so that is an important factor. The server is in Java.
Does BOSH result in a huge number of httpd processes? Since it has to keep all the clients connected, what would be the limit on that? We are also planning to move to a 64-bit JVM/Apache; what would be the maximum number of clients in that case?
Any hints?
I would note that BOSH is separate from XMPP, so there's no "buddy lists" involved. XMPP-over-BOSH is what you're thinking of there.
Take a look at collecta.com and associated blog posts (probably by Jack Moffitt) about how they use BOSH (and also XMPP) to deliver real-time information to large numbers of users.
As for the scaling issues with Apache, I don't know — presumably each connection is using few resources, so you can increase the number of connections per Apache process. But you could also check out some of the connection manager technologies (like punjab) mentioned on the BOSH page above.
Hey there guys, I am a recent grad, and looking at a couple of jobs I am applying for, I see that I need to know things like runtime complexity (straightforward enough), caching (memcached!), and load-balancing issues (no idea on this!!).
So, what kind of load-balancing issues and solutions should I try to learn about, or at least be vaguely familiar with, for .NET or Java jobs?
Googling around gives me things like network load balancing, but wouldn't that usually not be administered by a software developer?
One thing I can think of is session management. By default, whenever you get a session ID, that session ID points to some in-memory data on the server. However, when you use load balancing, there are multiple servers. What happens when data is stored in the session on machine 1, but for the next request the user is redirected to machine 2? His session data would be lost.
So you'll have to make sure that either the user gets back to the same machine for every subsequent request (a 'sticky connection'), or you do not use in-proc session state but out-of-proc session state, where session data is stored in, for example, a database.
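In classic ASP.NET, for example, that switch away from in-proc session state is a configuration change along these lines (a sketch; the mode, connection string, and timeout are placeholders):

```xml
<!-- Hedged sketch: store ASP.NET session state out-of-proc (here in SQL Server) so any
     server behind the load balancer can read a user's session. The connection string
     and timeout are placeholders. -->
<system.web>
  <sessionState mode="SQLServer"
                sqlConnectionString="Data Source=dbserver;Integrated Security=SSPI;"
                timeout="20" />
</system.web>
```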
There is a concept of load distribution where requests are sprayed across a number of servers (usually with session affinity). Here there is no feedback on how busy any particular server may be; we just rely on statistical sharing of the load. You could view the WebSphere HTTP plugin in WAS ND as doing this. It actually works pretty well, even for substantial web sites.
Load balancing tries to be cleverer than that: some feedback on the relative load of the servers determines where new requests go (even then, session affinity tends to be treated as a higher priority than balancing the load). The WebSphere On Demand Router, originally delivered in XD, does this. If you read this article you will see the kinds of algorithms used.
You can achieve balancing with network spraying devices; these can consult "agents" running on the servers, which give feedback to the sprayer as a basis for deciding where a request should go. Hence even this hardware-based approach can have a software element. See Dynamic Feedback Protocol.
Network combinatorics, max-flow min-cut theorems, and their use.