The documentation says:
GemFire clients are processes that send most or all of their data requests and updates to a GemFire server system. Clients run as standalone processes, without peers of their own.
Fundamentally, all peers communicate among themselves to manage the cache. An entry made by one peer in a region goes to all other peers. Similarly, a client's cache gets updated as soon as there is a change on the server. Also a client is allowed to make new entries into the region that will get propagated to all server peers.
What then is the real difference between a client and a server peer? Based on my understanding, both have access to all data and both can do the same operations.
The major difference between a peer and a client is that the peer connects to all other members of the distributed system; it has at least 2 connections open at all times to each other member in the distributed system. Clients do not need connections to all servers; a single connection to a single server is enough. Thus, you can have tens of thousands of clients, but maybe only hundreds of peers. (The number of connections that the client establishes can be configured when creating a client pool. You can also configure single-hop on the client, which enables it to connect directly to the servers against which it wishes to operate.)
The performance implication here is that peers can access any data with just one network hop, whereas clients may need up to 2 network hops (one from client to server, one from server to the node where the data lives).
The other differences are:
1. Clients can register interest; peers cannot.
2. Clients can register Continuous Queries; peers cannot.
I have multiple gRPC server instances running behind a load balancer and a large number of clients, each of which subscribes to one of the instances.
I have a use case where I need to stream a message from the server to a group of clients. Can I store all the clients' streams in a central DB (e.g., Redis), so that when I want to stream a message, one of the instances fetches all the stream connections belonging to the clients' group and uses them to send the message?
LDAP fault-tolerance configuration (e.g., SunOne):
Does anybody know how to configure fault tolerance for LDAP, e.g., SunOne LDAP?
I searched Google without finding any useful results.
Thanks
Assuming that "fault tolerance" here means "high availability (HA)", I would say it can be achieved through redundancy. This is not peculiar to SunOne; it applies equally to directory server software from other vendors.
There are different ways to solve this. It depends on the business requirements and the affordability. One method that comes to mind is to have the LDAP software installed on an HA pair. This requires hardware and OS capabilities for fail-over and it requires two servers (in a world of virtualization, "server" can mean different things [physical box, frame, LPAR, etc.]; so, I'll just leave the interpretation to the reader). When one server fails, the other server takes over and assumes the primary role in the pair. This is the fault-tolerance part. In this approach, the machine/server with the secondary role is passive (i.e., it's not serving clients) until the primary goes down. You will need to implement LDAP data replication between two servers. They can be two LDAP masters in a P2P replication topology.
Another method is to have multiple LDAP servers (i.e., masters, replicas) and cluster them using a network dispatcher (ND) software/appliance/etc., which distributes the incoming traffic to the individual servers (usually replicas) in the cluster. If you lose one replica in the cluster, the ND will not send any traffic to that replica until it comes back. The other replicas will still be receiving load and therefore serving the incoming traffic. This is the fault-tolerance part of this method. The degree of availability you want will also dictate what can be done in a clustered environment. You can have a single LDAP master (to which the organization's applications would make updates) and keep it out of the cluster, but pair it with another server for fail-over (so you wouldn't lose availability for updates from the applications - this also gives you the freedom to do maintenance on the master without interrupting your applications [well, they need to be written to be able to write to more than one LDAP master if the primary one is not available]). The secondary server would have to receive replication from the primary in any case. If the budget doesn't allow for more servers/replicas, then you can put the master server in the cluster along with the replicas to help with the read traffic. Instead of an HA pair in which one of the servers would be passive, you can have two masters configured in a P2P replication topology and have them both in the cluster to help with the traffic too. There are different ways to approach this method, depending on the level of redundancy wanted or that can be afforded.
So I've read some articles about scaling Socket.IO. For various reasons I don't want to use the built-in Socket.IO scaling mechanism (mostly because it seems inefficient: from my point of view it publishes a lot more to Redis than required).
So I've come up with this simple idea:
Each Socket.IO server creates Redis pub/sub/store clients, connects to Redis and subscribes to a channel. Now, when I want to broadcast data I just publish it to Redis and all other Socket.IO servers get it and push it to users.
There is a problem, though (which I think is also a problem for Socket.IO built-in mechanism). Let's say I want to know the number of all connected users. There are at least two ways of doing that:
Server A publishes give_me_clients to Redis. Then each Socket.IO server counts connections and publishes number_of_clients. Server A grabs this data, combines it and sends it to the client.
Each server updates number_of_clients_for::ID_HERE in Redis whenever user connects/disconnects to the server. Then Server A just fetches data and combines it. Might be more efficient.
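The second approach can be sketched roughly as follows. This is a minimal Python sketch, not a Socket.IO or Redis implementation: `FakeStore` is a hypothetical in-memory stand-in for Redis, and `SocketServer` is a hypothetical wrapper around one server instance. Only the `number_of_clients_for::` key prefix comes from the description above. The sketch does not address what happens to a counter when its server crashes.

```python
KEY_PREFIX = "number_of_clients_for::"


class FakeStore:
    """Hypothetical in-memory stand-in for a shared store such as Redis."""

    def __init__(self):
        self._data = {}

    def incr(self, key, amount=1):
        # Add `amount` to the counter at `key`, creating it at 0 if missing.
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]

    def get(self, key):
        return self._data.get(key, 0)

    def keys_with_prefix(self, prefix):
        return [k for k in self._data if k.startswith(prefix)]


class SocketServer:
    """One server instance that keeps its own counter up to date."""

    def __init__(self, server_id, store):
        self.key = KEY_PREFIX + server_id
        self.store = store

    def on_connect(self):
        self.store.incr(self.key, 1)

    def on_disconnect(self):
        self.store.incr(self.key, -1)

    def total_clients(self):
        # Any server can combine the per-server counters on demand.
        keys = self.store.keys_with_prefix(KEY_PREFIX)
        return sum(self.store.get(k) for k in keys)
```

Here any instance can answer the "how many users are connected" question by summing the per-server keys, without asking the other servers directly.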
There are problems with these solutions though:
Server A is not aware of the other servers. Therefore it does not know when it should stop listening for number_of_clients. One could fix this by making Server A aware of the other servers: whenever a server connects to Redis, it publishes new_server (Server A grabs the data and stores it in memory). But what should happen when the Redis - Socket.IO connection breaks? Is there a way for Redis to notify clients that one of the clients has disconnected?
Actually the same as above. When a Socket.IO server crashes how to clear number_of_clients data?
So the real question is: can Redis notify clients (by publishing to a channel) that the connection with one of them has just ended?
After a lot of testing, it seems that Redis does not have such functionality. I've also found out that scaling Socket.IO is really a pain.
So I've switched from Socket.IO to WS (see this link). It is low level (but perfect for my use) and it only supports WebSockets (in all major versions). But then again, I only want to support WebSockets and FlashSocket (which I have to implement manually, but that's fine).
The advantage is that I can easily create cluster with such servers. HAProxy works with such servers almost out of the box (some minor tuning). Servers can easily communicate on a local net (with UDP or central TCP server if the cluster is big).
The disadvantage is that one has to manually implement some cool features like heartbeats, broadcasting, rooms, etc. Also, you won't have a long-polling fallback, but that's fine in my case. Scaling is still more important, imho.
From what I understand, IMAP requires a connection per user. I'm writing an IMAP client (currently just Gmail) that supports many (100s, 1000s, maybe 10000s+) users at a time. Obviously, cutting down the number of open connections would be great. I'm wondering if it's possible to use thread pooling on my side to connect to Gmail via IMAP, or if that simply isn't supported by the IMAP protocol.
IMAP typically uses SSL over TCP/IP. And a TCP/IP connection will need to be maintained per IMAP client connection, meaning that there will be many simultaneous open connections.
These multiple simultaneous connections can easily be maintained in a non-threaded (single-thread) implementation without affecting the state of the TCP connections. You'll need some sort of flow concept per IMAP TCP/IP connection, and you store all of the flows in a container (a C++ STL map, for instance) using the TCP/IP five-tuple (or socketFd) as the key. For each data packet received, look up the flow and handle the packet accordingly. Nothing about this approach affects the TCP or IMAP connections.
Considering that this works in a single-thread environment, adding a thread pool will only increase the throughput of the application, since you can handle data packets for several flows simultaneously (assuming it's a multi-core CPU). You will just need to make sure that 2 threads don't handle data packets for the same flow at the same time, which could cause the packets to be handled out of order. One approach is to assign a group of flows to each thread, maybe using IP pools or something similar.
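The flow table and the "group of flows per thread" idea can be sketched as follows. This is a minimal Python sketch under stated assumptions: `Flow`, `FlowTable`, and `worker_for` are hypothetical names, the socket file descriptor is used as the key, and assigning flows to workers by `fd % num_workers` is just one simple way to guarantee that a given flow is always handled by the same worker.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """Per-connection state, keyed by the socket file descriptor."""
    fd: int
    buffer: bytearray = field(default_factory=bytearray)
    packets_seen: int = 0


class FlowTable:
    """Container of flows (the role played by a C++ STL map in the text)."""

    def __init__(self):
        self._flows = {}

    def handle_packet(self, fd, data):
        # Look up the flow for this connection, creating it on first use.
        flow = self._flows.get(fd)
        if flow is None:
            flow = Flow(fd)
            self._flows[fd] = flow
        flow.buffer.extend(data)
        flow.packets_seen += 1
        return flow

    def close(self, fd):
        self._flows.pop(fd, None)


def worker_for(fd, num_workers):
    # A flow always maps to the same worker, so two threads never process
    # packets of one flow concurrently and per-flow ordering is preserved.
    return fd % num_workers
```

With a fixed mapping like this, each worker thread can own its own `FlowTable`, so no locking is needed around per-flow state.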
I'm trying to get some feedback on recommendations for a service 'roster' in my specific application. I have a server app that maintains persistent socket connections with clients. I want to further develop the server to support distributed instances. Server "A" would need to be able to broadcast data to the other online server instances. The same goes for all other active instances.
Options I am trying to research:
Redis / Zookeeper / Doozer - Each server instance would register itself to the configuration server, and all connected servers would receive configuration updates as it changes. What then?
Maintain end-to-end connections with each server instance and iterate over the list with each outgoing data?
Some custom UDP multicast, but I would need to roll my own added reliability on top of it.
Custom message broker - A service that runs and maintains a registry as each server connects and informs it. Maintains a connection with each server to accept data and re-broadcast it to the other servers.
Some reliable UDP multicast transport where each server instance just broadcasts directly and no roster is maintained.
Here are my concerns:
I would love to avoid relying on external apps like Zookeeper or Doozer, but I would obviously use them if that's the best solution.
With a custom message broker, I wouldn't want it to become a throughput bottleneck. That would mean I might also have to be able to run multiple message brokers and use a load balancer when scaling?
Multicast doesn't require any external processes if I manage to roll my own, but otherwise I would maybe need to use ZMQ, which again puts me in the situation of having dependencies.
I realize that I am also talking about message delivery, but it goes hand in hand with the solution I go with.
By the way, my server is written in Go. Any ideas on a best recommended way to maintain scalability?
* EDIT of goal *
What I am really asking is what is the best way to implement broadcasting data between instances of a distributed server given the following:
Each server instance maintains persistent TCP socket connections with its remote clients and passes messages between them.
Messages need to be broadcast to the other running instances so they can be delivered to the relevant client connections.
Low latency is important because the messaging can be high speed.
Sequence and reliability are important.
* Updated Question Summary *
If you have multiple servers / multiple end points that need to pub/sub between each other, what is a recommended mode of communication between them? One or more message brokers to re-pub messages to a roster of the discovered servers? Reliable multicast directly from each server?
How do you connect multiple end points in a distributed system while keeping latency low, speed high, and delivery reliable?
Assuming all of your client-facing endpoints are on the same LAN (which they can be for the first reasonable step in scaling), reliable UDP multicast would allow you to send published messages directly from the publishing endpoint to any of the endpoints that have clients subscribed to the channel. This also satisfies the low-latency requirement much better than proxying data through a persistent storage layer.
Multicast groups
A central database (say, Redis) could track a map of multicast groups (IP:PORT) <--> channels.
When an endpoint receives a new client with a new channel to subscribe, it can ask the database for the channel's multicast address and join the multicast group.
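The channel-to-group mapping could look roughly like this. It is a minimal Python sketch in which `GroupRegistry` is a hypothetical in-memory stand-in for the central database; the base address `239.1.0.1` and port `5000` are illustrative assumptions (239.0.0.0/8 is the administratively scoped IPv4 multicast range).

```python
import ipaddress


class GroupRegistry:
    """Hypothetical stand-in for the central map of groups <--> channels."""

    def __init__(self, base="239.1.0.1", port=5000):
        self._base = ipaddress.IPv4Address(base)  # assumed starting address
        self._port = port                         # assumed shared port
        self._by_channel = {}
        self._by_group = {}

    def address_for(self, channel):
        # Return the channel's (ip, port), allocating one on first request.
        if channel not in self._by_channel:
            ip = str(self._base + len(self._by_channel))
            group = (ip, self._port)
            self._by_channel[channel] = group
            self._by_group[group] = channel
        return self._by_channel[channel]

    def channel_for(self, group):
        # Reverse lookup: which channel does this multicast group carry?
        return self._by_group.get(group)
```

An endpoint would call `address_for` when its first subscriber for a channel arrives and then join the returned group on its UDP socket (on most platforms via the `IP_ADD_MEMBERSHIP` socket option).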
Reliable UDP multicast
When an endpoint receives a published message for a channel, it sends the message to that channel's multicast socket.
Message packets will contain ordered identifiers per server per multicast group. If an endpoint receives a message without receiving the previous message from a server, it will send a "not acknowledged" message for any messages it missed back to the publishing server.
The publishing server tracks a list of recent messages, and resends NAK'd messages.
To handle the edge case of a server sending only one message and that message failing to reach another server, servers can periodically send a packet count to the multicast group over the lifetime of their NAK queue ("I've sent 24 messages"), giving other servers a chance to NAK previous messages.
You might want to just implement PGM.
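The NAK scheme described above can be sketched without any networking. This is a minimal Python simulation, not PGM and not a production protocol: `Publisher` and `Receiver` are hypothetical names, packets are plain tuples of `(server_id, seq, payload)`, sequence numbers are assumed to start at 1, and the receiver is assumed to have joined before the first message was sent.

```python
from collections import deque


class Publisher:
    def __init__(self, server_id, history=100):
        self.server_id = server_id
        self.next_seq = 1
        self.recent = deque(maxlen=history)  # bounded queue for resends

    def publish(self, payload):
        packet = (self.server_id, self.next_seq, payload)
        self.recent.append(packet)
        self.next_seq += 1
        return packet

    def sent_count(self):
        # The "I've sent N messages" announcement.
        return self.next_seq - 1

    def resend(self, missing):
        wanted = set(missing)
        return [p for p in self.recent if p[1] in wanted]


class Receiver:
    def __init__(self):
        self.last_seq = {}   # server_id -> highest in-order seq delivered
        self.pending = {}    # server_id -> {seq: payload} held out of order
        self.delivered = []

    def on_packet(self, packet):
        """Process a packet; return the sequence numbers to NAK."""
        server_id, seq, payload = packet
        last = self.last_seq.get(server_id, 0)
        if seq <= last:
            return []  # duplicate of something already delivered
        held = self.pending.setdefault(server_id, {})
        held[seq] = payload
        while last + 1 in held:  # deliver everything now contiguous
            last += 1
            self.delivered.append(held.pop(last))
        self.last_seq[server_id] = last
        if not held:
            return []
        return [s for s in range(last + 1, max(held)) if s not in held]

    def on_count(self, server_id, count):
        """Process a packet-count announcement; return seqs to NAK."""
        last = self.last_seq.get(server_id, 0)
        held = self.pending.get(server_id, {})
        return [s for s in range(last + 1, count + 1) if s not in held]
```

In this sketch the receiver holds back out-of-order messages until the gap is filled, which preserves per-publisher ordering at the cost of some latency when a packet is lost.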
Persistent storage
If you do end up storing data long-term, storage services can join the multicast groups just like endpoints... but store the messages in a database instead of sending them to clients.