Can NATS JetStream partially replicate streams on different NATS Servers

I have Server A running NATS JetStream. I want some part of its streams replicated on Server B and some part on Server C, and the two parts should be mutually exclusive.

As far as the stream's built-in replication goes, you cannot: every message in a stream is replicated to each nats-server that holds a replica of that stream (as many servers as the configured number of replicas). But you can use tags to select which cluster (or which servers in a cluster) are used. You can also mirror streams and source streams from other streams (and subjects).
I should add that you can also effectively split a stream into multiple streams (like partitions) by using the partition function of subject name mapping, available since nats-server version 2.8. (https://docs.nats.io/nats-concepts/subject_mapping#deterministic-subject-token-partitioning)
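To illustrate the idea behind deterministic token partitioning, here is a minimal Python sketch. It is not NATS code: in NATS the mapping is configured on the server, and the hash it uses is its own. The function names, the `orders.<customer>` subject layout, and the CRC32 hash below are assumptions chosen only for illustration.

```python
import zlib


def partition_for(token: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32), NOT the hash nats-server uses internally.
    # The only property that matters here: same token -> same partition.
    return zlib.crc32(token.encode("utf-8")) % num_partitions


def map_subject(subject: str, num_partitions: int) -> str:
    # Hypothetical layout: "orders.<customer>" -> "orders.<partition>.<customer>"
    prefix, key = subject.split(".", 1)
    return f"{prefix}.{partition_for(key, num_partitions)}.{key}"


if __name__ == "__main__":
    for customer in ("alice", "bob", "carol"):
        print(map_subject(f"orders.{customer}", 3))
```

Each stream could then capture one partition number (for example `orders.0.*` on one server and `orders.1.*` on another), which gives mutually exclusive subsets of the original subject space.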

Related

How to SET on specific redis server with StackExchange.Redis client?

I have 3 redis servers running in docker containers. From redis-cli I can SET on specific server.
SET myValue 100
How can I do this with StackExchange.Redis client?
I don't see anything in server api that allows to do that. Bear in mind that I don't know much about Redis at all.
var connection = ConnectionMultiplexer.Connect("localhost:6379,localhost:6380,localhost:6381");
var server = connection.GetServer("localhost", 6381);
server.???

SE.Redis expects to be managing a single logical keyspace; the support for multiple nodes is intended either for master/replica setups, or for redis-cluster (although, in the case of cluster, node discovery is achieved via the redis API, so a single node would be fine if it is reachable). With that in place: the selection of servers is implicit from the operation (i.e. writes need to go to a master, and in the case of "cluster", the keyspace shard mapping should be applied).
If you want to write to separate servers as though they are separate databases, you should use a connection per server; not a single connection that spans them all. Right now, SE.Redis is probably detecting 3 master nodes and electing to use one of them arbitrarily. You can see what it thinks by passing a TextWriter to the Connect/ConnectAsync method.

Is there a way to fully separate two redis databases for pub/sub usage?

Scenario: Two instances of an application share the same redis instance, but use different databases. The application makes use of the redis pub/sub functions to exchange data between services.
Problem: When application instance A publishes something (on redis database 1), application instance B (running on redis database 2) receives the message.
Expectation: As both instances of the application use a different database, I would expect not only the keys in redis to be held separately, but the pub/sub subscribers as well.
Question: Can I tell redis to keep pub/sub separate for each database?

No - PubSub is shared across all clients connected to the server, regardless of their currently SELECTed database (shared database/numbered database/keyspace). While you can use different channels and such, real separation is possible only by using two Redis instances.
Note: using shared/numbered databases isn't recommended - always use dedicated Redis instances per app/service/use case

As https://redis.io/docs/manual/pubsub/#database--scoping suggests:
"If you need scoping of some kind, prefix the channels with the name of the environment (test, staging, production...)."
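A small sketch of the channel-prefix approach, in Python. `FakePubSub` is a hypothetical in-memory stand-in for a server whose channels are global (it is not a Redis client), and the `scope:channel` naming is just one possible convention.

```python
from collections import defaultdict


def scoped_channel(scope: str, channel: str) -> str:
    # Prefix the channel with the application/environment name.
    return f"{scope}:{channel}"


class FakePubSub:
    """In-memory stand-in for a server with one global channel namespace."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str, inbox: list) -> None:
        self._subscribers[channel].append(inbox)

    def publish(self, channel: str, message: str) -> int:
        inboxes = self._subscribers.get(channel, [])
        for inbox in inboxes:
            inbox.append(message)
        return len(inboxes)  # number of receivers


server = FakePubSub()
inbox_a, inbox_b = [], []
server.subscribe(scoped_channel("app-a", "events"), inbox_a)
server.subscribe(scoped_channel("app-b", "events"), inbox_b)

# Instance A publishes; only subscribers of the "app-a:" channel receive it.
server.publish(scoped_channel("app-a", "events"), "hello from A")
```

With a real client the same `scoped_channel` helper would be applied to every publish and subscribe call, so the two application instances never share a channel name.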

Load balancing in apache kafka

I am new to Apache Kafka and was playing around with it. If I have 2 brokers and one topic with 4 partitions, and one of my brokers is heavily loaded, will Kafka take care of balancing the incoming traffic from producers to the other, free broker? If so, how is it done?

If you have multiple partitions, it's the producer's responsibility/choice which partition to send each message to.
Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the message).
In Kafka producer, a partition key can be specified to indicate the destination partition of the message. By default, a hashing-based partitioner is used to determine the partition id given the key, and people can use customized partitioners also. To reduce the number of open sockets, in 0.8.0 (https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning key is not specified or null, a producer will pick a random partition and stick to it for some time (default is 10 mins) before switching to another one.
If you specify which partition you want the data to go into, it will always go into that specific partition. If you don't specify, the producer could send it to any partition. The Kafka broker never internally moves or balances messages/partitions.
I believe this decision is to provide certain guarantees for the ordering of messages in a Kafka partition.
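The selection rules described above can be sketched in a few lines of Python. This is not the Kafka client: `SketchPartitioner` is a hypothetical class, CRC32 stands in for the client's real hash, and the keyless case uses plain round-robin rather than the sticky behavior mentioned for 0.8.0.

```python
import itertools
import zlib
from typing import Optional


class SketchPartitioner:
    """Illustrative producer-side partition choice (not Kafka's implementation)."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose(self, key: Optional[bytes] = None,
               partition: Optional[int] = None) -> int:
        if partition is not None:
            # An explicitly requested partition is always honored.
            return partition
        if key is not None:
            # Same key -> same partition, which preserves per-key ordering.
            return zlib.crc32(key) % self.num_partitions
        # No key: spread messages across partitions in turn.
        return next(self._round_robin)
```

Note that the choice is made entirely on the producer side; nothing here looks at broker load, which matches the point that the broker does not rebalance messages on its own.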

The Kafka producer tends to distribute messages equally among all partitions unless you override this behavior, so the next thing to check is whether the four partitions are distributed evenly among the brokers.
It also depends on what you mean by "one of the brokers is heavily loaded": whether the load comes from that topic or from other topics in the cluster (e.g. __consumer_offsets).
You can choose the brokers on which each partition resides with the CLI tools that ship with Kafka or with some kind of UI like Yahoo's kafka-manager.

Redis: Efficient cluster of servers for large key set

I have a very large set of keys, 200M keys, with small values, <100 bytes, to store and I'm trying to use Redis. I have 10 Redis DBs to split the keys over, but currently I'm on a single server with those 10 Redis DBs. By a Redis DB I mean using SELECT. From my calculations it looks like I'm going to blow out memory. I think I'll need over 4TB of memory for this case! What are my options? First, my calculation is based on 10000 keys with 100 byte values taking 220MB of RAM (this is from a table I found). So simply put (2*10^8 / 10^4) * 220MB = 4.4TB.
If my calculation looks correct, what are my options? I've read on different posts that Redis VM is no longer an option. Can I use a Redis cluster? This still appears to require too many servers to be practical. I understand I could switch to another DB, but I'd like that to be the last resort option.

Firstly, using shared databases (i.e. the SELECT command) isn't a recommended practice since all of these databases are essentially managed by the same Redis process. It is preferable to have 10 separate Redis processes (even on the same server) in order to avoid contention (more info here).
Next, there are ways to reduce the memory footprint of your database. You could, for example, perform client-side compression (see here) or consider other optimizations such as using Hashes to keep multiple values (as described here).
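As a rough illustration of the "Hashes to keep multiple values" idea, the sketch below groups many small keys into a smaller number of Redis hashes. It only builds the command tuples and does not talk to Redis; the `bucket:` prefix, the CRC32 hash, and the bucket count are assumptions, and how much memory this actually saves depends on your data and on the server's hash encoding settings.

```python
import zlib


def bucket_for(key: str, num_buckets: int) -> str:
    # Stand-in hash: any stable hash works, as long as readers and
    # writers agree on it.
    return f"bucket:{zlib.crc32(key.encode('utf-8')) % num_buckets}"


def as_hash_write(key: str, value: str, num_buckets: int) -> tuple:
    # Instead of: SET <key> <value>
    # issue:      HSET <bucket> <key> <value>
    return ("HSET", bucket_for(key, num_buckets), key, value)


def as_hash_read(key: str, num_buckets: int) -> tuple:
    # Instead of: GET <key>
    # issue:      HGET <bucket> <key>
    return ("HGET", bucket_for(key, num_buckets), key)
```

For example, with 200M keys and a hypothetical 200,000 buckets, each hash would hold about 1,000 fields on average.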
That said, a Redis server is ultimately bound by the amount of RAM that the host provides. Once you've reached that limit you'll need to shard your database and use a Redis cluster. Since you're already using multiple databases this shouldn't pose a big challenge as your code should already be compatible with that to a degree. Sharding can be done in one of three approaches: client, proxy or Redis Cluster. Client-side sharding can be implemented in your code or by the Redis client that you're using (if the client library that you're using supports that). Redis Cluster (v3) is expected to be released in the very near future and already has a stable release candidate. As for proxy-based sharding, there are several open source solutions out there, including Twitter's twemproxy, Netflix's dynomite and codis. Additional information about sharding and partitioning can be found here.
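For the client-side option, one common way to pick a shard is a consistent hash ring. The following is a minimal Python sketch, not the algorithm of any particular Redis client or proxy; `HashRing`, the node names, and the number of virtual nodes are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes: int = 100):
        ring = []
        for node in nodes:
            for i in range(vnodes):
                # Each node gets several points on the ring to even out load.
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
    print(ring.node_for("user:42"))
```

The design choice here is that removing a node only remaps the keys that lived on that node; keys on the remaining nodes keep their placement, which a plain `hash(key) % N` scheme does not guarantee.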
Disclaimer: I work at Redis Labs. Lastly, AFAIK there's only one Redis-as-a-Service provider that already provides built-in support for clustering Redis. Redis Labs' Redis Cloud is a fully-managed service that can scale seamlessly to any required capacity. Our clusters support both the '{}' hashtag standard as well as sharding by RegEx - more about this can be found here.

You can use LMDB with Dynomite to store data beyond your memory capacity. LMDB uses both disk and memory to store data, and Dynomite makes LMDB distributed.
We have done a POC with this combo and they work nicely together.
For more information, please check out our open issue here:
https://github.com/Netflix/dynomite/issues/254

How to build a simplified redis cluster (support data sharding and load balance)?

Since Redis Cluster is still a work in progress, I want to build a simplified one myself for now. The system should support data sharding, load balancing and master-slave backup. A preliminary plan is as follows:
Master-slave: use multiple master-slave pairs in different locations to enhance data security. Masters are responsible for write operations, while both masters and slaves can serve reads. Data is sent to all the masters during a write operation. Use Keepalived between the master and the slave to detect failures and switch master and slave automatically.
Data sharding: write a consistent hash on the client side to support data sharding during write/read in case the memory of a single machine is not enough.
Load balance: use LVS to redirect read requests to the corresponding server for load balancing.
My question is how to combine LVS and data sharding.
For example, because of data sharding, all keys are split and stored on servers A, B and C without overlap. Considering the slave backup and other master-slave pairs, the system will contain 1(A,B,C), 2(A,B,C), 3(A,B,C) and so on, where each group has three servers. How do I configure LVS to support the redirection in such a situation when a read request comes in? Or are there other approaches in Redis to achieve the same goal?
Thanks:)

You can get really close to what you need by using:
twemproxy to shard data across multiple redis nodes (it also supports node ejection and connection pooling)
redis master/slave replication
redis sentinel to handle master failover
Depending on your needs, you will probably also need a script that listens for failovers (see the sentinel docs) and cleans things up when a master goes down.
