Solandra Sharding: Insider Thoughts

Just getting started with Solandra, I was trying to understand the second-level details of Solandra sharding.
As far as I know, Solandra creates the configured number of shards (the "solandra.shards.at.once" property), where each shard holds up to "solandra.maximum.docs.per.shard" documents.
At the next level it creates slots inside each shard, the number of which is "solandra.maximum.docs.per.shard" / "solandra.index.id.reserve.size".
From the data model of the SchemaInfo CF, I understood that inside a particular shard there are slots owned by different physical nodes, and there is a race between nodes to claim these slots.
My questions are:
1. If I request a write on a particular Solr node, e.g. .....solandra/abc/dataimport?command=full-import, does this request get distributed to all possible nodes? Is this a distributed write? Because until that happens, how would other nodes be competing for slots inside a particular shard? Ideally the code for writing a doc or set of docs would be executing in a single physical JVM.
2. With sharding we tried to write some docs on a single physical node, but if it writes based on slots that are owned by different physical nodes, what have we actually achieved, since we again need to fetch results from different nodes? I understand that write throughput is maximized.
3. Can we look into tuning these numbers: "solandra.maximum.docs.per.shard", "solandra.index.id.reserve.size", and "solandra.shards.at.once"?
4. If I have just one shard and a replication factor of 5 in a single-DC, six-node setup, I saw that the endpoints of this shard contain 5 endpoints, as per the replication factor. But what happens to the sixth node? I saw through nodetool that the remaining sixth node doesn't really get any data. If I increase the replication factor to 6 while keeping the cluster up and then run a repair, will this solve the problem, or is there a better way?

Overall, the shards.at.once parameter is used to control the parallelism of indexing: the higher the number, the more shards are written to at once. If you set it to one, you will always be writing to only one shard. Normally this should be set about 20% higher than the number of nodes in the cluster, so for a four-node cluster set it to five.
The higher the reserve size, the less coordination between nodes is needed, so if you know you have lots of documents to write, raise it.
The higher docs.per.shard is, the slower queries for a given shard become. In general this should be 1-5M at most.
To answer your points:
1. This will only import from one node, but it will index across many, depending on shards.at.once.
2. I think the question is: should you write across all nodes? Yes.
3. Yes, see above.
4. If you increase shards.at.once, the sixth node will be populated quickly.
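To make the relationships above concrete, here is a minimal sketch of the arithmetic; the property values used are illustrative assumptions, not Solandra defaults:

```python
# Illustrative sketch of how the Solandra properties relate to each other.
# The numeric values are assumptions for the example, not real defaults.
import math

maximum_docs_per_shard = 1_000_000   # "solandra.maximum.docs.per.shard" (assumed value)
index_id_reserve_size = 10_000       # "solandra.index.id.reserve.size" (assumed value)
cluster_nodes = 4

# Each shard is carved into slots; a node that claims a slot can assign that
# many document ids without further coordination with other nodes.
slots_per_shard = maximum_docs_per_shard // index_id_reserve_size
print(f"slots per shard: {slots_per_shard}")                    # 100

# Rule of thumb from the answer above: shards.at.once ~20% above node count.
shards_at_once = math.ceil(cluster_nodes * 1.2)
print(f"suggested solandra.shards.at.once: {shards_at_once}")   # 5
```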

Related

Optimizing Redis cluster nodes

I understand that in a Redis cluster there are 16384 slots in total, distributed across the nodes. So if I have keys like entity:user:userID (e.g. user:1234) whose values are serialized user objects, and my application has 500k+ users, they should get distributed across the slots fairly evenly. We currently have 6 nodes in total (3 masters and 3 slaves), and we keep wondering when we should add 2 more nodes for a total of 8. We also write the cache data to disk, and sometimes we get latency warnings when persisting to disk. I'd assume that with more nodes there is less data to persist per node, and thus better performance/usage of resources. But aside from disk I/O, is there a dead-on performance measurement to let us know when we should start adding additional nodes?
Thanks!
If your limiting factor is disk I/O for replication, using SSDs can drastically improve performance.
Two additional signs that it is time to scale out include server load and used memory for your nodes. There are others, but these two are simple to reason about.
If your limiting factor is processing power on the nodes (e.g. server load) because of a natural increase in requests, scaling out will help distribute the load across more nodes. If one node is consistently higher than the others, this could indicate a hot partition, which is a separate problem to solve.
If your limiting factor is total storage capacity (e.g. used memory) because of a natural increase in data stored in your cache, scaling out will help grow the total storage capacity of your cluster. If you have a very large dataset and the set of keys used on a regular basis is small, technologies such as Redis on Flash by Redis Labs may be applicable.
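As a rough way to watch those two signals, here is a sketch using the redis-py client; the node addresses and thresholds are placeholder assumptions you would tune for your own cluster:

```python
# Sketch: poll per-node load and memory signals to decide when to scale out.
# Hosts, ports, and thresholds are assumptions for illustration only.
import redis

NODES = [("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)]
OPS_THRESHOLD = 50_000        # commands/sec per node (arbitrary)
MEMORY_THRESHOLD = 0.75       # fraction of maxmemory in use (arbitrary)

for host, port in NODES:
    info = redis.Redis(host=host, port=port).info()
    ops = info["instantaneous_ops_per_sec"]
    used = info["used_memory"]
    max_mem = info.get("maxmemory") or 0   # 0 means no maxmemory limit configured
    mem_ratio = used / max_mem if max_mem else 0.0
    if ops > OPS_THRESHOLD or mem_ratio > MEMORY_THRESHOLD:
        print(f"{host}:{port} looks hot (ops={ops}, mem={mem_ratio:.0%}) - consider scaling out")
    else:
        print(f"{host}:{port} ok (ops={ops}, mem={mem_ratio:.0%})")
```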

Redis: Isolate keys with large values?

It's my understanding that best practice for redis involves many keys with small values.
However, we have dozens of keys that we'd like to have store a few MB each. When traffic is low, this works out most of the time, but in high-traffic situations we find that timeout errors start to stack up. This causes issues for all of our tiny requests to redis, which were previously reliable.
The large values optimize a key part of our site's functionality and provide a real performance boost when things are going well.
Is there a good way to isolate these large values so that they don't interfere with the network I/O of our best practice-sized values?
Note, we don't need to dynamically discover whether a value is >100KB or in the MBs. We have a specific method that we could point at a separate redis server/instance/database/node/shard/partition (I'm not a hardware guy).
Just install/configure as many instances as needed (2 in this case), each independently managing a logical subset of keys (e.g. big and small), with routing done by the application. Simple and effective: divide and conquer.
The correct solution would be to have 2 separate redis clusters, one for big keys and another for small keys. These 2 clusters could run on the same set of physical or virtual machines, aka multitenancy (you would want to do that to fully utilize the underlying cores on your machines, as the redis server is single-threaded). This way you can scale both clusters separately, and your problem of small requests timing out because they queue behind the bigger ones will be alleviated.
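A minimal sketch of the application-side routing both answers describe, assuming two redis-py clients; the hostnames and the "big key" prefix convention are hypothetical:

```python
# Sketch: route known large-value keys to a dedicated Redis instance so their
# transfer time doesn't queue in front of the small, fast requests.
# Hostnames and the prefix convention are assumptions for illustration.
import redis

small_redis = redis.Redis(host="redis-small.internal", port=6379)
big_redis = redis.Redis(host="redis-big.internal", port=6379)

BIG_KEY_PREFIXES = ("page_cache:", "report_blob:")  # hypothetical large-value keys

def client_for(key: str) -> redis.Redis:
    """Pick the instance based on the key's known size class."""
    return big_redis if key.startswith(BIG_KEY_PREFIXES) else small_redis

def cache_set(key: str, value: bytes, ttl: int = 300) -> None:
    client_for(key).set(key, value, ex=ttl)

def cache_get(key: str) -> bytes | None:
    return client_for(key).get(key)
```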

Why are hotspots on my redis cluster bad?

I have a redis cluster and I am planning to add keys which I know will have a much heavier read/update frequency than other keys. I assume this might cause hotspots on my cluster. Why is this bad, and how can I avoid it?
A hotspot on keys is OK if those keys can be sharded to different redis nodes. But a hotspot on particular redis nodes/machines is bad, as the memory/CPU load on those machines will be quite heavy while other nodes are not efficiently used.
If you know exactly what these keys are, you can calculate their slots yourself up front, as the CRC16 of the key modulo 16384.
And then you can distribute these slots to different redis nodes.
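For reference, a self-contained sketch of that slot calculation. Redis Cluster uses the CRC16/XMODEM variant, and when a key contains a {...} hash tag only that part is hashed:

```python
# Compute the Redis Cluster hash slot for a key: CRC16(key) mod 16384.
# Pure-Python CRC16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses.

def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash tags: if the key has a non-empty {...} section, only that part is hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("user:1234"))  # the slot this key maps to
```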
Whether or not items will cause a hotspot on a particular node or nodes depends on a bunch of factors. As already mentioned, hotspotting on a single key isn't necessarily a problem if the overall cluster traffic is still relatively even and the node that key is on isn't taxed. If each of your cluster nodes is handling 1000 commands/sec and on one of those nodes all of the commands relate to one key, it's not really going to matter: all of the commands are processed serially on a single thread, so it's all the same.
However, if you have 5 nodes, all of which are handling 1000 commands/sec, but you add a new key to one node which makes that single node incur another 3000 commands/sec, that one node is now handling 50% of the cluster's processing (the arithmetic is worked out below). This means it's going to take longer for that node to handle its normal 1000 commands/sec, and 1/5 of your traffic is now arbitrarily much slower.
Part of the general idea of distributing/sharding a database is not only to increase storage capacity but to balance work as well. Unbalancing that work will end up unbalancing or degrading your application's performance. It'll either cause 1/Nth of your request load to be arbitrarily slower because it hits your bogged-down node, or it'll increase processing time across the board if your requests potentially access multiple nodes. Evenly spreading load gives an application better capacity to handle higher load without adversely affecting performance.
But there's also the practical consideration of whether the access to your new keys is proportionally large compared to your ongoing traffic. If your cluster is handling 1000+ commands/sec/node and a single key will add 10 req/sec to one particular node, you'll probably be just fine either way.
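To spell out the arithmetic in that 5-node example:

```python
# Worked numbers from the 5-node example above.
nodes = 5
baseline_per_node = 1000              # commands/sec each node already handles
hot_key_extra = 3000                  # extra commands/sec landing on one node

hot_node = baseline_per_node + hot_key_extra        # 4000 commands/sec
total = nodes * baseline_per_node + hot_key_extra   # 8000 commands/sec
print(f"hot node share: {hot_node / total:.0%}")    # 50% of all processing on one node
```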

Why does redis cluster only have 16384 slots?

In my opinion, as the number of keys grows, 'hash conflicts' will occur more and more frequently. I don't know whether keys that map to the same slot are stored in a singly linked list; if so, wouldn't read performance be affected, especially for stale records?
The answer from antirez, the author of Redis, is below.
The reason is:
Normal heartbeat packets carry the full configuration of a node, that can be replaced in an idempotent way with the old in order to update an old config. This means they contain the slots configuration for a node, in raw form, that uses 2k of space with 16k slots, but would use a prohibitive 8k of space using 65k slots.
At the same time it is unlikely that Redis Cluster would scale to more than 1000 master nodes because of other design tradeoffs.
So 16k was in the right range to ensure enough slots per master with a max of 1000 masters, but a small enough number to propagate the slot configuration as a raw bitmap easily. Note that in small clusters the bitmap would be hard to compress, because when N is small the bitmap would have slots/N bits set, which is a large percentage of bits set.
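A quick back-of-the-envelope check of those numbers, with one bit per slot in the heartbeat's raw bitmap:

```python
# One bit per slot in the heartbeat's raw slot bitmap.
def bitmap_bytes(slots: int) -> int:
    return slots // 8

print(bitmap_bytes(16384))   # 2048 bytes (~2k, as quoted above)
print(bitmap_bytes(65536))   # 8192 bytes (~8k, considered prohibitive)
```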
These "slots" are merely a unit of distribution across shards. You're not going to have of 16K shards servers in a cluster; but the are granular enough to allow some degree of weighted load distribution. (For example if you start with four shard on one type of hardware and choose to introduce two more of a more power profile, you could make the new servers targets for twice as many slots as the existing servers and thus achieve a more even relatively utilization of your capacity.
I'm just summarizing the gist of how they're used. For details read the Redis Cluster Specification.

Optimizing write performance of a 3 Node 8 Core/16G Cassandra cluster

We have set up a 3-node performance cluster with 16G RAM and 8 cores each. Our use case is to write 1 million rows to a single table with 101 columns, which currently takes 57-58 minutes. What should be our first steps towards optimizing the write performance on our cluster?
The first thing I would do is look at the application that is performing the writes:
What language is the application written in and what driver is it using? Some drivers can offer better inherent performance than others, e.g. Python, Ruby, and Node.js drivers may only make use of one thread, so running multiple instances of your application (1 per core) may be something to consider. Your question is tagged 'spark-cassandra-connector', which possibly indicates you are using that; it uses the datastax java driver, which should perform well as a single instance.
Are your writes asynchronous, or are you writing data one row at a time? How many writes does it execute concurrently? Too many concurrent writes could cause pressure in Cassandra, but too few concurrent writes could reduce throughput (see the sketch after these points). If you are using the spark connector, are you using saveToCassandra/saveAsCassandraTable or something else?
Are you using batching? If you are, how many rows are you inserting/updating per batch? Too many rows could put a lot of pressure on Cassandra. Additionally, are all of your inserts/updates within a batch going to the same partition? If they aren't in the same partition, you should consider not batching them up.
Spark Connector Specific: You can tune the write settings, like batch size, batch level (i.e. by partition or by replica set), write throughput in mb per core, etc. You can see all these settings here.
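Since the concurrency question applies to any client, here is a hedged sketch using the DataStax Python driver's concurrent execution helper; the contact points, keyspace, table, columns, and concurrency level are assumptions for illustration:

```python
# Sketch: write rows concurrently with the DataStax Python driver instead of
# one at a time. Keyspace/table/column names and the concurrency level are
# assumptions; tune concurrency to what your cluster can absorb.
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])   # placeholder contact points
session = cluster.connect("perf_test")                     # hypothetical keyspace

insert = session.prepare(
    "INSERT INTO wide_table (id, col_1, col_2) VALUES (?, ?, ?)"  # hypothetical table
)

rows = [(i, f"value_{i}", i * 2) for i in range(1_000_000)]

# Keeps up to `concurrency` requests in flight at once; too few starves the
# cluster, too many can overload it.
results = execute_concurrent_with_args(session, insert, rows, concurrency=100)
failures = [r for r in results if not r.success]
print(f"{len(failures)} failed writes")
cluster.shutdown()
```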
The second thing I would do is look at metrics on the Cassandra side on each individual node.
What do the garbage collection metrics look like? You can enable GC logging by uncommenting lines in conf/cassandra-env.sh (as shown here); 'Are Your Garbage Collection Logs Speaking to You?' is a good reference. You may need to tune your GC settings, though if you are using an 8GB heap the defaults are usually pretty good.
Do your CPU and disk utilization indicate that your systems are under heavy load? Your hardware or configuration could be constraining your capability (see 'Selecting hardware for enterprise implementations').
Commands like nodetool proxyhistograms and nodetool cfhistograms will help you understand how long your requests are taking (proxyhistograms), and cfhistograms (latencies in particular) can give you insight into any disparities between how long it takes to process a request vs. perform the mutation operations.