What is the best way to swap hard drives in Cassandra - DataStax

We have dense nodes (~1 TB per node; 8-node cluster; DSC 2.1.13; RF=3). We are planning to move from spinning disks to SSDs. What is the best way of doing this?
Approach 1: nodetool decommission -> going to take forever to re-distribute all the data across the nodes.
Approach 2: nodetool removenode HostUUID -> per my research, this also takes time to re-distribute keys.
Approach 3: shut down the node -> swap the drives -> nodetool repair.
Thanks!

Go with approach 3. You can copy everything from the old drive to the SSD and reboot the node to shorten the process. If you can do it within the hinted handoff window, you will have that node up and in a consistent state in no time. Make sure your clients don't crash if they use strict consistency levels, as already stated by the other two comments.
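A rough sketch of that sequence in Python, assuming the data directory lives at /var/lib/cassandra and the new SSD is mounted at /mnt/ssd (both paths, and the service name, are assumptions; adapt them to your nodes):

```python
# Per-node drive swap: drain, stop, copy, remount, start, repair.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["nodetool", "drain"])                  # flush memtables, stop accepting writes
run(["service", "cassandra", "stop"])
run(["rsync", "-aH", "/var/lib/cassandra/", "/mnt/ssd/cassandra/"])
# ...remount the SSD at /var/lib/cassandra (or point data_file_directories
# in cassandra.yaml at the new location), then:
run(["service", "cassandra", "start"])
run(["nodetool", "repair"])                 # only needed if the node was down
                                            # longer than the hinted handoff window
```

The drain before stopping matters: it flushes memtables so the copied SSTables are complete. The default hinted handoff window (max_hint_window_in_ms) is three hours, so if the copy finishes within that, the final repair can be skipped.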

Related

One instance with multiple GPUs or multiple instances with one GPU

I am running multiple models using GPUs and all jobs combined can be run on 4 GPUs, for example. Multiple jobs can be run on the same GPU since the GPU memory can handle it.
Is it a better idea to spin up a powerful instance with all 4 GPUs as part of it and run all the jobs on one instance? Or should I go the route of having multiple instances with 1 GPU on each?
There are a few factors I'm thinking of:
Latency of reading files. Having a local disk on one machine should be faster latency-wise, but it would be quite a few reads from one source. Would this cause any issues?
I would need quite a few vCPUs and a lot of memory to scale the IOPS, since GCP apparently scales IOPS that way. What is the best way to approach this? If anyone has more on this, I would appreciate pointers.
If in the future I need to downgrade to save costs or reduce performance, I could simply stop the instance and change my specs.
Having everything on one machine would be easier to work with. I know in production I would want a more distributed approach, but this is strictly experimentation.
Those are my main thoughts. Am I missing something? Thanks for all of the help.
Ended up going with one machine with multiple GPUs. Just assigned the jobs to the different GPUs to make the memory work.
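For reference, a minimal sketch of that per-GPU assignment in Python; the four script names are hypothetical placeholders:

```python
# Pin one job per GPU by restricting each child process's device
# visibility via CUDA_VISIBLE_DEVICES.
import os
import subprocess

jobs = ["train_a.py", "train_b.py", "train_c.py", "train_d.py"]  # placeholders

procs = []
for gpu_id, script in enumerate(jobs):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)   # this job sees only this GPU
    procs.append(subprocess.Popen(["python", script], env=env))

for p in procs:
    p.wait()   # block until all jobs finish
```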
I suggest you take a look here if you want to run multiple tasks on the same GPU.
Basically, when running several tasks (different processes or containers) on the same GPU, it won't be efficient due to some kind of context switching.
You'll need the latest NVIDIA hardware to test it.

ZooKeeper vs Redis server sync

I have a small cluster of servers I need to keep in sync. My initial thought was to have one server be the "master" and publish updates using Redis's pub/sub functionality (since we are already using Redis for storage), letting the other servers in the cluster, the slaves, poll for updates in a long-running task. This seemed like a simple method to keep everything in sync, but then I thought of the obvious issue: what if my "master" goes down? That is where I started looking into techniques to make sure there is always a master, which led me to reading about ideas like leader election. Finally, I stumbled upon Apache ZooKeeper (through a Python binding, "pettingzoo"), which apparently takes care of a lot of the fault-tolerance logic for you. I may be able to write my own leader-election code, but I figure it wouldn't come close to something that has been proven and tested, like ZooKeeper.
My main issue with using ZooKeeper is that it is just another component that I may be adding to my setup unnecessarily, when I could get by with something simpler. Has anyone ever used Redis in this way? Or is there any other simple method I can use to get the type of functionality I am trying to achieve?
More info about pettingzoo (slideshare)
I'm afraid there is no simple method to achieve high availability. This is usually tricky to set up and tricky to test. There are multiple ways to achieve HA, falling into two categories: physical clustering and logical clustering.
Physical clustering is about using hardware, network, and OS level mechanisms to achieve HA. On Linux, you can have a look at Pacemaker which is a full-fledged open-source solution coming with all enterprise distributions. If you want to directly embed clustering capabilities in your application (in C), you may want to check the Corosync cluster engine (also used by Pacemaker). If you plan to use commercial software, Veritas Cluster Server is a well established (but expensive) cross-platform HA solution.
Logical clustering is about using fancy distributed algorithms (like leader election, Paxos, etc.) to achieve HA without relying on specific low-level mechanisms. This is what things like ZooKeeper provide.
ZooKeeper is a consistent, ordered, hierarchical store built on top of the ZAB protocol (quite similar to Paxos). It is quite robust and can be used to implement HA facilities, but it is not trivial, and you need to install the JVM on all nodes. For good examples, you may have a look at some recipes and the excellent Curator library from Netflix. These days, ZooKeeper is used well beyond pure Hadoop contexts, and IMO it is the best solution for building an HA logical infrastructure.
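As an illustration, a minimal leader-election sketch in Python using the kazoo client (kazoo, the znode path, and the host address are my assumptions, not something from the question):

```python
# Leader election with the kazoo ZooKeeper client: block until this
# node wins, then act as master.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

def lead():
    # Only the elected leader enters this function; publish updates
    # (e.g. via Redis pub/sub) from here.
    print("this node is now the master")

# Contends for leadership; if the current leader dies, ZooKeeper
# automatically promotes the next contender.
election = zk.Election("/myapp/election", identifier="server-1")
election.run(lead)
```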
The Redis pub/sub mechanism is not reliable enough to implement a logical cluster, because unread messages will be lost (there is no queuing of items with pub/sub). To achieve HA of a collection of Redis instances, you can try Redis Sentinel, but it does not extend to your own software.
If you are ready to program in C, an HA framework which is often forgotten (but can be quite useful IMO) is the one coming with BerkeleyDB. It is quite basic but supports off-the-shelf leader elections, and it can be integrated into any environment. Documentation can be found here and here. Note: you do not have to store your data with BerkeleyDB to benefit from the HA mechanism (only the topology data - the same data you would put in ZooKeeper).

Forming a web application cluster with 3 VMs running in the same physical box

Are there any advantages whatsoever to forming a cluster if all the nodes are virtual machines running inside the same physical host? Our small company just purchased a server with 16 GB of RAM. I propose to just set up IIS on the box to handle outside requests, but our 'Network Engineer' argues that it would be better to create 3 VMs on the box and form a cluster with the VMs for load balancing. But since they are all in the same box, are there actual benefits to taking the VM approach rather than no VMs?
Thanks.
No, as the overhead of running four operating systems would take a toll on performance; plus, I believe all modern web servers (including IIS) are multithreaded, so they are optimised for performance anyway.
Maybe the Network Engineer knows something that you don't. Just ask. Use common sense to analyze the answer.
That said, running VMs always needs resources - but you might not notice. Doesn't make sense? Well, even if you attach the computer to the Internet with a Gigabit link, you still won't be able to process more data than the ISP gives you. If your uplink is 1 MB/s, that's the best you can get. Any VM today is able to process that little trickle of data while being bored 99.999% of the time.
Running the servers in VMs does have other advantages, though. First of all, you can take them down individually for maintenance. If the load surges because your company is extremely successful, you can easily add more VMs on other physical boxes and move virtual servers around with a mouse click. If the main server dies, you can set up a replacement machine and migrate the VMs without having to reinstall everything.
I'd certainly question this decision myself, as from a hardware perspective you obviously still have a single point of failure, so there is no benefit there.
From an application perspective it could be somewhat tenuously suggested that this would allow zero-downtime deployments by taking VMs out of the "farm" one at a time, but you won't get any additional application redundancy or performance from virtualisation in this instance. What you will get is considerably more management overhead in terms of infrastructure and deployment, for little gain.
If there's a plan to deploy to a "proper" load balanced environment in the near future this might be a good starting point to ensure your application works correctly in a farm (sticky sessions etc). Although this makes your apparently live environment also a QA server, which is far from ideal.
From a performance perspective, 3 VMs on the same hardware is slower.
From an availability perspective, 2 VMs will give higher availability: it better protects against application and OS failures, and you can perform maintenance on one node while the other is up.

Distributed system design

In a distributed system, a certain node distributes 'X' units of work equally across 'N' nodes (via socket message passing).
As we increase the number of worker nodes, each node completes its job faster, but we have to set up more connections.
In a real situation, it would be similar to replacing 10 nodes in a Hadoop-like system, each processing 100 GB, with 1,000,000 nodes, each processing 1 MB.
What's the impact of setting up more connections in this case? Is this a big overhead in the poll() function?
What's the best approach?
Sounds like you will need to consult Amdahl's Law.
At least that's how I computed how many machines on a high-speed switch were optimal for my parallel computations.
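For a rough feel, a tiny Python calculation of Amdahl's Law; the 95% parallel fraction is an assumption for illustration:

```python
# Amdahl's Law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
# fraction of the work that parallelises and n is the number of nodes.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

p = 0.95   # assume 5% serial work: coordination, connection setup, merging
for n in (10, 100, 1000, 1_000_000):
    print(f"{n:>9} nodes -> {amdahl_speedup(p, n):5.1f}x speedup")
# Speedup caps near 1 / (1 - p) = 20x no matter how many nodes you add,
# which is why per-connection overhead starts to dominate at scale.
```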
Does it have to use sockets and message passing between Supervisor and Worker?
You can use some type of queuing to avoid putting load on the Supervisor, or a distributed file system similar to HDFS to distribute the tasks and collect the results.
It also depends on the number of nodes you are planning to deploy the Workers on. 1,000,000 nodes is a very big number, so in that case you'll have to distribute the tasks across multiple queues.
The thing to be careful about is what will happen if all the nodes finish their tasks at the same time. It would be worth putting some variability into when they can request a new task. ZooKeeper (http://hadoop.apache.org/zookeeper/) is potentially something you can also use to synchronise the jobs.
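A minimal sketch of that jittered polling, using an in-process queue as a stand-in for a real networked broker (timings and queue choice are assumptions):

```python
# Workers pull tasks from a queue and wait a small random delay before
# requesting the next one, so simultaneous finishers don't stampede.
import queue
import random
import threading
import time

tasks = queue.Queue()
for i in range(100):
    tasks.put(f"task-{i}")

def worker():
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return                               # no work left
        time.sleep(0.01)                         # stand-in for real work
        tasks.task_done()
        time.sleep(random.uniform(0, 0.05))      # jitter before next request

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```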
Can you measure your network cost? The time spent actually working on the worker machine is only part of the total; the message send and receive have costs of their own.
Also, can you describe the big-O complexity of merging each worker's result into the master's result?
Does your master round-robin the expected responses?
BTW, if your worker nodes are finishing quicker but under-utilising their CPU resources, you may be missing a design trade-off.
Of course, you could be the rule or the exception to any law (argument / out-of-date research). ;-)

Correct way to take a snapshot

I've asked this question before, but I am still confused. What's the correct and quickest way to take a snapshot? (I only use EBS-backed Unix and Windows machines, so that's all my interest right now.) Some ideas:
Just take the snapshot... This seems to sometimes cause system corruption.
Stop the machine, take the snapshot, and then start the machine. I guess this also means I need to wait for each individual task to complete, making scripting a bit of a challenge?
Take a snapshot with the 'reboot machine' flag set. There is very little in the documentation to specify why a reboot is needed...
Hope you EC2 experts can help me.
If a bit of data loss is acceptable, just take the snapshot while the instance is running.
If it's not acceptable, write some script magic to ensure everything your application is working on is saved to disk, take the snapshot, then let your app resume work.
For what I do, I find it best to keep a separate EBS volume with my application and its data on it, and when I need a snapshot, I just stop the app for a moment, snapshot, and fire it back up. That way you don't have to worry about what the OS is doing during the snapshot, and it also comes with the added bonuses of being able to quickly move your app and its data to more powerful hardware, and having much smaller snapshots.
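A minimal boto3 sketch of that stop-app / snapshot / resume cycle (the region, volume ID, and service name are placeholders):

```python
# Quiesce the app, start the snapshot, resume. An EBS snapshot is
# point-in-time, so the app can restart as soon as the call returns.
import subprocess
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

subprocess.run(["systemctl", "stop", "myapp"], check=True)   # hypothetical service
try:
    snap = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",        # the dedicated data volume
        Description="app data snapshot",
    )
    print("snapshot started:", snap["SnapshotId"])
finally:
    subprocess.run(["systemctl", "start", "myapp"], check=True)
```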
Quick answer - both operating systems have features for safely unmounting the drive. An unmounted drive can be snapshotted without fear of corruption.
Long answer
An EBS snapshot is point-in-time and differential (it does not "copy" your data per se), so as long as your drive is in a consistent and restorable state when it begins, you'll avoid corruption (since the snapshot is atomic).
As you have implied, whatever state the whole drive is in when it starts, that's what your snapshot image will be when it is restored (if you snapshot while you are half-way done writing a file, then, by-golly, that file will be half-written when you restore).
For both Linux and Windows, a consistent state can be achieved by unmounting the drive. This guarantees that your buffers are flushed to disk and no writes can occur. In both Linux and Windows there are commands to list which processes are using the drive; once you have stopped those processes, or otherwise gotten them to stop marking the drive as in use (different for each program/service), you can unmount. In Windows this is very easy: set your drive as a "removable drive" and then use the "safely remove hardware" feature to unmount. In Linux, you can unmount with the "umount" command.
There are other sneakier ways, but the above is pretty universal.
So assuming you get to a restorable state before you begin, you can resume using your drive as soon as the snapshot starts: you do not need to wait for the snapshot to complete before you unlock (or remount) the volume and resume use.
How AWS snapshots work:
Your volume and the snapshot are just sets of pointers to blocks. When you take a snapshot, any blocks you write to from that point forward diverge: they are effectively new blocks associated with the volume, and the old blocks at that logical place in the volume are left alone so that the snapshot stays logically the same.
This is also why subsequent snapshots will tend to go faster (they are differential).
http://harish11g.blogspot.com/2013/04/understanding-Amazon-Elastic-block-store-EBS-snapshots.html
Taking a snapshot often requires scheduled downtime.
Procedure:
I'd unmount the drive.
Start the snapshot and wait until it's done.
And then re-mount it.
AFAIK, this is the only viable way to get a consistent snapshot.
If you could share more about what kind of data is on the snapshot (e.g. a database?), then I could probably extend my answer.
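For what it's worth, a sketch of that unmount / snapshot / remount procedure on Linux (mount point and volume ID are placeholders):

```python
# Unmount to flush buffers, snapshot, wait for completion, remount.
import subprocess
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
MOUNT_POINT = "/data"                        # placeholder
VOLUME_ID = "vol-0123456789abcdef0"          # placeholder

subprocess.run(["umount", MOUNT_POINT], check=True)
try:
    snap = ec2.create_snapshot(VolumeId=VOLUME_ID,
                               Description="consistent snapshot")
    # Conservative: wait until the snapshot is done before remounting,
    # matching the procedure above.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
finally:
    subprocess.run(["mount", MOUNT_POINT], check=True)   # assumes an fstab entry
```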
I cannot comment on Windows-based instances since I'm not at all familiar with them, but to avoid redundancy, check out this blog entry, as it explains a lot:
http://shlomoswidler.com/2010/01/creating-consistent-snapshots-of-live.html
In a nutshell, they use the XFS file system: they freeze it while the snapshot is initiated, then unfreeze it so updates to the filesystem can proceed. According to the comments, it seems to work for most people.
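A sketch of that freeze / snapshot / unfreeze approach (the XFS mount point and volume ID are placeholders; xfs_freeze requires root):

```python
# Freeze the XFS filesystem so no writes land mid-snapshot, start the
# snapshot, then unfreeze. No unmount needed, so the pause is brief.
import subprocess
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

subprocess.run(["xfs_freeze", "-f", "/data"], check=True)    # suspend writes
try:
    snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                               Description="frozen-fs snapshot")
    print("snapshot started:", snap["SnapshotId"])
finally:
    subprocess.run(["xfs_freeze", "-u", "/data"], check=True)  # resume writes
```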