Virtualization of Hyper-threaded cores

I'm looking for some guidance before I spend tons of time reorganizing a legacy program. I have cores that are part of a virtual cluster, and a computation that is broken into many parts and distributed to each member of the cluster. If each core is hyper-threaded, which of the following is most efficient:
1. Two virtual machines, one for each logical core; half the computation is sent to each.
2. One virtual machine, where the OS handles the use of the logical cores.
3. One virtual machine, where OpenMP is used to create two threads to split the computation.
My gut feeling is option 2, because a hyper-threaded core isn't a true core, and option 3 requires the additional overhead of starting threads and communicating data while one thread sits idle. Any insight is greatly appreciated. Thanks.
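For concreteness, option 3 might look something like the minimal sketch below, with Python's multiprocessing standing in for OpenMP purely for illustration (compute_part and the data are hypothetical stand-ins for the real work):

```python
# Minimal sketch of option 3: one VM splitting its share of the work across
# the two logical (hyper-threaded) cores. multiprocessing stands in for
# OpenMP here purely for illustration; compute_part is hypothetical.
from multiprocessing import Pool

def compute_part(chunk):
    # stand-in for one share of the real computation
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    halves = [data[: len(data) // 2], data[len(data) // 2 :]]
    with Pool(processes=2) as pool:  # one worker per logical core
        results = pool.map(compute_part, halves)
    print(sum(results))
```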

You can get some idea from this post: Intel Core i5 And Core i7: Intel's Mainstream Magnum Opus.

Related

Is it possible to use system memory instead of GPU memory for processing Dask tasks

We have been running Dask clusters on Kubernetes for some time. Up to now, we have been using CPUs for processing and, of course, system memory for storing our DataFrame of around 1.5 TB (per Dask cluster, split across 960 workers). Now we want to update our algorithm to take advantage of GPUs, but it seems the available memory on GPUs will not be enough for our needs; it will be a limiting factor (with our current setup, we are using more than 1 GB of memory per virtual core).
I was wondering if it is possible to use GPUs (I'm thinking of NVIDIA or AMD cards with PCIe connections and their own VRAM, not integrated GPUs that use system memory) for processing, and system memory (not GPU memory/VRAM) for storing Dask DataFrames. Is it technically possible? Has anyone tried something like this? Can I schedule a Kubernetes pod such that it uses GPU cores and system memory together?
Another question: even if it were possible to allocate system RAM as the GPU's VRAM, is there a limit on how much system RAM can be allocated this way?
Note 1: I know that using system RAM with the GPU (if it were possible) would create unnecessary traffic over the PCIe bus and result in degraded performance, but I would still need to test this configuration with real data.
Note 2: GPUs are fast because they have many simple cores performing simple tasks in parallel. If an individual GPU core is not superior to an individual CPU core, then maybe I am chasing the wrong dream? I am already running Dask workers on Kubernetes with access to hundreds of CPU cores. In the end, a huge number of workers each holding a small part of my data won't mean better performance (it increases shuffling); there is no use in increasing the number of cores indefinitely.
Note 3: We are mostly manipulating Python objects and doing math calculations via calls to .so libraries implemented in C++.
Edit 1: The Dask-CUDA library seems to support spilling from GPU memory to host memory, but spilling is not what I am after.
Edit 2: It is very frustrating that most of the components needed to utilize GPUs on Kubernetes are still experimental/beta:
- Dask-CUDA: "This library is experimental..."
- NVIDIA device plugin: "The NVIDIA device plugin is still considered beta and..."
- Kubernetes: "Kubernetes includes experimental support for managing AMD and NVIDIA GPUs..."
I don't think this is possible directly as of today, but it's useful to explain why and to reply to some of the points you've raised:
Yes, dask-cuda is what comes to mind first when I think of your use-case. The docs do say it's experimental, but from what I gather, the team has plans to continue to support and improve it. :)
Next, dask-cuda's spilling mechanism was designed that way for a reason: while doing GPU compute, your biggest bottleneck is data transfer (as you have also noted), so we want to keep as much data in GPU memory as possible by design.
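For reference, here is a minimal sketch of how those spilling knobs are exposed, assuming dask-cuda is installed and a CUDA-capable GPU is present (the sizes are illustrative, not recommendations):

```python
# Hedged sketch: dask-cuda's spilling configuration. device_memory_limit
# caps GPU memory per worker and spills to host RAM beyond it; memory_limit
# caps the per-worker host (system) memory.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster(
    device_memory_limit="4GB",  # spill from GPU to host memory past this
    memory_limit="32GB",        # per-worker host memory limit
)
client = Client(cluster)
```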
I'd encourage you to open a topic on Dask's Discourse forum, where we can reach out to some NVIDIA developers who can help confirm. :)
A side note: there are some ongoing discussions around improving how Dask manages GPU resources. That work is in its early stages, but we may see cool new features in the coming months!

H/W requirements for Confluent single node

What are the hardware requirements for a single-node Confluent installation?
I checked their official site, but it only has specifications for multi-node deployments: https://docs.confluent.io/platform/current/installation/system-requirements.html
It's unclear what all you're trying to run. ZooKeeper and Kafka alone have been made to run with limited resources on a Raspberry Pi, and they can definitely run on any modern laptop or desktop.
If you're running a single node, it's not considered "production grade". With at least 5 services (ZooKeeper, broker, Schema Registry, REST Proxy, ksqlDB) at 2 GB max heap each, that'd require 10 GB of RAM plus overhead for the OS, so call it 16 GB of memory to be conservative.
If you also want to (reasonably) run Control Center, it's suggested to have 6 GB for that, increasing your memory requirement to at least 24 GB on a single node if you also leave room for your own Kafka client applications.
Of course, you can opt out of certain services and tune each JVM to how you want...
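As a back-of-the-envelope version of that memory math (the heap sizes are the illustrative figures from above; the OS allowance is an assumption, not an official Confluent number):

```python
# Rough memory budget for the single-node stack described above.
# Heap sizes are the illustrative figures from this answer; os_overhead_gb
# is an assumed allowance, not an official sizing number.
heap_gb = {
    "zookeeper": 2,
    "broker": 2,
    "schema-registry": 2,
    "rest-proxy": 2,
    "ksqldb": 2,
    "control-center": 6,
}
os_overhead_gb = 4
total = sum(heap_gb.values()) + os_overhead_gb
print(f"{total} GB minimum")  # -> 20 GB; headroom for your own Kafka
                              # client applications pushes that toward 24 GB
```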
As far as disk space goes, it really depends on how much data you plan on having. 500 GB would be a good starting point, but note that a single disk wouldn't be fault tolerant.

Virtual Environment CPU Allocation

I am currently attempting to spec out a virtual environment, and I am having a hard time understanding how many cores or "CPUs" I can assign to virtual machines.
Can someone let me know how many usable cores I have with the spec below? In other words, how many cores can I assign to VMs before I hit my limit or run into performance issues?
Server spec:
(2) Xeon Silver 4214, 2.2 GHz, 12 cores each, per server
4 servers total. Based on this, I should have 192 virtual cores that I can allocate? Or am I wrong?
You have 48 logical processors per server with the listed CPUs. Now think of it this way: you will likely have other VMs that consume some amount of resources like CPU and RAM. If you assign, let's say, 16 vCPUs to one VM, will the other hosts in your cluster (I assume you clustered all 4 hosts) be able to handle the load of the other VMs plus this one with 16 vCPUs?
You should check the VMs' usage at idle and under some load; then you can calculate how many vCPUs each VM should have before you start experiencing major performance issues.
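For reference, the arithmetic behind those numbers, using the figures from the question:

```python
# 2 sockets x 12 cores x 2 threads (hyper-threading) per server,
# across 4 servers. All figures come from the question above.
sockets_per_server = 2
cores_per_socket = 12   # Xeon Silver 4214
threads_per_core = 2    # hyper-threading
servers = 4

logical_per_server = sockets_per_server * cores_per_socket * threads_per_core
print(logical_per_server)            # -> 48 logical processors per server
print(logical_per_server * servers)  # -> 192 across the cluster
```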

The mechanics behind the mapping of redis instances to separate CPU cores

It's documented that separate redis instances map to separate CPU cores. If I have 8 redis instances running on a Debian/Ubuntu machine with 8 cores, all of them would map to a core each.
1) What happens if I scale this machine down to 4 cores?
2) Do the changes happen automatically (by default), or is some explicit configuration involved?
3) Is there any way to control the behavior? If so, to what extent?
I'd love to understand the technicals behind this, and an illustrative example is most welcome. I run an app hosted in the cloud which uses Redis as a back end. Scaling the machine's CPU cores up (and down) is one of the things I have to do, but I'd like to know what I'm getting into first.
Thanks in advance!
There is no magic. Since redis is single-threaded, a single instance of redis will only occupy a single core at a time. Running multiple instances creates the possibility that more than one of them will be executing at once, on different cores (if you have them). How this is done is left entirely up to the operating system; redis itself doesn't do anything to "map" instances to specific cores.
In practice, it's possible that running 8 instances on 8 cores might give you something that looks like a direct mapping of instances to cores, since a smart OS will spread processes across cores (to maximize available resources), and should show some preference for running a process on the same core that it recently vacated (to make best use of cache). But at best, this is only true for the simple case of a 1:1 mapping, with no other processes on the system, all processes equally loaded, no influence from network drivers, etc.
In the general case, all you can say is that the OS will decide how to give CPU time to all of the instances that you run, and it will probably do a pretty good job, because the scheduling parts of the OS were written by people who know what they're doing.
Redis is a (mostly) single-threaded process, which means that an instance of the server will use a single CPU core.
The server process is mapped to a core by the operating system - that's one of the main tasks an OS is in charge of. To reiterate, assigning resources, including CPU, is an OS decision, and a very complex one at that (try reading the code of the kernel's scheduler ;)).
If I have 8 redis instances running on a Debian/Ubuntu machine with 8 cores, all of them would map to a core each.
Perhaps - that's at the OS's discretion. There is no guarantee that every instance will get a unique core, and it is possible that one core will be shared by several instances.
1) What happens if I scale this machine down to 4 cores?
Scaling down like this means a restart. Once the Redis servers are restarted, the OS will schedule them across the available cores.
2) Do the changes happen automatically (by default), or is some explicit configuration involved?
There are no changes involved - every process, Redis or not, gets CPU time. Cores are shared between processes, with the OS orchestrating the entire thing.
3) Is there any way to control the behavior? If so, to what extent?
Yes, most operating systems provide interfaces for controlling the allocation of resources. Specifically, the taskset Linux command can be used to set or get a process's CPU affinity.
Note: you should leave CPU affinity setting to the OS - it is supposed to be quite good at that. Instead, make sure that you provision your server correctly for the load.
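If you do need to experiment with affinity anyway, here is a minimal sketch of the programmatic equivalent of taskset on Linux (pid 0 means the calling process; you could pass a running redis-server PID instead, given the right permissions):

```python
# Minimal sketch of controlling CPU affinity from Python on Linux,
# equivalent to `taskset -cp 0,1 <pid>` on the command line.
import os

os.sched_setaffinity(0, {0, 1})  # restrict this process to cores 0 and 1
print(os.sched_getaffinity(0))   # -> {0, 1}
```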

Hardware requirements for a Virtual Server

We have decided to go with a virtualization solution for a few of our development servers. I have an idea of what the hardware specs would be like if we bought separate physical servers, but I have no idea how to consolidate that information into the specification for a generalized virtual server.
I know intuitively that the specs are not additive - I shouldn't just add up all the RAM requirements from each machine to get the RAM required for the virtual server. I can't really treat them as parallel systems either because no matter how good the virtualization software is, it can't abstract away two servers trying to peg the CPU at the same time.
So my question is: is there a standard method for estimating the hardware requirements of a virtualized system, given the hardware requirement estimates for the underlying virtual machines? Is there a +C constant for VMware/MS Virtual Server overhead (and if so, what is C)?
P.S. I promise to move this over to serverfault once it goes into beta (Promise kept)
Yes: add 25% additional resources to manage the VMs. So if I need four servers, each equal to a single-core 2 GHz machine with 2 GB of RAM, I will need 10 GHz of processing power plus 10 GB of RAM. This allows all systems to redline and still be OK.
In the real world this will never happen, though; all your servers will not be running flat out all the time. You can get a feel for usage by profiling your current servers to determine their exact requirements, and then add an additional 25% in resources.
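Spelled out, that rule of thumb looks like this (a heuristic from this answer, not an official sizing formula):

```python
# Sketch of the "+25%" sizing rule described above, applied to four
# single-core 2 GHz / 2 GB guests. Figures come from this answer.
vms = [{"ghz": 2.0, "ram_gb": 2.0}] * 4
overhead = 1.25  # 25% extra to manage the VMs

ghz_needed = overhead * sum(v["ghz"] for v in vms)
ram_needed = overhead * sum(v["ram_gb"] for v in vms)
print(ghz_needed, ram_needed)  # -> 10.0 GHz of CPU, 10.0 GB of RAM
```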
Check out this software for profiling utilization http://confluence.atlassian.com/display/JIRA/Profiling+Memory+and+CPU+usage+with+YourKit
The requirements are in fact additive. You should add up the memory requirements for each VM, and the disk requirements, and have at least one processor core per VM. Then add on whatever you need for the host system.
VMs can share a CPU, to some extent, if you have really low performance requirements, but they cannot share disk space or memory.
The answers above are far too high; the second (1 core per VM) is closer. You can either (1) plan ahead and probably over-purchase, or (2) add just-in-time. Do you have some reason you must know well ahead of time (a yearly budget? your chosen host platform doesn't cluster hosts, so you can't add later)?
Unless you have an incredibly simple usage profile, it will be hard to predict in advance, and you'll over-purchase. The answer above (+25%) would be several times more than you need with modern server virtualization software (VMware, Xen, etc.) that manages resources smartly; it's accurate only for desktop products like VPC. I chose to rough it out on a napkin and profile my first environment (set of machines) on the host. I'm happy.
Examples of things that will confound your estimation:
- Disk space: some systems (Lab Manager) use only the difference in space from the base template; 10 deployed machines with 10 GB drives use about 10 GB (template) + 200 MB.
- Disk space: you'll then find you don't like the deltas in specific scenarios.
- CPU / memory: this is a dev shop, so you'll have erratic load. Smart hosts don't reserve memory and CPU.
- CPU / memory: but then you'll want to do perf testing, and want to reserve CPU cycles (not all hosts can do that).
We all virtualize for different reasons. Many of the guests in our environment don't have much work to do. We want them there to see how something behaves with a cluster of 3 servers of type X. Or we have a bundle of weird client desktops waiting around, being used one at a time by a tester. They rarely consume many host resources.
So, if you are using something that doesn't do delta disks, disk space might be somewhat calculable. With Lab Manager (delta disks), disk space is really hard to predict.
Memory and processor usage: you'll have to profile or over-purchase heavily. I have many more guest CPUs than host CPUs and don't have perf problems, but that's because of the choppy usage in our QA environments.