Azure SQL Managed Instance Gen4 and Gen5 hardware choices - azure-sql-managed-instance

Azure SQL Database Managed Instance can be created on two different hardware generations, Gen4 and Gen5, with the following differences:
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-managed-instance-resource-limits#hardware-generation-characteristics
Are there any guidelines on which scenarios call for Gen4 versus Gen5?

Gen5 is better for some workloads while Gen4 is better for others. In most cases, though, the primary choice should be Gen5, unless the bigger memory/core ratio or the difference between physical and logical cores makes a big difference for your workload.
Gen5 has network acceleration, so in most cases it should provide better IO bandwidth to remote storage on General Purpose than Gen4, and remote storage IO might be the biggest bottleneck in your workload.
Gen5 is a newer hardware configuration than Gen4: the Gen5 processors are Intel Broadwell instead of Intel Haswell. However, Gen5 uses hyper-threading, so a vCore on Gen5 is a logical processor rather than a physical core - this might make some difference, but you would need to try and test with your own workload. A vCore is the same price on both hardware generations.
Gen5 uses faster local SSD storage (fast NVMe SSD) than Gen4, so on Business Critical there should be an advantage for Gen5. On both generations tempdb is placed on local SSD, in General Purpose as well as Business Critical, so workloads that depend heavily on tempdb should also run faster on Gen5.
Gen4 has a bigger memory/core ratio than Gen5 - 7 GB per vCore on Gen4 vs 5.1 GB per vCore on Gen5.
Gen4 offers only an 8-24 core range with proportional memory of 56-168 GB, while Gen5 can go up to 80 cores. Also, new configurations such as SKUs with fewer than 8 cores will probably be available only on Gen5 hardware.
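To make those memory/core figures concrete, here is a small sketch using the per-vCore ratios quoted above (7 GB on Gen4, 5.1 GB on Gen5):

```python
# Memory that comes with a given vCore count on each hardware generation,
# using the per-vCore ratios quoted above (7 GB on Gen4, 5.1 GB on Gen5).
MEMORY_PER_VCORE_GB = {"Gen4": 7.0, "Gen5": 5.1}

def instance_memory_gb(generation: str, vcores: int) -> float:
    return MEMORY_PER_VCORE_GB[generation] * vcores

for vcores in (8, 16, 24):
    print(f"{vcores:2d} vCores: Gen4 {instance_memory_gb('Gen4', vcores):5.1f} GB, "
          f"Gen5 {instance_memory_gb('Gen5', vcores):5.1f} GB")
```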

Gen 4 is no longer available for new purchases.
Note that with Gen5 General Purpose you have to buy 2 cores at a minimum, while on Gen4 you could buy 1 core. The price per core has not changed, so your minimum total price has doubled.
The same applies to Business Critical: on Gen4 the minimum is 2 cores, while on Gen5 the minimum is 4 cores. Again, this is a doubling of the entry cost. This is particularly shocking if you wanted to go from General Purpose to Business Critical, because the per-core cost there is already about double.
The other killer in Business Critical on Gen5 hardware is that the maximum number of databases has STAYED at 50. They double your costs and keep you at 50 databases! There is no reason Business Critical can't start at 2 cores like it does on Gen4.
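To make the doubling concrete, here is a tiny sketch with a hypothetical per-core price; only the Gen4-vs-Gen5 ratio matters, and it is independent of the actual rate:

```python
# Hypothetical per-core monthly price; only the Gen4-vs-Gen5 ratio matters,
# and that ratio is independent of the actual rate.
PRICE_PER_CORE = 100.0

minimum_cores = {
    "General Purpose":   {"Gen4": 1, "Gen5": 2},   # minimums quoted in the comment above
    "Business Critical": {"Gen4": 2, "Gen5": 4},
}

for tier, mins in minimum_cores.items():
    gen4_cost = mins["Gen4"] * PRICE_PER_CORE
    gen5_cost = mins["Gen5"] * PRICE_PER_CORE
    print(f"{tier}: Gen4 entry {gen4_cost:.0f}, Gen5 entry {gen5_cost:.0f} "
          f"({gen5_cost / gen4_cost:.1f}x)")
```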

Related

Why does higher core count lead to higher CPI?

I'm looking at a chart that shows that, in reality, increasing the core count on a CPU usually results in higher CPI for most instructions, and that it usually increases the total number of instructions the program executes. Why is this happening?
From my understanding CPI should increase only when increasing the clock frequency, so the CPI increase doesn't make much sense to me.
What chart? What factors are they holding constant while increasing core count? Perhaps total transistor budget, so each core has to be simpler to have more cores?
Making a single core larger has diminishing returns, but building more cores has linear returns for embarrassingly parallel problems; hence Xeon Phi having lots of simple cores, and GPUs being very simple pipelines.
But CPUs that also care about single-thread performance / latency (instead of just throughput) will push into those diminishing returns and build wider cores. Many problems that we run on CPUs are not trivial to parallelize, so lots of weak cores is worse than fewer, faster cores. And for a given problem size, the more threads you have, the more of its total time each thread spends communicating with other threads (and maybe waiting for data from them).
If you do keep each core identical when adding more cores, their CPI generally stays the same when running the same code. e.g. SPECint_rate scales nearly linearly with number of cores for current Intel/AMD CPUs (which do scale up by adding more of the same cores).
So that must not be what your chart is talking about. You'll need to clarify the question if you want a more specific answer.
You don't get perfectly linear scaling because cores do compete with each other for memory bandwidth, and space in the shared last-level cache. (Although most current designs scale up the size of last-level cache with the number of cores. e.g. AMD Zen has clusters of 4 cores sharing 8MiB of L3 that's private to those cores. Intel uses a large shared L3 that has a slice of L3 with each core, so the L3 per core is about the same.)
But more cores also means a more complex interconnect to wire them all together and to the memory controllers. Intel many-core Xeon CPUs notably have worse single-thread bandwidth than quad-core "client" chips of the same microarchitecture, even though the cores are the same in both. Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?
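As a toy illustration of that bandwidth competition (made-up numbers, not a model of any real CPU): aggregate throughput saturates once the cores' combined demand exceeds the shared bus, so per-core throughput - and hence effective CPI for memory-bound code - gets worse as you add cores.

```python
# Minimal sketch of cores contending for a shared memory bus.
# Numbers are illustrative, not measurements of any real CPU.

PEAK_BANDWIDTH_GBS = 40.0      # assumed shared DRAM bandwidth
DEMAND_PER_CORE_GBS = 8.0      # assumed bandwidth one core would use if it ran alone

def aggregate_throughput(cores: int) -> float:
    # Each core wants DEMAND_PER_CORE_GBS; the total is capped by the shared bus.
    return min(cores * DEMAND_PER_CORE_GBS, PEAK_BANDWIDTH_GBS)

for n in (1, 2, 4, 8, 16):
    total = aggregate_throughput(n)
    print(f"{n:2d} cores: {total:5.1f} GB/s total, {total / n:4.1f} GB/s per core")
```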

Hardware for Deep Learning

I have a couple of questions on hardware for a deep learning project I'm starting; I intend to use PyTorch for the neural networks.
I am thinking about going for an 8th-gen CPU on a Z390 board (I'll wait a month to see if prices drop after the 9th-gen CPUs are available), so I still get a cheaper CPU that can be upgraded later.
Question 1) Are CPU cores going to be beneficial? Would getting the latest Intel chips be worth the extra cores, and if CPU cores will be helpful, should I just go AMD?
I am also thinking about getting a 1080 Ti and then later on, once I'm more proficient, adding two more 2080 Tis. I would go for more, but it's difficult to find a board that fits 4.
Question 2) Does mixing GPUs affect parallel processing? Should I just get a 2080 Ti now and then buy another 2 later? And, as part b to this question, do the lane speeds matter? Should I spend more on a board that doesn't slow down the PCIe slots when you use more than one?
Question 3) More RAM? 32GB seems plenty, so 2x16GB sticks on a board that has 4 slots supporting up to 64GB.
What also matters when running multiple GPUs is the number of available PCIe lanes. If you might go up to 4 GPUs, I'd go for an AMD Threadripper for its 64 PCIe lanes.
For machine learning in general, core and thread count is quite important, so a Threadripper is still a good option, depending on the budget of course.
A few people mention that running a separate instance on each GPU may be more interesting; if you do so, mixing GPUs is not a problem.
32GB of RAM seems good; no need to go for 4 sticks if your CPU does not support quad-channel memory.
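If you do go the one-instance-per-GPU route, here is a minimal PyTorch sketch with a placeholder model and data; in practice you would launch one process per card, and the cards don't need to match because each job only ever touches its own device.

```python
# Minimal sketch: one independent training job per GPU, so mixing GPU models
# (e.g. a 1080 Ti and 2080 Tis) does not matter -- each job only sees its own card.
import torch
import torch.nn as nn

def train_on_device(device_index: int, steps: int = 10) -> None:
    device = torch.device(f"cuda:{device_index}")
    model = nn.Linear(128, 10).to(device)          # placeholder model
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(64, 128, device=device)    # placeholder batch
        y = torch.randint(0, 10, (64,), device=device)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

if __name__ == "__main__":
    # In practice launch one OS process per GPU (or use multiprocessing);
    # the loop below just illustrates addressing each card separately.
    for i in range(torch.cuda.device_count()):
        train_on_device(i)
```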

Designing a Computer for Spatially-Explicit Modeling in NetLogo

I have done various searches and have yet to find a forum or article that discusses how to approach building a modeling computer for use with NetLogo. I was hoping to start such a discussion, and since the memory usage of NetLogo is proportional to the size of the world and number of simulations run in parallel with BehaviorSpace, it seems reasonable that a formula exists relating sufficient hardware to NetLogo demands.
As an example, I am planning to run a metapopulation model in a landscape approximately 12km x 12km, corresponding to a NetLogo world of 12,000x12,000 at a pixel size of 1, for a 1-meter resolution (relevant to the animal's movement behavior). An earlier post described a large world (How to model a very large world in NetLogo?) and provided a discussion of potential ways to avoid needing such large worlds (http://netlogo-users.18673.x6.nabble.com/Re-Rumors-of-Relogo-td4869241.html#a4869247). Another post described a world of 3147x5141 run on a Linux computer with 64GB of RAM (http://netlogo-users.18673.x6.nabble.com/Java-OutofMemory-errors-for-large-NetLogo-worlds-suggestions-requested-td5002799.html). Clearly, the capability of computers to run large NetLogo worlds is becoming increasingly important.
Presumably, the "best" solution for researchers at universities with access to Windows-based machines would be to run 16GB to 64GB of RAM with a six- or eight-core processor, such as an Intel Xeon with hyper-threading, for running multiple simulations in parallel with BehaviorSpace. As an example, I used SELES (Fall & Fall 2001) on a machine with a 6-core hyper-threaded Xeon and 8GB of RAM to run 12,000 replicates of a model with a 1-meter-resolution raster map of 1580x1580. This used the computer at full capacity, and it took about a month to run the simulations.
So - if I were to run 12,000 replicates of a 12,000x12,000 world in NetLogo, what would be the "best" option for a computer? Without reaching for the latest and greatest processing power out there, I would presume the most fiscally reasonable option to be a server board with dual Xeon processors (likely 8-core Ivy Bridge) and 64GB of RAM. Would this be a sufficient design, or are there alternatives, cheaper or not, for modeling at this scale? And additionally, do there exist "guidelines" for processor/RAM combinations to cope with NetLogo's increasing memory demand as world sizes and the number of parallel simulations grow?
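As a rough starting point for such a formula, here is a sketch that scales memory with world size and with the number of parallel BehaviorSpace runs; the bytes-per-patch and per-JVM overhead figures below are assumptions rather than NetLogo-documented numbers, so calibrate them on a small test world before trusting the estimate.

```python
# Back-of-the-envelope estimate: memory grows with world size and with the
# number of BehaviorSpace runs in parallel. BYTES_PER_PATCH and JVM_OVERHEAD_GB
# are rough assumptions, not NetLogo-documented figures -- calibrate them first.

BYTES_PER_PATCH = 200          # assumed average footprint of one patch (variables + overhead)
JVM_OVERHEAD_GB = 1.0          # assumed fixed overhead per headless run / JVM

def estimated_ram_gb(world_width: int, world_height: int, parallel_runs: int) -> float:
    patches = world_width * world_height
    per_run_gb = patches * BYTES_PER_PATCH / 1024**3 + JVM_OVERHEAD_GB
    return per_run_gb * parallel_runs

# 12,000 x 12,000 world, 12 simultaneous BehaviorSpace runs
print(f"{estimated_ram_gb(12_000, 12_000, 12):.0f} GB")  # hundreds of GB under these assumptions
```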

Does quad-core perform substantially better than a dual-core for web development?

First, I could not ask this on most hardware forums, because they are mostly populated by gamers. Additionally, it is difficult to get an opinion from sysadmins, because they have a fairly different perspective as well.
So perhaps, amongst developers, I might be able to deduce a realistic trend.
What I want to know is: if I regularly fire up NetBeans/Eclipse, MySQL Workbench, 3 to 5 browsers with multiple tabs, along with Apache/PHP and MySQL running in the background, and perhaps GIMP/Adobe Photoshop from time to time, does a quad-core perform considerably faster than a dual-core, assuming the quad has a slower clock speed (~2.8 GHz vs a 3.2 GHz dual-core)?
My only relevant experience is that my old Core 2 Duo at 2.8 GHz with 4 GB of RAM performed considerably slower than my new quad-core Core i5 at 2.8 GHz (both desktops). That is only one data point, so I can't tell if it holds true for everyone.
The end purpose of all this is to help me decide on buying a new laptop (4 cores vs 2 cores makes quite a difference at the moment).
http://www.intel.com/content/www/us/en/processor-comparison/comparison-chart.html
I did a comparison for you using that chart.
In this example the quad-core runs at 2.20 GHz while the dual-core runs at 2.3 GHz.
Now check out the "Max Turbo Frequency" in this comparison. You will notice that even though the quad-core has a lower base clock, when it hits turbo it passes the dual-core.
The second thing to consider is cache size, which does make a big difference. The quad-core will always have more cache; in this example it has 6MB, but some have up to 8MB.
Third is max memory bandwidth: the quad-core has 25.6 GB/s vs the dual-core's 21.3 GB/s, which means faster memory access with the quad-core.
The fourth important factor is graphics: the graphics base frequency is 650 MHz on the quad and 500 MHz on the dual.
Fifth, the graphics max dynamic frequency is 1.30 GHz for the quad and 1.10 GHz for the dual.
Bottom line: if you can afford it, the quad not only gives you more punch but also allows you to add more memory later, since the max memory size with the quad is 16GB while the dual restricts you to 8GB. Just to be future-proof, I would go with the quad.
One more thing to add: the number of simultaneous threads is 4 on the dual-core and 8 on the quad, which does make a difference.
The problem with multi-processor/multi-core systems has been, and still is, memory bandwidth. Most applications in daily use have not been written to economize on memory bandwidth. This means that for typical, everyday use you'll run out of bandwidth whenever your apps are doing something (i.e. not waiting for user input).
Some applications - such as games and parts of operating systems - attempt to address this. Their parallelism loads a chunk of data into a core, spends some time processing it without accessing memory further, and finally writes the modified data back to memory. During the processing itself the memory bus is free and other cores can load and store data.
In well-designed parallel code, essentially any number of cores can be working on different parts of the same task, as long as the memory bus can serve every other core's transfers while one core is computing - that is, as long as (number of cores - 1) * (read time + write time) is less than or equal to the processing time per chunk.
A code designed and balanced for a specific number of cores will be efficient for fewer but not for more cores.
Some processors have multiple data buses (memory channels) to increase overall memory bandwidth. This works up to a certain point, after which the next level up - the L3 cache - becomes the bottleneck.
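Here is a minimal sketch of that balance condition, with illustrative (not measured) per-chunk times:

```python
# Minimal sketch of the balance argument above: a shared memory bus can keep N cores
# busy only while each chunk's bus time stays hidden behind the other cores' compute.
# From the condition above: (cores - 1) * (read + write) <= processing time,
# so the number of cores the bus can feed is roughly 1 + processing / (read + write).

def max_useful_cores(process_time_us: float, read_time_us: float, write_time_us: float) -> int:
    bus_time_us = read_time_us + write_time_us
    return 1 + int(process_time_us // bus_time_us)

print(max_useful_cores(process_time_us=80, read_time_us=5, write_time_us=5))   # ~9 cores
print(max_useful_cores(process_time_us=10, read_time_us=5, write_time_us=5))   # ~2 cores
```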
Even if they were running at equivalent clock speeds, the quad-core can execute twice as many instructions per cycle as the dual-core. A 0.4 GHz difference isn't going to make a huge difference.

What makes a modern commodity cluster?

What would be the most cost-effective way of implementing a terabyte distributed memory cache using commodity hardware these days? And what would count as a piece of commodity hardware?
Commodity hardware is considered hardware that
Is off the shelf (nothing custom)
Is available in substantially similar versions from many manufacturers.
There are many motherboards that can hold 8 or 16 GB of RAM, and fewer server motherboards that can hold 32 or even 64GB.
But they still fit the definition of commodity, and can therefore be made into very large clusters - for a very large sum of money.
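To put rough numbers on the node count for a terabyte cache using those per-node RAM sizes (the overhead fraction below is an assumption):

```python
# Rough sizing sketch for a 1 TB distributed memory cache built from commodity nodes.
# Per-node capacities are the ones mentioned above; the overhead fraction is an assumption.
import math

TARGET_CACHE_GB = 1024          # one terabyte of cached data
OVERHEAD_FRACTION = 0.25        # assumed OS + cache-software overhead per node

for node_ram_gb in (16, 32, 64):
    usable_gb = node_ram_gb * (1 - OVERHEAD_FRACTION)
    nodes = math.ceil(TARGET_CACHE_GB / usable_gb)
    print(f"{node_ram_gb:3d} GB nodes -> {nodes} machines")
```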
Note, however, that for many access patterns a striped RAID HD array isn't much slower than a gigabit Ethernet link - so a RAM cluster might not give a significant improvement (except in latency), depending on how you're actually using it.
-Adam