Does quad-core perform substantially better than a dual-core for web development? - hardware

First, I could not ask this on most hardware forums, because they are mostly populated by gamers. It is also difficult to get an opinion from sysadmins, because they have a fairly different perspective as well. So perhaps, among developers, I might be able to deduce a realistic trend.
What I want to know is: if I regularly fire up NetBeans/Eclipse, MySQL Workbench, and 3 to 5 browsers with multiple tabs, with Apache/PHP and MySQL running in the background, and perhaps GIMP or Adobe Photoshop from time to time, does a quad core perform considerably faster than a dual core? Assume the quad has the slower clock speed, e.g. ~2.8 GHz vs. a 3.2 GHz dual core.
My only relevant experience is that my old Core 2 Duo at 2.8 GHz with 4 GB of RAM performed considerably slower than my new quad-core Core i5 at 2.8 GHz (both desktops). That is only one data point, though, so I can't tell whether it holds true in general.
The end purpose of all this is to help me decide on buying a new laptop (there is currently quite a difference between 4-core and 2-core models).

http://www.intel.com/content/www/us/en/processor-comparison/comparison-chart.html
I did a comparison for you using Intel's chart above. There the quad core runs at 2.20 GHz and the dual core at 2.3 GHz.
Now check out the "Max Turbo Frequency" in that comparison. You will notice that even though the quad core has a lower base clock, when it hits turbo it passes the dual core.
The second thing to consider is cache size, which does make a big difference. Quad cores generally have more cache; in this example it has 6 MB, and some have up to 8 MB.
Third is max memory bandwidth: 25.6 GB/s for the quad core vs. 21.3 GB/s for the dual core, which means the quad core can move data faster.
The fourth factor is graphics: the graphics base frequency is 650 MHz on the quad and 500 MHz on the dual.
Fifth, the graphics max dynamic frequency is 1.30 GHz for the quad and 1.10 GHz for the dual.
Bottom line: if you can afford it, the quad core not only gives you more punch but also lets you add more memory later, since the max memory size is 16 GB with the quad while the dual restricts you to 8 GB. Just to be future-proof, I would go with the quad.
One more thing to add: simultaneous thread processing is 4 threads on the dual core and 8 on the quad, which does make a difference.

The problem with multi-processor and multi-core systems has been, and still is, memory bandwidth. Most applications in daily use have not been written to economize on memory bandwidth. This means that in typical, everyday use you'll run out of bandwidth whenever your apps are actually doing something (i.e. not waiting for user input).
Some applications, such as games and parts of operating systems, attempt to address this. Their parallelism loads a chunk of data into a core's cache, spends some time processing it without accessing memory further, and finally writes the modified data back to memory. During the processing phase the memory bus is free, and other cores can load and store their data.
In well-designed parallel code, essentially any number of cores N can work on different parts of the same task, so long as each core's processing time per chunk is at least (N - 1) times its memory time (read time + write time); otherwise the cores queue up for the bus.
Code designed and balanced for a specific number of cores will be efficient on fewer cores, but not on more.
Some processors have multiple memory channels to increase the overall memory bandwidth. This works up to a certain point, after which the next level of the memory hierarchy, the shared L3 cache, becomes the bottleneck.
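The load-process-store pattern described above can be sketched as follows. This is a toy illustration (the function names and the doubling workload are invented), not how any particular engine implements it: each worker pulls one chunk into local storage, computes on it without touching shared memory, then writes it back in a single burst.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(data, start, end):
    # "Load" phase: copy one chunk into core-local storage (one burst of bus traffic).
    local = data[start:end]
    # Compute phase: work on the local copy only; the memory bus is free
    # for other cores while this runs.
    local = [x * 2 for x in local]
    # "Store" phase: one more burst to write the modified chunk back.
    data[start:end] = local

def process_in_chunks(data, n_workers, chunk_size):
    # Disjoint chunk ranges, so workers never write overlapping slices.
    ranges = [(i, min(i + chunk_size, len(data)))
              for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for start, end in ranges:
            pool.submit(process_chunk, data, start, end)
    return data  # the with-block waits for all chunks to finish

print(process_in_chunks(list(range(8)), n_workers=2, chunk_size=3))
# [0, 2, 4, 6, 8, 10, 12, 14]
```

The point of the structure is the ratio: the longer the compute phase is relative to the load/store bursts, the more cores can share one bus without stalling.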

Even at equivalent clock speeds, the quad core can execute twice as many instructions per cycle, in aggregate, as the dual core; 0.4 GHz isn't going to make a huge difference.
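A back-of-envelope way to see it, treating each core as contributing its full clock. This is an idealization that ignores IPC, turbo, and how parallel the workload actually is:

```python
# Aggregate cycles per second across all cores, as a crude throughput proxy.
def aggregate_ghz(cores, clock_ghz):
    return cores * clock_ghz

dual = aggregate_ghz(2, 3.2)   # 6.4 "GHz" total
quad = aggregate_ghz(4, 2.8)   # 11.2 "GHz" total
print(quad / dual)             # 1.75: ~75% more raw throughput for the quad
```

So the 0.4 GHz per-core deficit is dwarfed by the extra cores, provided the workload can actually keep four cores busy.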

Related

Optaplanner - multithreading

I am using OptaPlanner 8.17.FINAL with Java 17.0.2 inside a Kubernetes cluster; my server has 32 cores plus hyper-threading. My app scales to 14 pods and I use moveThreadCount = 4. On a single run, everything works fine, but on parallel runs the speed of OptaPlanner drops. With 7 launches the drop is insignificant, 5-10%, but with 14 launches the speed drop is about 50%. Of course, you could say that there are not enough physical cores, but I'm not sure hyper-threading works like that. In resource monitoring I see that 60 logical cores are in use with 14 launches, so why does the speed drop by half?
I tried increasing the heap size and changing the garbage collector (G1GC, SerialGC, ParallelGC), but it had little effect.
I am not an expert on hyper-threading by any means, but perhaps OptaPlanner, by fully utilizing entire cores, cannot benefit much from HT. If so, you simply don't have enough CPU cores to run that many solvers in parallel, which leads to context switching and, as a result, a performance drop.
You can verify that by adding more cores. If it helps, it means there is no artificial bottleneck for this number of tasks.
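One way to sanity-check the oversubscription hypothesis is simple arithmetic on demanded solver threads vs. physical cores; the numbers below are taken from the question:

```python
# Ratio of compute-bound solver threads to physical cores.
# Above 1.0, CPU-saturating threads must time-share cores.
def oversubscription(pods, move_threads, physical_cores):
    return pods * move_threads / physical_cores

# 14 pods x 4 move threads = 56 threads on 32 physical cores:
print(oversubscription(14, 4, 32))  # 1.75 -> heavily oversubscribed
# 7 pods x 4 = 28 threads still fit, matching the small 5-10% drop:
print(oversubscription(7, 4, 32))   # 0.875
```

This matches the observed pattern: almost no slowdown while demand stays under the physical core count, and a roughly 2x slowdown once it is ~1.75x over it, since hyper-threading adds far less than a second full core's worth of throughput for CPU-saturating work.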

Why does higher core count lead to higher CPI?

I'm looking at a chart which shows that, in reality, increasing the core count of a CPU usually results in a higher CPI for most instructions, and that it usually also increases the total number of instructions the program executes. Why is this happening?
From my understanding, CPI should increase only when the clock frequency increases, so the CPI increase doesn't make much sense to me.
What chart? What factors are they holding constant while increasing core count? Perhaps total transistor budget, so each core has to be simpler to have more cores?
Making a single core larger has diminishing returns, but building more cores has linear returns for embarrassingly parallel problems; hence Xeon Phi having lots of simple cores, and GPUs being very simple pipelines.
But CPUs that also care about single-thread performance / latency (instead of just throughput) will push into those diminishing returns and build wider cores. Many problems we run on CPUs are not trivial to parallelize, so lots of weak cores can be worse than fewer, faster cores. And for a given problem size, the more threads you have, the more of its total time each thread spends communicating with other threads (and maybe waiting for data from them).
If you do keep each core identical when adding more cores, their CPI generally stays the same when running the same code. e.g. SPECint_rate scales nearly linearly with number of cores for current Intel/AMD CPUs (which do scale up by adding more of the same cores).
So that must not be what your chart is talking about. You'll need to clarify the question if you want a more specific answer.
You don't get perfectly linear scaling because cores do compete with each other for memory bandwidth, and space in the shared last-level cache. (Although most current designs scale up the size of last-level cache with the number of cores. e.g. AMD Zen has clusters of 4 cores sharing 8MiB of L3 that's private to those cores. Intel uses a large shared L3 that has a slice of L3 with each core, so the L3 per core is about the same.)
But more cores also means a more complex interconnect to wire them all together and to the memory controllers. Intel many-core Xeon CPUs notably have worse single-thread bandwidth than quad-core "client" chips of the same microarchitecture, even though the cores are the same in both. Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?
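The imperfect scaling from shared bandwidth can be sketched with a toy model: split single-core execution time into a part that scales with cores and a memory-bound part capped by one shared bus. The 90/10 split below is an arbitrary illustration, not a measured figure for any real CPU:

```python
def speedup(n_cores, compute_frac, bandwidth_frac):
    # Single-core time is normalized to compute_frac + bandwidth_frac = 1.
    # The compute part divides across cores; the bus-limited part does not.
    t1 = compute_frac + bandwidth_frac
    tn = compute_frac / n_cores + bandwidth_frac
    return t1 / tn

print(round(speedup(4, 0.9, 0.1), 2))   # 3.08 out of an ideal 4x
print(round(speedup(16, 0.9, 0.1), 2))  # 6.4: the shared bus now dominates
```

Even a small memory-bound fraction caps scaling hard at high core counts, which is why many-core parts invest so much in caches, channels, and interconnect.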

How to allocate more CPU and RAM to SUMO (Simulation of Urban Mobility)

I have downloaded and unzipped sumo-win64-0.32.0 and am running sumo.exe on a powerful machine (64 GB RAM, Xeon CPU E5-1650 v4 @ 3.6 GHz) for about 140k trips, 108k edges, and 25k vehicle types, which depart in the first 30 minutes of simulation. I have noticed that my CPU is utilized only 30% and memory only 38%. Is there any way to increase the speed by forcing SUMO to use more CPU and RAM, or possibly run in parallel? From "Can SUMO be run in parallel (on multiple cores or computers)?
The simulation itself always runs on a single core."
it appears that parallel processing is not possible, but what about dedicating more CPU and RAM?
Windows usually shows CPU utilization such that 100% means all logical cores are busy, so 30% is probably already more than one core, and there is no way of increasing that with a single-threaded application like SUMO. Also, if your scenario fits completely within RAM, there is no point in adding more. You might want to try one of the several parallelization approaches SUMO has, but none of them got further than some toy examples (and none is in the official distribution), and the speed improvements are sometimes only marginal. Probably the best you can do is some profiling to find the performance bottlenecks, and/or send your results to the developers.
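The Task Manager arithmetic for this machine, assuming the E5-1650 v4's 6 cores / 12 threads with hyper-threading enabled:

```python
def percent_per_thread(logical_cores):
    # Windows reports utilization so that 100% = all logical cores busy.
    return 100.0 / logical_cores

per_thread = percent_per_thread(12)
print(round(per_thread, 1))        # 8.3: a single thread can never exceed ~8.3%
print(round(30 / per_thread, 1))   # 3.6: 30% overall ~= 3.6 busy logical cores
```

So the observed 30% does not mean SUMO has 70% headroom; a single-threaded simulation is already pinned, and the rest is other threads (I/O, OS, helper threads) on the remaining logical cores.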

Designing a Computer for Spatially-Explicit Modeling in NetLogo

I have done various searches and have yet to find a forum or article that discusses how to approach building a modeling computer for use with NetLogo. I was hoping to start such a discussion, and since the memory usage of NetLogo is proportional to the size of the world and number of simulations run in parallel with BehaviorSpace, it seems reasonable that a formula exists relating sufficient hardware to NetLogo demands.
As an example, I am planning to run a metapopulation model in a landscape approximately 12km x 12km, corresponding to a NetLogo world of 12,000x12,000 at a pixel size of 1, for a 1-meter resolution (relevant for the animal's movement behavior). An earlier post described a large world (How to model a very large world in NetLogo?), and provided a discussion for potential ways to reduce needing large worlds (http://netlogo-users.18673.x6.nabble.com/Re-Rumors-of-Relogo-td4869241.html#a4869247). Another post described a world of 3147x5141 and was using a Linux computer with 64GB of RAM (http://netlogo-users.18673.x6.nabble.com/Java-OutofMemory-errors-for-large-NetLogo-worlds-suggestions-requested-td5002799.html). Clearly, the capability of computers to run large NetLogo worlds is becoming increasingly important.
Presumably, the "best" solution for researchers at universities with access to Windows-based machines would be to run 16 GB to 64 GB of RAM with a six- or eight-core processor, such as an Intel Xeon capable of hyper-threading, for running multiple simulations in parallel with BehaviorSpace. As an example, I used SELES (Fall & Fall 2001) on a machine with a 6-core hyper-threaded Xeon processor and 8 GB of RAM to run 12,000 replicates of a model with a 1-meter-resolution raster map of 1580x1580. This used the computer to its full capacity, and it took about a month to run the simulations.
So, if I were to run 12,000 replicates of a 12,000x12,000 world in NetLogo, what would be the "best" option for a computer? Without reaching for the latest and greatest processing power out there, I would presume the most fiscally reasonable option to be a server board with dual Xeon processors (likely 8-core Ivy Bridge) and 64 GB of RAM. Would this be a sufficient design, or are there alternatives that are cheaper (or not) for modeling at this scale? And do there exist "guidelines" for processor/RAM combinations to cope with NetLogo's increasing memory demand as world sizes and the number of parallel simulations grow?
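As a rough sizing sketch of the memory side of the question: the ~100 bytes per patch below is a guess, since the real per-patch cost depends on JVM object overhead and how many patch variables the model defines, but it shows how the world size and parallel-run count multiply:

```python
def world_memory_gb(width, height, bytes_per_patch, parallel_runs):
    # Each parallel BehaviorSpace run holds its own copy of the world.
    return width * height * bytes_per_patch * parallel_runs / 1024**3

# 12,000 x 12,000 world, assuming ~100 bytes per patch, one run at a time:
print(round(world_memory_gb(12000, 12000, 100, 1), 1))   # ~13.4 GB
# 12 simultaneous BehaviorSpace runs on the same assumptions:
print(round(world_memory_gb(12000, 12000, 100, 12), 1))  # ~160.9 GB
```

On these (assumed) numbers, 64 GB supports only a handful of such worlds in parallel, so RAM rather than core count is likely the binding constraint at this world size.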

How much is 1/8th of a core?

I'm new to cloud computing and, for the life of me, I can't figure out how "much" 1/8th of a core is in practical terms.
I know what kind of CPUs Amazon EC2 are using for m1.small, but let's say (for education purposes) that it is a single-core 1GHz CPU.
How is 1/8th of a core calculated? Does it mean my application will run with 128 MB of RAM and 1/8th of a 1 GHz CPU? Or will my application be able to run only a certain number of operations/CPU cycles before I'm charged for an additional app cell?
What I need is a practical explanation of the phrase, perhaps for a simple vert.x HTTP server where each successful connection calculates 2 + 3. Vert.x uses less than 128 MB of RAM.
AFAIK, you don't have a limit on the number of cycles: if your application requires many CPU cycles, it will probably just run slower, since it can only use 1/8th of a core.
Regarding memory, if you are using just one app cell but your app requires more than 128 MB, it will probably result in an out-of-memory error.
Slicing a server into eighths isn't as mathematical as you might expect. Sharing server resources among multiple tenants allows the CPU to be used better globally than on a classic dedicated server, so even though you pay for only 1/8th of the server, you may actually get more resources, but only when your application actually uses them.
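As a purely illustrative calculation under the question's simplifying assumption of a single-core 1 GHz CPU (the 50,000 cycles per request figure is invented for the example, not measured for vert.x):

```python
def requests_per_second(core_hz, share, cycles_per_request):
    # share = fraction of one core's cycles the tenant can consume on average.
    return core_hz * share / cycles_per_request

# 1/8th of a 1 GHz core, assuming ~50,000 cycles to serve one trivial request:
print(int(requests_per_second(1e9, 1/8, 50_000)))  # 2500 requests/sec
```

In other words, 1/8th of a core is best read as an average cycle budget that throttles throughput, not as a hard cap on total operations before extra billing.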