Gem5 system requirements for decent performance

I have to work with gem5 for my project and was wondering what hardware configuration I should buy. I had a "good enough" laptop, but sadly it no longer works reliably, so I will have to stick with some lower-end laptop. What is the minimum-priced processor I should buy? Also, AMD or Intel? I can't afford an Apple laptop either.
Any help is deeply appreciated.

To give you an idea, I have a high-end Lenovo P51 laptop with:
Intel Core i7-7820HQ Processor (8MB Cache, up to 3.90GHz) (4 cores 8 threads)
32GB(16+16) DDR4 2400MHz SODIMM
512GB SSD PCIe TLC OPAL2
Ubuntu 17.10
Then the build time for:
git checkout da79d6c6cde0fbe5473ce868c9be4771160a003b
CC=gcc-6 CXX=g++-6 scons -j"$(nproc)" --ignore-style build/ARM/gem5.opt
is 10 minutes, which I consider reasonable.
And a minimalistic ARM Buildroot Linux kernel boot takes:
1 minute 40 seconds on the default simplified AtomicSimpleCPU
10 minutes on the much more realistic --cpu-type=HPI --caches
This laptop, at about 2500 dollars, is likely more expensive than most Apple laptops. But if you are going to be developing professionally, it is a worthy investment.
For hobbyist use, however, a mid-range 1200 dollar laptop should be good enough to get started, I believe, considering that:
you won't build from scratch very often, mostly incrementally with scons
you can boot with a simple and fast CPU, make a checkpoint with m5 checkpoint before your benchmark, then restore the checkpoint with a more realistic and slower CPU model (sketched just below): How to switch CPU models in gem5 after restoring a checkpoint and then observe the difference?
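A rough sketch of that workflow, assuming the stock configs/example/fs.py script and an ARM full-system setup like the one above; the kernel and disk image paths are placeholders, and exact flag names can vary between gem5 versions:
# Pass 1: boot quickly on the simple atomic CPU
build/ARM/gem5.opt configs/example/fs.py --cpu-type=AtomicSimpleCPU --kernel=path/to/vmlinux --disk-image=path/to/rootfs.ext2
# ... then, inside the guest, just before the benchmark starts:
m5 checkpoint
# Pass 2: restore checkpoint 1 on the slower, more detailed CPU model
build/ARM/gem5.opt configs/example/fs.py -r 1 --cpu-type=HPI --caches --kernel=path/to/vmlinux --disk-image=path/to/rootfs.ext2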

Related

Hardware for Deep Learning

I have a couple of questions on hardware for a Deep Learning project I'm starting; I intend to use PyTorch for neural networks.
I am thinking about going for an 8th gen CPU on a Z390 board (I'll wait a month to see if prices drop after 9th gen CPUs are available), so I still get a cheaper CPU that can be upgraded later.
Question 1) Are CPU cores going to be beneficial? Would getting the latest Intel chips be worth the extra cores? And if CPU cores will be helpful, should I just go AMD?
I am also thinking about getting a 1080 Ti and then later on, once I'm more proficient, adding two more 2080 Tis. I would go for more, but it's difficult to find a board that fits 4.
Question 2) Does mixing GPUs affect parallel processing? Should I just get a 2080 Ti now and then buy another 2 later? And as a part b to this question: do the lane speeds matter, and should I spend more on a board that doesn't slow down the PCIe slots if you utilise more than one?
Question 3) More RAM? 32GB seems plenty, so 2x16GB sticks with a board that has 4 slots supporting up to 64GB.
What also matters when running multiple GPUs is the number of available PCIe lanes. If you may go up to 4 GPUs, I'd go for an AMD Threadripper for its 64 PCIe lanes.
For machine learning in general, core and thread count is quite important, so Threadripper is still a good option, depending on the budget of course.
A few people mention that running a separate training instance on each GPU may be more interesting; if you do so, mixing GPUs is not a problem (see the sketch below).
32GB of RAM seems good; indeed, there is no need to go for 4 sticks if your CPU does not support quad channel.
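As a minimal sketch of that one-instance-per-GPU approach (train.py and its --run flag are hypothetical stand-ins for your own PyTorch training script), you can restrict each process to a single card with CUDA_VISIBLE_DEVICES, which sidesteps the mixed-GPU question entirely:
# each process only sees the GPU named in CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0 python train.py --run exp_gpu0 &
CUDA_VISIBLE_DEVICES=1 python train.py --run exp_gpu1 &
wait  # block until both runs finish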

How to allocate more CPU and RAM to SUMO (Simulation of Urban Mobility)

I have downloaded and unzipped sumo-win64-0.32.0 and am running sumo.exe on a powerful machine (64GB RAM, Xeon CPU E5-1650 v4 3.6GHz) for about 140k trips, 108k edges, and 25k vehicle types, which depart in the first 30 min of the simulation. I have noticed that my CPU is utilized at only 30% and memory at only 38%. Is there any way to increase the speed by forcing SUMO to use more CPU and RAM, or possibly run in parallel? From "Can SUMO be run in parallel (on multiple cores or computers)?
The simulation itself always runs on a single core."
it appears that parallel processing is not possible, but what about dedicating more CPU and RAM?
Windows usually shows CPU utilization such that 100% means all cores are used, so 30% is probably already more than one core (on a 6-core/12-thread Xeon E5-1650 v4, a single fully busy thread shows up as roughly 8%), and there is no way of increasing that with a single-threaded application such as SUMO. Also, if your scenario fits within RAM completely, there is no point in increasing that. You might want to try one of the several parallelization approaches SUMO has, but none of them got further than some toy examples (and none is in the official distribution), and the speed improvements are sometimes only marginal. Probably the best you can do is some profiling to find the performance bottlenecks and/or send your results to the developers.

Computer restarts with large mini batches in TensorFlow

I am running TensorFlow for Windows with a Titan X GPU (12 GB memory). When I try to train a network on images of 256x256x1 with mini-batches larger than 50 images, my computer just crashes and restarts automatically. With smaller mini-batches it runs just fine.
Any clues on what might be causing this?
I've seen similar problems discussed in some gaming forums, where the PC would just shut down when the GPU was under heavy load. The reason was usually that the GPU was drawing more power than the power supply unit could handle. Check e.g. here or here. So maybe it's worth investigating whether your PSU is the culprit.
Edit: Maybe the program SpeedFan can help you debug this - it can show both voltages and temperature sensor readings, which would also tell you whether your PC is overheating (I've never used the tool myself, and I'm not affiliated with it either; I just found it online).
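Since the machine has an NVIDIA GPU, another way to check the PSU/overheating theory (assuming nvidia-smi is on the PATH; it ships with the NVIDIA driver on Windows as well) is to log power draw and temperature while training and see whether the crash coincides with a spike:
# print power draw, temperature and utilization once per second
nvidia-smi --query-gpu=power.draw,temperature.gpu,utilization.gpu --format=csv -l 1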

How much faster (approx) does Tensorflow run with a GPU?

I have a Mac, and consequently have been running Tensorflow without GPU support (because it's not official yet). However, there are some hacked together impls that I'm thinking of installing... that is if the performance gains are worth the trouble. How much faster (approximately) would Tensorflow run on a Macbook Pro with GPU support?
Thanks
As a rule of thumb, somewhere between 10 and 20 times - that's what I've found just running the standard examples.
To give you an idea of the speed difference, I ran some language modelling code (similar to the PTB example), with a fairly large data set, on 3 different machines with the following results:
Intel Xeon X5690 (CPU only): 1 day, 19 hours
Nvidia Grid K520 (on Amazon AWS): 17 hours
Nvidia Tesla K80: 4 hours

What PC hardware is needed to try out GPU programming?

I am interested in trying out GPU programming. One thing not clear to me is: what hardware do I need? Is it right that any PC with a graphics card is good? I have very little knowledge of GPU programming, so it's best if the starting learning curve is not steep. If I had to make a lot of hacks just to run some tutorial because my hardware is not good enough, I'd rather buy new hardware.
I have a retired PC (~10 years old) with Ubuntu Linux installed; I am not sure what graphics card it has, but it must be some old one.
I am also planning to buy a new sub-$500 desktop, which according to my casual research normally has an AMD Radeon 7x or Nvidia GT 6x graphics card. I assume any new PC is good enough for learning this kind of programming.
Anyway, any suggestion is appreciated.
If you want to use CUDA, you'll need a GPU from NVidia, and their site explains the compute capabilities of their different products.
If you want to learn OpenCL, you can start right now with an OpenCL implementation that has a CPU back-end. The basics of writing OpenCL code targeting CPUs or GPUs is the same, and they differ mainly in performance tuning.
For GPU programming, any AMD or NVidia GPU made in the past several years will have some degree of OpenCL support, though there have been some new features introduced with newer generations that can't be easily emulated for older generations.
Intel's integrated GPUs in Ivy Bridge and later support OpenCL, but Intel only provides a GPU-capable OpenCL implementation for Windows, not Linux.
Also be aware that there is a huge difference between a mid-range and a high-end GPU in terms of compute capabilities, especially when it comes to double-precision arithmetic support. Some low-end GPUs don't support double precision at all, and mid-range GPUs often perform double-precision arithmetic 24 times slower than single precision. When you want to do a lot of double-precision calculations, it's absolutely worth it to get a compute-oriented GPU (Radeon 7900 series or GeForce Titan and up).
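A quick way to see what your current hardware supports, assuming the clinfo utility is installed (it is packaged for most Linux distributions), is to list the OpenCL platforms and devices; the output also shows whether a device advertises the cl_khr_fp64 extension, i.e. double-precision support:
# list all OpenCL platforms/devices and their capabilities
clinfo
# or just check for double-precision support
clinfo | grep -i fp64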
If you want a low-end system with non-trivial GPU power, your best bet at the moment is probably to get a system built around an AMD APU.