Which hardware to choose in Neo4j - optimization

I'm a beginner with Neo4j and I would like to store more than 500 million nodes and more than 20 billion relationships.
Which hardware is best for handling this much data?
Thanks a lot.
Maxime

Neo4j does not restrict users to particular hardware. However, it does recommend minimum specifications for RAM, CPU and disk, as follows:
RAM:
Must have at least 2 GB
Good to have around 16 GB
CPU:
Must have an Intel Core i3 processor
Good to have an Intel Core i7 processor
Disk:
Must have SATA drives with 15k RPM
Good to have SSDs
Also have a look at these: "Neo4j : Advices for hardware sizing and config" and https://neo4j.com/developer/guide-sizing-and-hardware-calculator/

Just for general recommendations, the top two things to look for are plenty of memory and fast SSDs (especially for larger graphs).
Neo4j has a pagecache for caching the node and relationship graph topology, and the more of this you can fit into the pagecache the better. We typically recommend a heap of between 8 and 31 GB in addition to the pagecache, depending on the volume and kind of queries you expect to run.
SSDs aid in Neo4j's index-free adjacency structure, as this involves pointer chasing across the disk. This is mostly for when you can't fit all of the graph in pagecache, but this also aids in lookup of node and relationship properties.
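As a rough illustration of what "fitting the graph in pagecache" means at the scale in the question, here is a minimal back-of-the-envelope sketch. The per-record sizes (15 bytes per node, 34 bytes per relationship) are assumptions based on Neo4j's standard store format and may differ between versions and record formats; properties, indexes and heap are deliberately left out.

```python
# Back-of-the-envelope estimate of the graph topology store size.
# ASSUMPTION: fixed record sizes of the standard store format;
# check the documentation for the Neo4j version you actually run.
NODE_RECORD_BYTES = 15
RELATIONSHIP_RECORD_BYTES = 34


def topology_store_bytes(nodes: int, relationships: int) -> int:
    """Bytes needed for node and relationship records only (no properties)."""
    return nodes * NODE_RECORD_BYTES + relationships * RELATIONSHIP_RECORD_BYTES


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    total = topology_store_bytes(500_000_000, 20_000_000_000)
    print(f"topology store: {to_gib(total):.1f} GiB")
```

Under these assumptions the topology alone is on the order of 640 GiB, almost all of it relationships, so a single machine is unlikely to hold the whole graph in pagecache and fast SSDs matter a great deal.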

Related

Designing a Computer for Spatially-Explicit Modeling in NetLogo

I have done various searches and have yet to find a forum or article that discusses how to approach building a modeling computer for use with NetLogo. I was hoping to start such a discussion, and since the memory usage of NetLogo is proportional to the size of the world and number of simulations run in parallel with BehaviorSpace, it seems reasonable that a formula exists relating sufficient hardware to NetLogo demands.
As an example, I am planning to run a metapopulation model in a landscape approximately 12km x 12km, corresponding to a NetLogo world of 12,000x12,000 at a pixel size of 1, for a 1-meter resolution (relevant for the animal's movement behavior). An earlier post described a large world (How to model a very large world in NetLogo?), and provided a discussion for potential ways to reduce needing large worlds (http://netlogo-users.18673.x6.nabble.com/Re-Rumors-of-Relogo-td4869241.html#a4869247). Another post described a world of 3147x5141 and was using a Linux computer with 64GB of RAM (http://netlogo-users.18673.x6.nabble.com/Java-OutofMemory-errors-for-large-NetLogo-worlds-suggestions-requested-td5002799.html). Clearly, the capability of computers to run large NetLogo worlds is becoming increasingly important.
Presumably, the "best" solution for researchers at universities with access to Windows-based machines would be to run 16GB to 64GB of RAM with a six- or eight-core processor such as the Intel Xeon capable of hyperthreading for running multiple simulations in parallel with BehaviorSpace. As an example, I used SELES (Fall & Fall 2001) on a machine with a 6-core Xeon processor with hyperthreading enabled and 8GB of RAM to run 12,000 replicates of a model with a 1-meter resolution raster map of 1580x1580. This used the computer to its full capacity and it took about a month to run the simulations.
So, if I were to run 12,000 replicates of a 12,000x12,000 world in NetLogo, what would be the "best" option for a computer? Without reaching for the latest and greatest processing power out there, I would presume the most fiscally reasonable option to be a server board with dual Xeon processors (likely 8-core Ivy Bridge) with 64GB of RAM. Would this be a sufficient design, or are there alternatives that are cheaper (or not) for modeling at this scale? And additionally, do there exist "guidelines" of processor/RAM combinations to cope with the increasing demand of NetLogo on memory as the size of worlds and the number of parallel simulations increase?
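To make the proportionality mentioned above concrete, here is a minimal sketch of the kind of formula being asked for. It assumes memory grows linearly with the number of patches and with the number of simulations run in parallel; `bytes_per_patch` is a hypothetical parameter that would have to be measured for a given model, since it depends on the patch variables and is not a NetLogo constant.

```python
def patch_count(width: int, height: int) -> int:
    """Number of patches in a rectangular world."""
    return width * height


def estimated_heap_bytes(width: int, height: int,
                         parallel_runs: int, bytes_per_patch: int) -> int:
    """Linear estimate: patches x parallel runs x measured bytes per patch.

    ASSUMPTION: turtles, links and JVM overhead are ignored.
    """
    return patch_count(width, height) * parallel_runs * bytes_per_patch


if __name__ == "__main__":
    big = patch_count(12_000, 12_000)   # world from the question
    small = patch_count(1_580, 1_580)   # raster used in the SELES runs
    print(f"{big:,} patches, {big / small:.1f}x the 1580x1580 raster")
    # 200 is a placeholder value, NOT a measured figure.
    est = estimated_heap_bytes(12_000, 12_000, parallel_runs=8, bytes_per_patch=200)
    print(f"placeholder estimate: {est / 1024**3:.1f} GiB")
```

The useful part is the patch count: the 12,000x12,000 world has 144 million patches, so whatever the per-patch cost turns out to be, it is multiplied by that figure and again by the number of parallel BehaviorSpace runs.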

Expression Engine Apache and SSD

Recently I've been working on an ExpressionEngine project that has a performance problem. A test with 50 concurrent connections shows:
Extremely high (100%) CPU usage
Low RAM usage (2 gigs out of 8)
Low CPU/RAM usage on the database
The web server has 4 CPUs. If I turn on the cache, utilization is lower, but the content is such that dynamic caching had to be turned off. ExpressionEngine is made up of templates that have to be read into memory and parsed. For those not familiar with ExpressionEngine, it is built using CodeIgniter.
My thinking is that if Apache and the ExpressionEngine files were moved off the HDD and onto an SSD, I/O for the templates would be a lot faster and Apache's CPU utilization would drop. Would this kind of performance improvement actually happen, or would an SSD make no difference?
An SSD will always be faster than spinny turny disks where disk I/O is concerned, but it doesn't sound like that's where your bottleneck is.
You're not using much RAM and, as you correctly stated, the templates have to be parsed. You have 4 CPUs, but they may be from 1998 (we don't know). If they are more recent, it sounds like that should be more than enough for 50 concurrent connections, but you may be rendering the contents of the Library of Congress (again, we don't know).
You might get some benefit with tag caching or some of the other techniques mentioned in The Guide.
Also found this: http://eeinsider.com/articles/using-cache-wisely-with-expressionengine/

Which kinds of low level facilities aren't typically supported on multi-core machines?

I'm looking at some optimized, low level, cross platform, concurrency code designed to run on multi-core machines, and want to check some of its assumptions.
Some kinds of hardware optimizations probably aren't supported on multi-core designs (for example, out-of-order execution [wikipedia] seems like a good candidate: it takes a lot of die area to implement and can be a power hog). Does anyone have a list of other such facilities, ones typically available on machines with one or a few cores but typically left out of machines with a larger number of cores?
Today, multicore machines are warmed-over die shrinks of uniprocessors. You could almost imagine sawing a 4-core die into 4 1-core dice. I exaggerate only a little bit.
In future, multicore machines will be more thoughtfully designed for energy efficiency and area efficiency. You may see the same ISA, but with different mixes of resources (more or fewer duplicated functional units), and even with some sharing of resources between cores (e.g. AMD Bulldozer). And, as you say, backing off from the complexity and energy overhead of no-holds-barred out-of-order execution. This will most likely be perceived as differences in instructions per clock (IPC), that is, more or less performance on the same instruction set architecture.
Also as vendors have to juggle a hypothetical portfolio of big out-of-order serial performance optimized cores and small in-order or less-out-of-order (OoO) and narrower, more energy efficient "throughput" cores, they will be challenged to keep these different implementations in sync with the evolutions of their ISAs. Some cores may support new instructions, new state, new coprocessors, virtualization, security, etc. earlier than others. This leads to a challenge of coding to the common denominator while also lighting up the new facilities for better perf or energy efficiency (or whatever) on those cores that have the new capabilities.
So to answer your specific question, all the traditional computer architecture techniques for trading gates for expressive-power, or performance, or energy efficiency may be rethought and selectively removed in future small throughput-oriented cores.
Hardware multithreading
Aggressive OoO -> humble OoO or even in-order execution
High degrees of microarchitectural speculation
Fancy branch predictors
Big TLBs
Fancy memory prefetchers
Deep pipelines
Wide issue / many copies of functional units
Big caches, wide buses to caches
...
But it goes both ways. It may also be that the new small throughput-optimized energy-optimized cores have new features not present in the older OoO cores. For example, the Larrabee New Instructions (LRBni) (http://www.drdobbs.com/high-performance-computing/216402188) were proposed for a machine with dozens of simpler cores. As another example, the small cores may turn to hardware multithreading to afford better memory latency tolerance to compensate for smaller private caches.
Also, having lots of small energy frugal cores means you may be willing to dedicate and therefore customize some of the cores to optimize performance for particular valuable workloads. For example, the Tensilica custom processors and tools anticipate that some of your small cores will have additional instructions and custom problem-specific datapaths (accelerating an inner loop of video decoding, for example). So in these cases the little core may (counter-intuitively) have much better performance than the much larger core.
Makes sense?
Happy hacking!

Hadoop cluster. 2 Fast, 4 Medium, 8 slower machines?

We're going to purchase some new hardware to use just for a Hadoop cluster and we're stuck on what we should purchase. Say we have a budget of $5k: should we buy two super nice machines at $2500 each, four at around $1200 each, or eight at around $600 each? Will Hadoop work better with more, slower machines or with fewer, much faster machines? Or, like most things, "it depends"? :-)
You're generally better off with Hadoop getting a few extra machines that are less beefy. You almost never see datanodes with more than 16GB ram and dual quad-core CPUs, and often they are smaller than that.
You always have to run one as the namenode (master), and generally you don't also run a datanode (worker/slave) on the same box, although you could since your cluster is small. Assuming you don't, though, getting 2 machines will leave you only 1 worker node, which somewhat defeats the purpose. (Not entirely, because you can still run 4-8 jobs in parallel on the slave, but still.)
At the same time, you don't want to have a cluster of 1000 486s. If your budget is $5k, I would strike a balance and do 4 $1200 machines. Those will provide a decent baseline in terms of individual performance, you'll have 3 datanodes to distribute work to, and you'll have room to grow your cluster if you need.
Things to keep in mind: you'll want to run multiple map or reduce tasks per datanode, and that means multiple JVMs running simultaneously. I would try to get at least 4GB, and preferably 8GB ram. CPU is less important as most MR jobs are IO bound. You could likely get a machine like this for your $1200 price target, so that's my vote.
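The trade-off in this answer can be laid out as simple arithmetic. The sketch below assumes, as the answer does, that one machine is dedicated to the namenode and runs no datanode; the prices are the approximate figures from the question.

```python
BUDGET = 5000

# (number of machines, approximate price each), taken from the question
OPTIONS = [(2, 2500), (4, 1200), (8, 600)]


def worker_nodes(machines: int, dedicated_masters: int = 1) -> int:
    """Machines left for datanodes after reserving the namenode box."""
    return max(machines - dedicated_masters, 0)


def summarize(options, budget):
    """Return (machines, total cost, workers, leftover budget) per option."""
    rows = []
    for machines, price in options:
        cost = machines * price
        rows.append((machines, cost, worker_nodes(machines), budget - cost))
    return rows


if __name__ == "__main__":
    for machines, cost, workers, left in summarize(OPTIONS, BUDGET):
        print(f"{machines} machines: ${cost}, {workers} worker(s), ${left} left over")
```

Two machines leave a single worker, four leave three, and eight leave seven; the middle option is the one recommended above because each of its three workers can still run several map or reduce JVMs at once.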
In a nutshell, you want to max out the number of processor cores and disks. You can sacrifice reliability and quality, but don't get the cheapest hardware out there, as you will have too many reliability problems.
We went with dual-CPU, 4-core Dell servers, so 8 cores per box. 16GB of memory per box, which is 2GB per core, a bit low as you need memory both for your tasks and for disk buffering. 5x500GB hard drives, and I wish we'd gone for terabyte or higher drives instead.
For drives, my opinion is to buy more cheap, slow, unreliable, high-capacity drives as opposed to more expensive, faster, smaller, reliable drives. If you're having problems with disk throughput, more memory will help with buffering.
This is probably a beefier configuration than you're looking at, but maxing out cores and drives versus buying more boxes is generally a good choice: lower power costs, easier administration, and faster for some operations.
More drives means more simultaneous disk throughput per core, so having as many drives as cores is a good thing. Benchmarking seems to indicate that RAID configurations are slower than JBOD configuration (just mounting the drives and having Hadoop spread load across them) and JBOD is also more reliable.
LAST! Be sure to get ECC memory. Hadoop pushes terabytes of data through memory, and some users have found that non-ECC memory configurations can occasionally introduce single bit errors in terabyte-sized datasets. Debugging these errors is a nightmare.
I recommend having a look at this presentation: http://www.cloudera.com/hadoop-training-thinking-at-scale
The various pros and cons are described there.
I think the answer also depends on your expectations of cluster growth and the networking technology you are using. If you are OK with 1 Gbit Ethernet, then the type of machine is less significant. At the same time, if you want 10 Gbit Ethernet, you should opt for a smaller number of better machines to reduce the cost of networking.
another reference : http://hadoopilluminated.com/hadoop_book/Hardware_Software.html
(disclaimer : I am a co-author of this free hadoop book)

What makes a modern commodity cluster?

What would be the most cost-effective way of implementing a terabyte distributed memory cache using commodity hardware these days? What would count as a piece of commodity hardware?
Commodity hardware is considered hardware that
Is off the shelf (nothing custom)
Is available in substantially similar versions from many manufacturers.
There are many motherboards that can hold 8 or 16 GB of RAM. Fewer server motherboards can hold 32 and even 64GB.
But they fit the definition of commodity, therefore can be made into very large clusters for a very large sum of money.
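To give a feel for "very large clusters", here is a minimal sketch of how many boxes a 1 TB cache would need at the per-motherboard RAM sizes mentioned above. It assumes 1 TB means 1024 GB; `usable_fraction` is a hypothetical knob for the share of RAM actually available to the cache after the OS and other overhead, and defaults to the optimistic value 1.0.

```python
import math

TARGET_GB = 1024  # 1 TB cache, taken as 1024 GB


def nodes_needed(ram_per_node_gb: int, usable_fraction: float = 1.0) -> int:
    """Smallest node count whose usable RAM covers the target cache size."""
    usable_gb = ram_per_node_gb * usable_fraction
    return math.ceil(TARGET_GB / usable_gb)


if __name__ == "__main__":
    for ram in (8, 16, 32, 64):
        print(f"{ram} GB per node: {nodes_needed(ram)} nodes "
              f"({nodes_needed(ram, usable_fraction=0.5)} if only half is usable)")
```

Even in the optimistic case this ranges from 16 machines at 64 GB each to 128 machines at 8 GB each, which is where the "very large sum of money" comes from.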
Note, however, that in many access patterns a striped RAID HD array doesn't go much slower than a gigabit ethernet link - so a RAM cluster might not have significant improvement (except in latency) depending on how you're actually using it.
-Adam