Improve Rails loading time - ruby-on-rails-3

This is a bit of a follow-up to a previous question on improving Rails console loading time.
The first great suggestion was to figure out which gems take too long to load.
The next answer suggested using :require => nil and loading those gems later.
With some gems, however, it's not entirely clear how to accomplish this without breaking things. Here's a list of our biggest offenders; can someone suggest the best approach to loading them only when necessary?
require gon: 2.730000 (2.870059)
require omniauth-openid: 1.410000 (1.503858)
require cancan: 2.640000 (2.707467)
require fog: 2.730000 (2.846530)
require activeadmin: 3.650000 (3.923877)
And of course there are many more that take around a second or less, which also adds up... but removing the big ones alone would already be an improvement.
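For reference, per-gem timings in the format above can be reproduced with a small Benchmark script along these lines (only a sketch, run from the app root so Bundler's load paths are set up; it may not match exactly how the numbers above were generated):

# rough sketch: time each gem's require individually
# run with: bundle exec ruby require_times.rb
# note: gems that expect Rails to already be loaded may behave differently here
require 'benchmark'

%w[gon omniauth-openid cancan fog activeadmin].each do |name|
  tms = Benchmark.measure { require name }
  printf "require %s: %f (%f)\n", name, tms.utime, tms.real
end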
How can I selectively load gems later to make Rails load faster?
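To make the question concrete, the :require => nil approach looks roughly like this for a gem that is only needed in one place (a sketch; Backup is a made-up class and fog is just one of the gems from the list above; for gems like activeadmin or cancan, which hook into Rails at boot, it is much less obvious where such a deferred require could go, which is really the heart of the question):

# Gemfile -- tell Bundler not to require the gem at boot
gem 'fog', :require => nil

# app/models/backup.rb -- a made-up class, just to illustrate the deferred require
class Backup
  def upload(path)
    require 'fog'   # loaded only the first time this code path runs
    # ... use Fog::Storage from here on ...
  end
end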

While not a direct answer to your question, there are two things you might try:
First, have you tried the Falcon patches for Ruby 1.9.3? They include some pretty significant load-time improvements.
If you're using RVM, you can do a quick-and-dirty install with
rvm install 1.9.3 --patch falcon -n falcon
Second, make sure you're setting the GC tuning environment variables. Ruby's default GC parameters are appropriate for small scripts, but not for full Rails apps. Here are my settings, though you'd want to derive your own based on your application's needs:
% env | grep RUBY_
RUBY_HEAP_MIN_SLOTS=800000
RUBY_HEAP_FREE_MIN=100000
RUBY_HEAP_SLOTS_INCREMENT=300000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_GC_MALLOC_LIMIT=79000000
And my results using ruby 1.9.3-p286:
                      Stock      Stock+GC   Falcon     Falcon+GC
                      27.13 s    8.43 s     8.63 s     6.69 s
Stock       27.13 s   100.00%    31.07%     31.81%     24.66%
Stock+GC     8.43 s   321.83%    100.00%    102.37%    79.36%
Falcon       8.63 s   314.37%    97.68%     100.00%    77.52%
Falcon+GC    6.69 s   405.53%    126.01%    129.00%    100.00%
(boot time in seconds; each cell shows the column configuration's boot time as a percentage of the row configuration's)
Setting the GC tuning parameters gives the biggest improvement, but the Falcon patches buy roughly another 26% on top of that. The combination of the Falcon patches and the GC parameters cuts boot time by more than 75%.

Related

Is there a reliable way to get __name__ in Manim when using multiprocessing

I am working on a video that has a simulation of an ideal gas (points) or a realistic gas (circles) running in the background for parts of it. To make processing faster with a useful number of "molecules" for the realistic gas, I've been using the multiprocessing Python package. In regular Python this requires a __name__ == '__main__' check to avoid bad multiprocessing conditions. In manim 0.16 versions I was able to check for __name__ == 'py', but this broke in the upgrade to the 0.17 versions. I eventually fixed it by finding out what __name__ is now, by having the main process print("__name__ is now ", __name__), which now outputs "path.to.working.dir.main".
Is there an easy way to find out what to use from within the program without having to check and fix with every update?
In manim 0.16 versions, checking for
__name__ == "py"
was good enough. This changed in manim 0.17 versions and it took me a while to figure out why my animation was crashing. I finally resolved it by inserting a
print("__name__ is now ", __name__)
near the top of the code to find out what to use now. However, this seems like an ineffective approach in the long run (will I have to go through this again in version 0.18?), and I'd like to find a better, more reliable, more automated way.

GraphDB Free 8.5 re-inferring uses a single core?

I am trying to load several large biomedical ontologies into a GraphDB OWL Horst optimized repository, along with tens of millions of triples that use terms from those ontologies. I can load these data into an RDFS+ optimized repo in less than an hour, but I can't even load one of the ontologies (chebi_lite) if I let it run overnight. That's using loadrdf on a 64-core, 256 GB AWS server.
My earlier question, Can GraphDB load 10 million statements with OWL reasoning?, led to the suggestion that I use the preload command, followed by re-inferring. The preload indeed went very quickly, but when it comes to re-inferring, only one core is used. I haven't been able to let it run for more than an hour yet. Is re-inferring using just one core a consequence of using the free version? Might loadrdf work better if I just configured it better?
When I use loadrdf, all cores go close to 100%, but memory usage never goes over 10% or so. I have tinkered with the JVM memory settings a little, but I don't really know what I'm doing. For example:
-Xmx80g -Dpool.buffer.size=2000000

Why is karate-gatling slow compared to JMeter

I have followed the example at karate-gatling-demo for creating a load test. For my use case I converted a JMeter test to Karate. After making sure everything worked, I compared the two. In the time it took karate-gatling to reach even 300 requests, JMeter had already made a few thousand. I thought it might have been the pause in the demo, but even after I removed it, the speed of the tests makes them unusable. I would really like to implement this, as we are already making strides towards using normal Karate tests as part of our CI process. Is there a reason they are so slow?
(I am using karate-gatling version 0.8.0.RC4)
To provide some info on the two testing situations:
JMeter: 50 threads/users with a 30-second ramp-up and 50 loops
Karate-Gatling: repeat scenario 50 times, ramp to 50 users over 30 seconds
That is because this is still in the early stages of development, and this feedback helps. If possible, can you try 0.8.0.RC3 and see if that makes a difference? The test syntax needs a slight change, which you should be able to figure out from the version history. There was a fundamental change in the async model that probably has some issues.
Ideally I would love for someone who knows Gatling internals to help, but this will take a little time to evolve with me looking at it.
EDIT: Gatling support was released in 0.8.0 (final) and multiple teams have reported that it is working well for them.

Using multiple threads/cores for awk performance improvement

I have a directory with ~50k files. Each file has ~700,000 lines. I have written an awk program to read each line and print it only if there is an error. Everything is running perfectly fine, but the time taken is huge: ~4 days! Is there a way to reduce this time? Can we use multiple cores (processes)? Has anyone tried this before?
awk and gawk will not fix this for you by themselves. There is no magic "make it parallel" switch. You will need to rewrite to some degree:
shard by file - the simplest fix is to run multiple awks in parallel, one per file. You will need some sort of dispatch mechanism. Parallelize Bash script with maximum number of processes shows how you can write this yourself in shell. It will take more reading, but if you want more features, check out Gearman or Celery, which should be adaptable to your problem.
better hardware - it sounds like you probably need a faster CPU to make this go faster, but it could also be an I/O issue. Having graphs of CPU and I/O from Munin or some other monitoring system would help isolate which is the bottleneck here. Have you tried running this job on an SSD-based system? That is often an easy win these days.
caching - there is probably some amount of duplication across lines or files. If there are enough duplicates, it would be helpful to cache the processing in some way. If you calculate the CRC/MD5 sum of a file and store it in a database, you can calculate the sum of a new file and skip processing it if you've already seen it.
complete rewrite - scaling this with awk is going to get ridiculous at some point. Using some map-reduce framework might be a good idea.

Scaling CakePHP Version 2.3.0

I'm beginning a new project using CakePHP. I like the "auto-magic" features and I think it's a good fit for the project. I'm wondering about the potential to scale CakePHP to several million IP hits a day and hundreds of thousands of database writes and reads a day, with about 50,000 to 500,000 users, often with 3,000 using the site concurrently. I'm making use of heavy stored procedures to offset this, and I'm using several servers, including a load balancer.
I'm wondering about the computational cost of some of the auto-magic and how well Cake handles session requests that make many DB hits. Has anyone had success with Cake running on a single server array setup with this level of traffic? I'm not using the cloud or a distributed database (yet). I'm really worried about potential bottlenecks with this framework. I'm interested in advice from anyone who has worked with Cake in production. I've researched, but I would love a second opinion. Thank you for your time.
This is not a problem, but the optimization is up to you.
There are different caching methods you can implement: Memcache, Redis, full-page caching... All of that is already supported by Cake. What you cache, and where, is up to you.
For searching, you could try Elasticsearch to speed things up.
There are before-dispatch filters that let you bypass controller instantiation (you might want to do that in special cases; check the asset filter for an example).
Use nginx, not Apache.
Also, I would not start by over-optimizing and over-thinking this before any code is written. Start well and think about caching, but when you come across bottlenecks, analyse and fix them. Otherwise you'll waste a lot of time on over-optimization before you've even written anything that works.
Cake itself is very fast. Just to prove the bullshit factor of the fancy benchmarks some frameworks do, we did one using a dispatcher filter to "optimize" it and even beat Yii, which seems pretty eager to show how fast it is. But benchmarks are pointless, especially in a huge project where so many human-made failures can be introduced.