CasperJS - CPU Usage problems - phantomjs

Recently I've used CasperJS on my server for specific automation tasks.
And found a big problem - sometimes CPU jumps to over 30% usage.
I've played a lot with it and searched a lot for some information - and even found 2 different posts for this subject:
Ways to reduce CPU usage
and
How to reduce CPU usage
but unfortunately in both cases there are no relevant answers.
Important -
To reduce CPU usage I don't load images, and make all the tasks with the same Casper instance.
But still CPU usage seems to be high.
Perhaps anyone has ideas how to improve it?
Or at least which functions shouldn't I use in CasperJS?

Related

How to allocate more CPU and RAM to SUMO (Simulation of Urban Mobility)

I have downloaded and unzipped sumo-win64-0.32.0 and running sumo.exe this on a powerful machine (64GB ram, Xeon CPU E5-1650 v4 3.6GHz) for about 140k trips, 108k edges, and 25k vehicles types which are departed in the first 30 min of simulation. I have noticed that my CPU is utilized only 30% and Memory only 38%, Is there any way to increase the speed by forcing sumo to use more CPU and ram, or possibly run in parallel? From "Can SUMO be run in parallel (on multiple cores or computers)?
The simulation itself always runs on a single core."
it appears that parallel processing is not possible t, but what about dedicating more CPU and ram?
Windows usually shows the CPU utilization such that 100% means all cores are used, so 30% is probably already more than one core and there is no way of increasing that with a single threaded application as sumo. Also if your scenario fits within RAM completely there is no point of increasing that. You might want to try one of the several parallelization approaches SUMO has but none of them got further than some toy examples (and none is in the official distribution) and the speed improvements are sometimes only marginal. Probably the best you can do is to do some profiling and find the performance bottlenecks and/or send your results to the developers.

High CPU with ImageResizer DiskCache plugin

We are noticing occasional periods of high CPU on a web server that happens to use ImageResizer. Here are the surprising results of a trace performed with NewRelic's thread profiler during such a spike:
It would appear that the cleanup routine associated with ImageResizer's DiskCache plugin is responsible for a significant percentage of the high CPU consumption associated with this application. We have autoClean on, but otherwise we're configured to use the defaults, which I understand are optimal for most typical situations:
<diskCache autoClean="true" />
Armed with this information, is there anything I can do to relieve the CPU spikes? I'm open to disabling autoClean and setting up a simple nightly cleanup routine, but my understanding is that this plugin is built to be smart about how it uses resources. Has anyone experienced this and had any luck simply changing the default configuration?
This is an ASP.NET MVC application running on Windows Server 2008 R2 with ImageResizer.Plugins.DiskCache 3.4.3.
Sampling, or why the profiling is unhelpful
New Relic's thread profiler uses a technique called sampling - it does not instrument the calls - and therefore cannot know if CPU usage is actually occurring.
Looking at the provided screenshot, we can see that the backtrace of the cleanup thread (there is only ever one) is frequently found at the WaitHandle.WaitAny and WaitHandle.WaitOne calls. These methods are low-level synchronization constructs that do not spin or consume CPU resources, but rather efficiently return CPU time back to other threads, and resume on a signal.
Correct profilers should be able to detect idle or waiting threads and eliminate them from their statistical analysis. Because New Relic's profiler failed to do that, there is no useful way to interpret the data it's giving you.
If you have more than 7,000 files in /imagecache, here is one way to improve performance
By default, in V3, DiskCache uses 32 subfolders with 400 items per folder (1000 hard limit). Due to imperfect hash distribution, this means that you may start seeing cleanup occur at as few as 7,000 images, and you will start thrashing the disk at ~12,000 active cache files.
This is explained in the DiskCache documentation - see subfolders section.
I would suggest setting subfolders="8192" if you have a larger volume of images. A higher subfolder count increases overhead slightly, but also increases scalability.

Programe Execution Optimization

I am writing a Program for Parabolic Time Price Systems based on the book written by J.Welles Wilder Jr.
I am have way through the program, running with an execution time of 122 microsecs. This is way above the benchmark limit. What I was looking for is a few views and tips if I
write a kernel space program to achieve the same. Implementing it through drivers
[Really keen on this method] Is it possible, if yes then how and where I should start looking, passing instructions to a graphic driver to perform the steps and calculation (Read this in a blog somewhere).
Thanks in Advance.
--->Programming on c
What makes GPU very fast is the fact that it can run around 2000~ (depending on the card) threads asynchronously.
If you code can be divided into threads then it might improve your performance to do the calculations on the gpgpu since average CPU speed is 50-100 GFlops and average GPU speed is 1500~ when used correctly.
Also you might want to consider the difficulties of maintaining gpgpu code. I suggest you that if you have an NVidia GPU you should check out 'Managed CUDA' since it contains a debugger and a GPU profiler which makes it possible to work with.
TL;DR: use gpgpu only for async code and preferably use 'managed CUDA' if possible

DirectX - Stop rendering to lessen amount of resources used?

Recently I've been writing a bot for a game which uses a DirectX backend for its rendering. I have managed to 'hack' the game into allowing me to run multiple instances. Unfortunately, this has taken a serious toll on my computer's CPU/RAM usage. I would like to optimize & reduce the amount of resources each instance eats up. Thus, I have a couple of questions:
If I stop DirectX from rendering, will this increase performance?
How can I do so?
I have a few ideas about how to do this - I'm guessing I can just hook the rendering function and force it to return without doing anything. My question though - will doing so noticeably improve performance?
Any help would be greatly appreciated.
First thing: No, immediately returning from the render call will not reduce CPU load, because you're then busy-waiting. You have to manipulate your main loop such that it does a wait for input or for a small timeout each time. That gives time to the other instances. In a first attempt, you could just pack a Sleep(some small amount) in there. Of course, that reduces the FPS of each instance, but it will allow other applications to run more smoothly.
And secondly: No, if you reduce the frame rate, the RAM usage will not be less. To reduce the memory load, you'll need to find more sophisticated ways, like removing textures when you don't need them any more etc.

How To Simulate Lower CPU Processor Machines For Browser Testing

We have some users which are using lower-CPU powered machines and they're encountering slow response times using our web application. Is there any way for me to do testing so that I can simulate lower CPU rates?
For example, I have 2.3 Ghz computing power, can I lower it to 1.6 Ghz or lower so that I may be able to test it?
BTW, our customers are using Windows. I have to simulate low computing power on Internet Explorer as browser.
Most new CPUs multiplier can easily be lowered (Intel: Speedstep, AMD: PowerNow!). This is used to save power. With RMclock you can manually adjust your multiplier and thus lower your frequency and make your pc slower. I use this tool myself so I can tell you that it works.
http://cpu.rightmark.org/products/rmclock.shtml
The virtual machine Bochs(pronounced boxes) allows you to set a instructions per second directive. It's probably the slowest emulator out there as it is though...
Create some virtual machines.
You can use VirtualPC or VirtualBox both are free.
I would recommend to start something on the background which eats up all your processor cycles.
A program which finds primenumbers or something similar.
Another slight option in addition to those above is to boot windows in a lower resource config. Go to the start menu,, select run and type MSCONFIG. You can go to the boot tab, click on advanced options and limit the memory and number of of processsors. It's not as robust as the above, but it does give you another option.
Lowering the CPU clock doesn't always give expected results.
Newer CPUs feature architecture improvements which make them more efficient on an equvialent clock basis than older chips. Incidentally, because of this virtual machines are a bad way of testing performance for "older" tech as well.
Your best bet is to simply buy a couple of older machines. Using similar RAM (types and amounts), processor, motherboard chipsets, hard drives, and video cards. All of which feed into the total performance of the machine itself.
I bring the other components up because changing just one of them can have an impact on even browser performance. A prime example is memory. If your clients are constrained to something like 512MB of RAM, the machines could be performing a lot of hard drive access for VM swaps, even for just running the browser. In this situation downgrading the clock speed on your processor while still retaining your 2GB (assuming) of RAM would still not perform anywhere near the same even if everything else was equal.
Isak Savo'sanswer works, but can be a bit finicky, as the modern tpl is going to try and limit cpu load as much as possible. When I tested it out, It was hard (though possible with some testing) to consistently get the types of cpu usages I wanted.
Then I remembered, http://www.cpukiller.com/, which does this already. Highly recommended. As an aside, I found this util from playing old 90s games on modern machines, back when frame rate was pegged to cpu clock time, making playing them on modern computers way too fast. Great utility.
Another big difference between high-performance and low-performance CPUs is the number of cores available. This can realistically differ by a factor of 4, way more than the difference in clock frequency you're likely to encounter.
You can solve this by setting the thread affinity. Even IE6 will use 13 threads just to show google.com. That means it will benefit from a multi-core CPU. But if you set the thread affinity to one core only, all 13 IE threads will have to share that one core.
I understand that this question is pretty old, but here are some receipts I personally use (not only for Web development):
BES. I'm getting some weird results while using it.
Go to Control Panel\All Control Panel Items\Power Options\Edit Plan Settings\Change Advanced Power Settings, then go to the "Processor" section and set it's maximum state to 5% (or something else). It works only if your processor supports dynamic multiplier change and ACPI driver is installed correctly.
Run Task Manager and set processor affinity to a single core (or whatever number of cores you want) for your browser's (or any other's) process. Not a best practice for browsers, because JavaScript implementations are usually single-threaded, but, as far as I see, modern browsers actually DO use multiple cores.
There are a few different methods to accomplish this.
If you're using VirtualBox, go into the Settings for the VM you want to slow the CPU speed for. Go to System > Processor, then set the Execution Cap. The percentage controls how slow it will go: lower values are slower relative to the regular speed. In practice, I've noticed the results to be choppy, although it does technically work.
It is also possible to set the CPU speed for the whole system. In the Windows 10 Settings app, go to System > Power & Sleep. Then click Additional Power Settings on the right hand side. Go to Change Plan Settings for the currently selected plan, then click Change Advanced Power Plan Settings. Scroll down to Processor Power Management and set the Maximum Processor State. Again, this is a percentage. Although this does work, I find that in practice, it doesn't have a big impact even when the percentage is set very low.
If you're dealing with a videogame that uses DirectX or OpenGL and doesn't have a framerate cap, another common method is to force Vsync on in your graphics driver settings. This will usually slow the rendering to about 60 FPS which may be enough to play at a reasonable rate. However, it will only work for applications using 3D hardware rendering specifically.
Finally: if you'd rather not use a VM, and don't want to change a system global setting, but would rather simulate an old CPU for one specific process only, then I have my own program to do that called Old CPU Simulator.
The main brain of the operation is a command line tool written in C++, but there is also a GUI wrapper written in C#. The GUI requires .NET Framework 4.0. The default settings should be fine in most cases - just select the CPU you'd like to simulate under Target Rate, then hit New and browse for the program you'd like to run.
https://github.com/tomysshadow/OldCPUSimulator (click the Releases tab on the right for binaries.)
The concept is to suspend and resume the process at a precise rate, and because it happens so quickly the process will appear to just be running slowly. For example, by suspending a process for 3 milliseconds, then resuming it for 1 millisecond, it will appear to be running at 25% speed. By controlling the ratio of time suspended vs. time resumed, it is possible to simulate different speeds. This is completely API agnostic (it doesn't hook DirectX, OpenGL, etc. it'll work with a command line program if you want.)
Old CPU Simulator does not ask for a percentage, but rather, the clock speed to simulate (which it calls the Target Rate.) It then automatically determines, based on your CPU's real clock speed, the percentage to use. Although clock speed is not the only factor that has improved computer performance over time (there are also SSDs, faster GPUs, more RAM, multithreaded performance, etc.) it's a good enough approximation to get fairly consistent results across machines given the same Target Rate. It also supports other options that may help with consistency, such as setting the process affinity to one.
It implements three different methods of suspending and resuming a process and will use the best available: NtSuspendProcess, NtQuerySystemInformation, or Toolhelp Snapshots. It also uses timeBeginPeriod and timeEndPeriod to achieve high precision timing without busy looping. Note that this is not an emulator; the binary still runs natively. If you like, you can view the source to see how it's implemented - it's not a large project. On my machine, Old CPU Simulator uses less than 1% CPU and less than 1 MB of memory, so the program itself is quite efficient (unlike running intensive programs to intentionally slow the CPU.)