Xcode shows up to 400% CPU Usage - but iPhone is only 2-core

The Xcode CPU Report shows that my application (while number crunching) is maxing out the CPU.
The gauge shows a maximum of 400%. However, the iPhone has a 2-core CPU, so I am wondering why the gauge doesn't go to 200% instead?
Furthermore, by using concurrency and splitting my number crunching across multiple threads I can max out at 400%; however, my algorithm only runs twice as fast, which again indicates that the work is divided across two CPU cores.
Does anyone know why Xcode shows 400% and how this relates to the physical hardware?

If you are testing in the Simulator, then the report is based on your Mac's processor; that's why it shows 400% (for a quad-core processor).

The iPhone has only 2 cores (although some iPads have more). The Mac running the Simulator apparently has four, or two plus Hyper-Threading.
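Since the gauge's ceiling is simply 100% per hardware thread, you can confirm the count on any machine. Here is a minimal, hedged C++ sketch (nothing Xcode-specific is assumed):

```cpp
// Minimal sketch (standard C++): the CPU gauge's ceiling is 100% per
// hardware thread, a count you can query directly.
#include <iostream>
#include <thread>

int main() {
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "hardware threads: " << n
              << " -> CPU gauge ceiling ~" << n * 100 << "%\n";
}
```

On a quad-core (or dual-core Hyper-Threaded) Mac this reports 4, hence the 400% ceiling; on the physical iPhone it reports 2, which is why the algorithm only runs twice as fast.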

Related

How do GPU/PPU timings work on the Gameboy?

I'm writing a Game Boy emulator and I've come to implementing the graphics. However, I can't quite figure out how the graphics hardware works with the CPU as far as timing/clock cycles go. Does the CPU execute a certain number of cycles (if so, how many) and then hand off to the GPU? Or is the Game Boy always in an hblank/vblank state, with the GPU using the CPU in between them? I can't find any information that helps me with this, only how to use the control registers.
This has been answered at https://forums.nesdev.com/viewtopic.php?f=20&t=17754&p=225009#p225009
It turns out I had it completely wrong: the CPU and PPU run in parallel rather than handing control back and forth.
Here is the post:
The Game Boy CPU and PPU run in parallel. The 4.2 MHz master clock is also the dot clock. It's divided by 2 to form the PPU's 2.1 MHz memory access clock, and divided by 4 to form a multi-phase 1.05 MHz clock used by the CPU.
Each scanline is 456 dots (114 CPU cycles) long and consists of mode 2 (OAM search), mode 3 (active picture), and mode 0 (horizontal blanking). Mode 2 is 80 dots long (2 for each OAM entry), mode 3 is about 168 plus about 10 more for each sprite on a given line, and mode 0 is the rest. After 144 scanlines are drawn come 10 lines of mode 1 (vertical blanking), for a total of 154 lines or 70224 dots per screen. The CPU can't see VRAM (writes are ignored and reads are $FF) during mode 3, but it can during other modes. The CPU can't see OAM during modes 2 and 3, but it can during blanking modes (0 and 1).
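To make those numbers concrete, here is a small C++ sketch (my own illustration, not from the post) that maps a dot position within the 70224-dot frame to the PPU mode:

```cpp
// Illustrative sketch: derive the PPU mode from a dot position, using the
// nominal timings quoted above. Mode 3's real length varies with sprite
// count; 172 dots is used here as a fixed approximation.
int ppuMode(int dotInFrame) {
    const int DOTS_PER_LINE = 456;
    int line = dotInFrame / DOTS_PER_LINE;  // 0..153
    int dot  = dotInFrame % DOTS_PER_LINE;
    if (line >= 144)    return 1;           // mode 1: vertical blanking
    if (dot < 80)       return 2;           // mode 2: OAM search (80 dots)
    if (dot < 80 + 172) return 3;           // mode 3: active picture
    return 0;                               // mode 0: horizontal blanking
}
```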
The link gives more of a general answer instead of implementation specifics, so I want to give my 2 cents.
The CPU is usually the main driver of your emulator and the component that actually counts cycles. Each time your CPU does something that takes some number of cycles, you pass that cycle count to the other components of your emulator so that they can synchronize themselves.
For example, some CPU instructions read and write memory as part of a single instruction. That means it would take the Game Boy CPU 4 (read) + 4 (write) cycles to complete the instruction. So in the emulator you do the read, pass 4 cycles to the GPU, do the write, and pass 4 more cycles to the GPU. You do the same for the other components that run in parallel with the CPU, like timers and sound.
It's actually important to do it that way instead of emulating the whole instruction and then synchronizing everything else. I don't know about real ROMs, but there are test ROMs that verify this exact behavior. 8 cycles is a long time, and in the middle of multiple memory accesses some other Game Boy component might make a change.
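As a minimal sketch of that per-access synchronization (the Ppu, Bus, and tick names are illustrative, not from any particular emulator):

```cpp
#include <cstdint>

// Illustrative sketch: every memory access ticks the PPU before it
// completes, so a mid-instruction write lands at the correct dot.
struct Ppu {
    int dot = 0;
    void tick(int cycles) {
        dot = (dot + cycles) % 70224;  // advance through the 154-line frame
        // ...update the mode, raise STAT/VBlank interrupts, emit pixels...
    }
};

struct Bus {
    Ppu ppu;
    std::uint8_t mem[0x10000] = {};
    // Each access costs 4 cycles and synchronizes the PPU immediately.
    std::uint8_t read(std::uint16_t addr) {
        ppu.tick(4);
        return mem[addr];
    }
    void write(std::uint16_t addr, std::uint8_t v) {
        ppu.tick(4);
        mem[addr] = v;
    }
};
```

Because the PPU catches up before each access rather than once per instruction, the mid-instruction timing that those test ROMs check falls out naturally.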

affdex-sdk: How many frames can be processed per second?

I used the Affdex Android SDK; our device's CPU is an MSM8994, running Android 6.0. But using the sample APK framedectordemo, we only get 10 frames per second. We want to know the official figure: how many frames can be processed per second? The Affdex web site says it can reach 20+ processed frames per second.
The stated minimum requirements are a quad-core 1.5 GHz Cortex-A53, 1 GB of RAM, and Android 4.4 or better. As the MSM8994 uses a big.LITTLE configuration of two quad-core clusters, and the fast cluster is a 2.0 GHz Cortex-A57, you should be able to achieve good performance. If you don't want to modify the sample APK, you can just download AffdexMe from Google Play (https://play.google.com/store/apps/details?id=com.affectiva.affdexme&hl=en) and test the performance there. Remember that you can only track up to 6 detectors in AffdexMe, so if you are trying to benchmark the performance for a larger number of metrics, the results are going to be different.

App crashes on launch with < 256 MB RAM iOS Devices

The Info
I recently launched an app on the App Store. After testing on the simulator thousands of times, and on actual devices hundreds of times, we finally released our app.
The Problem
Reviews started popping up about app crashes when the user launches the app. We figured out that the app crashes on launch on iOS devices with 256 MB of RAM or less. The following devices that our app supports have 256 MB or less:
iPod Touch 4G
iPhone 3GS
iPad 1
The app doesn't always crash. Sometimes it launches fine and runs smoothly. Other times it crashes. The time from launch (when the user taps the icon) to crash is usually two seconds, which would mean that the system isn't shutting it down.
Findings
When using Instruments to test on certain devices, I find the following:
There are no memory leaks (I'm using ARC), but there are memory warnings
Items are being allocated like crazy. There are so many allocated items, and even though I'm using ARC it's as if ARC isn't doing what it's supposed to be doing
Because of what I see as "over-allocation", the result is:
This app takes (on average) 60 MB of Real Memory and 166 MB of Virtual. When the app launches the memory being used quickly increases until it reaches about 60 MB at which point the view has been loaded.
Here is a snapshot of the Activity Monitor in Instruments:
I know that those figures are WAY too high (although the CPU % never really gets up there). I am worried that ARC is not working properly, or, the more likely case, that I'm not allocating objects correctly. What could possibly be happening?
The Code and Warnings
In Xcode, there are only a few warnings, none of which pertain to the app launch or any files associated with the launching of the app. I placed breakpoints in both the App Delegate and my viewDidLoad method to check and see if the crash occurred there - it didn't.
More Background Info
Also, Xcode never generates any errors or messages in the debugger. There are also no crash reports in iTunes Connect, it just says, "Too few reports have been submitted for a report to be shown." I've added crash reporting to my app, but I haven't released that version.
A Few Questions
I started using Obj-C just as ARC arrived, so I'm new to dealing with memory, allocation, etc. (that is probably obvious) but I'd like to know a few things:
How can I use @autoreleasepool to reduce my memory impact? What do I do with memory warnings: what should I write in didReceiveMemoryWarning, given that I'm using ARC?
Would removing NSLog statements help speed things up?
And the most important question:
Why does my app take up so much memory and how can I reduce my whopping 60 MB footprint?
I'd really appreciate any help! Thanks in advance!
EDIT: After testing on the iPhone 4 (A4), we noticed that the app doesn't crash when run whereas on devices with less than 256 MB of RAM it does.
I finally solved the issue. I spent a few hours pondering how my application could possibly take up more RAM than Angry Birds or Doodle Jump. That just didn't make sense, because my app does no CALayer drawing, complex OpenGL graphics rendering, or heavy web connections.
I found this slideshow while searching for answers and slide 17 listed the ways to reduce memory footprint. One thing that stuck out was PNGCrush (Graphics Compression).
My app contains a lot of custom graphics (PNG files), but I hadn't thought of them as affecting my app in any way; apparently, images (when not optimized properly) severely increase an application's memory footprint.
After installing PNGCrush and using it on a particularly large image (3.2 MB), and then deleting a few unused images, I ended up reducing my app's memory footprint from 60+ MB and severe lag to 35 MB and no lag. That took a whopping five minutes.
I haven't finished "crushing" all my images, but when I do I'll update everyone on the final memory footprint.
For all those interested, here is a link to a blog that explains how to install PNGCrush (it's rather complicated).
UPDATE: Instead of using the PNGCrush process (which is very helpful, although time-consuming with lots of images), I now use a program called ImageOptim that provides a GUI for multiple scripts like PNGCrush. Here's a short description:
ImageOptim seamlessly integrates various optimisation tools: PNGOUT, AdvPNG, PNGCrush, extended OptiPNG, JpegOptim, jpegrescan, jpegtran, and Gifsicle.
Here's a link to the website with a free download for OS X 10.6 - 10.8. Note, I am not a developer, publisher or advertiser of this software.

CUDA - nvidia driver crash while running

I run a raytracer in CUDA with N bounces (each ray will bounce N times).
I view the results using OpenGL.
When N is small (1-4) everything works great. When I make N big (~10), each thread (of about 800x1000) has to do a lot of computing, and that's when the screen goes black and then comes back on, with a notification that my NVIDIA driver crashed.
I searched online and now think the cause is some sort of watchdog timer, since I use the same graphics card for my display and my computing (the computation takes more than 2 seconds, so the driver resets itself).
Is there a command to make the host (CPU) WAIT for the device (GPU) for as long as it takes?
What do I need to do? I'm stuck :(
Thanks
Based on your description, you are running on Windows Vista or Windows 7. Windows operating systems have a watchdog timer, as you guessed. The watchdog timer only applies to GPUs with displays attached.
The easiest solution is to run 2 or more GPUs, and run CUDA on GPU(s) without a display attached.
You can disable the watchdog timer. See this question for more details. However you should do so with care—remember that when you have a long running kernel on your primary display GPU you will make your computer completely unresponsive (at least you won't be able to see what it is doing) until the kernel completes.
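To answer the "make the host wait" part: cudaDeviceSynchronize() is that call, but it does not suppress the watchdog; each individual kernel launch must still finish within the timeout. A common workaround, sketched below in CUDA C++ (traceBounce and its parameters are hypothetical, not your code), is to split the work into one short launch per bounce:

```cpp
#include <cuda_runtime.h>

// Hypothetical per-bounce kernel: advances each of n rays by one bounce.
__global__ void traceBounce(float4* rays, int n, int bounce) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // ...intersect, shade, and update rays[i] for this bounce...
    }
}

void render(float4* rays, int n, int numBounces) {
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    for (int b = 0; b < numBounces; ++b) {
        traceBounce<<<grid, block>>>(rays, n, b);
        // The host blocks here until the launch finishes, but each launch
        // is now short enough to stay under the display watchdog's limit.
        cudaDeviceSynchronize();
    }
}
```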

How To Simulate Lower CPU Processor Machines For Browser Testing

We have some users on machines with less CPU power, and they're encountering slow response times using our web application. Is there any way for me to do testing that simulates lower CPU speeds?
For example, I have 2.3 GHz of computing power; can I lower it to 1.6 GHz or less so that I can test with it?
BTW, our customers are using Windows. I have to simulate low computing power with Internet Explorer as the browser.
On most new CPUs the multiplier can easily be lowered (Intel: SpeedStep, AMD: PowerNow!). This is used to save power. With RMclock you can manually adjust your multiplier and thus lower your frequency and make your PC slower. I use this tool myself, so I can tell you that it works.
http://cpu.rightmark.org/products/rmclock.shtml
The virtual machine Bochs (pronounced "boxes") allows you to set an instructions-per-second directive. It's probably the slowest emulator out there as it is, though...
Create some virtual machines.
You can use Virtual PC or VirtualBox; both are free.
I would recommend starting something in the background that eats up all your processor cycles, such as a program that finds prime numbers.
Another slight option in addition to those above is to boot Windows in a lower-resource config. Go to the Start menu, select Run, and type MSCONFIG. You can go to the Boot tab, click on Advanced Options, and limit the memory and number of processors. It's not as robust as the above, but it does give you another option.
Lowering the CPU clock doesn't always give expected results.
Newer CPUs feature architectural improvements that make them more efficient on an equivalent clock basis than older chips. Incidentally, this is also why virtual machines are a bad way of testing performance for "older" tech.
Your best bet is to simply buy a couple of older machines, with similar RAM (type and amount), processor, motherboard chipset, hard drive, and video card. All of these feed into the total performance of the machine itself.
I bring the other components up because changing just one of them can have an impact on even browser performance. A prime example is memory. If your clients are constrained to something like 512 MB of RAM, their machines could be performing a lot of hard drive access for VM swaps, even just running the browser. In that situation, downgrading your processor's clock speed while still retaining your (say) 2 GB of RAM would not perform anywhere near the same, even if everything else were equal.
Isak Savo's answer works, but can be a bit finicky, as the modern TPL is going to try to limit CPU load as much as possible. When I tested it out, it was hard (though possible with some testing) to consistently get the types of CPU usage I wanted.
Then I remembered http://www.cpukiller.com/, which does this already. Highly recommended. As an aside, I found this utility while playing old '90s games on modern machines, back when frame rate was pegged to CPU clock time, which made them run way too fast on modern computers. Great utility.
Another big difference between high-performance and low-performance CPUs is the number of cores available. This can realistically differ by a factor of 4, way more than the difference in clock frequency you're likely to encounter.
You can solve this by setting the thread affinity. Even IE6 will use 13 threads just to show google.com. That means it will benefit from a multi-core CPU. But if you set the thread affinity to one core only, all 13 IE threads will have to share that one core.
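For illustration, here is a hedged Win32/C++ sketch of pinning a running process to one core (the structure and names are mine; Task Manager can do the same interactively):

```cpp
#include <windows.h>
#include <cstdio>
#include <cstdlib>

// Pin an already-running process (e.g. a browser) to logical processor 0
// so that all of its threads share one core. Takes the PID as an argument.
int main(int argc, char** argv) {
    if (argc < 2) { std::printf("usage: pin <pid>\n"); return 1; }
    DWORD pid = std::strtoul(argv[1], nullptr, 10);
    HANDLE proc = OpenProcess(PROCESS_SET_INFORMATION, FALSE, pid);
    if (!proc) {
        std::printf("OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }
    // Affinity mask with only bit 0 set: threads may run on core 0 only.
    if (!SetProcessAffinityMask(proc, 1)) {
        std::printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
    }
    CloseHandle(proc);
    return 0;
}
```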
I understand that this question is pretty old, but here are some recipes I personally use (not only for Web development):
BES. I'm getting some weird results while using it.
Go to Control Panel\All Control Panel Items\Power Options\Edit Plan Settings\Change Advanced Power Settings, then go to the "Processor" section and set its maximum state to 5% (or something else). This works only if your processor supports dynamic multiplier changes and the ACPI driver is installed correctly.
Run Task Manager and set the processor affinity to a single core (or whatever number of cores you want) for your browser's (or any other) process. Not a best practice for browsers, because JavaScript implementations are usually single-threaded, but, as far as I can see, modern browsers actually DO use multiple cores.
There are a few different methods to accomplish this.
If you're using VirtualBox, go into the Settings for the VM you want to slow the CPU speed for. Go to System > Processor, then set the Execution Cap. The percentage controls how slow it will go: lower values are slower relative to the regular speed. In practice, I've noticed the results to be choppy, although it does technically work.
It is also possible to set the CPU speed for the whole system. In the Windows 10 Settings app, go to System > Power & Sleep. Then click Additional Power Settings on the right hand side. Go to Change Plan Settings for the currently selected plan, then click Change Advanced Power Plan Settings. Scroll down to Processor Power Management and set the Maximum Processor State. Again, this is a percentage. Although this does work, I find that in practice, it doesn't have a big impact even when the percentage is set very low.
If you're dealing with a videogame that uses DirectX or OpenGL and doesn't have a framerate cap, another common method is to force Vsync on in your graphics driver settings. This will usually slow the rendering to about 60 FPS which may be enough to play at a reasonable rate. However, it will only work for applications using 3D hardware rendering specifically.
Finally: if you'd rather not use a VM, and don't want to change a system global setting, but would rather simulate an old CPU for one specific process only, then I have my own program to do that called Old CPU Simulator.
The main brain of the operation is a command line tool written in C++, but there is also a GUI wrapper written in C#. The GUI requires .NET Framework 4.0. The default settings should be fine in most cases - just select the CPU you'd like to simulate under Target Rate, then hit New and browse for the program you'd like to run.
https://github.com/tomysshadow/OldCPUSimulator (click the Releases tab on the right for binaries.)
The concept is to suspend and resume the process at a precise rate, and because it happens so quickly the process will appear to just be running slowly. For example, by suspending a process for 3 milliseconds, then resuming it for 1 millisecond, it will appear to be running at 25% speed. By controlling the ratio of time suspended vs. time resumed, it is possible to simulate different speeds. This is completely API agnostic (it doesn't hook DirectX, OpenGL, etc. it'll work with a command line program if you want.)
Old CPU Simulator does not ask for a percentage, but rather the clock speed to simulate (which it calls the Target Rate.) It then automatically determines, based on your CPU's real clock speed, the percentage to use. Although clock speed is not the only factor that has improved computer performance over time (there are also SSDs, faster GPUs, more RAM, multithreaded performance, etc.) it's a good enough approximation to get fairly consistent results across machines given the same Target Rate. It also supports other options that may help with consistency, such as setting the process affinity to one core.
It implements three different methods of suspending and resuming a process and will use the best available: NtSuspendProcess, NtQuerySystemInformation, or Toolhelp Snapshots. It also uses timeBeginPeriod and timeEndPeriod to achieve high precision timing without busy looping. Note that this is not an emulator; the binary still runs natively. If you like, you can view the source to see how it's implemented - it's not a large project. On my machine, Old CPU Simulator uses less than 1% CPU and less than 1 MB of memory, so the program itself is quite efficient (unlike running intensive programs to intentionally slow the CPU.)
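To make the mechanism concrete, here is a hedged C++ sketch of that suspend/resume duty cycle using only the documented Toolhelp APIs (the function names are mine; this is not the Old CPU Simulator source):

```cpp
#include <windows.h>
#include <tlhelp32.h>
#pragma comment(lib, "winmm.lib")  // for timeBeginPeriod

// Suspend or resume every thread of the target process via Toolhelp.
static void setSuspended(DWORD pid, bool suspend) {
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE) return;
    THREADENTRY32 te = { sizeof(te) };
    for (BOOL ok = Thread32First(snap, &te); ok; ok = Thread32Next(snap, &te)) {
        if (te.th32OwnerProcessID != pid) continue;
        HANDLE t = OpenThread(THREAD_SUSPEND_RESUME, FALSE, te.th32ThreadID);
        if (!t) continue;
        if (suspend) SuspendThread(t); else ResumeThread(t);
        CloseHandle(t);
    }
    CloseHandle(snap);
}

// Suspended 3 ms out of every 4 ms -> the target appears to run at ~25%.
void throttle(DWORD pid) {
    timeBeginPeriod(1);  // request 1 ms timer resolution for precise sleeps
    for (;;) {
        setSuspended(pid, true);  Sleep(3);
        setSuspended(pid, false); Sleep(1);
    }
}
```

A real implementation would cache the thread list rather than re-snapshotting every few milliseconds; the loop above favors clarity over efficiency.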