Improving the efficiency of Kinect for Windows DTWGestureRecognizer Application - kinect

Currently I am using the DTWGestureRecognizer open source tool for Kinect SDK v1.5. I have recorded a few gestures and use them to navigate through Windows 7. I also have implemented voice control for simple things such as opening PowerPoint, Chrome, etc.
My main issue is that the application uses quite a bit of my CPU power which causes it to become slow. During gestures and voice commands, the CPU usage sometimes spikes to 80-90%, which causes the application to be unresponsive for a few seconds. I am running it on a 64 bit Windows 7 machine with an i5 processor and 8 GB of RAM. I was wondering if anyone with any experience using this tool or Kinect in general has made it more efficient and less performance hogging.
Right now I removed sections which display the RGB video and the Depth video but even doing that did not make a big impact. Any help is appreciated, thanks!

Some of the factors I can think of are
Reduce the resolution.
Reduce the frames being recorded/processed by the application using polling model i.e. OpenNextFrame(int millisecondsWait) method of DepthStream, ColorStream & SkeletonStream
instead of event model.
Tracking mode is Default instead of Seated(sensor.SkeletonStream.TrackingMode =
SkeletonTrackingMode.Default) as seated consumes more resources.
Use sensor.MapDepthFrameToColorFrame instead of calling sensor.MapDepthToColorImagePoint method in a loop.
Last and most imp. is the algorithm used in the open source tool.

Related

HoloLens external rendering

Does soneone of you have a good solution for external rendering for Microsoft HoloLens Apps? Specified: Is it possible to let my laptop render an amount of 3D objects that is too much for the HoloLens GPU and then display them with the HoloLens by wifi including the spatial mapping and interaction?
It's possible to render remotely both directly from the unity editor and from a built application.
While neither achieves your goal of a "good solution" they both allow very intensive applications to at least run at all.
This walks you through how to add it to an app you're building.
https://learn.microsoft.com/en-us/windows/mixed-reality/add-holographic-remoting
This is for running directly from the editor:
https://blogs.unity3d.com/2018/05/30/create-enhanced-3d-visuals-with-holographic-emulation-in-uwp/
I don't think this is possible since, you can't really access the OS or the processor at all on the HoloLens. Even if you do manage to send the data to a 3rd party to process, the data will still need to be run back through the HoloLens which is really just the same as before.
You may find a way to perhaps hook up a VR backpack to it but even then, I highly doubt it would be possible.
If you are having trouble rendering 3D objects, then you should reduce the number of triangles, get a lower resolution shader on it, or reduce the size of the object. The biggest factor in processing 3D objects on the HoloLens is how much space is being drawn on the lens. If your object takes up 25% of the view instead of 100% it will be easier to process on the HoloLens.
Also if you can't avoid a lot of objects in the scene maybe check out LOD, which reduces the resolution of objects based off of distance to it and vice versa.

Cocos2dx performance issue on Windows Phone 8

I'm trying to port an android/iOS game to windows phone 8(cocos2dx v 2.2). I'm using the exact same code base that I've used for android and iOS. The game functions just fine, but I facing some major FPS drop. The game runs flawlessly at 60FPS in android and iOS, but I'm getting roughly about 35FPS on wp8. Has this got to do anything with differences in OpenGL and directX?
I doubt its got to do with the game's logic and calculations because when the game starts in windows phone, it starts with 60FPS on the main menu, which has got like 5 sprites. But as I add more sprites on the screen, say about 30 of them(average number of sprites when I'm IN the game) the FPS rapidly drops to 35-40 range. Note that there are no schedulers or update functions running at this point. I did the same test on Android, but the FPS didn't drop. Does the win8 port of cocos2dx suck?
Any help,comments or redirection to useful articles would be appreciated.
Thank you.
In case anyone runs into similar issue, I reduced the number of children in the scene and deployed the build in release mode. Gave a major boost to the FPS. Also, I had a bunch of float to string and int to string conversions happening in every frame inside the update function. That was eating away on the processing speed too.
Actually, the Cocos2dx port for WP8 is ok, but outdated. Cocos2d-x is now at 3.0 beta, but the WP8 was left at 2.0 alpha.
Anyway... in Cocos there are some recursive drawing functions which are very heavy on the CPU, and also, keep in mind that even though WP8 is supposed tu support arrays, lists, maps etc. they are very slow on WP8.
And since you came to this subject, Please let me know if you managed to successfully put cocos2d-x on an XAML+D3D Interop project. I am getting tons of crashes.
EDIT: Indeed, the recursive calls which process (draw or update) child "CCNode"s are very heavy on the device. However, after putting Cocos2d-x ver. 2.0alpha for WP8 into a XAML+D3D interop project, I found a whole lot of memory related issues. Apparently, after doing this (or just because I don't know how to properly configure my VS project and allow loose addressing), a lot of uninitialized pointers and data cause some memory overlaps, leading to major crashes.
This proves only that it was truely an alpha release :) Too bad no newer version of Cocos2d-x for Wp8 is available.

Is QML worth using for Embedded systems which run around 600MHz and no GPU?

Im planning to build a Embedded system which is almost like an organizer i.e. which handles contacts, games, applications & wifi/2G/3G for internet. I planned to build the UI with QML because of its easy to use and quick application building nature. And to have a linux kernel.
But after reading these articles:
http://qt-project.org/forums/viewthread/5820 &
http://en.roolz.org/Blog/Entries/2010/10/29_Qt_QML_on_embedded_devices.html
I am depressed and reconsidering my idea of using QML!
My hardware will be with these configurations : Processor around 600MHz, RAM 128MB and no GPU.
Please give comments on this and suggest me some alternatives for this.
Thanks in Advance.
inblueswithu
I have created a QML application for Nokia E63 which has 369MHz processor, 128MB RAM. I don't think it has a GPU. The application is a Stop Watch application. I have animated button click events like jumping (jumping balls). The animations are really smooth even when two button jumps at the same time. A 600MHz processor is expected to handle QML easily.
This is the link for the sis file http://store.ovi.com/content/184985. If you have a Nokia mobile you can test it.
May be you should consider building QML elements by hand instead of doing it from Photoshop or Gimp. For example using Item in the place of Rectangle will be optimal. So you can give it a try. May be by creating a rough sketch with good amount of animations to check whether you processor can handle that. Even if it don't work as expected then consider Qt to build your UI.
QML applications work fine on low-specs Nokia devices. I have made one for 5800 XPressMusic smartphone without any problems.

Cocoa IOSurfaces and synchronization with background task pulling frames via Quicktime

I've a question regarding IOSurface on Cocoa.
After an extensive research required to switch my OPENGL realtime application to 64 bit, I've taken the only path to support Quicktime playback spawning a background thread that pulls the frames installing a frame-ready callback and then with QTVisualContextCopyImageForTime , and pass the IOSurfaceRef through RPC to the parent process.
Everything works fine but there's one main issue. In my 32 bit application I was able to serialize any call to the GL subsystem by rendering a frame, pull the QT frames for the next pass and then wait for the next V-sync. This produced a very smooth and stable result.
Using the IOSurface technique gives me no way to synchronize when my app draws a frame and when the background process pulls the IOSurface from the quicktime movie. The result is that, on a random basis, I experience performances SPYKES. Indeed using the OPENGL driver monitor raises the CPU WAIT cycles up to 10% in my 64 bit app, while I have 0% CPU Wait graph under 32 bit.
Anyone here used IOSurface in a real world application and faced issues like this one ? I've though about an interprocess mutex/lock , but considering I need to lock/unlock about 120 times x second, I was not able to find a valid solution, it doesn't seems that darwin has something like the NAMED SIGNALS available in Win32...
Any suggestion, or I should take a totally different approach to the problem ?
Thanks !

How To Simulate Lower CPU Processor Machines For Browser Testing

We have some users which are using lower-CPU powered machines and they're encountering slow response times using our web application. Is there any way for me to do testing so that I can simulate lower CPU rates?
For example, I have 2.3 Ghz computing power, can I lower it to 1.6 Ghz or lower so that I may be able to test it?
BTW, our customers are using Windows. I have to simulate low computing power on Internet Explorer as browser.
Most new CPUs multiplier can easily be lowered (Intel: Speedstep, AMD: PowerNow!). This is used to save power. With RMclock you can manually adjust your multiplier and thus lower your frequency and make your pc slower. I use this tool myself so I can tell you that it works.
http://cpu.rightmark.org/products/rmclock.shtml
The virtual machine Bochs(pronounced boxes) allows you to set a instructions per second directive. It's probably the slowest emulator out there as it is though...
Create some virtual machines.
You can use VirtualPC or VirtualBox both are free.
I would recommend to start something on the background which eats up all your processor cycles.
A program which finds primenumbers or something similar.
Another slight option in addition to those above is to boot windows in a lower resource config. Go to the start menu,, select run and type MSCONFIG. You can go to the boot tab, click on advanced options and limit the memory and number of of processsors. It's not as robust as the above, but it does give you another option.
Lowering the CPU clock doesn't always give expected results.
Newer CPUs feature architecture improvements which make them more efficient on an equvialent clock basis than older chips. Incidentally, because of this virtual machines are a bad way of testing performance for "older" tech as well.
Your best bet is to simply buy a couple of older machines. Using similar RAM (types and amounts), processor, motherboard chipsets, hard drives, and video cards. All of which feed into the total performance of the machine itself.
I bring the other components up because changing just one of them can have an impact on even browser performance. A prime example is memory. If your clients are constrained to something like 512MB of RAM, the machines could be performing a lot of hard drive access for VM swaps, even for just running the browser. In this situation downgrading the clock speed on your processor while still retaining your 2GB (assuming) of RAM would still not perform anywhere near the same even if everything else was equal.
Isak Savo'sanswer works, but can be a bit finicky, as the modern tpl is going to try and limit cpu load as much as possible. When I tested it out, It was hard (though possible with some testing) to consistently get the types of cpu usages I wanted.
Then I remembered, http://www.cpukiller.com/, which does this already. Highly recommended. As an aside, I found this util from playing old 90s games on modern machines, back when frame rate was pegged to cpu clock time, making playing them on modern computers way too fast. Great utility.
Another big difference between high-performance and low-performance CPUs is the number of cores available. This can realistically differ by a factor of 4, way more than the difference in clock frequency you're likely to encounter.
You can solve this by setting the thread affinity. Even IE6 will use 13 threads just to show google.com. That means it will benefit from a multi-core CPU. But if you set the thread affinity to one core only, all 13 IE threads will have to share that one core.
I understand that this question is pretty old, but here are some receipts I personally use (not only for Web development):
BES. I'm getting some weird results while using it.
Go to Control Panel\All Control Panel Items\Power Options\Edit Plan Settings\Change Advanced Power Settings, then go to the "Processor" section and set it's maximum state to 5% (or something else). It works only if your processor supports dynamic multiplier change and ACPI driver is installed correctly.
Run Task Manager and set processor affinity to a single core (or whatever number of cores you want) for your browser's (or any other's) process. Not a best practice for browsers, because JavaScript implementations are usually single-threaded, but, as far as I see, modern browsers actually DO use multiple cores.
There are a few different methods to accomplish this.
If you're using VirtualBox, go into the Settings for the VM you want to slow the CPU speed for. Go to System > Processor, then set the Execution Cap. The percentage controls how slow it will go: lower values are slower relative to the regular speed. In practice, I've noticed the results to be choppy, although it does technically work.
It is also possible to set the CPU speed for the whole system. In the Windows 10 Settings app, go to System > Power & Sleep. Then click Additional Power Settings on the right hand side. Go to Change Plan Settings for the currently selected plan, then click Change Advanced Power Plan Settings. Scroll down to Processor Power Management and set the Maximum Processor State. Again, this is a percentage. Although this does work, I find that in practice, it doesn't have a big impact even when the percentage is set very low.
If you're dealing with a videogame that uses DirectX or OpenGL and doesn't have a framerate cap, another common method is to force Vsync on in your graphics driver settings. This will usually slow the rendering to about 60 FPS which may be enough to play at a reasonable rate. However, it will only work for applications using 3D hardware rendering specifically.
Finally: if you'd rather not use a VM, and don't want to change a system global setting, but would rather simulate an old CPU for one specific process only, then I have my own program to do that called Old CPU Simulator.
The main brain of the operation is a command line tool written in C++, but there is also a GUI wrapper written in C#. The GUI requires .NET Framework 4.0. The default settings should be fine in most cases - just select the CPU you'd like to simulate under Target Rate, then hit New and browse for the program you'd like to run.
https://github.com/tomysshadow/OldCPUSimulator (click the Releases tab on the right for binaries.)
The concept is to suspend and resume the process at a precise rate, and because it happens so quickly the process will appear to just be running slowly. For example, by suspending a process for 3 milliseconds, then resuming it for 1 millisecond, it will appear to be running at 25% speed. By controlling the ratio of time suspended vs. time resumed, it is possible to simulate different speeds. This is completely API agnostic (it doesn't hook DirectX, OpenGL, etc. it'll work with a command line program if you want.)
Old CPU Simulator does not ask for a percentage, but rather, the clock speed to simulate (which it calls the Target Rate.) It then automatically determines, based on your CPU's real clock speed, the percentage to use. Although clock speed is not the only factor that has improved computer performance over time (there are also SSDs, faster GPUs, more RAM, multithreaded performance, etc.) it's a good enough approximation to get fairly consistent results across machines given the same Target Rate. It also supports other options that may help with consistency, such as setting the process affinity to one.
It implements three different methods of suspending and resuming a process and will use the best available: NtSuspendProcess, NtQuerySystemInformation, or Toolhelp Snapshots. It also uses timeBeginPeriod and timeEndPeriod to achieve high precision timing without busy looping. Note that this is not an emulator; the binary still runs natively. If you like, you can view the source to see how it's implemented - it's not a large project. On my machine, Old CPU Simulator uses less than 1% CPU and less than 1 MB of memory, so the program itself is quite efficient (unlike running intensive programs to intentionally slow the CPU.)