Accurate way to detect rendering speed - rendering

I'm currently brainstorming for an idea of mine that involves a p2p render farm, somewhat like renderfarm.fi but in the difference that you pay for the service and contributors to the processing pool get paid.
Currently renderfarms measure price based on GHZ/h, but when the computers rendering are untrusted is there a good way to measure the equivilant GHZ/h of a computer, considering the computers could be partially loaded with other programs slowing down true time spent rendering, etc?

Because your worker process can ask the OS counters how much execution time they've received and that can be matched up with progress on the work package you can pay out based upon work-units-completed, but charge in GHz/h. You know you can't trust the user's clock (or anything else for that matter), but you can verify the work units returned and approximate their computational complexity by combining the program counters from multiple peers.
You have no way to know for sure that the system is or is not particularly loaded, but you do know if work went out and came back. However you will have to verify that the work was done correctly. Probably means over-provisioning and running every render twice on two different machines to ensure someone isn't inserting garbage results that are faster to compute.
Good luck. I don't know how you'll be able to beat the likes of Amazon with them charging ~$0.10 per GHz/h.

The operating system can, and most likely will measure the actual CPU time taken up by the process. As such, that can be used as a measure of how much practical time the process itself has spent running on the machine's CPU. The CPU time doesn't get skewed in any direction due to other processes running on the background, so it's very ideal for this purpose.
The CPU time itself is the resource such rendering services sell, as such it's logical to measure it per user/client basis and then price the service accordingly by the CPU time spent by the user/client of the render farm.

Related

is it possible to synchronize two computers at better than 1 ms accuracy using any internet protocols?

Say I have two computers - one located in Los Angeles and another located in Boston. Are there any known protocols or linux commands that could synchronize both of those computer clocks to better than 1 ms and NOT use GPS at all? I am under the impression the answer is no (What's the best way to synchronize times to millisecond accuracy AND precision between machines?)
Alternatively, are there any standard protocols (like NTP) in which the relative time difference could be known accurately between these two computers even if the absolutely time synchronization is off?
I am just wondering if there are any free or inexpensive ways to get better than 1 ms time accuracy without having to resort to GPS.
I don't know of any known protocol (perhaps there is) but can offer a method similar to the way scientists measure speed near that of light:
Have a process on both servers "pinging" the other server and waiting for a response, and timing the time it took for a response. Then start pinging periodically exactly when you expect the last ping to come in. Averaging (and discarding any far off samples) you will have the two servers after a while "thumping away" at the same rhythm. The two can also tell how much time between each "beat" is taking for each of them at a very high accuracy, by dividing the count of beats in the (long) period of time.
After the "rhythm" is established, if you know that one of the server's time is correct, or you want to use its time as the base, then you know what time it is when your server's signal reaches the other server. Together with the response it sends you what time IT has. You can then use that time to establish synchronization with your time system.
Last but not least, most operating systems give the non-kernel user the ability to act only in at least 32 milliseconds of accuracy, that is: you cannot expect something to happen exactly within less milliseconds than that. The only way to overcome that is to have a "native" DLL that can react and run with the clock. That too will give you only a certain speed of reaction, depending on the system (hardware and software).
Read about Real-Time systems and the "server" you are talking about (Windows? Linux? Embedded software on a microchip? Something else?)

What are some factors that could affect program runtime?

I'm doing some work on profiling the behavior of programs. One thing I would like to do is get the amount of time that a process has run on the CPU. I am accomplishing this by reading the sum_exec_runtime field in the Linux kernel's sched_entity data structure.
After testing this with some fairly simple programs which simply execute a loop and then exit, I am running into a peculiar issue, being that the program does not finish with the same runtime each time it is executed. Seeing as sum_exec_runtime is a value represented in nanoseconds, I would expect the value to differ within a few microseconds. However, I am seeing variations of several milliseconds.
My initial reaction was that this could be due to I/O waiting times, however it is my understanding that the process should give up the CPU while waiting for I/O. Furthermore, my test programs are simply executing loops, so there should be very little to no I/O.
I am seeking any advice on the following:
Is sum_exec_runtime not the actual time that a process has had control of the CPU?
Does the process not actually give up the CPU while waiting for I/O?
Are there other factors that could affect the actual runtime of a process (besides I/O)?
Keep in mind, I am only trying to find the actual time that the process spent executing on the CPU. I do not care about the total execution time including sleeping or waiting to run.
Edit: I also want to make clear that there are no branches in my test program aside from the loop, which simply loops for a constant number of iterations.
Thanks.
Your question is really broad, but you can incur context switches for various reasons. Calling most system calls involves at least one context switch. Page faults cause contexts switches. Exceeding your time slice causes a context switch.
sum_exec_runtime is equal to utime + stime from /proc/$PID/stat, but sum_exec_runtime is measured in nanoseconds. It sounds like you only care about utime which is the time your process has been scheduled in user mode. See proc(5) for more details.
You can look at nr_switches both voluntary and involuntary which are also part of sched_entity. That will probably account for most variation, but I would not expect successive runs to be identical. The exact time that you get for each run will be affected by all of the other processes running on the system.
You'll also be affected by the amount of file system cache used on your system and how many file system cache hits you get in successive runs if you are doing any IO at all.
To give a very concrete and obvious example of how other processes can affect the run time of the current process, think about if you are exceeding your physical RAM constraints. If your program asks for more RAM, then the kernel is going to spend more time swapping. That time swapping will be accounted in stime but will vary depending on how much RAM you need and how much RAM is available. There are lot's of other ways that other processes can affect your process's run time. This is just one example.
To answer your 3 points:
sum_exec_runtime is the actual time the scheduler ran the process including system time
If you count switching to the kernel as the process giving up the CPU, then yes, but it does not necessarily mean a different user process may get the CPU back once the kernel is done.
I think I've already answered this question that there are lot's of factors.

Webservice wcf performance counters for queue

Am trying to performance test a wcf webservice which should get a lot of traffic. Which performance counters are sensible to use and for which purpose..Naturally I am looking at CPU and RAM, but I would like to know when IIS is queing and when its having trouble...
Any advice on sensible performance counters gratefully received...
Cheers alex
MSDN has an entire section on WCF administration and diagnostics, and specifically, for performance counters in WCF.
There are also specific sections for performance counters hosted service calls, as well as for the endpoint and for operations.
I would suggest looking through those first, as there is a good amount of valuable information there.
Analysing performance counters is complicated and takes a lot of practice, which is my way of saying that I am not experienced enough to give a complete list.
You are going to look for some specific things to start with.
First off is of course how long it takes to return the webservice calls. This tells you if you even have a performance issue at that load.
Next every one looks at CPU. This really does not tell you a lot however.
RAM is good, but you want to know how often your app is paging to disk, so check the Page Faults/sec.
Check your logical and physical disks for Current Disk Queue Length. If your physical disk is queing at all, you are reading/writing to much to the disk.
Beyond that you would normally be trying to find a specific and likely obscure problem.
I usually take performance testing in stages. Do a first test with the basics and if a particular page is having a problem look at the load it is causing.
If the whole production server is not performing adequately, it is easier to add more hardware, but I prefer looking at the code that is running and make that better.
Before you run your performance monitors, you want to add the registry key:
HKLM/Services/CurrentControlSet/service/
Add ServiceModelService 4.0.0.0
under that add Performance then add a DWORD FileMappingFile.
The size for that will be number of services exposed * 33 * 350.
In your config you would then add
<system.ServiceModel>
<diagnostics performanceCounters="ServiceOnly"/>
</system.ServiceModel>
You can watch the following counters:
CPU / RAM (for memory leaks) / for each service Call and Call Duration as well as Calls Outstanding
CPU will show you how heavily your are saturating your server
RAM will show if you have memory leaks if it continues to grow and grow and grow
Calls will show the number of calls you are getting accumulative,
Calls Per Second will give you the volume you're handling
Calls Outstanding are clients that are waiting because your services could not handle the volume.
If you find some questionable numbers in these groupings then start looking at other elements like Calls Faulted or Calls Failed. (not sure of the difference between a failure and a fault)
It is rare that you would need to dig further into the issues than what the service only numbers will provide. When you get into the other two sets of counters your shared memory utilization gets really high.

Testing Real Time Operating System for Hardness

I have an embedded device (Technologic TS-7800) that advertises real-time capabilities, but says nothing about 'hard' or 'soft'. While I wait for a response from the manufacturer, I figured it wouldn't hurt to test the system myself.
What are some established procedures to determine the 'hardness' of a particular device with respect to real time/deterministic behavior (latency and jitter)?
Being at college, I have access to some pretty neat hardware (good oscilloscopes and signal generators), so I don't think I'll run into any issues in terms of testing equipment, just expertise.
With that kind of equipment, it ought to be fairly easy to sync the o-scope to a steady clock, produce a spike each time the real-time system produces an output, an see how much that spike varies from center. The less the variation, the greater the hardness.
To clarify Bob's answer maybe:
Use the signal generator to generate a pulse at some varying frequency.
Random distribution across some range would be best.
use the signal generator (trigger signal) to start the scope.
the RTOS has to respond, do it thing and send an output pulse.
feed the RTOS output into input 2 of the scope.
get the scope to persist/collect mode.
get the scope to start on A , stop on B. if you can.
in an ideal workd, get it to measure the distribution for you. A LeCroy would.
Start with a much slower trace than you would expect. You need to be able to see slow outliers.
You'll be able to see the distribution.
Assuming a normal distribution the SD of the response time variation is the SOFTNESS.
(This won't really happen in practice, but if you don't get outliers it is reasonably useful. )
If there are outliers of large latency, then the RTOS is NOT very hard. Does not meet deadlines well. Unsuitable then it is for hard real time work.
Many RTOS-like things have a good left edge to the curve, sloping down like a 1/f curve.
Thats indicitive of combined jitters. The thing to look out for is spikes of slow response on the right end of the scope. Keep repeating the experiment with faster traces if there are no outliers to get a good image of the slope. Should be good for some speculative conclusion in your paper.
If for your application, say a delta of 1uS is okay, and you measure 0.5us, it's all cool.
Anyway, you can publish the results ( and probably in the publish sense, but certainly on the web.)
Link from this Question to the paper when you've written it.
Hard real-time has more to do with how your software works than the hardware on its own. When asking if something is hard real-time it must be applied to the complete system (Hardware, RTOS and application). This means hard or soft real-time is system design issues.
Under loading exceeding the specification even a hard real-time system will fail (hopefully with proper failure indication) while a soft real-time system with low loading would give hard real-time results. How much processing must happen in time and how much pre/post processing can be performed is the real key to hard/soft real-time.
In some real-time applications some data loss is not a failure it should just be below a certain level, again a system criteria.
You can generate inputs to the board and have a small application count them and check at what level data is going to be lost. But that gives you a rating specific to that system running that application. As soon as you start doing more processing your computational load increases and you now have a different hard real-time limit.
This board will running a bare bones scheduler will give great predictable hard real-time performance for most tasks.
Running a full RTOS with heavy computational load you probably only get soft real-time.
Edit after comment
The most efficient and easiest way I have used to measure my software's performance (assuming you use a schedular) is by using a free running hardware timer on the board and to time stamp my start and end of my cycle. Or if you run a full RTOS time stamp you acquisition and transition. Save your Max time and run a average on the values over a second. If your average is around 50% and you max is within 20% of your average you are OK. If not it is time to refactor your application. As your application grows the cycle time will grow. You can monitor the effect of all your software changes on your cycle time.
Another way is to use a hardware timer generate a cyclical interrupt. If you are in time reset the interrupt. If you miss the deadline you have interrupt handler signal a failure. This however will only give you a warning once your application is taking to long but it rely on hardware and interrupts so you can't miss.
These solutions also eliminate the requirement to hook up a scope to monitor the output since the time information can be displayed in any kind of terminal by a background task. If it is easy to monitor you will monitor it regularly avoiding solving the timing problems at the end but as soon as they are introduced.
Hope this helps
I have the same board here at work. It's a slightly-modified 2.6 Kernel, I believe... not the real-time version.
I don't know that I've read anything in the docs yet that indicates that it is meant for strict RTOS work.
I think that this is not a hard real-time device, since it runs no RTOS.
I understand being geek, but using oscilloscope to test a computer with ethernet/usb/other digital ports and HUGE internal state (RAM) is both ineffective and unreliable.
Instead of watching wave forms, you can connect any PC to the output port and run proper statistical analysis.
The established procedure (if the input signal is analog by nature) is to test system against several characteristic inputs - traditionally spikes, step functions and sine waves of different frequencies - and measure phase shift and variance for each input type. Worst case is then used in specifications of the system.
Again, if you are using standard ports, you can easily generate those on PC. If the input is truly analog, a separate DAC or simply a good sound card would be needed.
Now, that won't say anything about OS being real-time - it could be running vanilla Linux or even Win CE and still produce good and stable results in those tests if hardware is fast enough.
So, you need to simulate heavy and varying loads on processor, memory and all ports, let it heat and eat memory for a few hours, and then repeat tests. If latency stays constant, it's hard real-time. If it doesn't, under any load and input signal type, increase above acceptable limit, it's soft. Otherwise, it's advertisement.
P.S.: Implication is that even for critical systems you don't actually need hard real-time if you have hardware.

Is there an ideal number of network operations for iPhone OS?

I'm using NSOperation and NSOperationQueue to handle all of my networking threads so my interface can remain responsive while handling data transfer over the internet. Currently, I've got my operation queue set to a maximum concurrent operation count of 5, and it seems to work well.
I'm wondering, though, if there is a more ideal number of concurrent network operations that would best maximize the available resources without choking the hardware. Are there any recommendations, or steps I might take to measure and find out for myself?
Given the iPhone (currently) runs a single core, I would guess 5 is around the right number.
But the only way to be sure would be to instrument it and find out what the usage looks like (CPU, Memory and Network). Network usage you could get based on the data transferred - but its hard to know what a reaspnable usage would be. I'm not sure if it is possible to get CPU/Memory statistics from the iPhone.
If you are doing large transfers, then more connections probably wont help much. If you are doing lots of small transfers, then more connections will help work around the back and forth of setting up and tearing down the connection.