I have a simple question that confused me a bit. I have 2 processors, both of which can individually do 1 billion operations in 33.0360723 seconds.
Yet both of them together do the operations in 27.4996964 seconds.
This makes no sense to me: if the time for a task on one processor is X, shouldn't it be X/2 for both of them together?
My code:
Function calc(ByVal i As Integer, ByVal result As String) As Boolean
    Math.Sqrt(i)
    Return True
End Function
Sub Main()
    Dim result As String = Nothing
    Dim starttime As TimeSpan
    starttime = DateTime.Now.TimeOfDay
    For i = 0 To 1000000000
        calc(i, result)
    Next
    Console.WriteLine("A single processor runs 1 billion operations in: ")
    Console.WriteLine(DateTime.Now.TimeOfDay - starttime)
    starttime = DateTime.Now.TimeOfDay
    Parallel.For(0, 1000000000, Function(i) calc(i, result))
    Console.WriteLine("All your processors run 1 billion operations in: ")
    Console.WriteLine(DateTime.Now.TimeOfDay - starttime)
    Console.ReadLine()
End Sub
PS: I wrote the code for this in VB.NET.
If a person can walk 2 miles in 30 minutes, how long will it take 2 people to walk the same 2 miles?
All jokes aside, the documentation on MSDN says: "Executes a for (For in Visual Basic) loop in which iterations may run in parallel." The keyword here is may.
You are letting the CLR do the work, and experience shows that the .NET CLR does not always behave the way you would expect.
In my case (I copy-pasted the code): single processor - 21.495 seconds, all processors - 7.03 seconds. I have an i7 870 CPU on 32-bit Windows 7.
In Parallel.For, the order of iteration is not necessarily the same as the loop order.
Also, your function computes Sqrt(i), which means one processor might be doing the square roots of smaller numbers while another handles the larger ones.
The simple answer is that the work done by each processor is not exactly half of the whole work you gave them, so their finish times are unlikely to be equal.
One processor might have done more work while the other finished its share and waited. Or one of the processors might have been preempted by the operating system to do something important while your worker thread was left waiting.
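There is also a large amount of per-iteration overhead here: Parallel.For invokes a delegate for every single i, which costs far more than the Sqrt itself. A rough sketch of one way to reduce that (assuming .NET 4 or later; the batching via Partitioner.Create is my suggestion, not the OP's code) is to hand each worker a range and run a tight inner loop:

```vbnet
Imports System.Collections.Concurrent
Imports System.Threading.Tasks

Module RangeBenchmark
    Sub Main()
        Dim sw = System.Diagnostics.Stopwatch.StartNew()
        ' Hand each worker a [fromInclusive, toExclusive) range so the delegate
        ' is invoked once per chunk instead of once per iteration.
        Parallel.ForEach(Partitioner.Create(0, 1000000000),
                         Sub(range)
                             For i As Integer = range.Item1 To range.Item2 - 1
                                 Math.Sqrt(i)
                             Next
                         End Sub)
        Console.WriteLine("Partitioned parallel time: " & sw.Elapsed.ToString())
    End Sub
End Module
```

Stopwatch is also a better measuring stick than DateTime.Now.TimeOfDay for this kind of comparison.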
My model gradually slows down to an unacceptable speed (i.e. from 200 ticks per second to several seconds for one tick). I'd like to understand the causes of this problem. What is the simplest way to check which part of the model is increasingly consuming the time? I tried some other Java profilers before, but they were not good and were difficult to understand.
A Java profiler like YourKit is the best approach, since it will show the code "hot spots" in terms of the execution time of each class method. Alternatively, you can insert a few timing functions around the parts of your model that you suspect contribute most of the execution time, for example:
long start = System.nanoTime();
// some model code here
long end = System.nanoTime();
System.out.println("Step A time in seconds: " + (end - start) / 1E9);
I have the following problem on Windows 7, using VB.NET with .NET Framework 4.0.
I have to send a buffer of bytes via a serial port. The PC acts as the master, and a device connected as a slave receives the buffer. Each byte must be separated from the next by a certain amount of time expressed in microseconds.
This is my code snippet:
Dim t1 As New Stopwatch
Dim WatchTotal As New Stopwatch

WatchTotal.Reset()
WatchTotal.Start()
t1.Stop()
For i As Integer = 0 To _buffer.Length - 1
    SerialPort1.Write(_buffer, i, 1)
    t1.Reset()
    t1.Start()
    While ((1000000000 * t1.ElapsedTicks / Stopwatch.Frequency) < 50000) ' wait 50us
    End While
    t1.Stop()
Next
WatchTotal.Stop()
Debug.Print(WatchTotal.ElapsedMilliseconds)
Everything loops inside a thread.
Everything works correctly, but on a Windows 7 machine SerialPort.Write of 1 byte takes 1 ms, so sending 1024 bytes takes 1024 ms.
This is confirmed by printing the elapsed time
Debug.Print(WatchTotal.ElapsedMilliseconds)
The problem seems to be in the SerialPort.Write method.
The same code on a Windows 10 machine takes less than 1 ms.
The problem is more visible when we have to send many buffers of bytes; in this case we send 16 buffers of 1027 bytes. On Windows 7 it takes just under 20 seconds; on Windows 10 it takes half that or less (sending 1 buffer of 1027 bytes takes approximately 120-150 ms, and less than 5 seconds to send all 16 buffers of data).
Does anyone have any idea what that might depend on?
Thanks
EDIT 22/05/2020
If I remove the debug printing and the little delay that paces the communication, I still get about 1027 ms for sending 1027 bytes, so I think the problem lies only in the SerialPort method and not in the timing or the Stopwatch object. This happens on a Windows 7 machine. The same executable on a Windows 10 machine runs as fast as expected.
For i As Integer = 0 To _buffer.Length - 1
    SerialPort1.Write(_buffer, i, 1)
Next
One thing: your wait code seems cumbersome; try this.
'one TimeSpan tick is 100 ns, so 10 ticks is a microsecond
Const wait As Long = 50L * 10L ' wait 50us, expressed in 100ns ticks
While t1.Elapsed.Ticks < wait ' Elapsed.Ticks is in 100ns units; raw ElapsedTicks is in Stopwatch.Frequency units
End While
Busy loops are problematic and vary between machines. As I recall, serial port handling was not very good on Win7, but I could be mistaken.
It is hard to believe that the receiver is that time-sensitive.
If the Win7 workstation doesn't have or isn't using a high resolution timer, then that could account for the described difference.
From the Remarks section of the Stopwatch class:
The Stopwatch measures elapsed time by counting timer ticks in the underlying timer mechanism. If the installed hardware and operating system support a high-resolution performance counter, then the Stopwatch class uses that counter to measure elapsed time. Otherwise, the Stopwatch class uses the system timer to measure elapsed time. Use the Frequency and IsHighResolution fields to determine the precision and resolution of the Stopwatch timing implementation.
Check the IsHighResolution field to determine if this is what is occurring.
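A minimal sketch of that check (the console output wording is mine):

```vbnet
Module TimerCheck
    Sub Main()
        ' True means Stopwatch is backed by the high-resolution performance
        ' counter; False means it falls back to the coarse system timer.
        Console.WriteLine("IsHighResolution: " & System.Diagnostics.Stopwatch.IsHighResolution.ToString())
        Console.WriteLine("Frequency: " & System.Diagnostics.Stopwatch.Frequency.ToString() & " ticks/s")
        Console.WriteLine("Resolution: " & (1000000000.0 / System.Diagnostics.Stopwatch.Frequency).ToString() & " ns/tick")
    End Sub
End Module
```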
For the sake of completeness, I've searched and read other articles here, such as:
Parallel.ForEach not spinning up new threads
but they don't seem to address my case, so off we go:
I have a Parallel.ForEach of an array structure, like so:
Dim opts As New ParallelOptions
opts.MaxDegreeOfParallelism = intThreads
Parallel.ForEach(udtExecPlan, opts,
    Sub(udtStep)
        Dim strItem As String = udtStep.strItem
Basically, for each item, I do some nested loops and end up calling a function with those loop assignments as parameters.
The function executes a series of intense calculations (which take up most of the function's processing time) and records the results in an MSSQL table; if certain conditions are met, the function returns True, else False. If the result is True, I simply Return from the parallel Sub(udtStep) and another item from the array should continue. If the result is False, I go through another iteration of the deepest nested loop, and so on, working through the outer loops. So, in a nutshell, all nested loops are inside the main Parallel.ForEach loop, like so:
Parallel.ForEach(udtExecPlan, opts,
    Sub(udtStep)
        Dim strItem As String = udtStep.strItem
        If Backend.Exists(strItem) Then Return
        For intA As Integer = 1 To 5
            For intB As Integer = 1 To 50
                Dim booResult As Boolean = DoCalcAndDBStuff(strItem, intA, intB)
                If booResult = True Then Return
            Next intB
        Next intA
    End Sub)
It is important to note that udtExecPlan has about 585 elements. Each item takes from 1 minute to several hours to complete.
Now, here's the problem:
Whether I do this:
Dim opts As New ParallelOptions
opts.MaxDegreeOfParallelism = intThreads
Parallel.ForEach(udtExecPlan, opts,
where intThreads is the core count, or whatever number I assign to it (I tried 5, 8, 62, 600), or whether I simply omit the ParallelOptions declaration and opts from the Parallel.ForEach statement, I notice it will spin up as many threads as I have specified, up to the total number of cores (including HT cores) in my system. That is all fine and well.
For example, on an HP DL580G7 server with 32 cores / 64 HT cores and 128GB RAM, I can see 5, 8, 62 or 64 (using the 600 option) threads busy on the Task Manager, which is what I'd expect.
However, as the items in the array are processed, the threads in Task Manager "die off" (go from around 75% utilization to 0%) and are never spun up again, until only 1 thread is working. For example, if I set intThreads to 62 (or unlimited, by omitting the ParallelOptions declaration and opts from the Parallel.ForEach statement), I can see in the db table that 62 (or 64) items have been processed, but from then on it just falls back to 1 thread and 1 item at a time.
I was expecting a new thread to be spun up as soon as an item was done, as there are some 585 items to go through. It is almost as if 62 or 64 items are done in parallel, and then only 1 item at a time until completion, which leaves the whole server practically idle thereafter.
What am I missing?
I have tried some other processes with a main Parallel.For loop (no other outer loop present, just as in this example) and get the same behaviour.
Why? Any thoughts welcome.
I am using VS.NET 2015 with .NET Framework 4.6.1, W2K8R2 fully patched.
Thanks!
backstory
I put together a simple multi-threaded brute-force hash hacking program for a job application test requirement.
Here are some of the particulars
It functions properly, but the performance is quite different between my initial version and this altered version.
factors
The reason for the alteration was the increased number of possible combinations between the sample data processing and the test/challenge data processing.
The application test sample was 16^7 total combinations, which is of course less than the UInt32 range (16^8).
The challenge is a 9-length hashed string that produces a hashed Long value (that I was given); thus it is 16^9. The size difference was something I accounted for, which is why I took the easy route of putting the initial program together targeting the 7-length hashed string, getting it to function properly on a smaller scale.
overall
The issue isn't just the increased combinations; it is dramatically slower due to the loop operating on Long/Int64 or UInt64.
When I crunched the numbers using Int32 (not even UInt32) data types, I could hear my computer kick it up a notch; the entire check was done in under 4 minutes. That's 16777216 (16^6) combination checks per thread.
noteworthy - multithreading
I broke everything into worker threads, 16 of them, 1 for each of the beginning characters; thus I'm only looking for 16^8 combinations on each thread now, which is 1 freaking unit higher than the UInt32 max value (since it includes 0).
I'll give a final thought after I put up this code segment.
The function is as follows:
Function Propogate() As Boolean
    Propogate = False
    Dim combination As System.Text.StringBuilder = New System.Text.StringBuilder(Me.ScopedLetters)
    For I As Integer = 1 To (Me.ResultLength - Me.ScopedLetters.Length) Step 1
        combination.Append(Me.CombinationLetters.Chars(0))
    Next
    'Benchmarking feedback - this simply adds values to a list to be checked against to denote progress
    Dim ProgressPoint As New List(Of Long)
    '###############################
    '#[THIS IS THE POINT OF INTEREST]
    '# Me.CombinationLetters = 16 #
    '# Me.ResultLength = 7 Or 9 # The 7 was the sample size provided; 9 was the challenge/test
    '# Me.ScopedLetters.Length = 1 # In my current implementation
    '###############################
    Dim TotalCombinations As Long = CType(Me.CombinationLetters.Length ^ (Me.ResultLength - Me.ScopedLetters.Length), Long)
    ProgressPoint.Add(1)
    ProgressPoint.Add(CType(TotalCombinations / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 2 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 3 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 4 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations, Long))
    For I As Long = 1 To TotalCombinations Step 1
        Me.AddKeyHash(combination.ToString) 'The hashing arithmetic and hash value check are done in this call.
        Utility.UpdatePosition(Me.CombinationLetters, combination) 'Does all the incremental character swapping and string manipulation.
        If ProgressPoint.Contains(I) Then
            RaiseEvent OnProgress(CType((I / TotalCombinations) * 100, UInteger).ToString & " - " & Me.Name)
        End If
    Next
    Propogate = True
End Function
I already have an idea of what I could try: drop it down to Int32 again and put another loop around this loop (16 iterations).
But there might be a better alternative, so I would like to hear from the community on this one.
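For reference, the outer-loop idea might look like this (a sketch only; the loop bounds follow the 16^8-per-thread figure above, the index math is my assumption, and the real work would replace the comments):

```vbnet
' Outer Integer loop over the 16 lead digits of each thread's 16^8 space,
' inner Integer loop over the remaining 16^7 combinations.
Const innerCount As Integer = 268435456 ' = 16^7, fits comfortably in Int32
For block As Integer = 0 To 15
    For i As Integer = 0 To innerCount - 1
        ' Per-combination work goes here; when a 64-bit index is needed:
        ' Dim fullIndex As Long = CLng(block) * innerCount + i
    Next
Next
```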
Would a For loop using double-precision floating point cycle faster?
By the way, how coupled are Long types and arithmetic to CPU architecture, specifically caching?
My development machine is old: a Pentium D running XP Professional x64. My excuse is that if it runs in my environment, it will likely run on Windows Server 2003.
In the end, this could have likely been a hardware issue.. my old workstation did not survive much longer after doing this project.
I have a machine which uses an NTP client to sync to internet time, so its system clock should be fairly accurate.
I've got an application which I'm developing that logs data in real time, processes it and then passes it on. What I'd like to do now is output that data every N milliseconds, aligned with the system clock. So, for example, if I wanted 20 ms intervals, my outputs ought to look something like this:
13:15:05:000
13:15:05:020
13:15:05:040
13:15:05:060
I've seen suggestions to use the Stopwatch class, but that only measures time spans, as opposed to targeting specific timestamps. The code runs in its own thread, so it shouldn't be a problem if I need to make some relatively blocking calls.
Any suggestions on how to achieve this (close to or better than 1 ms precision would be nice) would be very gratefully received.
I don't know how well it plays with C++/CLI, but you probably want to look at multimedia timers.
Windows isn't really real-time, but this is as close as it gets.
You can get a pretty accurate timestamp out of timeGetTime() when you reduce the timer period. You'll just need some work to convert its return value to a clock time. This sample C# code shows the approach:
using System;
using System.Runtime.InteropServices;

class Program {
    static void Main(string[] args) {
        timeBeginPeriod(1);
        uint tick0 = timeGetTime();
        var startDate = DateTime.Now;
        uint tick1 = tick0;
        for (int ix = 0; ix < 20; ++ix) {
            uint tick2 = 0;
            do { // Burn 20 msec
                tick2 = timeGetTime();
            } while (tick2 - tick1 < 20);
            var currDate = startDate.Add(new TimeSpan((tick2 - tick0) * 10000));
            Console.WriteLine(currDate.ToString("HH:mm:ss:ffff"));
            tick1 = tick2;
        }
        timeEndPeriod(1);
        Console.ReadLine();
    }
    [DllImport("winmm.dll")]
    private static extern int timeBeginPeriod(int period);
    [DllImport("winmm.dll")]
    private static extern int timeEndPeriod(int period);
    [DllImport("winmm.dll")]
    private static extern uint timeGetTime();
}
On second thought, this is just measurement. To get an action performed periodically, you'll have to use timeSetEvent(). As long as you use timeBeginPeriod(), you can get the callback period pretty close to 1 msec. One nicety is that it will automatically compensate when the previous callback was late for any reason.
Your best bet is using inline assembly and writing this chunk of code as a device driver.
That way:
You have control over instruction count
Your application will have execution priority
Ultimately you can't guarantee what you want, because the operating system has to honour requests from other processes to run, meaning that something else can always be busy at exactly the moment you want your process to run. But you can improve matters by using timeBeginPeriod to make it more likely that your process is switched to in a timely manner, and perhaps by being cunning about how you wait between iterations, e.g. sleeping for most but not all of the time and then using a busy loop for the remainder.
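That hybrid wait could be sketched like this in VB.NET (the 2 ms spin threshold and the helper name are my assumptions; tune the threshold per machine):

```vbnet
Imports System.Threading

Module HybridWait
    ' Sleep away most of the interval, then busy-spin the final stretch.
    Sub WaitUntil(due As DateTime)
        Dim remaining As TimeSpan = due - DateTime.UtcNow
        If remaining > TimeSpan.FromMilliseconds(2) Then
            Thread.Sleep(remaining - TimeSpan.FromMilliseconds(2))
        End If
        While DateTime.UtcNow < due
            ' Burn the last ~2 ms for precision.
        End While
    End Sub
End Module
```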
Try doing this in two threads. In one thread, use something like this to query a high-precision timer in a loop. When you detect a timestamp that aligns to (or is reasonably close to) a 20ms boundary, send a signal to your log output thread along with the timestamp to use. Your log output thread would simply wait for a signal, then grab the passed-in timestamp and output whatever is needed. Keeping the two in separate threads will make sure that your log output thread doesn't interfere with the timer (this is essentially emulating a hardware timer interrupt, which would be the way I would do it on an embedded platform).
CreateWaitableTimer/SetWaitableTimer and a high-priority thread should be accurate to about 1ms. I don't know why the millisecond field in your example output has four digits, the max value is 999 (since 1000 ms = 1 second).
Since, as you said, this doesn't have to be perfect, there are some things that can be done.
As far as I know, there is no timer that syncs to a specific wall-clock time, so you will have to compute your next target time and schedule the timer for it. If your timer only supports deltas, that is easily computed, but it adds more error, since you could easily be kicked off the CPU between the time you compute your delta and the time the timer is entered into the kernel.
As already pointed out, Windows is not a real-time OS, so you must assume that even if you schedule a timer to go off at ":0010", your code might not execute until well after that time (for example, ":0540"). As long as you handle those issues properly, things will be "ok".
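Computing the next aligned target time is simple arithmetic; a sketch (the helper name is mine, and the interval would be the 20 ms from the question):

```vbnet
Module BoundaryMath
    ' Round "now" up to the next multiple of the interval on the system clock.
    Function NextBoundary(intervalMs As Long) As DateTime
        Dim intervalTicks As Long = intervalMs * TimeSpan.TicksPerMillisecond
        Dim nowTicks As Long = DateTime.UtcNow.Ticks
        Dim dueTicks As Long = ((nowTicks \ intervalTicks) + 1) * intervalTicks
        Return New DateTime(dueTicks, DateTimeKind.Utc)
    End Function
End Module
```

You would then wait (via timer or sleep) until the returned time, fire the output, and recompute from the new "now" so errors don't accumulate.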
20 ms is approximately the length of a time slice on Windows. There is no way to hit 1 ms timings in Windows reliably without some sort of RT add-on like INtime. In Windows proper, I think your options are WaitForSingleObject, SleepEx, and a busy loop.