Backstory
I put together a simple multi-threaded brute-force hash-cracking program for a job application test requirement.
Here are some of the particulars:
It functions properly, but the performance is quite different between my initial version and this altered portion.
Factors
The reason for the alteration was the increased number of possible combinations between the sample data and the test/challenge data.
The application test sample was 16^7 total combinations, which is of course less than the UInt32 range (16^8).
The challenge is a 9-character hashed string that produces a hashed Long value (which I was given); thus it is 16^9 combinations. The size difference was something I accounted for, which is why I took the easy route of building the initial program against the 7-character hashed string, getting it to function properly on a smaller scale.
Overall
The issue isn't just the increased combination count; the loop is dramatically slower when it operates on Long/Int64 or UInt64 counters.
When I crunched the numbers using Int32 (not even UInt32) data types, I could hear my computer kick it up a notch, and the entire check was done in under 4 minutes. That's 16,777,216 (16^6) combination checks per thread.
Noteworthy: multithreading
I broke everything into worker threads: 16 of them, one for each possible beginning character. Thus I'm only checking 16^8 combinations on each thread now, which is exactly one unit higher than the maximum UInt32 value (the count includes 0).
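To make that split concrete, here is a minimal sketch (Worker is a stand-in name for my worker class, and its constructor/property are simplifications; ScopedLetters and Propogate match the function below):

Dim threads As New List(Of System.Threading.Thread)

'One worker per possible leading hex character; each scans 16^8 combinations.
For Each c As Char In "0123456789ABCDEF"
    Dim w As New Worker() With {.ScopedLetters = c.ToString()} 'hypothetical type/property
    Dim t As New System.Threading.Thread(Sub() w.Propogate())
    t.Start()
    threads.Add(t)
Next

For Each t As System.Threading.Thread In threads
    t.Join() 'wait for all 16 workers to finish
Next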
I'll give a final thought after I put up the main code segment.
The function is as follows:
Function Propogate() As Boolean
    Propogate = False
    Dim combination As New System.Text.StringBuilder(Me.ScopedLetters)
    For I As Integer = 1 To (Me.ResultLength - Me.ScopedLetters.Length)
        combination.Append(Me.CombinationLetters.Chars(0))
    Next

    'Benchmarking feedback - this simply adds values to a list to be checked against to denote progress
    Dim ProgressPoint As New List(Of Long)

    '###############################
    '#[THIS IS THE POINT OF INTEREST]
    '# Me.CombinationLetters.Length = 16
    '# Me.ResultLength = 7 Or 9      (7 was the sample size provided; 9 is the challenge/test)
    '# Me.ScopedLetters.Length = 1   (in my current implementation)
    '###############################
    Dim TotalCombinations As Long = CType(Me.CombinationLetters.Length ^ (Me.ResultLength - Me.ScopedLetters.Length), Long)

    ProgressPoint.Add(1)
    ProgressPoint.Add(CType(TotalCombinations / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 2 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 3 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 4 / 5, Long))
    ProgressPoint.Add(TotalCombinations)

    For I As Long = 1 To TotalCombinations
        Me.AddKeyHash(combination.ToString) 'The hashing arithmetic and hash value check are done in this call.
        Utility.UpdatePosition(Me.CombinationLetters, combination) 'Does all the incremental character swapping and string manipulation.
        If ProgressPoint.Contains(I) Then
            RaiseEvent OnProgress(CType((I / TotalCombinations) * 100, UInteger).ToString & " - " & Me.Name)
        End If
    Next

    Propogate = True
End Function
I already have an idea of what I could try: drop the counter down to Int32 again and put another loop around this loop (16 iterations).
But there might be a better alternative, so I would like to hear from the community on this one.
Would a For loop using a double-precision counter cycle faster?
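In the meantime, here is a minimal sketch of the nested Int32 idea (the constant and loop structure are my assumptions; AddKeyHash and UpdatePosition are from the function above):

'Sketch: 16 outer iterations x 16^7 inner iterations replaces one 16^8 Long loop,
'so both counters stay comfortably within Int32 range
'(16^7 = 268,435,456 < Int32.MaxValue = 2,147,483,647).
Const InnerCount As Integer = 268435456 '16^7

For outer As Integer = 0 To 15
    For inner As Integer = 1 To InnerCount
        Me.AddKeyHash(combination.ToString())
        Utility.UpdatePosition(Me.CombinationLetters, combination)
    Next
Next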
By the way, how coupled are Long types and their arithmetic to CPU architecture, specifically caching?
My development computer is old: a Pentium D running XP Professional x64. My excuse is that if it runs in my environment, it will likely run on Windows Server 2003.
In the end, this may well have been a hardware issue; my old workstation did not survive much longer after this project.
Related
For the sake of completeness, I've searched and read other articles here, such as:
Parallel.ForEach not spinning up new threads
but they don't seem to address my case, so off we go:
I have a Parallel.ForEach of an array structure, like so:
Dim opts As New ParallelOptions
opts.MaxDegreeOfParallelism = intThreads

Parallel.ForEach(udtExecPlan, opts,
    Sub(udtStep)
        Dim strItem As String = udtStep.strItem
Basically, for each item, I do some nested loops and end up calling a function with those loop variables as parameters.
The function executes a series of intense calculations (which take up most of the function's processing time) and records the results in an MSSQL table; if certain conditions are met, the function returns True, otherwise False. If the result is True, I simply Return from the parallel Sub(udtStep), and another item from the array should be picked up. If the result is False, I go through another iteration of the deepest nested loop, and so on, working towards completing the outer loops. So, in a nutshell, all nested loops are inside the main Parallel.ForEach loop, like so:
Parallel.ForEach(udtExecPlan, opts,
    Sub(udtStep)
        Dim strItem As String = udtStep.strItem
        If Backend.Exists(strItem) Then Return
        For intA As Integer = 1 To 5
            For intB As Integer = 1 To 50
                Dim booResult As Boolean = DoCalcAndDBStuff(strItem, intA, intB)
                If booResult = True Then Return
            Next intB
        Next intA
    End Sub)
It is important to notice that udtExecPlan has about 585 elements. Each item takes from 1 minute to several hours to complete.
Now, here's the problem:
Whether I do this:
Dim opts As New ParallelOptions
opts.MaxDegreeOfParallelism = intThreads
Parallel.ForEach(udtExecPlan, opts,
where intThreads is the core count or whatever number I assign to it (I tried 5, 8, 62, 600), or whether I simply omit the ParallelOptions declaration and opts from the Parallel.ForEach statement entirely, I notice it will spin up as many threads as I specified, up to the total number of cores (including HT cores) in my system. That is all fine and well.
For example, on an HP DL580 G7 server with 32 cores / 64 HT cores and 128 GB RAM, I can see 5, 8, 62 or 64 (using the 600 option) threads busy in Task Manager, which is what I'd expect.
However, as the items in the array are processed, the threads in Task Manager "die off" (going from around 75% utilization to 0%) and are never spun up again, until only 1 thread is working. For example, if I set intThreads to 62 (or leave it unlimited by omitting the ParallelOptions declaration and opts), I can see in the db table that 62 (or 64) items have been processed, but from then on it falls back to 1 thread and 1 item at a time.
I was expecting a new thread to be spun up as soon as an item finished, since there are some 585 items to go through. It is almost as if 62 or 64 items are done in parallel, and then only 1 item at a time until completion, which leaves the whole server practically idle thereafter.
What am I missing?
I have tried some other processes with a main Parallel.For loop (no outer loop present, just as in this example) and get the same behaviour.
Why? Any thoughts welcome.
I am using VS 2015 with .NET Framework 4.6.1 on W2K8R2, fully patched.
Thanks!
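A hedged note for completeness: when the source is an array or IList, Parallel.ForEach uses static range partitioning by default, handing each worker a fixed chunk of items up front; with items that run from minutes to hours, workers that finish their chunks have nothing left to take. A load-balancing partitioner is one thing worth trying (a sketch against the question's names, not a confirmed fix for this case):

'Hand items out on demand instead of in fixed precomputed ranges.
Dim balanced = System.Collections.Concurrent.Partitioner.Create(udtExecPlan, loadBalance:=True)

Parallel.ForEach(balanced, opts,
    Sub(udtStep)
        '... same body as above ...
    End Sub)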
I have a simple question that confused me a bit. I have 2 processors, both of which can individually do 1 billion operations in 33.0360723 seconds.
Yet both of them together do the operations in 27.4996964 seconds.
This makes no sense to me: if the time for a task on one processor is X, shouldn't it be X/2 for both of them together?
My code:
Function calc(ByVal i As Integer, ByVal result As String) As Boolean
    Math.Sqrt(i)
    Return True
End Function

Sub Main()
    Dim result As String = Nothing
    Dim starttime As TimeSpan
    starttime = DateTime.Now.TimeOfDay

    For i = 0 To 1000000000
        calc(i, result)
    Next
    Console.WriteLine("A single processor runs 1 billion operations in: ")
    Console.WriteLine(DateTime.Now.TimeOfDay - starttime)

    starttime = DateTime.Now.TimeOfDay
    Parallel.For(0, 1000000000, Function(i) calc(i, result))
    Console.WriteLine("All your processors run 1 billion operations in: ")
    Console.WriteLine(DateTime.Now.TimeOfDay - starttime)

    Console.ReadLine()
End Sub
PS: I wrote the code for this in VB.NET.
If a person can walk 2 miles in 30 minutes, how long will it take 2 people to walk the same 2 miles?
All jokes aside, the MSDN documentation says: "Executes a for (For in Visual Basic) loop in which iterations may run in parallel." The keyword here is may.
You are letting the CLR do the work, and experience says that the .NET CLR does not always work the way you thought it would.
In my case (I copy-pasted the code): single processor, 21.495 seconds; all processors, 7.03 seconds. I have an i7 870 CPU on 32-bit Windows 7.
In Parallel.For, the order of iteration is not necessarily the same as the loop order.
Also, what your function does is Sqrt(i), which means one processor might be computing Sqrt of smaller numbers and another Sqrt of larger numbers.
The simple answer is that the work done by each processor is not exactly half of the whole work you gave them, so the halves are unlikely to take equal time.
One processor might have done more work while the other finished its share and waited. Or one of the processors might have been preempted by the operating system to do something more important while your worker thread waited.
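One measurement aside: DateTime.Now has coarse resolution and can be skewed by clock adjustments; Stopwatch is the usual tool for this kind of timing. A minimal sketch using the question's calc function as-is:

'Same workload as the question, timed with Stopwatch instead of DateTime.
Dim result As String = Nothing
Dim sw = System.Diagnostics.Stopwatch.StartNew()

For i As Integer = 0 To 1000000000
    calc(i, result)
Next
Console.WriteLine("Sequential: {0}", sw.Elapsed)

sw.Restart()
Parallel.For(0, 1000000000, Function(i) calc(i, result))
Console.WriteLine("Parallel:   {0}", sw.Elapsed)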
I have a very long string, and I'd like to find all the NaN values and replace them with 'null'. This long string was converted from a 120 x 150000 cell array. The reason for converting it into a long string was that I was going to turn it into one giant SQL query, since fastinsert and datainsert can be very slow and sometimes I run out of heap space. The idea is to do the following:
exec(sqlConnection, long_string)
I tried using regexprep to replace NaN with null, but it seems very slow. Is there an alternative way?
long_string = regexprep(long_string,'NaN','null');
As Floris mentioned, regexp is a very powerful command and, as a result, slower than other find-and-replace commands.
In addition to Floris' suggestion, you can try strrep, which works in your case since you are not using any of the special powers of regexp.
Here is an example:
str = char('A' + rand(1,120 * 15000)*('z'-'A'));
tic
str2 = strrep(str, 'g', 'null');
disp('strrep: '), toc
tic
str3 = regexprep(str, 'g','null');
disp('regexprep: '), toc
On my computer it will return:
strrep:
Elapsed time is 0.004640 seconds.
regexprep:
Elapsed time is 4.004671 seconds.
regex is very powerful, but can be slow because of its flexibility. If you still have the original cell array - and assuming it contains only strings - the following line of code should work, and very fast:
cellArray(ismember(cellArray, 'NaN')) = {'null'};
ismember finds all the elements in cellArray that are 'NaN', returning a logical array with the same shape as cellArray; indexing with parentheses (rather than braces) selects all of those cells at once, and the assignment puts the string 'null' into each of them.
It must be said that 120 x 150,000 is a VERY large cell array; it will occupy over 2 GB even with just a single character in each cell (I know this because I just created a 120 x 15,000 cell array, and it was 205,500,000 bytes). It might be worth processing this in smaller chunks rather than all at once, especially if you know the NaN values occur only in some of the columns.
Processing a GB-sized string, especially when you can't operate in place (you are changing the size of the string with every replacement, and it's getting longer, not shorter), is going to be dead slow. You could write a short MEX function to do this if you really have no other option; that could be pretty fast.
I'm trying to implement this PID code that I have; the main question is what information I need to pass into the function. There are six parameters, but I don't really know what to enter.
A bit of background: I am automating my home brewery, and although it is all up and running, the temperature control of the RIMS is all over the place. For those not familiar with a RIMS (recirculating infusion mash system), it is a way of keeping the grains being soaked at a very constant temperature. It does this by using a pump to draw fluid from the bottom of the soaking vessel and pass it over a heating element, heating the fluid as it goes if needed. The code I have running at the moment is dumb, so I need to replace it with something more intelligent, like a PID!
To drive the heating element, I plan on using a simple function, called every second, that will vary the amount of time the element is powered from 100 ms to 1000 ms depending on how much temperature correction is needed.
OK, so I have the code; the question is how to use it! I want to get it up and running in a standalone Windows Forms project in VB.NET. I know I need to tune the PID values to suit my application, but what should I enter to get started?
Many thanks for any help!
Public Function PID_output(ByVal process As Double, ByVal setpoint As Double, ByVal Gain As Double, _
                           ByVal Integ As Double, ByVal deriv As Double, ByVal deltaT As Double) As Double
    Dim Er As Double
    Dim Derivative As Double
    Dim Proportional As Double
    Static Olderror As Double
    Static Cont As Double
    Static Integral As Double
    Static Limiter_Switch As Double

    Limiter_Switch = 1
    Er = setpoint - process

    'Anti-windup: stop integrating when the output is saturated in the direction of the
    'error, or when Integ is so large that the integral term is effectively disabled.
    If ((Cont >= 1 And Er > 0) Or (Cont <= 0 And Er < 0) Or (Integ >= 9999)) Then
        Limiter_Switch = 0
    Else
        Limiter_Switch = 1
    End If

    Integral = Integral + Gain / Integ * Er * deltaT * Limiter_Switch
    Derivative = Gain * deriv * (Er - Olderror) / deltaT
    Proportional = Gain * Er

    Cont = Proportional + Integral + Derivative
    Olderror = Er

    'Clamp the output to the 0..1 range.
    If (Cont > 1) Then Cont = 1
    If (Cont < 0) Then Cont = 0

    Return Cont
End Function
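To make the parameters concrete, here is a minimal, hypothetical sketch of how this function might be wired to a once-per-second timer, mapping its 0..1 output onto the 100-1000 ms element on-time described in the question (the setpoint, gains, and the ReadTempC/PulseElement helpers are all assumptions to tune and replace):

'Called once per second (e.g. from a Windows Forms Timer with Interval = 1000).
Private Sub ControlTick()
    Dim setpointC As Double = 66.0        'target mash temperature - example value
    Dim measuredC As Double = ReadTempC() 'hypothetical sensor read
    Dim output As Double = PID_output(measuredC, setpointC,
                                      Gain:=2.0, Integ:=120.0, deriv:=10.0, deltaT:=1.0)

    'Map the 0..1 output onto the 100..1000 ms on-time from the question.
    Dim onTimeMs As Integer = CInt(100 + output * 900)
    PulseElement(onTimeMs) 'hypothetical: energize the element for onTimeMs
End Sub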
Using PID for a heating process is a bit trickier than typical PID applications. The reason is that when you need to heat, you energize a heater; the power of that heater determines how much energy you can pump into the system, and therefore how fast you can heat it up. Cooling, however, is normally left to the nature of the environment.
What I'm trying to say is: the system is not symmetrical.
May I suggest using a different control set according to the "direction" of your control: if the process is cooler than your setpoint, use controller 1; if the system is hotter than your setpoint, use controller 2.
For system 1, I definitely advise using the D term, because the D term acts as a limiter on how fast your controller builds up heat in the process. The worry is that many heat-control systems have considerable thermal inertia and lag (delay) in the feedback reading. This often results in high overshoot (the process reaching and passing the setpoint by a considerable amount); if exaggerated, it will oscillate (fluctuate) forever :)
Also, on the cooling side: say the environment is 20 deg. If your setpoint is 100, that is one thing; if your setpoint is 1000, that is totally different, because the delta T will differ (80 and 980 degrees respectively), and the system's tendency to cool is a function of that delta T.
Cooling is not linear but exponential (like a capacitor discharging through a constant resistor). If your setpoint will not be changing every day, then you are fine. But if it will, you'd better divide your setpoint space into regions and use different controller parameters for different setpoints, too.
Where to start:
There are various PID tuning rules of thumb; look at the Ziegler-Nichols method. Basically: let your process cool down (initial conditions), then give it full-throttle heating power and record the time-vs-temperature graph. This graph is called the step response, and it estimates the thermal inertia and the system lag for you. From it, the Ziegler-Nichols method gives you typical starting PID values.
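For reference, one common statement of the Ziegler-Nichols reaction-curve rules, mapped onto PID_output's parameters (Gain = Kp, Integ = Ti, deriv = Td; the numbers below are example values only, to be read off your own step response):

Dim K As Double = 1.5   'process gain from the step response - example value
Dim T As Double = 300.0 'time constant in seconds - example value
Dim L As Double = 30.0  'dead time (lag) in seconds - example value

Dim Kp As Double = 1.2 * T / (K * L) 'proportional gain -> Gain
Dim Ti As Double = 2.0 * L           'integral time     -> Integ
Dim Td As Double = 0.5 * L           'derivative time   -> deriv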
A question about a piece of lua code came up during a code review I had recently. The code in question is flushing a cache and reinitializing it with some data:
for filename, _ in pairs(fileTable) do
    fileTable[filename] = nil
end
-- reinitialize here
Is there any reason the above loop shouldn't be replaced with this?
fileTable = { }
-- reinitialize here
It is due to table resize/rehash overhead. When a table is created, it is empty. When you insert an element, a rehash takes place and the table size grows to 1. The same happens when you insert another element. The rule is that a table is grown whenever there is insufficient space (in either the array or the hash part) to hold another element. The new size is the smallest power of 2 that can accommodate the required number of elements; e.g., a rehash occurs on insertion if the table currently holds 0, 1, 2, 4, 8, etc. elements.
Now, the technique you're describing saves those rehashes: Lua does not shrink tables, so when you have frequent fill/flush operations it is better (performance-wise) to clear the table as in your example than to create a new empty one.
Update:
I've put up a little test:
local function rehash1(el, loops)
    local table = {}
    for i = 1, loops do
        for j = 1, el do
            table[j] = j
        end
        for k in ipairs(table) do table[k] = nil end
    end
end

local function rehash2(el, loops)
    for i = 1, loops do
        local table = {}
        for j = 1, el do
            table[j] = j
        end
    end
end

local function test(elements, loops)
    local time = os.time();
    rehash1(elements, loops);
    local time1 = os.time();
    rehash2(elements, loops);
    local time2 = os.time();
    print("Time nils: ", tostring(time1 - time), "\n");
    print("Time empty: ", tostring(time2 - time1), "\n");
end
The results are quite interesting. Running test(4, 10000000) on Lua 5.1 gave 7 seconds for nils and 10 seconds for empties. For tables bigger than 32 elements, the empty version was faster (the bigger the table, the bigger the difference): test(128, 400000) gave 9 seconds for nils and 5 seconds for empties.
On LuaJIT, where alloc and GC operations are relatively slow, running test(1024, 1000000) gave 3 seconds for nils and 7 seconds for empties.
P.S. Notice the sheer performance difference between plain Lua and LuaJIT: for 1024-element tables, plain Lua did 100,000 test iterations in about 20 seconds, while LuaJIT did 1,000,000 iterations in 10 seconds!
Unless you have evidence otherwise, you'd be better off trusting Lua's garbage collector: just create a new, empty table when you need it.
Allocating a new table is a costly operation in Lua (as object allocation is in pretty much any dynamic language). Additionally, constantly "losing" newly created tables to the GC puts extra strain on performance as well as memory, since every created table stays in memory until the GC actually comes to reclaim it.
The technique in your example trades those disadvantages for the time required to explicitly remove all the elements from the table. This is always a memory saving and, depending on the number of elements, may often be a performance improvement as well.