Calculate estimated time of finding string where SHA256(string) starts with n zeros

I have a program that, for any given string, finds a SHA-256 hash starting with n zeros.
It works like Bitcoin mining: I take a string, for example "FOO", and search for a nonce such that SHA-256(FOO + nonce) starts with, for example, 8 zeros. That nonce is 1231004720 (you can check it, BTW).
The program itself works like a charm. I know these things take time, but I can't figure out whether I can calculate the estimated time. If I can find n = 10 in one hour, how long might it take to get a hash with 11 leading zeros? Somewhere I read that it's a "factor of 16", but I don't know if that's true, and if so, why?
Thanks
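For a rough back-of-the-envelope estimate (a sketch, assuming the zeros are counted as leading hex digits of the digest): each hex digit of a well-mixed hash is effectively uniform over 16 values, so requiring one more leading zero multiplies the expected number of attempts, and hence the expected time, by 16. If you count leading zero bits instead, the factor is 2 per bit. The function names below are only illustrative:

import hashlib

def leading_hex_zeros(s):
    # Count the leading zero hex digits of SHA-256(s).
    digest = hashlib.sha256(s.encode()).hexdigest()
    return len(digest) - len(digest.lstrip('0'))

def estimated_seconds(seconds_for_n, n, target):
    # Extrapolate: each additional leading hex zero means ~16x more attempts.
    return seconds_for_n * 16 ** (target - n)

# If n = 10 takes one hour, n = 11 should take roughly 16 hours on average.
print(estimated_seconds(3600, 10, 11) / 3600, "hours")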

Related

Fortran: can you explain this formatting string

I have a Fortran program which I need to modify, so I'm reading it and trying to understand. Can you please explain what the formatting string in the following statement means:
write(*,'(1p,(5x,3(1x,g20.10)))') x(jr,1:ncols)
http://www.fortran.com/F77_std/rjcnf0001-sh-13.html
Briefly, you are writing three general (g) format floats per line. Each float has a total field width of 20 characters and 10 places to the right of the decimal. Large-magnitude numbers are printed in exponential form.
The 1x descriptors are simply added spaces (which could just as well have been accomplished by increasing the field width, i.e. g21.10, since the numbers are right-justified). The 5x puts an additional 5 spaces at the beginning of each line.
The somewhat tricky thing here is the leading 1p, which is a scale factor. It causes the mantissa of all exponential-form numbers produced by the following g format to be multiplied by 10, and the exponent changed accordingly, i.e. instead of the default,
g17.10 -> b0.1234567890E+12
you get:
1p,g17.10 -> b1.2345678900E+11
b denotes a blank in the output. Be sure to allow room for a - in your field width count...
For completeness: in the case of a scale factor greater than one, the number of decimal places is reduced (preserving the total precision), i.e.,
3p,g17.10 -> b123.45678900E+09 ! note only 8 digits after the decimal
That is, 1p buys you a digit of precision over the default, but you don't get any more. Negative scale factors cost you precision while still keeping 10 digits after the decimal:
-7p,g17.10 -> b0.0000000123E+19
I should add, the p scale factor edit descriptor does something completely different on input. Read the docs...
I'd like to add slightly to George's answer. Unfortunately this is a very nasty (IMO) part of Fortran. In general, bear in mind that a Fortran format specification is automatically repeated as long as there are values remaining in the input/output list, so it isn't necessary to provide formats for every value to be processed.
Scale factors
In the output, all floating point values following kP are multiplied by 10^k. Fields containing exponents (E) have their exponent reduced by k, unless the exponent format is fixed by using EN (engineering) or ES (scientific) descriptors. Scaling does not apply to G editing, unless the value is such that E editing is applied. Thus, there is a difference between (1P,G20.10) and (1P,F20.10).
Grouping
A format like n() repeats the descriptors within parentheses n times before proceeding.

Replace NaN with Null of a very long string in Matlab

I have a very long string and I'd like to find all the NaN values and replace them with 'null'. This long string was converted from a 120 x 150000 cell. The reason for converting it into a long string was that I was going to turn it into one giant SQL query, as fastinsert and datainsert can be very slow and sometimes I run out of heap space. The idea is to do the following:
exec(sqlConnection, long_string)
I tried using regexprep to replace NaN with null, but it seems very slow. Is there an alternative way?
long_string = regexprep(long_string,'NaN','null');
As Floris mentioned, regexp is a very powerful command and, as a result, is slower than other string functions.
In addition to Floris's suggestion, you can try strrep, which works in your case since you are not using any of the special powers of regexp.
Here is an example:
str = char('A' + rand(1,120 * 15000)*('z'-'A'));
tic
str2 = strrep(str, 'g', 'null');
disp('strrep: '), toc
tic
str3 = regexprep(str, 'g','null');
disp('regexprep: '), toc
On my computer it will return:
strrep:
Elapsed time is 0.004640 seconds.
regexprep:
Elapsed time is 4.004671 seconds.
regexp is very powerful, but it can be slow because of its flexibility. If you still have the original cell array - and assuming it contains only strings - the following line of code should work, and be very fast:
cellArray(find(ismember(cellArray,'NaN'))) = {'null'};
ismember finds all the elements in cellArray that are NaN, returning a boolean array with the same shape as cellArray; the find operation turns these into indices of the elements that are NaN, and then you just assign the value null to all of them.
It must be said that 120 x 150,000 is a VERY large cell array - it will occupy over 2 GB even with just a single character in each cell (I know this because I just created a 120x15000 cell array, and it was 205,500,000 bytes). It might be worth processing this in smaller chunks rather than all at once. Especially if you know that the NaN would occur only in some of the columns, for example.
Processing a GB sized string, especially when you can't operate in-place (you are changing the size of the string with every replacement, and it's getting longer, not shorter) is going to be dead slow. It's possible you could write a short mex function to do this if you really have no other option - that could be pretty fast.

Advice for bit level manipulation

I'm currently working on a project that involves a lot of bit-level manipulation of data, such as comparison, masking and shifting. Essentially I need to search through chunks of bitstreams between 8 and 32 kbytes long for bit patterns between 20 and 40 bytes long.
Does anyone know of general resources for optimizing for such operations in CUDA?
There have been at least a couple of questions on SO about how to do text searches with CUDA, i.e. finding instances of short byte-strings in long byte-strings. That is similar to what you want to do: a byte-string search is much like a bit-string search in which the number of bits in the pattern can only be a multiple of 8, and the algorithm only checks for matches every 8 bits. Search on SO for CUDA string searching or matching, and see if you can find them.
I don't know of any general resources for this, but I would try something like this:
Start by preparing 8 versions of each of the search bit-strings, each shifted by a different number of bits. Also prepare start and end masks:
start
01111111
00111111
...
00000001
end
10000000
11000000
...
11111110
Then, essentially, perform byte-string searches with the different bit-strings and masks.
If you're using a device with compute capability >= 2.0, store the shifted bit-strings in global memory. The start and end masks can probably just be constants in your program.
Then, for each byte position, launch 8 threads that each checks a different version of the 8 shifted bit-strings against the long bit-string (which you now treat like a byte-string). In each block, launch enough threads to check, for instance, 32 bytes, so that the total number of threads per block becomes 32 * 8 = 256. The L1 cache should be able to hold the shifted bit-strings for each block, so that you get good performance.
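To make the preparation step concrete, here is a rough CPU-side Python sketch (not CUDA) of how the 8 shifted copies of a pattern and the per-byte start/end masks could be built and then used for a masked byte-wise comparison. The function names and the bit-packing convention (pattern bits left-aligned in the input bytes) are assumptions for illustration:

def make_shifted_patterns(pattern_bytes, pattern_bits):
    # Build 8 copies of the pattern, shifted right by 0..7 bits. Each copy
    # comes with a start mask for its first byte and an end mask for its
    # last byte, so unused bits at either end are ignored when comparing.
    value = int.from_bytes(pattern_bytes, 'big') >> (8 * len(pattern_bytes) - pattern_bits)
    variants = []
    for shift in range(8):
        total_bits = pattern_bits + shift
        n_bytes = (total_bits + 7) // 8
        shifted = value << (8 * n_bytes - total_bits)
        start_mask = 0xFF >> shift                               # 11111111, 01111111, ...
        end_mask = (0xFF << (8 * n_bytes - total_bits)) & 0xFF   # mask for the partial last byte
        variants.append((shifted.to_bytes(n_bytes, 'big'), start_mask, end_mask))
    return variants

def matches_at(haystack, byte_pos, shifted, start_mask, end_mask):
    # Masked byte-wise comparison of one shifted pattern at one byte offset.
    n = len(shifted)
    if byte_pos + n > len(haystack):
        return False
    for i, p in enumerate(shifted):
        mask = 0xFF
        if i == 0:
            mask &= start_mask
        if i == n - 1:
            mask &= end_mask
        if (haystack[byte_pos + i] ^ p) & mask:
            return False
    return True

# A match starting at bit offset b corresponds to variant b % 8 tested at byte b // 8.

In the CUDA version, that inner masked comparison is what each of the 8 threads per byte position would perform, with the shifted patterns held in global memory (and ideally sitting in the L1 cache) as described above.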

Efficiently: Random numbers in fixed range without repetitions

Hey guys, I know that there are a million questions on random numbers, and exactly because of that I searched a lot, but I couldn't find one quite like mine - which isn't to say it's not there. In any case, pardon me if I am repeating a question; just point me to it if that's the case.
So, I want to do something simple in the most efficient way.
I want to generate all N integers in the range [0, N] in random order, one by one, such that there are no repetitions.
I know I can do this by inserting everything into a list, shuffling it, taking the head and then removing the head from the list. But then I will have shuffled my list of length N, N-1 times.
Any better / faster idea?
You can just do one shuffle, and then step through the list.
I'd recommend a Fisher-Yates shuffle.
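A minimal sketch of that approach in Python (the function name is just for illustration): shuffle once up front with Fisher-Yates, then simply step through the result.

import random

def fisher_yates_shuffle(n):
    # Return the integers 0..n-1 in uniformly random order (Fisher-Yates).
    values = list(range(n))
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)              # pick from the not-yet-fixed prefix
        values[i], values[j] = values[j], values[i]
    return values

# One shuffle, then iterate - no repetitions, no re-shuffling.
for value in fisher_yates_shuffle(10):
    print(value)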
This question has been asked a few times, and in each case the correct answer given is to shuffle an array (either the original, or an array of indices), however this isn't a satisfactory answer in cases where the number of possible indices is prohibitively large (either it's huge, or memory is tight, or you simply crave maximum efficiency for whatever reason).
As such I want to add an alternative for the sake of completeness. Now, this isn't truly random, so if that's what you need then do not use this, however, if your goal is simply "good enough" with minimal memory requirements then the following pseudo-code may be of interest:
function init:
    start = random [0, length)        // Pick a fully random starting index
    stride = random [1, length - 1)   // Pick a random step size
    next_index = start

function advance_next_index:
    next_index = (next_index + stride) % length
    if next_index is equal to start then
        start = (start + 1) % length
        next_index = start
Here's an example of how to implement a re-usable function for grabbing pseudo-random values:
counter = length

function pseudo_random:
    if counter is equal to length then
        init()
        counter = 0
    else
        advance_next_index()
    counter = counter + 1
    return next_index
Quite simply, pseudo_random will call init once every length iterations, re-shuffling the "random" pattern of results produced by advance_next_index, and ensuring that within every run of length values there is not a single duplicate.
To reiterate: this isn't a particularly random algorithm, so it must not be used in situations where true randomness is required. However, the results are random enough for basic, non-critical tasks, and it has a tiny memory footprint. For example, it is fine if you just want to randomise some behaviour in a game to avoid it becoming repetitive, or if the data-set is large and never exposed to the user (in which case it is effectively random to them, since it would take a long time to piece together the order and somehow exploit it).
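For concreteness, here is a rough Python translation of the pseudo-code above; the class name and method names are purely illustrative:

import random

class StrideSequence:
    # Walks the values 0..length-1 using a random start and stride.
    # Within every window of `length` calls, no value is returned twice.
    def __init__(self, length):
        self.length = length
        self.counter = length            # forces an init() on the first call

    def _init(self):
        self.start = random.randrange(0, self.length)         # random [0, length)
        self.stride = random.randrange(1, self.length - 1)    # random [1, length - 1)
        self.next_index = self.start

    def _advance_next_index(self):
        self.next_index = (self.next_index + self.stride) % self.length
        if self.next_index == self.start:
            # The cycle closed before covering everything: move to a fresh start.
            self.start = (self.start + 1) % self.length
            self.next_index = self.start

    def pseudo_random(self):
        if self.counter == self.length:
            self._init()                 # re-shuffle once every `length` values
            self.counter = 0
        else:
            self._advance_next_index()
        self.counter += 1
        return self.next_index

seq = StrideSequence(10)
print([seq.pseudo_random() for _ in range(10)])   # ten distinct values from 0..9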
If anyone knows of any better algorithms with similar properties then please share!

find duplicates

You are given an array of elements. Some or all of them are duplicates. Find them in O(n) time and O(1) space. Property of the inputs: the numbers are in the range 1..n, where n is the length of the array.
If the O(1) storage is a limitation on additional memory, and not an indication that you can't modify the input array, then you can do it by sorting while iterating over the elements: move each misplaced element to its "correct" place. If that place is already occupied by the correct number, print the element as a duplicate; otherwise, take the "incorrect" existing content and place it correctly before continuing the iteration. This may in turn require correcting other elements to make space, but there's a stack of at most 1, and the total number of correction steps is limited to N. Added to the N-step iteration, that gives 2N, which is still O(N).
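A rough Python sketch of that idea (assuming 1-based values in 1..n and that the array may be modified in place; this variant does the placement pass first and then reports every slot that ends up holding a foreign value):

def find_duplicates(a):
    # a holds values in 1..len(a); the array is modified in place.
    n = len(a)
    i = 0
    # Placement pass: put each value v into slot v-1 whenever that slot
    # doesn't already hold v. Each swap homes one value, so this is O(n).
    while i < n:
        v = a[i]
        if a[v - 1] != v:
            a[i], a[v - 1] = a[v - 1], a[i]   # place v, then retry slot i
        else:
            i += 1
    # Reporting pass: any slot not holding its own index holds an extra copy.
    for i in range(n):
        if a[i] != i + 1:
            print("Duplicate value:", a[i])

find_duplicates([3, 1, 3, 4, 2, 2])   # prints the duplicated values 3 and 2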
Since both the number of elements in the array and the range of the array are variable based on n, I don't believe you can do this. I may be wrong, personally I doubt it, but I've been wrong before :-)
EDIT: And it looks like I may be wrong again :-) See Tony's answer, I believe he may have nailed it. I'll delete this answer once enough people agree with me, or it gets downvoted too much :-)
If the range was fixed (say, 1..m), you could do it thus:
dim quant[1..m]
for i in 1..m:
    quant[i] = 0
for i in 1..size(array):
    quant[array[i]] = quant[array[i]] + 1
for i in 1..m:
    if quant[i] > 1:
        print "Duplicate value: " + i
This works since you can often trade off space against time in an algorithm. But because here the range of values grows with the size of the input, the counting array would need O(n) extra space, so that usual space-for-time trade-off is not available.