Google Authenticator-compatible OTP - cryptography

I'm working on an implementation of a Google Authenticator-compatible OTP.
So far, I've been using:
- RFC 2104 (http://www.ietf.org/rfc/rfc2104.txt),
- RFC 4226 (http://www.ietf.org/rfc/rfc4226.txt),
- RFC 6238 (https://www.rfc-editor.org/rfc/rfc6238),
and following this scheme:
[Pseudo code Time OTP] (http://en.wikipedia.org/wiki/Google_Authenticator#Pseudocode_for_Time_OTP)
function GoogleAuthenticatorCode(string secret)
    key := base32decode(secret)
    message := floor(current Unix time / 30)
    hash := HMAC-SHA1(key, message)
    offset := value of last nibble of hash
    truncatedHash := hash[offset..offset+3]  // 4 bytes starting at the offset
    Set the first bit of truncatedHash to zero  // remove the most significant bit
    code := truncatedHash mod 1000000
    pad code with 0 until length of code is 6
    return code
Until " hash := HMAC-SHA1(key, message) " everything is ok. I checked multiple time the result through other HMAC-SHA1 converters. (Well, I think so).
But then, I think something must go wrong ... because obviously I'm not getting the same code as my google-authenticator app (android). (At least it's still a 6-digits value).
The part I'm not quiet sure to understand well is :
offset := value of last nibble of hash
truncatedHash := hash[offset..offset+3] //4 bytes starting at the offset
Set the first bit of truncatedHash to zero //remove the most significant bit
Could someone give me a more detailed explanation of this?
Thanks,

My guess would be that you are taking the value of offset incorrectly.
The statement
value of last nibble of hash
is pretty vague if you don't have a proper definition of bit and byte ordering.
The quoted Wikipedia page links to a number of implementations; I think this Java implementation is something to check your code against:
byte[] hash = ...
// Dynamically truncate the hash
// OffsetBits are the low order bits of the last byte of the hash
int offset = hash[hash.length - 1] & 0xF;
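For completeness, here is a minimal sketch in C of the full dynamic truncation step from RFC 4226 (my own illustration, not Google's code), assuming hash already holds the raw 20-byte HMAC-SHA1 output; a common pitfall is truncating a hex string of the hash instead of the raw bytes:
#include <stdint.h>

/* Dynamic truncation per RFC 4226, section 5.3.
 * hash must be the raw 20-byte HMAC-SHA1 digest. */
uint32_t dynamic_truncate(const uint8_t hash[20])
{
    int offset = hash[19] & 0x0F;                      /* low nibble of the last byte */
    uint32_t bin_code =
        ((uint32_t)(hash[offset]     & 0x7F) << 24) |  /* mask the most significant bit */
        ((uint32_t)(hash[offset + 1] & 0xFF) << 16) |
        ((uint32_t)(hash[offset + 2] & 0xFF) << 8)  |
         (uint32_t)(hash[offset + 3] & 0xFF);
    return bin_code % 1000000;                         /* keep 6 decimal digits */
}
The returned value is then zero-padded on the left to 6 digits to produce the code shown by the app.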

Measuring Program Execution Time with Cycle Counters

I'm confused by this particular line:
result = (double) hi * (1 << 30) * 4 + lo;
of the following code:
void access_counter(unsigned *hi, unsigned *lo)
// Set *hi and *lo to the high and low order bits of the cycle
// counter.
{
    asm("rdtscp; movl %%edx,%0; movl %%eax,%1"  // Read cycle counter
        : "=r" (*hi), "=r" (*lo)                // and move results to
        : /* No input */                        // the two outputs
        : "%edx", "%eax");
}
double get_counter()
// Return the number of cycles since the last call to start_counter.
{
    unsigned ncyc_hi, ncyc_lo;
    unsigned hi, lo, borrow;
    double result;

    /* Get cycle counter */
    access_counter(&ncyc_hi, &ncyc_lo);
    lo = ncyc_lo - cyc_lo;
    borrow = lo > ncyc_lo;
    hi = ncyc_hi - cyc_hi - borrow;
    result = (double) hi * (1 << 30) * 4 + lo;
    if (result < 0) {
        fprintf(stderr, "Error: counter returns neg value: %.0f\n", result);
    }
    return result;
}
The thing I cannot understand is why hi is being multiplied by 2^30 and then by 4, and then lo added to it. Could someone please explain what is happening in this line of code? I do know what hi and lo contain.
The short answer:
That line turns a 64-bit integer that is stored as two 32-bit values into a floating-point number.
Why doesn't the code just use a 64-bit integer? Well, gcc has supported 64-bit numbers for a long time, but presumably this code predates that. In that case, the only way to hold numbers that big is to put them into a floating-point number.
The long answer:
First, you need to understand how rdtscp works. When this assembler instruction is invoked, it does two things:
1) Sets ecx to the IA32_TSC_AUX MSR. In my experience, this generally just means ecx gets set to zero.
2) Sets edx:eax to the current value of the processor's time-stamp counter. This means that the lower 32 bits of the counter go into eax, and the upper 32 bits are in edx.
With that in mind, let's look at the code. When called from get_counter, access_counter is going to put edx in 'ncyc_hi' and eax in 'ncyc_lo.' Then get_counter is going to do:
lo = ncyc_lo - cyc_lo;
borrow = lo > ncyc_lo;
hi = ncyc_hi - cyc_hi - borrow;
What does this do?
Since the time is stored in two different 32-bit numbers, if we want to find out how much time has elapsed, we need to do a bit of work to find the difference between the old time and the new. When it is done, the result is stored (again, using two 32-bit numbers) in hi / lo.
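If it helps, here is a quick stand-alone check (in C, with arbitrary made-up readings) that the borrow logic above matches an ordinary 64-bit subtraction:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t cyc_hi  = 1, cyc_lo  = 0xFFFFFFF0u;   /* "old" counter reading */
    uint32_t ncyc_hi = 2, ncyc_lo = 0x00000010u;   /* "new" counter reading */

    uint32_t lo = ncyc_lo - cyc_lo;
    uint32_t borrow = lo > ncyc_lo;
    uint32_t hi = ncyc_hi - cyc_hi - borrow;

    uint64_t expected = (((uint64_t)ncyc_hi << 32) | ncyc_lo)
                      - (((uint64_t)cyc_hi  << 32) | cyc_lo);
    printf("%llu vs %llu\n",
           (unsigned long long)(((uint64_t)hi << 32) | lo),
           (unsigned long long)expected);          /* both print 32 */
    return 0;
}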
Which finally brings us to your question.
result = (double) hi * (1 << 30) * 4 + lo;
If we could use 64-bit integers, converting two 32-bit values to a single 64-bit value would look like this:
unsigned long long result = hi; // put hi into the 64-bit number
result <<= 32;                  // shift those 32 bits to the upper part of the number
result |= lo;                   // OR in the lower 32 bits
If you aren't used to bit shifting, maybe looking at it like this will help. If lo = 1 and hi = 2, then expressed as hex numbers:
result = hi;    0x0000000000000002
result <<= 32;  0x0000000200000000
result |= lo;   0x0000000200000001
But if we assume the compiler doesn't support 64-bit integers, that won't work. While floating-point numbers can hold values that big, they don't support shifting. So we need to figure out a way to shift 'hi' left by 32 bits without using a left shift.
OK then, shifting left by 1 is really the same as multiplying by 2. Shifting left by 2 is the same as multiplying by 4. Shifting left by [omitted...] Shifting left by 32 is the same as multiplying by 4,294,967,296.
By an amazing coincidence, 4,294,967,296 == (1 << 30) * 4.
So why write it in that complicated fashion? Well, 4,294,967,296 is a pretty big number. In fact, it's too big to fit in a 32-bit integer. Which means that if we put it in our source code, a compiler that doesn't support 64-bit integers may have trouble figuring out how to process it. Written like this, the compiler can generate whatever floating-point instructions it might need to work on that really big number.
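A quick stand-alone check (again in C, arbitrary values) that the floating-point trick reproduces the 64-bit shift:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t hi = 2, lo = 1;
    uint64_t shifted = ((uint64_t)hi << 32) | lo;     /* the "real" 64-bit combine  */
    double   fp = (double)hi * (1 << 30) * 4 + lo;    /* the floating-point version */
    printf("%llu vs %.0f\n", (unsigned long long)shifted, fp);  /* both print 8589934593 */
    return 0;
}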
Why the current code is wrong:
It looks like variations of this code have been wandering around the internet for a long time. Originally (I assume) access_counter was written using rdtsc instead of rdtscp. I'm not going to try to describe the difference between the two (google them), other than to point out that rdtsc does not set ecx, and rdtscp does. Whoever changed rdtsc to rdtscp apparently didn't know that, and failed to adjust the inline assembler's clobber list to reflect it. While your code might work fine despite this, it might do something weird instead. To fix it, you could do:
asm("rdtscp; movl %%edx,%0; movl %%eax,%1" // Read cycle counter
: "=r" (*hi), "=r" (*lo) // and move results to
: /* No input */ // the two outputs
: "%edx", "%eax", "%ecx");
While this will work, it isn't optimal. Registers are a valuable and scarce resource on i386. This tiny fragment uses 5 of them. With a slight modification:
asm("rdtscp" // Read cycle counter
: "=d" (*hi), "=a" (*lo)
: /* No input */
: "%ecx");
Now we have 2 fewer assembly statements, and we only use 3 registers.
But even that isn't the best we can do. In the (presumably long) time since this code was written, gcc has added both support for 64bit integers and a function to read the tsc, so you don't need to use asm at all:
unsigned int a;
unsigned long long result;
result = __builtin_ia32_rdtscp(&a);
'a' is the (useless?) value that was being returned in ecx. The function call requires it, but we can just ignore the returned value.
So, instead of doing something like this (which I assume your existing code does):
unsigned cyc_hi, cyc_lo;
access_counter(&cyc_hi, &cyc_lo);
// do something
double elapsed_time = get_counter(); // Find the difference between cyc_hi, cyc_lo and the current time
We can do:
unsigned int a;
unsigned long long before, after;
before = __builtin_ia32_rdtscp(&a);
// do something
after = __builtin_ia32_rdtscp(&a);
unsigned long long elapsed_time = after - before;
This is shorter, doesn't use hard-to-understand assembler, is easier to read and maintain, and produces the best possible code.
But it does require a relatively recent version of gcc.

Dealing with Int64 value with Booksleeve

I have a question about Marc Gravell's Booksleeve library.
I'm trying to understand how BookSleeve handles Int64 values (I actually have billions of long values in Redis).
I used reflection to understand the Set overloads for a long value.
// BookSleeve.RedisMessage
protected static void WriteUnified(Stream stream, long value)
{
    if (value >= 0L && value <= 99L)
    {
        int i = (int)value;
        if (i <= 9)
        {
            stream.Write(RedisMessage.oneByteIntegerPrefix, 0, RedisMessage.oneByteIntegerPrefix.Length);
            stream.WriteByte((byte)(48 + i));
        }
        else
        {
            stream.Write(RedisMessage.twoByteIntegerPrefix, 0, RedisMessage.twoByteIntegerPrefix.Length);
            stream.WriteByte((byte)(48 + i / 10));
            stream.WriteByte((byte)(48 + i % 10));
        }
    }
    else
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value.ToString());
        stream.WriteByte(36);
        RedisMessage.WriteRaw(stream, (long)bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
    }
    stream.Write(RedisMessage.Crlf, 0, 2);
}
I don't understand why, for an Int64 with more than two digits, the long is encoded in ASCII.
Why not use byte[]? I know that I can use the byte[] overloads to do this, but I just want to understand this implementation to optimize mine. There may be a relationship with the Redis storage.
Thank you in advance, Marc :)
P.S.: I'm still very enthusiastic about your next major version, when I can use long keys instead of string keys.
It writes it in ASCII because that is what the redis protocol demands.
If you look carefully, it is always encoded as ASCII - but for the most common cases (0-9, 10-99) I've special-cased it, as these are very simple results:
x => $1\r\nX\r\n
xy => $2\r\nXY\r\n
where x and y are the first two digits of a number in the range 0-99, and X and Y are those digits (as numbers) offset by 48 ('0') - so decimal 17 becomes the byte sequence (in hex):
24-32-0D-0A-31-37-0D-0A
Of course, that could also be achieved simply by writing each digit sequentially, offsetting the digit value by 48 ('0'), and handling the negative sign - I guess the answer there is simply "because I coded it the simple but obviously correct way". Consider the value -123 - which is encoded as $4\r\n-123\r\n (hey, don't look at me - I didn't design the protocol). It is slightly awkward because it needs to calculate the buffer length first, then write that buffer length, then write the value - remembering to write in the order 100s, 10s, 1s (which is much harder than writing the other way around).
Perfectly willing to revisit it - simply: it works.
Of course, it becomes trivial if you have a scratch buffer available - you just write it in the simple order, then reverse the portion of the scratch buffer. I'll check to see if one is available (and if not, it wouldn't be unreasonable to add one).
I should also clarify: there is also the integer type, which would encode -123 as :-123\r\n - however, from memory there are a lot of places this simply does not work.
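For reference, here is a minimal sketch (in C, purely to illustrate the framing described above - it is not BookSleeve's code) of writing a long as a redis bulk string:
#include <stdio.h>

/* Write value as a redis bulk string: "$<length>\r\n<digits>\r\n". */
static void write_bulk_long(FILE *stream, long value)
{
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%ld", value);  /* ASCII digits, with sign if negative */
    fprintf(stream, "$%d\r\n%s\r\n", len, digits);
}

int main(void)
{
    write_bulk_long(stdout, 17);    /* emits $2\r\n17\r\n   */
    write_bulk_long(stdout, -123);  /* emits $4\r\n-123\r\n */
    return 0;
}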

vb xor checksum

This question may already have been asked, but nothing on SO actually gave me the answer I need.
I am trying to reverse engineer someone else's VB.NET code and I am stuck on what the Xor is doing here. Here is one line of the body of a SOAP request that gets parsed (some values have been obscured, so the checksum may not work in this case):
<HD>CHANGEDTHIS01,W-A,0,7753.2018E,1122.6674N, 0.00,1,CID_V_01*3B</HD>
and this is the snippet of vb code that checks it
LastStar = strValues(CheckLoop).IndexOf("*")
StrLen = strValues(CheckLoop).Length
TransCheckSum = Val("&h" + strValues(CheckLoop).Substring(LastStar + 1, (StrLen - (LastStar + 1))))
CheckSum = 0
For CheckString = 0 To LastStar - 1
    CheckSum = CheckSum Xor Asc(strValues(CheckLoop)(CheckString))
Next
If CheckSum <> TransCheckSum Then
    'error with the checksum
    ...
OK, I get it up to the For loop. I just need an explanation of what the Xor is doing and how that is used for the checksum.
Thanks.
PS: As a bonus, if anyone can provide a c# translation I would be most grateful.
Using Xor is a simple algorithm for calculating a checksum. The idea is the same as when calculating a parity bit, except that eight parity bits are calculated at once, one for each bit position across the bytes. More advanced algorithms like CRC and MD5 are often used to calculate checksums for more demanding applications.
The C# code would look like this:
string value = strValues[checkLoop];
int lastStar = value.IndexOf("*");
int transCheckSum = Convert.ToByte(value.Substring(lastStar + 1, 2), 16);
int checkSum = 0;
for (int checkString = 4; checkString < lastStar; checkString++) {
    checkSum ^= (int)value[checkString];
}
if (checkSum != transCheckSum) {
    // error with the checksum
}
I made some adjustments to the code to accommodate the translation to C#, plus some changes that simply make sense. I declared the variables used, and used camel case rather than Pascal case for local variables. I use a local variable for the string, instead of getting it from the collection each time.
The VB Val method stops parsing when it finds a character that it doesn't recognise, so to use the framework methods I assumed that the checksum is two characters long, so that it parses the string "3B" rather than "3B</HD>".
The loop starts at the fourth character, to skip the leading "<HD>", which should logically not be part of the data that the checksum is calculated over.
In C# you don't need the Asc function to get the character code; you can just cast the char to an int.
The code is basically taking the character values and XORing them together in order to check the integrity. There is a very nice explanation of the operation on this page, in the Parity Check section: http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/xor.html
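As a small stand-alone illustration (in C, with a made-up payload fragment), each bit of the running XOR ends up being the parity of that bit position across all of the bytes:
#include <stdio.h>

int main(void)
{
    const char *payload = "CID_V_01";        /* hypothetical payload fragment */
    unsigned char sum = 0;
    for (const char *p = payload; *p != '\0'; p++)
        sum ^= (unsigned char)*p;            /* running XOR of the character codes */
    printf("checksum = %02X\n", sum);        /* hex value to compare with the digits after '*' */
    return 0;
}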

(bitcoin) Calculate hash from getwork function - how to do it?

when I call getwork on my bitcoind server, I get the following:
./bitcoind getwork
{
"midstate" : "695d56ae173bbd0fd5f51d8f7753438b940b7cdd61eb62039036acd1af5e51e3",
"data" : "000000013d9dcbbc2d120137c5b1cb1da96bd45b249fd1014ae2c2b400001511000000009726fba001940ebb5c04adc4450bdc0c20b50db44951d9ca22fc5e75d51d501f4deec2711a1d932f00000000000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000",
"hash1" : "00000000000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000000000000000000000000010000",
"target" : "00000000000000000000000000000000000000000000002f931d000000000000"
}
This protocol does not seem to be documented. How do I compute the hash from this data? I think that this data is in little endian, so the first step is to convert everything to big endian? Once that is done, I calculate the SHA-256 of the data. The data can be divided into two chunks of 64 bytes each. The hash of the first chunk is given by midstate and therefore does not have to be computed.
I must therefore hash chunk #2 with SHA-256, using the midstate as the initial hash values. Once that is done, I end up with a hash of chunk 2, which is 32 bytes. I calculate the hash of this chunk one more time to get the final hash.
Then, do I convert everything to little endian and submit the work?
What is hash1 used for?
The hash calculation is documented at Block hashing algorithm.
Start there for the relatively simple basics. The basic data structures are documented in Protocol specification - Bitcoin Wiki. Note that the protocol definition (and the definition of work) more or less assumes that SHA-256 hashes are 256-bit little-endian values, rather than big-endian as the standard implies.
Getwork is more complicated and runs into more serious endian/byte ordering confusion.
First note that the getwork API is optimized to speed up the initial steps of mining.
The midstate and hash1 values are for these performance optimizations and can be ignored. Just look at the "data".
And when a standard sha256 implementation is used, only the first 80 bytes (160 hex characters) of the "data" are hashed.
Unfortunately, the JSON data presented in the getwork data structure has different endian characteristics than what is needed for hashing in the block example above.
They all say to go to the source for the answer, but the C++ source can be big and confusing. A simple alternative is the poold.py code. There is discussion of it here: New mining pool for testing. You only need to look at the first few lines of the "checkwork" routine, and the "bufreverse" and "bytereverse" functions, to get the byte ordering right. In the end it is just a matter of doing a reversal of the bytes in each 32-bit segment of the data. Yes - very odd. But endian issues are tricky and can end up that way....
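To make the byte-ordering step concrete, here is a minimal sketch in C (my own illustration, not code from poold.py) of reversing the bytes inside every 32-bit word of a buffer; the buffer would come from hex-decoding the first 160 characters of "data":
#include <stdint.h>
#include <stddef.h>

/* Reverse the byte order within each 32-bit word of buf (len is in bytes). */
static void reverse_words(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint8_t t;
        t = buf[i];     buf[i]     = buf[i + 3]; buf[i + 3] = t;
        t = buf[i + 1]; buf[i + 1] = buf[i + 2]; buf[i + 2] = t;
    }
}
After that reordering, the 80-byte header is hashed with SHA-256 twice and the result is compared against the target.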
Some other helpful information on the way "getwork" works can be found in discussions at:
Do I understand header hashing?
Stupid newbie question about the nonce
Note that finding the signal to noise in the original Bitcoin forum is getting very hard, and there is currently an Area51 proposal for a StackExchange site for Bitcoin and Crypto Currency in general. Come join us!
That sounds right. There is a JavaScript script that calculates the hash, but I do not fully understand it, so I can't say for sure; maybe you will understand it better if you take a look.
this.tryHash = function(midstate, half, data, hash1, target, nonce){
    data[3] = nonce;
    this.sha.reset();
    var h0 = this.sha.update(midstate, data).state;    // compute first hash
    for (var i = 0; i < 8; i++) hash1[i] = h0[i];      // place it in the h1 holder
    this.sha.reset();                                  // reset to initial state
    var h = this.sha.update(hash1).state;              // compute final hash
    if (h[7] == 0) {
        var ret = [];
        for (var i = 0; i < half.length; i++)
            ret.push(half[i]);
        for (var i = 0; i < data.length; i++)
            ret.push(data[i]);
        return ret;
    } else return null;
};
SOURCE: https://github.com/jwhitehorn/jsMiner/blob/4fcdd9042a69b309035dfe9c9ddf716119831a16/engine.js#L149-165
Frankly speaking, the Bitcoin block hashing algorithm is not officially described by any source.
"The hash calculation is documented at Block hashing algorithm."
should read
The hash calculation is "described" at Block hashing algorithm (en.bitcoin.it/wiki/Block_hashing_algorithm).
By the way, the example code in PHP comes with a bug (a typo), and the example code in Python generates errors when run under Python 3.3 on 32-bit Windows XP (missing support for string.decode).

Is there a practical limit to the size of bit masks?

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.
I use this technique in several web applications, where I'd usually store the value in a field with a type of MEDIUMINT or whatever, depending on the number of different values.
What I'm interested in, is whether or not there is a practical limit to the number of values you can store like this? For example, if the number was over 64, you couldn't use (64 bit) integers any more. If this was the case, what would you use? How would it affect your program logic (ie: could you still use bitwise comparisons)?
I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.
Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):
// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid (offset is unsigned, so only the upper bound needs checking)
    if(offset > (num_bytes<<3)-1) { return -1; }

    //set the right bit
    bytes[offset >> 3] |= (1 << (offset & 0x7));
    return 0; //success
}

//gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
    // make sure offset is valid (offset is unsigned, so only the upper bound needs checking)
    if(offset > (num_bytes<<3)-1) { return -1; }

    //get the right bit
    return (bytes[offset >> 3] & (1 << (offset & 0x7)));
}
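A hypothetical usage sketch, relying on the set_bit/get_bit functions above: a 16-byte array gives you 128 independent flags, and the same pattern scales to any number of bytes:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char flags[16];                                           /* 16 bytes = 128 flags */
    memset(flags, 0, sizeof flags);

    set_bit(flags, sizeof flags, 100);                        /* turn flag 100 on */
    printf("%d\n", get_bit(flags, sizeof flags, 100) > 0);    /* prints 1 */
    printf("%d\n", get_bit(flags, sizeof flags, 7) > 0);      /* prints 0 */
    return 0;
}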
I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. Think of it like an "array of booleans" (journalling masks in flash memory, if you want to know).
Many compilers know how to do this for you. Add a bit of OO code to have types that operate sensibly, and then your code starts looking like its intent, not some bit-banging.
My 2 cents.
With a 64-bit integer, you can store values up to 2^64-1, and 64 is only 2^6. So yes, there is a limit, but if you need more than 64 bits' worth of flags, I'd be very interested to know what they were all doing :)
How many states do you potentially need to think about? If you have 64 potential states, the number of combinations they can exist in is the full range of a 64-bit integer.
If you need to worry about 128 flags, then a pair of bit vectors would suffice (2^64 * 2).
Addition: in Programming Pearls, there is an extended discussion of using a bit array of length 10^7, implemented in integers (for holding the set of in-use 800 numbers) - it's very fast, and very appropriate for the task described in that chapter.
Some languages (I believe Perl does, not sure) permit bitwise arithmetic on strings, giving you a much greater effective range ((string length * 8-bit chars) combinations).
However, I wouldn't use a single value for superimposition of more than one /type/ of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space-efficiency reasons, but for practical development reasons.
(PHP uses this system to control its error messages, and I have already found that it's a bit over-the-top when you have to define values where PHP's constants are not available and you have to generate the integer by hand; to be honest, if chmod didn't support the 'ugo+rwx' style syntax I'd never want to use it, because I can never remember the magic numbers.)
The instant you have to crack open a constants table to debug code, you know you've gone too far.
Old thread, but it's worth mentioning that there are cases requiring bloated bit masks, e.g., molecular fingerprints, which are often generated as 1024-bit arrays that we have packed into 32 bigint fields (SQL Server not supporting UInt32). Bitwise operations work fine - until your table starts to grow and you realize the sluggishness of separate function calls. The binary data type would work, were it not for T-SQL's ban on bitwise operators having two binary operands.
For example, .NET uses an array of integers as the internal storage for its BitArray class.
Practically speaking, there's no other way around it.
That being said, in SQL you will need more than one column (or use BLOBs) to store all the states.
You tagged this question SQL, so I think you need to consult the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.
Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's 212 bits for 64 digits.
Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.