I have this snippet of code which will be used to implement a mnemonic phrase generator according to the BIP39 spec.
The problem is that most of the time the checksum is not correct, but in some cases it works; it depends on the given entropy. (I've used the iancoleman.io BIP39 tool for testing my checksum.)
The following cases were observed:
128-bits of entropy were used.
Correct
Entropy: 10111101100010110111100011101111111110100010000101111110100101100000001100111111001100010010010011110110011010010010001011011000
Checksum: 1110
Incorrect
Entropy: 01011010000000110011001001001001001110100011100101010001001100111001111111000110000000011011110111011000011001010111001101111100
My checksum: 1010
Iancoleman checksum: 1110
The first was a successful case, but the second failed. Below you can find my functions.
What did I miss?
import hashlib
import os
import sys

# left-pad the binary string with zeros up to the requested number of bits
def fill_bits(binary, bits):
    if len(binary) < bits:
        return "0" * (bits - len(binary)) + binary
    return binary

# generate a given number of entropy bits
def generate_entropy(bits=256):
    if bits < 128 or bits > 256:
        raise EntropyRangeExceeded  # custom exception defined elsewhere
    entropybits = bin(int.from_bytes(os.urandom(bits // 8), byteorder=sys.byteorder))[2:]
    return fill_bits(entropybits, bits)

# returns the sha256 hash of the given input
def sha256(_input):
    return hashlib.sha256(_input.encode("utf-8")).hexdigest()

# returns the checksum of the input hash
# checksum is given by the first (entropy length / 32)
# bits of the sha256 hash applied on entropy bits
def get_checksum(_entropy):
    entropy_length = len(_entropy) // 32
    return bin(int(sha256(_entropy), 16))[2:][:entropy_length]
In sha256 the hash is calculated incorrectly. The entropy must not be UTF-8 encoded; instead, it has to be represented as a byte array (see to_bytes) and the hash generated from those bytes:
import hashlib

def sha256(_entropy):
    entBytes = int(_entropy, 2).to_bytes(len(_entropy) // 8, byteorder='big')
    return hashlib.sha256(entBytes).hexdigest()
Furthermore, the binary representation of the hash must be padded with leading zeros to a length of 256 bits (see zfill), so that any leading zero bits are also taken into account in the checksum:
def get_checksum(_entropy):
    entropy_length = len(_entropy) // 32
    return bin(int(sha256(_entropy), 16))[2:].zfill(256)[:entropy_length]
Example 1, from here, step 4:
_entropy = '0011001010000101011111010000101111111111101000001001000001001010110100010101111001001011000100111100011110001001111011110111011010010100110011001110111001100010111011010010101101010011110100100110101111110001100101011001000110100010000110110001100101110001'
print(get_checksum(_entropy)) # 11110011
Example 2, your second example:
_entropy = '01011010000000110011001001001001001110100011100101010001001100111001111111000110000000011011110111011000011001010111001101111100'
print(get_checksum(_entropy)) # 1110
Example 3, leading zeros; compare with the result from here:
_entropy = '10111101100011110111100011101111111110100010000101111110100101100000001100111111001100010010010011110110011011010010001011011000'
print(get_checksum(_entropy)) # 0010
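Putting both fixes together, here is a minimal self-contained sketch (the mapping of entropy + checksum onto the BIP39 word list is omitted); the entropy string is the failing second example from the question, and the helper names are mine:

import hashlib

def sha256_hex(entropy_bits):
    # hash the raw entropy bytes, not the UTF-8 encoding of the bit string
    entropy_bytes = int(entropy_bits, 2).to_bytes(len(entropy_bits) // 8, byteorder='big')
    return hashlib.sha256(entropy_bytes).hexdigest()

def get_checksum(entropy_bits):
    checksum_length = len(entropy_bits) // 32
    # zfill(256) keeps any leading zero bits of the hash
    return bin(int(sha256_hex(entropy_bits), 16))[2:].zfill(256)[:checksum_length]

entropy = ('0101101000000011001100100100100100111010001110010101000100110011'
           '1001111111000110000000011011110111011000011001010111001101111100')
print(get_checksum(entropy))  # 1110, matching the iancoleman.io result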
Related
I'm trying to follow along with what the code is doing for VGGish and I came across a piece that I don't really understand. In vggish_input.py there is this:
def wavfile_to_examples(wav_file):
    """Convenience wrapper around waveform_to_examples() for a common WAV format.

    Args:
      wav_file: String path to a file, or a file-like object. The file
        is assumed to contain WAV audio data with signed 16-bit PCM samples.

    Returns:
      See waveform_to_examples.
    """
    wav_data, sr = wav_read(wav_file)
    assert wav_data.dtype == np.int16, 'Bad sample type: %r' % wav_data.dtype
    samples = wav_data / 32768.0  # Convert to [-1.0, +1.0]
    return waveform_to_examples(samples, sr)
Where does the constant of 32768 come from and how does dividing that convert the data to samples?
I found this for converting to -1 and +1, but I'm not sure how to bridge it with 32768.
https://stats.stackexchange.com/questions/178626/how-to-normalize-data-between-1-and-1
32768 is 2^15. int16 has a range of -32768 to +32767. If you have int16 as input and divide it by 2^15, you get a number between -1 and +1.
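A quick way to see this in NumPy, as a minimal sketch (the sample values are arbitrary, chosen to hit the extremes of the int16 range):

import numpy as np

# extreme and mid-range int16 sample values
wav_data = np.array([-32768, -16384, 0, 16383, 32767], dtype=np.int16)

# dividing by 2**15 maps the int16 range onto [-1.0, +1.0)
samples = wav_data / 32768.0
print(samples)  # [-1.  -0.5  0.   0.49996948  0.99996948]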
I want to verify a message signed by my trezor hardware wallet.
Basically I have these information.
.venv/bin/trezorctl btc get-public-node -n 0
Passphrase required:
Confirm your passphrase:
node.depth: 1
node.fingerprint: ea66f037
node.child_num: 0
node.chain_code: e02030f2a7dfb474d53a96cb26febbbe3bd3b9756f4e0a820146ff1fb4e0bd99
node.public_key: 026b4cc594c849a0d9a124725997604bc6a0ec8f100b621b1eaed4c6094619fc46
xpub: xpub69cRfCiJ5BVzesfFdsTgEb29SskY74wYfjTRw5kdctGN2xp1HF4udTP21t68PAQ4CBq1Rn3wAsWr84wiDiRmmSZLwkEkv4qK5T5Y7EXebyQ
$ .venv/bin/trezorctl btc sign-message 'aaa' -n 0
Please confirm action on your Trezor device
Passphrase required:
Confirm your passphrase:
message: aaa
address: 17DB2Q3oZVkQAffkpFvF4cwsXggu39iKdQ
signature: IHQ7FDJy6zjwMImIsFcHGdhVxAH7ozoEoelN2EfgKZZ0JVAbvnGN/w8zxiMivqkO8ijw8fXeCMDt0K2OW7q2GF0=
I wanted to use python3-ecdsa. When I want to verify the signature with any valid public key, I get an AssertionError: (65, 64), because the base64.b64decode of the signature is 65 bytes, but should be 64.
When I want to load the node.public_key into a ecdsa.VerifyingKey, I get an AssertionError: (32, 64), because the bytes.fromhex return 32 bytes, but every example I found uses 64 bytes for the public key.
Probably I need to convert the BIP32 xpub to a public key, but I really don't know how.
Solution
python-ecdsa needs to be at version 0.14 or greater to handle the compressed format of the public key.
import ecdsa
import base64
import hashlib

class DoubleSha256:
    """Hash object that applies SHA-256 twice, as Bitcoin message signing requires."""
    def __init__(self, *args, **kwargs):
        self._m = hashlib.sha256(*args, **kwargs)

    def __getattr__(self, attr):
        if attr == 'digest':
            return self.double_digest
        return getattr(self._m, attr)

    def double_digest(self):
        m = hashlib.sha256()
        m.update(self._m.digest())
        return m.digest()

def pad_message(message):
    # Bitcoin "signed message" envelope: fixed prefix, message length, then the message itself
    return "\x18Bitcoin Signed Message:\n".encode('UTF-8') + bytes([len(message)]) + message.encode('UTF-8')

public_key_hex = '026b4cc594c849a0d9a124725997604bc6a0ec8f100b621b1eaed4c6094619fc46'
public_key = bytes.fromhex(public_key_hex)
message = pad_message('aaa')
sig = base64.b64decode('IHQ7FDJy6zjwMImIsFcHGdhVxAH7ozoEoelN2EfgKZZ0JVAbvnGN/w8zxiMivqkO8ijw8fXeCMDt0K2OW7q2GF0=')

vk = ecdsa.VerifyingKey.from_string(public_key, curve=ecdsa.SECP256k1)
# drop the leading recovery byte; verify() expects the 64-byte r||s signature
print(vk.verify(sig[1:], message, hashfunc=DoubleSha256))
Public-key. Mathematically an elliptic curve public key is a point on the curve. For the elliptic curve used by Bitcoin, secp256k1, as well as other X9-style (Weierstrass form) curves, there are (in practice) two standard representations originally established by X9.62 and reused by many others:
uncompressed format: consists of one octet with value 0x04, followed by two blocks of size equal to the curve order size containing the (affine) X and Y coordinates. For secp256k1 this is 1 + 2*32 = 65 octets
compressed format: consists of one octet with value 0x02 or 0x03 indicating the parity of the Y coordinate, followed by a block of size equal to the curve order containing the X coordinate. For secp256k1 this is 1 + 32 = 33 octets
The public key output by your trezor is the second form, 0x02 + 32 octets = 33 octets. Not 32.
I've never seen an X9EC library (ECDSA and/or ECDH) that doesn't accept at least the standard uncompressed form, and usually both. It is conceivable your python library expects only the uncompressed form without the leading 0x04, but if so this gratuitous and rather risky nonstandardness, unless a very good explanation is provided in the doc or code, would make me suspicious of its quality. If you do need to convert the compressed form to uncompressed you must implement the curve equation, which for secp256k1 can be found in standard references, not to mention many implementations. Compute x^3 + a*x + b, take the square root in F_p, and choose either the positive or negative value that has the correct parity (agreeing with the leading byte here 0x02).
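As an illustration of that last step, here is a minimal sketch (my own code, not a vetted implementation) that decompresses a secp256k1 public key using the curve equation y^2 = x^3 + 7; since p ≡ 3 (mod 4), a square root in F_p can be computed as pow(y_squared, (p + 1) // 4, p):

# secp256k1 field prime; the curve is y^2 = x^3 + 7 over F_p
P = 2**256 - 2**32 - 977

def decompress_pubkey(compressed: bytes) -> bytes:
    assert len(compressed) == 33 and compressed[0] in (2, 3)
    x = int.from_bytes(compressed[1:], 'big')
    y_squared = (pow(x, 3, P) + 7) % P
    # p % 4 == 3, so y_squared ** ((p + 1) / 4) mod p is a square root
    y = pow(y_squared, (P + 1) // 4, P)
    # choose the root whose parity matches the leading byte (0x02 = even, 0x03 = odd)
    if y % 2 != compressed[0] % 2:
        y = P - y
    return b'\x04' + x.to_bytes(32, 'big') + y.to_bytes(32, 'big')

compressed = bytes.fromhex('026b4cc594c849a0d9a124725997604bc6a0ec8f100b621b1eaed4c6094619fc46')
print(decompress_pubkey(compressed).hex())  # 0x04 || X || Y, 65 octets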
The 'xpub' is a base58check encoding of a hierarchical deterministic key, which is not just an EC(DSA) key but adds metadata for the key derivation process. If you base58 decode it and remove the check, you get (in hex):
0488B21E01EA66F03700000000E02030F2A7DFB474D53A96CB26FEBBBE3BD3B9756F4E0A820146FF1FB4E0BD99026B4CC594C849A0D9A124725997604BC6A0EC8F100B621B1EAED4C6094619FC46
which breaks down exactly as your display showed:
0488B21E fixed prefix
01 .depth
EA66F037 .fingerprint
00000000 .child_num
E02030F2A7DFB474D53A96CB26FEBBBE3BD3B9756F4E0A820146FF1FB4E0BD99 .chain_code
026B4CC594C849A0D9A124725997604BC6A0EC8F100B621B1EAED4C6094619FC46 .public_key
Confirming this, the ripemd160 of sha256 of (the bytes that are shown in hex as) 026B4CC594C849A0D9A124725997604BC6A0EC8F100B621B1EAED4C6094619FC46 is (the bytes shown in hex as) 441e1d2adf9ff2a6075d71d0d8782228e0df47f8, and prefixing the version byte 00 for mainnet to that and base58check encoding gives the address 17DB2Q3oZVkQAffkpFvF4cwsXggu39iKdQ as shown.
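For anyone who wants to reproduce that check, a minimal sketch (my own helper, assuming your hashlib/OpenSSL build exposes ripemd160):

import hashlib

B58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

def base58check_encode(payload: bytes) -> str:
    # append the first 4 bytes of a double SHA-256 as the check, then base58 encode
    checked = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    n = int.from_bytes(checked, 'big')
    out = ''
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # each leading zero byte becomes a leading '1'
    return '1' * (len(checked) - len(checked.lstrip(b'\x00'))) + out

pubkey = bytes.fromhex('026B4CC594C849A0D9A124725997604BC6A0EC8F100B621B1EAED4C6094619FC46')
h160 = hashlib.new('ripemd160', hashlib.sha256(pubkey).digest()).digest()
print(h160.hex())                          # 441e1d2adf9ff2a6075d71d0d8782228e0df47f8
print(base58check_encode(b'\x00' + h160))  # 17DB2Q3oZVkQAffkpFvF4cwsXggu39iKdQ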
Signature. Mathematically an X9.62-type ECDSA signature is two integers, called r and s. There are two different standards for representing them, and Bitcoin uses both with variations:
ASN.1 DER format. DER is a general purpose encoding that contains 'tag' and 'length' metadata and variable length data depending on the numeric values, here r and s; for secp256k1 in general this encoding is usually 70 to 72 octets but occasionally less. However, to avoid certain 'malleability' attacks current Bitcoin requires use of 's' values less than half the curve order, commonly called 'low-s', which reduces the maximum length of the ASN.1 DER encoding to 71 octets. Bitcoin uses this for transaction signatures, and adds a 'sighash' byte immediately following it (in the 'scriptsig' aka redeem script) indicating certain options on how the signature was computed (and thus should be verified).
'plain' or P1363 format. This is fixed length and consists simply of the r and s values as fixed-length blocks; for secp256k1 this is 64 octets. Bitcoin uses this for message signatures, but it adds a 'recovery byte' to the beginning that allows determining the public key from the message and signature if necessary, making the total 65 octets.
See https://bitcoin.stackexchange.com/questions/38351/ecdsa-v-r-s-what-is-v/38909 and https://bitcoin.stackexchange.com/questions/12554/why-the-signature-is-always-65-13232-bytes-long .
If your python library is designed for general purpose ECDSA, not Bitcoin, and wants a 64-byte signature, that almost certainly is the 'plain' format which corresponds to the Bitcoin message signature (here decoded from base64) with the first byte removed.
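To make that concrete, a short sketch splitting the Trezor message signature from above into its recovery byte and the 64-byte r||s body that a general-purpose ECDSA library expects:

import base64

sig = base64.b64decode('IHQ7FDJy6zjwMImIsFcHGdhVxAH7ozoEoelN2EfgKZZ0JVAbvnGN/w8zxiMivqkO8ijw8fXeCMDt0K2OW7q2GF0=')
print(len(sig))   # 65

header, r, s = sig[0], sig[1:33], sig[33:65]
print(header)     # recovery/header byte
print(r.hex())    # 32-byte r
print(s.hex())    # 32-byte s
# sig[1:] (r || s, 64 octets) is the 'plain' form to hand to the verifier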
Let’s say I take 256 bits from a CSPRNG and assume it is perfectly 256 bits of entropy. Call this rand.
Then let’s say I take the sha256 of the ASCII text “password”. Call this hash.
Now we XOR rand and hash. Call this mixed.
Is the entropy of mixed less than that of rand?
If so, is there a formula for calculating its entropy?
Example below: what is the entropy of mixed as a function of rand and weak_hash?
#!/usr/bin/python3
import hashlib, os

def main():
    rand = int(os.urandom(32).hex(), 16)
    weak_hash = int(hashlib.sha256(b'password').digest().hex(), 16)
    mixed = "%064x" % (rand ^ weak_hash)
    print(mixed)

main()
You are describing a one-time pad. If the key stream (the output of the CSPRNG) is fully random, then the ciphertext will be indistinguishable from random as well.
Of course, the output of a CSPRNG is not fully random. However, if the CSPRNG is well seeded with enough entropy, then you'd have the same security as a stream cipher, which mimics a one-time pad.
So the output (mixed) will be as random as the CSPRNG, as long as the CSPRNG doesn't get into a previously encountered state. That should basically only happen if the entropy source fails.
The RFC gives the formula
PRF(secret, label, seed) = P_MD5(S1, label + seed) XOR
                           P_SHA-1(S2, label + seed);
for doing this. P_hash in turn has the following formula:
P_hash(secret, seed) = HMAC_hash(secret, A(1) + seed) +
                       HMAC_hash(secret, A(2) + seed) +
                       HMAC_hash(secret, A(3) + seed) + ...
The RFC also says
P_hash can be iterated as many times as is necessary to produce the
required quantity of data. For example, if P_SHA-1 was being used to
create 64 bytes of data, it would have to be iterated 4 times
(through A(4)), creating 80 bytes of output data; the last 16 bytes
of the final iteration would then be discarded, leaving 64 bytes of
output data.
I find "P_hash can be iterated as many times as is necessary to produce the required quantity of data" confusing. Just how many times is necessary? Is there a threshold after which it doesn't matter? If so, what is this threshold?
I'm pretty sure that in TLS 1.0 the premaster secret and the master secret are always 48 bytes long, but the key block you create further down the line can be longer than 48 bytes; e.g. if your cipher suite uses SHA-1 and AES-256, you will need 136 bytes.
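In other words, the number of iterations is just however many HMAC output blocks it takes to cover the requested length, ceil(needed_bytes / hash_output_size), and the excess of the last block is discarded. A minimal sketch of P_hash from RFC 2246 section 5 (the function and variable names are mine):

import hashlib
import hmac
import math

def p_hash(hash_name, secret, seed, needed_bytes):
    # P_hash(secret, seed): iterate HMAC until enough output, then truncate
    hash_len = hashlib.new(hash_name).digest_size
    iterations = math.ceil(needed_bytes / hash_len)  # e.g. ceil(64 / 20) = 4 for P_SHA-1
    out = b''
    a = seed                                         # A(0) = seed
    for _ in range(iterations):
        a = hmac.new(secret, a, hash_name).digest()  # A(i) = HMAC_hash(secret, A(i-1))
        out += hmac.new(secret, a + seed, hash_name).digest()
    return out[:needed_bytes]                        # discard the excess of the final block

# 64 bytes from P_SHA-1 takes 4 iterations (80 bytes generated, last 16 discarded)
print(len(p_hash('sha1', b'secret', b'label+seed', 64)))  # 64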
I'm trying to code an md5 hashing function in Python but it doesn't seem to work. I've isolated the problem to the message bits that are to be hashed. Yes, I'm actually converting each byte to bits and forming a bit message (I want to study the algorithm on a bit level). And this is where things are falling apart; my bit string is not correctly formed.
The simplest message would be "", it's 0 bytes long, padding would be a "1" followed (or not) by 511 "0"s (last 64 bits denote message length, which, as already said, is just 0).
10000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
I'm feeding 32-bit chunks of data into the transform function at a time. I've tried to manually position the 1 in all the positions of the first chunk, as well as the last chunk (little endian). Where should the "1" be?
Thank you.
Update: the correct first 32-bit word fed into the transform should in fact be 00000000000000000000000010000000, which int(x, 2) evaluates to 128. The mess was due to my transform format, A = rotL((A + F(B,C,D) + int(messageBits[0], 2) + sinList[0]), s11) + B, which uses int() to interpret the bit strings as integers; int() treats the leftmost bit as the most significant, so 100... parsed as a very large number instead of 128.
MD5 uses big-endian convention at bit level, then little-endian convention at byte level.
The input is an ordered sequence of bits. Eight consecutive bits are a byte. A byte has a numerical value between 0 and 255; each bit in a byte has value 128, 64, 32, 16, 8, 4, 2 or 1, in that order (that's what "big-endian at bit level" means).
Four consecutive bytes are a 32-bit word. The numerical value of the word is between 0 and 4294967295. The first byte is least significant in that word ("little-endian at byte level"). Hence, if the four bytes are a, b, c and d in that order, then the word numerical value is a+256*b+65536*c+16777216*d.
In software applications, input is almost always a sequence of bytes (its length, in bits, is a multiple of 8). The aggregation of bits into bytes is assumed to have already taken place. Thus, the extra '1' padding bit will be the first bit of the next byte, and, since the bit-level convention is big-endian, that next byte will have numerical value 128 (0x80).
For an empty message, the very first bit will be the '1' padding bit, followed by a whole bunch of zeros. The message length is also zero, which encodes yet other zeros. Therefore, the padded message block will be a single '1' followed by 511 '0', as you show. When bits are assembled into bytes, the first byte will have value 128, followed by 63 bytes of value 0. When bytes are grouped into 32-bit words, the first word (M0) will have numerical value 128, and the 15 other words (M1 to M15) will have numerical value 0.
Refer to the MD5 specification for details. What I describe above is what is explained in the first paragraph of section 2 of RFC 1321. The same encoding is used for the message bit length (at the end of the padding), and for writing out the final hash result.
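As a quick sanity check of that description, a small sketch that pads the empty message and groups the result into 32-bit words (struct.unpack with the '<' format handles the little-endian byte order):

import struct

def md5_pad(message: bytes) -> bytes:
    bit_len = 8 * len(message)
    padded = message + b'\x80'                     # the '1' padding bit, as byte value 128
    padded += b'\x00' * ((56 - len(padded)) % 64)  # zero bytes up to 56 mod 64
    padded += struct.pack('<Q', bit_len)           # 64-bit message length, low-order word first
    return padded

block = md5_pad(b'')                  # empty message -> one 64-byte block
words = struct.unpack('<16I', block)  # 16 little-endian 32-bit words
print(words)  # (128, 0, 0, ..., 0): M0 = 128, M1 through M15 = 0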