Is Runtime::generate_uuid() safe in Scrypto?

I want to make a game in a Scrypto blueprint where users can play with their Gumball NFTs.
My blueprint has a pub fn attack(&self, my_gumball: Proof, other_gumball_key: NonFungibleId) method that attacks another NFT by assigning it a random damage between 1 and 10. Should I use Runtime::generate_uuid() for this?

Great question! I'll give you a little example here from Radix, since random number generation is something that all blockchains/DLTs/public networks struggle with; it's a problem that is genuinely hard to solve.
First of all, I'm assuming that you're using UUIDs as the random number for your dApp, so this entire reply is based on that. Under the hood, when you call the Uuid::generate function, at the end of a long chain of calls, the following function is the one that actually generates the UUID: https://github.com/radixdlt/radixdlt-scrypto/blob/24168ae772215af5169549a7a2cc1adeb666baa6/radix-engine/src/engine/id_allocator.rs#L78
If you look through this function you will see that it uses the transaction hash + the next available id to generate the UUID for you. The next available ID is nothing special; it's simply a counter that is incremented each time a new ID is needed. All this method does is hash the tx_hash + next_id twice before loading the result into a u128, and that is pretty much how the UUID is generated. This means that the UUID is a pseudorandom number, and if somebody knows what the transaction hash is, then they WILL be able to determine the "random" number that you will be using.
Let's step away from the theory for a second and try a few things in code. Here is some simple Scrypto code to show you how not-random the UUID really is:
use radix_engine::engine::{IdAllocator, IdSpace};
use scrypto::prelude::*;

#[test]
fn test_randomness() {
    // Creating the ID allocator
    let mut id_allocator: IdAllocator = IdAllocator::new(IdSpace::Application);

    // A fictional transaction hash
    let tx_hash: H256 = H256::from_str("0e4c5812f00b3c821335c54b3bbc835a157df1149480f6469a4dc6b51489e989").unwrap();

    // Generating four UUIDs
    println!("Generated UUID: {:?}", id_allocator.new_uuid(tx_hash).unwrap());
    println!("Generated UUID: {:?}", id_allocator.new_uuid(tx_hash).unwrap());
    println!("Generated UUID: {:?}", id_allocator.new_uuid(tx_hash).unwrap());
    println!("Generated UUID: {:?}", id_allocator.new_uuid(tx_hash).unwrap());
}
As we have said, the IdAllocator::new_uuid method requires a transaction hash to run, so I have provided it with a sample transaction hash. If we run this code and look at the output, both you and I will see the following in our terminals:
Generated UUID: 333873524275763974188597434119212610710
Generated UUID: 315396769568132504258157739854036837613
Generated UUID: 31497316649309892037047888539219683042
Generated UUID: 300332381675622117598720587595812830316
You might ask, "Well, why are we both getting the same output? Isn't this random?"
The output is exactly the same for both of us because the randomness of the UUID relies entirely on the transaction hash and the next_id changing. So it's easy to tell that this is not a random function but a pseudorandom one.
So, to answer your question:
Would someone be able to guess the number 1-10 before the Scrypto code can even generate it?
Yes! Transactions in the mempool have their hashes visible, so there is the hash part. On top of that, somebody who knows your blueprint and the number of IDs allocated during the transaction will be able to work out the "random" number before the Scrypto code does, 100% of the time; the code above is an example of how that can be done.
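For example, suppose your attack method turns the UUID into damage with something like (uuid % 10) + 1. That exact mapping is hypothetical, but whatever mapping you pick is just as deterministic, so anyone who has reproduced the UUIDs off-ledger (as the test above does) can precompute every roll. A small sketch of the attacker's side, written in Kotlin purely for illustration:

import java.math.BigInteger

fun main() {
    // The four UUIDs printed by the test above -- reproducible by anyone who
    // knows the transaction hash and the order in which IDs are allocated.
    val uuids = listOf(
        "333873524275763974188597434119212610710",
        "315396769568132504258157739854036837613",
        "31497316649309892037047888539219683042",
        "300332381675622117598720587595812830316"
    ).map { BigInteger(it) }

    // Hypothetical damage mapping: (uuid % 10) + 1, i.e. a value in 1..10.
    // Whatever mapping the blueprint actually uses is just as predictable.
    for (uuid in uuids) {
        val damage = uuid.mod(BigInteger.TEN) + BigInteger.ONE
        println("Predicted damage: $damage")
    }
}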
Conclusion: Uuid::generate produces UUIDs perfectly well, but it does not produce random numbers well, because it is not meant to be a true random number function.

Related

How to extract encryption and MAC keys using KDF (X9.63) defined by javacardx.security.derivation

As per Java Card v3.1, a new package javacardx.security.derivation is defined:
https://docs.oracle.com/en/java/javacard/3.1/jc_api_srvc/api_classic/javacardx/security/derivation/package-summary.html
The X9.63 KDF works on three inputs: an input secret, a counter, and shared info.
Depending on the length of the generated key material, multiple rounds of the hash are carried out to generate the final output.
I am using this KDF via the JC API to generate 64 bytes of output (which takes 2 rounds of SHA-256) for a 16-byte encryption key, a 16-byte IV, and a 32-byte MAC key.
Note: This is just pseudo code to put my question with necessary details.
DerivationFunction df = DerivationFunction.getInstance(DerivationFunction.ALG_KDF_ANSI_X9_63, false);
df.init(new KDFAnsiX963Spec(MessageDigest.ALG_SHA_256, input, sharedInfo, (short) 64));

SecretKey encKey = KeyBuilder.buildKey(KeyBuilder.TYPE_AES, (short) 16, false);
SecretKey macKey = KeyBuilder.buildKey(KeyBuilder.TYPE_HMAC, (short) 32, false);

df.nextBytes(encKey);
df.nextBytes(IVBuffer, (short) 0, (short) 16);
df.lastBytes(macKey);
I have the following questions:
When are the rounds of the KDF performed? Are they performed during df.init() or during df.nextBytes() & df.lastBytes()?
One KDF round generates 32 bytes of output (with the SHA-256 algorithm), so how do df.nextBytes() & df.lastBytes() work when the expected output length is < 32 bytes?
In this KDF the counter is incremented for every round, so how is the counter managed between the df.nextBytes() & df.lastBytes() calls?
When are the rounds of the KDF performed? Are they performed during df.init() or during df.nextBytes() & df.lastBytes()?
That seems to be implementation-specific to me. It will probably be faster to perform all the calculations at one time, but even then it still makes sense to wait for the first request of the bytes. On the other hand, RAM is also often an issue, so on-demand generation also makes some sense. That requires a somewhat trickier implementation, though.
The fact that the output size is pre-specified probably indicates that the simpler method of generating all the key material at once is at least foreseen by the API designers (they probably created an implementation before subjecting it to peer review in the JCF).
One KDF round generates 32 bytes of output (with the SHA-256 algorithm), so how do df.nextBytes() & df.lastBytes() work when the expected output length is < 32 bytes?
It will commonly return the leftmost bytes (of the hash output) and likely leave the rest of the bytes in a buffer. This buffer will likely be destroyed together with the rest of the state when lastBytes is called (so don't forget to call it).
Note that the API clearly states that you have to re-initialize the DerivationFunction instance if you want to use it again. That is a very strong indication that they thought of the destruction of key material (something that is required by FIPS and Common Criteria certification, not just common sense).
Other KDFs could have a different way of returning bytes, but using the leftmost bytes first and then adding rounds to the right is so common that you can call it universal. For the ANSI X9.63 KDF this is certainly the case, and it is clearly specified that way in the standard.
In this KDF the counter is incremented for every round, so how is the counter managed between the df.nextBytes() & df.lastBytes() calls?
These are methods of the same class and cannot be viewed separately, so they are not separate APIs. Class instances can keep state in any way they want. An implementation might simply hold the counter as an instance variable, but if it decides to generate all the bytes during init or during the first nextBytes / lastBytes call, then the counter is not even needed after that.
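To make that concrete, here is a rough sketch of the construction itself, written in plain Kotlin with java.security.MessageDigest rather than the Java Card API: one SHA-256 round per 32 bytes over secret || counter || sharedInfo, a 32-bit big-endian counter starting at 1, and an internal buffer from which smaller requests are served.

import java.nio.ByteBuffer
import java.security.MessageDigest

// Minimal ANSI X9.63 KDF sketch (SHA-256). Each round hashes
// secret || counter || sharedInfo, the counter is a 32-bit big-endian value
// starting at 1, and callers are served from a buffer of key material that
// has already been generated.
class X963Kdf(private val secret: ByteArray, private val sharedInfo: ByteArray) {
    private var counter = 1
    private var buffer = ByteArray(0)   // leftover bytes from the last round

    private fun round(): ByteArray {
        val md = MessageDigest.getInstance("SHA-256")
        md.update(secret)
        md.update(ByteBuffer.allocate(4).putInt(counter).array()) // big-endian counter
        md.update(sharedInfo)
        counter++
        return md.digest()
    }

    // Comparable in spirit to nextBytes(): return the leftmost n bytes,
    // running a new round only when the buffer runs dry.
    fun nextBytes(n: Int): ByteArray {
        while (buffer.size < n) buffer += round()
        val out = buffer.copyOfRange(0, n)
        buffer = buffer.copyOfRange(n, buffer.size)
        return out
    }
}

fun main() {
    val kdf = X963Kdf(ByteArray(32), ByteArray(0))
    val encKey = kdf.nextBytes(16)  // 16-byte AES key
    val iv = kdf.nextBytes(16)      // 16-byte IV, served from the first round's leftovers
    val macKey = kdf.nextBytes(32)  // 32-byte MAC key
    println(encKey.size + iv.size + macKey.size)  // 64 bytes total
}

With this structure, the two 16-byte requests plus the 32-byte request consume exactly two SHA-256 rounds, which matches the 64 bytes / 2 rounds in the question; whether a Java Card implementation runs those rounds during init() or lazily is, as said above, up to the implementation.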

Twofish known answer test

I'm looking into using Twofish to encrypt strings of data. Before trusting my precious data to an unknown library, I wish to verify that it agrees with the known answer tests published on Bruce Schneier's website.
To my dismay I tried three Twofish implementations and found none that agree with the KAT. This leads me to believe that I'm doing something wrong, and I'm wondering if someone could tell me what it is.
I've made sure the mode is the same (CBC), the key length is the same (128 bits), and the IV/key/PT values are the same. Is there an additional parameter in play for Twofish encryption?
Here are the first two test entries from CBC_E_M.txt from the KAT archive:
I=0
KEY=00000000000000000000000000000000
IV=00000000000000000000000000000000
PT=00000000000000000000000000000000
CT=3CC3B181E1495D0495D652B66921DA0F
I=1
KEY=3CC3B181E1495D0495D652B66921DA0F
IV=3CC3B181E1495D0495D652B66921DA0F
PT=BE938D30FAB43B71F2E114E9C0529299
CT=695250B109C6F71D410AC38B0BBDA3D2
I interpret these to be in hex, therefore 16 bytes = 128 bits long.
I tried using the following twofish implementations:
ruby: https://github.com/mcarpenter/twofish.rb
JS: https://github.com/ryanofsky/twofish/
online: http://twofish.online-domain-tools.com/
All three give the same CT for the first test, namely (hex encoded)
9f589f5cf6122c32b6bfec2f2ae8c35a
So far so good, except it does not agree with CT0 in the KAT...
For the second test the ruby library and the online tool give:
f84268f0293adf4d24e27194911a24c
While the js library gives:
fd803b310bb5388ddb76d5faf9e23dbe
And neither of these agrees with CT1 in the KAT.
Am I doing something wrong here? Any help greatly appreciated.
The online tool is easy to use; just be sure to select HEX for the key and the input text. Here is the Ruby code I used to generate these values (it's necessary to check out each library for this to work):
def twofish_encrypt(iv_hex, key_hex, data_hex)
  iv = iv_hex.gsub(/ /, "").scan(/../).map { |x| x.hex.chr }.join
  key = key_hex.gsub(/ /, "").scan(/../).map { |x| x.hex.chr }.join
  data = data_hex.gsub(/ /, "").scan(/../).map { |x| x.hex.chr }.join
  tf = Twofish.new(key, :mode => :cbc, :padding => :none)
  tf.iv = iv
  enc_data = tf.encrypt(data)
  enc_data.each_byte.map { |b| b.to_s(16) }.join
end

ct0 = twofish_encrypt("00000000000000000000000000000000",
                      "00000000000000000000000000000000",
                      "00000000000000000000000000000000")
puts "ct0: #{ct0}"

ct1 = twofish_encrypt("3CC3B181E1495D0495D652B66921DA0F",
                      "3CC3B181E1495D0495D652B66921DA0F",
                      "BE938D30FAB43B71F2E114E9C0529299")
puts "ct1: #{ct1}"
And here is the JavaScript code:

function twofish_encrypt(iv_hex, key_hex, data_hex) {
    var iv = new BinData()
    iv.setHexNibbles(iv_hex)
    iv.setlength(16*8)

    binkey = new BinData()
    binkey.setHexNibbles(key_hex)
    binkey.setlength(16*8)
    key = new TwoFish.Key(binkey);

    data = new BinData()
    data.setHexNibbles(data_hex)
    data.setlength(16*8)

    cipher = new TwoFish.Cipher(TwoFish.MODE_CBC, iv);
    enc_data = TwoFish.Encrypt(cipher, key, data);
    return enc_data.getHexNibbles(32);
}

var ct0 = twofish_encrypt("00000000000000000000000000000000",
                          "00000000000000000000000000000000",
                          "00000000000000000000000000000000");
console.log("ct0: " + ct0);

var ct1 = twofish_encrypt("3CC3B181E1495D0495D652B66921DA0F",
                          "3CC3B181E1495D0495D652B66921DA0F",
                          "BE938D30FAB43B71F2E114E9C0529299");
console.log("ct1: " + ct1);
The header of the CBC_E_M.txt file reads:
Cipher Block Chaining (CBC) Mode - ENCRYPTION
Monte Carlo Test
The confusion can be explained by this description; from the NIST description of the Monte Carlo Tests:
Each Monte Carlo Test consists of four million cycles through the candidate algorithm implementation. These cycles are divided into four hundred groups of 10,000 iterations each. Each iteration consists of processing an input block through the candidate algorithm, resulting in an output block. At the 10,000th cycle in an iteration, new values are assigned to the variables needed for the next iteration. The results of each 10,000th encryption or decryption cycle are recorded and included by the submitter in the appropriate file.
So what you get in the text file is 400 results, each representing 10,000 iterations, where the input of each iteration depends on the output of the previous ones. This is obviously not the same as a single encryption. Monte Carlo tests basically perform many tests using randomized input; in this case a high number of block cipher encryptions is used to perform the randomization.
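To make that a bit more tangible, the procedure has roughly the following shape. This is only a sketch (in Kotlin for brevity): encryptCbcBlock stands in for one Twofish CBC block encryption, and the exact rules for feeding ciphertext back into the next key/IV/plaintext are defined in the KAT description, so the feedback lines below are illustrative.

// Rough shape of the CBC encryption Monte Carlo Test described above:
// 400 recorded results, each produced by 10,000 chained block encryptions.
fun monteCarloOutline(
    encryptCbcBlock: (key: ByteArray, iv: ByteArray, pt: ByteArray) -> ByteArray,
    key0: ByteArray, iv0: ByteArray, pt0: ByteArray
) {
    var key = key0
    var iv = iv0
    var pt = pt0
    repeat(400) { group ->
        var ct = ByteArray(16)
        repeat(10_000) {
            ct = encryptCbcBlock(key, iv, pt)
            pt = iv   // illustrative feedback: previous chaining value becomes the next plaintext
            iv = ct   // illustrative feedback: ciphertext becomes the next chaining value
        }
        println("I=$group CT=" + ct.joinToString("") { "%02x".format(it) })
        // The next group's key is derived from this group's final ciphertext
        // (e.g. key XOR ct for 128-bit keys), which is consistent with KEY at
        // I=1 being equal to CT at I=0 for the all-zero starting key.
        key = ByteArray(16) { idx -> (key[idx].toInt() xor ct[idx].toInt()).toByte() }
    }
}

Ten thousand chained encryptions per recorded value is why a single call to your library will never reproduce those CT lines directly.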
To test if your CBC code is correct, just use any of the other test vectors (not the Monte Carlo ones) and assume an all-zero IV. In that case a single-block (ECB) encrypt has the identical outcome to CBC mode. This also works for the ever more popular CTR mode (for the all-zero plaintext vectors, since the first keystream block is then the encryption of an all-zero block).
The initial 9f589f5cf6122c32b6bfec2f2ae8c35a value that you found is correct for a 128-bit all-zero key, IV and plaintext. The f84268f0293adf4d24e27194911a24c value is correct as well.
There is, however, certainly something wrong with your hex encoder: that result is not even of the correct size for the value (what happens to the leading zeros of the hex encodings?). Given the results and the code, I would definitely take a look at your encoding / decoding functions.
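As an aside on that encoder: converting each byte to hex individually drops leading zeros, which is exactly how a 32-digit value ends up 31 digits long. Illustrated in Kotlin below (the same pitfall applies to the Ruby b.to_s(16) above):

fun main() {
    // Illustrative bytes only -- not the actual Twofish output.
    val block = byteArrayOf(0x0f, 0xa5.toByte(), 0x00, 0x3c)

    // Buggy: per-byte toString(16) drops leading zeros, so any byte below 0x10
    // produces a single hex digit and the whole string comes up short.
    val buggy = block.joinToString("") { (it.toInt() and 0xff).toString(16) }

    // Correct: always emit exactly two hex digits per byte.
    val correct = block.joinToString("") { "%02x".format(it) }

    println(buggy)    // fa503c   <- two digits short
    println(correct)  // 0fa5003c
}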

Kotlin: Why is Sequence more performant in this example?

Currently, I am looking into Kotlin and have a question about Sequences vs. Collections.
I read a blog post about this topic, and there you can find these code snippets:
List implementation:
val list = generateSequence(1) { it + 1 }
    .take(50_000_000)
    .toList()

measure {
    list
        .filter { it % 3 == 0 }
        .average()
}
// 8644 ms
Sequence implementation:
val sequence = generateSequence(1) { it + 1 }
    .take(50_000_000)

measure {
    sequence
        .filter { it % 3 == 0 }
        .average()
}
// 822 ms
The point here is that the Sequence implementation is about 10x faster.
However, I do not really understand WHY that is. I know that with a Sequence you get "lazy evaluation", but I cannot find any reason why that helps reduce the processing in this example.
However, here is an example where I do know why a Sequence is generally faster:
val result = sequenceOf("a", "b", "c")
    .map {
        println("map: $it")
        it.toUpperCase()
    }
    .any {
        println("any: $it")
        it.startsWith("B")
    }
Because with a Sequence you process the data "vertically": when the first element starts with "B", you don't have to map the rest of the elements. It makes sense here.
So, why is it also faster in the first example?
Let's look at what those two implementations are actually doing:
The List implementation first creates a List in memory with 50 million elements.  This will take a bare minimum of 200MB, since an integer takes 4 bytes.
(In fact, it's probably far more than that.  As Alexey Romanov pointed out, since it's a generic List implementation and not an IntList, it won't be storing the integers directly, but will be ‘boxing’ them — storing references to Int objects.  On the JVM, each reference could be 8 or 16 bytes, and each Int could take 16, giving 1–2GB.  Also, depending how the List gets created, it might start with a small array and keep creating larger and larger ones as the list grows, copying all the values across each time, using more memory still.)
Then it has to read all the values back from the list, filter them, and create another list in memory.
Finally, it has to read all those values back in again, to calculate the average.
The Sequence implementation, on the other hand, doesn't have to store anything!  It simply generates the values in order, and as it does each one it checks whether it's divisible by 3 and if so includes it in the average.
(That's pretty much how you'd do it if you were implementing it ‘by hand’.)
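For comparison, a hand-rolled version of what the Sequence pipeline effectively does would look something like this: one pass, a running sum and count, and nothing stored.

fun main() {
    var sum = 0L
    var count = 0L
    for (i in 1..50_000_000) {
        if (i % 3 == 0) {      // the filter step
            sum += i           // feeding the average, one value at a time
            count++
        }
    }
    println(sum.toDouble() / count)   // same result the sequence pipeline produces
}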
You can see that in addition to the divisibility checking and average calculation, the List implementation is doing a massive amount of memory access, which will take a lot of time.  That's the main reason it's far slower than the Sequence version, which doesn't!
Seeing this, you might ask why we don't use Sequences everywhere…  But this is a fairly extreme example.  Setting up and then iterating the Sequence has some overhead of its own, and for smallish lists that can outweigh the memory overhead.  So Sequences only have a clear advantage in cases when the lists are very large, are processed strictly in order, there are several intermediate steps, and/or many items are filtered out along the way (especially if the Sequence is infinite!).
In my experience, those conditions don't occur very often.  But this question shows how important it is to recognise them when they do!
Leveraging lazy-evaluation allows avoiding the creation of intermediate objects that are irrelevant from the point of the end goal.
Also, the benchmarking method used in the mentioned article is not super accurate. Try to repeat the experiment with JMH.
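A minimal JMH-style skeleton for that re-run could look like the following (assuming JMH is already wired into the build; in Kotlin the benchmark class has to be open so JMH can generate its subclasses):

import org.openjdk.jmh.annotations.*

@State(Scope.Benchmark)
open class AverageBenchmark {
    private lateinit var list: List<Int>

    // Build the list once, outside the measured code, as the blog post does.
    @Setup
    fun prepare() {
        list = generateSequence(1) { it + 1 }.take(50_000_000).toList()
    }

    @Benchmark
    fun listVersion(): Double = list.filter { it % 3 == 0 }.average()

    @Benchmark
    fun sequenceVersion(): Double =
        generateSequence(1) { it + 1 }
            .take(50_000_000)
            .filter { it % 3 == 0 }
            .average()
}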
The initial code produces a list containing 50_000_000 objects:
val list = generateSequence(1) { it + 1 }
    .take(50_000_000)
    .toList()
then iterates through it and creates another list containing a subset of its elements:
.filter { it % 3 == 0 }
... and then proceeds with calculating the average:
.average()
Using sequences allows you to avoid doing all those intermediate steps. The code below doesn't produce 50_000_000 elements; it's just a representation of that 1..50_000_000 sequence:

val sequence = generateSequence(1) { it + 1 }
    .take(50_000_000)

Adding a filter to it doesn't trigger the calculation either, but derives a new sequence from the existing one (3, 6, 9, ...):

    .filter { it % 3 == 0 }
and eventually, a terminal operation is called that triggers the evaluation of the sequence and the actual calculation:
.average()
Some relevant reading:
Kotlin: Beware of Java Stream API Habits
Kotlin Collections API Performance Antipatterns

If-else statements to cut down processing time

Suppose I have a number of possible inputs from the user of my program listed from most likely to least as input1, input2, input3,...,inputN. Would the following framework cut down on processing time by accessing the most probable If statement needed first and then ignoring the rest (rather than testing the validity of each If statement thereafter)? I assume the least probable inputN will be extra burdensome on the processor, but the limited likelihood of the user giving that input makes it worth it if this structure reduces processing time overall.
If (input1) then (output1)
Else
    If (input2) then (output2)
    Else
        If (input3) then (output3)
        Else
            If ...
            ... Else
                OutputN
Thanks!
This is how if-else-if statements work.
if (booleanTest1)
{
    //do a thing
}
else if (booleanTest2)
{
    //do another thing
}
//...ad infinitum
else
{
    //do default behavior
}
If booleanTest1 is true, we execute its code, and then skip past all the other tests.
If you're comparing one variable against many possible values, use a switch statement.
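For illustration, in Kotlin the switch equivalent is a when expression (input and output names here are hypothetical):

// One subject matched against many constant branches, instead of a
// hand-written if / else-if chain. With constant branches the compiler
// can emit a jump table, so the ordering no longer matters for speed.
fun respond(input: Int): String = when (input) {
    1 -> "output1"
    2 -> "output2"
    3 -> "output3"
    else -> "outputN"   // default behaviour
}

fun main() {
    println(respond(2))   // prints output2
}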
I do not know for sure, but I'd assume that a switch-case would be more efficient at runtime because of branch prediction. With if-elses you have many branches that might be mispredicted, which is not good for the pipelined instructions in the processor queue.
If there are really a lot of possibilities, I usually do it with a map / dictionary of <Key, Method to call>. As long as the methods have the same signature, this works. It may not be as fast as a switch-case, but it grants you some flexibility when you need to react to new inputs.
example:
var myDic = new Dictionary<string, Action>();
myDic.Add(input1, () => { /* whatever to do when input1 comes */ });

The call then looks like this:

myDic[input1]();

How should I store and compare 4 values for several instances of the same object type?

I'm working in Objective-C/Cocoa and I have an object type Tile. Each has a signature that can be represented as 4 different integer values. If I output a few of these values as strings, with dashes separating the values, it looks like this example:
signature: 4-4-3-3
signature: 4-3-3-3
signature: 0-0-0-1
signature: 0-0-1-1
signature: 0-0-1-0
signature: 1-1-1-2
signature: 1-1-2-2
signature: 1-1-2-1
signature: 3-3-3-1
signature: 3-3-1-1
signature: 3-3-1-3
signature: 4-4-4-3
signature: 4-4-3-3
I'm currently storing each of the values as an unsigned short. There never will be negative values and the maximum value is very unlikely to be above 15 or so. Zero is a valid value. There is no 'nil' value.
I would like to be able to call:
[myTile signature] to retrieve the value.
[myTile matches:otherTile] to return a BOOL indicating whether the signatures match.
What is the most efficient way to store this "signature" and compare it to the signatures of the other Tile instances? It seems like string comparisons would be slow...
First off, I'd use the commonly used method names for these tasks: description and isEqual*:.
Concerning your question, I think the best way is the simplest one:
- (BOOL)isEqualToTile:(Tile *)tile
{
    return self.value1 == tile.value1 &&
           self.value2 == tile.value2 &&
           self.value3 == tile.value3 &&
           self.value4 == tile.value4;
}
Another possibility could be to implement hash.
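If you do go down that road: since each of the four values comfortably fits in 4 bits, you could pack the whole signature into a single integer, which gives you a cheap comparison and a ready-made hash value in one go. A sketch of the idea, in Kotlin for brevity rather than Objective-C:

// Pack four small values (each 0..15) into one Int: cheap to store, cheap to
// compare, and directly usable as a hash. Matching two tiles is then a single ==.
fun packSignature(a: Int, b: Int, c: Int, d: Int): Int =
    (a shl 12) or (b shl 8) or (c shl 4) or d

fun main() {
    val tile1 = packSignature(4, 4, 3, 3)        // "4-4-3-3"
    val tile2 = packSignature(4, 3, 3, 3)        // "4-3-3-3"
    println(tile1 == tile2)                      // false
    println(tile1 == packSignature(4, 4, 3, 3))  // true
}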
EDIT: I wouldn't worry too much about performance if I were you:
1. Because 8 comparisons are fast. I mean really fast. If you were to put together a little benchmark, you would find that each comparison takes ~1.5E-8 s to run. That number doesn't mean much on its own, but let's just say you could make 10,000,000 of these comparisons in roughly 150 ms (1.5E-8 s × 10,000,000 = 0.15 s).
2. Because if one day you find your software slow, then it will be time to investigate the origin of this slowness (and I doubt it will come from this method), but remember that premature optimization is the root of all evil.
3. Because it took you 12 seconds to implement it, and it would probably take a bit more to think of a working hash function. Don't over-think it. See my second point.
4. Because, if one day you need to optimize this function (if you are doing it now, re-read my second point), Cocoa has a couple of handy tools to parallelize such a dumb and repetitive task, without you scratching your head to make it faster on only one of your (ever-growing number of) cores.