Using Linux terminal I'm able to generate hashes with leading zeroes, but I'm confused on how I would be able to generate hashes based on the number of trailing zeroes.
Related
What is the preferred datatype for storing Steam IDs? These IDs are very similar to credit card numbers, but is different cases of use. Until now I'm using unsigned big integer but I'm not 100% sure yet. If the ID starts with a zero number, can cause issues? Eg ID: 76561197960287930
In general number take less space on the disk to store and on the transfer from the database to the application compared to strings. They are for the same reason faster to compare e.g. in the where-clause of a query.
Have a look here for the bytes needed to store numbers and bytes to store strings.
In the database the numbers are stored without leading zeros. You could fill up your numbers with leading zeros in your application after loading them from the database, if the numbers always have a fixed size.
But if the numbers can have leading zeros strings are easier to handle, because you do not have to implement additional logic for edgecases like leading zeros.
How can I check whether an input string is in the form of a Md5 hash or not in Rails3.0?
Consider all 32-digit long hexadecimal numbers (ie. consisting solely of letters a-f and digits 0-9) to be md5 hashes.
I don't know if md5's codomain is the whole space of 32-digit long hexadecimals, but a hash should ideally satisfy the condition so you may just assume it is.
For example commit list on GitHub shows only first 10, or this line from tornadoweb which uses only 5
return static_url_prefix + path + "?v=" + hashes[abs_path][:5]
Are only the first 5 chars enough to make sure that 2 different hashes for 2 different files won't collide?
LE: The example above from tornadoweb uses md5 hash for generating a query sting for static file caching.
In general, No.
In fact, even if a full MD5 hash were given, it wouldn't be enough to prevent malicious users from generating collisions---MD5 is broken. Even with a better hash function, five characters is not enough.
But sometimes you can get away with it.
I'm not sure exactly what the context of the specific example you provided is. However, to answer your more general question, if there aren't bad guys actively trying to cause collisions, than using part of the hash is probably okay. In particular, given 5 hex characters (20 bits), you won't expect collisions before around 2^(20/2) = 2^10 ~ one thousand values are hashed. This is a consequence of the the Birthday paradox.
The previous paragraph assumes the hash function is essentially random. This is not an assumption anyone trying to make a cryptographically secure system should make. But as long as no one is intentionally trying to create collisions, it's a reasonable heuristic.
Hello I was trying to find a good way to hash a set of numerical numbers which its output would be under 20 characters that are positive and unique. Any one have any suggestions?
For hashing in general, I'd use the HASHBYTES function. You can then convert the binary data to a string and just pick the first 20 characters, that should still be unique enough.
To get around HASHBYTES limitations (8000 bytes for instance), you can incrementally hash, e.g. for each value concat the previous hash with the value to be added and hash that again. This will make it unique with order etc. and unless you append close to 8000 bytes in one value it will not cause data truncation for the hashing.
EDIT
Here is the problem I am trying to solve:
I have a string broken up into multiple parts. These parts are not of equal, or predictable length. Each part will have a hash value. When I concatenate parts I want to be able to use the hash values from each part to quickly get the hash value for the parts together. In addition the hash generated by putting the parts together must match the hash generated if the string were hashed as a whole.
Basically I want a hashing algorithm where the parts of the data being hashed can be hashed in parallel, and I do not want the order or length of the pieces to matter. I am not breaking up the string, but rather receiving it in unpredictable chunks in an unpredictable order.
I am willing to ensure an elevated collision rate, so long as it is not too elevated. I am also ok with a slightly slower algorithm as it is hardly noticeable on small strings, and done in parallel for large strings.
I am familiar with a few hashing algorithms, however I currently have a use-case for a hash algorithm with the property that the sum of two hashes is equal to a hash of the sum of the two items.
Requirements/givens
This algorithm will be hashing byte-strings with length of at least 1 byte
hash("ab") = hash('a') + hash('b')
Collisions between strings with the same characters in different order is ok
Generated hash should be an integer of native size (usually 32/64 bits)
String may contain any character from 0-256 (length is known, not \0 terminated)
The ascii alpha-numeric characters will be by far the most used
A disproportionate number of strings will be 1-8 ASCII characters
A very tiny percentage of the strings will actually contain bytes with values at or above 127
If this is a type of algorithm that has terminology associated with it, I would love to know that terminology. If I knew what a proper term/name for this type of hashing algorithm was it would be much easier to google.
I am thinking the simplest way to achieve this is:
Any byte's hash should be its value, normalized to <128 (if >128 subtract 128)
To get the hash of a string you normalize each byte to <128 and add it to the key
Depending on key size I may need to limit how many characters are used to hash to avoid overflow
I don't see anything wrong with just adding each (unsigned) byte value to create a hash which is just the sum of all the characters. There is nothing wrong with having an overflow: even if you reach the 32/64 bit limit (and it would have to be a VERY/EXTREMELY long string to do this) the overflow into a negative number won't matter in 2's complement arithmetic. As this is a linear process it doesn't matter how you split your string.