RoR: Check whether an input is an MD5 hash or not?

How can I check whether an input string has the form of an MD5 hash in Rails 3.0?

Treat every 32-digit hexadecimal string (i.e. one consisting solely of the letters a-f and the digits 0-9) as an MD5 hash.
I don't know whether every 32-digit hexadecimal value is actually the MD5 of some input, but a good hash should ideally cover that whole space, so you may as well assume it does.
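A minimal Ruby sketch of that check. The constant MD5_HEX and the method md5_like? are names made up for illustration, and accepting uppercase A-F is an assumption that goes beyond the a-f stated above.

```ruby
# Hypothetical helper: true when the input looks like an MD5 hex digest,
# i.e. it is a String of exactly 32 hexadecimal digits.
# Assumption: uppercase A-F is accepted as well as lowercase a-f.
MD5_HEX = /\A[0-9a-f]{32}\z/i

def md5_like?(input)
  # \A and \z anchor the whole string, so a trailing newline is rejected.
  input.is_a?(String) && !(input =~ MD5_HEX).nil?
end

puts md5_like?('5d41402abc4b2a76b9719d911017c592')  # 32 hex digits
puts md5_like?('not-a-hash')
```

In a Rails 3 model the same pattern could probably be reused in a format validation, for example validates :token, :format => { :with => MD5_HEX }, where the attribute name token is only a placeholder.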

Related

Determining the key of a Vigenere Cipher if key length is known

I'm struggling to get my head around the Vigenere Cipher when you know the length of the key but not what it is. I can decipher text if I know the key but I'm confused as to how to work out what the key actually is.
For one example I'm given cipher text and a key length of 6. That's all I'm given. I'm told the key is an arbitrary set of letters that doesn't necessarily make up a word in the English language; in other words, a random set of letters.
With this knowledge I've so far only broken up the cipher text into 6 subtexts, each containing the letters encrypted by one key letter. The first subtext contains every 6th letter starting with the first, the second contains every 6th letter starting with the second, and so on.
What do I do now?

Calculate a letter frequency table for each letter of the key. If, as in your example, the key length is 6, you get 6 frequency tables. The tables should show similar frequency patterns, although shifted to different letters. If they do not, then you have the wrong key length.
Now compare them with letter frequency tables for English (for example, see http://en.wikipedia.org/wiki/Letter_frequency). If the pattern does not match, the clear text was not in English. If it does, assign the most frequent letters in each subtext to the most frequent letters in the reference table, and so on, and see what you get. Note that your text may have slightly different frequencies, since the reference tables are statistics based on a large amount of data. Now you need to use your head.
Using common digrams (such as th and sh in English) can help.

One approach is frequency analysis. Take each of the six groups and build a frequency table of its characters. Then compare that table to a table of known frequencies for the plaintext language (for standard text, this would just be English).
A second, possibly simpler, approach is to brute-force the key. The number of possible keys is 26^6 = 308,915,776 (roughly 300 million), which is about 28 bits of key space. That is brute-forceable but would probably take a while on a personal computer. If you brute-force one key letter at a time, however, it takes only 26*6 = 156 tries. To do so, write a function that "scores" an attempted decryption by how "plaintext-like" it looks. You might do frequency analysis as above, but there can be simpler tests. Then, for each of the six sets of characters, try all 26 key letters and pick the one whose decryption scores best.
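The per-letter brute force could be sketched in Ruby roughly as follows. All names here (ENGLISH_FREQ, english_score, guess_key, vigenere) are invented for this example, the frequency values are rounded figures from commonly published English letter tables, and the scoring function is just one simple choice of "plaintext-likeness" test. It assumes the plaintext is English and that each subtext holds enough letters for the statistics to show.

```ruby
# Approximate relative frequencies (percent) of the letters A..Z in English.
ENGLISH_FREQ = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15,
                0.77, 4.03, 2.41, 6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06,
                2.76, 0.98, 2.36, 0.15, 1.97, 0.07].freeze

# Score a list of letter indices (0 = A .. 25 = Z): common English letters
# add more to the total, so English-like text scores higher.
def english_score(indices)
  indices.sum { |i| ENGLISH_FREQ[i] }
end

# Try all 26 shifts for each of the key_length subtexts (26 * key_length
# tries in total) and keep the best-scoring shift as that key letter.
def guess_key(ciphertext, key_length)
  letters = ciphertext.upcase.scan(/[A-Z]/).map { |c| c.ord - 65 }
  (0...key_length).map do |offset|
    # Every key_length-th letter, starting at position `offset`.
    column = letters.each_slice(key_length).map { |s| s[offset] }.compact
    best_shift = (0...26).max_by do |shift|
      english_score(column.map { |c| (c - shift) % 26 })
    end
    (best_shift + 65).chr
  end.join
end

# Helper for experiments: direction +1 encrypts, -1 decrypts.
# Non-letters are dropped and the result is uppercase.
def vigenere(text, key, direction)
  shifts = key.upcase.chars.map { |c| c.ord - 65 }
  text.upcase.scan(/[A-Z]/).each_with_index.map do |c, i|
    (((c.ord - 65) + direction * shifts[i % shifts.length]) % 26 + 65).chr
  end.join
end
```

With a guessed key in hand, vigenere(ciphertext, guess_key(ciphertext, 6), -1) gives a candidate plaintext to inspect by eye. On short ciphertexts some key letters may come out wrong and need manual correction.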

SQL Server : Taking Numerical Characters and Hashing them under with a max length of 20 characters

Hello, I am trying to find a good way to hash a set of numbers so that the output is under 20 characters long, positive, and unique. Does anyone have any suggestions?

For hashing in general, I'd use the HASHBYTES function. You can then convert the binary data to a string and just pick the first 20 characters; that should still be unique enough.
To get around the HASHBYTES limitations (an 8000-byte input limit, for instance), you can hash incrementally: for each value, concatenate the previous hash with the value to be added and hash that again. This makes the result depend on the order of the values, and unless a single value is close to 8000 bytes, the input will not be truncated.
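The chaining idea does not depend on SQL Server, so here is a rough Ruby illustration with Digest::SHA1 standing in for HASHBYTES. The method name chained_hash is hypothetical, and the result is a hex string rather than a number, so it only covers the "first 20 characters" part of the suggestion.

```ruby
require 'digest'

# Fold a list of values into one digest: each step hashes the previous
# digest concatenated with the next value, so every input to the hash
# function stays small no matter how many values there are.
# Returns the first `length` hex characters of the final SHA-1 digest.
def chained_hash(values, length = 20)
  digest = values.reduce(''.b) do |previous, value|
    Digest::SHA1.digest(previous + value.to_s.b)
  end
  digest.unpack1('H*')[0, length]
end

puts chained_hash([1001, 1002, 1003])
```

Because each step depends on the previous digest, [1, 2, 3] and [3, 2, 1] produce different results. Truncating to 20 hex characters keeps 80 bits of the digest, so collisions are unlikely but not impossible.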

Parallelizable hashing algorithm where size and order of sub-strings is irrelevant

EDIT
Here is the problem I am trying to solve:
I have a string broken up into multiple parts. These parts are not of equal, or predictable length. Each part will have a hash value. When I concatenate parts I want to be able to use the hash values from each part to quickly get the hash value for the parts together. In addition the hash generated by putting the parts together must match the hash generated if the string were hashed as a whole.
Basically I want a hashing algorithm where the parts of the data being hashed can be hashed in parallel, and I do not want the order or length of the pieces to matter. I am not breaking up the string, but rather receiving it in unpredictable chunks in an unpredictable order.
I am willing to accept an elevated collision rate, so long as it is not too elevated. I am also OK with a slightly slower algorithm, as the slowdown is hardly noticeable on small strings and the work is done in parallel for large strings.
I am familiar with a few hashing algorithms, however I currently have a use-case for a hash algorithm with the property that the sum of two hashes is equal to a hash of the sum of the two items.
Requirements/givens
This algorithm will be hashing byte-strings with length of at least 1 byte
hash("ab") = hash('a') + hash('b')
Collisions between strings with the same characters in different order is ok
Generated hash should be an integer of native size (usually 32/64 bits)
String may contain any byte value from 0-255 (length is known, not \0 terminated)
The ASCII alphanumeric characters will be by far the most used
A disproportionate number of strings will be 1-8 ASCII characters
A very tiny percentage of the strings will actually contain bytes with values above 127
If this is a type of algorithm that has terminology associated with it, I would love to know that terminology. If I knew what a proper term/name for this type of hashing algorithm was it would be much easier to google.
I am thinking the simplest way to achieve this is:
Any byte's hash should be its value, normalized to <128 (if >=128 subtract 128)
To get the hash of a string you normalize each byte to <128 and add it to the key
Depending on key size I may need to limit how many characters are used to hash to avoid overflow

I don't see anything wrong with just adding each (unsigned) byte value to create a hash which is just the sum of all the characters. There is nothing wrong with having an overflow: even if you reach the 32/64 bit limit (and it would have to be a VERY/EXTREMELY long string to do this) the overflow into a negative number won't matter in 2's complement arithmetic. As this is a linear process it doesn't matter how you split your string.
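A small Ruby sketch of that additive hash. Ruby integers do not overflow, so the 32-bit wrap-around of a native integer is imitated with a mask; MASK, additive_hash and combine are names chosen for this example, and the bytes are summed as-is without the <128 normalization proposed in the question.

```ruby
# Keep results within 32 bits, imitating native unsigned overflow.
MASK = 0xFFFFFFFF

# Hash of a byte string: the sum of its unsigned byte values, modulo 2^32.
def additive_hash(bytes)
  bytes.bytes.sum & MASK
end

# Hash of a concatenation, computed only from the hashes of its parts.
def combine(*hashes)
  hashes.sum & MASK
end

parts = ['wor', 'hello ', 'ld']            # chunks in arbitrary order
partial = parts.map { |p| additive_hash(p) }
puts combine(*partial) == additive_hash('hello world')
```

Since addition is commutative and associative, the chunks can be hashed in parallel and combined in any order. The price is that all permutations of the same bytes collide, which the question explicitly allows.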

Hash Function for 2D Barcode Data

I am writing a string of about 120 characters to a 2D barcode. Along with other text, the string contains a unique ticket number. I want to ensure that someone can't generate counterfeit tickets by reading the 2D barcode and generating their own barcoded tickets.
I would like to hash the string and append the hash value to what gets embedded in the barcode. That way I can compare the two on reading and see if the data has been tampered with. I have seen several hash functions that return 64 bytes and up, but the more characters you embed in a 2D barcode, the bigger the barcode image becomes. I would like an algorithm that returns a fairly small value. It would also be nice if I could provide the function with my own key. Collisions are not that big of a deal. This isn't any kind of national security application.
Any suggestions?

Use any standard hash function. Take the 120-character string; append your own secret value; feed it into SHA-1 or MD5 or whatever hash function you have handy or feel like implementing; then just take the first however-many bits you want and use that as your value. (If you need ASCII characters, then I suggest that you take groups of 6 bits and use a base-64 encoding.)
If the hash you're using is any good (as, e.g., MD5 and SHA-1 are; MD5 shouldn't be used for serious cryptographic algorithms these days but it sounds like it's good enough for your needs) then any set of bits from it will be "good enough" in the sense that no other function producing that many bits will be much better.
(Warning: For serious cryptographic use, you should be a little more careful. Look at, e.g., http://en.wikipedia.org/wiki/HMAC for more information. From your description, I do not believe you need to worry about such things.)
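As a rough Ruby illustration of the recipe above (append a secret, hash, truncate, base-64 encode): the names TAG_BYTES, ticket_tag and valid_ticket? are made up for this sketch, and the choice of 6 bytes (48 bits, exactly 8 base-64 characters) is arbitrary. For anything more serious, the HMAC construction linked above would be the more careful choice than plain concatenation.

```ruby
require 'digest'

# Number of digest bytes to keep: 6 bytes = 48 bits = 8 base-64 characters.
TAG_BYTES = 6

# Short check value for the barcode: SHA-1 over payload + secret,
# truncated to TAG_BYTES and base-64 encoded without line breaks.
def ticket_tag(payload, secret)
  digest = Digest::SHA1.digest(payload + secret)
  [digest.byteslice(0, TAG_BYTES)].pack('m0')
end

# On scanning, recompute the tag from the payload and compare.
def valid_ticket?(payload, tag, secret)
  ticket_tag(payload, secret) == tag
end

secret  = 'keep-this-out-of-the-barcode'   # placeholder secret
payload = 'EVENT=42;SEAT=B17;TICKET=000123'
tag     = ticket_tag(payload, secret)
puts valid_ticket?(payload, tag, secret)
```

A shorter tag means a smaller barcode but also makes it easier to guess a valid tag by trial and error, so the length is a trade-off.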

Creating unique hash code (string) in SQL Server from a combination of two or more columns (of different data types)

I would like to create unique string values (32 characters in length) from a combination of columns with different data types in SQL Server 2005.

I have found out the solution elsewhere in StackOverflow
SELECT SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('MD5', 'HelloWorld')), 3, 32)
The answer thread is here

With HASHBYTES you can create SHA1 hashes, which have 20 bytes, and MD5 hashes, which have 16 bytes. There are various combination algorithms that can produce arbitrary-length material by repeated hash operations, like the PRF of TLS (see RFC 2246).
This should be enough to get you started. You need to define what '32 characters' means, since hash functions produce bytes, not characters. Also, you need to internalize that no algorithm can possibly produce hashes of fixed length without collisions (guaranteed 'unique'). At 32 bytes of length (assuming that by 'characters' you mean bytes) the theoretical 50% collision probability is reached at about 4 x 10^38 hashed elements (see birthday problem), but that assumes a perfect distribution for your 32-byte output hash function, which you're not going to achieve.
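The birthday figure can be reproduced with the usual approximation sqrt(2 * ln 2 * 2^n) for an n-bit hash. The Ruby method name birthday_bound below is just for this sketch, and the formula assumes perfectly uniform hash output.

```ruby
# Approximate number of uniformly random n-bit values needed for a 50%
# chance of at least one collision: sqrt(2 * ln(2) * 2^n).
def birthday_bound(bits)
  Math.sqrt(2 * Math.log(2)) * 2.0**(bits / 2.0)
end

puts format('%.2e', birthday_bound(256))  # 32-byte output
puts format('%.2e', birthday_bound(128))  # 16-byte output, e.g. MD5
```

For 256 bits this comes out at roughly 4 x 10^38, and for a 16-byte MD5 value at roughly 2.2 x 10^19.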
