Creating unique hash code (string) in SQL Server from a combination of two or more columns (of different data types)


I would like to create unique string columns (32 characters in length) from a combination of columns with different data types in SQL Server 2005.

I found this solution elsewhere on Stack Overflow:
SELECT SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('MD5', 'HelloWorld')), 3, 32)
The answer thread is here

With HASHBYTES you can create SHA1 hashes, which are 20 bytes long, and MD5 hashes, which are 16 bytes long. Various combination algorithms can produce output of arbitrary length through repeated hash operations, such as the PRF of TLS (see RFC 2246).
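As a rough illustration of the sizes involved, here is a minimal Python sketch (not T-SQL) that hashes several column values of different types into one 32-character hex string. The function name row_hash and the length-prefix scheme are my own assumptions, not something from the question; the point is only that a 16-byte MD5 digest prints as 32 hex characters and that the column values need an unambiguous separator before hashing.

```python
import hashlib

def row_hash(*columns):
    """Return a 32-character hex MD5 over several column values (hypothetical helper)."""
    parts = []
    for value in columns:
        if value is None:
            # Keep NULL distinct from an empty string.
            parts.append("-1:")
        else:
            text = str(value)
            # Length-prefix each value so ('ab', 'c') and ('a', 'bc') differ.
            parts.append(f"{len(text)}:{text}")
    joined = "|".join(parts).encode("utf-8")
    return hashlib.md5(joined).hexdigest()

print(row_hash(1234, "HelloWorld", "2005-01-31"))
print(len(row_hash(1234, "HelloWorld")))  # 16 bytes -> 32 hex characters
```

Without some separator or length prefix, different column combinations can concatenate to the same input and therefore collide trivially.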
This should be enough to get you started. You need to define what '32 characters' means, since hash functions produce bytes, not characters. You also need to internalize that no algorithm can produce fixed-length hashes without collisions (that is, guaranteed 'unique'). At 32 bytes of output (assuming that by 'characters' you mean bytes), the theoretical 50% collision probability is reached at about 4x10^38 hashed elements (see the birthday problem), but that assumes a perfectly uniform distribution for your 32-byte hash function, which you are not going to achieve.
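The birthday figure can be checked with the usual approximation n ≈ sqrt(2 * N * ln(1 / (1 - p))), where N is the number of possible hash values. The Python sketch below is only a back-of-the-envelope check under the same ideal-distribution assumption; birthday_bound is a name I made up for it.

```python
import math

def birthday_bound(bits, probability=0.5):
    """Approximate count of random hashes needed to reach the given collision probability."""
    space = 2 ** bits  # number of possible outputs for a hash of this width
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))

for bits in (128, 160, 256):
    print(bits, f"{birthday_bound(bits):.1e}")
```

For 256 bits (32 bytes) this comes out at roughly 4x10^38, and for a 128-bit MD5 at roughly 2x10^19.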

Related

SQL: Cross-platform generation of N-digit unique identifier (SQL Server, Snowflake, etc.)

We have two databases/warehouses on two different platforms--Microsoft SQL Server and Snowflake (cloud data warehouse).
Across both, customers are identified via a unique AccountId (integer) and Uuid (32 character).
For a particular use case, we need to take one of these unique values (say, the AccountId for instance), pass it into a system function, and generate a unique 20-character identifier (it can't be longer/shorter).
This function needs to exist in both systems. (e.g. select sys.myfn(1234) returns the same in each)
I am aware that Snowflake has functions like sha1(): https://docs.snowflake.com/en/sql-reference/functions/sha1.html
These are equivalent to HASHBYTES() in SQL Server: https://learn.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15
How do I take the output from either and truncate it down to 20 characters and maintain uniqueness?

A UUID is a 128-bit value (with a few bits reserved for version information). If you run that through a hash function, perform a base64 encoding of the hash, and then truncate to 20 characters, you still get 20 * 6 = 120 bits of range. The chance of collision is still in the life-of-the-universe ballpark.
(Note: If you choose to base64 encode the UUID directly, truncation may yield collisions for sequentially assigned UUIDs.)
The integer value can be similarly encoded with little chance of collision with the UUID based values.
If you can find equivalent usable base64 encoding implementations on both platforms, I think you will be on your way to a solution.
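To make the hash, encode, truncate sequence concrete, here is a small Python sketch. It is not the SQL Server or Snowflake implementation; short_id is a hypothetical name, and SHA-1 over the UTF-8 text of the value is an assumption on my part. Whatever you build on each platform would need to hash exactly the same bytes (for example, watch out for varchar versus nvarchar input on SQL Server) to return the same identifier.

```python
import base64
import hashlib

def short_id(value, length=20):
    """Hash a value, base64-encode the digest, and keep the first `length` characters."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()  # 20 bytes
    encoded = base64.b64encode(digest).decode("ascii")          # 28 characters
    return encoded[:length]  # 20 characters * 6 bits = 120 bits of the hash

print(short_id(1234))
print(short_id("0f8fad5bd9cb469fa16570867728950e"))
```

Truncating after hashing is what keeps sequential inputs such as 1234 and 1235 from sharing a prefix.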

VB.net Hash Algorithm

I am working on a desktop application in VB.net with an existing database that stores each user's username and password. I want the login window to use the existing passwords, but they are stored hashed. Which hash algorithm was used to produce this value: X8NUoMVWb/w6D4QdmumxoQ==?

You can make an educated guess simply by looking at the length of the hash, as generally there's only a handful of popular hashing algorithms used for passwords, all with their own distinct output lengths:
Hash              Output length (bytes)   Output length (bits)
MD5               16                      128
SHA-1             20                      160
SHA-2 (SHA256)    32                      256
SHA-2 (SHA512)    64                      512
You can never know for sure because while different hashing algorithms have different output sizes, the output can always be truncated (or padded with random bytes).
That said, X8NUoMVWb/w6D4QdmumxoQ== is a Base64-encoded binary value which decodes to a 16-byte value. 16 bytes is 128 bits - it's very likely this is an MD5 hash value.
Converted to Base 16 (hexadecimal), the 16 bytes are 5FC354A0C5566FFC3A0F841D9AE9B1A1.
This MD5 hash doesn't appear in any freely available leaked password databases or hash-reverse services I tried.
Note that systems like bcrypt generate an output string which is not just a hash-value, but actually a data structure containing the hash and other data. In bcrypt's case the string always starts with $2 which will never appear in a Base16 or Base64-encoded string.
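The length check described above is easy to reproduce. This is a Python sketch rather than VB.net, and the lookup table and the name guess_hash are my own; it only maps a decoded length to the most common algorithm with that digest size, so it remains a guess for the reasons given earlier (truncation, padding).

```python
import base64

# Digest sizes in bytes for a few common hash algorithms.
DIGEST_SIZES = {16: "MD5", 20: "SHA-1", 32: "SHA-256", 64: "SHA-512"}

def guess_hash(encoded):
    """Decode a Base64 value and guess the algorithm from the decoded length."""
    raw = base64.b64decode(encoded)
    return len(raw), raw.hex().upper(), DIGEST_SIZES.get(len(raw), "unknown")

size, hex_value, guess = guess_hash("X8NUoMVWb/w6D4QdmumxoQ==")
print(size, hex_value, guess)
# 16 5FC354A0C5566FFC3A0F841D9AE9B1A1 MD5
```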

SQL Server : Taking Numerical Characters and Hashing them under with a max length of 20 characters

Hello, I am trying to find a good way to hash a set of numeric values so that the output is under 20 characters, positive, and unique. Does anyone have any suggestions?

For hashing in general, I'd use the HASHBYTES function. You can then convert the binary data to a string and keep just the first 20 characters; that should still be unique enough.
To get around HASHBYTES limitations (the 8000-byte input limit, for instance), you can hash incrementally: for each value, concatenate the previous hash with the value to be added and hash the result again. This makes the result depend on the order of the values, and unless a single value is close to 8000 bytes on its own, the hash input will not be truncated.
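The incremental scheme looks roughly like this in Python (a sketch only, not T-SQL; chained_hash is a made-up name, and SHA-1 with a hex string result is my assumption about which algorithm and encoding you would pick):

```python
import hashlib

def chained_hash(values, length=20):
    """Fold values into one hash by rehashing the previous digest plus the next value."""
    running = b""
    for value in values:
        # Each step hashes at most 20 digest bytes plus one value, so the input stays small.
        running = hashlib.sha1(running + str(value).encode("utf-8")).digest()
    # Convert the final digest to hex text and keep the first `length` characters.
    return running.hex()[:length]

print(chained_hash([101, 202, 303]))
print(chained_hash([303, 202, 101]))  # different order, different result
```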

Parallelizable hashing algorithm where size and order of sub-strings is irrelevant

EDIT
Here is the problem I am trying to solve:
I have a string broken up into multiple parts. These parts are not of equal, or predictable length. Each part will have a hash value. When I concatenate parts I want to be able to use the hash values from each part to quickly get the hash value for the parts together. In addition the hash generated by putting the parts together must match the hash generated if the string were hashed as a whole.
Basically I want a hashing algorithm where the parts of the data being hashed can be hashed in parallel, and I do not want the order or length of the pieces to matter. I am not breaking up the string, but rather receiving it in unpredictable chunks in an unpredictable order.
I am willing to accept an elevated collision rate, so long as it is not too elevated. I am also ok with a slightly slower algorithm as it is hardly noticeable on small strings, and done in parallel for large strings.
I am familiar with a few hashing algorithms, however I currently have a use-case for a hash algorithm with the property that the sum of two hashes is equal to a hash of the sum of the two items.
Requirements/givens
This algorithm will be hashing byte-strings with length of at least 1 byte
hash("ab") = hash('a') + hash('b')
Collisions between strings with the same characters in different order is ok
Generated hash should be an integer of native size (usually 32/64 bits)
String may contain any byte value from 0-255 (length is known, not \0 terminated)
The ascii alpha-numeric characters will be by far the most used
A disproportionate number of strings will be 1-8 ASCII characters
A very tiny percentage of the strings will actually contain bytes with values at or above 127
If this is a type of algorithm that has terminology associated with it, I would love to know that terminology. If I knew what a proper term/name for this type of hashing algorithm was it would be much easier to google.
I am thinking the simplest way to achieve this is:
Any byte's hash should be its value, normalized to <128 (if >=128 subtract 128)
To get the hash of a string you normalize each byte to <128 and add it to the key
Depending on key size I may need to limit how many characters are used to hash to avoid overflow

I don't see anything wrong with just adding each (unsigned) byte value to create a hash which is just the sum of all the characters. There is nothing wrong with having an overflow: even if you reach the 32/64 bit limit (and it would have to be a VERY/EXTREMELY long string to do this) the overflow into a negative number won't matter in 2's complement arithmetic. As this is a linear process it doesn't matter how you split your string.
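A minimal Python sketch of that additive hash, with a 32-bit mask standing in for native integer wraparound (the names additive_hash and combine are mine, and the 32-bit width is an assumption):

```python
MASK = 0xFFFFFFFF  # emulate 32-bit wraparound; use 0xFFFFFFFFFFFFFFFF for 64 bits

def additive_hash(data):
    """Sum of the unsigned byte values, wrapped to 32 bits."""
    return sum(data) & MASK

def combine(*hashes):
    """Combine per-chunk hashes; addition is commutative, so chunk order is irrelevant."""
    return sum(hashes) & MASK

whole = additive_hash(b"hello world")
parts = combine(additive_hash(b"world"), additive_hash(b"hello"), additive_hash(b" "))
print(whole == parts)  # True
```

The price is exactly the one accepted in the question: any permutation of the same bytes ("ab" and "ba") collides.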

Hash Function for 2D Barcode Data

I am writing a string of about 120 characters to a 2D barcode. Along with other text, the string contains a unique ticket number. I want to ensure that someone doesn't generate counterfeit tickets by reading the 2D barcode and generating their own barcoded tickets.
I would like to hash the string and append the hash value to what gets embedded in the barcode. That way I can compare the two on reading and see if the data had been tampered with. I have seen several hash functions that return 64 bytes and up, but the more characters you embed in a 2D barcode the bigger the barcode image becomes. I would like an algorithm that returns a fairly small value. It would also be nice if I could provide the function my own key. Collision is not that big of a deal. This isn't any kind of national security application.
Any suggestions?

Use any standard hash function. Take the 120-character string; append your own secret value; feed it into SHA-1 or MD5 or whatever hash function you have handy or feel like implementing; then just take the first however-many bits you want and use that as your value. (If you need ASCII characters, then I suggest that you take groups of 6 bits and use a base-64 encoding.)
If the hash you're using is any good (as, e.g., MD5 and SHA-1 are; MD5 shouldn't be used for serious cryptographic algorithms these days but it sounds like it's good enough for your needs) then any set of bits from it will be "good enough" in the sense that no other function producing that many bits will be much better.
(Warning: For serious cryptographic use, you should be a little more careful. Look at, e.g., http://en.wikipedia.org/wiki/HMAC for more information. From your description, I do not believe you need to worry about such things.)
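If you do want the more careful variant, a truncated HMAC is barely more work than appending the secret by hand. The Python sketch below uses HMAC-SHA1 from the standard library instead of the plain append-and-hash approach described above; the names ticket_tag and verify, the 6-byte tag length, and the sample payload are all assumptions for illustration.

```python
import base64
import hashlib
import hmac

def ticket_tag(payload, secret, tag_bytes=6):
    """Keyed hash of the payload, truncated and base64-encoded for the barcode."""
    mac = hmac.new(secret, payload.encode("utf-8"), hashlib.sha1).digest()
    # 6 bytes -> 8 base64 characters with no padding.
    return base64.urlsafe_b64encode(mac[:tag_bytes]).decode("ascii")

def verify(payload, tag, secret, tag_bytes=6):
    """Recompute the tag on read and compare in constant time."""
    return hmac.compare_digest(ticket_tag(payload, secret, tag_bytes), tag)

secret = b"replace-with-your-own-key"
payload = "TICKET:000123;EVENT:demo"
tag = ticket_tag(payload, secret)
print(payload + ";" + tag)
print(verify(payload, tag, secret))                   # True
print(verify(payload.replace("123", "124"), tag, secret))  # False
```

A shorter tag keeps the barcode small but makes guessing a valid tag easier, so the tag length is the knob to tune.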
