I am looking for a one-way hash in Hive.
It seems there is already a hash function: hash(a1[, a2...])
What kind of hash function is this? Is this a one-way hash?
I have one use case where I need to search by key and another where I need to search by value. What is the best approach, given that scanning the entire cache to filter by value can degrade performance?
Store the reverse mapping, i.e. store the value as the key and the key as the value in the same logical table?
Use a different database and store (value, key) as K | V? I have seen a few posts suggesting that using a different database is a bad idea and deprecated.
Or is there a better alternative/approach?
Do you really need to use Redis? Redis (and key-value stores in general) is not optimized for this kind of task.
If you need to stick with Redis, you can create an index to implement search by value. It will not be as storage-efficient or as intuitive as, for example, an SQL database table, though. See the documentation here: https://redis.io/topics/indexes
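To make the index idea concrete, here is a minimal in-memory sketch in Python. It uses plain dicts as a stand-in for Redis, and the class and method names (TwoWayCache, put, keys_for) are hypothetical. In Redis itself the reverse index would typically be a set per value, maintained with commands such as SADD and SREM alongside each write.

```python
from collections import defaultdict


class TwoWayCache:
    """In-memory stand-in for a key-value store plus a value index."""

    def __init__(self):
        self.by_key = {}                  # key -> value (primary data)
        self.by_value = defaultdict(set)  # value -> set of keys (the index)

    def put(self, key, value):
        # If the key is being overwritten, drop its stale index entry first.
        if key in self.by_key:
            self._unindex(key, self.by_key[key])
        self.by_key[key] = value
        self.by_value[value].add(key)

    def delete(self, key):
        if key in self.by_key:
            self._unindex(key, self.by_key.pop(key))

    def _unindex(self, key, value):
        keys = self.by_value[value]
        keys.discard(key)
        if not keys:
            del self.by_value[value]  # avoid keeping empty index entries

    def get(self, key):
        return self.by_key.get(key)

    def keys_for(self, value):
        # Lookup by value without scanning every key.
        return set(self.by_value.get(value, ()))


cache = TwoWayCache()
cache.put("user:1", "alice@example.com")
cache.put("user:2", "alice@example.com")
cache.put("user:1", "bob@example.com")  # overwrite moves user:1 in the index
print(cache.keys_for("alice@example.com"))  # {'user:2'}
```

The cost is that every write touches two structures, and both must be updated together to stay consistent.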
I am new to the Oracle hash function. I understand that this function is for encryption purposes. It actually converts a very large paragraph into a single hash value.
The ORA_HASH function has three parameters:
Expression
Max_bucket
Seed_value
For Max_bucket and Seed_value, the documentation says I can specify a value between 0 and 4294967295. Max_bucket defaults to 4294967295 and Seed_value defaults to 0.
However, does anyone know what difference it makes to choose 0 versus 4294967295 for those values?
I am planning to use it to compare two columns from two different tables. Each row in each column has close to 3000 characters; one table has close to 1 million records, while the other has close to billions of records. Of course, both tables can be joined on an ID column.
Because of this, I think comparing hash values will be a better option than simply using A = B.
However, could anyone teach me how to choose the best Max_bucket and Seed_value for the Oracle ORA_HASH function?
Thanks in advance!
ORA_HASH is not intended for generating unique hash values. You probably want to use a function like STANDARD_HASH instead.
ORA_HASH is intended for situations where you want to quickly throw a bunch of values into a group of buckets, and hash collisions are useful. ORA_HASH is useful for hash partitioning; for example, you might want to split a table into 64 segments to improve manageability.
STANDARD_HASH can be used to generate practically unique hashes, using algorithms like MD5 or SHA. These hash algorithms are useful for cryptographic purposes, whereas ORA_HASH would not be suitable. For example (SHA1 is the default when no algorithm is specified):
select standard_hash('asdf') the_hash from dual;
THE_HASH
--------
3DA541559918A808C2402BBA5012F6C60B27661C
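The difference between the two kinds of function can be sketched outside the database as well. The Python snippet below is only an illustration: digest_hex computes a SHA-1 digest with the standard hashlib module, and bucket is a hypothetical helper that squeezes the same input into max_bucket + 1 buckets. It does not reproduce ORA_HASH's actual algorithm; it only shows why a small bucket range guarantees collisions while a full digest is practically unique.

```python
import hashlib


def digest_hex(text: str) -> str:
    """SHA-1 digest as uppercase hex (40 hex characters)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def bucket(text: str, max_bucket: int) -> int:
    """Map text to a bucket in 0..max_bucket (illustrative, not ORA_HASH)."""
    raw = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % (max_bucket + 1)


print(digest_hex("asdf"))   # full digest: suitable for equality checks
print(bucket("asdf", 63))   # one of 64 buckets: collisions are expected

# With only 64 buckets, 1000 distinct inputs must share buckets.
inputs = [f"row-{i}" for i in range(1000)]
print(len({bucket(s, 63) for s in inputs}))      # at most 64
print(len({digest_hex(s) for s in inputs}))      # 1000
```

Assuming the database hashes the same bytes for 'asdf', the first line printed should match the STANDARD_HASH output shown above.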
I have to perform a full scan and get the PK in the results as well.
I know that the PK is not stored by default, and I am pretty sure I did not store it in my persistence query.
I also know that what is stored is a hash of the key, to avoid large keys.
I got that information from: AQL - How to show PK in a SELECT
Now, is there a way to reverse engineer the hash and get the PK?
There's no way to reverse engineer the digest into the original PK, unfortunately. Can you deduce it from the data that's in the bins? The default key policy is to send only the digest rather than the PK, because storing the PK takes extra space that you may not intend to use.
I'm interested in hashing database field values as part of an attempt to detect changes in tables.
The database in question (Vertica) has a HASH function, mainly for internal use I guess, as well as other hashes. The internal function assigns a non-null hash value to NULL (in fact, it differs for NULLs of different datatypes).
I might end up using that internal hash function, but if it turns out that its statistical properties and collision avoidance aren't that good, how can I use the other provided functions like md5, etc. (I don't need strong cryptographic hashes) when they all map NULL to NULL?
Of course I could just assign an existing hash value to NULL, but I don't know an elegant way to do that (as opposed to expanding the set of hash values and adding a new one for NULL).
You could simply select the part of the table you care about (that is, the required columns), compute a hash of the queried data, and compare it with the hash you compute the next time.
The result of a query on a table is itself a table. Track changes to that result.
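One way to combine this with the NULL concern from the question is to hash an encoding of each row in which NULL gets its own tag, instead of hashing the raw value. The Python sketch below is an assumption-laden illustration, not Vertica functionality: row_hash is a hypothetical helper, None stands in for NULL, and every non-NULL value is converted with str(), so 1 and "1" hash the same.

```python
import hashlib


def row_hash(values) -> str:
    """MD5 over a tagged, length-prefixed encoding of one row's fields."""
    h = hashlib.md5()
    for v in values:
        if v is None:
            h.update(b"\x00")  # tag byte reserved for NULL
        else:
            data = str(v).encode("utf-8")
            h.update(b"\x01")  # tag byte for a non-NULL value
            # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


before = row_hash([42, "widget", None])
after = row_hash([42, "widget", ""])
print(before != after)  # True: NULL and the empty string hash differently
```

Because NULL is encoded before hashing, the hash function itself never sees a NULL, so no value in the hash's output range has to be reserved for it.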
I'm using hash on an NSString to get an integer that uniquely represents a URL, and then storing it in Core Data to keep the objects unique.
Is that enough to make sure it'll be unique? The URL string is usually 50 to 80 chars.
If it's not I'll gladly accept any suggestion to make it better!
No, the hash is not enough to uniquely identify a URL. The purpose of a hash is to distribute objects, for example when computing a hash table index.
With a hash code you can do a fast comparison: if two objects have different hashes, they are different; if they have the same hash, you still have to compare the objects themselves.
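The hash-then-compare pattern can be sketched as follows. This is Python rather than Objective-C, and weak_hash is a deliberately poor, hypothetical hash chosen so that a collision is easy to produce; it is not how NSString computes its hash.

```python
def weak_hash(s: str) -> int:
    """Deliberately weak hash: byte sum modulo 16, so collisions are common."""
    return sum(s.encode("utf-8")) % 16


class UrlSet:
    """Stores unique URLs, using the hash only to narrow the search."""

    def __init__(self):
        self.buckets = {}  # hash value -> list of URLs sharing that hash

    def add(self, url: str) -> bool:
        """Return True if url was new, False if it was already stored."""
        candidates = self.buckets.setdefault(weak_hash(url), [])
        if url in candidates:  # the full string comparison decides equality
            return False
        candidates.append(url)
        return True


urls = UrlSet()
a = "http://example.com/ab"
b = "http://example.com/ba"
print(weak_hash(a) == weak_hash(b))  # True: different URLs, same hash
print(urls.add(a))                   # True
print(urls.add(b))                   # True: kept, because the strings differ
print(urls.add(a))                   # False: a real duplicate
```

Storing only the integer hash would have treated the two URLs above as the same object; keeping the string and comparing it on a hash match avoids that.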