Is it bad practice to use mutable objects as Hashmap keys? What happens when you try to retrieve a value from a Hashmap using a key that has been modified enough to change its hashcode?
For example, given
class Key
{
int a; //mutable field
int b; //mutable field
public int hashcode()
return foo(a, b);
// setters setA and setB omitted for brevity
}
with code
HashMap<Key, Value> map = new HashMap<Key, Value>();
Key key1 = new Key(0, 0);
map.put(key1, value1); // value1 is an instance of Value
key1.setA(5);
key1.setB(10);
What happens if we now call map.get(key1)? Is this safe or advisable? Or is the behavior dependent on the language?
It has been noted by many well respected developers such as Brian Goetz and Josh Bloch that :
If an object’s hashCode() value can change based on its state, then we
must be careful when using such objects as keys in hash-based
collections to ensure that we don’t allow their state to change when
they are being used as hash keys. All hash-based collections assume
that an object’s hash value does not change while it is in use as a
key in the collection. If a key’s hash code were to change while it
was in a collection, some unpredictable and confusing consequences
could follow. This is usually not a problem in practice — it is not
common practice to use a mutable object like a List as a key in a
HashMap.
This is not safe or advisable. The value mapped to by key1 can never be retrieved. When doing a retrieval, most hash maps will do something like
Object get(Object key) {
int hash = key.hashCode();
//simplified, ignores hash collisions,
Entry entry = getEntry(hash);
if(entry != null && entry.getKey().equals(key)) {
return entry.getValue();
}
return null;
}
In this example, key1.hashcode() now points to the wrong bucket of the hash table, and you will not be able to retrieve value1 with key1.
If you had done something like,
Key key1 = new Key(0, 0);
map.put(key1, value1);
key1.setA(5);
Key key2 = new Key(0, 0);
map.get(key2);
This will also not retrieve value1, as key1 and key2 are no longer equal, so this check
if(entry != null && entry.getKey().equals(key))
will fail.
Hash maps use hash code and equality comparisons to identify a certain key-value pair with a given key. If the has map keeps the key as a reference to the mutable object, it would work in the cases where the same instance is used to retrieve the value. Consider however, the following case:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances and
// keyOne.equals(keyTwo) is true.
HashMap myMap = new HashMap();
myMap.push(keyOne, "Hello");
String s1 = (String) myMap.get(keyOne); // s1 is "Hello"
String s2 = (String) myMap.get(keyTwo); // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = myMap.get(keyOne); // returns "Hello"
s2 = myMap.get(keyTwo); // not found
The above is true if the key is stored as a reference. In Java usually this is the case. In .NET for instance, if the key is a value type (always passed by value), the result will be different:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances
// and keyOne.equals(keyTwo) is true.
Dictionary myMap = new Dictionary();
myMap.Add(keyOne, "Hello");
String s1 = (String) myMap[keyOne]; // s1 is "Hello"
String s2 = (String) myMap[keyTwo]; // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = myMap[keyOne]; // not found
s2 = myMap[keyTwo]; // returns "Hello"
Other technologies might have other different behaviors. However, almost all of them would come to a situation where the result of using mutable keys is not deterministic, which is very very bad situation in an application - a hard to debug and even harder to understand.
If key’s hash code changes after the key-value pair (Entry) is stored in HashMap, the map will not be able to retrieve the Entry.
Key’s hashcode can change if the key object is mutable. Mutable keys in HahsMap can result in data loss.
This will not work. You are changing the key value, so you are basically throwing it away. Its like creating a real life key and lock, and then changing the key and trying to put it back in the lock.
As others explained, it is dangerous.
A way to avoid that is to have a const field giving explicitly the hash in your mutable objects (so you would hash on their "identity", not their "state"). You might even initialize that hash field more or less randomly.
Another trick would be to use the address, e.g. (intptr_t) reinterpret_cast<void*>(this) as a basis for hash.
In all cases, you have to give up hashing the changing state of the object.
There are two very different issues that can arise with a mutable key depending on your expectation of behavior.
First Problem: (probably most trivial--but hell it gave me problems that I didn't think about!)
You are attempting to place key-value pairs into a map by updating and modifying the same key object. You might do something like Map<Integer, String> and simply say:
int key = 0;
loop {
map.put(key++, newString);
}
I'm reusing the "object" key to create a map. This works fine in Java because of autoboxing where each new value of key gets autoboxed to a new Integer object. What would not work is if I created my own (mutable) Integer object:
MyInteger {
int value;
plusOne(){
value++;
}
}
Then tried the same approach:
MyInteger key = new MyInteger(0);
loop{
map.put(key.plusOne(), newString)
}
My expectation is that, for instance, I map 0 -> "a" and 1 -> "b". In the first example, if I change int key = 0, the map will (correctly) give me "a". For simplicity let's assume MyInteger just always returns the same hashCode() (if you can somehow manage to create unique hashCode values for all possible states of an object, this will not be an issue, and you deserve an award). In this case, I call 0 -> "a", so now the map holds my key and maps it to "a", I then modify key = 1 and try to put 1 -> "b". We have a problem! The hashCode() is the same, and the only key in the HashMap is my MyInteger key object which has just been modified to be equal to 1, so It overwrites that key's value so that now, instead of a map with 0 -> "a" and 1 -> "b", I have 1 -> "b" only! Even worse, if I change back to key = 0, the hashCode points to 1 -> "b", but since the HashMap's only key is my key object, it satisfied the equality check and returns "b", not "a" as expected.
If, like me, you fall prey to this type of issue, it's incredibly difficult to diagnose. Why? Because if you have a decent hashCode() function it will generate (mostly) unique values. The hash value will largely take care of the inequality problem when structuring the map but if you have enough values, eventually you'll get a collision on the hash value and then you get unexpected and largely inexplicable results. The resultant behavior is that it works for small runs but fails for larger ones.
Advice:
To find this type of issue, modify the hashCode() method, even trivially (i.e. = 0--obviously when doing this, keep in mind that the hash values should be the same for two equal objects*), and see if you get the same results--because you should and if you don't, there's likely a semantic error with your implementation that's using a hash table.
*There should be no danger (if there is--you have a semantic problem) in always returning 0 from a hashCode() (although it would defeat the purpose of a Hash Table). But that's sort of the point: the hashCode is a "quick and easy" equality measure that's not exact. So two very different objects could have the same hashCode() yet not be equal. On the other hand, two equal objects must always have the same hashCode() value.
p.s. In Java, from my understanding, if you do such a terrible thing (as have many hashCode() collisions), it will start using a red-black-tree as opposed to ArrayList. So when you expect O(1) lookup, you'll get O(log(n))--which is better than the ArrayList which would give O(n).
Second Problem:
This is the one that most others seem to be focusing on, so I'll try to be brief. In this use case, I try to map a key-value pair and then I do some work on the key and then want to come back and get my value.
Expectation: key -> value is mapped, I then modify key and try to get(key). I expect that will give me value.
It seems kind of obvious to me that this wouldn't work but I'm not above having tried to use things like Collections as a key before (and quite quickly realizing it doesn't work). It doesn't work because it's quite likely that the hash value of key has changed so you won't even be looking in the correct bucket.
This is why it's very inadvisable to use collections as keys. I would assume, if you were doing this, you're trying to establish a many-to-one relationship. So I have a class (as in teaching) and I want two groups to do two different projects. What I want is that given a group, what is their project? Simple, I divide the class in two, and I have group1 -> project1 and group2 -> project2. But wait! A new student arrives so I place them in group1. The problem is that group1 has now been modified and likely its hash value has changed, therefore trying to do get(group1) is likely to fail because it will look in a wrong or non-existent bucket of the HashMap.
The obvious solution to the above is to chain things--instead of using the groups as keys, give them labels (that don't change) that point to the group and therefore the project: g1 -> group1 and g1 -> project1, etc.
p.s.
Please make sure to define a hashCode() and equals(...) method for any object you expect to use as a key (eclipse and, I'm assuming, most IDE's can do this for you).
Code Example:
Here is a class which exhibits the two different "problem" behaviors. In this case, I attempt to map 0 -> "a", 1 -> "b", and 2 -> "c" (in each case). In the first problem, I do that by modifying the same object, in the second problem, I use unique objects, and in the second problem "fixed" I clone those unique objects. After that I take one of the "unique" keys (k0) and modify it to attempt to access the map. I expect this will give me a, b, c and null when the key is 3.
However, what happens is the following:
map.get(0) map1: 0 -> null, map2: 0 -> a, map3: 0 -> a
map.get(1) map1: 1 -> null, map2: 1 -> b, map3: 1 -> b
map.get(2) map1: 2 -> c, map2: 2 -> a, map3: 2 -> c
map.get(3) map1: 3 -> null, map2: 3 -> null, map3: 3 -> null
The first map ("first problem") fails because it only holds a single key, which was last updated and placed to equal 2, hence why it correctly returns "c" when k0 = 2 but returns null for the other two (the single key doesn't equal 0 or 1). The second map fails twice: the most obvious is that it returns "b" when I asked for k0 (because it's been modified--that's the "second problem" which seems kind of obvious when you do something like this). It fails a second time when it returns "a" after modifying k0 = 2 (which I would expect to be "c"). This is more due to the "first problem": there's a hash code collision and the tiebreaker is an equality check--but the map holds k0, which it (apparently for me--could theoretically be different for someone else) checked first and thus returned the first value, "a" even though had it kept checking, "c" would have also been a match. Finally, the 3rd map works perfectly because I'm enforcing that the map holds unique keys no matter what else I do (by cloning the object during insertion).
I want to make clear that I agree, cloning is not a solution! I simply added that as an example of why a map needs unique keys and how enforcing unique keys "fixes" the issue.
public class HashMapProblems {
private int value = 0;
public HashMapProblems() {
this(0);
}
public HashMapProblems(final int value) {
super();
this.value = value;
}
public void setValue(final int i) {
this.value = i;
}
#Override
public int hashCode() {
return value % 2;
}
#Override
public boolean equals(final Object o) {
return o instanceof HashMapProblems
&& value == ((HashMapProblems) o).value;
}
#Override
public Object clone() {
return new HashMapProblems(value);
}
public void reset() {
this.value = 0;
}
public static void main(String[] args) {
final HashMapProblems k0 = new HashMapProblems(0);
final HashMapProblems k1 = new HashMapProblems(1);
final HashMapProblems k2 = new HashMapProblems(2);
final HashMapProblems k = new HashMapProblems();
final HashMap<HashMapProblems, String> map1 = firstProblem(k);
final HashMap<HashMapProblems, String> map2 = secondProblem(k0, k1, k2);
final HashMap<HashMapProblems, String> map3 = secondProblemFixed(k0, k1, k2);
for (int i = 0; i < 4; ++i) {
k0.setValue(i);
System.out.printf(
"map.get(%d) map1: %d -> %s, map2: %d -> %s, map3: %d -> %s",
i, i, map1.get(k0), i, map2.get(k0), i, map3.get(k0));
System.out.println();
}
}
private static HashMap<HashMapProblems, String> firstProblem(
final HashMapProblems start) {
start.reset();
final HashMap<HashMapProblems, String> map = new HashMap<>();
map.put(start, "a");
start.setValue(1);
map.put(start, "b");
start.setValue(2);
map.put(start, "c");
return map;
}
private static HashMap<HashMapProblems, String> secondProblem(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length).forEach(
index -> map.put(keys[index], "" + (char) ('a' + index)));
return map;
}
private static HashMap<HashMapProblems, String> secondProblemFixed(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length)
.forEach(index -> map.put((HashMapProblems) keys[index].clone(),
"" + (char) ('a' + index)));
return map;
}
}
Some Notes:
In the above it should be noted that map1 only holds two values because of the way I set up the hashCode() function to split odds and evens. k = 0 and k = 2 therefore have the same hashCode of 0. So when I modify k = 2 and attempt to k -> "c" the mapping k -> "a" gets overwritten--k -> "b" is still there because it exists in a different bucket.
Also there are a lot of different ways to examine the maps in the above code and I would encourage people that are curious to do things like print out the values of the map and then the key to value mappings (you may be surprised by the results you get). Do things like play with changing the different "unique" keys (i.e. k0, k1, and k2), try changing the single key k. You could also see how even the secondProblemFixed isn't actually fixed because you could also gain access to the keys (for instance via Map::keySet) and modify them.
I won't repeat what others have said. Yes, it's inadvisable. But in my opinion, it's not overly obvious where the documentation states this.
You can find it on the JavaDoc for the Map interface:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map
Behaviour of a Map is not specified if value of an object is changed in a manner that affects equals comparision while object(Mutable) is a key. Even for Set also using mutable object as key is not a good idea.
Lets see a example here :
public class MapKeyShouldntBeMutable {
/**
* #param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
Map<Employee,Integer> map=new HashMap<Employee,Integer>();
Employee e=new Employee();
Employee e1=new Employee();
Employee e2=new Employee();
Employee e3=new Employee();
Employee e4=new Employee();
e.setName("one");
e1.setName("one");
e2.setName("three");
e3.setName("four");
e4.setName("five");
map.put(e, 24);
map.put(e1, 25);
map.put(e2, 26);
map.put(e3, 27);
map.put(e4, 28);
e2.setName("one");
System.out.println(" is e equals e1 "+e.equals(e1));
System.out.println(map);
for(Employee s:map.keySet())
{
System.out.println("key : "+s.getName()+":value : "+map.get(s));
}
}
}
class Employee{
String name;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
#Override
public boolean equals(Object o){
Employee e=(Employee)o;
if(this.name.equalsIgnoreCase(e.getName()))
{
return true;
}
return false;
}
public int hashCode() {
int sum=0;
if(this.name!=null)
{
for(int i=0;i<this.name.toCharArray().length;i++)
{
sum=sum+(int)this.name.toCharArray()[i];
}
/*System.out.println("name :"+this.name+" code : "+sum);*/
}
return sum;
}
}
Here we are trying to add mutable object "Employee" to a map. It will work good if all keys added are distinct.Here I have overridden equals and hashcode for employee class.
See first I have added "e" and then "e1". For both of them equals() will be true and hashcode will be same. So map sees as if the same key is getting added so it should replace the old value with e1's value. Then we have added e2,e3,e4 we are fine as of now.
But when we are changing the value of an already added key i.e "e2" as one ,it becomes a key similar to one added earlier. Now the map will behave wired. Ideally e2 should replace the existing same key i.e e1.But now map takes this as well. And you will get this in o/p :
is e equals e1 true
{Employee#1aa=28, Employee#1bc=27, Employee#142=25, Employee#142=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : one:value : 25
See here both keys having one showing same value also. So its unexpected.Now run the same programme again by changing e2.setName("diffnt"); which is e2.setName("one"); here ...Now the o/p will be this :
is e equals e1 true
{Employee#1aa=28, Employee#1bc=27, Employee#142=25, Employee#27b=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : diffnt:value : null
So by adding changing the mutable key in a map is not encouraged.
To make the answer compact:
The root cause is that HashMap calculates an internal hash of the user's key object hashcode only once and stores it inside for own needs.
All other operations for data navigation inside the map are doing by this pre-calculated internal hash.
So if you change the hashcode of the key object (mutate) it will be still stored nicely inside the map with the changed key object's hashcode (you could even observe it via HashMap.keySet() and see the altered hashcode).
But HashMap internal hash will not be recalculated of course and it will be the old stored one and the map won't be able to locate your data by the provided mutated key object new hashcode. (e.g. by HashMap.get() or HashMap.containsKey()).
Your key-value pairs will be still inside the map but to get it back you will need that old hash code value that was given when you put your data into the map.
Notice that you also will be unable to get data back by the mutated key object taken right from the HashMap.keySet().
First the context, im trying to use Redis as an in-memory store backed with persistence. I need to store large number of objects (millions) in a Redis Hash.
At the same time, i don't want my redis instance to consume too much memory. So i've set the maxmemory property in redis.conf to 100mb. I've set maxmemory-policy as allkeys-random The persistece mode is AOF and fysnc is every second.
Now the issue iam facing is, every time i try to store more than two hundred thousand ojects in the hash, the hash gets reset (ie all the existing key values in the hash vanishes ). I confirm this by using the hlen command on the hash in the redis-cli.
Find below the object im trying to store
public class Employee implements Serializable {
private static final long serialVersionUID = 1L;
int id;
String name;
String department;
String address;
/* Getters and Setters */
/* Hashcode - Generates hashcode (key) for each object */
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((address == null) ? 0 : address.hashCode());
result = prime * result + ((department == null) ? 0 : department.hashCode());
result = prime * result + id;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
}
Also, Find below the code that stores into redis (Im using Jedis to interact with Redis )
JedisPool jedisPool = new JedisPool(new JedisPoolConfig(), "localhost");
Jedis jedis = (Jedis) jedisPool.getResource();
System.out.println("Starting....");
for(int i=0;i<1000000;i++) {
/* Converting object to byte array */
Employee employee = new Employee(i, "Arun Jolly", "IT", "SomeCompany");
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
ObjectOutputStream objectOutputStream = new ObjectOutputStream(byteArrayOutputStream);
objectOutputStream.writeObject(employee);
byte[] value = byteArrayOutputStream.toByteArray();
/* Creating key in byte array format using hashCode() */
ByteBuffer buffer = ByteBuffer.allocate(128);
buffer.putInt(employee.hashCode());
byte[] field = buffer.array();
/* Specyfying the Redis Hash in byte array format */
String tableName = "Employee_Details";
byte[] key = tableName.getBytes();
jedis.hset(key, field, value);
System.out.println("Stored Employee "+i);
}
Am i missing something ?
Does this mean that redis does not swap out to the disk once the maxmemory is reached ( Is it trying to hold all the key values in memory ? ) Does it mean that i have to incrementally increase the maxmemory limit according to the increase in the number of key-value pairs that i might have to store ?
Am I missing something?
Yes. Redis is a pure in-memory store, with persistency options. Everything must fit in memory.
Does this mean that redis does not swap out to the disk once the maxmemory is reached.
Precisely.
Is it trying to hold all the key values in memory?
Keys and values, yes.
Does it mean that I have to incrementally increase the max memory limit according to the increase in the number of key-value pairs that I might have to store?
You need to decide upfront how much memory you allocate to Redis, yes.
If you are memory constrained, you will be better served with a disk-based store.