Efficiently make a view of (or copy) a subset of a large HashMap in Kotlin - kotlin

I am trying to create a sub-HashMap from a huge HashMap without copying the original one.
Currently I use this:
val map = hashMapOf<Job, Int>()
val copy = HashMap(map)
listToRemoveFromCopy.forEach { copy.remove(it) }
This costs me around 50% of my current algorithm's runtime, because Java is calculating the hash of each Job really often.
I only want the map minus the listToRemoveFromCopy elements in a new variable, without removing the listToRemoveFromCopy elements from the original map.
Does anyone know how to do this?
Thanks for the help.

First, you need to cache the hash code for Job, because any approach you use will be inefficient if you cannot have a set or map of Job objects that operates at top speed.
Hopefully, the parts that make up the hash code are immutable; otherwise the object should not be used as a key at all. It is very dangerous to mutate anything that affects a key's hashCode/equals while it is in use in a map or set. Cache the hash on the first call to hashCode() so that you do not incur the cost until then, unless you are sure you will always need it.
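For example, a minimal sketch of a cached hash code (the id and name fields are hypothetical stand-ins for whatever immutable parts identify a Job):
class Job(val id: Long, val name: String) {
    // Computed once on first use and then reused; safe only because id and name never change.
    private val cachedHash: Int by lazy { 31 * id.hashCode() + name.hashCode() }
    override fun hashCode(): Int = cachedHash
    override fun equals(other: Any?): Boolean =
        other is Job && other.id == id && other.name == name
}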
Then change listToRemoveFromCopy to be a Set so it can be used efficiently in many ways. You need to do the previous step before this one.
Now you have multiple options. The most efficient is:
Guava has a utility function Maps.filterKeys which returns a view into a map, and you can create a predicate that works against a Set of the items to remove.
val removeKeys = listToRemoveFromCopy.toSet()
val mapView = Maps.filterKeys(map, Predicates.not(Predicates.in(removeKeys)))
But be sure to note that some methods on the view are not very efficient. If you avoid those methods, this will be the top-performing option:
Many of the filtered map's methods, such as size(), iterate across every key/value mapping in the underlying map and determine which satisfy the filter. When a live view is not needed, it may be faster to copy the filtered map and use the copy.
If you need to make a copy instead, you have a few approaches:
Use filterKeys on the map to create a new map in one shot. This is good if the remove list might be a larger percentage of the total keys.
val removeKeys = listToRemoveFromCopy.toSet()
val newMap = map.filterKeys { it !in removeKeys }
Another tempting option you should be careful about is the minus (-) operator, which copies the full map and then removes the items. It can use listToRemoveFromCopy as-is, without it being a set, but the full-map copy might undo the benefit. So do not use it unless the remove list is a small percentage of the keys.
val newMapButSlower = map - listToRemoveFromCopy
You could pick one approach over the other depending on the ratio between the map size and the remove-list size; benchmark to find the breaking point that matches your definition of "huge".
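A rough way to find that breaking point, assuming the map and listToRemoveFromCopy from the question (measureTimeMillis ships with the Kotlin standard library):
import kotlin.system.measureTimeMillis

val removeKeys = listToRemoveFromCopy.toSet()
val tFilter = measureTimeMillis { map.filterKeys { it !in removeKeys } } // copy via filterKeys
val tMinus = measureTimeMillis { map - listToRemoveFromCopy }            // copy via minus
println("filterKeys: $tFilter ms, minus: $tMinus ms")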
Implementing your own view into the map to avoid a copy is possible, but not trivial (and by that I mean very complex). Every method you override has to do the correct thing at all times (including the map's own hashCode and equals), and further views would have to be created around the key set and values. The entrySet would be nasty to get right. I'd look for a pre-written solution (the Guava one above or another) before attempting your own. This zero-copy model would be the most efficient solution but also the most code, and it is what I would do in the same situation if "huge" meant significant processing time. There is a lot you can get wrong with this approach if you misunderstand any part of the implementation contract.
You could wrap the Guava solution with one that maintains the size attribute as items are manipulated, and would therefore be efficient for that case. You could also write a more efficient solution if you know the original map is read-only. For ideas, check out the Guava implementation of FilteredKeyMap and its ancestor AbstractFilteredMap.
In summary, caching your hash code is likely to give you the biggest gain for the effort. Start there; you'll need it even for the Guava approach.

In addition to Axel's direct answer:
Could calculating the hashcode of a Job be optimised?  If the calculation can't be sped up, could it cache the result?  (There's ample precedent for this, including java.lang.String.)  Or if the class isn't under your control, could you create a delegate/wrapper that overrides the hashcode calculation?
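For the wrapper idea, a minimal sketch (JobKey is a hypothetical name; it assumes Job's own hashCode is the expensive part and its equals is correct):
class JobKey(val job: Job) {
    private val cachedHash: Int = job.hashCode() // computed once, up front
    override fun hashCode(): Int = cachedHash
    override fun equals(other: Any?): Boolean = other is JobKey && other.job == job
}
You would then key the map by JobKey instead of Job, e.g. hashMapOf<JobKey, Int>().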

You can use the filterKeys function. It will iterate over the map only once:
val copy = map.filterKeys { it !in listToRemoveFromCopy }


Possible race condition when enabling multithreading

Suppose I have a slight variant of the cloud balancing problem, in which the Process has not just one weight, but a map of (positive) weights, such as
Map<Long, Long> groupMap = new HashMap<>();
where the key is specific to my domain and the value is the weight.
On the Computer class (still referring to the cloud balancing example) I have a shadow variable hist, which is also a (Hash)Map<Long, Long>, and a custom listener updating hist:
public class HistListener implements VariableListener {
    @Override
    public void beforeVariableChanged(ScoreDirector scoreDirector, Object o) {
        Process p = (Process) o;
        if (p.getComputer() != null) {
            Computer kc = p.getComputer();
            // update the hist map
            scoreDirector.beforeVariableChanged(kc, "hist");
            for (Map.Entry<Long, Long> entry : p.getGroupMap().entrySet()) {
                kc.getHist().put(entry.getKey(), kc.getHist().get(entry.getKey()) - entry.getValue());
            }
            scoreDirector.afterVariableChanged(kc, "hist");
        }
    }
}
and pretty much the same for afterVariableChanged, just with the sign reversed.
I annotate both Process and Computer as @PlanningEntity and register them in the solverConfig.
There are no constraints, so the solver should be able to assign the computers to the processes arbitrarily. As a result, I expect hist only to have natural numbers (incl. 0) as values.
When running it with <moveThreadCount>NONE</moveThreadCount>, this is indeed the case:
<"Computer"+computer.id: hist>
Computer0: {0=0, 1=0, 2=20, 3=0, 4=10, 5=20, 6=0, 7=10, 8=10, 9=20}
Computer1: {0=0, 1=10, 2=0, 3=0, 4=10, 5=0, 6=10, 7=0, 8=0, 9=0}
Computer2: {0=0, 1=0, 2=0, 3=0, 4=0, 5=0, 6=0, 7=0, 8=0, 9=0}
When running exactly the same code with <moveThreadCount>AUTO</moveThreadCount>, I partially get negative values in hist:
Computer0: {0=0, 1=-20, 2=30, 3=0, 4=-40, 5=50, 6=-10, 7=30, 8=40, 9=150}
Computer1: {0=0, 1=-40, 2=-20, 3=0, 4=-90, 5=-50, 6=-40, 7=-20, 8=-20, 9=-30}
Computer2: {0=0, 1=80, 2=-20, 3=0, 4=30, 5=-30, 6=50, 7=0, 8=-20, 9=-50}
This discrepancy disappears when I refactor the keys of groupMap on process and those of hist on computer as individual shadow variables.
The trace logs suggest a race condition where several threads access hist simultaneously. (According to the Oracle docs, I only need a synchronizedMap implementation if the map is structurally changed, i.e., if keys are added or removed; I'm not doing that.)
The use of a Map as a shadow variable greatly enhances the flexibility of my solution, so it would be great if this were supported with multithreading. I know I could probably fix this very simple example with an appropriate ConstraintProvider, but my actual problem is much more complex and is not amenable to being treated with ConstraintProviders.
Question: Is it possible to have a Map based structure as a shadow variable in a multithreading context?
If it is not possible, I recommend adding a short note to the docs of OptaPlanner 8.29.0.Final (the version I'm using).
I had a look at questions regarding Lists as PlanningVariables in optaplanner, but I don't see how these questions relate to mine.
Is it possible to have a Map based structure as a shadow variable in a multithreading context?
Yes, because each move thread in a multithreading context has its own ScoreDirector and its own workingSolution internally. From the point of view of a shadow variable and that map, it's single-threaded.
What can mess this up?
Bad @PlanningIds in your dataset, so the Move.rebase() operations go wrong. Duplicate IDs or missing IDs. OptaPlanner detects most of these. It is unlikely that this is your problem.
Incomplete planning cloning in your model. That's probably it. This will also cause issues you haven't seen yet in a single-threaded context, especially when the last working solution differs greatly from the last best solution found when the termination runs out. FULL_ASSERT should detect those, but they might not occur on every run...
Each move thread has its own workingSolution internally. That's not entirely true: they all have a planning clone of the original. But if the planning clone doesn't clone all of the data affected by the shadow variable, it's corrupted. In a multithreaded solving context this will cause issues much faster.
Ok, this is getting complex. How do I solve this?
Experiment with adding a @DeepPlanningClone annotation on your Map field. But making something a shadow variable already implies deep planning cloning it automatically, IIRC. My guess is that it's the keys or values in that map that need to be planning cloned too. Read the planning clone section in the docs.
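If it helps, here is roughly where that annotation would go, sketched in Kotlin (Computer and hist are the names from the question; the package names are from OptaPlanner 8.x as far as I recall, and the shadow-variable/listener wiring is omitted):
import org.optaplanner.core.api.domain.entity.PlanningEntity
import org.optaplanner.core.api.domain.solution.cloner.DeepPlanningClone

@PlanningEntity
class Computer {
    // @field: puts the annotation on the backing field, where OptaPlanner looks for it
    // with field access; each planning clone then gets its own map instance instead of
    // sharing one across move threads.
    @field:DeepPlanningClone
    var hist: MutableMap<Long, Long> = HashMap()
}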

How to create a new list of Strings from a list of Longs in Kotlin? (inline if possible)

I have a list of Longs in Kotlin and I want to make them Strings for UI purposes, maybe with some prefix or altered in some way; for example, adding "$" at the front or the word "dollars" at the end.
I know I can simply iterate over them all like:
val myNewStrings = ArrayList<String>()
longValues.forEach { myNewStrings.add("$it dollars") }
I guess I'm just getting nitpicky, but I feel like there is a way to inline this or change the original long list without creating a new string list?
EDIT/UPDATE: Sorry for the initial confusion of terms. I meant writing the code in one line, not inlining a function. I knew it was possible, but couldn't remember Kotlin's map function at the time of writing. Thank you all for the useful information though. I learned a lot, thanks.
You are looking for map: it takes a lambda and creates a list based on the results of that lambda.
val myNewStrings = longValues.map { "$it dollars" }
map is an extension function with two generic types: the first is the element type being iterated over and the second is the type it returns. The lambda we pass as an argument is actually transform: (T) -> R, so it has to be a function that receives a T (the source type) and returns an R (the lambda's result). Lambdas don't need an explicit return because the last expression is the return value by default.
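Purely for illustration (you would normally let the compiler infer the types), here is the same call with T and R spelled out, assuming longValues is a List<Long> as in the question:
val myNewStrings: List<String> = longValues.map<Long, String> { value -> "$value dollars" }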
You can use the map function on List. It creates a new list where a function has been applied to every element.
Like this:
val myNewStrings = longValues.map { "$it dollars" }
In Kotlin, inline is a keyword that refers to the compiler substituting a function call with the contents of the function directly. I don't think that's what you're asking about here; maybe you meant you want to write the code on one line.
You might want to read over the Collections documentation, specifically the Mapping section.
The mapping transformation creates a collection from the results of a function on the elements of another collection. The basic mapping function is map(). It applies the given lambda function to each subsequent element and returns the list of the lambda results. The order of results is the same as the original order of elements.
val numbers = setOf(1, 2, 3)
println(numbers.map { it * 3 })
For your example, this would look as the others said:
val myNewStrings = longValues.map { "$it dollars" }
I feel like there is a way to inline this or change the original long list without creating a new string list?
No. You have Longs, and you want Strings; the only way is to create new Strings. You could avoid creating a new List by changing the type of the original list from List<Long> to MutableList<Any> and editing it in place, but that would be overkill and would make the code overly complex, harder to follow, and more error-prone.
Like people have said, unless there's a performance issue here (like a billion strings where you're only using a handful), just creating the list you want is probably the way to go. You have a few options though!
Sequences are lazily evaluated: when there's a long chain of operations, they run the whole chain on each item in turn instead of creating an intermediate full list for every operation in the chain. That can mean less memory use, and more efficiency if you only need certain items or want to stop early. They have overhead though, so you need to be sure they're worth it, and for your use case (turning one list into another) there are no intermediate lists to avoid, and I'm guessing you're using the whole thing. It's probably better to just make the String list once and then use it.
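For comparison, a minimal sketch of the sequence version (assuming longValues: List<Long> from the question); it only pays off when the chain is long or you stop early:
val viaSequence = longValues.asSequence()
    .map { "$it dollars" }
    .take(10) // with a sequence, only the first 10 strings are actually built
    .toList()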
Your other option is to make a function that takes a Long and makes a String: whatever function you're passing to map, basically, except you call it only when you need it. If you have a very large number of Longs and you really don't want every possible String version in memory, just generate them whenever you display them. You could make it an extension function or property if you like, so you can just go
fun Long.display() = "$this dollars"
val Long.dollaridoos: String get() = "$this dollars"
print(number.display())
print(number.dollaridoos)
or make a wrapper object holding your list and giving access to a stringified version of the values. Whatever's your jam
Also, the map approach is more efficient than creating an ArrayList and adding to it, because map can allocate a list with the correct capacity from the get-go. Arbitrarily adding to an unsized list keeps growing it: when it gets too big, it has to copy everything to another (larger) array... until that one fills up, and then it happens again...
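If you do want to build the list by hand anyway, you can at least pre-size it (a small sketch, reusing longValues from the question):
val myNewStrings = ArrayList<String>(longValues.size) // capacity known up front, no re-allocations
longValues.forEach { myNewStrings.add("$it dollars") }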

Why must MutableMap.keys returns a MutableSet?

The MutableMap.keys property is defined as: abstract val keys: MutableSet<K>
I understand that the content of keys will change as the underlying map changes, but how can the keys themselves be modifiable? I.e., I see no logic in calling map.keys.add(xxx).
Note: I ran into this problem while creating a proxy around a MutableMap. I have to adjust the entries and keys content, but do not want to implement the remove/add/clear methods.
The MutableSet returned by keys throws UnsupportedOperationException if you try to add something. It does provide remove and filtering (retainAll) operations, which can simplify actions that only need to consider the keys, not the values.
If you're already using a MutableMap, it makes sense that you should also be able to work with the keys directly in a mutable way.
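For example (a small sketch with made-up contents), mutating the map through its keys view:
val map = mutableMapOf("a" to 1, "b" to 2, "c" to 3)
map.keys.remove("b")           // removes the b=2 entry from the map itself
map.keys.retainAll(setOf("a")) // keeps only the a=1 entry
println(map)                   // prints {a=1}
// map.keys.add("d")           // would throw UnsupportedOperationException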
It corresponds to the Java Map#keySet() method which is documented as follows:
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined. The set supports element removal, which removes the corresponding mapping from the map, via the Iterator.remove, Set.remove, removeAll, retainAll, and clear operations. It does not support the add or addAll operations.
The parts about element removal being supported (while add and addAll are not) explain why it's represented as a MutableSet in Kotlin; otherwise you couldn't port Java code that uses these capabilities.

Kotlin most efficient way sequence to set

Hi, I've got a list of 1330 objects and would like to apply a method to each of them and obtain a set as the result.
val result = listOf1330
    .asSequence()
    .map { someMethod(it) }
val resultSet = result.toSet()
It works fine without toSet, but with it the execution time is about 10 times longer.
I used a sequence to make it work faster, and it is faster, but as a result I need a list without duplicates (a set).
Simply: what is the most efficient way to convert a sequence to a set?
val result = listOf1330.mapTo(HashSet()) { someMethod(it) }
It makes little sense to use streams or sequences to implement this transformation: you need all the elements from the collection, not just a few. The mapTo (and map) functions are inline in Kotlin, which means the code is substituted into the call site and no lambda object has to be created and invoked through an extra indirection. We use mapTo to avoid the second copy of the collection that the toSet() function would make.
parallelStream() may add more performance if you want to run the computation on several threads. It is still a good idea to measure how well the load is balanced between threads. The performance may also depend on the implementation class of the collection you call it on.
If your someObject has a slow implementation of equals() or hashCode(), or gives the same hash code for many objects, then that could account for the delay, and you may be able to improve it.
Otherwise, if the objects are big, the delay may be mostly due to the amount of memory that must be accessed to store them all; if so, that's the price you'll have to pay if you want a set with all those objects in memory.
Sequence.toSet() uses a LinkedHashSet.  You could try providing another Set instance, using e.g. toCollection(HashSet()), to see if that's any faster.  (You wouldn't get the same iteration order, though.)
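A sketch of that suggestion, reusing listOf1330 and someMethod from the question:
val resultSet = listOf1330.asSequence()
    .map { someMethod(it) }
    .toCollection(HashSet()) // a HashSet instead of the LinkedHashSet that toSet() creates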
I agree with gidds' answer on HashSet and LinkedHashSet performance:
LinkedHashSet is more expensive for insertions than HashSet.
However, in the above use case, I think we can leverage parallelStream to improve the performance. Under the hood this simply calls Java's parallel stream API:
import java.util.stream.Collectors

val result: Set<String> = listOf("sdgds", "fdgdfsg", "dsfgsdfg")
    .parallelStream()
    .map { someMethod(it) }
    .collect(Collectors.toSet())
Collectors.toSet() uses a HashSet, so we should be fine from an insertion-performance perspective.
Use distinct or distinctBy.
val result = sequenceOf("a", "b", "a", "c").distinct()
// -> "a", "b", "c"
// for more complex cases use custom comparator function
val result = getMyObjectsSequence().distinctBy { it.name }
This approach lets you keep using sequences without involving explicit Iterables (List, Set, etc.).
Nevertheless, there is no magic: distinct still uses a HashSet under the hood, and for a really huge sequence it may cause significant memory usage, which must be kept in mind when applying this function.

Minecraft bukkit scheduler and procedural instance naming

This question is probably pretty obvious to anyone who knows how to use Bukkit properly, and I'm sorry if I missed a solution in the other questions, but this is really kicking my ass and I don't know what else to do; the tutorials have been utterly useless. There are really 2 things that I need help doing:
I need to learn how to create an indefinite number of instances of an object. I figure it'd be like this:
int num = 0;
public void create() {
    String name = "chocolate" + num;
    Thingy name = new Thingy();
}
So you see what I'm saying? I need to basically change the name that is given to each new instance so that it doesn't overwrite the last one when created. I swear I've looked everywhere, I've asked my Java professor and I can't get any answers.
2: I need to learn how to use the stupid scheduler, and I can't understand anything so far. Basically, when an event is detected, 2 things are called: one method which activates instantly, and one which needs to be given a 5-second delay and then called. The code is like this:
public void onEvent(Event e) {
    Thingy thing = new Thingy();
    thing.method1();
    thing.doOnDelay(thing.method2(), 100); // pseudocode: run method2 after 100 ticks (5 seconds)
}
Once again, I apologize if I am not giving too many specifics, but I cannot FOR THE LIFE OF ME find anything about the Bukkit event scheduler that I can understand.
DO NOT leave me links to the Bukkit official tutorials, I cannot understand them at all and it'll be a waste of an answer. I need somebody who can help me, I am a starting plugin writer.
I've had Programming I and II with focus in Java, so many basic things I know, I just need Bukkit-specific help for the second one.
The first one has had me confused since I started programming.
Ok, so for the first question I think you want to use a data structure. Depending on what you're doing, there are different data structures to use. A data structure is simply a container that you can use to store many instances of a type of object. The data structures that are available to you are:
HashMap
HashSet
TreeMap
List
ArrayList
Vector
There are more, but these are the big ones. HashMap and TreeMap implement the Map interface (a HashSet is a Set backed by a HashMap), and maps are notable for their speedy lookups. To use a HashMap, you instantiate it with HashMap<KeyThing, ValueThing> thing = new HashMap<KeyThing, ValueThing>(); then you add elements to it with thing.put(key, value). Then, when you want to get a value out of it, you just use thing.get(key). HashMaps use an algorithm that's super fast for getting values, but a consequence of this is that the HashMap doesn't store its entries in any particular order. Therefore, when you loop through it with a for loop, it returns its entries in a seemingly random order (not truly random, because of memory layout and such). Also, it's important to note that you can only have one of each individual key: if you put in a key that already exists in the map, it will overwrite the value for that key.
The HashSet is like a HashMap but without values attached to the keys. It's a pretty good container if all you need is to determine whether an object is inside it.
The TreeMap is one of the only maps that stores its entries in a particular order. You have to provide a Comparator (something that tells whether one object is less than another), or use keys that implement Comparable, so that it knows how to keep the keys in ascending order.
List and ArrayList are not maps; their elements are accessed by an index. List is just the interface. A plain array is the thing with a fixed size that you have to specify up front and that cannot grow, whereas an ArrayList can change size: each element can be retrieved with arrayListThing.get(index), and you add elements with arrayListThing.add(thing).
The Vector is a lot like an ArrayList and functions about the same; the main difference is that Vector's methods are synchronized, which makes it thread-safe but slower, and it is generally considered a legacy class.
At any rate, you can use these data structures to store a lot of objects by making a loop. Here's an example with a Vector.
Vector<Thing> thing = new Vector<Thing>();
int numberofthings = 100;
for (int i = 0; i < numberofthings; i++) {
    thing.add(new Thing());
}
That will give you a vector full of things which you can then iterate through with
for (Thing elem : thing) {
    elem.doStuff();
}
Ok, now for the second problem. You are correct that you need to use the Bukkit Scheduler. Here is how:
Make a class that extends BukkitRunnable
public class RunnableThing extends BukkitRunnable {
    @Override
    public void run() {
        // what you want to do. You have to override this method.
    }
}
Then, when you want to execute that thing, you make a new BukkitTask object using your RunnableThing:
BukkitTask example = new RunnableThing().runTaskLater(plugin, ticks);
You have to do some math to figure out how many ticks you want: 20 ticks = 1 second, so a 5-second delay is 100 ticks. Other than that, I think that covers all your questions.