Kotlin: is there a data structure like LinkedHashMap, but thread-safe?

I need a queue that, like a map, keeps keys unique, so that it supports:
queue.put(key, value) // <== put into the queue, keeping the order (it becomes last in the queue)
queue.get(key)
queue.remove(key)
and it can also do something like
queue.pop() // remove from head
It seems this could be implemented with LinkedHashMap, but that is not thread-safe. Is there another data structure in Kotlin that could be used for this case? Or LruCache?
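As far as I know, nothing in the Kotlin standard library combines insertion order and thread safety in one map, and Collections.synchronizedMap(LinkedHashMap()) only protects individual calls: a pop() (read the head, then remove it) spans two calls, so it needs explicit locking. Below is a minimal sketch of one way to get the contract described above, guarding a LinkedHashMap with a single coarse lock; the class name and the pop() pair-returning convention are illustrative, not a standard API:
class SynchronizedOrderedMap<K : Any, V : Any> {
    private val map = LinkedHashMap<K, V>()
    private val lock = Any()

    // Remove-then-insert so that re-putting an existing key
    // moves it to the tail, as the question asks.
    fun put(key: K, value: V): Unit = synchronized(lock) {
        map.remove(key)
        map[key] = value
    }

    fun get(key: K): V? = synchronized(lock) { map[key] }

    fun remove(key: K): V? = synchronized(lock) { map.remove(key) }

    // Removes and returns the head entry, or null if empty.
    // Both steps happen under the same lock, so pop() is atomic.
    fun pop(): Pair<K, V>? = synchronized(lock) {
        val iterator = map.entries.iterator()
        if (iterator.hasNext()) {
            val entry = iterator.next()
            iterator.remove()
            entry.key to entry.value
        } else {
            null
        }
    }
}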

Related

Iterator over Chronicle Queue implementation

I'm looking to implement an Iterator<> on a ChronicleQueue methodReader (or a tailer, if this can't be done with a methodReader).
Is there a way to see if a queue has more data (can I use lastReadIndex() < ???), so I can implement the hasNext() of the Iterator?
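One way this could be approached (a sketch, not a confirmed answer from this thread): MethodReader.readOne() returns false when no message is available, so reading one message ahead and buffering it gives a conventional look-ahead hasNext(). The Lines interface below is hypothetical; substitute whatever method interface the queue was written with:
import net.openhft.chronicle.queue.ChronicleQueue

// Hypothetical message interface; replace with your own.
fun interface Lines {
    fun line(text: String)
}

class LineIterator(queue: ChronicleQueue) : Iterator<String> {
    private var buffered: String? = null
    private val reader = queue.createTailer()
        .methodReader(Lines { text -> buffered = text })

    // Look-ahead: poll until a message lands in the buffer
    // or readOne() reports the queue is drained.
    override fun hasNext(): Boolean {
        while (buffered == null && reader.readOne()) {
            // readOne() may consume entries that don't fill the buffer
        }
        return buffered != null
    }

    override fun next(): String {
        if (!hasNext()) throw NoSuchElementException()
        return buffered!!.also { buffered = null }
    }
}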

Synchronize access to mutable fields with Kotlin's map delegation

Is this implementation safe for synchronizing access to the public fields/properties?
class Attributes(
    private val attrsMap: MutableMap<String, Any?> = Collections.synchronizedMap(HashMap())
) {
    var attr1: Long? by attrsMap
    var attr2: String? by attrsMap
    var attr3: Date? by attrsMap
    var attr4: Any? = null
    ...
}
Mostly.
Because the underlying map is only accessible via the synchronised wrapper, you can't have any issues caused by individual calls, such as simultaneous gets and/or puts (which is the main cause of race conditions): only one thread can be making such a call at a time, and the Java memory model ensures that the results are then visible to all threads.
You could have race conditions involving a sequence of calls, such as iterating through the map, or a check followed by a modify, if the map could be modified in between.  (That sort of problem can occur even on a single thread.)  But as long as the rest of your class avoided such sequences, and didn't leak a reference to the map, you'd be safe.
And because Long and String are immutable, you can't have any issues with their contents being modified. (java.util.Date is technically mutable, though, so strictly it deserves the same caution as below.)
That is a real concern with the Any? values: if one stored e.g. a StringBuilder, one thread could be modifying its contents while another was accessing it, with hilarious consequences. There's not much you can do about that in a wrapper class, though.
By the way, instead of using a synchronised wrapper, you could use a ConcurrentHashMap, which would avoid the synchronisation in most cases (at the cost of a bit more memory).  It also provides many methods which can replace call sequences, such as getOrPut(); it's a really powerful tool for writing high-performance multithreaded code.
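By way of illustration, a small sketch of that call-sequence replacement (Kotlin's standard library ships a getOrPut() extension for ConcurrentMap, backed by putIfAbsent, so even if two threads race, only one value is kept). One caveat for the class above: ConcurrentHashMap rejects null values, so setting a delegated nullable property such as attr1 to null would throw at runtime.
import java.util.concurrent.ConcurrentHashMap

fun main() {
    val ids = ConcurrentHashMap<String, Long>()

    // Racy check-then-act on a plain synchronised map: another thread
    // can insert between the containsKey check and the put:
    //   if (!ids.containsKey("alice")) ids["alice"] = nextId()

    // getOrPut() collapses the sequence into one call:
    val id = ids.getOrPut("alice") { 1L }

    // compute() gives an atomic read-modify-write for updates:
    ids.compute("alice") { _, v -> (v ?: 0L) + 1 }

    println(id)           // 1
    println(ids["alice"]) // 2
}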

Are serializers the right spot to remove shared state from Akka messages?

I am working on a distributed algorithm and decided to use Akka to scale it across machines. The machines need to exchange messages very frequently, and these messages reference some immutable objects that exist on every machine. Hence, it seems sensible to "compress" the messages in the sense that the shared, replicated objects should not be serialized in the messages. Not only would this save network bandwidth, but it would also avoid creating duplicate objects on the receiver side whenever a message is deserialized.
Now, my question is how to do this properly. So far, I could think of two options:
Handle this in the "business layer", i.e., convert my original message objects to reference objects that replace references to the shared, replicated objects with symbolic references. Then I would send those reference objects rather than the original messages. Think of it as replacing an actual web resource with a URL. Doing this seems rather straightforward in terms of coding, but it also drags serialization concerns into the actual business logic.
Write custom serializers that are aware of the shared, replicated objects. In my case, it would be okay that this solution introduces the replicated, shared objects as global state to the actor systems via the serializers. However, the Akka documentation does not describe how to programmatically add custom serializers, which would be necessary to weave the shared objects into the serializer. Also, I could imagine that there are a couple of reasons why such a solution would be discouraged. So, I am asking here.
Thanks a lot!
It's possible to write your own custom serializers and have them do all sorts of weird things; then you can bind them at the config level as usual:
class MyOwnSerializer extends Serializer {
  // If you need logging here, introduce a constructor that takes an ExtendedActorSystem:
  //   class MyOwnSerializer(actorSystem: ExtendedActorSystem) extends Serializer
  // and get a logger using:
  //   private val logger = Logging(actorSystem, this)

  // This is whether "fromBinary" requires a "clazz" or not
  def includeManifest: Boolean = true

  // Pick a unique identifier for your Serializer;
  // you've got a couple of billion to choose from,
  // and 0 - 40 is reserved by Akka itself
  def identifier = 1234567

  // "toBinary" serializes the given object to an Array of Bytes
  def toBinary(obj: AnyRef): Array[Byte] = {
    // Put the code that serializes the object here
    Array[Byte]()
  }

  // "fromBinary" deserializes the given array,
  // using the type hint (if any, see "includeManifest" above)
  def fromBinary(
      bytes: Array[Byte],
      clazz: Option[Class[_]]): AnyRef = {
    // Put your code that deserializes here
    null
  }
}
But this raises an important question: if your messages all reference data that is already shared on the machines, why would you want to put a pointer to the object in the message (very bad! messages should be immutable, and a pointer isn't!), rather than some sort of immutable string objectId (kinda your option 1)? That is a much better option when it comes to preserving the immutability of the messages, and there is little change in your business logic (just put a wrapper over the shared state storage).
For more info, see the documentation.
I finally went with the solution proposed by Diego and want to share some more details on my reasoning and solution.
First of all, I am also in favor of option 1 (handling the "compaction" of messages in the business layer), for these reasons:
Serializers are global to the actor system. Making them stateful is a severe violation of Akka's very philosophy, as it goes against the encapsulation of behavior and state in actors.
Serializers have to be created upfront anyway (even when adding them "programmatically").
Design-wise, one can argue that "message compaction" is not a responsibility of the serializer either. In a strict sense, serialization is merely the transformation of runtime-specific data into a compact, exchangeable representation. Changing what to serialize is not a task of a serializer, though.
Having settled on this, I still strove for a clean separation of "message compaction" and the actual business logic in the actors. I came up with a neat way to do this in Scala, which I want to share here. The basic idea is to make the message itself look like a normal case class but still allow these messages to "compactify" themselves. Here is an abstract example:
class Sender extends Actor {
  def context: SharedContext = ... // This is the shared data present on every node.
  // ...
  def someBusinessLogic(receiver: ActorRef) {
    val someData = computeData
    receiver ! MyMessage(someData)
  }
}

class Receiver extends Actor {
  implicit def context: SharedContext = ... // This is the shared data present on every node.
  def receive = {
    case MyMessage(someData) =>
    // ...
  }
}

object Receiver {
  object MyMessage {
    def apply(someData: SomeData) = MyCompactMessage(someData)
    def unapply(myCompactMessage: MyCompactMessage)(implicit context: SharedContext): Option[SomeData] =
      Some(myCompactMessage.someData(context))
  }
}
As you can see, the sender and receiver code feels just like using a case class, and in fact MyMessage could be a case class.
However, by implementing apply and unapply manually, one can insert one's own "compactification" logic and also implicitly inject the shared data necessary to do the "uncompactification", without touching the sender and receiver. For defining MyCompactMessage, I found Protocol Buffers especially suitable, as it is already a dependency of Akka and efficient in terms of space and computation, but any other solution would do.

waitUntilAllOperationsAreFinished and objectWithID

Update: I can confirm that objectWithID could potentially need a parent (or grandparent, etc.) context's thread to do some fetching, so avoid blocking your parent thread with something like waitUntilAllOperationsAreFinished.
As a quick test, I pointed the child MOCs' parent to their grandparent instead and left the child threads blocking the original parent. In this setup the deadlock never occurred. This is a poor architecture, though, so I'll be rearchitecting.
Original Question
I have two layers of NSOperationQueue. The first is an NSOperation graph with operations that have a set of dependencies between them. They all run fine without deadlocking each other. Within one of these operations (a Scheduler for groups of people) I have broken its work out into more discrete chunks that can be run on another NSOperationQueue. However, I still want the Scheduler to finish creating all of its schedules before the larger operation is considered finished. To that end, once I create all the Schedule operations and add them to the Scheduler's operation queue, I call waitUntilAllOperationsAreFinished on that queue. This is where I deadlock.
I am using Core Data and have an NSBlockOperation subclass called BlockOperation that handles the routine of taking a parent managed object context, creating a PrivateQueueConcurrencyType child context, calling the provided block using performBlockAndWait and finally waiting on the parent context to merge changes. Here's some code...
init(block: (NSManagedObjectContext?) -> Void, withDependencies dependencies: Array<NSOperation>, andParentManagedObjectContext parentManagedObjectContext: NSManagedObjectContext?) {
    self.privateContext = NSManagedObjectContext(concurrencyType: .PrivateQueueConcurrencyType)
    super.init()
    self.queuePriority = NSOperationQueuePriority.Normal
    addExecutionBlock({
        if (parentManagedObjectContext != nil) {
            self.parentContext = parentManagedObjectContext!
            self.privateContext.parentContext = parentManagedObjectContext!
            self.privateContext.performBlockAndWait({ () -> Void in
                block(self.privateContext)
            })
            self.parentContext!.performBlockAndWait({ () -> Void in
                var error: NSError?
                self.parentContext!.save(&error)
            })
        }
    })
    for operation in dependencies {
        addDependency(operation)
    }
}
This is working really well for me already. But now I want to block the calling thread until an operation queue has finished all of its operations, like this...
for group in groups {
    let groupId = group.objectID
    let scheduleOperation = BlockOperation(
        block: { (managedObjectContext: NSManagedObjectContext?) -> Void in
            ScheduleOperation.scheduleGroupId(groupId, inManagedObjectContext: managedObjectContext!)
        },
        withDependencies: [],
        andParentManagedObjectContext: managedObjectContext)
    scheduleOperationQueue.addOperation(scheduleOperation)
}
scheduleOperationQueue.waitUntilAllOperationsAreFinished()
...this thread gets stuck on that last line (obviously). But we never see the other threads make any progress past a certain point. Pausing the debugger, I see where the queued operations are stuck: it's in a ScheduleOperation's init method, where we fetch the group using the provided id. (ScheduleOperation.scheduleGroupId calls this init.)
convenience init(groupId: NSManagedObjectID, inManagedObjectContext managedObjectContext: NSManagedObjectContext) {
    let group = managedObjectContext.objectWithID(groupId) as Group
    ...
Does objectWithID need to execute code on the "parent" thread that its parent moc is associated with and therefore creating a deadlock? Is there anything else about my approach that could be causing this?
Note: Although I am writing this in Swift, I have added Objective-C as a tag because I feel this is not a language-specific issue, but a framework-specific one.
In general it's not specified on which thread objectWithID will be called, it's an implementation detail. I had some problems with Core Data deadlocks in the past (although in different circumstances) and I found out that the framework does some locking internally when you invoke methods on NSManagedObjectContext. So yes, I think it might result in a deadlock.
I have no advice other than re-designing your architecture; maybe it can be simplified a little. Keep in mind that you already have a private serial queue associated with a context, which guarantees that operations will be called in the specified order. You can therefore share the same context between all the ScheduleOperation instances. Set scheduleOperationQueue.maxConcurrentOperationCount to 1, so that operations execute one after another. And instead of blocking the calling thread, call a completion handler when the last operation finishes (you can use the operation's completionBlock).

GPARs async functions and passing references that are being updated by another thread

I am using GPARs asynchronous functions to fire off a process as each line in a file is parsed.
I am seeing some strange behavior that makes me wonder if I have an issue with thread safety.
Let's say I have a current object that is being loaded up with values from the current row in an input spreadsheet, like so:
class Uploader {
    MyRowObject currentRowObject
}
Once it has all the values from the current row, I fire off an async closure that looks a bit like this:
Closure processCurrentRowObject = { ->
    myService.processCurrentRowObject(currentRowObject)
}.asyncFun()
It is defined in the same class, so it has access to the currentRowObject.
While that is off and running, I parse the next row, and start by creating a new object:
currentRowObject = new MyRowObject()
and start loading it up with values.
I assumed that this would be safe, that the asynchronous function would be pointing to the previous object. However, I wonder if, because I am letting the closure bind to the reference, the reference is somehow getting updated in the async function, and I am pulling the object instance out from under it, so to speak: changing it while it's trying to work on the previous instance.
If so, any suggestions for fixing? Or am I safe?
Thanks!
I'm not sure I fully understand your case, however, here's a quick tip.
Since it is always dangerous to share a single mutable object among threads, I'd recommend completely separating the row objects used for different rows:
final localRowObject = currentRowObject
currentRowObject = null
Closure processCurrentRowObject = { ->
    myService.processCurrentRowObject(localRowObject)
}.asyncFun()