Synchronized collection that blocks on every method - kotlin

I have a collection that is commonly used between different threads. In one thread I need to add items, remove items, retrieve items and iterate over the list of items. What I am looking for is a collection that blocks access to any of its read/write/remove methods whenever any of these methods are already being called. So if one thread retrieves an item, another thread has to wait until the reading has completed before it can remove an item from the collection.
Kotlin doesn't appear to provide this. However, I could create a wrapper class that provides the synchronization I'm looking for. Java does appear to offer the synchronizedList method, but from what I read, this really only blocks concurrent calls to a single method, meaning that no two threads can remove an item at the same time, but one can remove while another reads an item (which is what I am trying to avoid).
Are there any other solutions?

A wrapper such as the one returned by synchronizedList synchronizes calls to every method, using the wrapper itself as the lock. So one thread would be blocked from calling get(), say, while another thread is currently calling add(). (This is what the question seems to ask for.)
However, as the docs to that method point out, this does nothing to protect sequences of calls, such as you might use when iterating through a collection. If another thread changes the collection in between your calls to next(), then anything could happen. (This is what I think the question is really about!)
To handle that safely, your options include:
Manual synchronization. Surround each sequence of calls to the collection in a synchronized block that synchronises on the collection, e.g.:
val list = Collections.synchronizedList(mutableListOf<String>())
// …
synchronized (list) {
    for (i in list) {
        // …
    }
}
This is straightforward, and relatively easy to do if the collection is under your control. But if you miss any sequences, then you could get unexpected behaviour. Also, you'll need to keep your sequences short, to avoid holding the lock for an extended time and affecting performance.
Use a concurrent collection implementation which provides primitives letting you do all the processing you need in a single call, avoiding iteration and other sequences.
For maps, Java provides very good support with its ConcurrentMap interface, and high-performance implementations such as ConcurrentHashMap. These have methods allowing you to iterate, update single or multiple mappings, search, reduce, and many other whole-map operations in a single call, avoiding any concurrency problems.
For sets (as per this question) you can use a ConcurrentSkipListSet, or you can create one from a ConcurrentHashMap with newKeySet().
For lists (as per this question), there are fewer options. (I think concurrent lists are much less commonly needed.) If you don't need random access, ConcurrentLinkedQueue may suffice. Or if modification is much less common than iteration, CopyOnWriteArrayList could work.
There are many other concurrent classes in the java.util.concurrent package, so it's well worth looking through to see if any of those is a better match for your particular case.
If you have specialised requirements, you could write your own collection implementation which supports them. Obviously this is more work, and only worthwhile if none of the above approaches does what you want.
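To illustrate the second option, here's a hedged sketch using standard java.util.concurrent classes (the class and method names below are invented for illustration). Each operation is a single call, so no external synchronization or iteration sequence is needed:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ConcurrentDemo {
    // Atomically bump a counter: a single call, no external lock needed.
    static int bump(ConcurrentHashMap<String, Integer> scores, String key) {
        return scores.compute(key, (k, v) -> (v == null ? 0 : v) + 1);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> scores = new ConcurrentHashMap<>();
        bump(scores, "alice");
        bump(scores, "alice");
        System.out.println(scores.get("alice")); // 2

        // A concurrent set view backed by a ConcurrentHashMap:
        Set<String> seen = ConcurrentHashMap.newKeySet();
        seen.add("alice");

        // CopyOnWriteArrayList: iteration works on a stable snapshot,
        // so modifying the list inside the loop can't corrupt it.
        List<String> list = new CopyOnWriteArrayList<>(List.of("a", "b"));
        for (String s : list) {
            list.add(s + "!"); // safe; this loop still sees only "a" and "b"
        }
        System.out.println(list); // [a, b, a!, b!]
    }
}
```

Note the CopyOnWriteArrayList trade-off: every modification copies the backing array, which is why it only suits cases where iteration greatly outnumbers modification.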
In general, I think it's well worth stepping back and seeing whether iteration is really needed. Historically, in imperative languages all the way from FORTRAN through BASIC and C up to Java, the for loop has traditionally been the tool of choice (sometimes the only structure) for operating on collections of data — and for those of us who grew up on those languages, it's what we reach for instinctively. But the functional programming paradigm provides alternative tools, and so in languages like Kotlin which provide some of them, it's good to stop and ask ourselves “What am I ultimately trying to achieve here?” (Often what we want is actually to update all entries, or map to a new structure, or search for an element, or find the maximum — all of which have better approaches in Kotlin than low-level iteration.)
After all, if you can tell the compiler what you want to do, instead of how to do it, then your program is likely to be shorter and easier to read and maintain, freeing you to think about more important things!
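For instance, here's a hedged sketch in Java of a couple of those whole-collection operations expressed as single declarative calls (Kotlin's standard library offers equivalent extension functions such as map and maxByOrNull; the helper names here are invented):

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class FunctionalDemo {
    // "Map to a new structure": the length of each word, no loop written.
    static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // "Find the maximum": the longest word, again without explicit iteration.
    static String longest(List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length)).orElse("");
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three");
        System.out.println(lengths(words)); // [3, 3, 5]
        System.out.println(longest(words)); // three
    }
}
```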

Related

Do we need to lock the immutable list in kotlin?

var list = listOf("one", "two", "three")

fun One() {
    list.forEach { result ->
        // Does something here
    }
}

fun Two() {
    list = listOf("four", "five", "six")
}
Can function One() and Two() run simultaneously? Do they need to be protected by locks?
No, you don't need to lock the variable. Even if the function One() is still running while you change the variable, forEach keeps iterating over the first list. What could happen is that the assignment in Two() happens before forEach is called, but forEach will loop over one list or the other; it won't switch lists mid-iteration because of the assignment.
If you had a println(result) in your forEach, your program would output either
one
two
three
or
four
five
six
depending on whether the assignment happens before forEach starts.
What will NOT happen is something like
one
two
five
six
Can function One() and Two() run simultaneously?
There are two ways that that could happen:
One of those functions could call the other.  This could happen directly (where the code represented by // Does something here in One()⁽¹⁾ explicitly calls Two()), or indirectly (it could call something else which ends up calling Two() — or maybe the list property has a custom setter which does something that calls One()).
One thread could be running One() while a different thread is running Two().  This could happen if your program launches a new thread directly, or a library or framework could do so.  For example, GUI frameworks tend to have one thread for dispatching events, and others for doing work that could take time; and web server frameworks tend to use different threads for servicing different requests.
If neither of those could apply, then there would be no opportunity for the functions to run simultaneously.
Do they need to be protected by locks?
If there's any possibility of them being run on multiple threads, then yes, they need to be protected somehow.
99.999% of the time, the code would do exactly what you'd expect; you'd either see the old list or the new one.  However, there's a tiny but non-zero chance that it would behave strangely — anything from giving slightly wrong results to crashing.  (The risk depends on things like the OS, CPU/cache topology, and how heavily loaded the system is.)
Explaining exactly why is hard, though, because at a low level the Java Virtual Machine⁽²⁾ does an awful lot of stuff that you don't see.  In particular, to improve performance it can re-order operations within certain limits, as long as the end result is the same — as seen from that thread.  Things may look very different from other threads — which can make it really hard to reason about multi-threaded code!
Let me try to describe one possible scenario…
Suppose Thread A is running One() on one CPU core, and Thread B is running Two() on another core, and that each core has its own cache memory.⁽³⁾
Thread B will create a List instance (holding references to strings from the constant pool), and assign it to the list property; both the object and the property are likely to be written to its cache first.  Those cache lines will then get flushed back to main memory — but there's no guarantee about when, nor about the order in which that happens.  Suppose the list reference gets flushed first; at that point, main memory will have the new list reference pointing to a fresh area of memory where the new object will go — but since the new object itself hasn't been flushed yet, who knows what's there now?
So if Thread A starts running One() at that precise moment, it will get the new list reference⁽⁴⁾, but when it tries to iterate through the list, it won't see the new strings.  It might see the initial (empty) state of the list object before it was constructed, or part-way through construction⁽⁵⁾.  (I don't know whether it's possible for it to see any of the values that were in those memory locations before the list was created; if so, those might represent an entirely different type of object, or even not a valid object at all, which would be likely to cause an exception or error of some kind.)
In any case, if multiple threads are involved, it's possible for one to see list holding neither the original list nor the new one.
So, if you want your code to be robust and not fail occasionally⁽⁶⁾, then you have to protect against such concurrency issues.
Using @Synchronized and @Volatile is traditional, as is using explicit locks.  (In this particular case, I think that making list volatile would fix the problem.)
But those low-level constructs are fiddly and hard to use well; luckily, in many situations there are better options.  The example in this question has been simplified too much to judge what might work well (that's the down-side of minimal examples!), but work queues, actors, executors, latches, semaphores, and of course Kotlin's coroutines are all useful abstractions for handling concurrency more safely.
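As a minimal sketch of the volatile fix mentioned above, written in Java since Kotlin's @Volatile property annotation compiles to the same JVM volatile field (the class and method names here are invented):

```java
import java.util.List;

public class VolatileListHolder {
    // 'volatile' establishes a happens-before edge between the write in
    // two() and the read in one(): a thread that sees the new reference
    // is also guaranteed to see the fully-constructed list behind it.
    private volatile List<String> list = List.of("one", "two", "three");

    List<String> snapshot() {
        // A single volatile read; callers get the old list or the new
        // one in its entirety, never a half-built mixture.
        return list;
    }

    void one() {
        for (String s : snapshot()) {
            System.out.println(s);
        }
    }

    void two() {
        list = List.of("four", "five", "six");
    }
}
```

Note this only makes the single reference visible safely; it does nothing to protect longer sequences of operations, which still need one of the higher-level approaches.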
Ultimately, concurrency is a hard topic, with a lot of gotchas and things that don't behave as you'd expect.
There are many sources of further information, such as:
These other questions cover some of the issues.
Chapter 17: Threads And Locks from the Java Language Specification is the ultimate reference on how the JVM behaves.  In particular, it describes what's needed to ensure a happens-before relationship that will ensure full visibility.
Oracle has a tutorial on concurrency in Java; much of this applies to Kotlin too.
The java.util.concurrent package has many useful classes, and its summary discusses some of these issues.
Concurrent Programming In Java: Design Principles And Patterns by Doug Lea was at one time the best guide to handling concurrency, and these excerpts discuss the Java memory model.
Wikipedia also covers the Java memory model.
(1) According to Kotlin coding conventions, function names should start with a lower-case letter; that makes them easier to distinguish from class/object names.
(2) In this answer I'm assuming Kotlin/JVM.  Similar risks likely apply to other platforms too, though the details differ.
(3) This is of course a simplification; there may be multiple levels of caching, some of which may be shared between cores/processors; and some systems have hardware which tries to ensure that the caches are consistent…
(4) References themselves are atomic, so a thread will either see the old reference or the new one — it can't see a bit-pattern comprising parts of the old and new ones, pointing somewhere completely random.  So that's one problem we don't have!
(5) Although the reference is immutable, the object gets mutated during construction, so it might be in an inconsistent state.
(6) And the more heavily loaded your system is, the more likely it is for concurrency issues to occur, which means that things will probably fail at the worst possible time!

Vulkan: Any downsides to creating pipelines, command-buffers, etc one at a time?

Some Vulkan objects (e.g. VkPipeline, VkCommandBuffer) can be created/allocated in arrays (using count + pointer parameters). At a glance, this appears to be done to make it easier to code common usage patterns. But in some cases (e.g. when creating a C++ RAII wrapper), it's nicer to create them one at a time. It is, of course, simple to achieve this.
However, I'm wondering whether there are any significant downsides to doing this?
(I guess this may vary depending on the actual object type being created - but I didn't think it'd be a good idea to ask the same question for each object)
Assume that, in both cases, objects are likely to be created in a first-created-last-destroyed manner, and that - while the objects are individually created and destroyed - this will likely happen in a loop.
Also note:
VkCommandBuffers are also freed in arrays (vkFreeCommandBuffers).
VkPipelines are destroyed individually (vkDestroyPipeline).
Are there any reasons I should modify my RAII wrapper to allow for array-based creation/destruction? For example, will it save memory (significantly)? Will single-creation reduce performance?
Remember that pipeline creation does not require external synchronization. That means the implementation is going to handle its own mutexes and so forth. As such, it makes sense to avoid locking those internal mutexes whenever possible.
Also, pipeline creation is slow. So being able to batch it up and execute it on another thread is very useful.
Command buffer creation doesn't have either of these concerns. So there, you should feel free to allocate whatever CBs you need. However, multiple creation will never harm performance, and it may help it. So there's no reason to avoid it.
Vulkan is an API designed around modern graphics hardware. If you know you want to create a certain number of objects up front, you should use the batch functions if they exist, as the driver may be able to optimize creation/allocation, resulting in potentially better performance.
There may (or may not) be better performance (depending on driver and the type of your workload). But there is obviously potential for better performance.
If you create one or ten command buffers in your application, then it does not matter.
In most cases the difference will be less than 5 %. So if you do not care about that (e.g. your application already runs at 500 FPS), then it does not matter.
Then again, C++ is a versatile language. I think this is a non-problem. You would simply have a static member function or a class that would construct/initialize N objects (there's probably a pattern name for that).
The destruction may be trickier. You can again have a static member function that destroys N objects. But it would not be called automatically, and it is annoying to have null/husk objects around. And the destructor would still be called on VK_NULL_HANDLE. There is also the problem that a pool reset or destruction would invalidate all the command buffer C++ objects, so there's probably no way to do it cleanly/simply.

Scala immutable vs mutable. What is the way one should go?

I'm just learning to program in Scala.
I have some experience in functional programming, as I have in object oriented programming.
My question is kind of simple, yet tricky:
Which structures should be used in Scala? Should we stick only to immutables, e.g. modifying lists by iterating through them and sticking a new one together, or go for mutables? What is your opinion on that, and what are the performance and memory-related aspects?
I'm inclined to program in a functional style, but it often takes an insane amount of effort to do things which are easily done by using mutables. Is it situation-dependent which to use?
Prefer immutable to mutable state. Use mutable state only where it is absolutely necessary. Some notable reasons include:
Performance. The standard libraries make wide use of vars and while loops, even though this is not idiomatic Scala. This should not be emulated, however, except for cases where you have profiled to determine that modifying the code to be more imperative will bring a significant performance gain.
I/O. I/O, or interacting with the outside world is inherently state dependent, and thus must be dealt with in a mutable manner.
This is no different than the recommended coding style found in all major languages, imperative or functional. For example, in Java it is preferable to use data objects with only private final fields. Code written in an immutable (and functional) way is inherently easier to understand because when one sees a val, they know it will never change, reducing the possible number of states any particular object or function can be in.
In many cases, it also allows automatic parallel execution; for example, collection classes in Scala all have a par method, which returns a parallel collection that automatically runs calls to functions like map or reduce in parallel.
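The "data objects with only private final fields" style mentioned above can be sketched like this in Java (a hedged illustration; the class and method names are invented):

```java
// An immutable value object: all fields private and final, no setters.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // "Modification" returns a new instance instead of mutating this one,
    // so a Point can be shared freely between threads.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```

Because the object can never change after construction, any thread observing it sees a single, consistent state, which is the property that makes immutable data trivially thread-safe.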
(I thought this must be a duplicate but couldn't easily find an earlier similar one, so I venture to answer...)
There is no general answer to this question. The rule of thumb suggested by the creators of Scala is to start with immutable vals and structures and stick to them as long as it makes sense. You can almost always create a workable solution to your problem this way. But if not, of course be pragmatic and use mutability.
Once you have a solution, you can tweak it, test it, measure its performance etc. If you find that e.g. it is too slow or overly complex, identify the critical part of it, understand what makes it problematic and - if needed - reimplement it using mutable variables, ideally keeping it isolated from the rest of the program. Note though that in many cases, a better solution can be found from within the immutable realm as well, so try looking there first. Especially for a beginner like myself, it still happens regularly that the best solution I could come up with looked contorted and complex with no apparent way to improve it - until seeing a simple and elegant solution to the same problem in a few lines of code, created by an experienced Scala developer who controls more of the power of the language and its libraries.
I usually obey the following rules:
Never use static mutable vars
Keep all user defined data types (typically case classes) immutable unless they are very expensive to copy. This will simplify a lot of the application logic.
If a data structure/collection is inherently mutable (i.e. it's designed to change over time), using a mutable data structure/collection might be appropriate. An example might be a large game world that is updated when players move. Remember to (almost) never share these data structures between threads though.
It's fine to use mutable local vars in methods
Use immutable collections for function results. These can be strictly or lazily evaluated depending on what gives best performance in the used context. Be careful if you use a lazily evaluated result which depends on a mutable collection though.

objective-c block vs selector. which one is better?

In Objective-C, when you are implementing a method that is going to perform a repetitive operation, for example, you need to choose between the several options that the language offers you:
@interface FancyMutableCollection : NSObject { }
-(void)sortUsingSelector:(SEL)comparator;
// or ...
-(void)sortUsingComparator:(NSComparator)cmptr;
@end
I was wondering which one is better?
Objective-C provides many options: selectors, blocks, pointers to functions, instances of a class that conforms to a protocol, etc.
Some times the choice is clear, because only one method suits your needs, but what about the rest? I don't expect this to be just a matter of fashion.
Are there any rules to know when to use selectors and when to use blocks?
The main difference I can think of is that blocks act as closures: they capture the variables in the scope around them. This is good for when you already have the variables there and don't want to create an instance variable just to hold a value temporarily so that the action selector can access it when it is run.
With relation to collections, blocks have the added ability to be run concurrently if there are multiple cores in the system. Currently the iPhone doesn't have multiple cores, but the iPad 2 does, and it is probable that future iPhone models will. Using blocks, in this case, would allow your app to scale automatically in the future.
In some cases, blocks are just easier to read as well, because the callback code is right next to the code that's calling it back. This is not always the case, of course, but sometimes it does simply make the code easier to read.
Sorry to refer you to the documentation, but for a more comprehensive overview of the pros/cons of blocks, take a look at this page.
As Apple puts it:
Blocks represent typically small, self-contained pieces of code. As such, they’re particularly useful as a means of encapsulating units of work that may be executed concurrently, or over items in a collection, or as a callback when another operation has finished.
Blocks are a useful alternative to traditional callback functions for two main reasons:
They allow you to write code at the point of invocation that is executed later in the context of the method implementation.
Blocks are thus often parameters of framework methods.
They allow access to local variables.
Rather than using callbacks requiring a data structure that embodies all the contextual information you need to perform an operation, you simply access local variables directly.
On this page
The one that's better is whichever one works better in the situation at hand. If your objects all implement a comparison selector that supports the ordering you want, use that. If not, a block will probably be easier.

Manipulating Objects in Methods instead of returning new Objects?

Let’s say I have a method that populates a list with some kind of objects. What are the advantages and disadvantages of following method designs?
void populate (ArrayList<String> list, other parameters ...)
ArrayList<String> populate(other parameters ...)
Which one I should prefer?
This looks like a general issue of method design, but I couldn't find a satisfying answer on Google, probably for not using the right keywords.
The second one seems more functional and thread safe to me. I'd prefer it in most cases. (Like every rule, there are exceptions.)
The owner of the populate method could return an immutable List (why ArrayList?).
It's also thread safe if there is no state modified in the populate method. Only passed in parameters are used, and these can also be immutable.
Other than what @duffymo mentioned, the second one is easier to understand, and thus to use: it is obvious what its input and output are.
Advantages to the in-out parameter:
You don't have to create as many objects. In languages like C or C++, where allocation and deallocation can be expensive, that can be a plus. In Java/C#, not so much -- GC makes allocation cheap and deallocation all but invisible, so creating objects isn't as big a deal. (You still shouldn't create them willy-nilly, but if you need one, the overhead isn't as bad as in some manual-allocation languages.)
You get to specify the type of the list. Potential plus if you need to pass that array to some other code you don't control later.
Disadvantages:
Readability issues.
In almost all languages that support function arguments, the first case is assumed to mean "do something with the entries in this list". Modifying args violates the Principle of Least Astonishment. The second is assumed to mean "give me a list of stuff", which is what you're after.
Every time you say "ArrayList", or even "List", you take away a bit of flexibility. You add some overhead to your API. What if I don't want to create an ArrayList before calling your method? I shouldn't have to, if the method's whole purpose in life is to return me some entries. That's the API's job.
Encapsulation issues:
The method being passed a list to fill can't assume anything about that list (even that it's a list at all; it could be null).
The method passing the list can't guarantee anything about what the method does with it. If it's working correctly, sure, the API docs can say "this method won't destroy existing entries". But considering the chance of bugs, that may not be worth trusting. At least if the method returns its own list, the caller doesn't have to worry about what was in it before. And it doesn't have to worry about a bug from a thousand miles away corrupting data it should never have affected.
Thread safety issues.
The list could be locked by another thread, meaning if we try and lock on it now it could potentially lock up the app.
Or, if not locked, it could still be modified by another thread, in which case we're no less screwed. Unless you're going to write extra code to handle concurrent-modification exceptions everywhere.
Returning a new list means every call to the method can have its own list. No thread can mess with another thread's return value, unless the class is very badly designed.
Side point: Being able to specify the type of the list often leads to dependencies on the type of the list. Notice how you're passing ArrayLists around everywhere. You're painting yourself into corners by saying "This is an ArrayList" when you don't need to, but when you're passing it to a dozen methods, that's a dozen methods you'll have to change. (Not entirely related, but only slightly tangential. You could change the types to List rather than ArrayList and get rid of this. But the more you're passing that list around, the more places you'll need to change.)
Short version: Unless you have a damn good reason, use the first syntax only if you're using the existing contents of the list in your method. IE: if you're modifying it, or doing something with the existing values. If you intend to return a list of entries, then return a List of entries.
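To make the contrast concrete, here's a hedged Java sketch of both styles (the method names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PopulateDemo {
    // Style 1: mutate a caller-supplied list. Only reasonable when the
    // method genuinely needs the list's existing contents.
    static void appendSquares(List<Integer> out, int n) {
        for (int i = 1; i <= n; i++) {
            out.add(i * i);
        }
    }

    // Style 2 (preferred): build and return a fresh list. Every caller
    // gets its own instance, so no other thread can be mutating it.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            result.add(i * i);
        }
        return result; // could wrap in List.copyOf(...) for immutability
    }

    public static void main(String[] args) {
        System.out.println(squares(3)); // [1, 4, 9]
    }
}
```

Note that the return type is the List interface rather than ArrayList, which keeps the implementation free to change.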
The second method is the preferred way, for several reasons:
Primarily, the function signature is clearer and shows what its intentions are.
It is actually recommended that you NEVER change the value of a parameter that is passed in to a function unless you explicitly mark it as an "out" parameter.
It will also be easier to use in expressions.
And it will be easier to change in the future, including moving to a more functional approach (for threading, etc.) if you would like to.