I'm trying to ensure that my concurrent program is free of the following:
deadlock
livelock
starvation
I found the following tool:
http://blog.golang.org/race-detector
I tried compiling and running with -race enabled and did not see any complaints.
Does anyone know whether this checks for all the above issues? Does the absence of complaints in the output mean the program is free of these issues?
Deadlock, livelock and starvation cannot be eliminated by testing alone, and they are not detected by the Go race detector. Complete deadlocks in Go programs (where every goroutine is blocked) are detected by the runtime, but by then it's usually too late, and partial deadlocks are not detected at all. Livelocks (non-terminating busy loops) will not be detected unless they also cause a deadlock.
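For illustration, here is a minimal sketch (not from the question) of a complete deadlock that the runtime does catch:

```go
package main

func main() {
	ch := make(chan int)

	// No other goroutine exists to receive, so this send blocks forever.
	// With every goroutine blocked, the runtime aborts with
	// "fatal error: all goroutines are asleep - deadlock!".
	ch <- 1
}
```

A partial deadlock, where some goroutines stay blocked forever while others keep running, is not reported, which is why testing alone cannot rule deadlock out.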
Thread starvation is similar to livelock in that imbalanced busy-ness in an application causes some activities to be stymied and never make the expected progress. An example is Peter Welch's famous 'Wot, no Chickens?'.
The race detector itself is limited in its usefulness because some race conditions depend on the environment, and the conditions that trigger a particular race may be absent during the testing phase, so the race detector will miss them.
If all this sounds rather bleak, there is a body of theoretical work that can help a lot. The premise is that these four dynamic problems are best addressed via design strategies and language features, rather than by testing. As a simple example, the Occam programming language (which is quite like Go in its concurrency model) has a parallel usage rule enforced by the compiler that eliminates race conditions. This imposes a restriction on the programmer: aliases for mutable state (i.e. pointers) are not allowed.
Thread starvation in Go (and Occam) should likewise be less of a problem than in Java, because the concurrency model is better designed. Unless you abuse select, it won't be a problem.
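As a hypothetical illustration of abusing select (not from the answer), adding a default case to a receive loop turns blocking into busy-waiting, which hogs a core and can starve other work:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	inbox := make(chan int)

	go func() {
		for {
			select {
			case msg := <-inbox:
				fmt.Println("got", msg)
			default:
				// The abuse: instead of blocking on <-inbox, the loop spins here,
				// hogging a CPU and starving other goroutines of that core.
			}
		}
	}()

	inbox <- 42
	time.Sleep(10 * time.Millisecond) // give the receiver a moment to print
}
```

Dropping the default case makes the goroutine block politely on the channel and the problem disappears.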
Deadlock is best addressed by theoretically-based design patterns. For example, Martin & Welch published A Design Strategy for Deadlock-Free Concurrent Systems, which describes principally the client-server strategy and the i/o-par strategy. It is aimed at Occam programs but applies to Go as well. The client-server strategy is simple: describe your Go-routine network as a set of communicating servers and their clients, and ensure that there are no cycles in the network graph; then deadlock is eliminated. I/o-par is a way to form rings and tori of Go-routines such that there will not be a deadlock within the structure.
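A minimal Go sketch of the client-server idea (the names here are illustrative, not from the paper): servers only ever answer requests, clients only ever issue them, and the request graph has no cycles, so no chain of channel waits can loop back on itself:

```go
package main

import "fmt"

// request is a client-server message; the server replies on the Reply channel.
type request struct {
	Key   string
	Reply chan string
}

// server is a pure server process: it only receives requests and answers them,
// so it can never sit waiting in a cycle of clients and servers.
func server(requests <-chan request) {
	store := map[string]string{"greeting": "hello"}
	for req := range requests {
		req.Reply <- store[req.Key]
	}
}

func main() {
	requests := make(chan request)
	go server(requests)

	// main acts as a pure client: it issues a request and waits for the reply.
	reply := make(chan string)
	requests <- request{Key: "greeting", Reply: reply}
	fmt.Println(<-reply)
}
```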
IMHO the race detector checks nothing from your list. It checks for racy memory accesses: unsynchronized concurrent accesses to the same location where at least one is a write. (Sidenote: goroutine deadlocks are detected by the runtime.)
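For example, a toy program like this (not from the question) usually runs without visible problems, yet go run -race reports a data race on counter:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	counter := 0

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // unsynchronized read-modify-write; this is what -race flags
		}()
	}

	wg.Wait()
	fmt.Println(counter) // may print 1 or 2; the race is reported either way
}
```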
Related
I understand that the meaning of atomic was explained in What's the difference between the atomic and nonatomic attributes?, but what I want to know is:
Q. Are there any side effects, besides performance issues, in using atomic properties everywhere?
It seems the answer is no, since iPhones are quite fast nowadays. So why are so many people still using nonatomic?
Even atomic does not guarantee thread safety, but it's still better than nothing, right?
Even atomic does not guarantee thread safety, but it's still better than nothing, right?
Wrong. Having written some really complex concurrent programs, I recommend exactly the opposite. You should reserve atomic for cases where it truly makes sense -- and you may not fully understand this until you write concurrent programs without any use of atomic. If I am writing a multithreaded program, I don't want programming errors (e.g. race conditions) masked. I want concurrency issues loud and obvious. That way, they are easier to identify, reproduce, and correct.
The belief that some thread safety is better than none is flawed. The program is either threadsafe, or it is not. Using atomic can make those aspects of your programs more resistant to issues related to concurrency, but that doesn't buy you much. Sure, there will likely be fewer crashes, but the program is still undisputedly incorrect, and it will still blow up in mysterious ways. My advice: If you aren't going to take the time to learn and write correct concurrent programs, just keep them single threaded (if that sounds kind of harsh: it's not meant to be harsh - it will save you from a lot of headaches). Multithreading and concurrency are huge, complicated subjects - it takes a long time to learn to write truly correct, long-lived programs in many domains.
Of course, atomic can be used to achieve thread safety in some cases -- but making every access atomic guarantees nothing for thread safety. As well, it's highly unusual (statistically) that atomic properties alone will make a class truly threadsafe, particularly as the complexity of the class increases; a class with one ivar is more likely to be truly safe using atomics alone than a class with five ivars. atomic properties are a feature I use very, very rarely (again, across some pretty large codebases and concurrent programs). It's practically a corner case when atomics are what makes a class truly thread safe.
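The thread is about Objective-C properties, but the composition problem is language-agnostic; here is a hedged sketch in Go of how two individually atomic operations still add up to a race:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var balance int64 // every access below is individually atomic

	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each Load and Store is atomic on its own, but the read-then-write
			// sequence is not: two goroutines can read the same old value and
			// each store old+1, losing one of the increments.
			old := atomic.LoadInt64(&balance)
			atomic.StoreInt64(&balance, old+1)
		}()
	}
	wg.Wait()

	// Frequently prints less than 1000: per-access atomicity did not make
	// the overall update thread-safe.
	fmt.Println(atomic.LoadInt64(&balance))
}
```

The fix is a single compound operation (atomic.AddInt64) or a lock around the whole update, which is exactly the kind of whole-class reasoning that atomic properties cannot do for you.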
Performance and execution complexity are the primary reasons to avoid them. Compared to nonatomic accesses, and given how frequent and simple variable accesses are, the cost of atomic adds up very fast. That is, atomic accesses introduce a lot of execution complexity relative to the task they perform.
Spin locks are one way atomic properties are implemented. So, would you want a synchronization primitive such as a spin lock or mutex implicitly surrounding every getter and setter, knowing it does not guarantee thread safety? I certainly don't! Making every property access in your implementations atomic can consume a ton of CPU time. You should use it only when you have an explicit reason to do so (also mentioned by dasblinkenlicht, +1). Implementation detail: some accesses do not require spin locks to uphold the guarantees of atomic; it depends on several things, such as the architecture and the size of the variable.
So, to answer your question "any side effects?" in TL;DR form: performance is the primary cost, as you noted; what atomic actually guarantees, and how useful that is at your level of abstraction, is very narrow (and often misunderstood); and it masks real bugs.
You should not pay for what you do not use. Unlike plugged-in computers, where CPU cycles cost you only time, CPU cycles on a mobile device cost you both time and battery. If your application is single-threaded, there is no reason to use atomic, because the locking and unlocking operations would be a waste of time and battery. The battery matters more than the time: while the latency added by the extra operations may be invisible to your end user, the cycles spent will reduce how long the device can run on a single charge, a measure that a lot of users consider very important.
I wrote an MPI program that seems to run OK, but I wonder about performance. The master process needs to call MPI_Send 10 or more times, and each worker receives data 10 or more times and sends it on. I wonder whether this incurs a performance penalty, whether I could transfer everything in a single struct, or what other techniques I could benefit from.
As a more general question: once an MPI program more or less works, what are the best optimization techniques?
It's usually the case that sending 1 large message is faster than sending 10 small messages. The time cost of sending a message is well modelled by considering a latency (how long it would take to send an empty message, which is non-zero because of the overhead of function calls, network latency, etc.) and a bandwidth term (how much longer it takes to send each extra byte once the communication has started). By bundling messages into one, you only incur the latency cost once, and this is often a win (although it's always possible to come up with cases where it isn't). The best way to know for any particular code is simply to try. Note that MPI datatypes give you very powerful ways to describe the layout of your data in memory, so that you can move it almost directly from memory to the network without an intermediate copy into some buffer (so-called "marshalling" of the data).
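To make the reasoning concrete, the standard simplified cost model (the algebra below is illustrative, not from the answer) is

$$T(n) = \alpha + \beta n,$$

where $\alpha$ is the per-message latency and $\beta$ the per-byte cost. Ten separate messages of $n$ bytes cost roughly $10\alpha + 10\beta n$, while one bundled message of $10n$ bytes costs roughly $\alpha + 10\beta n$; bundling saves about $9\alpha$, which dominates whenever the individual messages are small.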
As to more general optimization questions about MPI -- without knowing more, all we can do is give advice so general as to not be very useful. Minimize the amount of communication that needs to be done, and wherever possible use built-in MPI tools (collectives, etc.) rather than implementing your own.
One way to fully understand the performance of your MPI application is to run it within the SimGrid platform simulator. The tooling and models provided are sufficient to get realistic timing predictions for mid-range applications (say, a few tens of thousands of lines of C or Fortran), and it can be combined with suitable visualization tools that help you fully understand what is going on in your application and the actual performance trade-offs you have to consider.
For a demo, please refer to this screencast: https://www.youtube.com/watch?v=NOxFOR_t3xI
Greetings,
I'm evaluating some components for a multi-data center distributed system. We're going to be using message queues (via either RabbitMQ or Qpid) so agents can make asynchronous requests to other agents without worrying about addressing, routing, load balancing or retransmission.
In many cases, the agents will be interacting with components that were not designed for highly concurrent access, so locking and cross-agent coordination will be needed to avoid race conditions. Also, we'd like the system to automatically respond to agent or data center failures.
With the above use cases in mind, ZooKeeper seemed like it might be a good fit. But I'm wondering if trying to use both ZK and message queuing is overkill. It seems like what Zookeeper does could be accomplished by my own cluster manager using AMQP messaging, but that would be hard to get really right. On the other hand, I've seen some examples where ZooKeeper was used to implement message queuing, but I think RabbitMQ/Qpid are a more natural fit for that.
Has anyone out there used a combination like this?
Thanks in advance,
-Chris
Coming into this late, but maybe it will be of some use. The primary consideration should be the performance characteristics of your system. ZooKeeper, like you said, is more than capable of implementing a task-distribution system using a distributed queue, but ZK is currently optimized more for reads than for writes (this only comes into play in the thousands-of-operations-per-second range). If your throughput needs are below that, using just ZK to implement your system would reduce the number of runtime components and keep it simpler. Of course, you should always run your performance tests before deciding.
Distributed coordination is really hard to get right, so I would definitely recommend using zookeeper for that and not rolling your own.
Not quite sure what ZooKeeper exactly is, but I guess that using a component from Apache (if it fits your needs well) is preferable to managing things such as distributed synchronization and group services on your own. You could of course hire a team of developers especially for that purpose, but that doesn't guarantee you a better implementation.
I guess it would be implemented as a separate component anyway, because doing otherwise could add a lot of complexity and slow down the workflow; so preferring ZooKeeper or something similar is kind of obvious (to me).
And surely, unless you're in the global optimization phase of your project, I guess it would be better to use RabbitMQ or the like (I would even stress that, because implementations of AMQP (especially commercial ones) are likely to be more reliable than anything you'd come up with yourself).
So I would go for both, carefully choosing the appropriate third-party products, but using only as much of them as is needed. And that's just my opinion; thanks for reading :)
I've been reading up a lot about transactional memory lately. There is a bit of hype around TM, so a lot of people are enthusiastic about it, and it does provide solutions for painful problems with locking, but you regularly also see complaints:
You can't do I/O
You have to write your atomic sections so they can run several times (be careful with your local variables!)
Software transactional memory offers poor performance
[Insert your pet peeve here]
I understand these concerns: more often than not, you find articles about STMs that only run on some particular hardware that supports some really nifty atomic operation (like LL/SC), or that have to be supported by some imaginary compiler, or that require all accesses to memory to be transactional, or that introduce monad-style type constraints, etc. And above all: these are real problems.
This has led me to ask myself: what speaks against local use of transactional memory as a replacement for locks? Would this already bring enough value, or must transactional memory be used all over the place if it is used at all?
Yes, some of the problems you mention can be real ones today, but things evolve.
As with any new technology, first there is hype, then the technology shows that there are some unresolved problems, and then some of those problems are solved and others are not. The result is one more option for solving your problems, the option best adapted to some of them.
I would say that you can use STM for the parts of your application that can live with the constraints of the current state of the art -- parts that don't mind a loss of efficiency, for example.
Communication between transactional and non-transactional parts is the big problem. There are STMs that are lock-aware, so they can interact in a consistent way with non-transactional parts.
I/O is also possible, but your transaction becomes irrevocable, that is, it cannot be aborted. That means only one irrevocable transaction can run at a time, so only one transaction can use I/O at a time. You can also perform the I/O once the top-level transaction has succeeded, in the non-transactional world, as you would today.
Most library-based STM systems force the user to distinguish between transactional and non-transactional data, so yes, you need to understand exactly what that means. Compilers, on the other hand, can deduce which accesses must be transactional; the problem is that they can be too conservative, losing the efficiency we gain by managing the different kinds of variables explicitly. This is the same as having static, local and dynamic variables: you need to know the constraints each one has in order to write a correct program.
I've been reading up a lot about transactional memory lately.
You might also be interested in this podcast on software transactional memory, which also introduces STM using an analogy based on garbage collection:
The paper is about an analogy between garbage collection and transactional memory. In addition to seeing the beauty of the analogy, the discussion also serves as a good introduction to transactional memory (which was mentioned in the Goetz/Holmes episode) and - to some extent - to garbage collection.
If you use transactional memory as a replacement for locks, all the code that used to execute with that lock held may be rolled back and re-executed. Thus the code that was previously using locks must be transactional, and will have all the same drawbacks (and benefits).
So you could possibly restrict the influence of TM to only those parts of the code that hold locks, right? In that scenario, every piece of code that can be called while a lock is held must support TM. How much of your program does not hold locks and is never called by code that holds locks?
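As a rough analogy (optimistic retry rather than STM proper), here is a Go sketch showing why code that replaces a lock must be safe to run several times -- the same constraint TM places on formerly lock-protected code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addOptimistically updates counter without a lock: it reads, computes, and
// retries the whole body if another goroutine changed counter in the meantime.
// Everything inside the loop may execute more than once, which is why it must
// not do I/O or other irrevocable side effects.
func addOptimistically(counter *int64, delta int64) {
	for {
		old := atomic.LoadInt64(counter)
		if atomic.CompareAndSwapInt64(counter, old, old+delta) {
			return // our "transaction" committed
		}
		// someone else won the race: discard our work and retry
	}
}

func main() {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			addOptimistically(&counter, 1)
		}()
	}
	wg.Wait()
	fmt.Println(counter) // always 100
}
```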
Transactional programming is, in this day and age, a staple of modern development. Concurrency and fault-tolerance are critical to an application's longevity and, rightly so, transactional logic has become easy to implement. As applications grow, though, transactional code tends to become more and more burdensome for the scalability of the application, and once you bridge into distributed transactions and mirrored data sets the issues become very complicated. I'm curious at what point, in data size or application complexity, transactions frequently start becoming the source of issues (timeouts, deadlocks, performance problems in mission-critical code, etc.) that are more bothersome to fix, troubleshoot or work around than designing a data model that is more fault-tolerant in itself, or using other means to ensure data integrity. Also, what design patterns serve to minimize these impacts or make standard transactional logic obsolete or a non-issue?
--
EDIT: We've got some answers of reasonable quality so far, but I think I'll post an answer myself to bring up some of the things I've heard about, to try to inspire some additional creativity; most of the responses I'm getting are pessimistic views of the problem.
Another important note is that not all deadlocks are the result of poorly coded procedures; sometimes there are mission-critical operations that depend on similar resources in different orders, or complex joins in different queries that step on each other. This is an issue that can sometimes seem unavoidable, but I've been part of reworking workflows to facilitate an execution order that is less likely to cause one.
I think no design pattern can solve this issue by itself. Good database design, good stored procedure programming, and especially learning how to keep your transactions short will ease most of the problems.
There is no 100% guaranteed method of not having problems though.
In basically every case I've seen in my career though, deadlocks and slowdowns were solved by fixing the stored procedures:
making sure all tables are accessed in a consistent order prevents deadlocks (see the sketch after this list)
fixing indexes and statistics makes everything faster (hence diminishes the chance of deadlock)
sometimes there was no real need for transactions, it just "looked" like it
sometimes transactions could be eliminated by turning multi-statement stored procedures into single-statement ones.
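The answer is about database tables, but the ordering rule is the same for any two locks; here is a hedged Go sketch of the principle (the resource names are made up for illustration):

```go
package main

import "sync"

// Two shared resources, analogous to two tables that several procedures touch.
var accountsMu, auditMu sync.Mutex

// Both functions need both resources. Because every code path acquires
// accountsMu before auditMu (one global order), no cycle of waiting
// goroutines can form, so they cannot deadlock each other.
func transfer() {
	accountsMu.Lock()
	defer accountsMu.Unlock()
	auditMu.Lock()
	defer auditMu.Unlock()
	// ... update accounts, then write the audit entry ...
}

func audit() {
	// Taking auditMu first here would reintroduce exactly the kind of
	// out-of-order access the answer warns about.
	accountsMu.Lock()
	defer accountsMu.Unlock()
	auditMu.Lock()
	defer auditMu.Unlock()
	// ... read accounts and audit data consistently ...
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); transfer() }()
	go func() { defer wg.Done(); audit() }()
	wg.Wait()
}
```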
The use of shared resources is wrong in the long run, because by reusing an existing environment you create more and more possible interactions. Just review the busy beavers :) The way Erlang goes is the right way to produce fault-tolerant and easily verifiable systems.
But transactional memory is essential for many applications in widespread use. Consider a bank with its millions of customers, for example: you can't just copy the data for the sake of efficiency.
I think monads are a cool way to handle the difficult problem of changing state.
One approach I've heard of is a versioned, insert-only model where no updates ever occur. During selects, the version is used to pick only the latest rows. One downside I know of with this approach is that the database can get rather large very quickly.
I also know that some solutions, such as FogBugz, don't use enforced foreign keys, which I believe would also help mitigate some of these problems, because the SQL query plan can lock linked tables during selects or updates even if no data in them is changing, and if a highly contended table gets locked it can increase the chance of deadlock or timeout.
I don't know much about these approaches though since I've never used them, so I assume there are pros and cons to each that I'm not aware of, as well as some other techniques I've never heard about.
I've also been looking into some of the material from Carlo Pescio's recent post, which I've unfortunately not had enough time to do justice, but the material seems very interesting.
If you are talking 'cloud computing' here, the answer would be to localize each transaction to the place where it happens in the cloud.
There is no need for the entire cloud to be consistent, as that would kill performance (as you noted). Simply keep track of what changed and where, and handle multiple small transactions as the changes propagate through the system.
The situation where user A updates record R and user B at the other end of the cloud does not see it (yet) is the same as when user A hasn't made the change yet in the current strict-transactional environment. This could lead to discrepancies in an update-heavy system, so systems should be architected to rely on updates as little as possible, moving toward aggregation of data and pulling out the aggregates only when the exact figure is critical (i.e. moving the requirement for consistency from write-time to critical-read-time).
Well, just my POV. It's hard to conceive of a system that is application-agnostic in this case.
Try to make changes at the database level in the smallest possible number of instructions.
The general rule is to lock a resource for the least possible time. Using T-SQL, PL/SQL, Java on Oracle or any similar approach, you can reduce the time each transaction locks a shared resource. In fact, transactions in the database are optimized with row-level locks, multi-versioning, and other kinds of intelligent techniques. If you can run the transaction in the database, you save the network latency, apart from other layers like ODBC/JDBC/OLEDB.
Sometimes the programmer tries to obtain the good things of a database (it is transactional, parallel, distributed) but keeps a cache of the data. Then they need to manually add back some of the database features.