Restarting agent program after it crashes - testing

Consider a distributed bank application in which agent machines modify the value of a global variable, say "balance".
The agents' requests are queued. Each request asks for a value to be added to the global variable on behalf of a particular agent. So the code for the agent has this form:
look_queue(); // take a look at the leftmost request on queue without dequeuing
///////////////////// **POINT A**
/////////////////// **POINT B**
dequeue(); // once transaction is complete, request can be dequeued
Now, if an agent's code crashes at POINT B, the request obviously must not be processed again; otherwise the variable will be modified twice for the same request. To avoid this, we can make the code atomic, like this:
look_queue(); // take a look at the leftmost request on queue without dequeuing
dequeue(); // once transaction is complete, request can be dequeued
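One common way to get the effect of that atomic block across a real crash is to make the update idempotent: commit the balance change together with a record of which request produced it, so a restarted agent that finds the same request still at the head of the queue can skip it. This is a swapped-in technique (idempotent processing with request IDs), sketched in Python; `Store`, `process_next`, `run_agent` and the request IDs are hypothetical, and the in-memory `commit()` only stands in for a real atomic, durable commit.

```python
from collections import deque


class CrashAtPointB(Exception):
    """Simulated agent crash after the update but before dequeue()."""


class Store:
    """Hypothetical durable store: commit() replaces the whole state in one
    step, standing in for an atomic database commit."""

    def __init__(self):
        self.state = {"balance": 0, "applied": frozenset()}

    def commit(self, new_state):
        self.state = new_state


def process_next(queue, store, crash_at_point_b=False):
    req_id, amount = queue[0]              # look_queue(): peek, do not dequeue
    current = store.state
    if req_id not in current["applied"]:   # skip requests already applied
        # The balance change and the "applied" marker are committed together.
        store.commit({
            "balance": current["balance"] + amount,
            "applied": current["applied"] | {req_id},
        })
    if crash_at_point_b:
        raise CrashAtPointB(req_id)
    queue.popleft()                        # dequeue()


def run_agent(queue, store, crash_on=None):
    """Drain the queue, crashing once at POINT B for request `crash_on`."""
    while queue:
        req_id = queue[0][0]
        try:
            process_next(queue, store, crash_at_point_b=(req_id == crash_on))
        except CrashAtPointB:
            crash_on = None                # "restart": retry the same head
```

For example, draining `deque([("r1", 100), ("r2", -30)])` with a crash at POINT B on `r1` still ends with a balance of 70, because the retried `r1` is recognized as already applied.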
I am looking for answers to these questions:
How can we automatically identify the points in code that need to be executed atomically?
If the code crashes during execution, how much will logging the transaction and variable values help? Are there other approaches to the problem of crashed agents?
Again, logging does not scale to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
In general, how can we identify such atomic blocks when agents work together? If one agent fails, do the others have to wait for it to restart? How can software testing help us find cases where an agent crash leaves the program in an inconsistent state?
How can we make the atomic blocks more fine-grained to reduce performance bottlenecks?

Q> How can we automatically identify the points in code that need to be executed atomically?
A> Any time something stateful is shared across different contexts (not every party needs to be a mutator; one is enough). In your case, that is balance, which is shared between the agents.
Q> If the code crashes during execution, how much will logging the transaction and variable values help? Are there other approaches to the problem of crashed agents?
A> It can help, but it has high costs attached: you need to roll back X entries, replay the scenario, and so on. A better approach is either to make everything transactional or to have an effective automatic rollback mechanism.
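An automatic rollback mechanism can be as simple as an undo log: record each old value before overwriting it, and restore the values in reverse order if the unit of work fails. The Python sketch below is a hypothetical in-process helper (`UndoLog` is not a library API); it only covers failures inside the running process, since a real crash would lose this in-memory log, which is why databases persist their undo/write-ahead logs.

```python
class UndoLog:
    """Hypothetical helper: records old values before each write so a failed
    unit of work is rolled back automatically when the with-block raises."""

    _MISSING = object()                    # marks keys that did not exist

    def __init__(self, state):
        self.state = state
        self.entries = []

    def set(self, key, value):
        self.entries.append((key, self.state.get(key, self._MISSING)))
        self.state[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Undo in reverse order: restore old values, drop new keys.
            for key, old in reversed(self.entries):
                if old is self._MISSING:
                    del self.state[key]
                else:
                    self.state[key] = old
        self.entries.clear()
        return False                       # let the original exception propagate
```

If an exception escapes the `with UndoLog(accounts) as tx:` block, every `tx.set(...)` made inside it is reverted; on normal exit the changes are kept.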
Q> Again, logging does not scale to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
A> In some cases you can relax consistency. For example, Java's CopyOnWriteArrayList copies the underlying array on each write and only publishes the new array to readers once the write is complete; if the write fails, the copy can safely be discarded. There is also compare-and-swap. Also see the link for the previous question.
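The two ideas can be combined: build a private copy, then publish it with a compare-and-swap, retrying if another writer got there first. This is a Python sketch, not how CopyOnWriteArrayList itself is implemented (it serializes writers with a lock). `AtomicReference` and `append_copy_on_write` are hypothetical names, and because Python has no user-level CAS instruction, a lock is used to make `compare_and_set` atomic.

```python
import threading


class AtomicReference:
    """Simulated compare-and-swap cell (a lock provides the atomicity)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:    # unchanged since we read it
                self._value = new
                return True
            return False


def append_copy_on_write(ref, item):
    """Readers keep seeing the old immutable tuple until the new one is
    published; a losing attempt simply discards its copy and retries."""
    while True:
        snapshot = ref.get()
        updated = snapshot + (item,)       # private copy, invisible to readers
        if ref.compare_and_set(snapshot, updated):
            return updated
```

A reader that grabbed `ref.get()` before an append keeps a consistent snapshot; nothing ever has to be rolled back, because an unpublished copy has no effect.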
Q> In general, how can we identify such atomic blocks when agents work together?
A> See your first question.
Q> If one agent fails, do the others have to wait for it to restart?
A> Most policies/APIs define maximum timeouts for critical-section execution; otherwise the system risks ending up in a perpetual deadlock.
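A lease is one way to put such a timeout on a critical section: the lock is granted for a limited time, so if the holder crashes and never releases it, another agent can take over once the lease expires. The Python `LeaseLock` below is a hypothetical sketch; the injectable `clock` exists only to make it testable. Note that an expired holder may merely be slow rather than dead, so real systems usually pair leases with fencing tokens or with an atomic/idempotent commit.

```python
import threading
import time


class LeaseLock:
    """Hypothetical lease-based lock: a holder that crashes loses the lock
    when its lease expires, so other agents do not wait forever."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._guard = threading.Lock()     # protects owner / expires_at
        self.owner = None
        self.expires_at = 0.0

    def try_acquire(self, agent, lease_seconds):
        with self._guard:
            now = self._clock()
            if self.owner is None or now >= self.expires_at:
                self.owner = agent         # free, or previous lease expired
                self.expires_at = now + lease_seconds
                return True
            return False

    def release(self, agent):
        with self._guard:
            if self.owner == agent:        # ignore releases by stale holders
                self.owner = None
```

An agent that fails `try_acquire` can back off and retry instead of blocking indefinitely.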
Q> How can software testing help us find cases where an agent crash leaves the program in an inconsistent state?
A> It can, to a fair degree. However, testing concurrent code requires as much skill as writing it, if not more.
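One systematic approach is fault injection: crash the agent at each candidate point, restart it, and check an invariant on the final state. The Python sketch below applies this to the non-atomic agent from the question. `naive_agent` and `run_with_restart` are hypothetical, and it assumes POINT A lies before the balance update and POINT B after it, matching the remark that a crash at POINT B must not reprocess the request.

```python
class SimulatedCrash(Exception):
    pass


def naive_agent(state, crash_point=None):
    """Non-atomic version of the question's agent: update, then dequeue."""
    amount = state["queue"][0]             # look_queue()
    if crash_point == "A":
        raise SimulatedCrash
    state["balance"] += amount
    if crash_point == "B":
        raise SimulatedCrash
    state["queue"].pop(0)                  # dequeue()


def run_with_restart(agent, crash_point):
    """Crash once at `crash_point`, restart, and let the agent finish."""
    state = {"balance": 0, "queue": [50]}
    try:
        agent(state, crash_point)
    except SimulatedCrash:
        pass
    while state["queue"]:
        agent(state)                       # restarted agent, no more crashes
    return state["balance"]


def find_inconsistent_points(agent, points=("A", "B"), expected=50):
    """Return the crash points after which the invariant is violated."""
    return [p for p in points if run_with_restart(agent, p) != expected]
```

For the naive agent this reports `["B"]`: a crash at POINT B applies the deposit twice. Running the same harness against a fixed agent should report no points.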
Q> How can we make the atomic blocks more fine-grained to reduce performance bottlenecks?
A> You have answered the question yourself :) If one atomic operation needs to modify 10 different shared state variables, there is not much you can do apart from trying to push the external contract down so that it needs to modify fewer of them, or trying to promote immutability. This is pretty much why relational databases are not as scalable as NoSQL stores: a single operation might need to modify dependent foreign keys, execute triggers, etc.
If you are a Java programmer, I would definitely recommend reading this book. I'm sure there are good counterparts for other languages, too.


Webflux better performance Mono<List<Object>>, Flux<Object>

I'm starting with WebFlux and I wonder which of the following performs better, as they all seem quite similar to me:
- List<Customer> findAll()
- Mono<List<Customer>> findAll()
- Flux<Customer> findAll()
Could you help me understand which one is best and why? Thanks
This is pretty basic, and you should read about the difference between a Mono, a Flux and a concrete List<T> in the official Reactor documentation. But I will explain it in simple terms.
All of the above produce the same thing, it's more of a question how they produce it.
All examples will assume that your application is under heavy load, or you have a very slow database.
List<Customer> findAll()
When this call is made, the thread that performs it calls the database and then waits for the answer. While waiting, it does nothing; it just sits there until the database responds with the list of customers.
As you can imagine, having threads just waiting for responses is usually a waste of resources (memory).
Mono<List<Customer>> findAll()
This call asks the database for a list of customers, but if the database is slow, the thread does not wait; it starts doing something else. Maybe it makes other calls to the database or processes something else; that is up to the server. Technically, you are making an asynchronous call to the database, and the thread is free to do anything else while the database processes the request.
This makes the use of threads more efficient, making sure that all threads always have something to do.
When the response comes back from the database we deliver the entire List<Customer> out to the calling client.
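WebFlux runs on Reactor, not asyncio, but the difference between the first two signatures can be seen in a small Python asyncio analogy. `find_all_blocking`, `find_all_async`, `other_work` and `demo` are hypothetical stand-ins: `time.sleep` plays a blocking driver and `asyncio.sleep` a non-blocking one, and both return the whole list at once, like List<Customer> and Mono<List<Customer>>.

```python
import asyncio
import time


def find_all_blocking(log):
    """Analogy for List<Customer> findAll() on a blocking driver."""
    time.sleep(0.05)                 # the thread (and the event loop) just waits
    log.append("db returned")
    return ["alice", "bob"]


async def find_all_async(log):
    """Analogy for Mono<List<Customer>> findAll(): same list, no blocking."""
    await asyncio.sleep(0.05)        # the loop is free while the "DB" works
    log.append("db returned")
    return ["alice", "bob"]


async def other_work(log):
    log.append("other work ran")


async def demo(non_blocking):
    log = []

    async def query():
        if non_blocking:
            return await find_all_async(log)
        return find_all_blocking(log)

    customers, _ = await asyncio.gather(query(), other_work(log))
    return customers, log
```

With `asyncio.run(demo(False))` the log is `["db returned", "other work ran"]`: nothing else could run while the query blocked. With `demo(True)` the order is reversed, because the other task ran while the query was waiting.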
Flux<Customer> findAll()
Here we ask for a list of customers, but we don't want the response as a full list in one go. Instead we are basically saying "give me all customers, but deliver them as you find them".
It doesn't hand you a giant list in one go as the two previous examples but instead it might first give you 8 customers, then 10, then another 8, then 15 in a flow until all Customers are delivered.
This is usually only noticeable to us humans with very large lists. With only a couple of entries, it looks as if the list was delivered in one go. But if you have millions of entries in the database, you will notice the difference.
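The streaming behaviour of a Flux can be approximated with a Python async generator: items are handed to the consumer batch by batch, and the consumer can act on the first ones (or stop early) before the full result exists. `find_all_streaming` and `consume` are hypothetical; this is an analogy, not Reactor's API, and it omits backpressure.

```python
import asyncio


async def find_all_streaming(total, batch_size):
    """Flux-like source: yields customers as each batch 'arrives'."""
    for start in range(0, total, batch_size):
        await asyncio.sleep(0)       # pretend to wait for the next batch
        for i in range(start, min(start + batch_size, total)):
            yield f"customer-{i}"


async def consume(total, batch_size, first_n):
    """Collect items as they come; stop after `first_n` of them."""
    received = []
    async for customer in find_all_streaming(total, batch_size):
        received.append(customer)
        if len(received) == first_n:
            break                    # roughly like cancelling a subscription
    return received
```

Even with a million "rows", `asyncio.run(consume(1_000_000, 10, 3))` returns after the first batch, because the source never builds the full list.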
The first example, List<Customer>, is a blocking call and should not be used in WebFlux at all. WebFlux has very few threads and tries to use them as efficiently as possible. If your threads need to wait for the database, you risk very poor performance.
Netty (the default server implementation used by WebFlux) runs a fixed number of worker threads based on how many cores your machine has, so having even one thread waiting can be a huge performance loss.
Second example: if you have small lists and want to deliver them in one go, Mono<List<Customer>> is a good choice, but a Flux can be useful here too.
Third example: Flux<Customer> suits large lists and continuous flows of items, for instance an application that constantly pushes values to a client over WebSockets. Think of a gambling site that pushes odds, or a stock market application pushing a constant flow of data.
Blocking db drivers
Lastly, a word about database drivers. Using Mono and Flux against the database means you need a non-blocking database driver that supports the R2DBC standard.
If your database driver does not follow it, all your calls will behave like example one: blocking, with poor performance.
There are ways to optimize such calls if you really need to talk to a database that does not support R2DBC, but such databases should be avoided if possible.
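One such optimization, which Reactor's documentation describes for wrapping blocking calls (`Mono.fromCallable(...)` subscribed on `Schedulers.boundedElastic()`), is to move the blocking call onto a separate thread pool so the few event-loop threads stay free. The sketch below shows the same idea with Python's `asyncio.to_thread` (Python 3.9+); `legacy_find_all` is a hypothetical blocking driver call. The call still ties up a pool thread, so this limits the damage rather than making the driver non-blocking.

```python
import asyncio
import time


def legacy_find_all():
    """Hypothetical blocking driver call (no R2DBC-style async support)."""
    time.sleep(0.05)
    return ["alice", "bob"]


async def find_all_offloaded(log):
    # The blocking call runs on a worker thread; the event loop stays free.
    customers = await asyncio.to_thread(legacy_find_all)
    log.append("db returned")
    return customers


async def other_work(log):
    log.append("other work ran")


async def demo():
    log = []
    customers, _ = await asyncio.gather(find_all_offloaded(log), other_work(log))
    return customers, log
```

Here `other_work` runs while the legacy call is in progress, so the log starts with `"other work ran"`.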

Optaplanner early termination deltas between Incremental Score state and solution

I'm using an incremental score calculator class because of the heavy use of maps and calculations, which did not scale well in Drools.
It seems to work well, but while debugging I've noticed differences between the moves used in the final solution and the moves processed by my before/afterVariableChanged handlers. This causes a difference between what is actually assigned in the solution and the state in my incremental score objects. Based on some logs, it looks like the incremental score calculator does not know when the solution has stopped accepting moves because of early termination (I'm using secondsSpentLimit).
How can I stop my incremental score implementation from receiving before/afterVariableChanged events that are not reflected in the solution because of early termination?
Can you turn on TRACE logging to confirm this behavior?
In essence, LocalSearch looks like this (with moveThreadCount=NONE (= default)):
for (each step && not terminated) {
    for (each move && not terminated) {
        doMove(); // Triggers variable listeners
        if (better move) updateBestSolution();
        undoMove(); // Triggers variable listeners
    }
}
So termination is atomic with respect to move evaluation, and your issue probably has another cause.
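The loop above can be mirrored in a small Python sketch to show why every doMove() is paired with an undoMove() before termination is checked, and how a debug assertion can compare the incremental state with a full recalculation (similar in spirit to OptaPlanner's FULL_ASSERT environment mode). Everything here is hypothetical: `IncrementalSum` minimizes a plain sum rather than using OptaPlanner's API. Note that the calculator follows the working solution, while the best solution is a separate copy; comparing calculator state with the best solution instead would show a mismatch without anything being wrong.

```python
class IncrementalSum:
    """Hypothetical incremental score: a running total kept up to date by
    before/after variable-changed callbacks."""

    def __init__(self, values):
        self.total = sum(values)

    def before_variable_changed(self, old_value):
        self.total -= old_value

    def after_variable_changed(self, new_value):
        self.total += new_value


def change(values, calc, index, new_value):
    """Change one planning variable, notifying the calculator around it."""
    calc.before_variable_changed(values[index])
    values[index] = new_value
    calc.after_variable_changed(new_value)


def local_search(values, moves_for, max_steps, is_terminated):
    """Minimize sum(values); `values` is the working solution."""
    calc = IncrementalSum(values)
    best_score, best_solution = calc.total, list(values)   # separate copy
    for _ in range(max_steps):
        if is_terminated():
            break
        step_move, step_score = None, None
        for index, new_value in moves_for(values):
            if is_terminated():
                break                                # only between moves
            old_value = values[index]
            change(values, calc, index, new_value)   # doMove()
            if step_score is None or calc.total < step_score:
                step_move, step_score = (index, new_value), calc.total
            change(values, calc, index, old_value)   # undoMove()
        if step_move is None:
            break
        change(values, calc, *step_move)             # apply the chosen step
        if calc.total < best_score:
            best_score, best_solution = calc.total, list(values)
        # Debug check against a full recalculation of the working solution.
        assert calc.total == sum(values), "incremental score drifted"
    return best_score, best_solution
```

If the assertion fires, the listener bookkeeping (not termination) is the first place to look.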
With moveThreadCount=4 etc. this story becomes a bit more complex, but each move thread has its own ScoreDirector, so they pretty much work in isolation.
That being said, incremental Java score calculation and shadow variables together can drive people insane... This is mainly because the before events happen in the middle of a doMove(), right before a variable changes. At that time, half of the model can already be in an intermediate state that is impossible to understand, because the move has already changed other variables. FWIW, take a look at VariableListener.requiresUniqueEntityEvents(); it might help a bit...
Or try ConstraintStreams (the BAVET implementation if you favor speed over functionality).

Single Stored procedure call or multiple database calls?

I am developing an MVC application where the view can fetch data either through one stored procedure call that runs multiple different queries, or by calling each of those queries individually, directly from the model. I am really confused about which approach is good practice.
I would go for a single call to the stored procedure in almost all cases (I will talk of the exception at the end).
Whoever maintains the stored procedure may introduce extra logic (for performance or business reasons) that you would have to duplicate in your own direct calls or, even worse, might miss altogether due to miscommunication with that maintainer. (Even if you maintain both, you will have to spend effort on duplication.) This is the #1 reason: using a dedicated interface ensures correctness and avoids duplication.
Every time you interact with your DB you incur a small (but non-zero) overhead: opening the connection (or retrieving it from a pool), marshalling and unmarshalling data over the wire, network latency and so on. A single entry point (your stored procedure) amortizes these costs better.
It is possible (though it depends on many factors, so nothing is guaranteed) that the DB engine can further optimize its workload by having everything in a single transaction context. For example, it may issue two consecutive queries that are similar enough that some of the indexes/records are still in the DB cache and can be accessed faster by the second query.
Possible exception: your application has a sort of "zooming" process where you first load the header of a multi-record structure and need the lower-level details only when/if the user requests them. In this case it might be better to load the details on the fly. I would still prefer the previous solution unless I can prove (e.g. by testing with realistic loads) that the detail records are big enough to make this a burden on the front end.
Please resist the temptation to decide that one way is more efficient unless you have hard data to back up your "insight". Most of the time this will prove to be a mistake.

Erlang ETS tables versus message passing: Optimization concerns?

I'm coming into an existing (game) project whose server component is written entirely in erlang. At times, it can be excruciating to get a piece of data from this system (I'm interested in how many widgets player 56 has) from the process that owns it. Assuming I can find the process that owns the data, I can pass a message to that process and wait for it to pass a message back, but this does not scale well to multiple machines and it kills response time.
I have been considering replacing many of the tasks that exist in this game with a system where information that is frequently accessed by multiple processes would be stored in a protected ets table. The table's owner would do nothing but receive update messages (the player has just spent five widgets) and update the table accordingly. It would catch all exceptions and simply go on to the next update message. Any process that wanted to know if the player had sufficient widgets to buy a fooble would need only to peek at the table. (Yes, I understand that a message might be in the buffer that reduces the number of widgets, but I have that issue under control.)
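The proposed design (one owner that applies update messages in order, any number of readers that just peek) can be sketched in Python with a thread and a queue standing in for the owner process and its mailbox. `WidgetTable` and its methods are hypothetical names; this is an analogy of a protected ETS table, not Erlang code. As the question notes, a reader may see a value that still has pending updates queued.

```python
import queue
import threading


class WidgetTable:
    """Analogy of a protected ETS table: one owner thread writes, applying
    update messages in arrival order; any thread may read."""

    def __init__(self):
        self._rows = {}
        self._inbox = queue.Queue()
        self._owner = threading.Thread(target=self._run, daemon=True)
        self._owner.start()

    def _run(self):
        while True:
            message = self._inbox.get()
            if message is None:          # shutdown signal
                return
            player, delta = message
            try:
                self._rows[player] = self._rows.get(player, 0) + delta
            except Exception:
                continue                 # skip a bad update, keep serving

    def update(self, player, delta):
        self._inbox.put((player, delta)) # asynchronous, like a cast

    def lookup(self, player):
        return self._rows.get(player, 0) # "peek at the table"

    def stop(self):
        self._inbox.put(None)
        self._owner.join()
```

Because only the owner writes, updates never interleave with each other; the open questions raised in the answers are what readers may do with what they read.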
I'm afraid that my question is less of a question and more of a request for comments. I'll upvote anything that is both helpful and sufficiently explained or referenced.
What are the likely drawbacks of such an implementation? I'm interested in the details of lock contention that I am likely to see in having one-writer-multiple-readers, what sort of problems I'll have distributing this across multiple machines, and especially: input from people who've done this before.
First of all, default ETS behaviour is consistent, as you can see in the documentation: Erlang ETS.
It provides atomicity and isolation, including for multiple updates/reads done within a single ETS function call. (Remember that in Erlang a function call is roughly equivalent to a reduction, the unit of measure the Erlang scheduler uses to share time between processes, so an operation spread over multiple ETS function calls could be split into parts, creating a possible race condition.)
If you are interested in a multi-node ETS architecture, take a look at Mnesia, which gives you out-of-the-box multi-node concurrency on top of ETS: Mnesia.
(hint: I'm talking specifically of ram_copies tables, add_table_copy and change_config methods).
That being said, I don't understand the problem with a process (possibly backed by an unnamed ETS table).
Let me explain: the main problem with your project is its first, basic assumption.
It's simple: you don't have a single writing process!
Every time a player takes an object, hits another player and so on, the code calls a function with side effects that updates the game state. So even if a single process manages the game state, it must also tell the other players' clients 'hey, you remember that object there? Just forget it!'. This is why the main problem in many multiplayer games is lag: when networking is not the main issue, lag is often due to blocking send/receive routines.
From this point of view, using an ETS table directly, a persistent table, the process dictionary (BAD!!!) and so on all amount to the same thing, because you have to consider synchronization issues, as in object-oriented languages that use shared memory (Java, anyone?).
In the end, you should consider just ONE main concern when developing your application: consistency.
Only after a consistent application has been developed should you concern yourself with performance tuning.
Hope it helps!
Note: I've talked about something like a MMORPG server because I thought you were talking about something similar.
An ETS table would not solve your problems in that regard. Your code (that wants to get or set the player widget count) will always run in a process and the data must be copied there.
Whether that is from a process heap or an ETS table makes little difference (that said, reading from ETS is often faster because it's well optimized and does no work other than getting and setting data), especially when getting the data from a remote node. For multiple readers, ETS is most likely faster, since a process would handle the requests sequentially.
What would make a difference, however, is whether the data is cached on the local node. That's where self-replicating database systems such as Mnesia, Riak or CouchDB come in. Mnesia is in fact implemented using ETS tables.
As for locking, the latest version of Erlang comes with enhancements to ETS that let multiple readers read from a table simultaneously while one writer writes. The only locked element is the row being written to (hence better concurrent performance than a normal process if you expect many simultaneous reads of one data point).
Note, however, that all interaction with ETS tables is non-transactional! You cannot rely on writing a value based on a previous read, because the value might have changed in the meantime. Mnesia handles that using transactions. You can still use the dirty_* functions in Mnesia to squeeze near-ETS performance out of most operations, if you know what you're doing.
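The danger of writing a value based on a previous read is the classic lost update. The Python sketch below replays the interleaving deterministically with a plain dict; `buy_with_dirty_read` and `buy_atomically` are hypothetical names. The fix is to do the check and the write as one step, for example inside the owning process or a Mnesia transaction; for simple counters, ETS itself offers atomic single-object operations such as `ets:update_counter/3`.

```python
def buy_with_dirty_read(table, player, price, concurrent_spend):
    """Read, decide, then write back a value computed from that read."""
    widgets = table[player]                  # read
    concurrent_spend(table)                  # another process writes meanwhile
    if widgets >= price:
        table[player] = widgets - price      # write based on the stale read
        return True
    return False


def buy_atomically(table, player, price):
    """Check-and-update as one step (e.g. inside the owning process)."""
    if table[player] >= price:
        table[player] -= price
        return True
    return False
```

Starting from 10 widgets, if another process spends 8 between the read and the write, the dirty version still sells a 5-widget item and leaves 5 widgets: the concurrent spend is lost. The atomic version sees 2 widgets and refuses.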
It sounds like you have a bunch of things that can happen at any time, and you need to aggregate the data in a safe, uniform way. Take a look at the Generic Event behaviour (gen_event). I'd recommend using it to create an event server and having all these processes share their information with it via events; at that point you can choose to log it or store it somewhere (like an ETS table). As an aside, ETS tables are not good for persistent data like how many "widgets" a player has; consider Mnesia, or an excellent crash-only DB like CouchDB. Both of these replicate very well across machines.
You bring up lock contention: you shouldn't have any locks. Each process handles its messages sequentially, in the order it receives them. In fact, the entire point of the message-passing semantics built into the language is to avoid shared-state concurrency.
To summarize, normally you communicate with messages, from process to process. This is hairy for you, because you need information from processes scattered all over the place, so my recommendation is based on the idea of concentrating all information that is "interesting" outside of the originating processes into a single, real-time source.

Do I really need to use transactions in stored procedures? [MSSQL 2005]

I'm writing a pretty straightforward e-commerce app in, do I need to use transactions in my stored procedures?
Read/Write ratio is about 9:1
Many people ask - do I need transactions? Why do I need them? When to use them?
The answer is simple: use them all the time, unless you have a very good reason not to (for instance, don't use atomic transactions for "long-running activities" between businesses). The default should always be yes. In doubt? Use transactions.
Why are transactions beneficial? They help you deal with crashes, failures, data consistency, error handling, they help you write simpler code etc. And the list of benefits will continue to grow with time.
Here is some more info from
Remember that in SQL Server every single-statement CRUD operation runs as its own (autocommit) transaction by default. You only need an explicit transaction (BEGIN TRAN) if multiple statements must act as an atomic unit.
The answer is, it depends. You do not always need transaction safety. Sometimes it's overkill. Sometimes it's not.
For example, when you implement a checkout process, you only want to finalize it once you have gathered all the data, etc. Think about a payment failure: you can roll back. That is an example of when you need a transaction, or at least when it is wise to use one.
Do you need a transaction when you create a new user account? Maybe, if it's across 10 tables (for whatever reason), if it's just a single table then probably not.
It also depends on what you sold your client on, who they are, whether they requested it, etc. But if the decision is up to you, I'd say choose wisely.
My bottom line is: avoid premature optimization. Build your application, keeping in mind that you may want to go back and refactor/optimize later when you need to. Look at a couple of open-source projects, see how they implemented different parts of their app, and learn from that. You'll see that most of them don't use transactions at all, yet there are huge online stores that use them.
Of course, it depends.
It depends upon the work that the particular stored procedure performs and, perhaps, not so much on the "read/write ratio" that you suggest. In general, you should consider enclosing a unit of work within a transaction if it is a query that could be impacted by some other, simultaneously running query. If this sounds nondeterministic, it is. It is often difficult to predict under what circumstances a particular unit of work qualifies as a candidate for this.
A good place to start is to review the precise CRUD being performed within the unit of work, in this case within your stored procedure, and decide whether a) it could be affected by some other, simultaneous operation and b) that other work matters to the end result of this work (or, even, vice versa). If the answer to both is "Yes", consider wrapping the unit of work within a transaction.
What this is suggesting is that you can't always simply decide to either use or not use transactions, rather you should apply them when it makes sense. Use the properties defined by ACID (Atomicity, Consistency, Isolation, and Durability) to help decide when this might be the case.
One other thing to consider is that in some circumstances, particularly if the system must perform many operations in quick succession, e.g., a high-volume transaction processing application, you might need to weigh the relative performance cost of the transaction. Depending upon the size of the unit of work, a commit (or rollback) of a transaction can be resource expensive, perhaps negatively impacting the performance of your system unnecessarily or, at least, with limited benefit.
Unfortunately, this is not an easy question to precisely answer: "It depends."
Use them if:
There are some errors that you may want to test for and catch which won't be caught except by you going out and doing the work (looking things up, testing values, etc.), usually from within a transaction so that you can roll back the whole operation.
There are multi-step operations of any sort, which should, logically, be rolled back as a group if they fail.
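Both cases (a value check that can only be done while doing the work, and a multi-step operation that must be undone as a group) can be sketched with Python's built-in `sqlite3` module, whose connection context manager commits on success and rolls back on an exception. This is SQLite rather than SQL Server; in T-SQL the same shape is BEGIN TRAN ... COMMIT with ROLLBACK in a TRY...CATCH block. The `stock`/`orders` schema and `checkout` are hypothetical.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
        conn.execute("INSERT INTO stock VALUES ('widget', 3)")
    return conn


def checkout(conn, sku, qty):
    """Insert the order and decrement stock as one unit; if the stock check
    fails, both steps are rolled back together."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
            cur = conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE sku = ? AND qty >= ?",
                (qty, sku, qty),
            )
            if cur.rowcount != 1:  # the check only possible mid-transaction
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False
```

A second checkout of 2 widgets when only 1 is left fails, and its already-inserted order row disappears with the rollback.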