is SETNX truly atomic? - redis

Redis has a SETNX command.
I read the documentation. However, it doesn't explicitly say whether it's atomic or not.
I read that it's not recommended for distributed locks. However, my case is a little bit simpler. All I need is to make sure that whoever comes first (the first caller) sets the value and the second caller receives an error (SETNX will return 0).
So, the question is, is it truly atomic?

Yes, SETNX is atomic and will do what you ask regardless of how many callers there are.
Individual Redis commands are essentially always atomic, since Redis is single-threaded. So the documentation doesn't bother to specify that for every single command. (Perhaps the most direct reference comes from the first line of the FAQ: "Redis is a different evolution path in the key-value DBs where values can contain more complex data types, with atomic operations defined on those data types.")
When it comes to setting and releasing locks you're no longer dealing with a single command, so that's why there are other considerations. But the SETNX command itself is atomic.
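To see what that means in practice, here is a minimal redis-py sketch (the key name and values are invented for illustration): the first SETNX on a missing key succeeds, and every later one returns 0/False.

import redis

r = redis.Redis()

# The first SETNX on a missing key sets it and returns True (1);
# every later SETNX on the same key is a no-op and returns False (0).
first = r.setnx("job:123:owner", "caller-A")   # True: this caller "wins"
second = r.setnx("job:123:owner", "caller-B")  # False: the key was already set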

If you need to perform several operations atomically, you can use MULTI and EXEC to group commands into a transaction, at which point the whole group executes as a single atomic unit.
You can read more on their website: https://redis.io/topics/transactions
Make sure you read the whole document because there are some important notes about failures.
It's important to note that even when a command fails, all the other commands in the queue are processed – Redis will not stop the processing of commands.
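As a rough illustration, redis-py exposes MULTI/EXEC through its pipeline API; the sketch below (key name invented) queues a few commands and sends them to the server as one transaction.

import redis

r = redis.Redis()

# transaction=True wraps the queued commands in MULTI ... EXEC, so the
# server runs them as one uninterrupted unit and returns all replies at once.
pipe = r.pipeline(transaction=True)
pipe.set("counter", 0)
pipe.incr("counter")
pipe.incr("counter")
print(pipe.execute())   # e.g. [True, 1, 2]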

Related

Combining code that relies on different transaction isolation levels in Postgres

I have two functions which both require a transaction. One is calling the other. I have code that can nest such transactions into a single one using SAVEPOINT.
If they have the same transaction isolation level there is no problem. Now, if they do not, is there still a way I could 'correctly' combine the transactions?
What would be the risk, other than decreased performance, if I ran both transactions under the more restrictive isolation level of the two?
In this situation, yes, generally you can combine the transactions under the more restrictive isolation level.
The risk is pretty much that the higher isolation level is going to catch more serialisation errors (i.e. ERROR: could not serialize access due to concurrent update in REPEATABLE READ and ERROR: could not serialize access due to read/write dependencies among transactions in SERIALIZABLE). The typical way to handle these serialisation failures is to retry the transactions, but you should verify whether this makes sense within the context of your application.
Another possible error that might occur is deadlocks. Postgres should detect these and break the deadlock (after which the failing transaction should retry), but if you can, you should always try to write your application so deadlocks can't exist in the first place. Generally, the main technique to avoid deadlocks is to make sure that all applications that acquire any locks (implicit or explicit) acquire those locks in a consistent order.
You may need to take special care if your application needs to make requests to another external service, as you may need to verify whether the retries are going to cause you to make unwanted duplicate requests, especially if these external requests are not idempotent.
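As a rough sketch of that retry pattern (assuming psycopg2; the function, DSN and attempt limit are placeholders, not anything from the original question), you could wrap the work in a loop that retries on serialization failures and detected deadlocks:

import psycopg2
from psycopg2 import errors
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

def run_serializable(dsn, work, max_attempts=5):
    """Run work(cursor) under SERIALIZABLE, retrying on serialization
    failures (SQLSTATE 40001) and detected deadlocks (40P01)."""
    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)
    try:
        for attempt in range(max_attempts):
            try:
                with conn:                    # commit on success, rollback on error
                    with conn.cursor() as cur:
                        return work(cur)
            except (errors.SerializationFailure, errors.DeadlockDetected):
                if attempt == max_attempts - 1:
                    raise                     # give up after the final attempt
    finally:
        conn.close()

Whether a retry is safe depends on the idempotency concerns mentioned above.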

In a Redis server-side evaluated Lua script, should KEYS be used instead of SCAN?

Given that you need to traverse the keyspace and that the script is going to block until it has finished regardless of what it is doing - is it better to just use 'keys' and get this over with as quickly as possible, as opposed to multiple calls to 'scan'?
You're not supposed to iterate the entire keyspace, as this is a slow operation. If you really must do that, and insist on using server-side Lua, you'd be better off with SCAN, as it will not consume as much memory as KEYS may consume (for the reply).
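For comparison, if the traversal can be driven from the client instead of from a server-side script, redis-py's scan_iter is the incremental counterpart to KEYS; a minimal sketch (the key pattern and per-key action are made up):

import redis

r = redis.Redis()

# KEYS materialises the whole reply at once; scan_iter issues repeated SCAN
# calls under the hood, keeping memory bounded even for a huge keyspace.
for key in r.scan_iter(match="session:*", count=1000):
    r.delete(key)   # placeholder action: do whatever your script would do per key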

Transactional behavior across in-memory objects

I want to make a sequence of in-memory operations atomic. I presume there is no framework-supplied functionality for this and that I would have to implement my own rollback functionality using a memento (or something)?
If it needs to be really atomic, there is no such thing AFAIK in the Framework itself - an interesting link discussing this issue.
What you ask for is called STM (Software Transactional Memory) and is, for example, an inherent part of Haskell.
Basically any implementation uses some sort of copy mechanism - either keeping the old data until the transaction is committed, OR making a copy first, doing all "changes" on the copy, and switching references on commit... anyway, there is always some log and/or copying mechanism involved...
For C# check these links out:
http://research.microsoft.com/en-us/downloads/6cfc842d-1c16-4739-afaf-edb35f544384/default.aspx
http://download.microsoft.com/download/9/5/6/9560741A-EEFC-4C02-822C-BB0AFE860E31/STM_User_Guide.pdf
http://blogs.msdn.com/b/stmteam/
If F# is an option, then check these links out:
http://cs.hubfs.net/blogs/hell_is_other_languages/archive/2008/01/16/4565.aspx
http://geekswithblogs.net/Podwysocki/archive/2008/02/07/119387.aspx
Another option could be to use an "in-memory database" - there are several out there with transaction support, thus providing atomic operations via the DB... as long as the DB is in-memory it should perform well.
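To make the copy-and-switch idea above concrete, here is a small sketch in Python (the language choice and class name are mine, purely illustrative): all changes happen on a deep copy, and the committed reference is only switched if the block finishes without an exception.

import copy

class InMemoryTransaction:
    """Naive copy-on-commit wrapper: changes happen on a deep copy and the
    original is only replaced if the block finishes without an exception."""

    def __init__(self, state):
        self._committed = state
        self._working = None

    def __enter__(self):
        self._working = copy.deepcopy(self._committed)
        return self._working

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._committed = self._working   # "commit": switch the reference
        self._working = None                  # on error the copy is discarded
        return False                          # do not swallow the exception

    @property
    def state(self):
        return self._committed

# Usage: an exception inside the block rolls everything back.
tx = InMemoryTransaction({"balance": 100})
try:
    with tx as working:
        working["balance"] -= 30
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
assert tx.state["balance"] == 100   # the committed state never changed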

Restarting agent program after it crashes

Consider a distributed bank application, wherein distributed agent machines modify the value of a global variable, say "balance".
So, the agent's requests are queued. A request is of the form wherein a value is added to the global variable on behalf of a particular agent. So, the code for the agent is of the form:
agent
{
    look_queue();    // take a look at the leftmost request on the queue without dequeuing
    lock_global_variable(balance, agent_machine_id);
    ///////////////////// **POINT A**
    modify(balance, value);
    unlock_global_variable(balance, agent_machine_id);
    /////////////////// **POINT B**
    dequeue();       // once the transaction is complete, the request can be dequeued
}
Now, if an agent's code crashes at POINT B, then obviously the request should not be processed again, otherwise the variable will be modified twice for the same request. To avoid this, we can make the code atomic, thus:
agent
{
    look_queue();    // take a look at the leftmost request on the queue without dequeuing
    *atomic*
    {
        lock_global_variable(balance, agent_machine_id);
        modify(balance, value);
        unlock_global_variable(balance, agent_machine_id);
        dequeue();   // once the transaction is complete, the request can be dequeued
    }
}
I am looking for answers to these questions:
How to identify points in code which need to be executed atomically 'automatically'?
If the code crashes during execution, how much will "logging the transaction and variable values" help? Are there other approaches for solving the problem of crashed agents?
Again, logging is not scalable to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
In general, how can we identify such atomic blocks in the case of agents that work together? If one agent fails, do others have to wait for it to restart? How can software testing help us identify potential cases wherein, if an agent crashes, an inconsistent program state is observed?
How to make the atomic blocks more fine-grained, to reduce performance bottlenecks?
Q> How to identify points in code which need to be executed atomically 'automatically'?
A> Any time there is anything stateful shared across different contexts (not all parties need to be mutators; it's enough to have at least one). In your case, it's the balance that is shared between different agents.
Q> If the code crashes during execution, how much will "logging the transaction and variable values" help? Are there other approaches for solving the problem of crashed agents?
A> It can help, but it has high costs attached. You need to roll back X entries, replay the scenario, etc. A better approach is to either make it all-transactional or to have an effective automatic rollback mechanism (see the sketch at the end of this answer).
Q> Again, logging is not scalable to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
A> In some cases you can relax consistency. For example, CopyOnWriteArrayList does a concurrent write-behind and switches the data in for new readers once it becomes available. If the write fails, it can safely discard that data. There's also compare-and-swap. Also see the link for the previous question.
Q> In general, how can we identify such atomic blocks in the case of agents that work together?
A> See your first question.
Q> If one agent fails, do others have to wait for it to restart?
A> Most of the policies/APIs define maximum timeouts for critical section execution, otherwise the system risks ending up in a perpetual deadlock.
Q> How can software testing help us identify potential cases wherein, if an agent crashes, an inconsistent program state is observed?
A> It can, to a fair degree. However, testing concurrent code requires as much skill as writing the code itself, if not more.
Q> How to make the atomic blocks more fine-grained, to reduce performance bottlenecks?
A> You have answered the question yourself :) If one atomic operation needs to modify 10 different shared state variables, there's nothing much you can do apart from trying to push the external contract down so it needs to modify less. This is pretty much the reason why databases are not as scalable as NoSQL stores - they might need to modify dependent foreign keys, execute triggers, etc. Or try to promote immutability.
If you were a Java programmer, I would definitely recommend reading this book. I'm sure there are good counterparts for other languages, too.
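To make the "all-transactional" suggestion above concrete for the agent example, here is a hypothetical Python sketch that uses SQLite merely as a stand-in for the shared store (table and function names are invented): the balance update and the "request processed" marker are written in one transaction, so a crash at POINT B can never cause the same request to be applied twice.

import sqlite3

conn = sqlite3.connect("bank.db")
conn.execute("CREATE TABLE IF NOT EXISTS balance (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT OR IGNORE INTO balance (id, amount) VALUES (1, 0)")
conn.execute("CREATE TABLE IF NOT EXISTS processed (request_id TEXT PRIMARY KEY)")
conn.commit()

def apply_request(request_id, value):
    """Record the request id and modify the balance in ONE transaction, so a
    crash between the two steps can never apply the same request twice."""
    try:
        with conn:   # both statements commit together, or neither does
            conn.execute("INSERT INTO processed (request_id) VALUES (?)",
                         (request_id,))
            conn.execute("UPDATE balance SET amount = amount + ? WHERE id = 1",
                         (value,))
    except sqlite3.IntegrityError:
        # Duplicate request_id: the request was already processed, so the
        # redelivered copy is ignored instead of being applied a second time.
        pass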

Will pooling the connections help threading in SQLite (and how)?

I currently use a singleton to access my database (see related question), but now when I try to add some background processing everything falls apart. I read the SQLite docs and found that SQLite can be used in a thread-safe way, but each thread must have its own DB connection. I tried using EGODatabase, which promises a SQLite wrapper with thread safety, but it is very buggy, so I returned to my old FMDB library and started to look at how to use it in a multi-threaded way.
Because all my code is built around the singleton idea, changing everything will be expensive (and a lot of open/close connections could become slow), so I wonder if, as the SQLite docs hint, building a pool of connections will help. If that is the case, how do I make it? How do I know which connection to get from the pool (because two threads can't share a connection)?
I wonder if somebody has already used SQLite in a multi-threaded way with NSOperation or similar; my searching only returns "yeah, it's possible" but leaves the details to my imagination...
You should look at using thread-local variables to hold the connection; if the variable is empty (i.e., holding something like a NULL) you know you can safely open a connection at that point to serve the thread, and store the connection back in the variable. I don't know how to do this in Obj-C, though.
Also be aware that SQLite is not tuned for concurrent writes. Writer locks are expensive, so keep any time spent in a writing transaction (i.e., one that includes an INSERT, UPDATE or DELETE) to a minimum in all threads. Transaction commits are expensive too.
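The thread-local pattern described above looks roughly like this in Python's sqlite3 module (the Obj-C/FMDB equivalent would follow the same shape; the function name and path are made up):

import sqlite3
import threading

_local = threading.local()

def get_connection(path="app.db"):
    """Return this thread's own SQLite connection, opening it lazily on first
    use; each thread ends up with a private connection, as SQLite requires."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(path)
        _local.conn = conn
    return conn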