Can memcached be used for locking?

memcached can be used for caching static data, which reduces database lookups; typically you call memcached.get(id) and memcached.set(id, value).
However, is it fine to use it for a locking mechanism? Do memcached.set and memcached.get always return the data if it is present, or can get just return None if the request takes too long?
I want to avoid concurrent access to a particular resource identified by an id, and I use this logic:
def access(id, username):
    if memcache.get(id):
        return False  # another user already holds the resource, decline access
    else:
        memcache.set(id, username)  # mark the resource as taken
        return True  # allow access for the current user
When any user tries to access that resource: if memcache.get(id) returns a value (a username), we decline access; otherwise we do memcache.set(id, username) to block subsequent access and allow access for the current user.
Is it fine to use memcached like this? Will set and get actually return the data if it is available, regardless of the time it takes, or do they give the best possible result in the least possible time? What I have found so far (for example: Guaranteed memcached lock) is of the former category and might not work for locking, though it might work 99% of the time.
Can anyone clarify, and are there alternative locking mechanisms?

For anyone interested in this, I have created a thread on the memcached GitHub at Will memcached work reliably for implementing a locking mechanism?. It explains some of the common caveats of using get and set and how to avoid them by using add. Some blogs also explain this problem; search for distributed locking using memcached on your favorite search engine.
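To illustrate the add-based approach, here is a minimal sketch in Go using the github.com/bradfitz/gomemcache/memcache client (the key prefix, expiration value, and helper names are my own assumptions). add is atomic: it only stores the item if the key does not already exist, so exactly one concurrent caller acquires the lock, which removes the get-then-set race. It is still not a fully robust distributed lock, since the entry can be evicted or the server can fail, which is exactly the caveat discussed in the links above.

package main

import "github.com/bradfitz/gomemcache/memcache"

// tryLock attempts to take the lock for the resource id on behalf of username.
// It returns true only if this caller stored the key, i.e. acquired the lock.
func tryLock(mc *memcache.Client, id, username string) (bool, error) {
    err := mc.Add(&memcache.Item{
        Key:        "lock:" + id, // hypothetical key naming scheme
        Value:      []byte(username),
        Expiration: 30, // seconds; lets the lock expire if the holder crashes
    })
    if err == memcache.ErrNotStored {
        return false, nil // another user already holds the lock: decline access
    }
    if err != nil {
        return false, err // network/server error: do not assume the lock was acquired
    }
    return true, nil
}

// unlock releases the lock when the holder is done with the resource.
func unlock(mc *memcache.Client, id string) error {
    return mc.Delete("lock:" + id)
}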
There is also a related question, Memcached, Locking and Race Conditions, which might help in getting more clarity on memcached race conditions.
There are more discussions on this at the memcached forum:
Thread 1 and Thread 2

Related

How to cache connections to different Postgres/MySQL databases in Golang?

I have an application where different users may connect to different databases (either MySQL or Postgres). What might be the best way to cache those connections across different databases? I saw some connection pools, but they seem to be built for one database with multiple connections rather than for multiple databases with multiple connections each.
PS:
To add more context: I am designing a multi-tenant architecture where each tenant connects to one or multiple databases. One option is to use a map[string]*sql.DB where the key is the URL of the database, but that hardly scales when we have a large number of databases. Or should we have a sharding layer that shards each incoming request by connection URL, so each machine holds just the right number of database connections in the form of map[string]*sql.DB?
An example of the kind of software I want to build is https://www.sigmacomputing.com/, where the user can connect to multiple databases to work with different tables.
Neither MySQL nor Postgres allows connection sharing between multiple database users; a single database user is specified in the connection credentials. If you mean that your different users have their own database credentials, then it is not possible to share connections between them.
If by "different users" you mean your application users, and they share a single database user to access the DB deeper in the app, then you don't need to do anything in particular to "cache" connections. sql.DB keeps and reuses open connections in its pool by default.
Go automatically opens, closes and reuses DB connections through a *database/sql.DB. By default it keeps up to 2 idle connections open and opens an unlimited number of new connections under concurrency when all open connections are already busy.
If you need to fine-tune pool efficiency versus database load, you may want to adjust the sql.DB configuration with its Set* methods, for example SetMaxOpenConns.
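For illustration, a minimal sketch of such tuning (the driver choice, DSN, and the specific numbers are assumptions; pick values that match your workload):

package main

import (
    "database/sql"
    "time"

    _ "github.com/lib/pq" // assumed Postgres driver; use whichever driver matches your database
)

func openTuned(dsn string) (*sql.DB, error) {
    db, err := sql.Open("postgres", dsn)
    if err != nil {
        return nil, err
    }
    // Cap the total number of connections this process may hold against the
    // database, keep a few idle ones around for reuse, and recycle them
    // periodically so long-lived connections don't pile up server-side.
    db.SetMaxOpenConns(10)
    db.SetMaxIdleConns(5)
    db.SetConnMaxLifetime(30 * time.Minute)
    return db, nil
}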
You seem to have too many unknowns. In cases like this I would apply good old agile: start with a prototype of what you want to achieve using tools that you already know, and then benchmark the performance. I think you might be surprised how much Go can handle.
Since you understand how to use map[string]*sql.DB for that purpose, I would go with that. You reach some limit? Add another machine behind HAProxy. Solving a scaling problem doesn't necessarily mean writing a new DB pool in Go. Obviously, if you need this kind of power you can always do it; the pgx Postgres driver has its own pool implementation, so you can get your inspiration there. However, doing this right now looks like premature optimization, solving a problem you don't have yet. Building a prototype with map[string]*sql.DB is easy: test it, benchmark it, and you will see whether you need more. A sketch of such a prototype is included below.
P.S. BTW, you will most likely hit the file descriptor limit before you are able to exhaust memory.
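To make that concrete, here is a minimal sketch of such a prototype, assuming one pool per database URL (the type and function names are illustrative, not from the answer above):

package main

import (
    "database/sql"
    "sync"
)

// DBCache lazily opens one *sql.DB per DSN and reuses it afterwards.
type DBCache struct {
    mu  sync.Mutex
    dbs map[string]*sql.DB
}

func NewDBCache() *DBCache {
    return &DBCache{dbs: make(map[string]*sql.DB)}
}

// Get returns the pool for the given driver and DSN, opening it on first use.
// sql.Open is cheap: it does not dial the database until the pool is used.
func (c *DBCache) Get(driver, dsn string) (*sql.DB, error) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if db, ok := c.dbs[dsn]; ok {
        return db, nil
    }
    db, err := sql.Open(driver, dsn)
    if err != nil {
        return nil, err
    }
    c.dbs[dsn] = db
    return db, nil
}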
Assuming you have multiple users and multiple databases with an N-to-N relation, you could have a map from database URL to database details (explained below).
Which users have access to which databases should be handled anyway, using a config map or a core database. For the database details, we could have a struct like this:
type DBDetail struct {
    sync.RWMutex
    connection *sql.DB
}
The map would be from database URL to the database's details (dbDetail), and when a user wants to write, it calls this:
dbDetail.Lock()
defer dbDetail.Unlock()
and for reads it just uses RLock (and RUnlock) instead.
As vearutop said, the connections could be a pain, but using this you could have a single connection, or set a limit by incrementing and decrementing another counter after taking the Lock.
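As an illustration of how that struct might be used (building on the DBDetail type above; the map and function names are my own assumptions), writes take the exclusive lock and reads take the shared one:

// details maps a database URL to its DBDetail; assumed to be populated once
// at startup, so the concurrent map reads below are safe.
var details = map[string]*DBDetail{}

// withWrite runs fn while holding the exclusive lock for that database.
func withWrite(url string, fn func(*sql.DB) error) error {
    d := details[url]
    d.Lock()
    defer d.Unlock()
    return fn(d.connection)
}

// withRead runs fn while holding the shared (read) lock.
func withRead(url string, fn func(*sql.DB) error) error {
    d := details[url]
    d.RLock()
    defer d.RUnlock()
    return fn(d.connection)
}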
There isn’t necessarily a correct architectural answer here. It depends on some of the constraints of the system.
One option is to use a map[string]*sql.DB where the key is the URL of the database, but that hardly scales when we have a large number of databases.
Whether this will scale sufficiently depends on how numerous the databases are expected to be. If tens or hundreds of concurrent users are expected in the near future, it is probably sufficient. Often a good next step after using a map is to transition to a more full-featured cache (for example https://github.com/dgraph-io/ristretto).
A factor in the decision between a map and a cache is how you imagine the lifecycle of a database connection. Once a connection is opened, can it remain open for the remainder of the lifetime of the process, or do connections need to be closed after minutes of no use to free up resources?
Should we have a sharding layer that shards each incoming request by connection URL, so each machine holds just the right number of database connections in the form of map[string]*sql.DB?
The right answer here depends on how many processing nodes are expected and whether additional benefits would be gained from routing requests to specific machines. For example, row-level caching and isolating users from each other's requests are advantages that would be gained by sharding users across the pool. But a disadvantage is that you might end up with "hot" nodes, because a single user might generate the majority of the traffic.
Usually, a good strategy in situations like this is to be really explicit about the constraints of the problem. Jeff Dean coined a rule of thumb for such cases:
Ensure your design works if scale changes by 10X or 20X but the right solution for X [is] often not optimal for 100X
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
So, if in the near future the system needs to support tens of concurrent users, build the simplest thing that will support tens to hundreds of concurrent users (probably a map or cache with no user sharding is sufficient). That design will have to change before the system can support thousands of concurrent users, but scaling a system is often a good problem to have, because it usually indicates a successful project.

Multiple application on network with same SQL database

I will have multiple computers on the same network with the same C# application running, connecting to a SQL database.
I am wondering if I need to use Service Broker to ensure that if I update record A in table B on Machine 1, the change is pushed to Machine 2. I have seen applications use messaging servers to accomplish this before, but I was wondering why this is necessary; surely, if they connect to the same database, any changes from one machine will be reflected on the other?
Thanks :)
This is mostly about consistency and latency.
If your applications always perform atomic operations on the database, and they always read whatever they need with no caching, everything will be consistent.
In practice, this is seldom the case. There are plenty of hidden opportunities for caching, like when you have an edit form: it holds the values the entity had before you started the edit process, but what if someone modified those in the meantime? You'd just overwrite their changes with your data.
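One common way to avoid that silent overwrite (shown here as a language-neutral sketch in Go's database/sql rather than C#, with hypothetical table and column names) is an optimistic concurrency check: send back the version the edit form was loaded with, and treat zero affected rows as a conflict.

package main

import (
    "database/sql"
    "errors"
)

// updateProfile applies an edit only if the row is still at the version the
// edit form originally read; otherwise it reports a conflict instead of
// silently overwriting someone else's change.
// Placeholder syntax (? vs @p1 vs $1) depends on your driver.
func updateProfile(db *sql.DB, id int64, name string, loadedVersion int64) error {
    res, err := db.Exec(
        "UPDATE profiles SET name = ?, version = version + 1 WHERE id = ? AND version = ?",
        name, id, loadedVersion)
    if err != nil {
        return err
    }
    n, err := res.RowsAffected()
    if err != nil {
        return err
    }
    if n == 0 {
        return errors.New("row was modified by someone else; reload and retry")
    }
    return nil
}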
Solving this is a bunch of architectural decisions. Different scenarios require different approaches.
Once data is committed in the database, everyone reading it will see the same thing - but only if they actually get around to reading it, and the two reads aren't separated by another commit.
Update notifications are mostly concerned with invalidating caches, and perhaps some push-style processing (e.g. IM client might show you a popup saying you got a new message). However, SQL Server notifications are not reliable - there is no guarantee that you'll get the notification, and even less so that you'll get it in time. This means that to ensure consistency, you must not depend on the cached data, and you have to force an invalidation once in a while anyway, even if you didn't get a change notification.
Remember, even if you're actually using a database that's close enough to ACID, it's usually not the default setting (for performance and availability, mostly). You need to understand what kind of guarantees you're getting, and how to write code to handle this. Even the most perfect ACID database isn't going to help your consistency if your application introduces those inconsistencies :)

Advantage of LDAP over RDBMS?

I have an application with a database as its backend.
The application follows a sort of pub-sub model, where users post changes to the application and other peers subscribe to those changes. These changes may happen very frequently or periodically, and all the changes have to be written to the database.
Now I am being asked to evaluate the possibility of replacing this RDBMS with LDAP. They probably want a unified DB for all applications, but in any case I have to find the advantages/disadvantages of both approaches.
I cannot directly compare an RDBMS with LDAP, as I have almost no knowledge of LDAP, though I have tried to learn a little.
I understand that LDAP is designed for directory access and is optimized for read access, so it is write once, read many. I have read that frequent writes will reduce the performance of an LDAP server, as each write triggers the indexing process.
To give a scenario regarding indexing in LDAP: my table will have a few columns, say two, Name and Desc. In LDAP, I suppose these would become two attributes, Name and Desc. In my scenario it is Desc that will be frequently updated. I assume only Name will be indexed, so even if Desc changes frequently it won't trigger the indexing process.
It is worth mentioning that the database will be hosted on some cloud platform.
I tried to find out the differences, but could not find anything conclusive.
LDAP is a protocol; REST is a service style based on HTTP (also a protocol). So if the LDAP server shall not be exposed to the internet, how do you want to get the data from it? As LDAP is the protocol, you would need direct access to the LDAP server. It's like a database server that you would not expose directly to the internet: you would build an interface to encapsulate it, and that might as well be a REST interface.
I'd try to get the point across that one is the transfer protocol and storage backend, and the other is the public interface to its data. It's a bit like asking why MySQL is better than a web interface. You'd never make the MySQL server publicly available; you would encapsulate its protocol within an application.
REST is an interface. It doesn't matter how you organize your data behind that interface. When you decide that you want to organize it differently, you can do so without the consumer of your API noticing any change. And you can provide different versions of your API depending on improvements of your service.
LDAP on the other hand is an implementation. You can't change the way your data is handled without the consumer noticing it. So there's no way to rearrange your backend without affecting the consumer.
With REST you can therefore change the backend from MySQL to PostgreSQL, or even to LDAP, without the consumer noticing, which you won't be able to do with LDAP.
Hope that helps
Now that we finally know what you're actually asking, which has nothing to do with your title, the body of your question, or REST, the simple answer is that there is no particular reason to believe that an LDAP server will perform significantly better than an RDBMS in this application, with two riders:
it may not even be feasible, due to the schema issue, and
if it is feasible it may not be semantically suitable, due to the lack of ACID properties, lack of JOINs, and the other issues mentioned in comments.
I will state that this is one of the worst formulated questions I have seen here for some considerable time, and the difficulty of extracting the actual question was extreme.

Log user activity - which is better

I am using action filter attributes for logging user activity on certain actions that interact with a SQL database. Alternatively, I could log the activity in the SQL tables using triggers on those tables for each table operation. I would like to know which of the two methods above is best practice (performance-wise).
I think the action filter is certainly the cleanest and best-practice approach, since it sits in the application layer. Part of the benefit of being there is that it is managed code, and if something breaks you can easily locate the problem. There is also the benefit that all your code is in one spot.
Database triggers are a big no-no in many companies, since they have a habit of causing infinite loops when an unknowing programmer creates some logic that fires the trigger over and over again, causing the database to fail. Some companies do allow triggers, but only well documented and very lightly used. Hope this helps.
Performance of logging depends greatly on the system architecture. If you have 3 load balanced web servers hitting one main database, triggers would have to handle all the load while Action Filters would split the load in three. In that scenario, Action Filters would be better.
In terms of best practices, I wouldn't use either of those approaches. I would set up Transactional Replication to another SQL server. This approach would run without impacting performance at all. The transaction log is already being generated and replication would just spin up a separate process that's reading that log.

NHibernate - counters with concurrency and second-level-caching

I'm new to NHibernate and am having difficulties setting it up for my current website. This website will run on multiple webservers with one database server, and this leaves me facing some concurrency issues.
The website will have an estimated 50,000 or so registered users, and each user will have a profile page. On this page, other users can 'like' another user, much like Facebook. This is where the concurrency problem kicks in.
I was thinking of using the second-level cache, most likely with the memcached provider since I'll have multiple webservers. What is the best way to implement such a 'Like' feature using NHibernate? I was thinking of three options:
Use a simple Count() query. There will be a table 'User_Likes' where each row represents a like from one user to another. To display the number of likes, I would simply ask for the number of likes for a user, which would be translated to the database as a simple SELECT COUNT(*) FROM USER_LIKES WHERE ID = x or something. However, I gather this would come with a significant performance penalty, as every time a user visits a profile page or likes another user, the number of likes would have to be recalculated, second-level cache or not.
Use an additional NumberOfLikes column in the User table and increment / decrement this value when a user likes or dislikes another user. This however gives me concurrency issues. Using a simple for-loop, I tested it by liking a user 1000 times on two servers and the result in the db was around 1100 likes total. That's a difference of 900. Whether a realistic test or not, this is of course not an option. Now, I looked at optimistic and pessimistic locking as a solution (is it?) but my current Repository pattern is, at the moment, not suited to use this I'm afraid, so before I fix that, I'd like to know if this is the right way to go.
Like option 2, but using custom HQL and writing the update statement myself, something along the lines of UPDATE User SET NumberOfLikes = NumberOfLikes + 1 WHERE id = x. This won't give me any concurrency issues in the database, right? However, I'm not sure whether I'll have any data mismatch on my multiple servers due to the second-level caching.
So... I really need some advice here. Is there another option? This feels like a common situation and surely NHibernate must support this in an elegant manner.
I'm new to NHibernate, so a clear, detailed reply is both necessary and appreciated :-) Thanks!
I suspect you will see this issue in more locations. You could solve this specific issue with 3., but that leaves other locations where you're going to encounter concurrency issues.
What I would advise is to implement pessimistic locking. The usual way to do this is to just apply a transaction to the entire HTTP request. With the BeginRequest in your Global.asax, you start a session and transaction. Then, in the EndRequest you commit it. With the Error event, you go the alternative path of doing a rollback and discarding the session.
This is quite an accepted manner of applying NHibernate. See for example http://dotnetslackers.com/articles/aspnet/Configuring-NHibernate-with-ASP-NET.aspx.
I'd go with 3. I believe that in this kind of application it's not so critical if some pages show a slightly outdated value for a while.
IIRC, HQL updates do not invalidate the entity cache entry, so you might have to do it manually.