Prevent a GemFire cache entry from being accessed by more than one request - gemfire

I have an application using Spring Boot, GemFire and MySQL. The Spring Boot application serves as a REST API. I want to "lock" a cache entry so that only one request sent to the REST API can access a given entry in GemFire at a time. Others cannot perform CRUD on that entry until the owner releases it. I have two approaches as of now.
Approach 1 - Create a GemFire function which, when invoked by the REST API (at different times), performs a lock/unlock on the entry using org.apache.geode.cache.Region.getDistributedLock.
Approach 2 - Create a region (e.g. Lock) where an entry is created when an entry of the target region (e.g. Customer) is accessed for the first time. When a second request wants to access the same entry, the REST API checks the Lock region first. The REST API retrieves and returns the entry from the Customer region only if the key does not exist in the Lock region; otherwise, no entry is returned. Once the first requester finishes, the REST API removes the entry from the Lock region. Both approaches are sketched below.
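For concreteness, a minimal sketch of Approach 1, assuming the target region is configured with Scope.GLOBAL (which getDistributedLock requires); the surrounding GemFire Function plumbing is omitted:

import java.util.concurrent.locks.Lock;
import org.apache.geode.cache.Region;

public class EntryLockExample {
    // Run some work while holding the distributed lock for a single entry.
    public static <K, V> void withEntryLock(Region<K, V> region, K key, Runnable work) {
        Lock lock = region.getDistributedLock(key); // region must use Scope.GLOBAL
        lock.lock();
        try {
            work.run(); // CRUD on the locked entry happens here
        } finally {
            lock.unlock(); // always release, even if the work throws
        }
    }
}

And a sketch of Approach 2 (region and value types are illustrative); because Region implements ConcurrentMap, putIfAbsent gives an atomic check-and-create on the Lock region:

import org.apache.geode.cache.Region;

public class LockRegionExample {
    // Returns true only for the first requester; the winner must call release() when done.
    public static boolean acquire(Region<String, String> locks, String customerId, String requesterId) {
        return locks.putIfAbsent(customerId, requesterId) == null;
    }

    public static void release(Region<String, String> locks, String customerId) {
        locks.remove(customerId); // delete the marker entry so others can proceed
    }
}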
I am wondering if there are any alternatives besides these two options.

If you want a more space-efficient solution, you could add a boolean field to the value to indicate whether it is locked. You can then use region.replace(K,V,V) to efficiently set the "lock" on the entry as well. This will, however, leak your locking concerns into your business objects.
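A sketch of that idea, assuming a hypothetical Customer class with a locked flag and a value-based equals() (replace compares the current value by equality):

Customer current = region.get(id);
if (current != null && !current.isLocked()) {
    Customer lockedCopy = current.withLocked(true); // copy of the value with locked = true
    // Atomic: succeeds only if the entry still equals the unlocked version we read.
    boolean acquired = region.replace(id, current, lockedCopy);
    if (acquired) {
        // ... work on the entry, then store a copy with locked = false to release ...
    }
}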

Related

Implementing an RMW operation in Redis

I would like to maintain comma-separated lists of entries of the form <ip>:<app>, indexed by an account ID. There would be one such list for each user, indexed by their account ID, with the number of users in the millions. This is mainly to track which server in a cluster a user using a certain application is connected to.
Since all servers are written in Java, with Redisson I'm currently doing:
RSet<String> set = client.getSet(accountKey);
and then I can modify the set using the typical Java container APIs supported by Redisson. I basically need three types of updates to these comma-separated lists:
Client connects to a new application = append
Client reconnects with existing application to new endpoint = modify
Client disconnects = remove
A new connection would require a change to a field like:
1.1.1.1:foo,2.2.2.2:bar -> 1.1.1.1:foo,2.2.2.2:bar,3.3.3.3:baz
A reconnect would require an update like:
1.1.1.1:foo,2.2.2.2:bar -> 3.3.3.3:foo,2.2.2.2:bar
A disconnect would require an update like:
1.1.1.1:foo,2.2.2.2:bar -> 2.2.2.2:bar
As mentioned the fields would be keyed by the account ID of the user.
My question is the following: without using Redisson, how can I implement this "directly" on top of Redis commands? The goal is to allow rewriting certain components in a language other than Java. The cluster handles close to a million requests per second.
I'm actually quite curious how Redisson implements an RSet under the hood, and I haven't had time to dig into it. I guess one option would be to use Lua, but I've never used it with Redis. Any ideas how to efficiently implement these operations on top of Redis in a manner that is easily supported by multiple languages, i.e. not relying on a specific library?
Having actually thought about the problem properly, it can be solved directly with a hash (HSET), where <app> is the field name, the IP is the value, and the key is the user's account ID.
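A sketch of how the three updates map onto plain hash commands (shown with Jedis purely for illustration; the key naming scheme is an assumption). Each command is a single atomic operation on the server, so no Lua scripting is needed, and every Redis client library exposes these commands directly:

import redis.clients.jedis.Jedis;

public class ConnectionTracker {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String key = "account:12345"; // hypothetical key naming scheme

            jedis.hset(key, "foo", "1.1.1.1"); // connect: add field "foo" -> 1.1.1.1
            jedis.hset(key, "bar", "2.2.2.2");
            jedis.hset(key, "foo", "3.3.3.3"); // reconnect: HSET overwrites the field
            jedis.hdel(key, "foo");            // disconnect: remove the field

            System.out.println(jedis.hgetAll(key)); // all <app> -> <ip> pairs for the account
        }
    }
}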

Multiple DbSet/entity modifications with a single call to SaveChanges()

I am working on a .NET Core Web API which needs to interact with an Azure SQL database using EF Core 5.0.2.
I have different repository methods where I interact with the DbContext to add/edit/delete records for different DbSets.
For example:
UserRepository.AddUser(userdata);
The implementation of AddUser looks like this:
await ourDbContext.UserTable.AddAsync(userdata);
So in the user service method, I call different repository methods sequentially, and none of those methods calls ourDbContext.SaveChangesAsync() individually. A single call to SaveChanges is made after all the repository method calls, acting as a unit of work that commits all the changes in a single transaction.
Example:
UserRepository.AddUser(userdata);
ActivityRepository.AddActivity("New User got added");
await ourDbContext.SaveChangesAsync();
So my question is: if saving the changes for any of the tables/entities fails, will the previously successful table changes be rolled back?
For example, suppose this operation
UserRepository.AddUser(userdata);
was successful and the new user record was added to the User table.
But this was not successful:
ActivityRepository.AddActivity("New User got added");
So no activity record was added to the Activity table.
Will SaveChangesAsync() handle this situation automatically and roll back the new User table changes as well?
If not, are we supposed to wrap the above code in a transaction scope? What is the recommended way to do this?
Briefly, this is how DbContext's ChangeTracker works:
You load entities: the ChangeTracker remembers the current values of all loaded entities (unless you use AsNoTracking()).
You modify the loaded entities, delete some, and add new ones.
You call SaveChanges: the ChangeTracker works out which objects have changed since they were loaded by comparing them with the remembered values.
DML SQL is generated, and everything is saved in one SQL statement, or in several statements inside a single transaction.
So, as long as your repositories share one DbContext, you do not need to worry about rolling back: nothing is written to the database until SaveChanges is called, so on failure you can simply not call it, and when SaveChanges itself fails, the statements it issued run in a single transaction that is rolled back as a whole. To restart the process, however, you have to recreate the DbContext, because it still holds the stale tracked state.

REST API and UUID

One of the reasons, and probably the main one, to use UUID is to avoid having a "centralized" point responsible for creating and assigning ids to resources.
That means that, for REST APIs, clients could (and should) be able to generate and supply the UUID for a given resource when they POST that resource for the first time. That would minimize problems related to successfully posting a resource for the first time but not getting the ID back in the response (connectivity problems, for example), which can lead some clients to POST again, generating duplicated resources.
My questions are:
Is having UUIDs generated by clients and exposed in the REST API a best practice?
Are there any alternatives to that?
REST does not really care whether the UUID is generated by the server or by the client. It just needs a unique resource identifier in the form of a URI.
What form the URI has is not important to clients and servers - only that it is unique and can be obtained by clients (HATEOAS). You do, of course, also need a resource on the server side which is able to create the sub-resource for you and understands that you want to provide the UUID instead of it generating its own. Instead of a UUID you could, for example, also use a URL-encoded title of a blog post or, like this question, a combination of hash value and question title (31584303/rest-api-and-uuid) to uniquely identify a resource.
Generating a UUID on the client is, in my opinion, not done that often in practice, but I may be wrong on this matter. The actual question is rather: why should a client provide its own UUID instead of letting the server create one? The client is, IMO, only interested in getting the data to the server and having some way to retrieve it at a later point, which is provided through the Location header returned in the response to a POST request.
If connection issues are an actual concern, you could let the client send an empty POST to create the resource, with the server sending back its location in the response header. The content is then added via a PUT request.
There can still be some connection issues involved:
request does not reach the server
response does not reach the client
While the former is no issue for the client or the server, as no operation is executed and the request can simply be resent, the latter will actually create a resource on the server side, even though the link never reaches the client. That resource is then in an unreferenceable state unless you provide a way to iterate over all resources, including the empty ones.
A server can have a cleanup thread in the background which removes empty resources after a given amount of time. If the client sends a further empty POST request and this time receives the URI of the created resource, it can set the state of the resource via PUT. PUT is idempotent: if the server did not receive the request, the client can simply resend it. PUT has the semantics of updating an existing resource, or creating a new one if it does not yet exist, so in that case the server can create the resource with the provided content. If the request did reach the server, a further identical update does not change the state of the resource.
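A sketch of that create-then-fill flow using Java's built-in HttpClient (the endpoint and payload are made up for illustration):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateThenPut {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Empty POST: the server allocates the URI and returns it in the Location header.
        HttpRequest create = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/articles")) // hypothetical endpoint
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> created = client.send(create, HttpResponse.BodyHandlers.discarding());
        String location = created.headers().firstValue("Location").orElseThrow();

        // 2. PUT the actual content. PUT is idempotent, so on a connection
        //    error the client can simply resend this request.
        HttpRequest fill = HttpRequest.newBuilder()
                .uri(URI.create(location))
                .PUT(HttpRequest.BodyPublishers.ofString("{\"title\":\"hello\"}"))
                .build();
        client.send(fill, HttpResponse.BodyHandlers.discarding());
    }
}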
One advantage of a client-generated UUID is that the client knows the resource key even before creating the resource; there is no need to read the response of the POST/PUT except for errors.

GemFire region with data expiration

Regarding this document, "entry-time-to-live-expiration" means how long the region's entries can remain in the cache without being accessed or updated, with no expiration of this type by default. However, when I use Spring Cache and a client region with the following configuration, I find that the setting does not work well with entries being accessed. Furthermore, regarding this document -> XML TTL tab, it says "Configures a replica region to invalidate entries that have not been modified for 15 seconds." So I am confused about whether TTL applies to access or only to modification.
<gfe:client-region id="Customer2" name="Customer2" destroy="false" load-factor="0.5" statistics="true" cache-ref="client-cache">
    <gfe:entry-ttl action="DESTROY" timeout="60"/>
    <gfe:eviction threshold="5"/>
</gfe:client-region>
So, the documentation you might want to refer to is here and here. Perhaps relevant to your situation is...
"Requests for entries that have expired on the consumers will be forwarded to the producer."
Based on your configuration, given you did not set either a ClientRegionShortcut or DataPolicy, your Client Region, "Customer2", defaults to ClientRegionShortcut.LOCAL, which sets a DataPolicy of "NORMAL". DataPolicy.NORMAL states...
"Allows the contents in this cache to differ from other caches. Data that this region is interested in is stored in local memory."
And for the shortcut of "LOCAL"...
"A LOCAL region only has local state and never sends operations to a server. ..."
However, that does not mean the client Region cannot receive data (of interest) from the server. It simply implies that operations are not distributed to the server. The client may be expiring the entry and then repopulating it from the server (producer).
Of course, I am speculating and have not tested these ideas. You might try setting the Expiration Action to "LOCAL_DESTROY" and/or changing your distribution properties through different ClientRegionShortcuts.
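For instance, your declaration with only the expiration action changed (untested, per the speculation above; LOCAL_DESTROY is one of the actions the gfe:entry-ttl element accepts):

<gfe:client-region id="Customer2" name="Customer2" destroy="false" load-factor="0.5" statistics="true" cache-ref="client-cache">
    <gfe:entry-ttl action="LOCAL_DESTROY" timeout="60"/>
    <gfe:eviction threshold="5"/>
</gfe:client-region>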
Post back if you are still having problems. I echo what @hubbardr is asking.
Cheers!

PUT vs POST in an audit-table or revision history situation

Let's say I have a REST method to update a record. That would obviously be a POST because it's updating a resource. However in the same motion, a new record in an audit or revision history table needs to be created.
Is there a standard or best practice here, of whether to use POST or PUT?
Does the REST method come from what is happening on the user side, or does it come from what happens in the database?
One possibility is to call just one method, which updates a record in one table and creates a new record in another table.
Another possibility would be to require that the POST only update one table, with an additional method doing a PUT on the audit table. This forces the use of two methods and puts the responsibility on the developer, which I'm not too keen on.
PUT is actually the recommended method for replacing (updating) an existing record:
The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server.
There is also some information about the difference between POST and PUT:
The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource.
See here.
To me it sounds like you should use a PUT request to update the resource. Auditing is a side-effect of doing that, and so it should be handled as part of PUTting the new resource.
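A sketch of what that could look like in a hypothetical Spring MVC controller (RecordRepository, AuditRepository and RecordDto are made-up types; the point is that the audit row is written inside the same transaction as the update, so the client only ever issues a single PUT):

import org.springframework.http.ResponseEntity;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RecordController {
    private final RecordRepository records; // hypothetical repositories
    private final AuditRepository audits;

    public RecordController(RecordRepository records, AuditRepository audits) {
        this.records = records;
        this.audits = audits;
    }

    // The client sees a single PUT that replaces the resource; the audit
    // entry is a server-side side-effect committed in the same transaction.
    @PutMapping("/records/{id}")
    @Transactional
    public ResponseEntity<Void> update(@PathVariable long id, @RequestBody RecordDto body) {
        records.replace(id, body);           // update the record
        audits.append(id, "record updated"); // revision-history row
        return ResponseEntity.noContent().build();
    }
}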