Gemfire region with data expiration - gemfire

Regarding this document, "entry-time-to-live-expiration" means how long the region's entries can remain in the cache without being accessed or updated; the default is no expiration of this type. However, when I use Spring Cache and a client-region with the following configuration, I find that the setting does not work well when entries are being accessed. Going further, the XML TTL tab of this document says it "Configures a replica region to invalidate entries that have not been modified for 15 seconds." So I am confused about whether TTL applies to entries that are being accessed.
<gfe:client-region id="Customer2" name="Customer2" destroy="false" load-factor="0.5" statistics="true" cache-ref="client-cache">
    <gfe:entry-ttl action="DESTROY" timeout="60"/>
    <gfe:eviction threshold="5"/>
</gfe:client-region>

So, the documentation you might want to refer to is here and here. Perhaps relevant to your situation is...
"Requests for entries that have expired on the consumers will be forwarded to the producer."
Based on your configuration, given you did not set either a ClientRegionShortcut or DataPolicy, your Client Region, "Customer2", defaults to ClientRegionShortcut.LOCAL, which sets a DataPolicy of "NORMAL". DataPolicy.NORMAL states...
"Allows the contents in this cache to differ from other caches. Data that this region is interested in is stored in local memory."
And for the shortcut of "LOCAL"...
"A LOCAL region only has local state and never sends operations to a server. ..."
However, it does not mean the client Region cannot receive data (of interests) from the Server. It simply implies operations are not distributed to the Server. It may be expiring the entry and then repopulating it from the Server (producer).
Of course, I am speculating and have not tested these ideas. You might try setting the Expiration Action to "LOCAL_DESTROY" and/or changing your distribution properties through different ClientRegionShortcuts.
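If it helps, here is a minimal sketch of that second idea using the plain GemFire Java client API rather than the Spring XML namespace (the host, port and CACHING_PROXY shortcut are assumptions for illustration, not something taken from your config):
import org.apache.geode.cache.ExpirationAction;
import org.apache.geode.cache.ExpirationAttributes;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class Customer2RegionSketch {

    public static Region<String, Object> createCustomer2Region() {
        // Connect as a client; host and port are placeholders for your environment.
        ClientCache cache = new ClientCacheFactory()
                .addPoolServer("localhost", 40404)
                .create();

        // CACHING_PROXY keeps a local copy AND forwards operations to the server,
        // unlike LOCAL, which never sends operations to a server.
        ClientRegionFactory<String, Object> factory =
                cache.createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY);

        // Statistics must be enabled for expiration to run (as in your XML).
        factory.setStatisticsEnabled(true);

        // LOCAL_DESTROY removes only the local copy when the 60s TTL elapses,
        // so the expiration is not propagated back to the server.
        factory.setEntryTimeToLive(
                new ExpirationAttributes(60, ExpirationAction.LOCAL_DESTROY));

        return factory.create("Customer2");
    }
}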
Post back if you are still having problems. I also echo what @hubbardr is asking.
Cheers!

Related

Is there any object-storage standardized protocol for storing data+metadata?

When it comes to querying databases you can rely on SQL, and when exchanging emails you can rely on SMTP.
They are both standards across vendors. We could generalize this to many other "standardized services".
This allows the backing provider of the service to be switched easily: I can change a local MySQL to a managed one, or swap one SMTP server for another, just by changing the application's configuration, without changing the code.
Question:
When it comes to "store binaries", like for example say a company that needs to store photos sent by users, PDFs sent by providers and phone-call recordings from the quality department...
...when it's about "storing binaries over the net" is there any industry-standard in the form of "key-value-metadata" where the value is the binary, the metadata is a free-object (with possibly info about creation time, who is the creator, context of the creation, etc); and the ID is a function of the content (for example a Sha1 or whatever), that ensures anti-duplication measures by design?
When I say industry-standard I mean that I don't care if "behind" my connecting API, the "real" storage is implemented by a redis or a mysql or a plain-file-system or an S3 from AWS... just something "standard" given by multiple parties, to where I connect to and I can "easily change what's behind" by configuration and change from one storage-provider to another with the same simplicity we change our SMTP for example not caring if there's a sendmail or a postfix behind.
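(For what it's worth, the content-derived ID part is easy to sketch; below is a minimal in-memory illustration in Java of the "key = hash of value, plus free-form metadata" shape I mean. The class and method names are invented for the example, and a real provider would obviously persist the bytes somewhere durable.)
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class ContentAddressedStore {

    private final Map<String, byte[]> blobs = new HashMap<>();
    private final Map<String, Map<String, String>> metadata = new HashMap<>();

    // The key is a function of the content, so storing the same bytes twice
    // yields the same ID and duplicates are detected by design.
    public String put(byte[] content, Map<String, String> meta) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        String id = hex.toString();
        blobs.putIfAbsent(id, content);
        metadata.putIfAbsent(id, meta);
        return id;
    }

    public byte[] get(String id) {
        return blobs.get(id);
    }

    public Map<String, String> getMetadata(String id) {
        return metadata.get(id);
    }

    public static void main(String[] args) throws Exception {
        ContentAddressedStore store = new ContentAddressedStore();
        Map<String, String> meta = new HashMap<>();
        meta.put("creator", "quality-department");
        String id = store.put("call-recording-bytes".getBytes(StandardCharsets.UTF_8), meta);
        System.out.println("Stored under " + id);
    }
}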

Prevent entry of GemFire cache being accessed by more than one request

I have an application using Spring Boot, GemFire, and MySQL. The Spring Boot application serves as a REST API. I want to "lock" a cache entry so that only one request sent to the REST API can access a given entry in GemFire at a time. Other requests cannot do CRUD on that entry until the entry's owner releases possession. I have two approaches as of now.
Approach 1 - Create a GemFire function which performs a lock/unlock on the entry when invoked by the REST API (at different times), using org.apache.geode.cache.Region.getDistributedLock.
Approach 2 - Create a region (e.g. Lock) where an entry is created when an entry of the target region (e.g. Customer) is accessed for the first time. When a second request wants to access the same entry, the REST API checks the Lock region first. The REST API retrieves and returns the entry from the Customer region only if the key does not exist in the Lock region; otherwise, no entry is returned. Once the first requester finishes, the REST API removes the entry from the Lock region.
I am wondering if there are any alternatives besides these two options.
If you want a more space-efficient solution, you could add a boolean field to the value to indicate whether it is locked. You can then use region.replace(K, V, V) to set the "lock" on the entry atomically and efficiently. Note, though, that this leaks your locking concerns into your business objects.
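As a rough sketch of what that could look like (the Customer value class and its fields here are hypothetical, and the value type needs a value-based equals() for replace() to match):
import org.apache.geode.cache.Region;

public class EntryLockSketch {

    // Hypothetical value type with a 'locked' flag baked in.
    public static class Customer {
        final String name;
        final boolean locked;

        Customer(String name, boolean locked) {
            this.name = name;
            this.locked = locked;
        }

        // replace(K, V, V) compares by equals(), so value equality is required.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Customer)) return false;
            Customer other = (Customer) o;
            return locked == other.locked && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return name.hashCode() * 31 + (locked ? 1 : 0);
        }
    }

    // Atomically flips locked=false to locked=true; returns false if someone else holds it.
    public static boolean tryLock(Region<String, Customer> customers, String key) {
        Customer current = customers.get(key);
        if (current == null || current.locked) {
            return false;
        }
        // Succeeds only if the entry still equals 'current' at the moment of the swap.
        return customers.replace(key, current, new Customer(current.name, true));
    }

    public static void unlock(Region<String, Customer> customers, String key) {
        Customer current = customers.get(key);
        if (current != null && current.locked) {
            customers.replace(key, current, new Customer(current.name, false));
        }
    }
}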

GET or PUT to reboot a remote resource?

I am struggling (in some sense) to determine which HTTP method is more appropriate for rebooting a remote resource: GET or PUT?
On one hand, it seems more semantic to call http://tools.serviceprovider.net/canopies/d34db33fc4f3?reboot=true because one might want to GET a representation of a freshly rebooted canopy.
On the other hand, a reboot is not 'safe' (nor is it necessarily idempotent, but then a canopy or modem is not just a row in a database) so it might seem more semantic to PUT the canopy into a state of rebooting, then have the server return a 202 to indicate that the reboot was initiated and is processing.
I have been reading up on HTTP/1.1, REST, HATEOAS, and other related concepts over the last week, so I am still putting the pieces together. Could a more seasoned developer please weigh in and confirm or dispel my hunch?
A GET doesn't seem appropriate because a GET is expected, like you said, to be "safe". i.e. no action other than retrieval.
A PUT doesn't seem appropriate because a PUT is expected to be idempotent, i.e. multiple identical operations cause the same side effects as a single operation. Moreover, a PUT is usually used to replace the content at the request URI with the request body.
A POST appears most appropriate here, because:
A POST need not be safe
A POST need not be idempotent
It also appears meaningful in that you are POSTing a request for a reboot (much like submitting a form, which also happens via POST), which can then be processed, possibly leading to a new URI containing reboot logs/results returned along with a 303 See Other status code.
Interestingly, Tim Bray wrote a blog post on this exact topic (which method to use to tell a resource representing a virtual machine to reboot itself), in which he also argued for POST. At the bottom of that post there are links to follow-ups on that topic, including one from none other than Roy Fielding himself, who concurs.
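As a rough sketch of what the request could look like from a client's perspective (using the JDK's built-in HTTP client; the /reboots sub-resource is just a placeholder layout, not something prescribed by the question's URI):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RebootViaPost {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // POST to a sub-resource that represents reboot requests for this canopy.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://tools.serviceprovider.net/canopies/d34db33fc4f3/reboots"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // Expect something like 202 Accepted (still processing) or 303 See Other
        // with a Location header pointing at the reboot logs/results.
        System.out.println(response.statusCode());
        System.out.println(response.headers().firstValue("Location").orElse("(no Location)"));
    }
}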
REST is definitely not HTTP. But HTTP definitely does not have only four (or eight) methods. Any method is technically valid (even if only as an extension method), and any method is RESTful when it is self-describing, such as ‘LOCK’, ‘REBOOT’, ‘DELETE’, etc. Something like ‘MUSHROOM’, while valid as an HTTP extension, has no clear meaning or easily anticipated behavior, so it would not be RESTful.
Fielding has stated that “The REST style doesn’t suggest that limiting the set of methods is a desirable goal. [..] In particular, REST encourages the creation of new methods for obscure operations” and that “it is more efficient in a true REST-based architecture for there to be a hundred different methods with distinct (non-duplicating), universal semantics.”
Sources:
http://xent.com/pipermail/fork/2001-August/003191.html
http://tech.groups.yahoo.com/group/rest-discuss/message/4732
With this all in mind I am going to be 'self descriptive' and use the REBOOT method.
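For what it's worth, the JDK's java.net.http client will let you send an extension method like REBOOT, so a client-side sketch is straightforward (whether every proxy and server on the path accepts an unknown method is another matter):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RebootViaExtensionMethod {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // An HTTP extension method: valid per the spec, but intermediaries may reject it.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://tools.serviceprovider.net/canopies/d34db33fc4f3"))
                .method("REBOOT", HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}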
Yes, you could effectively create a new command, REBOOT, using POST. But there is a perfectly idempotent way to do reboots using PUT.
Have a last_reboot field that contains the time at which the server was last rebooted. Make a PUT to that field with the current time cause a reboot if the incoming time is newer than the stored last-reboot time. If an intermediate server resends the PUT, no problem -- it has the same value as the first command, so it's a no-op.
You might want to get the current time from the server you're rebooting, unless you know that everyone is reasonably time-synced.
Or you could just use a times_rebooted count, eliminating the need for a clock. A PUT times_rebooted: 4 request will cause a reboot if times_rebooted is currently 3, but not if it's 4 or 5. If the current value is 2 and you PUT a 4, that's an error.
The only advantage to using time, if you have a clock, is that sometimes you care about when it happened. You could of course have BOTH a times_rebooted and a last_reboot_time, letting times_rebooted be the trigger.
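A minimal sketch of the server-side check for the counter variant (the handler and its return codes are invented for illustration; wire it into whatever framework you actually use):
public class RebootCounterResource {

    private int timesRebooted = 3;   // current server-side value

    // Handles PUT /times_rebooted with the proposed new count.
    public synchronized int putTimesRebooted(int requestedCount) {
        if (requestedCount <= timesRebooted) {
            // Retransmission of a PUT we already applied (e.g. 4 when we're at 4 or 5): a no-op.
            return 200;
        }
        if (requestedCount == timesRebooted + 1) {
            timesRebooted = requestedCount;
            triggerReboot();
            return 202;   // accepted, reboot in progress
        }
        // Skipped ahead (e.g. PUT 4 when the current value is 2): an error.
        return 409;       // conflict
    }

    private void triggerReboot() {
        // Placeholder: kick off the actual reboot asynchronously.
        System.out.println("Rebooting...");
    }
}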

WCF Paged Results & Data Export

I've walked into a project that is using a WCF service for the data tier. Currently, when data is needed for a grid, all rows are returned, the results are bound to the grid, and the dataset is stuffed into a session variable for paging/sorting/rebinding. We've already hit a max message size problem, so I'm thinking it's time to convert from fetch-and-cache to fetching only the current page.
At face value this seems easy enough, but there's a small catch: the user is allowed to export the entire result set at any point. This means that fetching just the current page is fine for grid viewing, but when they want to do an export, I still need to make a call for all of the data.
This puts me back into the max message size issue. What is the recommended approach for this type of setup?
We are currently using the wsHttpBinding...
Thanks for any assistance.
I think the recommended approach for large files is to use WCF streaming. I'm not sure of the exact details of your scenario, but you could take a look at this as a starting point:
http://msdn.microsoft.com/en-us/library/ms789010.aspx
I would probably do something like this in your case:
create a service with a "paged" GetData() method - where you specify the page index and the page size as additional parameters. This should give you a nice clean interface for "regular" use, and that should not hit the maxMessageSize limits
create a second service (or method) that would send all data - ideally, you could bundle that up into a ZIP file or something on the server, before sending it. If that ZIP file is still too large, you might want to check out WCF streaming for handling large files, as Andy already pointed out
The maxMessageSize limit is in place for a good reason: to avoid denial-of-service attacks where a WCF service could be flooded with large messages and brought to its knees. Always try to keep that in mind and don't just jack up the maxMessageSize to 2 GB - it might come back to bite you :-)

System.DirectoryServices pegs my processor when multi-threaded - can I lower the burden?

My application takes the currently logged-in user and uses a DirectoryServices.DirectorySearcher to pull a few additional details about them (some properties we have stored in a few custom AD fields, as well as their email address). This works great, though I've always thought it was a little slow - my single-threaded code could only make about 2-3 requests/second to AD.
The real problem came when I moved this code to a web server. With multiple simultaneous users, the number of requests/second jumps greatly, and the LSASS.EXE process pegs on my server. I've checked the domain controllers, and they're just fine - the bottleneck is clearly on the application side. I suspect that what's slowing me down is the NTLM/Kerberos challenge/response, and the number of simultaneous requests pegs even the multi-core processor.
Our network policy doesn't allow anonymous reads from AD, so that choice is out. Also, I've tried every member of "AuthenticationTypes" (in the example, I'm using .FastBind), but they all seem to have about the same throughput rate with the same load on the processor.
Does anybody have an idea for how I might work around this restriction and lower my demands on the processor?
Here is the code I'm using - pretty straightforward:
Dim sPath As String = "LDAP://" & stringUserDN
' Dispose the DirectoryEntry when finished so the underlying ADSI connection is released
Using entry As New DirectoryEntry(sPath)
    ' FastBind skips retrieving the object's class information, making the bind cheaper
    entry.AuthenticationType = AuthenticationTypes.FastBind
    For Each stringADNumber As String In entry.Properties(_ADPROP_EMPLOYEENUMBER)
        'return first item
        Return Convert.ToInt32(stringADNumber)
    Next
End Using
Return String.Empty
I don't have a ton of experience with looking up items in AD. However, one suggestion is that you might want to check the HttpContext for the request. There is some basic information about the current user making the request, such as groups, SID, and token information. I don't believe there is an email address field by default, but you might be able to use the User.Name property + "@your.domain" to build an email address.
In order for this data to show up, you will need IIS to be requiring authentication for requests. Anonymous users will not have this data populated. The accessor for this data is HttpContext.Current.Request.LogonUserIdentity or, alternatively, within the code behind for your page, you can call this.Request.LogonUserIdentity for short.
Hopefully this helps. Good luck.