MFP 8.0 adapter cache - ibm-mobilefirst

I am using MFP 8.0, and we have a requirement to implement a cache at the adapter level.
Whenever the MFP server starts, we want to load the entire database into the cache and keep it there until the server restarts.
Then, whenever a user triggers a transaction or an adapter procedure that calls the database, the data must be read from the cache instead of the database.

Adapters support read-only and transactional access modes to back-end systems.
Adapters are Maven projects that contain server-side code implemented in either Java or JavaScript. Adapters are used to perform any necessary server-side logic, and to transfer and retrieve information from back-end systems to client applications and cloud services.
JSONStore is an optional client-side API providing a lightweight, document-oriented storage system. JSONStore enables persistent storage of JSON documents. Documents in an application are available in JSONStore even when the device that is running the application is offline. This persistent, always-available storage can be useful to give users access to documents when, for example, there is no network connection available in the device.

From your description, and assuming you are talking about a custom DB where your data is stored, you need to implement the caching logic yourself.
Adapters have two classes: <AdapterName>Application.java and <AdapterName>Resource.java. <>Application.java contains the lifecycle methods init() and destroy().
Put your custom code for loading data from your DB into the cache in the init() method, and take care of removing it in destroy().
Then, during transactional access (which goes through <>Resource.java), refer to the cache you have already created.
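A minimal plain-Java sketch of that init/lookup/destroy pattern follows. It deliberately avoids MFP classes so it compiles on its own: in a real MFP 8.0 adapter, `init()` and `destroy()` would be called from your `<AdapterName>Application` lifecycle methods and `find()` from `<AdapterName>Resource`. The class name and the `loadAllRows` supplier (standing in for your DB query) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Plain-Java stand-in for the adapter lifecycle; class and method names are hypothetical.
class AdapterCacheSketch {

    // Shared, thread-safe cache: Resource methods may be invoked concurrently.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Call from the Application class's init(): load every row once at startup.
    static void init(Supplier<Map<String, String>> loadAllRows) {
        CACHE.clear();
        CACHE.putAll(loadAllRows.get()); // an exception here would break adapter initialization
    }

    // Call from Resource methods instead of querying the database.
    static String find(String key) {
        return CACHE.get(key);
    }

    // Call from the Application class's destroy(): release the heap used by the cache.
    static void destroy() {
        CACHE.clear();
    }

    static int size() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        init(() -> Map.of("1001", "Alice", "1002", "Bob")); // hypothetical DB rows
        System.out.println(find("1001"));
        destroy();
        System.out.println(size());
    }
}
```

Note that everything `loadAllRows` returns stays on the heap until `destroy()`, which is exactly the trade-off discussed next.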
However, your requirement may not be ideal for heavily loaded systems. You need to consider that:
a) Your adapter initialization is delayed, and faulty code can break it altogether. An adapter isn't available to service your requests until it has been initialized. In a clustered environment, adapter loading on all cluster members will be delayed depending on the amount of data you are loading, and any client request intended for this adapter will get a runtime exception until initialization is complete.
b) Holding the cache in memory uses up a corresponding amount of heap space. If your DB keeps growing, this adversely affects both adapter initialization and heap usage.
c) You are in charge of keeping the data up to date and of cleaning it up after use.
To summarize: while it is possible, it is not recommended. It may work for a very small data set, but it will not scale well. Adapters are designed to give you transactional access to data/back-end systems, and you should use them the way they were designed.

Related

Synchronize multiple instances of Spring Cache with a Redis lock

I'm building a Spring Boot application that uses Spring Cache with a Redis backing store and needs to synchronize the updates made to the cache.
The caching is not done on the fly, but by a scheduled process that updates the cache periodically.
The algorithm I came up with is:
periodically the instances will check if the Redis cache is older than some predetermined time
if that's the case, the instance will try to acquire a lock on some Redis key
if the instance successfully locks the key, it will then proceed with the update
if some other instance already locked the key, move on
all instances can still read the cache
Everything is more or less already built, all I need is to implement the locking/releasing mechanism.
Spring Cache uses Lettuce to interact with Redis. What is the best way to get a connection to Redis and manage the locking mechanism?
As you may already be aware, Spring's Cache Abstraction provides simple coordination among multiple threads in a single Spring [Boot] application process, using the sync attribute on the @Cacheable annotation (see ref doc).
NOTE: Despite the comment in the documentation ("... use the sync attribute to instruct the underlying cache provider to lock the cache entry while the value is being computed. As a result, only one thread is busy computing the value, while the others are blocked until the entry is updated in the cache."), the locking is handled by the core framework itself in most cases, not by the provider. Anyway...
However, this "coordination" is only per process and will not work across multiple Spring [Boot] application instances or (OS) JVM processes. In that case, you need some form of distributed locking across your Spring [Boot] application instances to coordinate access to the shared cache entries stored in the single Redis server (cluster) that they all use.
I am no Redis expert (I am still learning), but I am familiar with similar NoSQL stores (Apache Geode/VMware GemFire, Hazelcast, etc.) and with distributed locking mechanisms. Distributed locking is achievable with Redis as well: in a quick search, I found "Distributed Locking" in Redis and, specifically, "Building a lock in Redis". This is probably the best way to go.
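As a rough sketch of how the algorithm from the question could sit on top of such a lock, the code below simulates Redis in memory: `tryLock` mirrors `SET key token NX PX ttl`, and `unlock` mirrors the usual compare-and-delete release, which in real Redis is done atomically with a small Lua script. The class, key name and TTL are hypothetical, and no Lettuce or Spring calls are shown, so treat this as an illustration of the control flow rather than a drop-in implementation.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class CacheRefreshSketch {

    static final String LOCK_KEY = "cache:refresh:lock"; // hypothetical key name
    static final long LOCK_TTL_MILLIS = 30_000;           // lease, so a crashed holder cannot block forever

    // In-memory stand-in for Redis. tryLock mirrors SET key token NX PX ttl;
    // unlock mirrors the compare-and-delete used to release a Redis lock safely.
    static class LockStore {
        private static final class Lease {
            final String token;
            final long expiresAt;
            Lease(String token, long expiresAt) { this.token = token; this.expiresAt = expiresAt; }
        }

        private final Map<String, Lease> leases = new ConcurrentHashMap<>();

        boolean tryLock(String key, String token, long ttlMillis, long now) {
            Lease mine = new Lease(token, now + ttlMillis);
            // Take the key only if it is free or the previous lease has expired.
            Lease winner = leases.compute(key, (k, old) ->
                    (old == null || old.expiresAt <= now) ? mine : old);
            return winner == mine;
        }

        void unlock(String key, String token) {
            // Delete only our own lease, never one that another instance acquired later.
            leases.computeIfPresent(key, (k, lease) -> lease.token.equals(token) ? null : lease);
        }
    }

    // One scheduled tick of the question's algorithm; returns true if this instance refreshed.
    static boolean refreshIfStale(LockStore store, AtomicLong lastRefreshAt, long maxAgeMillis,
                                  long now, Runnable rebuildCache) {
        if (now - lastRefreshAt.get() < maxAgeMillis) {
            return false;                          // cache is fresh enough
        }
        String token = UUID.randomUUID().toString();
        if (!store.tryLock(LOCK_KEY, token, LOCK_TTL_MILLIS, now)) {
            return false;                          // another instance holds the lock: move on
        }
        try {
            // Re-check: another instance may have finished a refresh just before we got the lock.
            if (now - lastRefreshAt.get() < maxAgeMillis) {
                return false;
            }
            rebuildCache.run();                    // hypothetical task that repopulates the cache
            lastRefreshAt.set(now);
            return true;
        } finally {
            store.unlock(LOCK_KEY, token);
        }
    }

    public static void main(String[] args) {
        LockStore store = new LockStore();
        AtomicLong lastRefreshAt = new AtomicLong(0); // in Redis this timestamp would be a shared key too
        System.out.println(refreshIfStale(store, lastRefreshAt, 60_000, 120_000,
                () -> System.out.println("rebuilding cache")));
    }
}
```

Reads never touch the lock, which matches the requirement that all instances can still read the cache while one of them updates it.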
In addition, if you want to make this distributed locking automatically/transparently available through Spring's Cache Abstraction, one idea is to create a custom AOP Aspect and weave it together with the framework-provided Caching Aspect (Interceptor), paying attention to their ordering.
Alternatively, you could write wrapper implementations of the Spring Cache and CacheManager SPI interfaces that add distributed locking on top of the core Redis Cache and CacheManager provider implementations supplied by Spring Boot/Spring Data Redis.
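A minimal sketch of the wrapper idea, assuming two hypothetical interfaces (`SimpleCache` and `DistributedLock`) in place of the real Spring `Cache` SPI and a Redis-backed lock, so that it runs without Spring or Redis. The decorator delegates storage and only adds locking around the value computation, re-checking the cache after acquiring the lock because another instance may have filled the entry in the meantime.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

class LockingCacheSketch {

    // Hypothetical, cut-down cache contract (loosely shaped like Spring's Cache).
    interface SimpleCache {
        Object lookup(Object key);
        void put(Object key, Object value);
    }

    // Hypothetical distributed-lock contract; a Redis version would wrap SET NX PX plus a safe release.
    interface DistributedLock {
        boolean tryLock(String name);
        void unlock(String name);
    }

    // Decorator: storage is delegated, cross-instance locking is added around value computation.
    static class LockingCache {
        private final SimpleCache delegate;
        private final DistributedLock lock;

        LockingCache(SimpleCache delegate, DistributedLock lock) {
            this.delegate = delegate;
            this.lock = lock;
        }

        @SuppressWarnings("unchecked")
        <T> T get(Object key, Callable<T> valueLoader) {
            String lockName = "lock:" + key;            // hypothetical lock-naming scheme
            while (true) {
                Object cached = delegate.lookup(key);
                if (cached != null) return (T) cached;  // cache hit: no locking needed
                if (lock.tryLock(lockName)) break;      // we won: compute the value below
                pause();                                // another instance is computing it
            }
            try {
                Object cached = delegate.lookup(key);   // re-check after acquiring the lock
                if (cached != null) return (T) cached;
                T value = valueLoader.call();
                delegate.put(key, value);
                return value;
            } catch (Exception e) {
                throw new IllegalStateException("value loader failed for key " + key, e);
            } finally {
                lock.unlock(lockName);
            }
        }

        private static void pause() {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for lock", e);
            }
        }
    }

    // In-process stand-ins so the sketch runs without Redis.
    static class MapCache implements SimpleCache {
        private final Map<Object, Object> map = new ConcurrentHashMap<>();
        public Object lookup(Object key) { return map.get(key); }
        public void put(Object key, Object value) { map.put(key, value); }
    }

    static class LocalLock implements DistributedLock {
        private final Map<String, Boolean> held = new ConcurrentHashMap<>();
        public boolean tryLock(String name) { return held.putIfAbsent(name, Boolean.TRUE) == null; }
        public void unlock(String name) { held.remove(name); }
    }

    public static void main(String[] args) {
        LockingCache cache = new LockingCache(new MapCache(), new LocalLock());
        System.out.println(cache.get("user:42", () -> "loaded from backend")); // hypothetical loader
    }
}
```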
Of course, there are multiple ways to go about this. Just tossing out more ideas, but have a look at the distributed locking information in the book.

How big can OpenSplice DDS persistent data be?

I wonder whether I can store a large amount of data in my software or whether I need an external solution.
How much data can I store using the persistence features of OpenSplice DDS or RTI DDS?
This depends on your definition of 'putting persistent data'.
In OpenSplice DDS there are multiple ways to 'save' non-volatile data by 'persisting' it on some non-volatile medium. The first way is to publish data as PERSISTENT (durability QoS) in combination with one or more durability services (which are 'standard' in the OpenSplice core [LGPL-v3], i.e. not an optional/commercial feature). When the system starts up, the durability services (typically those of the first node that starts) inject the persisted data into the 'global data space', and from then on it is available to every application (which can block until this data has been injected, using the wait_for_historical_data API). A typical limit on the size of persistent data is the amount of memory available to 'hold it' once published (or, in DDS terminology, the resource limits specified for the 'durability service', expressed as max_samples, max_samples_per_instance and max_instances for each persistent topic). Note that you could view PERSISTENT data as a subset of TRANSIENT data, and if you have multiple durability services configured, they will 'align' with each other on startup (and/or when a new node that also has a durability service configured is added), so that the PERSISTENT data is instantly available when applications start and/or join an already running system.
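As a back-of-the-envelope illustration of how those resource limits bound the memory a durability service needs, the sketch below computes the maximum number of stored samples for one topic and multiplies it by an assumed average sample size. This is my own simplification (it ignores per-sample bookkeeping overhead and history settings), not OpenSplice's sizing formula; `-1` stands for the DDS `LENGTH_UNLIMITED` value, and the topic figures in `main` are hypothetical.

```java
class DurabilityMemoryEstimate {

    static final long LENGTH_UNLIMITED = -1; // DDS convention for "no limit"

    // Upper bound on samples the durability service can hold for one topic,
    // or LENGTH_UNLIMITED if the limits do not bound it.
    static long maxStoredSamples(long maxSamples, long maxInstances, long maxSamplesPerInstance) {
        long perInstanceBound =
                (maxInstances == LENGTH_UNLIMITED || maxSamplesPerInstance == LENGTH_UNLIMITED)
                        ? LENGTH_UNLIMITED
                        : maxInstances * maxSamplesPerInstance;
        if (maxSamples == LENGTH_UNLIMITED) return perInstanceBound;
        if (perInstanceBound == LENGTH_UNLIMITED) return maxSamples;
        return Math.min(maxSamples, perInstanceBound);
    }

    // Rough payload estimate in bytes; ignores per-sample bookkeeping overhead.
    static long estimateBytes(long storedSamples, long avgSampleBytes) {
        return storedSamples == LENGTH_UNLIMITED ? LENGTH_UNLIMITED : storedSamples * avgSampleBytes;
    }

    public static void main(String[] args) {
        // Hypothetical topic: 10,000 instances, up to 5 samples each, ~200-byte samples.
        long samples = maxStoredSamples(LENGTH_UNLIMITED, 10_000, 5);
        System.out.println(samples);                     // 50000
        System.out.println(estimateBytes(samples, 200)); // 10000000 (about 10 MB)
    }
}
```

If the result is unbounded or larger than the memory you can spare, that is a signal to tighten the resource limits or to consider the external options below.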
The second way is to utilize an add-on that transparently 'replicates' (2-way) data between DDS and a DBMS (ODBC 3.0 compliant) system. OpenSplice DDS has a pluggable service for this called 'DBMSConnect' which can be configured to forward data in both directions, either event-based or state-based (down-sampled) as well as potentially filter on contents.
The third way (for OpenSplice) is to use a generic gateway product called 'OpenSplice Gateway' that makes use of Apache Camel and thus of all the 'connectors' available for that infrastructure. Here you can define 'routes' and endpoints that allow your DDS data to be routed to/from over 80 non-DDS technologies, including relational database systems that would then allow you to 'persist' your DDS information.
Hope this helps somewhat,
-Hans

Sharing static data between multiple processes

I have a WCF service (instantiated within a Console application on NetTCP), this service has static data (large volume) which gets instantiated on the load.
I have multiple instances of this Console application running at once, and all of them perform the same static data initialization. Is there a way to have a single data source and share the data among the processes, so that each process does not have to consume a large amount of memory?
You can use memory-mapped files, but note that each process has its own memory space because of how Windows protects applications.
From http://msdn.microsoft.com/en-us/library/dd997372.aspx:
Non-persisted files are memory-mapped files that are not associated with a file on a disk. When the last process has finished working with the file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for creating shared memory for inter-process communications (IPC).
With any sort of "shared" data, you'll have the additional task of synchronizing access.
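The answer refers to .NET's memory-mapped files; a rough Java analogue is sketched below, assuming one loader process writes the data once and other processes only read it afterwards. Java can only map files that exist on disk (the persisted variant), and this sketch does no synchronization beyond publishing before any reader starts. The method names and the file layout (a length header followed by UTF-8 bytes) are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SharedMappedData {

    // Loader process: write the data once (4-byte length header + UTF-8 bytes) into a mapped file.
    static void publish(Path file, String data) throws IOException {
        byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
        int size = Integer.BYTES + bytes.length;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(size); // size the file before mapping it
            MappedByteBuffer buf = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.putInt(bytes.length);
            buf.put(bytes);
            buf.force(); // flush changes to the underlying file
        }
    }

    // Consumer process: map the same file read-only, so processes can share the same physical pages.
    static String read(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.getInt()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared-static-data", ".bin"); // hypothetical location
        publish(file, "large static data set");
        System.out.println(read(file));
    }
}
```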
The quick solution would be to write another dedicated service that you run first. It would load the data once and make it available to the other service instances as needed.
The more robust solution is to store the data in a database or caching layer that all the services connect to. The caching layer is a nice choice because your service can lazy-load data that isn't in the cache yet (keeping more of your current design), and it can be fast (in memory). Some cache options include:
Windows AppFabric
Memcached
NCache
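The lazy-loading (cache-aside) pattern mentioned above can be sketched as follows; a local `ConcurrentHashMap` stands in for a shared cache such as Memcached, and the loader function is a hypothetical database query. With a real shared cache, two processes that miss at the same moment may both load the same key, which is usually acceptable for read-mostly static data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CacheAsideSketch<K, V> {

    // Stand-in for a shared cache such as Memcached or NCache; real clients talk over the network.
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    // Hypothetical loader for the authoritative source (e.g. a database query); must not return null.
    private final Function<K, V> loadFromSource;

    CacheAsideSketch(Function<K, V> loadFromSource) {
        this.loadFromSource = loadFromSource;
    }

    // Cache-aside read: try the cache first; on a miss, load from the source and populate the cache.
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loadFromSource.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        CacheAsideSketch<String, String> data = new CacheAsideSketch<>(k -> "value for " + k);
        System.out.println(data.get("customer-7")); // miss: loaded from the source
        System.out.println(data.get("customer-7")); // hit: served from the cache
    }
}
```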

Guidelines for using lucene.net in a web service app?

I've just started reading up on Lucene.net, and I would like some of my REST-based web services to use its powerful search facilities.
However, I came across a link which said that I should create a Windows service (with WCF) to do all the Lucene searches/indexing, because IIS recycles the application pool, which will cause all sorts of locking issues.
My question is, is this correct? If so, is there another way of resolving this problem without creating a windows service (with WCF)? Also since I have REST based services, would I make a call from these services to the Windows WCF service which would make things slower?
Indexing
While reading, you will have picked up that indexing is done using the IndexWriter class. Lucene allows only one IndexWriter instance to be open on an index at a time. With the default locking, it creates a lock file in the index directory, which prevents any other IndexWriter instance from being created. For this reason it may be better to implement indexing in a process that you have more control over.
If your indexing process is terminated with extreme prejudice and your IndexWriter does not get closed, the lock on your index folder remains and no other writer instances will be allowed. Because of this, Lucene allows you to lift a lock from an index folder (using IndexWriter.unlock). This is a dangerous method: if two IndexWriters are open on the same index, the index will be corrupted. If you have a Windows service that performs the indexing, and it's the only process in your solution that does indexing (and any updates), you can confidently unlock the index folder when the service starts. In a web-service environment where you perform indexing from a web method, controlling and recovering from locking issues becomes problematic.
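To illustrate why a single dedicated indexing process makes lock recovery safe, here is a simplified lock-file sketch: atomically creating the lock file succeeds for only one writer, and the indexing service clears a stale lock on startup before taking it. This only mimics the idea; it is not Lucene's lock implementation, and the file name and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class IndexLockSketch {

    static final String LOCK_NAME = "write.lock"; // illustrative lock-file name

    // Try to become the single writer: atomic create fails if the lock file already exists.
    static boolean tryAcquire(Path indexDir) throws IOException {
        try {
            Files.createFile(indexDir.resolve(LOCK_NAME));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    static void release(Path indexDir) throws IOException {
        Files.deleteIfExists(indexDir.resolve(LOCK_NAME));
    }

    // Only safe in the dedicated indexing service, where no other writer can exist:
    // clear a lock left behind by a crash (the unlock-on-startup idea), then take the lock.
    static boolean startIndexingService(Path indexDir) throws IOException {
        release(indexDir);
        return tryAcquire(indexDir);
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index"); // hypothetical index folder
        System.out.println(tryAcquire(indexDir)); // first writer gets the lock
        System.out.println(tryAcquire(indexDir)); // second writer is refused
        release(indexDir);
    }
}
```

In a recycled IIS worker process, by contrast, you cannot tell whether an existing lock is stale or belongs to another live writer, which is why calling the unlock step there is risky.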
Searching
The IndexSearcher class is used for searching. Because searching is read-only, it can be done from your service code. I don't think it's necessary to create a separate set of WCF methods for this purpose.
Optimization
Depending on volumes, the index may need to be optimized periodically for performance. Again, with indexing in a separate process, you can schedule the optimization nightly, weekly, or whatever is required. Optimization is done by a call to one method.
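A sketch of scheduling that periodic optimization from the indexing service, assuming a hypothetical `optimizeIndex` task that wraps the single optimize call and a 02:00 run time. It uses only `java.util.concurrent` and `java.time`; a real service might prefer a cron-style scheduler.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZonedDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class NightlyOptimizeSketch {

    // Milliseconds from 'now' until the next occurrence of the wall-clock time 'at'.
    static long millisUntilNext(LocalTime at, ZonedDateTime now) {
        ZonedDateTime next = now.with(at);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next).toMillis();
    }

    // Runs the (hypothetical) optimization task every 24 hours, starting at the next 'at'.
    // Note: a fixed 24h period drifts by an hour across daylight-saving changes.
    static ScheduledExecutorService scheduleNightly(Runnable optimizeIndex, LocalTime at) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = millisUntilNext(at, ZonedDateTime.now());
        scheduler.scheduleAtFixedRate(optimizeIndex, initialDelay, TimeUnit.DAYS.toMillis(1),
                TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler =
                scheduleNightly(() -> System.out.println("optimizing index"), LocalTime.of(2, 0));
        System.out.println(millisUntilNext(LocalTime.of(2, 0), ZonedDateTime.now()));
        scheduler.shutdownNow(); // demo only: a real service keeps the scheduler running
    }
}
```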
Indexing new data
How and when to get the indexing process to index new data.... I don't know what data you're indexing so it's hard to tell. In my scenario I have WCF methods that are responsible for input data - high volume. I require the data that has been received to be available for searching as soon as possible. So,
my Model layer has a notification layer: when new records of the required type have been successfully committed, a simple notification message is inserted into a local MSMQ queue.
I use MSMQ because the queue is persistent and transactional, so any messages in it are still available after a crash or system reboot, allowing me to never (cough!) lose any messages.
The indexing service takes the notification, builds the Lucene Document and indexes the data.
The indexing service can also be triggered to do a full re-index by deleting the existing index and crawling the DB.
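The notification flow described above can be sketched as follows. A `LinkedBlockingQueue` stands in for the MSMQ queue and a map stands in for the Lucene index; unlike MSMQ, this queue is neither persistent nor transactional, so it would lose messages on a crash. All names, including the `loadRecord` lookup, are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

class IndexingPipelineSketch {

    // Notification carrying only the id of a committed record (hypothetical shape).
    static final class Notification {
        final long recordId;
        Notification(long recordId) { this.recordId = recordId; }
    }

    // In-memory stand-in for the MSMQ queue: NOT persistent or transactional.
    final BlockingQueue<Notification> queue = new LinkedBlockingQueue<>();
    // Stand-in for the Lucene index: record id -> indexed text.
    final Map<Long, String> index = new ConcurrentHashMap<>();
    private final Function<Long, String> loadRecord; // hypothetical DB lookup

    IndexingPipelineSketch(Function<Long, String> loadRecord) {
        this.loadRecord = loadRecord;
    }

    // Model layer: called only after the record has been committed successfully.
    void onCommitted(long recordId) {
        queue.add(new Notification(recordId));
    }

    // Indexing service: drain pending notifications and index each record.
    // A long-running service would block on queue.take() in a loop instead.
    int processPending() {
        int processed = 0;
        Notification n;
        while ((n = queue.poll()) != null) {
            index.put(n.recordId, loadRecord.apply(n.recordId)); // build the "document" and index it
            processed++;
        }
        return processed;
    }

    // Full re-index: drop the existing index and crawl the database again.
    void fullReindex(List<Long> allIds) {
        index.clear();
        for (long id : allIds) {
            index.put(id, loadRecord.apply(id));
        }
    }

    public static void main(String[] args) {
        IndexingPipelineSketch pipeline = new IndexingPipelineSketch(id -> "record " + id);
        pipeline.onCommitted(1);
        pipeline.onCommitted(2);
        System.out.println(pipeline.processPending());
        System.out.println(pipeline.index);
    }
}
```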
EDIT:
Example architecture:
WCF service methods accept data and commit it to the Model layer. The Model layer notifies a listening client that a CRUD operation on items completed successfully. The listening client posts the notification to a queue.
Windows Service handles Indexing of data, watching the queue for indexing requests.
ASP.Net app provides user interface with search features.
You can simply disable application pool recycling and host your application/service in IIS.
To disable recycling on config changes, use the disallowRotationOnConfigChange parameter.
You can also split your application in two parts: Index updates and searches.
Handle index updates from a Windows service, and have your IIS portion handle searches (read-only). You would do this with a mechanism that detects index updates and refreshes the IndexSearchers. This way, if the performance penalty of using services is a concern for you, it won't impact search time, which is the aspect that matters most to users. With this configuration you can even have a master index-update node and distribute searches across different web servers in a farm. The only downside is that you don't get the near-real-time search functionality that's built into the IndexWriter class.
http://wiki.apache.org/lucene-java/NearRealtimeSearch
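One way to implement "detect index updates and refresh the IndexSearchers" is to keep the current searcher in an `AtomicReference` and swap it when a version marker changes. The sketch assumes a hypothetical version number published by the indexing service and a hypothetical `Searcher` type; it does not use Lucene's own classes, and closing the old searcher safely is left out.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

class SearcherRefreshSketch {

    // Hypothetical read-only searcher opened against one version of the index.
    static final class Searcher {
        final long indexVersion;
        Searcher(long indexVersion) { this.indexVersion = indexVersion; }
    }

    private final LongSupplier indexVersion;           // e.g. a marker written by the indexing service
    private final LongFunction<Searcher> openSearcher; // hypothetical "open searcher at version" call
    private final AtomicReference<Searcher> current = new AtomicReference<>();

    SearcherRefreshSketch(LongSupplier indexVersion, LongFunction<Searcher> openSearcher) {
        this.indexVersion = indexVersion;
        this.openSearcher = openSearcher;
        current.set(openSearcher.apply(indexVersion.getAsLong()));
    }

    // Searches always use whatever searcher is current; no locking on the read path.
    Searcher acquire() {
        return current.get();
    }

    // Called periodically (or on a change notification): swap in a new searcher if the index changed.
    boolean maybeRefresh() {
        long latest = indexVersion.getAsLong();
        Searcher old = current.get();
        if (old.indexVersion == latest) return false;
        Searcher fresh = openSearcher.apply(latest);
        // If another thread refreshed first, a real implementation would close 'fresh' here.
        // It must also close 'old' only after in-flight searches on it have finished.
        return current.compareAndSet(old, fresh);
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong(1);
        SearcherRefreshSketch searchers = new SearcherRefreshSketch(version::get, Searcher::new);
        version.set(2); // the indexing service committed new documents
        System.out.println(searchers.maybeRefresh());
        System.out.println(searchers.acquire().indexVersion);
    }
}
```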
That being said, I've never had performance issues with setups that expose the Lucene functions over a WCF service, especially if you're running either on the same machine with NetNamedPipe or on a local LAN with NetTcp.

Why does Quartz Scheduler (JobStoreCMT) require the use of two datasources?

I found this answer:
1. Long answer to why Quartz requires two data sources (if you want an even deeper answer, I believe I'll need to dig into the source code or do more research):
a. JobStoreCMT relies upon transactions being managed by the application that is using Quartz. A JTA transaction must be in progress before you attempt to schedule (or unschedule) jobs/triggers. This allows the "work" of scheduling to be part of the application's "larger" transaction. JobStoreCMT actually requires the use of two datasources: one whose connections' transactions are managed by the application server (via JTA), and one whose connections do not participate in global (JTA) transactions. JobStoreCMT is appropriate when applications are using JTA transactions (such as via EJB Session Beans) to perform their work. (Ref: http://quartz-scheduler.org/documentation/quartz-1.x/configuration/ConfigJobStoreCMT)
However, we believe there is a conflict with a non-transactional driver in our particular application. Does anyone know if Quartz (JobStoreCMT) can work with just a transactional data source?
Does anyone know if Quartz (JobStoreCMT) can work with just a transactional data source?
No, you must have a datasource of each type. Invocations of the API by the client application use the XA-capable connections, so that the work joins the application's transaction. Work done by the scheduler's internal threads uses the non-XA connections.
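For reference, here is a sketch of the two data sources JobStoreCMT expects, built as `java.util.Properties` (the same keys would normally go into `quartz.properties`). The property keys are Quartz's JobStoreCMT settings; the data source names and JNDI URLs are hypothetical placeholders for your application server's XA and non-XA data sources.

```java
import java.util.Properties;

class QuartzCmtConfigSketch {

    // Builds the JobStoreCMT-related settings; data source names and JNDI URLs are hypothetical.
    static Properties jobStoreCmtProperties() {
        Properties p = new Properties();
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreCMT");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        // Container-managed (JTA/XA) data source: used when the application calls the scheduler API,
        // so the scheduling work joins the application's transaction.
        p.setProperty("org.quartz.jobStore.dataSource", "managedDS");
        p.setProperty("org.quartz.dataSource.managedDS.jndiURL", "java:/jdbc/QuartzXADS");
        // Non-managed data source: used by Quartz's internal threads, outside any global transaction.
        p.setProperty("org.quartz.jobStore.nonManagedTXDataSource", "nonManagedDS");
        p.setProperty("org.quartz.dataSource.nonManagedDS.jndiURL", "java:/jdbc/QuartzNonXADS");
        return p;
    }

    public static void main(String[] args) {
        jobStoreCmtProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```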