Sharing static data between multiple processes - WCF

I have a WCF service (instantiated within a console application, exposed over NetTCP). The service holds a large volume of static data that gets initialized on load.
I have multiple instances of this console application running at once, and all of them perform the same static data initialization. Is there a way to have a single data source and share the data among the processes, so that each process does not have to consume a large amount of memory?

You can use memory-mapped files; note that, outside of such explicitly shared mappings, each process must have its own memory because of how Windows isolates applications.
From http://msdn.microsoft.com/en-us/library/dd997372.aspx:
Non-persisted files are memory-mapped files that are not associated with a file on a disk. When the last process has finished working with the file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for creating shared memory for inter-process communications (IPC).
With any sort of "shared" data, you'll have the additional task of synchronizing access.
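As a minimal sketch (the map name, mutex name, 10 MB capacity, and long payload are all placeholders for your real data layout), a named non-persisted memory-mapped file plus a named mutex for cross-process synchronization might look like this:
```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Threading;

class SharedDataSketch
{
    static void Main()
    {
        // First process creates the shared mapping; later processes open it.
        using (var mmf = MemoryMappedFile.CreateOrOpen("MyServiceSharedData", 10 * 1024 * 1024))
        using (var mutex = new Mutex(false, "MyServiceSharedDataMutex"))
        using (var accessor = mmf.CreateViewAccessor())
        {
            mutex.WaitOne(); // synchronize access across processes
            try
            {
                accessor.Write(0, 12345L);          // one process writes...
                long value = accessor.ReadInt64(0); // ...any process can read
                Console.WriteLine(value);
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```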

The quick solution would be to write another dedicated service which you run first. It would load the data once and make it available to the other service instances as needed.
The more robust solution is to store the data in a database or caching layer that all the services connect to. The caching layer is a nice choice because your service can lazy load the data if it's not in the cache (keeping more of your current design; see the sketch after this list), and it can be fast (in-memory). Some cache options include:
Windows AppFabric
Memcached
NCache
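The lazy-load pattern itself is cache-agnostic. A rough sketch, where ICache is a hypothetical abstraction standing in for the client API of whichever product you pick:
```csharp
// ICache is a hypothetical abstraction; substitute the client API of
// whichever cache product (AppFabric, Memcached, NCache) you choose.
public interface ICache
{
    byte[] Get(string key);
    void Put(string key, byte[] value);
}

public class StaticDataRepository
{
    private readonly ICache _cache;

    public StaticDataRepository(ICache cache)
    {
        _cache = cache;
    }

    // Lazy load: only the first caller (or the first after an eviction)
    // pays the initialization cost; every other process reads the cache.
    public byte[] GetStaticData(string key)
    {
        var data = _cache.Get(key);
        if (data == null)
        {
            data = LoadFromDatabase(key); // the expensive load you do today
            _cache.Put(key, data);
        }
        return data;
    }

    private byte[] LoadFromDatabase(string key)
    {
        // ... load and serialize the static data here ...
        return new byte[0];
    }
}
```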

What are some use cases for object storage?

What are some use cases for object storage, as opposed to file systems or block storage (database) systems?
From what I understand, object storage is mostly used for persistent storage for applications running on cloud systems. It seems to have a lot of overlap with file systems, except that the details of how the objects are stored are abstracted away so that apps can access them with simple web queries.
However, I'd love if someone could give examples of applications where this is actually used instead of or alongside the other two storage systems.
Some example use cases for object storage:
Off-site backups
Storing and serving user content (e.g. profile pictures)
Storing artifacts (e.g. JAR files, startup scripts) to be deployed to VMs
Distributing static content (e.g. video content for your users)
Caching intermediate data (e.g. individual frames from a render farm before assembly into output video)
Accepting input or providing output to a web service (as accepting data by POST can be difficult/inefficient for large input files)
Archiving data for regulatory purposes
All these cases might be accompanied by a database that stores metadata (i.e., to find the objects). Storing the data itself in the database, however, would exceed size limits or significantly harm database performance.
These use cases can also be served by a file system, so long as your total usage can be handled by a single machine. If you have more traffic than that, you will need replicated storage, load balancing, etc., at which point you are effectively implementing an object storage system yourself.
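To make the "simple web queries" point concrete, here is a small sketch of putting and getting an object over HTTP. The endpoint, bucket, and key are made up, and real services also require authentication headers, omitted here:
```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class ObjectStoreSketch
{
    static async Task Main()
    {
        // Hypothetical S3-style endpoint; auth headers omitted.
        const string baseUri = "https://objects.example.com/my-bucket/";
        using (var client = new HttpClient())
        {
            // Upload ("put") an object with a plain web request...
            var payload = new ByteArrayContent(new byte[] { 1, 2, 3 });
            await client.PutAsync(baseUri + "backups/2024-01-01.tar.gz", payload);

            // ...and fetch it back the same way. Metadata for finding the
            // object (owner, tags, timestamps) would live in the database.
            var bytes = await client.GetByteArrayAsync(baseUri + "backups/2024-01-01.tar.gz");
            Console.WriteLine(bytes.Length);
        }
    }
}
```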

MFP 8.0 adapter cache

I am using MFP 8.0, and there is a requirement to implement a cache at the adapter level.
Whenever the MFP server starts, we want to dump the entire database into a cache that lives until the server restarts again.
Then, whenever a user hits a transaction or adapter procedure that calls the database, it should read from the cache instead of calling the database.
Adapters support read-only and transactional access modes to back-end systems.
Adapters are Maven projects that contain server-side code implemented in either Java or JavaScript. Adapters are used to perform any necessary server-side logic, and to transfer and retrieve information from back-end systems to client applications and cloud services.
JSONStore is an optional client-side API providing a lightweight, document-oriented storage system. JSONStore enables persistent storage of JSON documents. Documents in an application are available in JSONStore even when the device that is running the application is offline. This persistent, always-available storage can be useful to give users access to documents when, for example, there is no network connection available in the device.
From your description, assuming you are talking about a custom database where your data is stored, you need to implement the caching logic yourself.
Adapters have two classes, <AdapterName>Application.java and <AdapterName>Resource.java. <AdapterName>Application.java contains the lifecycle methods, init() and destroy().
You should put your custom code for loading data from your DB into the cache in the init() method, and take care of releasing it in destroy().
Now, during transactional access (which hits <AdapterName>Resource.java), you refer to the cache you have already created.
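Since MFP 8.0 Java adapters are Java code, a rough sketch of that lifecycle wiring might look like the following; the CACHE field and the loadAllRowsFromDatabase() helper are hypothetical, so check your adapter template for the exact base class and signatures:
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.ibm.mfp.adapter.api.MFPJAXRSApplication;

public class MyAdapterApplication extends MFPJAXRSApplication {

    // Thread-safe map shared with MyAdapterResource during transactional access.
    public static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    @Override
    protected void init() throws Exception {
        CACHE.putAll(loadAllRowsFromDatabase()); // one-time eager load at adapter start
    }

    @Override
    protected void destroy() throws Exception {
        CACHE.clear(); // release the memory when the adapter shuts down
    }

    private Map<String, Object> loadAllRowsFromDatabase() {
        // ... JDBC code against your custom DB goes here ...
        return new HashMap<>();
    }
}
```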
Your requirement, however, may not be ideal for heavily loaded systems. You need to consider that:
a) Your adapter initialization is delayed, and any wrongly written code can also break the adapter initialization. An adapter isn't available to service requests until it has been initialized. In a clustered environment, the adapter load in all cluster members will be delayed depending on the amount of data you are loading. Any client request intended for this adapter will get a runtime exception until the initialization is complete.
b) Holding the cache in memory means that much of the heap is used up. If your DB keeps growing, this adversely affects adapter initialization as well as heap usage.
c) You are in charge of keeping the data up to date and of cleaning it up after use.
To summarize: while it is possible, it is not recommended. It may work for a very small data set, but it cannot scale well. Adapters are designed to provide transactional access to data and back-end systems, and you should use them the way they were designed to be used.

Guidelines for using lucene.net in a web service app?

I've just started reading up on Lucene.Net, and I would like some of my REST-based web services to use its powerful search facilities.
However, I came across a link which said that I should create a Windows service (with WCF) to do all the Lucene searches/indexing, etc., because IIS recycles the application pool, which will cause all sorts of locking issues.
My question is: is this correct? If so, is there another way of resolving this problem without creating a Windows service (with WCF)? Also, since I have REST-based services, would calling the Windows WCF service from them make things slower?
Indexing
During your reading you will have picked up that indexing is done using the IndexWriter class. Lucene will only allow one IndexWriter instance open at a time. With the default locking, it creates a lock file in the index directory and prevents any other IndexWriter instance from being created. For this reason it may be better to implement indexing in a process that you have more control over.
If your indexing process is terminated with extreme prejudice and your IndexWriter does not get closed, the lock on your index folder is maintained and no other instances will be allowed. Because of this, Lucene allows you to lift a lock from an index folder (using IndexWriter.Unlock), a dangerous method, because if two IndexWriters are open on the same index it will corrupt the index. If you have a Windows service performing the indexing, and it's the only process in your solution that does the indexing (and any updates), you can confidently unlock the index folder on startup of the service. In a web-service-based environment where you perform indexing from a web method, controlling and recovering from locking issues becomes problematic.
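As a rough sketch against the Lucene.Net 3.x API (the index path and field names are placeholders), the write side, including the startup unlock that is only safe for a single dedicated writer, might look like this:
```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

static class IndexerSketch
{
    public static void IndexRecord(string id, string body)
    {
        var dir = FSDirectory.Open(new DirectoryInfo(@"C:\indexes\myindex"));

        // Only safe because this process is the sole writer for the index:
        // clear a stale lock left behind by a previous crash.
        if (IndexWriter.IsLocked(dir))
            IndexWriter.Unlock(dir);

        using (var writer = new IndexWriter(dir,
            new StandardAnalyzer(Version.LUCENE_30),
            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            // Dispose() commits the changes and releases the write lock.
        }
    }
}
```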
Searching
The IndexSearcher class is used for searches. Searching in read-only mode can be done from your service-based code; I don't think it's necessary to create a separate set of WCF methods for this purpose.
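A minimal read-only search sketch (again Lucene.Net 3.x, with placeholder path and field names):
```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

static class SearchSketch
{
    public static void Search(string queryText)
    {
        var dir = FSDirectory.Open(new DirectoryInfo(@"C:\indexes\myindex"));

        // true = read-only: searches never take the write lock, so this is
        // safe inside the IIS-hosted REST service.
        using (var reader = IndexReader.Open(dir, true))
        using (var searcher = new IndexSearcher(reader))
        {
            var parser = new QueryParser(Version.LUCENE_30, "body",
                new StandardAnalyzer(Version.LUCENE_30));
            var hits = searcher.Search(parser.Parse(queryText), 10);
            Console.WriteLine("{0} hits", hits.TotalHits);
        }
        // When the index is updated elsewhere, IndexReader.Reopen() yields a
        // refreshed reader far more cheaply than opening from scratch.
    }
}
```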
Optimization
The index may need to be optimized periodically for performance, depending on volumes. Again, with the indexing in a separate process you can schedule the optimization nightly, weekly, or whatever is required. Optimization is done by a call to a single method (IndexWriter.Optimize in Lucene.Net 3.x).
Indexing new data
How and when to get the indexing process to index new data... I don't know what data you're indexing, so it's hard to tell. In my scenario I have WCF methods that are responsible for high-volume input data, and I need the received data to be available for searching as soon as possible. So:
my model layer has a notification mechanism: when new records of the required type have been successfully committed, a simple notification message is placed in a local MSMQ queue.
The reason for MSMQ is that the queue is persisted and transactional, so any messages in there are available even after a crash or system reboot, allowing me to never (cough!) lose any messages.
The indexing service takes the notification, builds the Lucene Document, and indexes the data.
The indexing service can also be triggered to do a full re-index by deleting the existing index and crawling the DB.
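A minimal sketch of that MSMQ plumbing with System.Messaging (the queue path and the message shape are illustrative):
```csharp
using System;
using System.Messaging;

static class IndexNotifications
{
    // Illustrative local private queue; created as transactional so
    // messages survive crashes and reboots.
    const string QueuePath = @".\Private$\IndexRequests";

    // Called from the model layer after a successful commit.
    public static void NotifyRecordCommitted(int recordId)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath, transactional: true);

        using (var queue = new MessageQueue(QueuePath))
            queue.Send(recordId.ToString(), MessageQueueTransactionType.Single);
    }

    // The Windows service side: block until a notification arrives,
    // then build the Lucene Document for that record.
    public static string WaitForNotification()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            using (var message = queue.Receive(MessageQueueTransactionType.Single))
                return (string)message.Body;
        }
    }
}
```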
EDIT:
Example architecture:
WCF service methods take in data and commit it to the model layer. The model layer notifies a listening client that a CRUD operation occurred successfully on an item, and the listening client posts the notification to a queue.
A Windows service handles indexing of the data, watching the queue for indexing requests.
An ASP.NET app provides the user interface with search features.
You can simply disable application pool recycling and host your application/service in IIS.
To disable recycling on config changes, use the disallowRotationOnConfigChange parameter.
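For example, in applicationHost.config (the pool name is illustrative; this also turns off the periodic time-based restart):
```xml
<applicationPools>
  <add name="LuceneSearchPool">
    <recycling disallowRotationOnConfigChange="true">
      <periodicRestart time="00:00:00" />
    </recycling>
  </add>
</applicationPools>
```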
You can also split your application into two parts: index updates and searches.
Handle index updates from a Windows service, and have your IIS portion handle searches (read-only). You would do this by having a mechanism that detects index updates and refreshes the IndexSearchers. This way, if the performance penalty of using services is a concern for you, it won't impact search time, which is the important aspect for the users. With this configuration you can even have a master index-update node and distribute searches across different web servers in a farm. The only downside is that you don't have the near-real-time search functionality that's built into the IndexWriter class.
http://wiki.apache.org/lucene-java/NearRealtimeSearch
That being said, I've never had performance issues with setups that expose the Lucene functions over a WCF service, especially if you're running either on the same machine with NetNamedPipe or on a local LAN with NetTcp.

Redis Pub/Sub Usefulness?

I have a question that is bugging me quite heavily: what is the Redis pub/sub feature actually used for? I can only think of inter-process communication over TCP (either local or distributed), but not much else.
Can someone please prove me wrong?
It's an easy way to plug into an event stream, generally between processes or machines. For instance, a "user created" event is published. One process handles updating the database from the event, another updates user stats, another updates global stats, another updates the full-text search database, etc. They're all loosely coupled by subscribing to the channel. You can add new processes for testing updates and for monitoring the system. It's a little different from a message queue in that there's no storing of messages until they're processed, but Redis has other structures for those sorts of jobs.
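A small sketch with the StackExchange.Redis client (connection string, channel name, and payload are made up):
```csharp
using System;
using StackExchange.Redis;

class EventStreamSketch
{
    static void Main()
    {
        var redis = ConnectionMultiplexer.Connect("localhost");
        var sub = redis.GetSubscriber();

        // Each interested process subscribes independently; adding a new
        // consumer (stats, search indexing, monitoring) needs no publisher changes.
        sub.Subscribe("user-created", (channel, message) =>
            Console.WriteLine("handling event: {0}", message));

        // Fire-and-forget: if nobody is subscribed at this instant, the
        // message is simply dropped (unlike a queue).
        sub.Publish("user-created", "{\"id\":42,\"name\":\"alice\"}");

        Console.ReadLine();
    }
}
```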
A real use case from my experience:
Let's say you have a web application deployed on four different servers (nodes, virtual machines), most likely on your virtual private cloud.
The web application maintains an in-memory Java map for its static data cache, which occasionally changes.
Now, every time the data changes in your database, you need all your servers to update their own in-memory caches; this is the problem.
One way is to maintain all the static data in Redis (or another cache) on a separate server and have the cache update on a schedule. But then, to access static content that only occasionally changes, you need a scheduler plus a separate cache server such as Redis or Memcached, and each server must point to this external cache.
Using Redis pub/sub here:
All servers subscribe to a Redis channel, and whenever there is an update, addition, or deletion of the data, Redis publishes a message to all of its subscribers. On receiving the message object and its update type (ADDED, REMOVED, UPDATED), each server updates its in-memory static data map.
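A sketch of the subscriber side on each server (the channel name and the "ACTION|key|value" message format are made up for illustration):
```csharp
using System;
using System.Collections.Concurrent;
using StackExchange.Redis;

class StaticDataCacheNode
{
    // Each of the four servers keeps its own in-memory map.
    static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>();

    static void Main()
    {
        var redis = ConnectionMultiplexer.Connect("redis-host:6379");

        redis.GetSubscriber().Subscribe("static-data-changes", (channel, message) =>
        {
            var parts = ((string)message).Split('|');
            switch (parts[0])
            {
                case "ADDED":
                case "UPDATED":
                    Cache[parts[1]] = parts[2];
                    break;
                case "REMOVED":
                    string removed;
                    Cache.TryRemove(parts[1], out removed);
                    break;
            }
        });

        Console.ReadLine(); // keep the subscriber alive
    }
}
```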

Sync Framework over WCF without sessions

I'm currently looking to use Microsoft Sync Framework (2.1) to sync clients (running SQL Server Express) with a cloud based central data store, using WCF for all communications.
The central data store is a SQL database, with a scalable number of processing nodes connected to it, each with an instance of my WCF service to process sync calls.
There could be a large amount of data transferred from the server to the clients when syncing, so I think batching is necessary to avoid out-of-memory issues, better handle unreliable connections, etc. My problem is that the N-tier examples I've seen seem to require a PerSession instancing mode on the WCF service end, with batch files stored to a location on disk. That isn't an option, as there is no guarantee subsequent calls will go to the same processing node, so my WCF services are all set to PerCall instancing.
What is the best way for me to tackle this batching problem? Is there a way to store the batches on a central data store (say, my server database), or is there an alternative to batching to reduce the size of the data set to "bite-sized" transfers that will be more robust?
The batching in Sync Framework is just for the transmission of the changes, not for their application. So if you have a sync session whose changes are batched into 10 batches, a single batch is not applied individually; rather, the entire set of 10 batches is applied as one. Internally, the batches are byte arrays that are reconstructed into a dataset, so you can't have part of the batch on one node and the rest on other nodes.
Not sure if it helps, but the Windows Azure Sync Service sample may offer you some patterns for storing the batch files and for writing a similar service that handles the batching.
Have a look at the Walkthrough of Windows Azure Sync Service Sample.
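One pattern, sketched below, is to point every node's provider at the same network share for the batch spool, via the BatchingDirectory and MemoryDataCacheSize properties of the relational sync providers. The UNC path and batch size are illustrative, and this doesn't change the constraint above that all batches of a session are applied together; it just means a PerCall service on any node can find the spooled files:
```csharp
using System.Data.SqlClient;
using Microsoft.Synchronization.Data.SqlServer;

static class SyncBatchingConfig
{
    // Sketch: every processing node spools its batch files to the same
    // share, so a PerCall service on any node can pick a session back up.
    public static SqlSyncProvider CreateProvider(string scopeName, string connectionString)
    {
        return new SqlSyncProvider(scopeName, new SqlConnection(connectionString))
        {
            MemoryDataCacheSize = 500,                         // KB held in memory per batch
            BatchingDirectory = @"\\central-store\syncbatches" // shared spool location
        };
    }
}
```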