MobileFirst 7.1 - Analytics dumping huge heapdump.phd and core.dmp files on server

We are using MobileFirst Analytics 7.1.0.00.20170505-1403 and have noticed that the Analytics server is writing out huge heapdump.phd and core.dmp files, several GB in size.
This is consuming all of our SAN storage. How can we turn off the dumping of these huge files?

The heap dump is most likely produced because the Analytics server has exhausted the heap space allocated to it. This could indicate an undersized system without enough resources, or a system that is heavily loaded most of the time.
A few quick things that you should keep in mind:
a) Do not install the Analytics server and the MFP server on the same runtime.
b) Configure TTL values for your Analytics entries so that they are not persisted forever; purge the data after an interval.
c) In most cases you will not need all of the data to be pumped into your Analytics server, and you can control the flow of data into Analytics.
E.g.: in production use, limit the amount of event data (from clients) going to Analytics, and limit the server logs being forwarded to Analytics.
d) Configure circuit breakers to prevent the Analytics server from attempting to load too large a block of data. This is an Elasticsearch setting:
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/index-modules-fielddata.html
To configure circuit breakers and other Elasticsearch properties, create an Elasticsearch YAML configuration file, for example "elasticsearch.yml", and add the path to this file to your environment entries under the JNDI property "analytics/settingspath".
For example:
<jndiEntry jndiName="analytics/settingspath" value="/home/system/elasticsearch.yml" />
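As an illustrative sketch only, such an elasticsearch.yml could cap the field-data cache and circuit breakers along these lines (the percentages are assumptions and must be tuned for your heap size and workload; the setting names are from Elasticsearch 1.7):

# elasticsearch.yml -- sketch; values below are assumptions, tune per system
indices.fielddata.cache.size: 40%        # cap the field-data cache (unbounded by default)
indices.breaker.fielddata.limit: 60%     # trip before field data exhausts the heap
indices.breaker.request.limit: 40%       # per-request structures, e.g. aggregations
indices.breaker.total.limit: 70%         # combined ceiling across breakers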

Related

Cloud and local application sync ideas

I have a situation where my central MySQL DB runs on EC2 and my central file system is S3.
But one of my applications runs locally at a client site on a Pi 3 device, which needs to look up data and files from both the DB and the file system in the cloud. The application in turn generates transactional records and needs to upload them to the DB and FS (maybe at day end).
The catch is that sometimes the cloud may not be available due to connectivity issues (the site being in a remote area).
What could be the best strategies to accommodate this kind of a scenario?
Can AWS Greengrass help here?
How to keep the lookup data (DB and FS) in sync with the local devices?
How to update/sync the transactional data generated by the local devices?
And finally, what could be the risks in such a deployment model?
Appreciate some help/suggestions.
How to keep the lookup data (DB and FS) in sync with the local devices?
You can create a Greengrass Group and include all of the devices in that group. Make the devices subscribe to a topic, e.g. DB/Cloud/update. Once a device receives a message on that topic, it can trigger an on-demand Lambda to download the latest information from the cloud. To make sure a device does not miss any updates while offline, you can use a persistent session, which ensures the device receives all of the missed messages when it comes back online.
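As an illustrative device-side sketch using the AWS IoT Device SDK for Python (the endpoint, client ID, certificate paths, and sync logic are all assumptions):

from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

# cleanSession=False requests a persistent session, so messages published
# while the device is offline are queued and delivered on reconnect.
client = AWSIoTMQTTClient("pi3-device-01", cleanSession=False)  # hypothetical client ID
client.configureEndpoint("YOUR_ENDPOINT.iot.us-east-1.amazonaws.com", 8883)  # assumption
client.configureCredentials("root-ca.pem", "device.key", "device.cert.pem")  # assumption
client.connect()

def on_update(client, userdata, message):
    # Placeholder: trigger the on-demand Lambda / pull the latest lookup data here.
    print("Update notice on %s: %s" % (message.topic, message.payload))

# QoS 1 so the broker retries delivery and queues messages for the
# persistent session while the device is offline.
client.subscribe("DB/Cloud/update", 1, on_update)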
How to update/sync the transactional data generated by the local devices?
You may try the Stream Manager: https://docs.aws.amazon.com/greengrass/latest/developerguide/stream-manager.html
Right now, it allows you to add a local Lambda to pre-process the data and sync it up with the cloud.
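A rough sketch of the Stream Manager client from the Greengrass Core SDK for Python (the stream and Kinesis names are assumptions):

from greengrasssdk.stream_manager import (
    ExportDefinition,
    KinesisConfig,
    MessageStreamDefinition,
    StrategyOnFull,
    StreamManagerClient,
)

client = StreamManagerClient()

# Create a local stream that Stream Manager exports to Kinesis when
# connectivity is available; data is buffered on the device while offline.
client.create_message_stream(MessageStreamDefinition(
    name="TransactionStream",                      # hypothetical stream name
    strategy_on_full=StrategyOnFull.OverwriteOldestData,
    export_definition=ExportDefinition(
        kinesis=[KinesisConfig(
            identifier="KinesisExport",
            kinesis_stream_name="MyKinesisStream",  # hypothetical Kinesis stream
        )]
    ),
))

# Append transactional records as they are generated on the device.
client.append_message(stream_name="TransactionStream", data=b'{"txn": 1}')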

Using Kubernetes Persistent Volume for Data Protection

To resolve a few issues we are running into with Docker and running multiple instances of some services, we need to be able to share values between running instances of the same Docker image. The original solution I found was to create a storage account in Azure (where we run the Kubernetes instance that houses the containers) and a Key Vault in Azure, accessing both via the well-defined APIs that Microsoft has provided for Data Protection (detailed here).
Our architect instead wants to use Kubernetes Persistent Volumes, but he has not provided information on how to accomplish this (he just wants to save money on the Azure subscription by not having an additional storage account or key storage). I'm very new to Kubernetes, have no real idea how to accomplish this, and my searches so far have not turned up much of use.
Is there an extension method that should be used for Persistent Volumes? Would this just act like a shared file location and be accessible with the PersistKeysToFileSystem API for Data Protection? Any resources you could point me to would be greatly appreciated.
A PersistentVolume with Kubernetes in Azure will not give you the same exact functionality as Key Vault in Azure.
PersistentVolume:
Stores data locally on a volume mounted on a server
The volume can be encrypted
The volume moves with the pod: if the pod starts on a different server, the volume follows
Accessing the volume from other pods is not that easy
You can control performance by assigning guaranteed IOPS to the volume (from the cloud provider)
Key Vault:
Stores keys in a centralized location managed by Azure
Data is encrypted at rest and in transit
You rely on a remote API rather than a local file system
There might be a performance hit from going to an external service, though I assume this is not a major problem within Azure
Kubernetes pods can access the service from anywhere, as long as they have network connectivity to it
Less maintenance, since it's already maintained by Azure
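If you do go the PersistentVolume route and need the key ring readable from several pods at once, you would want a volume type that supports ReadWriteMany, such as Azure Files. A minimal sketch of such a claim (all names are hypothetical):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataprotection-keys    # hypothetical name
spec:
  accessModes:
    - ReadWriteMany            # Azure Files supports shared mounts across pods
  storageClassName: azurefile
  resources:
    requests:
      storage: 1Gi

Pods that mount this claim could then point PersistKeysToFileSystem at the mount path, so all instances share one key ring.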

Just how volatile is a Bluemix Virtual Server's own storage?

The Bluemix documentation leads a reader to believe that the only persistent storage for a virtual server is Bluemix Block Storage. The documentation also leads you to believe that a virtual server's own storage will not persist across restarts or failures. In practice, however, this doesn't seem to be the case, at least as far as restarts are concerned. (We haven't suffered any virtual server outages yet.)
So we want a clearer understanding of the rationale for separating the virtual server's own storage from its attached Block Storage.
Use case: I am moving our Git server and a couple of small LAMP-based assets to a Bluemix Virtual Server as we simultaneously develop new mobile apps using Cloud Foundry. In our case, we don't anticipate scaling up the work that the virtual server does any time soon. We just want a reliable new home for an existing website.
Even if you separate application files and databases out into block storage, re-provisioning the virtual server in the event of its loss is not trivial, even when the provisioning is automated with Ansible or the like. So we are not expecting to have to regularly re-provision the non-persistent storage of a Bluemix Virtual Server.
The Bluemix doc you reference is a bit misleading and is being corrected. The virtual server's storage on local disk does persist across restart, reboot, suspend/resume, and VM failure. If that were not the case, the OS image would be lost during any such event.
One of the key advantages of storing application data in a block storage volume is that the data will persist beyond the VM's lifecycle. That is, even if the VM is deleted, the block storage volume can be left intact to preserve the data. As you mentioned, block storage volumes are often used to back DB servers so that the user data is isolated, which lends itself well to providing a higher class of storage specifically for application data, backup, recovery, etc.
In use cases where VM migration is desired, the VMs can be set up to boot from a block storage volume, which makes it easier to move the VM to a different hypervisor and simply point it at the same block storage boot volume.
Based on your use case description you should be fine using VM local storage.

GCP - CDN Server

I'm trying to architect a system on GCP for scalable web/app servers. My initial intention was to have one disk per web server group hosting the OS, and another hosting the source code, imagery, etc. My idea was to mount the OS disk on multiple VM instances so as to have exact clones of the servers, with one place to store PHP session files (so moving between servers would be transparent and not cause problems).
The second idea was to mount a second disk, containing the source code and media files, shared between two web servers: one configured as a CDN server, and one serving the main website and backend. The backend would modify/add/delete media files, and the CDN server would supply them to the browser when requested.
My problem arises from reading that a Persistent Disk is only mountable on a single VM instance with read/write access; if it's needed on multiple instances, it can be mounted only with read-only access. I need one of the instances to have read/write access, with the others (possibly many) read-only.
Would you be able to suggest ways or methods of implementing such a system on GCP, or tell me if it's not possible at all?
Unfortunately, it's not possible.
But you can create a Single-Node File Server and mount it as a read/write disk on other VMs.
GCP has documentation on how to create a Single-Node File Server.
An alternative to using a persistent disk (which, as you said, only allows a single read/write mount or many read-only mounts) is to use Cloud Storage, which can be mounted through FUSE.
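For illustration, a bucket can be mounted with the gcsfuse tool (the bucket name and mount points below are hypothetical):

# On the backend server: mount the media bucket read/write
gcsfuse my-media-bucket /mnt/media

# On the CDN server: mount the same bucket read-only via a FUSE mount option
gcsfuse -o ro my-media-bucket /mnt/media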

Web Server being used as File Storage - How to improvise?

I am making a DR plan for a web application hosted on a production web server. That web server also acts as file storage, holding the feed upload files (used by the web application as input) and the report files (the output of the web application's processing). If the web server goes down, the file data is lost as well, so I need to design a solution and give recommendations that eliminate this single point of failure.
I have thought of some recommendations as follows-
1) Use a separate file server; however, this requires new resources.
2) Attach a data volume mounted on the web server that is mapped to a network filer (network storage), which can be used to store the feeds and reports. If the web server goes down, the network filer can be mounted on and attached to the contingency web server.
3) There is one more web server, load balanced with the first, that is not currently used for file storage. If we implement a feature that regularly backs up the file data to that second web server, we can start using it if the first web server goes down. The backup could be done through a backup script, a separate Windows service, or a job scheduled to run the backup every night.
Please help me review the above or suggest new recommendations to help eliminate this single-point-of-failure problem on the web server. It would be highly appreciated.
I've successfully used Amazon's S3 to store the output data of web and non-web applications. Using a service like that is beneficial from the single-point-of-failure perspective, because any other instance of that web application, or a different type of client, on the same server or in a completely different datacenter, still has access to the same output files. Another similar option is Rackspace's Cloud Files.
Both of these services are very redundant: you could use them as the backup and keep primary storage on your server, or use them as the primary and keep a backup on your other web server. There are lots of options! Hope this info helps.
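As a minimal sketch of the S3 approach using Python's boto3 (the bucket and key names are hypothetical):

import boto3

s3 = boto3.client("s3")

# The web server uploads a generated report instead of keeping it only on local disk.
s3.upload_file("reports/daily.csv", "my-app-reports", "reports/daily.csv")

# Any other instance (e.g. the contingency web server) can fetch the same file.
s3.download_file("my-app-reports", "reports/daily.csv", "/tmp/daily.csv")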