I am building a Java based online application using Java and Ignite. Ignite is used an in memory store/cache and also for data persistence for the application. My question is if I should deploy Ignite in a separate server or should it reside int the same server as the application?
You can do it both ways, it's up to you.
My Requirements:
1. User's will run the sql queries through Apache nifi to Amazon S3.
Is this possible to achieve Nifi integration with Amazon Athena?
You should be able to easily integrate Apache NiFi and Amazon Athena. The NiFi capabilities to leverage/plug-in JDBC drivers and reuse that context in many areas helps here greatly. See here for info on the JDBC drivers with Athena https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html and here for using some of NiFi's DBCP facilities https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-dbcp-service-nar/1.5.0/org.apache.nifi.dbcp.DBCPConnectionPool/index.html
You should be able to by using a combination of an ExecuteStreamCommand and the awscli. The cli has the capabilities to issue Athena queries
For a large online application, use k8s to run it. The scale maybe daily activity user 500,000.
The application inside k8s need messaging feature - Pub/Sub, there are these options:
Kafka
RabbitMQ
Redis
Kafka
It needs zookeeper and good to run on os depends on disk I/O. So if install it into k8s cluster, how? The performance will be worse?
And, if keep Kafka outside of the k8s cluster, connect Kafka from application inside the k8s cluster, how about that performance? They are in the different layer, won't be slow?
RabbitMQ
It's slow than Kafka, but for a daily activity user 500,000 application, is it good enough? If so, maybe it's a good choice.
Redis
It's another option. Maybe the most simple one. But from the internet I got that it will lose message sometimes. If true, that's terrible.
So, the most important thing is, use Kafka(also with zookeeper) on k8s, good or not in this use case?
Yes, running Kafka on Kubernetes is great. Check out this example: https://github.com/Yolean/kubernetes-kafka. It includes ZooKeeper and Kafka as StatefulSets.
PS. Running any of the services in your question on Kubernetes will be pleasant. You can Google the name of the service and "kubernetes" and find example manifests. Many examples here: https://github.com/kubernetes/charts.
For Kafka, you can find some suggestion here. Kubernetes 1.7+ supports local persistent volume, which may be good for Kafka deployment.
You can also take a look to the following project :
https://github.com/EnMasseProject/barnabas
It's about running Kafka on Kubernetes and OpenShift as well. It provides deploying with StatefulSets with persistent volumes or just in memory (for developing or just testing purpose). It provides deploying for Kafka Connect and Prometheus metrics as well.
Another simple configuration of Kafka/Zookeeper on Kubernetes in DigitalOcean with external access:
https://github.com/StanislavKo/k8s_digitalocean_kafka
You can connect to Kafka from outside of AWS/DO/GCE by regular binary protocol. Connection is PLAINTEXT or SASL_PLAINTEXT (user/password).
Kafka cluster is StatefulSet, so you can scale cluster easily.
We are using apache ignite as a IMDG in our micro services environment.
For scalability and load balancing we are considering to use a service registry like eureka or consul which is supported by spring cloud for the deployed micro services.
There is a concept of service grid providing support for node singleton and cluster singleton in apache ignite.
I also see WCF,weblogic and JBoss to having the same sort of features.
I am trying to understand what these service grids are and if i can use them to achieve the same benefits as the eureka service registry provided by netflix and supported by spring cloud.
Can someone guide if i can achieve the same using service grid in apache ignite.
No, you cannot use Apache Ignite Service Grid for the same purposes as Eureka. Eureka is used for load balancing and service discovery over WAN. Using Ignite clusters spanning over multiple AWS zones and remote client machines is not the most efficient way of using it.
More information on Ignite Service Grid can be found here - http://apacheignite.gridgain.org/docs/service-grid
Thanks!
UPD (for the 1st comment):
You cannot (in most cases) span and effectively use Ignite over WAN networks with high latencies and lower throughput characteristics.
As far as local clusters in non-cloud environments - go ahead! This is the best environment for systems of such kind.
Currently i am searching for possible Clustering technologies usable by Apache Axis2 servers. I know that WSO2 Platform uses Apache Tribes for their Servers which are based on Axis2? I want to know if there is an alternative to Apache Tribes for Clustering?
We actually also tried JGroups .. but AFAIK, it did not workout very well.
Are you guys are hitting a problem with Tribes based implementation? Within WSO2 platform, tribes has worked out very well.
We are also looking into the possibility of writing a ZooKeeper based clustering implementation.