How to evenly distribute data and compute across an Apache Ignite cluster

The purpose is to demonstrate data balancing and compute collocation. For that, I want to load, say, 100,000 records into the Ignite cluster (using IgniteRepository from ignite-spring) and then do an affinityRun with an IgniteRunnable that retrieves data by some search condition and processes it.
Ignite consistently passes the compute job to another node (different from the one where I submit it), however all 100K records are processed on that single node.
So either my data is not balanced, or affinityRun is not taking effect.
Thanks in advance for any help!
Ignite config
@Bean
public Ignite igniteInstance() {
    IgniteConfiguration config = new IgniteConfiguration();
    CacheConfiguration cache = new CacheConfiguration("ruleCache");
    cache.setIndexedTypes(String.class, RuleDO.class);
    //config.setPeerClassLoadingEnabled(true);
    cache.setRebalanceBatchSize(24);
    config.setCacheConfiguration(cache);
    Ignite ignite = Ignition.start(config);
    return ignite;
}
RestController method to trigger processing
@RequestMapping("/processOnNode")
public String processOnNode(@RequestParam("time") String time) throws Exception {
    IgniteCache<Integer, String> cache = igniteInstance.cache("ruleCache");
    igniteInstance.compute().affinityRun(Collections.singletonList("ruleCache"), 0, new NodeRunnable(time));
    return "done";
}
NodeRunnable -> run()
@Override
public void run() {
    final RuleIgniteRepository igniteRepository = SpringContext.getBean(RuleIgniteRepository.class);
    igniteRepository.findByTime(time).stream().forEach(ruleDO -> System.out.println(ruleDO.getId() + " : " + ruleDO));
    System.out.println("done on the node");
}
I expect the processing of the 100k records to be evenly distributed across my 3 nodes.

You execute the logic for a single partition (partition 0) only:
igniteInstance.compute().affinityRun(Collections.singletonList("ruleCache"), 0, new NodeRunnable(time));
The data gets distributed across 1024 partitions (by default) and a primary copy of partition 0 is stored on one of the nodes. This code needs to be executed for several partitions or different affinity keys if you want to see that every node takes part in the calculation.
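For instance, a minimal sketch that runs the same job once per partition, reusing the NodeRunnable and the "ruleCache" name from the question (this is an illustration of the idea, not code taken from the answer):
// Run the job once per partition so every node owning a primary partition participates.
int partitions = igniteInstance.affinity("ruleCache").partitions();
for (int part = 0; part < partitions; part++) {
    igniteInstance.compute().affinityRun(
        Collections.singletonList("ruleCache"), part, new NodeRunnable(time));
}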

Thanks all for the help! Especially @dmagda: broadcast worked well; however, with the repository method it ran over the whole data set in the cluster, defeating the purpose of collocation.
I had to chuck out JPA and use cache methods, which worked wonders.
This is the IgniteRunnable class:
@Override
public void run() {
    final Ignite ignite = SpringContext.getBean(Ignite.class);
    IgniteCache<String, RuleDO> cache = ignite.cache("ruleCache");
    cache.localEntries(CachePeekMode.ALL)
        .forEach(entry -> {
            System.out.println("working on local data, key, value " + entry.getKey() + " : " + entry.getValue());
        });
}
And instead of affinityRun I am calling broadcast:
igniteInstance.compute().broadcast(new NodeRunnable(time));

Related

Getting ClassNotFoundException while storing a list of custom objects in the Ignite cache, though it works well if I don't use a list

I have a non-persistent Ignite cache that stores the following elements:
Key --- java.lang.String
Value --- the custom class below
public class Transaction {
    private int counter;

    public Transaction(int counter) {
        this.counter = counter;
    }

    public String toString() {
        return String.valueOf(counter);
    }
}
The below code works fine even though I am trying to put a custom object into Ignite.
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setClientMode(true);
cfg.setPeerClassLoadingEnabled(true);
cfg.setDeploymentMode(DeploymentMode.CONTINUOUS);
TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47500..47509"));
cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder));
Ignite ignite = Ignition.start(cfg);
IgniteCache<String, Transaction> cache = ignite.getOrCreateCache("blocked");
cache.put("c1234_t2345_p3456", new Transaction(100);
The below code fails with a ClassNotFoundException when I try to store a list of objects instead. This code is exactly the same as the above, except for the list of objects. Why does a list of objects fail when custom objects stored directly work fine?
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setClientMode(true);
cfg.setPeerClassLoadingEnabled(true);
cfg.setDeploymentMode(DeploymentMode.CONTINUOUS);
TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47500..47509"));
cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder));
Ignite ignite = Ignition.start(cfg);
IgniteCache<String, List<Transaction>> cache = ignite.getOrCreateCache("blocked");
cache.put("c1234_t2345_p3456", Arrays.asList(new Transaction(100)));
Storing custom objects in-memory in Ignite worked, but trying to store List objects instead caused a ClassNotFoundException on the server. I was able to solve this by copying the custom class definition to "/ignite_home/bin/libs", but I am curious to know why the first case worked and the second didn't. Can anyone please help me understand what's happening here? Is there any other way to resolve this issue?
OK, after many trials, I have an observation that evens out the differences between the two scenarios above. When I declare the cache dynamically from code, as I did earlier, Ignite insists that the custom classes be placed in the bin/libs folder. But if I define the cache in ignite-config.xml, Ignite handles both use cases the same way and doesn't even throw the ClassNotFoundException. So my takeaway is that pre-declared caches are safer, since I see different behaviour when creating them dynamically from code. I changed the cache declaration to the declarative model and now both use cases work fine.
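For what it's worth, the same "pre-declared cache" effect can also be achieved programmatically by listing the cache in the server node's IgniteConfiguration before start-up. A minimal sketch under that assumption (only the cache name "blocked" is taken from the question; the rest is illustrative):
// Server-side: declare the cache up front instead of creating it
// dynamically from the client with getOrCreateCache().
IgniteConfiguration serverCfg = new IgniteConfiguration();
serverCfg.setCacheConfiguration(new CacheConfiguration<String, Object>("blocked"));
Ignition.start(serverCfg);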

The way to use Redis in Apache Flink

I am using Flink and want to insert the result value into Redis.
When I googled Redis, I found the redis-connector included in Apache Bahir.
So I am able to insert the result value into Redis using the redis-connector from Apache Bahir.
However, I think that I am also able to connect to Redis using Jedis.
My experiment showed that I was able to connect to Redis and find the inserted value using Jedis, as shown in the code below.
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer<>(flinkParams.getRequired("topic"), new SimpleStringSchema(), flinkParams.getProperties())).setParallelism(Math.min(hosts * cores, kafkaPartitions));
messageStream.keyBy(new KeySelector<String, String>() {
    @Override
    public String getKey(String s) throws Exception {
        return s;
    }
}).flatMap(new RedisConnector());
In the RedisConnector module, without the redis-connector from Apache Bahir, I also successfully connected to Redis and found the message processed by Flink.
The example code is shown below:
import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import redis.clients.jedis.Jedis;

public class ProcessorCommon {
    private static final Logger logger = LoggerFactory.getLogger(ProcessorCommon.class);
    private Jedis jedis;
    private Set<DummyPair> dummy;

    public ProcessorCommon(String redisServerHostName) {
        this.jedis = new Jedis(redisServerHostName);
    }

    public void writeToRedis(String key, String value) {
        this.jedis.set(key, value);
    }

    public String getFromRedis(String key) {
        return this.jedis.get(key);
    }

    public void close() {
        this.jedis.close();
    }
}
So I am wondering whether there is a real difference between using the redis-connector from Bahir and using Jedis directly.
There is currently no real Redis connector maintained by the Flink community. The Redis connector in Bahir is rather outdated. There is a new Redis Streams connector in the works, which can be found at https://github.com/apache/flink-connector-redis-streams
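If you stay with plain Jedis, a common pattern is to wrap it in a RichSinkFunction so that each parallel task opens its own connection in open() and releases it in close(). A minimal sketch under that assumption (the class name, Redis host, and key layout are illustrative, not taken from the thread):
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import redis.clients.jedis.Jedis;

// Illustrative sink that writes each incoming record to Redis via Jedis.
public class JedisSink extends RichSinkFunction<String> {
    private transient Jedis jedis;

    @Override
    public void open(Configuration parameters) {
        // One connection per parallel sink instance.
        jedis = new Jedis("localhost");
    }

    @Override
    public void invoke(String value, Context context) {
        // The key is an assumption made for the example.
        jedis.set("last-message", value);
    }

    @Override
    public void close() {
        if (jedis != null) {
            jedis.close();
        }
    }
}
The keyed stream above could then end with .addSink(new JedisSink()) instead of .flatMap(new RedisConnector()).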

Apache Ignite performance problem on Azure Kubernetes Service

I'm using Apache Ignite on Azure Kubernetes as a distributed cache.
Also, I have a web API on Azure based on .NET 6.
The Ignite service works stably and very well on AKS.
But on the first request, the API tries to connect to Ignite and that takes around 3 seconds. After that, Ignite responses take around 100 ms, which is great. Here are my Web API performance outputs for the GetProduct function.
At first, I tried registering the Ignite service as a singleton, but it sometimes failed with 'connection closed'. How can I keep the Ignite connection open at all times? Or does anyone have a better idea?
Here is my latest GetProduct code:
[HttpGet("getProduct")]
public IActionResult GetProduct(string barcode)
{
Stopwatch _stopWatch = new Stopwatch();
_stopWatch.Start();
Product product;
CacheManager cacheManager = new CacheManager();
cacheManager.ProductCache.TryGet(barcode, out product);
if(product == null)
{
return NotFound(new ApiResponse<Product>(product));
}
cacheManager.DisposeIgnite();
_logger.LogWarning("Loaded in " + _stopWatch.ElapsedMilliseconds + " ms...");
return Ok(new ApiResponse<Product>(product));
}
Also, I add the CacheManager class here:
public CacheManager()
{
    ConnectIgnite();
    InitializeCaches();
}

public void ConnectIgnite()
{
    _ignite = Ignition.StartClient(GetIgniteConfiguration());
}

public IgniteClientConfiguration GetIgniteConfiguration()
{
    var appSettingsJson = AppSettingsJson.GetAppSettings();
    var igniteEndpoints = appSettingsJson["AppSettings:IgniteEndpoint"];
    var igniteUser = appSettingsJson["AppSettings:IgniteUser"];
    var ignitePassword = appSettingsJson["AppSettings:IgnitePassword"];
    var nodeList = igniteEndpoints.Split(",");
    var config = new IgniteClientConfiguration
    {
        Endpoints = nodeList,
        UserName = igniteUser,
        Password = ignitePassword,
        EnablePartitionAwareness = true,
        SocketTimeout = TimeSpan.FromMilliseconds(System.Threading.Timeout.Infinite)
    };
    return config;
}
Make it a singleton. An Ignite node, even in client mode, is supposed to run for the entire lifetime of your application. All Ignite APIs are thread-safe. If you get a connection error, please provide more details (exception stack trace, how you create the singleton, etc.).
You can also try the Ignite thin client which consumes fewer resources and connects instantly: https://ignite.apache.org/docs/latest/thin-clients/dotnet-thin-client.

Elastic Search Scroll API Asynchronous execution

I'm running an Elasticsearch 5.6 cluster with an index size of about 70 GB per day. At the end of the day we are requested to produce summarizations for each hour of the last 7 days. We are using the Java version of the High Level REST Client, and considering the number of docs each query returns, it is critical to scroll the results.
In order to take advantage of the CPUs we have and decrease the reading time, we were thinking about using the asynchronous version of search scroll, but we are missing an example and at least the logic needed to move forward.
We already checked the Elastic documentation but it's too vague:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/5.6/java-rest-high-search-scroll.html#java-rest-high-search-scroll-async
We also asked in the Elastic discussion forum, as they suggest, but it looks like nobody can answer it:
https://discuss.elastic.co/t/no-code-for-example-of-using-scrollasync-with-the-java-high-level-rest-client/165126
Any help on this will be very much appreciated, and I'm surely not the only one with this requirement.
Here is the example code:
public class App {
    public static void main(String[] args) throws IOException, InterruptedException {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://localhost:9200")));
        client.indices().delete(new DeleteIndexRequest("test"), RequestOptions.DEFAULT);
        for (int i = 0; i < 100; i++) {
            client.index(new IndexRequest("test", "_doc").source("foo", "bar"), RequestOptions.DEFAULT);
        }
        client.indices().refresh(new RefreshRequest("test"), RequestOptions.DEFAULT);
        SearchRequest searchRequest = new SearchRequest("test").scroll(TimeValue.timeValueSeconds(30L));
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = searchResponse.getScrollId();
        System.out.println("response = " + searchResponse);
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId)
                .scroll(TimeValue.timeValueSeconds(30));
        // I was missing waiting for the results.
        final CountDownLatch countDownLatch = new CountDownLatch(1);
        client.scrollAsync(scrollRequest, RequestOptions.DEFAULT, new ActionListener<SearchResponse>() {
            @Override
            public void onResponse(SearchResponse searchResponse) {
                System.out.println("response async = " + searchResponse);
                // Release the latch so the main thread can continue.
                countDownLatch.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                countDownLatch.countDown();
            }
        });
        // Here we wait.
        countDownLatch.await();
        // Clear the scroll if we finish before the keep-alive expires.
        // Otherwise it will be cleared when the keep-alive time is reached.
        ClearScrollRequest request = new ClearScrollRequest();
        request.addScrollId(scrollId);
        client.clearScrollAsync(request, RequestOptions.DEFAULT, new ActionListener<ClearScrollResponse>() {
            @Override
            public void onResponse(ClearScrollResponse clearScrollResponse) {
            }

            @Override
            public void onFailure(Exception e) {
            }
        });
        client.close();
    }
}
Thanks to David Pilato on the elastic discussion forum.
"summarizations of each hour for the last 7 days"
It sounds like you would like to run some aggregations on the data rather than fetch the raw docs. Probably, at the first level, a date histogram in order to aggregate on intervals of 1 hour; inside that date histogram you need inner aggs to run your summarizations, either metrics or buckets depending on what is needed.
Starting with Elasticsearch v6.1 you can use the Composite Aggregation in order to get all the result buckets using paging. From the docs I linked:
"the composite aggregation can be used to paginate all buckets from a multi-level aggregation efficiently. This aggregation provides a way to stream all buckets of a specific aggregation similarly to what scroll does for documents."
Unfortunately this option doesn't exist before v6.1, so either you'll need to upgrade ES to use it, or find another way, like breaking the work into multiple queries that together cover the 7-day requirement.
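For the hourly roll-up itself, a minimal sketch with the High Level REST Client might look like the following (the index name, timestamp field, and the avg metric are illustrative assumptions, not taken from the thread):
// Build an hourly date histogram with one example metric per bucket.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .size(0) // We only need the aggregation buckets, not the raw hits.
        .query(QueryBuilders.rangeQuery("timestamp").gte("now-7d/d"))
        .aggregation(AggregationBuilders.dateHistogram("per_hour")
                .field("timestamp")
                .dateHistogramInterval(DateHistogramInterval.HOUR)
                .subAggregation(AggregationBuilders.avg("avg_value").field("value")));

SearchRequest aggRequest = new SearchRequest("my-index").source(sourceBuilder);
SearchResponse aggResponse = client.search(aggRequest, RequestOptions.DEFAULT);
System.out.println("aggregations = " + aggResponse.getAggregations());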

Ignite: Remove data from cache after a count of 10 put operations

I have a JSON object and I am putting it into the cache using a thread that runs every 5 seconds. I want to remove the cached data after 10 put operations have been performed and put that data into a third-party database. How can I do this, and what techniques are there for it? If you have a sample example, please share. Thanks.
You can achieve a similar behaviour by using a cache store with write-behind along with an expiry policy.
But given the number of records, that you want to keep in the cache, I would do something like this:
private static final int BATCH_SIZE = 10;

private Map<K, V> batch = new HashMap<>();

public void addRecord(K key, V val) {
    batch.put(key, val);
    if (batch.size() == BATCH_SIZE) {
        flush(batch); // Write data into the database.
        batch.clear();
    }
}
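For completeness, the write-behind approach mentioned at the top of this answer would be configured on the cache itself. A rough sketch, assuming you already have a CacheStore implementation (here hypothetically called ThirdPartyDbStore) and a value class (here MyJsonObject) that writes to the third-party database:
// Enable write-behind so Ignite batches updates and flushes them to the store.
CacheConfiguration<String, MyJsonObject> cfg = new CacheConfiguration<>("jsonCache");
cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(ThirdPartyDbStore.class));
cfg.setWriteThrough(true);
cfg.setWriteBehindEnabled(true);
cfg.setWriteBehindFlushSize(10);        // Flush to the store after ~10 buffered updates.
cfg.setWriteBehindFlushFrequency(5000); // ...or at most every 5 seconds.
An expiry policy could additionally be set on the same CacheConfiguration if the entries should also disappear from the cache after some time.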