1st question: could I deploy a ComputeJob to every node on IgniteCompute?
I know Ignite can deploy a ComputeTask, which is made up of ComputeJobs. After checking the internal code, it seems to me that each ComputeJob is serialized, sent together with its args to the remote node, and then processed there. Please correct me if I am wrong.
In my case, one node loads a locally deployed copy of the ComputeTask, and the ComputeTask then spawns ComputeJobs which are sent to other nodes for computation. The spawned ComputeJobs are all the same except for the args passed to them. If I could load a locally deployed copy of the ComputeJob on each remote node and send only that node's args over the network, network traffic should decrease. The theoretical bottleneck of my application is network bandwidth, which is what I am trying to optimize.
2nd question: if I cannot deploy a ComputeJob this way, is there any workaround to avoid sending the same ComputeJob multiple times?
Thanks a lot for your insight!
When a ComputeJob is serialized using Ignite's BinaryMarshaller it does not carry much overhead. You will not be sending the job's class bytes every time. The binary representation also doesn't contain any field names or other info that can be inferred from the class. Almost all of the bytes will be the actual values of your job's fields, which usually correspond to the job arguments. You can find more on Ignite's binary format here.
If you want to take a look at the size and structure of your job's binary form, you can serialize it manually with something like
byte[] array = ignite.configuration().getMarshaller().marshal(job);
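If you want to see how much of that is per-job overhead versus the arguments themselves, a rough sketch is to marshal both and compare the sizes (MyJob and its single String argument are made up for illustration):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.marshaller.Marshaller;

public class JobSizeCheck {
    // Hypothetical job: a single String argument is its only field.
    static class MyJob extends ComputeJobAdapter {
        private final String arg;
        MyJob(String arg) { this.arg = arg; }
        @Override public Object execute() { return arg.length(); }
    }

    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start()) {
            Marshaller marshaller = ignite.configuration().getMarshaller();

            byte[] jobBytes = marshaller.marshal(new MyJob("some-argument"));
            byte[] argBytes = marshaller.marshal("some-argument");

            // The difference is the per-job envelope; the job's class bytes are not part of it.
            System.out.println("job = " + jobBytes.length + " bytes, arg = " + argBytes.length + " bytes");
        }
    }
}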
VerneMQ build: 1.10.4.1+build.76.ref4f0bbab
Erlang version: 22
As per the VerneMQ documentation, the hook data is stored in an in-memory cache and is not actively disposed of.
We have around 360k clients distributed over a cluster of 8 nodes.
The client ID, username and password do not change and are fixed for 320k clients, whereas the remaining 40k clients keep changing. These 40k clients also subscribe and publish to at most 3 topics. The clients tend to disconnect and reconnect to any node in the cluster once a day, due to which the hook data is cached on all the nodes and memory grows. The memory keeps increasing on a daily basis, and the memory usage curve has not flattened.
Issue: I fear that at some point we will run into OOM errors and the nodes will go down.
I tried clearing memory using the drop_caches echo commands (1, 2 and 3), but only the buffer/page cache was freed; the hook data was not.
Is there a way to clear or evict the hook data from memory?
Since VerneMQ is written in Erlang, the hook data is stored in built-in term storage (ETS, Erlang Term Storage). ETS tables can hold very large quantities of data in an Erlang runtime system with constant access time. Find more details here: https://erlang.org/doc/man/ets.html
Note: only the owner process can delete the table or all the objects in it, and here the owner process is the VerneMQ broker itself.
To answer my own question, below are the code changes made to the VerneMQ source code in order to evict/delete all the cached objects (it is mandatory to build VerneMQ from source).
There is a command ./vmq-admin webhooks cache show --reset which resets (deletes all objects from) the vmq_webhooks_stats ETS table, i.e. it resets the cache hits, entries and misses.
This command is defined in vmq_webhooks_cli.erl, in the cache_stats_cmd() function.
Just replace vmq_webhooks_cache:reset_stats() with vmq_webhooks_cache:purge_all()
Build the source and start the broker with the updated changes.
On invoking the ./vmq-admin webhooks cache show --reset command, both the hook data and the stats will be deleted.
This change helped me solve the OOM issue, which we did eventually face after some time.
I am new to multithreaded programming. I put my media_player and server in two threads; the server receives data, which is the operation to apply to the media_player, from another client program. But the value of "operation" I get from the server isn't updated in my main thread, so the output of operation in media_player is always None. I want it to change as the server receives data.
global operation needs to be added inside the function; otherwise assigning to operation there just creates a local variable, and the module-level operation that the main thread reads is never updated.
I use Ignite.NET and run Ignite in my .NET Core app process.
My application receives some messages (5000 per second), and I put or remove some keys according to the messages received. The cache mode is replicated, with the default Primary_Sync write synchronization mode.
Everything is good and I can process up to 20,000 messages/sec.
But when I run another Ignite node on another machine, everything changes. Processing speed drops to 1000 messages per second.
Perhaps this is because some operations go over the network, but I just want to put or remove keys on the local instance and have the changed keys replicated to the other nodes. The write mode is Primary_Sync, and to my understanding this means Ignite should put or remove the key on the local node (since in replicated mode all nodes hold the same data, there is no need to route the operation elsewhere) and then replicate it to the other nodes asynchronously.
Where is the problem?
Is the slowdown due to network operations?
Looking at the code (I could not run it - it requires messing with SQL Server), I can provide the following recommendations:
Use DataStreamer. Always use a streamer when adding/removing batches of data (see the sketch after this list).
Try using multiple threads to load the data. Ignite APIs are thread-safe.
Maybe try CacheWriteSynchronizationMode.FullAsync
Together this should result in a noticeable speedup, no matter how many nodes are in the cluster.
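A rough Java sketch of those three points together (the question uses Ignite.NET, whose data streamer and cache configuration APIs mirror these calls; the cache name, types and buffer size below are illustrative):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class StreamerSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<String, String> cfg = new CacheConfiguration<>("messages");
            cfg.setCacheMode(CacheMode.REPLICATED);
            // FULL_ASYNC in Java corresponds to FullAsync in Ignite.NET.
            cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC);

            IgniteCache<String, String> cache = ignite.getOrCreateCache(cfg);

            try (IgniteDataStreamer<String, String> streamer = ignite.dataStreamer(cache.getName())) {
                streamer.allowOverwrite(true);      // required for updating/removing existing keys
                streamer.perNodeBufferSize(1024);   // per-node batch size, tune to your message rate

                for (int i = 0; i < 100_000; i++)
                    streamer.addData("key-" + i, "value-" + i);

                streamer.removeData("key-42");      // removals are batched the same way
            } // close() flushes any buffered updates
        }
    }
}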
I have read the standard (and the javadoc) but still have some questions.
My use case is simple:
A batchlet fetches data from an external source and acknowledges the data (meaning that the data is deleted from the external source after acknowledgement).
Before acknowledging the data, the batchlet produces relevant output (an in-memory object) that is to be passed to the next chunk-oriented step.
Questions:
1) What is the best practice for passing data between a batchlet and a chunk step?
It seems that I can do that by calling jobContext#setTransientUserData
in the batchlet and then in my chunk step I can access that data by calling
jobContext#getTransientUserData.
I understand that both jobContext and stepContext are implemented in a thread-local manner.
What worries me here is the "Transient"-part.
What will happen if the batchlet succeeds but my chunk-step fails?
Will the "TransientUserData"-data still be available or will it be gone if the job/step is restarted?
For my use case it is important that the batchlet is run just once.
So even if the job or the chunk step is restarted, it is important that the output data from the successfully run batchlet is preserved - otherwise the batchlet would have to run once more. (I have already acknowledged the data and it is gone - so running the batchlet again would not help me.)
2) Follow-up question
In stepContext there are a couple of methods: getPersistentUserData and setPersistentUserData.
What is these methods' intended usage?
What does the "Persistent"-part refer to?
Are these methods relevant only for partitioning?
Thank you!
/ Daniel
Transient user data is just transient and will not be available during a job restart. A job restart can happen in a different process or on a different machine, so users cannot count on transient data from a previous run being available at restart.
Step persistent user data is application data that the batch job developer deems necessary to save/persist for the purpose of restart, monitoring or auditing. It will be available at restart, but it is typically scoped to the current step (not shared across steps).
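For illustration, a minimal sketch of how the two calls look from a batchlet (assuming a CDI-managed batch artifact; the class, field and payload names are made up):

import java.io.Serializable;
import javax.batch.api.AbstractBatchlet;
import javax.batch.runtime.context.JobContext;
import javax.batch.runtime.context.StepContext;
import javax.inject.Inject;
import javax.inject.Named;

@Named
public class FetchBatchlet extends AbstractBatchlet {

    @Inject JobContext jobContext;
    @Inject StepContext stepContext;

    @Override
    public String process() {
        FetchedData data = fetchAndAcknowledge(); // hypothetical call to the external source

        // Transient: visible to later steps of this same execution, gone after a restart.
        jobContext.setTransientUserData(data);

        // Persistent: stored in the job repository with this step's execution and
        // available again on restart, but scoped to this step.
        stepContext.setPersistentUserData(data);

        return "COMPLETED";
    }

    private FetchedData fetchAndAcknowledge() { return new FetchedData(); }

    public static class FetchedData implements Serializable { /* payload fields */ }
}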
From reading your brief description, I get the feeling that your 2 steps are too tightly coupled and you can almost consider them one single unit of work. You want them to either both succeed or both fail in order to maintain the integrity of your application state. I think that could be the root of the problem.
I am using v3.9.0 of Chronicle Map, where I have two processes: process A writes to a ChronicleMap, and process B just initializes with the same persistent file that A uses. After loading, I print the map size in process A and in process B, but I get different sizes. I am expecting both sizes to be the same. In what cases can I see this behaviour?
How can I troubleshoot this problem? Is there any kind of flush operation required?
One thing I tried was to dump the file using the getAll method, but it dumps everything as JSON into a single file, which pretty much kills any editor I have. I also tried to use MapEntryOperations in process B to see if anything interesting was happening, but it seems to be invoked mainly when something is written into the map, not when the map is initialized directly from the persistent store.
I was using createOrRecoverPersistedTo instead of the createPersistedTo method. Because of this, my other process was not seeing all of the data.
As the Recovery section in the tutorial elaborates:
".recoverPersistedTo() needs to access the Chronicle Map exclusively. If a concurrent process is accessing the Chronicle Map while another process is attempting to perform recovery, result of operations on the accessing process side, and results of recovery are unspecified. The data could be corrupted further. You must ensure no other process is accessing the Chronicle Map store when calling for .recoverPersistedTo() on this store."
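For reference, a minimal sketch of attaching both processes to the same store with createPersistedTo (the file path, key/value types and sizing hints below are illustrative):

import java.io.File;
import net.openhft.chronicle.map.ChronicleMap;

public class AttachSharedMap {
    public static void main(String[] args) throws Exception {
        File store = new File("shared-map.dat"); // illustrative path shared by both processes

        // Both process A and process B should attach with createPersistedTo();
        // reserve recoverPersistedTo()/createOrRecoverPersistedTo() for when no
        // other process has the store open.
        try (ChronicleMap<String, String> map = ChronicleMap
                .of(String.class, String.class)
                .name("shared-map")
                .averageKey("some-key")
                .averageValue("some-value")
                .entries(1_000_000)
                .createPersistedTo(store)) {

            System.out.println("size = " + map.size());
        }
    }
}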
This seems pretty strange. Size in Chronicle Map is not stored in a single memory location. ChronicleMap.size() walks through the segments and sums each segment's size. So size is "weakly consistent", and if one process constantly writes to a map, size() calls from multiple threads/processes could return slightly different values. But if nobody writes to the map (e.g. in your case just after loading, when process A hasn't yet started writing), all callers should see the same value.
Instead of getAll() and analyzing output manually, you can try to "count" entries by something like
int entries = 0;
for (K k : map.keySet()) { // K is your key type; this iterates every key in the map
    entries++;
}