Grid is in invalid state to perform this operation - ignite

How can I solve this problem when using Ignite?
java.lang.IllegalStateException: Grid is in invalid state to perform this operation. It either not started yet or has already being or have stopped [gridName=grid.cfg, state=STOPPED]
at org.apache.ignite.internal.GridKernalGatewayImpl.illegalState(GridKernalGatewayImpl.java:190)
at org.apache.ignite.internal.GridKernalGatewayImpl.readLock(GridKernalGatewayImpl.java:90)
at org.apache.ignite.internal.cluster.ClusterGroupAdapter.guard(ClusterGroupAdapter.java:170)
at org.apache.ignite.internal.cluster.ClusterGroupAdapter.forPredicate(ClusterGroupAdapter.java:367)
at org.apache.ignite.internal.cluster.ClusterGroupAdapter.forServers(ClusterGroupAdapter.java:392)
at org.apache.ignite.internal.IgniteKernal.services(IgniteKernal.java:363)
at org.apache.ignite.IgniteSpringBean.services(IgniteSpringBean.java:156)

This exception means that you're trying to use Ignite after it has already been stopped. Check the logs for exceptions and review your code; there may be a mistake or a race condition.

Related

Load from GCS to GBQ causes an internal BigQuery error

My application creates thousands of "load jobs" daily to load data from Google Cloud Storage URIs into BigQuery, and only a few of them fail with the error:
"Finished with errors. Detail: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 7916072"
The application is written in Python and uses these libraries:
google-cloud-storage==1.42.0
google-cloud-bigquery==2.24.1
google-api-python-client==2.37.0
The load job is created by calling:
load_job = self._client.load_table_from_uri(
    source_uris=source_uri,
    destination=destination,
    job_config=job_config,
)
This method has a default parameter:
retry: retries.Retry = DEFAULT_RETRY,
so the job should automatically retry on such errors.
ID of a specific job that finished with the error:
"load_job_id": "6005ab89-9edf-4767-aaf1-6383af5e04b6"
"load_job_location": "US"
After getting the error, the application recreates the job, but that doesn't help.
Subsequent failed job IDs:
5f43a466-14aa-48cc-a103-0cfb4e0188a2
43dc3943-4caa-4352-aa40-190a2f97d48d
43084fcd-9642-4516-8718-29b844e226b1
f25ba358-7b9d-455b-b5e5-9a498ab204f7
...
As mentioned in the error message, wait according to the back-off requirements described in the BigQuery Service Level Agreement, then try the operation again.
If the error continues to occur and you have a support plan, create a new GCP support case. Otherwise, you can open a new issue on the issue tracker describing your problem. You can also try to reduce the frequency of this error by using Reservations.
For more information about the error messages you can refer to this document.
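The back-off advice above can be sketched roughly as follows. This is a minimal illustration, not the client library's own mechanism: the helper name retry_with_backoff, its parameters, and the delay values are assumptions chosen for the example. As far as I can tell, the retry argument of load_table_from_uri covers the API request that creates the job rather than a job that was created and later failed, so resubmitting the job with back-off is left to the application.

```python
import random
import time


def retry_with_backoff(submit_job, max_attempts=5, base_delay=1.0,
                       max_delay=32.0, sleep=time.sleep):
    """Call submit_job() until it succeeds or max_attempts is reached.

    submit_job is a hypothetical zero-argument callable that creates the
    load job, waits for it, and raises if the job finished with an error.
    """
    for attempt in range(max_attempts):
        try:
            return submit_job()
        except Exception:
            # In real code, narrow this to the transient errors you expect.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential back-off (1s, 2s, 4s, ...) capped at max_delay,
            # plus up to one second of random jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, 1))
```

In the application from the question, submit_job could wrap the existing load_table_from_uri call followed by load_job.result(), which blocks until the job completes and raises if it failed. Each call then submits a fresh job after a growing pause instead of recreating it immediately.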

How to disable all operations on ignite cache when topology is not valid

I have two server nodes and one client node. I am using TopologyValidator to validate the topology.
If any server node leaves the cluster, I want to disable all operations. TopologyValidator disables only update operations, not get operations. Can you help me do this?
Currently TopologyValidator disables update operations only.
You can call IgniteCache#close() to disable all operations on a specific cache.
See: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#close--
If you do the following:
IgniteCache cache = ignite.getOrCreateCache(config);
cache.put(1L , new Person(1L, "A", "B"));
cache.close();
System.out.println(cache.get(1L)); //exception here.
you will get the following exception on the get call:
[INFO ][exchange-worker-#43%node1%][GridCacheProcessor] Finish proxy initialization, cacheName=test1, localNodeId=...
Exception in thread "main" java.lang.IllegalStateException: Cache has been closed: test1
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.checkProxyIsValid(GatewayProtectedCacheProxy.java:1548)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1580)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:634)
In addition to Alex's answer, you might implement a custom analog of the TopologyValidator. All you need is to listen for the EVT_NODE_LEFT and EVT_NODE_JOINED events to trigger the custom logic, like stopping a cache or switching some application access validator.

Apache Flink - exception handling in "keyBy"

It may happen that data entering a Flink job triggers an exception, either due to a bug in the code or a lack of validation.
My goal is to provide a consistent way of handling exceptions that our team could use within Flink jobs and that won't cause any downtime in production.
Restart strategies do not seem to be applicable here, as:
a simple restart won't fix the issue and we fall into a restart loop
we cannot simply skip the event
they can be good for OOME or some transient issues
we cannot add a custom one
A try/catch block in the "keyBy" function does not fully help, as:
there's no way to skip an event in "keyBy" after the exception is handled
Sample code:
env.addSource(kafkaConsumer)
.keyBy(keySelector) // must return one result for one entry
.flatMap(mapFunction) // we can skip some entries here in case of errors
.addSink(new PrintSinkFunction<>());
env.execute("Flink Application");
I'd like to have the ability to skip processing of an event that caused an issue in "keyBy" and similar methods that are supposed to return exactly one result.
Besides the suggestion of @phanhuy152 (which seems totally legit to me), why not filter before keyBy?
env.addSource(kafkaConsumer)
.filter(hasValidKey) // filter keeps only the entries for which the predicate returns true
.keyBy(keySelector) // must return one result for one entry
.flatMap(mapFunction) // we can skip some entries here in case of errors
.addSink(new PrintSinkFunction<>());
env.execute("Flink Application");
Can you reserve a special value like "NULL" for keyBy to return in such a case? Then your flatMap function can skip the entry when it encounters that value.

Talend (7.0.1) - Cannot modify mapred.job.name at runtime

I am having some trouble running a simple tHiveCreateTable job in Talend OS for Big Data.
The Hive connection is fine and the job worked until Ranger was activated in the cluster.
After Ranger was activated, I started getting the following log:
[statistics] connecting to socket on port 3345
[statistics] connected
Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
[statistics] disconnected
This error occurs with both Tez and MapReduce, and the exception is thrown at the following line of the automatically generated code:
// For MapReduce Mode
stmt_tHiveCreateTable_1.execute("set mapred.job.name=" + queryIdentifier);
Do you know of any solution or workaround for this?
Thanks in advance
You can stop Talend 7 jobs from setting mapred.job.name and hive.query.name at runtime.
Edit the file
{talend_install_dir}/plugins/org.talend.designer.components.localprovider_7.1.1.20181026_1147/components/templates/Hive/SetQueryName.javajet
and comment out lines 6 and 11 like this:
// stmt_<%=cid %>.execute("set mapred.job.name=" + queryIdentifier_<%=cid %>);
// stmt_<%=cid %>.execute("set hive.query.name=" + queryIdentifier_<%=cid %>);
It solved this issue for me.

Getting exception while reading data from blob in azure

While trying to read the list of blob data on Azure, I am getting the following error:
Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation.
How can I resolve this?
Please see the following link. Your code likely has an endless loop. https://msdn.microsoft.com/en-us/library/ms234762.aspx