I have a problem when executing a query on BigQuery and I end up with the following error:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:568)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:326)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.wepay.kafka.connect.bigquery.exception.BigQueryConnectException: A write thread has failed with an unrecoverable error
Caused by: The job encountered an internal error during execution and was unable to complete successfully.
at com.wepay.kafka.connect.bigquery.write.batch.KCBQThreadPoolExecutor.lambda$maybeThrowEncounteredError$0(KCBQThreadPoolExecutor.java:101)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at com.wepay.kafka.connect.bigquery.write.batch.KCBQThreadPoolExecutor.maybeThrowEncounteredError(KCBQThreadPoolExecutor.java:100)
at com.wepay.kafka.connect.bigquery.BigQuerySinkTask.put(BigQuerySinkTask.java:236)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:546)
... 10 more
Caused by: com.google.cloud.bigquery.BigQueryException: The job encountered an internal error during execution and was unable to complete successfully.
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:113)
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:623)
at com.google.cloud.bigquery.BigQueryImpl$34.call(BigQueryImpl.java:1222)
at com.google.cloud.bigquery.BigQueryImpl$34.call(BigQueryImpl.java:1217)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:1216)
at com.google.cloud.bigquery.BigQueryImpl.getQueryResults(BigQueryImpl.java:1200)
at com.google.cloud.bigquery.Job$1.call(Job.java:332)
at com.google.cloud.bigquery.Job$1.call(Job.java:329)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.poll(RetryHelper.java:64)
at com.google.cloud.bigquery.Job.waitForQueryResults(Job.java:328)
at com.google.cloud.bigquery.Job.getQueryResults(Job.java:291)
at com.google.cloud.bigquery.BigQueryImpl.query(BigQueryImpl.java:1187)
at com.wepay.kafka.connect.bigquery.MergeQueries.mergeFlush(MergeQueries.java:158)
at com.wepay.kafka.connect.bigquery.MergeQueries.lambda$mergeFlush$1(MergeQueries.java:119)
... 3 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
GET https://www.googleapis.com/bigquery/v2/projects/car-project-prd/queries/3df89651-3567-4b28-88a8-0655a174574c?location=EU&maxResults=0&prettyPrint=false
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The job encountered an internal error during execution and was unable to complete successfully.",
"reason" : "jobInternalError"
} ],
"message" : "The job encountered an internal error during execution and was unable to complete successfully.",
"status" : "INVALID_ARGUMENT"
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:149)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:112)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:39)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:443)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1108)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:541)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:474)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:591)
at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:621)
... 20 more
The context is as follows:
I use the Kafka BigQuerySinkConnector to update partitioned tables in a dataset. The connector works perfectly for 1, 2 or 3 days and then fails with the trace above.
Can you tell me if you have an idea of the reason for this error, or do you have any leads I could follow to diagnose and solve the problem?
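One way to dig further (a minimal sketch, assuming local credentials can read jobs in the car-project-prd project; the job ID and EU location are taken from the failing getQueryResults URL in the trace) is to fetch the failed query job with the BigQuery Python client and inspect its error details:

from google.cloud import bigquery

# Assumption: credentials with access to car-project-prd are available locally.
client = bigquery.Client(project="car-project-prd")

# Job ID and location copied from the failing URL in the stack trace.
job = client.get_job("3df89651-3567-4b28-88a8-0655a174574c", location="EU")

print(job.state)         # e.g. "DONE"
print(job.error_result)  # the jobInternalError record, if any
print(job.errors)        # full list of error records for the job

If the error record consistently shows jobInternalError for the MERGE statements issued by MergeQueries.mergeFlush, that would suggest the failure is on the BigQuery side rather than in the connector configuration.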
I tried to get information from Google support, but all I found was the following message.
Google support for the error
My application creates thousands of "load jobs" daily to load data from Google Cloud Storage URIs into BigQuery, and only a few of them fail with the error:
"Finished with errors. Detail: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 7916072"
The application is written in Python and uses these libraries:
google-cloud-storage==1.42.0
google-cloud-bigquery==2.24.1
google-api-python-client==2.37.0
The load job is created by calling:
load_job = self._client.load_table_from_uri(
    source_uris=source_uri,
    destination=destination,
    job_config=job_config,
)
This method has a default parameter:
retry: retries.Retry = DEFAULT_RETRY,
so the job should automatically retry on such errors.
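For what it's worth, here is a minimal sketch of passing an explicit retry policy instead of relying on the implicit default (the parameter values are illustrative, and as far as I can tell this retry only covers the jobs.insert API request, not a job that already finished with errors):

from google.api_core import retry as retries
from google.api_core.exceptions import InternalServerError, ServiceUnavailable, TooManyRequests

# Illustrative values; the predicate lists typical transient API errors.
custom_retry = retries.Retry(
    predicate=retries.if_exception_type(InternalServerError, ServiceUnavailable, TooManyRequests),
    initial=1.0,
    maximum=60.0,
    multiplier=2.0,
    deadline=600.0,
)

load_job = self._client.load_table_from_uri(
    source_uris=source_uri,
    destination=destination,
    job_config=job_config,
    retry=custom_retry,
)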
ID of a specific job that finished with the error:
"load_job_id": "6005ab89-9edf-4767-aaf1-6383af5e04b6"
"load_job_location": "US"
After getting the error, the application recreates the job, but that doesn't help.
Subsequent failed job IDs:
5f43a466-14aa-48cc-a103-0cfb4e0188a2
43dc3943-4caa-4352-aa40-190a2f97d48d
43084fcd-9642-4516-8718-29b844e226b1
f25ba358-7b9d-455b-b5e5-9a498ab204f7
...
As mentioned in the error message, wait according to the back-off requirements described in the BigQuery Service Level Agreement, then try the operation again.
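A minimal sketch of that back-off pattern around the load_table_from_uri call from the question (the helper name, attempt count, and delays are illustrative, not taken from the SLA):

import time
from google.api_core import exceptions as gexc

def load_with_backoff(client, source_uri, destination, job_config, attempts=5):
    """Resubmit the load job with exponential back-off when it finishes with errors."""
    delay = 1.0
    for attempt in range(attempts):
        load_job = client.load_table_from_uri(
            source_uris=source_uri,
            destination=destination,
            job_config=job_config,
        )
        try:
            load_job.result()  # blocks until the job completes
            return load_job
        except gexc.GoogleAPIError:
            # internalError and similar backend reasons surface here
            print("attempt %d failed: %s %s" % (attempt + 1, load_job.job_id, load_job.error_result))
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("load job still failing after %d attempts" % attempts)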
If the error continues to occur and you have a support plan, please create a new GCP support case. Otherwise, you can open a new issue on the issue tracker describing your problem. You can also try to reduce the frequency of this error by using Reservations.
For more information about the error messages, you can refer to this document.
I am getting this error again and again.
"operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
For the above error I have already found something in this bug report, https://bugzilla.redhat.com/show_bug.cgi?id=1343027, i.e.:
Rabbit can join the RabbitMQ cluster if controller-0 was rebooted, came up, started all the resources, and only when everything works does controller-1 go for the reboot. In other words, everything should work when rebooting one of the controllers. If, for some reason, controller-1 reboots while controller-0 has not fully recovered after its reboot, things go wrong.
But I am not sure why the error log file is also showing me the error below:
=ERROR REPORT==== 29-Dec-2019::17:44:26 ===
Mnesia('messaging#rabbit-2'): ** ERROR ** (ignoring core) ** FATAL ** mnesia_monitor crashed:
{badarg, [{ets, lookup, [mnesia_decision, 'messaging#rabbit-3'], []},
          {mnesia_recover, has_mnesia_down, 1, [{file, "mnesia_recover.erl"}, {line, 299}]},
          {mnesia_monitor, check_mnesia_down, 2, [{file, "mnesia_monitor.erl"}, {line, 862}]},
          {mnesia_monitor, handle_info, 2, [{file, "mnesia_monitor.erl"}, {line, 579}]},
          {gen_server, try_dispatch, 4, [{file, "gen_server.erl"}, {line, 615}]},
          {gen_server, handle_msg, 5, [{file, "gen_server.erl"}, {line, 681}]},
          {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 240}]}]}
state: {state, <0.745.0>, [], [], true, [], undefined, [], []}
The error message says that one system process of the Mnesia DB, mnesia_monitor, is crashing when it tries to look up a value from an ETS table (mnesia_decision) owned by another system process of the DB, mnesia_recover. This can only happen if the ETS table no longer exists, that is, if mnesia_recover has stopped.
This error message doesn't say why mnesia_recover has stopped. If it crashed, there should be another error message about that event in the log. But it is also possible that the whole Mnesia application was stopping at that time, because the supervisor would stop mnesia_recover before mnesia_monitor. If that's the case, this error is just caused by bad timing: mnesia_monitor sees the messaging#rabbit-3 node coming up at a point when Mnesia on its own node is already shutting down.
I've built a fairly simple application combining Flask, Celery, and RabbitMQ using docker-compose, piecing together a few solutions I saw online. I'm having some issues trying to update task states to reflect whether a failure occurred. To keep error visibility at its highest, I've had my custom class raise only expected errors; otherwise, errors are handled at the Celery app level as follows (in celery_app.py):
import traceback
from celery import current_task  # celery_app, logger, BdsSummary and en_nlp are defined elsewhere in celery_app.py

@celery_app.task(name='celery_worker.summary')
def async_summary(data):
    """Background summary processing"""
    try:
        logger.info('Summarizing text')
        return BdsSummary(data, nlp=en_nlp).create_summary()
    except Exception as e:
        current_task.update_state(state='FAILURE', meta={'error_message': str(traceback.format_exc())})
        logger.exception('Text Summary worker raised: %r' % e)
I've been doing some negative testing against my application: when I pass it data that I know will throw an error (non-text data, for example) and then run r = requests.get('http://my.app.addr:8888/task/my-task-id'), I get {'status': 'SUCCESS', 'result': None}. I'm vexed as to why this is happening. Based on my admittedly limited understanding of Celery's behavior, it should update the status to show a traceback and exception class; why would it not do this?
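For reference, a minimal sketch of a status endpoint like the one queried above (flask_app is a hypothetical name, and celery_app is assumed to be the Celery app from celery_app.py; the real route and app objects may differ):

from flask import Flask, jsonify
from celery_app import celery_app  # assumption: the Celery app defined in celery_app.py

flask_app = Flask(__name__)  # hypothetical name for the Flask app

@flask_app.route('/task/<task_id>')
def task_status(task_id):
    # Look up whatever state/result the worker stored for this task id.
    result = celery_app.AsyncResult(task_id)
    return jsonify({'status': result.state, 'result': result.result})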
I am relatively new to Celery, so my understanding of the Canvas that they reference in the documentation is extremely basic. I'm just trying to provide some basic task failure information to the response/task. For context, when I give it proper input, I get back {'status': 'SUCCESS', 'result': {'summary': 'My Summary text here', 'num_sentences': 3, ...}}.
Any insight here would be much appreciated.
SQL Server 2005 Service Pack 2, version: 9.00.3042.00
All maintenance plans fail with the same error.
The details of the error are:
Execute Maintenance Plan
Execute maintenance plan. test7 (Error)
Messages
Execution failed. See the maintenance plan and SQL Server Agent job history logs for details.
The advanced information section shows the following:
Job 'test7.Subplan_1' failed. (SqlManagerUI)
Program Location:
at Microsoft.SqlServer.Management.SqlManagerUI.MaintenancePlanMenu_Run.PerformActions()
At this point the following appears in the Windows event log:
Event Type: Error
Event Source: SQLISPackage
Event Category: None
Event ID: 12291
Date: 28/05/2009
Time: 16:09:08
User: 'DOMAINNAME\username'
Computer: SQLSERVER4
Description:
Package "test7" failed.
and also this:
Event Type: Warning
Event Source: SQLSERVERAGENT
Event Category: Job Engine
Event ID: 208
Date: 28/05/2009
Time: 16:09:10
User: N/A
Computer: SQLSERVER4
Description:
SQL Server Scheduled Job 'test7.Subplan_1' (0x96AE7493BFF39F4FBBAE034AB6DA1C1F) - Status: Failed - Invoked on: 2009-05-28 16:09:02 - Message: The job failed. The Job was invoked by User 'DOMAINNAME\username'. The last step to run was step 1 (Subplan_1).
There are no entries in the SQL Agent log at all.
Probably no points for this, but you're likely to get more help on this over at ServerFault.com now that they are open.