BigQuery Load Job [invalid] Too many errors encountered - google-bigquery

I'm trying to insert data into BigQuery using the BigQuery API C# SDK.
I created a new load job with newline-delimited JSON data.
Results by input size:
100 lines of input: OK
250 lines of input: OK
500 lines of input: fails
2500 lines of input: fails
The error returned is:
"status": {
  "state": "DONE",
  "errorResult": {
    "reason": "invalid",
    "message": "Too many errors encountered. Limit is: 0."
  },
  "errors": [
    {
      "reason": "internalError",
      "location": "File: 0",
      "message": "Unexpected. Please try again."
    },
    {
      "reason": "invalid",
      "message": "Too many errors encountered. Limit is: 0."
    }
  ]
}
The same file loads fine when I use the bq command-line tool:
bq load --source_format=NEWLINE_DELIMITED_JSON dataset.datatable pathToJsonFile
Something seems to be wrong on the server side, or maybe in how I transmit the file, but we cannot get any log more detailed than "internal server error".
Does anyone have more information on this?
Thank you.

"Unexpected. Please try again." could either indicate that the contents of the files you provided had unexpected characters, or it could mean that an unexpected internal server condition occurred. There are several questions which might help shed some light on this:
Does this happen consistently, no matter how many times you retry?
Does it depend on specific lines in the file, or can you construct a simple upload file that doesn't trigger the error condition?
One option to potentially avoid these problems is to send the load job request with configuration.load.maxBadRecords higher than zero.
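As a rough sketch of both suggestions, the Python below first checks locally which lines of a newline-delimited JSON file are not valid JSON objects, and then builds a load job request body with configuration.load.maxBadRecords above zero. The function names, the default of 10 bad records, and the project/dataset/table values are placeholders invented for illustration; how the body is actually submitted (C# SDK, REST call, and so on) is left out.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, problem) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped; that is a choice of this sketch
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"not valid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "valid JSON, but not an object"))
    return problems


def build_load_job_body(project_id, dataset_id, table_id, max_bad_records=10):
    """Build a load job request body that tolerates some bad records."""
    if max_bad_records < 0:
        raise ValueError("max_bad_records must be >= 0")
    return {
        "configuration": {
            "load": {
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "destinationTable": {
                    "projectId": project_id,  # placeholder values
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "maxBadRecords": max_bad_records,
            }
        }
    }
```

Running find_invalid_lines over the 500-line file before uploading is a cheap way to tell a data problem apart from a server-side one. Raising maxBadRecords only hides bad rows, so it is worth checking the job's error list afterwards.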
Feel free to comment with more info and I can maybe update this answer.

Related

Big JSON record to BigQuery is not showing up

I wanted to upload large JSON records to BigQuery.
I am talking about JSON records of 1.5 MB each, with a complex schema nested up to seven levels deep.
For simplicity, I started by loading a file with a single record on one line.
At first I tried to have BigQuery autodetect my schema, but that resulted in a table that is unresponsive and cannot be queried, although it reports at least one record.
Then, assuming that my schema might be too hard for the loader to infer, I wrote the schema myself and tried to load my file with a single record.
At first I got a simple error with just "invalid".
bq load --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table my_single_json_record_file
Upload complete.
Waiting on bqjob_r5a4ce64904bbba9d_0000015e14aba735_1 ... (3s) Current status: DONE
BigQuery error in load operation: Error processing job 'invq-test:bqjob_r5a4ce64904bbba9d_0000015e14aba735_1': JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
Checking the job status gave me only the following:
"status": {
  "errorResult": {
    "location": "file-00000000",
    "message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
    "reason": "invalid"
  },
  "errors": [
    {
      "location": "file-00000000",
      "message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
      "reason": "invalid"
    }
  ],
  "state": "DONE"
},
Then, after a couple more attempts with new tables, the load started to succeed on the command line without reporting errors:
bq load --max_bad_records=1 --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table_4 my_single_json_record_file
Upload complete.
Waiting on bqjob_r368f1dff98600a4b_0000015e14b43dd5_1 ... (16s) Current status: DONE
with no error in the job status:
"statistics": {
  "creationTime": "1503585955356",
  "endTime": "1503585973623",
  "load": {
    "badRecords": "0",
    "inputFileBytes": "1494390",
    "inputFiles": "1",
    "outputBytes": "0",
    "outputRows": "0"
  },
  "startTime": "1503585955723"
},
"status": {
  "state": "DONE"
},
But no records are actually added to my tables.
I tried the same from the web UI, but the result is the same: the job is shown as completed (green), but no record is added.
Is there anything else I can do to check where the data is going? Maybe more logs?
I imagine I may be on the edge of the 2 MB JSON row size limit but, if so, shouldn't that be reported as an error?
Thanks in advance for the help!
EDIT:
It turned out that the complexity of my schema was the culprit here.
My JSON files were valid, but my complex schema had several errors.
I had to simplify the schema anyway, because I got a new batch of data where single JSON instances were more than 30 MB, so I had to restructure the data in a more relational way, with smaller rows to insert into the database.
Funnily enough, once the schema was split across multiple entities (ergo, simplified), the actual errors and inconsistencies in the schema started to show up in the errors returned, and they were easier to fix. (Mostly it was new, undocumented nested data that I was not aware of... but still my bad.)
The lesson here is that when a table schema is too long (I didn't test exactly how long is too long), BigQuery just hides behind a generic "too many errors" report.
At that point you should consider simplifying the schema (or structure) of your data.
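One way to surface this kind of mismatch before loading is to compare a record against the schema locally. The sketch below assumes the schema is in the JSON form used for table schemas (a list of fields with "name", "type", and nested "fields" for RECORD types) and reports dotted paths that appear in the record but not in the schema. The function name is hypothetical, names are compared case-sensitively for simplicity, and the check covers only missing fields, not type mismatches.

```python
def find_unknown_fields(record, schema, prefix=""):
    """Return sorted dotted paths present in `record` but absent from `schema`."""
    known = {field["name"]: field for field in schema}
    unknown = set()
    for key, value in record.items():
        path = prefix + key
        field = known.get(key)
        if field is None:
            unknown.add(path)
            continue
        if field.get("type", "").upper() in ("RECORD", "STRUCT"):
            # A repeated record arrives as a list; treat a single one the same way.
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict):
                    unknown.update(
                        find_unknown_fields(child, field.get("fields", []), path + ".")
                    )
    return sorted(unknown)


if __name__ == "__main__":
    schema = [
        {"name": "id", "type": "STRING"},
        {"name": "items", "type": "RECORD", "mode": "REPEATED",
         "fields": [{"name": "sku", "type": "STRING"}]},
    ]
    record = {"id": "1", "items": [{"sku": "a", "color": "red"}], "extra": 1}
    print(find_unknown_fields(record, schema))
```

Running this over a sample of records gives a list of undocumented nested fields to add to the schema or strip from the data before the load job is submitted.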

com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found

When using a Talend BigQuery input component (BigQuery Java API) to read from BigQuery, I get the following error for a long-running job:
Exception in component tBigQueryInput_4
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Table rand-cap:_f000fcf374688fc5e7da50a4c0c04ba228d993c3.anon0849eba05949a62962f218a0433d6ee82bf13a7b",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Table rand-cap:_f000fcf374688fc5e7da50a4c0c04ba228d993c3.anon0849eba05949a62962f218a0433d6ee82bf13a7b"
}
Is this because the "temporary" table that bq creates when querying is no longer available after 24 hours? Or is it because a rate limit was exceeded, since I am querying a large table?
In either case, how can I find more details on this error, and what steps should I take to prevent it?
Thank you!
This seems to be a problem in Talend; other users describe the same issue: https://www.talendforge.org/forum/viewtopic.php?id=44734
BigQuery has an allowLargeResults property, but it is not available in tBigQueryInput.
Hi there - I am currently using Talend Open Studio v6.1.1 and this issue still exists.

How to get useful BigQuery errors

I have a job that I run with jobs().insert().
Currently the job fails with:
2014-11-11 11:19:15,937 - ERROR - bigquery - invalid: Could not convert value to string
Considering I have 500+ columns, I find this error message useless and pretty pathetic. What can I do to get more detailed errors from BigQuery?
The structured error return dictionary contains three elements: a "reason", a "location", and a "message". From the log line you include, it looks like only the message is logged.
Here's an example error return from a CSV import with data that doesn't match the target table schema:
"errors": [
  {
    "reason": "invalid",
    "location": "File: 0 / Line:2 / Field:1",
    "message": "Value cannot be converted to expected type."
  },
  ...
Similar errors are returned from JSON imports with data that doesn't match the target table schema.
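To get all three elements into the log, something along these lines could work. It is a small sketch that assumes the job's "status" object has already been fetched as a Python dict; the function names and the output format are made up for illustration.

```python
def describe_error(error):
    """Render one structured error with its reason, location, and message."""
    return "reason={} location={} message={}".format(
        error.get("reason", "?"),
        error.get("location", "?"),
        error.get("message", "?"),
    )


def format_job_errors(status):
    """Return one log line per error found in a job's status dictionary."""
    lines = []
    final = status.get("errorResult")
    if final:
        lines.append("errorResult: " + describe_error(final))
    for index, error in enumerate(status.get("errors", [])):
        lines.append("errors[{}]: {}".format(index, describe_error(error)))
    return lines
```

Logging every entry of the "errors" list, not just the final errorResult, matters here because the location (file, line, field) is what narrows a failure down among 500+ columns.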
I hope this helps!

Frequent 503 errors raised from BigQuery Streaming API

Streaming data into BigQuery keeps failing with the following error, which has been occurring more frequently recently:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
{
  "code" : 503,
  "errors" : [ {
    "domain" : "global",
    "message" : "Connection error. Please try again.",
    "reason" : "backendError"
  } ],
  "message" : "Connection error. Please try again."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:312)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1049)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
Relevant question references:
Getting high rate of 503 errors with BigQuery Streaming API
BigQuery - BackEnd error when loading from JAVA API
We (the BigQuery team) are looking into your report of increased connection errors. According to our internal monitoring, there hasn't been a global spike in connection errors in the last several days. However, that doesn't mean that your tables, specifically, weren't affected.
Connection errors can be tricky to chase down, because they can be caused by errors before requests reach the BigQuery servers or after responses leave them. The more information you can provide, the easier it is for us to diagnose the issue.
The best practice for streaming input is to handle temporary errors like this by retrying the request. It can be a little tricky, since when you get a connection error you don't actually know whether the insert succeeded. If you include a unique insertId with your data (see the documentation here), you can safely resend the request (within the deduplication window, which I think is 15 minutes) without worrying that the same row will get added multiple times.
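The retry pattern described above might look roughly like the following. This is a sketch, not the client library's own retry logic: `send` is a hypothetical callable standing in for whatever actually issues the streaming insert request, `TransientError` is a stand-in for a retryable failure such as the 503 backendError, and the backoff numbers are arbitrary. The important part is that each insertId is generated once and reused on every retry.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Stand-in for a retryable failure such as a 503 backendError."""


def with_insert_ids(rows):
    """Wrap each row with a unique insertId, in the insertAll request shape."""
    return [{"insertId": str(uuid.uuid4()), "json": row} for row in rows]


def stream_with_retry(send, rows, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Send rows, retrying transient failures with exponential backoff."""
    # Build the payload once so every retry carries the same insertIds.
    payload = {"rows": with_insert_ids(rows)}
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))  # add jitter
```

If the ids were regenerated inside the loop, a request that succeeded on the server but failed on the way back would be inserted a second time on retry, which is exactly the case the insertId is meant to cover.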

Viewing enqueued messages with hawtio

I'm trying to use hawtio to view some enqueued topics in ActiveMQ.
But when I click on view messages, I get a blank list as output (even though I know the contents are not blank).
This is the error message I get when I browse my localhost at /8080/hawtio/, so I'm guessing it is related to the problem:
Failed to get a response! { "error_type": "javax.management.InstanceNotFoundException", "error": "javax.management.InstanceNotFoundException : org.fusesource.insight:type=LogQuery", "status": 404, "request": { "operation": "logResultsSince", "mbean": "org.fusesource.insight:type=LogQuery", "arguments": [ 0 ], "type": "exec" }, "stacktrace": "javax.management.InstanceNotFoundException: org.fusesource.insight:type=LogQuery\n\tat com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)\n\tat com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1375)\n\tat com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(JmxMBeanServer.java:920)\n\tat org.jolokia.handler.ExecHandler.extractMBeanParameterInfos(ExecHandler.java:167)\n\tat org.jolokia.handler.ExecHandler.extractOperationTypes(ExecHandler.java:133)\n\tat org.jolokia.handler.ExecHandler.doHandleRequest(ExecHandler.java:84)\n\tat org.jolokia.handler.ExecHandler.doHandleRequest(ExecHandler.java:40)\n\tat org.jolokia.handler.JsonRequestHandler.handleRequest(JsonRequestHandler.java:89)\n\tat org.jolokia.backend.MBeanServerExecutorLocal.handleRequest(MBeanServerExecutorLocal.java:109)\n\tat org.jolokia.backend.MBeanServerHandler.dispatchRequest(MBeanServerHandler.java:102)\n\tat org.jolokia.backend.LocalRequestDispatcher.dispatchRequest(LocalRequestDispatcher.java:91)\n\tat org.jolokia.backend.BackendManager.callRequestDispatcher(BackendManager.java:388)\n\tat org.jolokia.backend.BackendManager.handleRequest(BackendManager.java:150)\n\tat org.jolokia.http.HttpRequestHandler.executeRequest(HttpRequestHandler.java:197)\n\tat org.jolokia.http.HttpRequestHandler.handlePostRequest(HttpRequestHandler.java:131)\n\tat org.jolokia.jvmagent.JolokiaHttpHandler.executePostRequest(JolokiaHttpHandler.java:195)\n\tat org.jolokia.jvmagent.JolokiaHttpHandler.handle(JolokiaHttpHandler.java:143)\n\tat 
com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)\n\tat sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)\n\tat com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)\n\tat sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:677)\n\tat com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)\n\tat sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:649)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:724)\n" }
Incidentally, ActiveMQ doesn't support browsing of topics, only queues.
You'll need to upgrade to hawt.io 1.2M27, which fixes this issue. 1.2M26 assumed the log query was always installed; M27 removed it from the default.
Also, we don't yet support all ActiveMQ message types. There's an open issue for that: https://github.com/hawtio/hawtio/issues/655
So if your messages are not text messages, this could be why you're not seeing the message body.