How to get useful BigQuery errors - google-bigquery

I have a job that I run with jobs().insert()
Currently I have the job failing with:
2014-11-11 11:19:15,937 - ERROR - bigquery - invalid: Could not convert value to string
Considering I have 500+ columns, I find this error message useless and pretty pathetic. What can I do to get better, more detailed error information from BigQuery?

The structured error return dictionary contains 3 elements, a "reason", a "location", and a "message". From the log line you include, it looks like only the message is logged.
Here's an example error return from a CSV import with data that doesn't match the target table schema:
"errors": [
{
"reason": "invalid",
"location": "File: 0 / Line:2 / Field:1",
"message": "Value cannot be converted to expected type."
},
...
Similar errors are returned from JSON imports with data that doesn't match the target table schema.
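As a minimal sketch of logging all three fields rather than only the message, assuming `job` is the dictionary returned by jobs().get() or jobs().insert(). The function names and the "bigquery" logger name are illustrative and not part of any BigQuery client library:

```python
import logging

logger = logging.getLogger("bigquery")


def format_job_errors(job):
    """Return one readable line per error found in a job resource dict."""
    status = job.get("status", {})
    # "errors" lists every problem encountered; "errorResult" is the final failure.
    entries = list(status.get("errors", []))
    final = status.get("errorResult")
    if final and final not in entries:
        entries.insert(0, final)
    lines = []
    for err in entries:
        lines.append(
            "reason=%s location=%s message=%s"
            % (err.get("reason"), err.get("location", "<none>"), err.get("message"))
        )
    return lines


def log_job_errors(job):
    """Log every structured error, including its location, at ERROR level."""
    for line in format_job_errors(job):
        logger.error(line)
```

With 500+ columns, the "location" value (file, line, and field) is the part that narrows down which column failed.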
I hope this helps!

Related

Bigquery Can't Upload RECORD Data Type File

I want to upload my Sales data into BigQuery. The file contains a STRUCT (RECORD) data type, so I have to upload my file as ndjson.
However, once I do that, I get the error message: "Failed to create table: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details."
The ndjson I'm trying to upload looks something like this:
{"SalesOrder": "SalesOrder", "CreatedBy": "CreatedBy", "CustomerID": "CustomerID", "SalesChannel": "SalesChannel", "SalesChannelType": "SalesChannelType", "CreatedDate": "CreatedDate", "WebDiscount": "WebDiscount", "RevenueFromPlatform": "RevenueFromPlatform", "DeliveryFee": "DeliveryFee", "PriceBeforeVAT": "PriceBeforeVAT", "VAT": "VAT", "DeliveryChannel": "DeliveryChannel", "DeliveryStatus": "DeliveryStatus", "Payment.Status": "Payment.Status", "Payment.Channel": "Payment.Channel", "Payment.Value": "Payment.Value", "Payment.Date": "Payment.Date", "Product.SKU": "Product.SKU", "Product.UnitsSold": "Product.UnitsSold", "Product.UnitPrice": "Product.UnitPrice", "Product.Discount": "Product.Discount", "Product.Price": "Product.Price", "": ""}
How do I solve this issue? Thanks in advance
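The thread does not confirm a cause, but two things in the sample line stand out: keys such as "Payment.Status" are flat dotted names rather than nested objects, and there is an empty key "". A RECORD column is matched against a nested JSON object, and an empty key cannot map to any column. The sketch below reshapes such a record under those assumptions; `nest_dotted_keys` and `to_ndjson` are hypothetical helpers, and a REPEATED record would need a list instead of a single object:

```python
import json


def nest_dotted_keys(record):
    """Turn {"Payment.Status": x} into {"Payment": {"Status": x}}; drop empty keys."""
    nested = {}
    for key, value in record.items():
        if not key.strip():
            continue  # an empty key cannot map to any column
        parts = key.split(".")
        target = nested
        # Walk (or create) one nested dict per dotted prefix.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


def to_ndjson(records):
    """Serialise records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(nest_dotted_keys(r)) for r in records)
```

Note also that every value in the sample line equals its key, which suggests the header row may have been converted as if it were data; such a row would fail any non-string column.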

Azure data factory V2 copy data issue - error code: 2200 An item with the same key has already been added

I am trying to copy data from a CSV file into Azure SQL, but I am getting a unique error only during the deployment of the pipeline. I am using a normal Copy Data activity.
{
"errorCode": "2200",
"message": "ErrorCode=InvalidParameter,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The
value of the property '' is invalid: 'An item with the same key has
already been
added.'.,Source=,''Type=System.ArgumentException,Message=An item with
the same key has already been added.,Source=mscorlib,'",
"failureType": "UserError",
"target": "Copy data1",
"details": [] }
Kindly help to solve
The above error is due to a duplicate column/header in the CSV file. The exception message, "An item with the same key has already been added", suggests that the column names are added to a dictionary as keys, so the second occurrence of a header throws the exception.
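A quick way to check a file for this before deploying the pipeline is to count the header names. This is a minimal sketch in Python, independent of Data Factory; the function name is illustrative, and comparing names trimmed and case-insensitively is an assumption about how the names collide:

```python
import csv
import io
from collections import Counter


def duplicate_headers(csv_text, delimiter=","):
    """Return header names that appear more than once in the first CSV row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])
    # Normalise names so that "Name" and "name " count as the same header.
    counts = Counter(name.strip().lower() for name in header)
    return sorted(name for name, n in counts.items() if n > 1)
```

Two or more blank header cells are reported as a duplicate of the empty name, which is worth checking for as well.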

Big JSON record to BigQuery is not showing up

I wanted to try uploading a big JSON record object to BigQuery.
I am talking about JSON records of 1.5 MB each, with a complex schema nested up to seven levels deep.
For simplicity, I started by loading a file with a single record on one line.
At first I tried to have BigQuery autodetect my schema, but that resulted in a table that is unresponsive and that I cannot query, although it says it has at least one record.
Then, assuming that my schema could be too hard for the loader to infer, I tried writing the schema myself and loading my file with a single record.
At first I got a simple error with just "invalid".
bq load --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table my_single_json_record_file
Upload complete.
Waiting on bqjob_r5a4ce64904bbba9d_0000015e14aba735_1 ... (3s) Current status: DONE
BigQuery error in load operation: Error processing job 'invq-test:bqjob_r5a4ce64904bbba9d_0000015e14aba735_1': JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
Checking the job status for the error gave me only the following:
"status": {
"errorResult": {
"location": "file-00000000",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
},
"errors": [
{
"location": "file-00000000",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
}
],
"state": "DONE"
},
Then, after a couple more attempts creating new tables, it actually started to succeed on the command line, without reporting errors:
bq load --max_bad_records=1 --source_format=NEWLINE_DELIMITED_JSON invq_data.test_table_4 my_single_json_record_file
Upload complete.
Waiting on bqjob_r368f1dff98600a4b_0000015e14b43dd5_1 ... (16s) Current status: DONE
with no error on the status checker...
"statistics": {
"creationTime": "1503585955356",
"endTime": "1503585973623",
"load": {
"badRecords": "0",
"inputFileBytes": "1494390",
"inputFiles": "1",
"outputBytes": "0",
"outputRows": "0"
},
"startTime": "1503585955723"
},
"status": {
"state": "DONE"
},
But no actual records are added to my tables.
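A job like the one above (state DONE, no errorResult, input bytes read, but "outputRows": "0") can be flagged programmatically. The sketch below assumes `job` is the job resource dictionary shown above; the function name and the "suspicious" flag are illustrative, not part of the BigQuery API:

```python
def load_summary(job):
    """Summarise a finished load job resource (a dict as returned by jobs.get)."""
    status = job.get("status", {})
    load = job.get("statistics", {}).get("load", {})
    # The API returns these counters as strings, e.g. "outputRows": "0".
    output_rows = int(load.get("outputRows", 0))
    bad_records = int(load.get("badRecords", 0))
    input_bytes = int(load.get("inputFileBytes", 0))
    failed = "errorResult" in status
    return {
        "failed": failed,
        "output_rows": output_rows,
        "bad_records": bad_records,
        # Bytes were read, nothing was written, and no error was reported.
        "suspicious": (not failed) and input_bytes > 0 and output_rows == 0,
    }
```

Checking this after every load turns a silent empty result into something a script can alert on.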
I tried to do the same from the web UI, but the result is the same: green on the completed job, but no actual record added.
Is there something else I can do to check where the data is going? Maybe some more logs?
I can imagine that maybe I am on the edge of the 2 MB JSON row size limit but, if so, should this not be reported as an error?
Thanks in advance for the help!!
EDIT:
It turned out that the complexity of my schema was the devil here.
My JSON files were valid, but my complex schema had several errors.
I had to simplify the schema anyway, because I got a new batch of data in which single JSON instances were more than 30 MB, so I had to restructure the data in a more relational way, with smaller rows to insert into the database.
Funnily enough, once the schema was split across multiple entities (that is, simplified), the actual errors and inconsistencies in the schema started to show up in the errors returned, and they were easier to fix. (Mostly it was new, undocumented nested data that I was not aware of... but still my bad.)
The lesson here is that when a table schema is too long (I didn't test how long exactly is too long), BigQuery just hides behind reporting that there are too many errors to show.
But that is the point at which you should consider simplifying the schema (or structure) of your data.

com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found

When using a Talend BigQuery input component (BQ Java API) to read from BigQuery, I get the following error (for a long-running job):
Exception in component tBigQueryInput_4
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "Not found: Table rand-cap:_f000fcf374688fc5e7da50a4c0c04ba228d993c3.anon0849eba05949a62962f218a0433d6ee82bf13a7b",
"reason" : "notFound"
} ],
"message" : "Not found: Table rand-cap:_f000fcf374688fc5e7da50a4c0c04ba228d993c3.anon0849eba05949a62962f218a0433d6ee82bf13a7b"
}
Is this because the "temporary" table that bq creates to hold query results is no longer available after 24 hours? Or is it because a rate limit was exceeded, since I am querying a large table?
In either case, how can I find more details on this error, and what steps should I take to prevent it?
Thank you !
This seems to be a problem in Talend; other users describe the same issue: https://www.talendforge.org/forum/viewtopic.php?id=44734
Google BigQuery has a property, allowLargeResults, but it is not exposed in tBigQueryInput.
Hi there - I am currently using Talend Open Studio v6.1.1 and this issue still exists.

BigQuery Load Job [invalid] Too many errors encountered

I'm trying to insert data into BigQuery using the BigQuery API C# SDK.
I created a new job with newline-delimited JSON data.
When I use:
100 lines of input: OK
250 lines of input: OK
500 lines of input: fails
2500 lines of input: fails
The error encountered is:
"status": {
"state": "DONE",
"errorResult": {
"reason": "invalid",
"message": "Too many errors encountered. Limit is: 0."
},
"errors": [
{
"reason": "internalError",
"location": "File: 0",
"message": "Unexpected. Please try again."
},
{
"reason": "invalid",
"message": "Too many errors encountered. Limit is: 0."
}
]
}
The file loads fine when I use the bq tool with the command:
bq load --source_format=NEWLINE_DELIMITED_JSON dataset.datatable pathToJsonFile
Something seems to be wrong on the server side, or maybe when I transmit the file, but we cannot get any more log detail than "internal server error".
Does anyone have more information on this?
Thank you
"Unexpected. Please try again." could either indicate that the contents of the files you provided had unexpected characters, or it could mean that an unexpected internal server condition occurred. There are several questions which might help shed some light on this:
does this consistently happen no matter how many times you retry?
does this directly depend on the lines in the file, or can you construct a simple upload file which doesn't trigger the error condition?
One option to potentially avoid these problems is to send the load job request with configuration.load.maxBadRecords higher than zero.
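As a rough sketch of where that setting lives, the function below builds a jobs.insert request body with configuration.load.maxBadRecords set. It is written in Python rather than C# for brevity; the function name, the project/dataset/table arguments, and the use of sourceUris (loading from Cloud Storage rather than uploading the file with the request) are assumptions for illustration:

```python
def build_load_job(project_id, dataset_id, table_id, source_uris, max_bad_records=0):
    """Build a jobs.insert request body for a newline-delimited JSON load."""
    if max_bad_records < 0:
        raise ValueError("max_bad_records must be >= 0")
    return {
        "configuration": {
            "load": {
                "sourceUris": list(source_uris),
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Rows that fail to parse are skipped up to this count
                # instead of failing the whole job ("Limit is: 0" by default).
                "maxBadRecords": max_bad_records,
            }
        }
    }
```

Raising the limit hides bad rows rather than fixing them, so it is worth comparing badRecords in the job statistics against the expected row count afterwards.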
Feel free to comment with more info and I can maybe update this answer.