One of my jobs keeps failing, and when I looked into why (by requesting job details), I get the following output:
status": {
"errorResult": {
"location": "gs://sf_auto/Datastore Mapper modules.models.userData/15716706166748C8426AD/output-46",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
},
"errors": [
{
"location": "gs://sf_auto/Datastore Mapper modules.models.userData/15716706166748C8426AD/output-46",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
}
],
"state": "DONE"
Problem is, it doesn't help at all, and I need more details. Is there any way to understand which column or attribute caused the failure? Is there any way to get more information?
Edit: Additional details
We're running a MapReduce job on App Engine to transfer our Datastore data from App Engine to BigQuery.
The files are stored on Google Cloud Storage.
It's creating a brand new table instead of adding to an existing one.
Update #2
I played around with the query, trying lots of things as well as adjusting the schema, and I've narrowed down the problem to the uuid. For some reason this type of data messes everything up:
"uuid": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
The schema defines it as a String
OK, after loads of debugging I found the error... in the newline-delimited JSON file we had two attributes that were similar:
uuid: "XXX..."
uuId: "XXX..."
This has been there for a while, so I think some change within BigQuery started to require that keys be unique regardless of capitalization. Will test some more and confirm!
A recent change made JSON data loads case-insensitive with respect to field names, to be consistent with how SQL queries treat field names. I have opened a work item to track improving the error message for this case.
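For anyone hitting the same thing, one way to check export files for this kind of collision before loading is to scan each line of the newline-delimited JSON for top-level keys that differ only in capitalization. A minimal Python sketch (the file name is just a placeholder):

import json
from collections import defaultdict

# Scan a newline-delimited JSON file for top-level keys that collide when
# compared case-insensitively (e.g. "uuid" vs "uuId").
def find_case_collisions(path):
    collisions = set()
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            seen = defaultdict(set)
            for key in record:
                seen[key.lower()].add(key)
            for variants in seen.values():
                if len(variants) > 1:
                    collisions.add((line_no, tuple(sorted(variants))))
    return collisions

# Print every line whose keys differ only by capitalization.
for line_no, variants in sorted(find_case_collisions("output-46.ndjson")):
    print(f"line {line_no}: colliding keys {variants}")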
Related
I'm creating a project where I communicate with the GitLab API, and I'm currently thinking about how to handle errors. Typically, when an error occurs, you would show a notification in the frontend (e.g. https://codeseven.github.io/toastr/demo.html).
But according to https://docs.gitlab.com/ee/api/#data-validation-and-error-reporting I sometimes get back arrays of errors. So I thought, "OK, simply show multiple error notifications", but then I tried to create a project with the same name twice and got back the following errors:
"message": {
"project_namespace.name": [
"has already been taken"
],
"name": [
"has already been taken"
],
"path": [
"has already been taken"
]
}
By the way, I only specify the project name and namespace_id. So path and name are the same.
In this case I would show three error messages with "has already been taken", which would be a bit confusing in the frontend: the same error three times, and even without the project name (has already been taken instead of <Projectname> has already been taken). So I wonder how other people handle this behavior. Any idea how to handle this? Would you filter it so that it shows has already been taken only once?
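One option is to deduplicate on the error text before rendering notifications. A minimal sketch of that idea (in Python for brevity; the helper name and output formatting are my own, and the same approach carries over to a JavaScript frontend):

# Collapse GitLab's per-attribute error arrays into one notification per
# distinct error message, listing which attributes it applied to.
def build_notifications(message: dict) -> list[str]:
    grouped: dict[str, list[str]] = {}
    for attribute, errors in message.items():
        for error in errors:
            grouped.setdefault(error, []).append(attribute)
    return [f"{', '.join(attrs)}: {error}" for error, attrs in grouped.items()]

payload = {
    "project_namespace.name": ["has already been taken"],
    "name": ["has already been taken"],
    "path": ["has already been taken"],
}
print(build_notifications(payload))
# ['project_namespace.name, name, path: has already been taken']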
I am trying to publish data directly from Pub/Sub to BigQuery.
I have created a topic with a schema.
I have created a table.
But when I create the subscription, I get an error: "Request contains an invalid argument".
gcloud pubsub subscriptions create check-me.httpobs --topic=check-me.httpobs --bigquery-table=agilicus:checkme.httpobs --write-metadata --use-topic-schema
ERROR: Failed to create subscription [projects/agilicus/subscriptions/check-me.httpobs]: Request contains an invalid argument.
ERROR: (gcloud.pubsub.subscriptions.create) Failed to create the following: [check-me.httpobs].
There's not really a lot of diagnostics I can do here.
Is there any worked-out example that shows this working? What am I doing wrong to get this error?
Side note: it's really a pain to have to create the BQ schema in its native JSON format and then create the message schema in Avro format. Similar, but different, and no conversion tools that I can find.
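On that side note, a rough conversion sketch is possible; the following is not a complete tool, just an illustration of mapping a BigQuery JSON schema (e.g. the output of bq show --schema --format=prettyjson) to an Avro record schema. The type mapping, record name, and file names are my own choices:

import json

# Partial BigQuery-to-Avro type map; extend for NUMERIC, DATE, etc. as needed.
BQ_TO_AVRO = {
    "STRING": "string", "BYTES": "bytes",
    "INTEGER": "long", "INT64": "long",
    "FLOAT": "double", "FLOAT64": "double",
    "BOOLEAN": "boolean", "BOOL": "boolean",
    "TIMESTAMP": "string",  # or an Avro logicalType, depending on your needs
}

def bq_field_to_avro(field):
    if field["type"] in ("RECORD", "STRUCT"):
        avro_type = {
            "type": "record",
            "name": field["name"] + "_record",
            "fields": [bq_field_to_avro(f) for f in field["fields"]],
        }
    else:
        avro_type = BQ_TO_AVRO.get(field["type"], "string")
    mode = field.get("mode", "NULLABLE")
    if mode == "REPEATED":
        avro_type = {"type": "array", "items": avro_type}
    elif mode == "NULLABLE":
        avro_type = ["null", avro_type]
    return {"name": field["name"], "type": avro_type}

with open("bq_schema.json") as f:  # BigQuery schema as a JSON array of fields
    bq_schema = json.load(f)

avro_schema = {
    "type": "record",
    "name": "httpobs",  # placeholder record name
    "fields": [bq_field_to_avro(f) for f in bq_schema],
}
print(json.dumps(avro_schema, indent=2))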
If I run with --log-http, it doesn't really enlighten:
{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT"
  }
}
-- update:
switched to protobuf, same problem.
https://gist.github.com/donbowman/5ea8f8d8017493cbfa3a9e4f6e736bcc has the details.
gcloud version
Google Cloud SDK 404.0.0
alpha 2022.09.23
beta 2022.09.23
bq 2.0.78
bundled-python3-unix 3.9.12
core 2022.09.23
gsutil 5.14
I have confirmed all the fields are present and in the correct format, as per https://github.com/googleapis/googleapis/blob/master/google/pubsub/v1/pubsub.proto#L639
specifically:
{"ackDeadlineSeconds": 900, "bigqueryConfig": {"dropUnknownFields": true, "table": "agilicus:checkme.httpobs", "useTopicSchema": true, "writeMetadata": true}, "name": "projects/agilicus/subscriptions/check-me.httpobs", "topic": "projects/agilicus/topics/check-me.httpobs"}
I have also tried using the API Explorer to post this, same effect.
I have also tried using the Python example:
https://cloud.google.com/pubsub/docs/samples/pubsub-create-bigquery-subscription#pubsub_create_bigquery_subscription-python
to create it. Same error, with slightly more info (IP, grpc_status 3):
debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B2607:f8b0:400b:807::200a%5D:443 {created_time:"2022-10-04T20:54:44.600831924-04:00", grpc_status:3, grpc_message:"Request contains an invalid argument."}"
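For comparison, the linked Python sample reduces to roughly the sketch below. Resource names are copied from the question; the dotted agilicus.checkme.httpobs table string is my assumption, since the BigQueryConfig.table field is documented as {project}.{dataset}.{table}, which differs from the colon form passed above.

from google.cloud import pubsub_v1

project_id = "agilicus"
topic_id = "check-me.httpobs"
subscription_id = "check-me.httpobs"
# BigQueryConfig.table uses dots: "{project}.{dataset}.{table}"
bigquery_table_id = "agilicus.checkme.httpobs"

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

# Mirror the flags from the gcloud command above.
bigquery_config = pubsub_v1.types.BigQueryConfig(
    table=bigquery_table_id,
    use_topic_schema=True,
    write_metadata=True,
    drop_unknown_fields=True,
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
    print(f"Created BigQuery subscription: {subscription}")

If the invalid-argument error persists, it may also be worth checking that the table schema is compatible with the topic schema and, since --write-metadata is set, that the table has the metadata columns the Pub/Sub docs describe (subscription_name, message_id, publish_time, data, attributes).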
We use an Azure Data Factory copy pipeline to transfer data from REST APIs to an Azure SQL Database, and it is doing some strange things. Because we loop over a set of APIs that need to be transferred, the mapping in the copy activity is empty.
But for one API the automatic mapping goes wrong. The destination table is created with all the needed columns and correct datatypes based on the received metadata, but when we run the pipeline for that specific API, the following message is shown:
{ "errorCode": "2200", "message": "ErrorCode=SchemaMappingFailedInHierarchicalToTabularStage,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to process hierarchical to tabular stage, error message: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks.\r\nParameter name: ticks,Source=Microsoft.DataTransfer.ClientLibrary,'", "failureType": "UserError", "target": "Copy data1", "details": [] }
As a test, we did the mapping for that API manually, using the "Import Schema" option on the Mapping page. There we see that all the fields are correctly mapped. We executed the pipeline again using that mapping and everything worked fine.
But of course we don't want to use a manual mapping, because the pipeline is used in a loop for different APIs as well.
When I upsert a row that mismatches the schema, I get a PartialFailureError along with a message, e.g.:
[
  {
    errors: [
      {
        message: 'Repeated record added outside of an array.',
        reason: 'invalid'
      }
    ],
    ...
  }
]
However, for large rows this isn't sufficient, because I have no idea which field is the one causing the error. The bq command does report the malformed field.
Is there either a way to configure or access the name of the offending field, or can this be added to the API endpoint?
Please see this GitHub issue: https://github.com/googleapis/nodejs-bigquery/issues/70 . Apparently the Node.js client library is not reading the location field from the API response, so it's not able to return it to the caller.
A workaround that worked for me: I copied the JSON payload into my Postman client and manually sent the request to the REST API (let me know if you need more details on how to do it).
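The same workaround can be scripted; a rough Python sketch that calls the tabledata.insertAll REST endpoint directly so the insertErrors[].errors[].location field (the offending column) is visible. Dataset, table, and row payload below are placeholders:

import google.auth
import google.auth.transport.requests
import requests

# Obtain default credentials and a fresh access token.
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/bigquery"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    "https://bigquery.googleapis.com/bigquery/v2/"
    f"projects/{project}/datasets/my_dataset/tables/my_table/insertAll"
)
body = {"rows": [{"json": {"some_field": "some value"}}]}

resp = requests.post(
    url,
    json=body,
    headers={"Authorization": f"Bearer {credentials.token}"},
)
# Each insertErrors entry carries errors[].location naming the bad field.
print(resp.json().get("insertErrors"))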
I updated our BigQuery client to the new Google API client, and suddenly I started seeing this error when uploading via JSON:
"errors": [
{
"reason": "invalid",
"location": "Offset:0 / Line:1 / Column:159 / Field:q1",
"message": "Could not convert value to string"
},
Job reference:
"jobReference": {
"projectId": "dot-metrics",
"jobId": "job_8e0511a40c1845cca5717daf78b605f7"
},
This worked before we updated our client; afterwards it just stopped working, so it must be some change inside BigQuery. Any help is appreciated!
This looks like a regression in a recent release that broke importing null values in JSON. A fix should be forthcoming.
Note that if you drop the null fields (i.e. instead of "field": null you just don't include "field" at all), it should continue to work.
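For reference, a small sketch of that workaround on the client side, stripping null fields before serializing rows to newline-delimited JSON (the sample rows are made up):

import json

# Recursively drop keys/items whose value is None so that "field": null
# never appears in the newline-delimited JSON sent to BigQuery.
def drop_nulls(value):
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value if v is not None]
    return value

rows = [{"q1": "answer", "q2": None, "nested": {"a": 1, "b": None}}]
ndjson = "\n".join(json.dumps(drop_nulls(r)) for r in rows)
print(ndjson)  # {"q1": "answer", "nested": {"a": 1}}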