I am experimenting with the schema auto-detection feature in BigQuery and am currently running into issues when updating the schema on my table.
What I have done so far:
I manually created a dataset and table in BigQuery.
Executed my first bq load command (works perfectly fine):
bq --location=${LOCATION} load --autodetect --source_format=${FORMAT} ${DATASET}.${TABLE} ${PATH_TO_SOURCE}
I then try to append a new JSON object that introduces a new field, expecting it to update the current schema.
Executed my second bq load command:
bq --location=${LOCATION} load --autodetect --schema_update_option=ALLOW_FIELD_ADDITION --source_format=${FORMAT} ${DATASET}.${TABLE} ${PATH_TO_SOURCE}
Throws an error:
Error in query string.
Error processing job. Schema has no fields.
I thought that when the --autodetect flag is enabled, the bq load command would not require a schema for the load job. Has anyone encountered this issue before?
First object:
{
"chatSessionId": "123",
"chatRequestId": "1234",
"senderType": "CUSTOMER",
"senderFriendlyName": "Player"
}
Second Object:
{
"chatSessionId": "456",
"chatRequestId": "5678",
"senderType": "CUSTOMER",
"senderFriendlyName": "Player",
"languageCode": "EN"
}
I reproduced your steps but I couldn't reproduce the same error, as you can see in the screenshots below:
Screenshot: loading the first JSON
Screenshot: first table's data
Screenshot: loading the second JSON
Screenshot: second table's data
The only thing I changed in your data was the format: you provided standard JSON and I turned it into NEWLINE DELIMITED JSON (the type of JSON that BigQuery expects).
You can find more information about it here.
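If it helps, here is a minimal Python sketch of that conversion (the file names are placeholders, and it assumes the source file holds a JSON array of objects, which the original question does not state):
import json

# Read the original file, assumed here to contain a JSON array of objects.
with open("input.json") as src:
    records = json.load(src)

# Write one compact JSON object per line: this is the newline-delimited
# JSON (NDJSON) layout that BigQuery load jobs expect.
with open("output.ndjson", "w") as dst:
    for record in records:
        dst.write(json.dumps(record) + "\n")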
Please let me know if it clarifies something for you.
I hope it helps.
Related
On Google Cloud Platform, I am trying to submit a pyspark job that writes a dataframe to BigQuery.
The code that performs the write is the following:
finalDF.write.format("bigquery") \
    .mode('overwrite') \
    .option("table", "[PROJECT_ID].dataset.table") \
    .save()
And I get the error mentioned in the title. How can I set the GCS temporary path?
As the GitHub repository of the spark-bigquery-connector states, one can specify it when writing:
df.write
.format("bigquery")
.option("temporaryGcsBucket","some-bucket")
.save("dataset.table")
Or in a global manner:
spark.conf.set("temporaryGcsBucket","some-bucket")
Property "temporaryGcsBucket" needs to be set either at the time of writing dataframe or while creating sparkSession.
.option("temporaryGcsBucket","some-bucket")
or like .option("temporaryGcsBucket","some-bucket/optional_path")
finalDF.write.format("bigquery") \
    .mode('overwrite') \
    .option("temporaryGcsBucket", "some-bucket") \
    .option("table", "[PROJECT_ID].dataset.table") \
    .save()
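Alternatively, here is a minimal PySpark sketch that sets the bucket globally when the SparkSession is created (the app name, bucket, and sample dataframe are placeholders, and it assumes the spark-bigquery-connector is available on the cluster):
from pyspark.sql import SparkSession

# Build the session once; "bq-writer" is just a placeholder app name.
spark = SparkSession.builder.appName("bq-writer").getOrCreate()

# Set the temporary GCS bucket globally; every BigQuery write in this
# session will use it for the intermediate files.
spark.conf.set("temporaryGcsBucket", "some-bucket")

# A tiny example dataframe standing in for finalDF from the question.
finalDF = spark.createDataFrame([("abc", 1)], ["name", "value"])

finalDF.write.format("bigquery") \
    .mode("overwrite") \
    .option("table", "[PROJECT_ID].dataset.table") \
    .save()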
I am trying to export Cloud Firestore data into BigQuery to do SQL operations.
I exported Cloud Firestore to Cloud Storage using https://cloud.google.com/firestore/docs/manage-data/export-import:
gcloud beta firestore export gs://htna-3695c-storage --collection-ids='users','feeds'
I followed https://cloud.google.com/bigquery/docs/loading-data-cloud-firestore to import it into BigQuery.
We have two collections in Cloud Firestore: Users & Feeds.
I have successfully loaded the Feeds collection but am not able to load the Users collection.
I am getting an error while importing the data from Cloud Storage into BigQuery:
Error: unexpected property name 'Contacts'. We have a Contacts field in the Users collection;
this Contacts field is of type 'Map'.
I also tried the command line. Below is the command to load into BigQuery:
bq --location=US load --source_format=DATASTORE_BACKUP myproject_Dataset.users gs://myproject-storage/2019-04-19T13:29:28_75338/all_namespaces/kind_users/all_namespaces_kind_users.export_metadata
Here I also got the same error:
unexpected property name 'Contacts'
I thought about adding projection fields to load only the specified fields, something like below:
bq --location=US load --projection_fields=[Coins,Referral,Profile] --source_format=DATASTORE_BACKUP myproject_Dataset.users gs://myproject-storage/2019-04-19T13:29:28_75338/all_namespaces/kind_users/all_namespaces_kind_users.export_metadata
Here I also got an error:
Waiting on bqjob_r73b7ddbc9398b737_0000016a4909dd27_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'myproject:bqjob_r73b7ddbc9398b737_0000016a4909dd27_1': An internal error
occurred and the request could not be completed. Error: 4550751
Can anyone please let me know how to fix these issues?
Thank you in advance.
Screenshot: Firestore DB
It seems to be an issue with some of your documents. According to the limitations of the BigQuery import of Firestore data, you must have a schema with fewer than 10,000 unique field names. In your Contacts schema you are using the contacts' names as keys; that design is likely to produce a large number of unique field names. You would also need to check whether this is happening in your other documents.
As a workaround, you could change the design (at some stage of the loading process) from:
"Contacts" : {
"contact1" : "XXXXX",
"contact2" : "YYYYY",
}
to:
"Contacts" : [
{
"name" : "contact1",
"number" : "XXXXX"
},
{
"name" : "contact2",
"number" : "YYYYY"
},
]
This schema drastically reduces the number of unique field names, which makes the data much easier to work with in BigQuery.
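As an illustration, a small Python sketch of that reshaping step (the function and the "number" field name are hypothetical; it assumes each document is available as a dict at some stage of your loading pipeline):
def reshape_contacts(document):
    """Turn the Contacts map into a repeated record of {name, number}."""
    contacts_map = document.get("Contacts", {})
    document["Contacts"] = [
        {"name": name, "number": number}
        for name, number in contacts_map.items()
    ]
    return document

# Example:
# reshape_contacts({"Contacts": {"contact1": "XXXXX", "contact2": "YYYYY"}})
# -> {"Contacts": [{"name": "contact1", "number": "XXXXX"},
#                  {"name": "contact2", "number": "YYYYY"}]}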
We're trying BigQuery for the first time, with data extracted from mongo in json format. I kept getting this generic parse error upon loading the file. But then I tried a smaller subset of the file, 20 records, and it loaded fine. This tells me it's not the general structure of the file, which I had originally thought was the problem. Is there any way to get more info on the parse error, such as the string of the record that it's trying to parse when it has this error?
I also tried using the max errors field, but that didn't work either.
This was via the website. I also tried it via the Google Cloud SDK command line 'bq load...' and got the same error.
This error is most likely caused by some of the JSON records not complying with the table schema. It is not clear whether you used the schema auto-detect feature or supplied a schema for the load, but here is one example where such an error could happen:
{ "a" : "1" }
{ "a" : { "b" : "2" } }
If you only have a few of these and they really are invalid records, you can automatically ignore them by using the max_bad_records option for the load job. More details at: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json
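For illustration, a hedged sketch using the BigQuery Python client (the bucket, file, and table names are placeholders); the equivalent flag on the bq command line is --max_bad_records:
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    max_bad_records=10,  # silently skip up to 10 records that fail to parse
)

load_job = client.load_table_from_uri(
    "gs://your-bucket/your-file.json",  # placeholder source URI
    "your_dataset.your_table",          # placeholder destination table
    job_config=job_config,
)
load_job.result()  # waits; raises if the job exceeds the allowed bad records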
Using the logging service (https://console.cloud.google.com/logs?project=xxxx&service=bigquery.googleapis.com) I'm not able to find logs related to jobs submitted from the UI or the bq command line.
[EDIT]
As suggested by DoIT International, bq ls -j shows the list of jobs, but no log about the failure.
Use the following command to list the history of query jobs:
bq ls -j
The bq tool is part of the Google Cloud SDK, available here: https://cloud.google.com/sdk/
You can use the Jobs: list API to collect job info and upload it to GBQ.
Once it is in GBQ, you can analyze it any way you want using the power of BigQuery.
You can either flatten the result or use the original; I recommend the original, as it is less of a headache: no transformation is needed before loading into GBQ (you literally upload whatever you got from the API). Of course, all of this happens in a simple app/script that you still have to write.
Note: make sure you use the full value for the projection parameter.
Info about the failure should be present as below:
"errorResult": {
"reason": string,
"location": string,
"debugInfo": string,
"message": string
},
See more details at https://cloud.google.com/bigquery/docs/reference/v2/jobs/list#response
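As a rough sketch of that approach using the BigQuery Python client (which wraps the Jobs: list API), collecting the errorResult of failed jobs; the limits and table name below are illustrative, not from the original post:
from google.cloud import bigquery

client = bigquery.Client()

# Walk recent jobs and keep the ones that failed, together with their errorResult.
failed = []
for job in client.list_jobs(max_results=100, all_users=True):
    if job.error_result:
        failed.append({
            "job_id": job.job_id,
            "job_type": job.job_type,
            "reason": job.error_result.get("reason"),
            "message": job.error_result.get("message"),
        })

for row in failed:
    print(row)

# From here you could load `failed` into a BigQuery table, e.g. with
# client.load_table_from_json(failed, "your_dataset.job_errors"), and query it.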
I have a JSON schema:
[{"name":"timestamp","type":"integer"},{"name":"xml_id","type":"string"},{"name":"prod","type":"string"},{"name":"version","type":"string"},{"name":"distcode","type":"string"},{"name":"origcode","type":"string"},{"name":"overcode","type":"string"},{"name":"prevcode","type":"string"},{"name":"ie","type":"string"},{"name":"os","type":"string"},{"name":"payload","type":"string"},{"name":"language","type":"string"},{"name":"userid","type":"string"},{"name":"sysid","type":"string"},{"name":"loc","type":"string"},{"name":"impetus","type":"string"},{"name":"numprompts","type":"record","mode":"repeated","fields":[{"name":"type","type":"string"},{"name":"count","type":"integer"}]},{"name":"rcode","type":"record","mode":"repeated","fields":[{"name":"offer","type":"string"},{"name":"code","type":"integer"}]},{"name":"bw","type":"string"},{"name":"pkg_id","type":"string"},{"name":"cpath","type":"string"},{"name":"rsrc","type":"string"},{"name":"pcode","type":"string"},{"name":"opage","type":"string"},{"name":"action","type":"string"},{"name":"value","type":"string"},{"name":"other","type":"record","mode":"repeated","fields":[{"name":"param","type":"string"},{"name":"value","type":"string"}]}]
(http://jsoneditoronline.org/ for pretty print)
When loading through the browser GUI, the schema is accepted as valid. The CLI throws the following error:
BigQuery error in load operation: Invalid schema entry: "fields":[{"name":"type"
Is there something wrong with my schema as specified?
If you are passing the schema as JSON, you should write it to a file and pass the file name as the schema parameter. Passing the schema inline on the command line is only allowed for simple flat schemas.
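Here is a small sketch of that workflow (the file and table names are placeholders, and only a few fields from the schema above are shown):
import json

# A subset of the schema from the question, including one of the
# repeated RECORD fields that cannot be passed inline on the command line.
schema = [
    {"name": "timestamp", "type": "integer"},
    {"name": "xml_id", "type": "string"},
    {"name": "numprompts", "type": "record", "mode": "repeated",
     "fields": [{"name": "type", "type": "string"},
                {"name": "count", "type": "integer"}]},
]

with open("schema.json", "w") as f:
    json.dump(schema, f)

# Then reference the file instead of inlining the schema, e.g.:
#   bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable data.json ./schema.json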