How do I get the list of bad records that didn't load in Bigquery? - google-bigquery

Is there a way to get all the bad records that are skipped during a BigQuery load job when --max_bad_records is set?

I believe the status.errors field will contain a list of errors that occurred during job processing, including non-fatal errors such as bad rows that were skipped.
https://cloud.google.com/bigquery/docs/reference/v2/jobs
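If the load is run through the Python client library, the same information is exposed on the job object; for a CLI job you can dump it with bq show --format=prettyjson -j <job_id> and look at status.errors. Below is a minimal sketch, assuming a CSV load from GCS; the bucket, file and table names are placeholders.

# Minimal sketch: read the error stream of a load job with the Python client.
# Bucket, file and table names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    max_bad_records=100,  # tolerate up to 100 bad rows
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/data.csv",           # placeholder source
    "my-project.my_dataset.my_table",    # placeholder destination
    job_config=job_config,
)
load_job.result()  # wait for the job to finish

# Non-fatal entries here describe the rows that were skipped.
for err in load_job.errors or []:
    print(err.get("reason"), err.get("location"), err.get("message"))

Note that the error stream only describes which rows were bad and why; to recover the records themselves you still need to go back to the source file.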

Related

Bigquery internal error during copy job to move tables between datasets

I'm currently migrating around 200 tables in BigQuery (BQ) from one dataset (FROM_DATASET) to another (TO_DATASET). Each of these tables has a _TABLE_SUFFIX corresponding to a date (I have three years of data for each table), and each suffix typically contains between 5 GB and 80 GB of data.
I'm doing this with a Python script that asks BQ, for each table and each suffix, to run the following query:
-- example table=T_SOME_TABLE, suffix=20190915
CREATE OR REPLACE TABLE `my-project.TO_DATASET.T_SOME_TABLE_20190915`
COPY `my-project.FROM_DATASET.T_SOME_TABLE_20190915`
Everything works except for three tables (and all their suffixes) where the copy job fails at each _TABLE_SUFFIX with this error:
An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 4893854
Retrying the job after some time actually works, but of course it slows down the process. Does anyone have an idea what the problem might be?
Thanks.
It turned out that those three problematic tables were some legacy ones with lots of columns. In particular, the BQ GUI shows this warning for two of them:
"Schema and preview are not displayed because the table has too many
columns and may cause the BigQuery console to become unresponsive"
This was probably the issue.
In the end, I managed to migrate everything by implementing a backoff mechanism to retry failed jobs.
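For reference, here is roughly what such a backoff loop can look like with the Python client. This is a minimal sketch, assuming the copy is issued as the CREATE OR REPLACE ... COPY statement shown above; the attempt count and delays are arbitrary.

# Minimal sketch of retrying the copy statement with exponential backoff.
# max_attempts and the delays are arbitrary; tune them to your workload.
import time
from google.cloud import bigquery
from google.api_core.exceptions import GoogleAPICallError

client = bigquery.Client(project="my-project")

def copy_with_backoff(table, suffix, max_attempts=5):
    sql = f"""
        CREATE OR REPLACE TABLE `my-project.TO_DATASET.{table}_{suffix}`
        COPY `my-project.FROM_DATASET.{table}_{suffix}`
    """
    delay = 10  # seconds
    for attempt in range(1, max_attempts + 1):
        try:
            client.query(sql).result()  # waits for the copy to complete
            return
        except GoogleAPICallError:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff

copy_with_backoff("T_SOME_TABLE", "20190915")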

Inconsistency in BigQuery Data Ingestion on Streaming Error

Hi,
While streaming data to BigQuery, we are seeing some inconsistency in the ingested data when making https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll requests using the BigQuery Java library.
Some of the batches fail with error code backendError, while some requests time out with this exception stacktrace: https://gist.github.com/anonymous/18aea1c72f8d22d2ea1792bb2ffd6139
For batches that have failed, we have observed three different kinds of behaviour related to the ingested data:
All records in that batch fail to be ingested into BigQuery
Only some of the records fail to be ingested into BigQuery
All records are successfully ingested into BigQuery in spite of the thrown error
Our questions are:
How can we distinguish between these three cases?
For case 2, how can we handle partially ingested data, i.e., which records from that batch should be retried?
For case 3, if all records were successfully ingested, why is an error thrown at all?
Thanks in advance...
For partial success, the error response will indicate which rows were inserted and which ones failed, especially for parsing errors. There are cases where the response fails to reach your client, resulting in timeout errors even though the insert succeeded.
In general, you can retry the entire batch and it will be deduplicated if you use the approach outlined in the data consistency documentation.
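The thread uses the Java library, but the pattern is the same in any client: supply a stable insertId per row so that a full retry is deduplicated, and read the per-row error list to spot partial failures. A minimal Python sketch, with placeholder table name and rows:

# Minimal sketch: per-row error handling plus insertId-based dedup on retry.
# Table name and rows are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder

rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
# Stable row_ids become insertIds, so retrying the whole batch is deduplicated.
row_ids = [f"batch-42-row-{r['id']}" for r in rows]

errors = client.insert_rows_json(table_id, rows, row_ids=row_ids)

if errors:
    # Partial failure (case 2): only the listed indexes failed.
    # Either retry just those rows, or retry the whole batch and let the
    # insertIds deduplicate anything that already landed (cases 1 and 3).
    failed_indexes = [e["index"] for e in errors]
    print("rows to retry:", failed_indexes)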

BigQuery "Backend Error, Job aborted" when exporting data

The export job for one of my tables fails in BigQuery with no error message. I checked the job ID hoping to get more info, but it just says "Backend Error, Job aborted". I used the command-line tool with this command:
bq extract --project_id=my-proj-id --destination_format=NEWLINE_DELIMITED_JSON 'test.table_1' gs://mybucket/export
I checked this question, but I know that it is not a problem with my destination bucket in GCS, because exporting other tables to the same bucket works successfully.
The only difference here is that this table has a repeated record field and each JSON row can get pretty large, but I did not find any limit for this in the BigQuery docs.
Any ideas on what the problem could be?
Job Id from one of my tries: bqjob_r51435e780aefb826_0000015691dda235_1
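If it helps with debugging, the same export can be run through the Python client, where a failed job exposes both error_result and the fuller errors list. A minimal sketch using the table and bucket from the question; note the wildcard in the destination URI, which exports larger than about 1 GB need anyway.

# Minimal sketch: the same export via the Python client, which lets you
# inspect the job's error stream when it only reports "Backend Error".
from google.cloud import bigquery

client = bigquery.Client(project="my-proj-id")

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)

extract_job = client.extract_table(
    "my-proj-id.test.table_1",
    "gs://mybucket/export-*.json",  # wildcard so large exports can be sharded
    job_config=job_config,
)

try:
    extract_job.result()
except Exception:
    print(extract_job.error_result)
    print(extract_job.errors)  # sometimes carries more detail than the summary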

Stale BigQuery table after load job

I've run into a situation where a BigQuery table has become stale. I can't even run a count query on it. This occurred right after I ran the first load job.
For each query I run I get an error:
Error: Unexpected. Please try again.
See for example Job IDs: job_OnkmhMzDeGpAQvG4VLEmCO-IzoY, job_y0tHM-Zjy1QSZ84Ek_3BxJ7Zg7U
The error is "illegal field name". It looks like the field 69860107_VID is causing it. BigQuery doesn't support column rename, so if you want to change the schema you'll need to recreate the table.
I've filed a bug to fix the internal error -- this should have been blocked when the table was created.

Getting error from bq tool when uploading and importing data on BigQuery - 'Backend Error'

I'm getting the error "BigQuery error in load operation: Backend Error" when I try to upload and import data into BQ. I already reduced the size and increased the time between imports, but nothing helps. The strange thing is that if I wait for a while and retry, it just works.
In the BigQuery Browser tool it shows up as an error in some line/field, but I checked and there is none. Obviously this is a misleading message, because if I wait and retry uploading/importing the same file, it works.
Thanks
I looked up your failing jobs in the BigQuery backend, and I couldn't find any jobs that terminated with 'backend error'. I found several that failed because there were ASCII NUL characters in the data (it can be helpful to look at the error stream, not just the error result). It is possible that the data got garbled on the way to BigQuery... are you certain the data did not change between the failing import and the successful one on the same data?
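Following up on the ASCII NUL point: if the source data might contain NUL bytes, a minimal pre-processing sketch like this one (file names are placeholders) strips them before the upload so they can be ruled out as the cause.

# Minimal sketch: strip ASCII NUL bytes from a source file before loading.
# File names are placeholders.
with open("data.csv", "rb") as src, open("data.clean.csv", "wb") as dst:
    for chunk in iter(lambda: src.read(1024 * 1024), b""):
        dst.write(chunk.replace(b"\x00", b""))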
I've found that exporting from a BigQuery table to CSV in Cloud Storage hits the same error when certain characters are present in one of the columns (in this case, a column storing the raw results of a prediction analysis). Removing that column from the export resolved the issue.