BigQuery Error: Destination deleted/expired during execution - google-bigquery

I have a batch script that loads data from a Google Cloud Storage bucket into a BigQuery table. A scheduled SSIS job executes this batch file daily.
bq load -F "\t" --encoding=UTF-8 --replace=true db_name.tbl_name gs://GSCloudBucket/file.txt "column1:string, column2:string, column3:string"
Oddly, the execution succeeds on some days and fails on others. Here is what I have in the log.
Waiting on bqjob_r790a43a4_00000155a65559c2_1 ... (0s) Current status: RUNNING ......
Waiting on bqjob_r790a43a4_00000155a65559c2_1 ... (7s) Current status: DONE
BigQuery error in load operation: Error processing job: Destination
deleted/expired during execution

One possibility is that you have a 1-day (or multiple-day) expiration on that table, either set directly on the table or via the default table expiration on the dataset. In that case, because the actual time of the load varies, you can end up in a situation where the destination table has already expired by the time the load runs.
You can use the configuration.load.createDisposition attribute of the load job to address this.
And/or you can make sure you have a suitable expiration set. For a daily process that would be, say, 26 hours, so you have an extra 2 hours for your SSIS job to complete before the table can expire.
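For example, with the bq tool you can push the expiration of the existing table out to 26 hours (93600 seconds) and, optionally, set a matching default for new tables in the dataset (a sketch; db_name and tbl_name are the names from the command above):
bq update --expiration 93600 db_name.tbl_name
bq update --default_table_expiration 93600 db_name
Note that the dataset-level default only affects tables created after the change, so the first command is what matters for the table that already exists.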

Related

Not able to update BigQuery table with Transfer from a Storage file

I am not able to update a BigQuery table from a Cloud Storage file. I have the latest data file and the transfer runs successfully, but it says: "8:36:01 AM Detected that no changes will be made to the destination table."
I have tried multiple approaches.
Please help.
Thanks,
-Srini
You have to wait 1 hour after your file has been updated in Cloud Storage: https://cloud.google.com/bigquery-transfer/docs/cloud-storage-transfer?hl=en_US#minimum_intervals
I had the same error. I created two transfers from GCS to BigQuery, with write preference set to MIRROR and APPEND. I got the logs below (no error). The GCS file was uploaded less than one hour before.
MIRROR: Detected that no changes will be made to the destination table. Summary: succeeded 0 jobs, failed 0 jobs.
APPEND: None of the 1 new file(s) found matching "gs://mybucket/myfile" meet the requirement of being at least 60 minutes old. They will be loaded in next run. Summary: succeeded 0 jobs, failed 0 jobs.
Both jobs went through one hour later.
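If you want to verify the age of the file yourself before the next run, gsutil shows the object's creation time (a sketch; replace the bucket and object name with your own):
gsutil ls -l gs://mybucket/myfile
The timestamp printed next to the size is the creation time; the transfer will only pick the file up once that time is at least 60 minutes in the past.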

Check scheduled job status in SAP BODS

I am new to BODS. At present I have configured a job to execute every 2 minutes to extract transactions from a MySQL server and load them into HANA tables.
But sometimes, when the data volume in MySQL is too large to transform and load into HANA within 2 minutes, the next iteration of the same job starts while the previous one is still executing, which results in a BODS failure.
My question: is there any option in BODS to check the execution status of the scheduled job between runs?
Please help me out with this.
You can create a control/audit table to keep a history of each run of the BODS job. The table should contain fields like ExtractionStart, ExtractionEnd, EndTime, etc. You also need to change the job so that it reads the status of the previous run from this table before starting the load-to-HANA data flow. If the previous run has not finished, the job can raise an exception.
Let me know if this has been helpful or if you need more information.

BigQuery load job reported successful but data did not get loaded into table

I submitted a BigQuery load job; it ran and returned with the status successful, but the data didn't make it into the destination table.
Here is the command that was run:
/usr/local/bin/bq load --nosynchronous_mode --project_id=ardent-course-601 --job_id=logsToBq_load_impressions_20140816_1674a956_6c39_4859_bc45_eb09db7ef99a --source_format=NEWLINE_DELIMITED_JSON dw_logs_impressions.impressions_20140816 gs://sm-uk-hadoop/queries/logsToBq_transformLogs/impressions/20140816/9307f6e3-0b3a-44ca-8571-7107c399998c/part* /opt/sm-analytics/projects/logsTobqMR/jsonschema/impressionsSchema.txt
I checked the job status of the job logsToBq_load_impressions_20140816_1674a956_6c39_4859_bc45_eb09db7ef99a. The input file count and size showed the correct number of input files and total size.
Does anyone know why the data didn't make it into the table even though the job was reported as successful?
To rule out a mistake on our side, I ran the load job again, but to a different destination table, and this time the data made it into the destination table fine.
Thank you.
I experienced this recently with BigQuery in sandbox mode without a billing account.
In this mode the partition expiration is automatically set to 60 days. If you load data into the table and the partitioning column (e.g. a date) is older than 60 days, it won't show up in the table. The load job still succeeds with the correct number of output rows.
This is very surprising, but I've confirmed via the logs that this is indeed the case.
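You can see the expiration in the table's metadata (a sketch; mydataset.mytable is a placeholder):
bq show --format=prettyjson mydataset.mytable
Look at timePartitioning.expirationMs on the table (or defaultPartitionExpirationMs on the dataset); in sandbox mode it corresponds to the 60-day limit, and any rows whose partitioning column falls outside that window are removed as soon as their partition expires.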
Unfortunately, the detailed logs for this job, which ran on August 16, are no longer available. We're investigating whether this may have affected other jobs more recently. Please ping this thread if you see this issue again.
We had this issue in our system, and the reason was that the table had a partition expiry of 30 days and was partitioned on a timestamp column. Hence, when someone ingested data older than the partition expiry date, the BigQuery load jobs completed successfully in Spark, but we saw no data in the ingestion tables, since it was deleted moments after it was ingested because of the partition expiry.
Please check your BigQuery table's partition expiry parameters and look at the partition column values of the incoming data. If those values fall outside the partition expiry window, you won't see the data in the BigQuery tables; it will get deleted just after ingestion.
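If the expiry is the problem and you control the table, you can lengthen it so it covers the oldest incoming partitions; with the bq tool that looks roughly like this (a sketch; the value is in seconds, here 90 days, and mydataset.mytable is a placeholder):
bq update --time_partitioning_expiration 7776000 mydataset.mytable
Check bq update --help for the exact flag on your CLI version.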

Backend error on import from Cloud Storage to BigQuery

Recently, we have begun to see a number of errors such as this when importing from Cloud Storage to BigQuery:
Waiting on job_72ae7db68bb14e93b7a6990ed628aedd ... (153s) Current status: RUNNING
BigQuery error in load operation: Backend Error
Waiting on job_894172da125943dbb2cd8891958d2d10 ... (364s) Current status: RUNNING
BigQuery error in load operation: Backend Error
This process runs hourly, and had previously been stable for a long time. Nothing has changed in the import script or the types of data being loaded. Please let me know if you need any more information.
I looked up these jobs in the BigQuery logs; both of them appear to have succeeded. It is possible that the error you got was in reading the job state. I've filed an internal bug that we should distinguish between errors in the job and errors getting the state of the job in the bq tool.
After the job runs, you can use bq show -j <job_id> to see what the actual state of the job is. If it is still running, you can run bq wait <job_id>.
I also took a look at the front-end logs; all of the status requests for those job ids returned HTTP 200 (success) codes.
Can you add the --apilog=file.txt parameter to your bq command line (you'll need to add it to the beginning of the command line, as in bq --apilog=file.txt load ...) and send the output of a case where you get another failure? If you're worried about sensitive data, feel free to send it directly to me (tigani at google).
Thanks,
Jordan Tigani
Google BigQuery Engineer

What happens when bigquery upload job fails after loaded a portion of the JSON file?

As the title says: what happens when I start a BigQuery load job and, let's say, the job fails after loading 50% of the rows in the JSON file? Does BigQuery roll back everything from the load job, or am I left with 50% of the data loaded?
I am appending data daily into a single table, and keeping it duplicate-free is very important. We are using the HTTP REST API.
BigQuery appends data atomically. You will never get half of the data in the table if the load fails. If the job completes successfully, all of the data will show up at once.
There are two additional tricks you can use to prevent duplicates:
Specify a job id for the load job. Imagine you pull your network cable midway through starting the job... how do you know whether it succeeded? Specifying a job id lets you look up the job later if the job creation request fails.
Perform your loads into a temporary table, and specify WRITE_TRUNCATE as the writeDisposition. This means you can run import jobs into the temporary table idempotently: if you don't know whether a job succeeded, just run another one, and it will overwrite the data. Once you have a load job that completes successfully, run a table copy job with a writeDisposition of WRITE_APPEND to append the new data to your main table.
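The question uses the HTTP REST API, but the same pattern with the bq tool would look roughly like this (a sketch; the staging table, job id, bucket path, and schema file are placeholders):
bq load --job_id=daily_load_20140816 --source_format=NEWLINE_DELIMITED_JSON --replace mydataset.staging_20140816 gs://mybucket/path/part* schema.json
bq show -j daily_load_20140816
bq cp --append_table mydataset.staging_20140816 mydataset.main_table
Here --replace plays the role of WRITE_TRUNCATE, so the load into the staging table can be retried safely (and bq show -j tells you whether an ambiguous attempt actually ran), while the copy with --append_table is the WRITE_APPEND step that adds the verified data to the main table.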