BigQuery backend error when exporting results to a table

The query has run perfectly for some time, but lately this error has been appearing: "Backend Error".
I know that my query is huge and takes about 300 seconds to execute, but I imagine this is a BigQuery bug, so I wonder why this error is happening.
This error started appearing when I was executing some other queries where I just wanted the results and did not export them.
So I started creating a table with the results, hoping that BigQuery would be able to perform the query.
Here is an image that shows the error:

I looked up your job in the BigQuery job database, and it completed successfully after 160 seconds.
BigQuery queries are fundamentally asynchronous. That is, when you run a query, it runs as a named Job by the BigQuery service. Since the original call may time out, the usual best practice is to poll for completion using the jobs.getQueryResults() API. My guess is that this is the API call that actually failed.
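To illustrate that polling pattern, here is a minimal sketch with the google-cloud-bigquery Python client (the query text is illustrative; the client issues jobs.insert and result() does the polling for you):

from google.cloud import bigquery

client = bigquery.Client()
# The client inserts a named job (jobs.insert) and returns immediately
job = client.query("SELECT 1")  # illustrative query
# result() polls the job until completion (jobs.getQueryResults under
# the hood), so a slow query does not hang on one long-lived call
for row in job.result(timeout=600):
    print(row)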
We had reports of an elevated number of Backend Errors yesterday and we're still investigating. However, these don't appear to be actual failed queries; instead, they are failures in getting the status of queries or getting the results, which should go away on retry.
How did you run the query? Did you use the BigQuery web UI? If you are using the API, did you call the bigquery.jobs.insert() API or the bigquery.jobs.query() API?

BigQuery - How to see failed scheduled queries in Cloud Logging?

I would like to monitor the state of my BigQuery scheduled queries in a Cloud Monitoring dashboard. I have created several logs-based metrics to track errors in other services/resources, but I'm having trouble finding any indication of scheduled query errors in Cloud Logging.
From the Scheduled Queries page in the BigQuery UI, I can check the run details of failed scheduled queries, and it shows some log entries explaining the error, e.g.:
9:02:59 AM Error code 8 : Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex..; JobID: PROJECT:12345abc-0000-12a3-1234-123456abcdf
9:00:17 AM Starting to process the query job with no parameters.
9:00:00 AM Dispatched run to data source with id 1234567890
But for some reason I cannot find any of these messages in Cloud Logging. For succeeded jobs, there are some entries in the BigQuery logs, but the failed jobs are missing completely.
Any idea how to view failed scheduled queries in Cloud Logging or Cloud Monitoring?
You can use the following advanced filter to find all the BigQuery errors related to "jobservice.insert":
resource.type="bigquery_resource"
protoPayload.serviceName="bigquery.googleapis.com"
protoPayload.methodName="jobservice.insert"
severity: "ERROR"
This is the result of that query:
Even a simple query like:
resource.type="bigquery_resource"
severity: "ERROR"
is able to retrieve all the BigQuery-related errors, as you can see here:
Once you find the one related to failed scheduled queries, you can click on the protoPayload of the result and select "Show matching entries" to start constructing your own advanced query.
I was able to assemble this filter using the Advanced logs queries and BigQuery queries documentation pages.
Please verify the permissions you have. In my case, I was able to see failed scheduled queries in Cloud Logging in a project where I had been set up as Owner. In another project, where I am just an Editor, I was unable to see the errors, just like in your case.
The filter below works (e.g. I am able to get errors related to "Permission denied while getting Drive credentials" - missing credentials on the Google Drive files for the service account running the scheduled query):
resource.type="bigquery_resource"
protoPayload.serviceName="bigquery.googleapis.com"
protoPayload.methodName="jobservice.insert"
severity=ERROR
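If you want to pull these entries programmatically rather than through the console, here is a minimal sketch with the google-cloud-logging Python client (assuming the library is installed and application default credentials are configured; adjacent filter clauses are implicitly ANDed):

from google.cloud import logging

client = logging.Client()
# Same filter as above, as one string
FILTER = (
    'resource.type="bigquery_resource" '
    'protoPayload.serviceName="bigquery.googleapis.com" '
    'protoPayload.methodName="jobservice.insert" '
    'severity=ERROR'
)
for entry in client.list_entries(filter_=FILTER, page_size=10):
    print(entry.timestamp, entry.log_name)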

BigQuery Python client library dropping data on insert_rows

I'm using the Python API to write to BigQuery -- I've had success previously, but I'm pretty new to the BigQuery platform.
I recently updated a table schema to include some new nested records. After creating this new table, I'm seeing significant portions of data not making it to BigQuery.
However, some of the data is coming through. In a single write statement, my code will try to send through a handful of rows. Some of the rows make it and some do not, but no errors are being thrown from the BigQuery endpoint.
I have access to the stackdriver logs for this project and there are no errors or warnings indicating that a write would have failed. I'm not streaming the data -- using the BigQuery client library to call the API endpoint (I saw other answers that state issues with streaming data to a newly created table).
Has anyone else had issues with the BigQuery API? I haven't found any documentation mentioning a delay in accessing the data (I found the opposite -- it's supposed to be near real-time, right?), and I'm not sure what's causing the issue at this point.
Any help or reference would be greatly appreciated.
Edit: Apparently the API is the streaming API -- I missed that.
Edit 2: This issue is related. However, I've been writing to the table every 5 minutes for about 24 hours, and I'm still seeing missing data. I'm curious whether writing to a BigQuery table within 10 minutes of its creation puts you in a permanent state of losing data, or whether it should catch everything after the initial 10 minutes from creation.
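For anyone debugging similar silent drops: the streaming insert call reports per-row failures through its return value instead of raising, so they are easy to miss. A minimal sketch (the table ID and rows are illustrative):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # hypothetical table
rows = [
    {"name": "alice", "score": 10},
    {"name": "bob", "score": 12},
]
# insert_rows_json returns a list of per-row errors; an empty list
# means every row was accepted into the streaming buffer
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("rows rejected:", errors)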

Which is better, cached or non-cached Google BigQuery queries, in a C# application?

I have developed a C# application that reads data from Google BigQuery using the .NET client library.
Query:
Select SUM(Salary), Count(Employee_ID) From Employee_Details
If I use a non-cached query (JobConfig.UseCacheQuery = false) in the job configuration, I get the result in ~6 seconds.
If I use a cached query (JobConfig.UseCacheQuery = true) in the job configuration, I get the same result in ~2 seconds.
Which is the better way to use Google BigQuery: cached or non-cached? (Cached query execution is faster than non-cached.)
Are there any drawbacks to cached queries? Kindly clarify.
If you run a BigQuery query twice in a row, the query cache will allow the second query invocation to simply return the same results that the first query already computed, without actually running the query again. You get your results faster, and you don't get charged for it.
The query cache is a simple way to prevent customers from overspending by repeating the same query, which sometimes happens in automated environments.
Query caching is on by default, and I would recommend leaving it enabled unless you have a specific reason to disable it. One reason you might disable caching is if you are doing performance testing and want to actually run the query to see how long it takes. But those scenarios are rare.
Read more here:
https://cloud.google.com/bigquery/querying-data#querycaching
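As an illustration of the same switch in another client library, here is a minimal sketch with the Python client (the query mirrors the one above and is illustrative; cache_hit reports whether the cache answered):

from google.cloud import bigquery

client = bigquery.Client()
# use_query_cache defaults to True; set it to False only when you
# really need the query re-executed, e.g. for benchmarking
config = bigquery.QueryJobConfig(use_query_cache=True)
job = client.query(
    "SELECT SUM(Salary), COUNT(Employee_ID) FROM Employee_Details",
    job_config=config,
)
job.result()
print("served from cache:", job.cache_hit)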

Random simple queries are failing on BigQuery

For the BQ team: queries that usually work are sometimes failing.
Could you please look into what the issue might be? There is just this:
Error: Unexpected. Please try again.
Job ID: aerobic-forge-504:job_DTFuQEeVGwZbt-PRFbMVE6TCz0U
Sorry for the slow response.
BigQuery replicates data to multiple datacenters and may run queries from any of them. Usually this is transparent ... if your data hasn't replicated everywhere, we will try to find a datacenter that has all of the data necessary to run the query and execute it from there.
The error you hit was due to BigQuery not being able to find a single datacenter that had all of the data for one of your tables. We try very hard to make sure this doesn't happen. In principle, it should be very rare (we've designed a solution to make sure it never happens, but haven't finished the implementation yet). We saw an uptick in this issue this morning, have a bug filed, and are currently investigating.
Was this a transient error? If you retry the operation, does it work now? Are you still getting errors on other queries?

BigQuery "backend error" on loading

We are having a number of backend errors on BigQuery's side when loading data files. Backend errors seem to be normal, occurring once or twice daily in our load jobs, which run every hour. In the last three days, however, we've had around 200 backend errors. This is causing cascading problems in our system.
Until three days ago, the system had been stable. The error is a simple "Backend error, try again." Usually the load job works when it's run again, but in the last three days the problem has become much worse. Please let me know if you need any other information from me.
We did have an issue on Friday where all load jobs returned backend errors for a period of a few hours, but that should have been resolved. If you have seen backend errors since then, please let us know and send the job ID of a failing job.
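In the meantime, if transient backend errors keep interrupting hourly loads, a retry-with-backoff sketch with the Python client may help (the URI, table ID, and helper are hypothetical; InternalServerError is the 500-level "backend error"):

import time

from google.api_core.exceptions import InternalServerError
from google.cloud import bigquery

client = bigquery.Client()

def load_with_retry(uri, table_id, attempts=5):
    # Hypothetical helper: rerun a load job when it hits a transient
    # backend error, backing off exponentially between attempts
    for attempt in range(attempts):
        try:
            job = client.load_table_from_uri(uri, table_id)
            return job.result()  # waits for the job; raises on failure
        except InternalServerError:
            time.sleep(2 ** attempt)
    raise RuntimeError("load job still failing after %d attempts" % attempts)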