Resources Exceeded on simple query when using DAY() or DATE() functions - google-bigquery

This query fails with "Resources exceeded":
SELECT
*,
DAY(event_timestamp) as whywontitwork,
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
But this one works fine:
SELECT
*
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
The source table has 14M rows, but I've run similar queries on much larger datasets before. We have Allow Large Results enabled and have tried both with and without flattened results (though there are no nested fields anyway). The error also occurs if we use the DATE() function instead of DAY(), or a REGEXP_EXTRACT() function.
The job id is realself-main:bquijob_69e3a888_152f1fdc205.

You've hit an internal error in BigQuery. We tweaked our query engine's configuration at around 3pm (US Pacific Time) in an effort to prevent the error.
Update: After observing the error rate, it looks like this change has fixed the problem. If you see any other issues, please let us know. Note that StackOverflow is best for usage questions, but if you suspect a bug, you can file an issue at our public issue tracker.

Related

Failed to load FileDescriptorProto for '_CLOUD_QUERY_METADATA_SCHEMA_'

My Firebase project is integrated with BigQuery, so all raw Google Analytics events are exported daily and streamed to a dedicated dataset.
Since today, even simple queries on those events have been failing with this error:
Error running query
Failed to load FileDescriptorProto for '_CLOUD_QUERY_METADATA_SCHEMA_': ;Field number 23 has already been used in "Msg_0_CLOUD_QUERY_TABLE" by field "items".
An example query which is failing:
SELECT * FROM `project.analytics_184030700.events_*` WHERE event_name IN ("share")
As I mentioned, those (and more advanced) queries ran fine until yesterday. I did not change the schema or any other configuration in the meantime. I've also noticed that BigQuery was updated yesterday.
Judging by the error description, my table schema does contain a field called items (the very last one, the 23rd), but it was added automatically by Google Analytics.
My suspicions:
Something went wrong with the recent BigQuery release
Something went wrong during daily sync Google Analytics -> BigQuery
Some old job or cache is getting in the way of new queries
At this point I have no idea what to try next. Does anyone have any insight into what could be causing this error?
EDIT:
I noticed that this problem was also just reported in the Google Issue Tracker here: https://issuetracker.google.com/issues/192325507.
I have the same issue and haven't solved it yet, but as you said, I guess the cause is Firebase. The problem is an extra field that appears only for three days (26, 27 and 28 June).
I checked all data older than 26 June, and there was no privacy_info field. The privacy_info field is also absent again for 29 June. I think Firebase added this new field and then changed their mind for some reason. But this causes a big problem for us.
Update:
I changed this part:
SELECT * FROM `project.analytics_184030700.events_*`
to this:
SELECT * FROM `project.analytics_184030700.events_2*`
Interestingly, this worked for me.
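One concrete effect of this change is that it narrows which tables the wildcard covers: a BigQuery wildcard table covers every table in the dataset whose name starts with the text before the `*`. The sketch below only illustrates that prefix matching with hypothetical table names, using Python's `fnmatch.fnmatchcase` as a stand-in for BigQuery's matching. With streaming export, the dataset can also hold `events_intraday_YYYYMMDD` tables, which `events_*` matches and `events_2*` does not. This is not a claim about why the narrower wildcard avoided the error.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical table names in a Firebase/GA4 export dataset (illustration only).
TABLES = [
    "events_20210625",
    "events_20210626",
    "events_intraday_20210629",
]


def tables_matched(pattern: str, tables: List[str]) -> List[str]:
    """Return the tables a trailing-* wildcard would cover (a prefix match)."""
    return [name for name in tables if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for pattern in ("events_*", "events_2*"):
        print(pattern, "->", tables_matched(pattern, TABLES))
```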
You can work around this issue; it seems there are problems with the privacy_info field.
If you select multiple daily tables (for example, with a wildcard), just make sure you select only the fields you need and omit the privacy_info field.
Not using SELECT * resolved this error for me.
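The workaround amounts to replacing SELECT * with an explicit column list that leaves out privacy_info. A minimal sketch of generating such a query string: `build_select` is a hypothetical helper, not part of any BigQuery client library, and the column list is an assumed subset of the export schema; in practice you would take the columns from your own table's schema.

```python
from typing import Iterable, List

# Assumed subset of the export schema; check your own table's schema.
COLUMNS = ["event_date", "event_timestamp", "event_name",
           "user_pseudo_id", "privacy_info", "items"]


def build_select(table: str, columns: Iterable[str], exclude: Iterable[str],
                 where: str = "") -> str:
    """Build a SELECT with an explicit column list instead of SELECT *.

    Columns listed in `exclude` (e.g. the problematic privacy_info field)
    are left out of the projection.
    """
    skip = set(exclude)
    kept: List[str] = [c for c in columns if c not in skip]
    if not kept:
        raise ValueError("no columns left to select")
    sql = f"SELECT {', '.join(kept)} FROM `{table}`"
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    print(build_select("project.analytics_184030700.events_*", COLUMNS,
                       exclude=["privacy_info"],
                       where='event_name IN ("share")'))
```

Generating the list keeps the query in sync with the schema, which matters here because the export added and removed a field between days.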

ARRAY_AGG leading to OOM

I'm trying to run a pretty simple query, but it's failing with a "Resources exceeded" error.
I read in another post that the heuristic used to allocate the number of mixers could fail from time to time.
SELECT
response.auctionId,
response.scenarioId,
ARRAY_AGG(response) AS responses
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
Is there a way to fix my query, given that:
a response is composed of 38 fields (most of them short strings)
the maximum count of responses in a single group is fairly low (165)
Query Failed
Error: Resources exceeded during query execution.
Job ID: teads-1307:bquijob_257ce97b_1566a6a3f27
It's a current limitation that arrays (produced by ARRAY_AGG or other means) must fit in the memory of a single machine. We've made a couple of recent improvements that should help to reduce the resources required for queries such as this, however. To confirm whether this is the issue, you could try a query such as:
SELECT
SUM(LENGTH(FORMAT("%t", response))) AS total_response_size
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
ORDER BY total_response_size DESC LIMIT 1;
This formats the structs as strings as a rough heuristic of how much memory they would take to represent. If the result is very large, then perhaps we can restructure the query to use less memory. If the result is not very large, then some other issue is at play, and we'll look into getting it fixed :) Thanks!
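The diagnostic query can be mirrored locally on a sample of rows to see what it measures. A minimal sketch, assuming each `response` struct is available as a flat dict with `auctionId` and `scenarioId` keys, and using Python's `repr` as a rough stand-in for `FORMAT("%t", ...)`. It sums the string length of each row per (auctionId, scenarioId) group and returns the largest group, as the `ORDER BY ... DESC LIMIT 1` does. The largest group matters because each ARRAY_AGG result must fit in one machine's memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
Key = Tuple[object, object]


def largest_group_size(rows: Iterable[Row]) -> Tuple[Key, int]:
    """Approximate per-group size the way the diagnostic query does.

    Each row is rendered as a string (repr stands in for FORMAT("%t"))
    and its length is summed per (auctionId, scenarioId) group.
    Returns the key of the largest group and its total size.
    """
    totals: Dict[Key, int] = defaultdict(int)
    for row in rows:
        key = (row["auctionId"], row["scenarioId"])
        totals[key] += len(repr(row))
    if not totals:
        raise ValueError("no rows")
    # Equivalent of ORDER BY total_response_size DESC LIMIT 1.
    best = max(totals, key=totals.get)
    return best, totals[best]


if __name__ == "__main__":
    # Hypothetical sample rows; real responses have 38 fields.
    sample = [
        {"auctionId": "a1", "scenarioId": 1, "bidder": "x"},
        {"auctionId": "a1", "scenarioId": 1, "bidder": "y"},
        {"auctionId": "a2", "scenarioId": 2, "bidder": "z"},
    ]
    print(largest_group_size(sample))
```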

How to use BigQuery Slots

Hi there.
Recently I wanted to run a query in the BigQuery web UI using GROUP BY over some tables (whose names follow the pattern xxx_mst_yyyymmdd). There will be over 10 million rows. Unfortunately, the query failed with this error:
Query Failed
Error: Resources exceeded during query execution.
I made some improvements to my query, so the error may not happen this time. But as my data grows, the error will appear again in the future. So I checked the latest BigQuery release notes, and there seem to be two ways to solve this:
1. After 2016/01/01, BigQuery will change its query pricing tiers to introduce "High Compute Tiers", so that the resourcesExceeded error will not happen again.
2. BigQuery Slots.
I searched Google's documentation and couldn't find how to use BigQuery Slots. Is there any sample or use case for BigQuery Slots? Or do I have to contact the BigQuery team to enable the feature?
I hope someone can help answer this question. Thanks very much!
A couple of points:
I'm surprised that a GROUP BY with a cardinality of 10M failed with resources exceeded. Can you provide the job ID of the failed query so we can investigate? You mention that you're concerned about hitting these errors more often as your data grows; you should be able to increase your data size by a few more orders of magnitude without seeing this. More likely, you've encountered a bug, or something unusual about your query or your data.
"High Compute Tiers" won't necessarily get rid of resourcesExceeded. For the most part, resourcesExceeded means that BigQuery ran into memory limitations; high compute tiers only address CPU usage. (Note also that they haven't been enabled yet.)
BigQuery slots enable you to process data faster and with more reliable performance. For the most part, they also wouldn't help prevent resourcesExceeded errors.
There is currently (as of Nov 5) a bug where you may need to add the EACH keyword to a GROUP BY. Recent changes should let BigQuery select the execution strategy automatically, so EACH shouldn't be needed, but there are a couple of cases where it doesn't pick the right one. When in doubt, add EACH to your JOIN and GROUP BY operations (in legacy SQL, JOIN EACH and GROUP EACH BY).
To get your project eligible for using slots you need to contact support.

Changes in query behaviour

I have some queries that have run every day for several months with no problem. I haven't changed anything in the queries for a long while.
In the past few days some of them have started failing. The error message refers to some fields: "Field 'myfield' not found." These queries usually involve subqueries and window functions.
Example for the BQ guys:
On 2015-08-03, Job ID job_EUWyK5DIFSxJxGAEC4En2Q_hNO8 ran successfully.
On the following days, the same query failed. Job IDs: job_A9KYJLbQJQvHjh1g7Fc0Abd2qsc, job__15ff66aYseR-YjYnPqWmSJ30N8
In addition, for some other queries, running times grew from minutes to hours, and they sometimes return "timeout".
My questions:
Was something changed in the BQ engine?
What should I do to make my queries run again?
Thanks
So the problem could be twofold:
An update to the query execution engine was rolled out during the week of August 3, 2015, as documented in the announcement.
If this is the case, you need to update your queries.
Some performance issues have been detected lately, but to find out whether something is actually wrong with your project, you need to file an issue request. I did the same in the past, and it was fixed.

BigQuery resources exceeded

I'm trying to transform a table with 500M rows (about 30 GB). It's a simple grouping, and the result should be another big table. I'm writing the result into a new table with the Allow Large Results option enabled, which should allow arbitrarily large results, but I'm getting Error: Resources exceeded during query execution. Job ID: job_6CHEAHSHETUOGK7QAGSTIX5A4QVV4ZKZ
Could you please check which resources were exceeded?
Thanks,
Radek
It looks like your query is hitting a corner case in our configuration: if it were a little larger, it would be fine, and if it were a little smaller, it would also be fine.
We've got a fix. Hopefully we can get it out today, but it might have to wait until Monday because of the Thanksgiving holiday this week.