I'm trying to transform table with 500M rows, about 30GB. It is simple grouping, the result should be another big table. I'm writing the result into new table with Allow Large Result option enabled - this should allow arbitrary large results but I'm getting Error: Resources exceeded during query execution. Job ID: job_6CHEAHSHETUOGK7QAGSTIX5A4QVV4ZKZ
Could you please check which resources exceeded?
Thanks,
Radek
It looks like your query is hitting a corner case in our configuration -- if it was a little bit larger, it would be fine, and if it was a little bit smaller, it would also be fine.
We've got a fix -- hopefully we can get it out today, but it might have to wait until Monday because of the Thanksgiving holiday this week.
Related
I noticed that running a SELECT count(*) FROM myTable on my larger BQ tables yields long running times, upwards of 30/40 seconds despite the validator claiming the query processes 0 bytes. This doesn't seem quite right when 500 GB queries run faster. Additionally, total row counts are listed under details -> Table Info. Am I doing something wrong? Is there a way to get total row counts instantly?
When you run a count BigQuery still needs to allocate resources (such as: slot units, shards etc). You might be reaching some limits which cause a delay. For example, the slots default per project is 2,000 units.
BigQuery execution plan provides very detail information about the process which can help you better understand the source of the delay.
One way to overcome this is to use an approximate method described in this link
This Slide by Google might also help you
For more details see this video about how to understand the execution plan
We are performance tuning, both in terms of cost and speed, some of our queries and the results we get are a little bit weird. First, we had one query that did an overwrite on an existing table, we stopped that one after 4 hours. Running the same query to an entirely new table and it only took 5 minutes. I wonder if the 5 minute query maybe used a cached result from the first run, is that possible to check? Is it possible to force BigQuery to not use cache?
If you run query in UI - expand Options and make sure Use Cached Result properly set
Also, in UI, you can check Job Details to see if cached result was used
If you run your query programmatically - you should use respective attributes - configuration.query.useQueryCache and statistics.query.cacheHit
I'm experiencing something rather strange with some queries that I'm performing in BigQuery.
Firstly, I'm using an externally backed table (csv.gz) with about 35 columns. The total data in the location is around 5Gb, with an average file size of 350mb. The reason I'm doing this, is that I continually add data and remove to the table on a rolling basis - to give me a view of the last 7 days of our activity.
When querying, if perform something simple like:
select * from X limit 10
everything works fine. It continues to work fine if you increase the limit up to 1 million rows. As soon as you up the limit to ten million:
select * from X limit 10000000
I end up with a tableUnavailable error "Something went wrong with the table you queried. Contact the table owner for assistance. (error code: tableUnavailable)"
Now according to to any literature on this, this usually results from using some externally owned table (I'm not). I can't find any other enlightening information for this error code.
Basically, If I do anything slightly complex on the data, I get the same result. There's a column called event that has maybe a couple hundred of different values in the entire dataset. If I perform the following:
select eventType, count(1) from X group by eventType
I get the same error.
I'm getting the feeling that this might be related to limits on external tables? Can anybody clarify or shed any light on this?
Thanks in advance!
Doug
I'm trying to run a pretty simple query but it's failing with an Resources exceeded error.
I read in another post that the heuristic used to allocate the number of mixers could fail from time to time.
SELECT
response.auctionId,
response.scenarioId,
ARRAY_AGG(response) AS responses
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
Is there a way to fix my query knowing that:
a response is composed of 38 fields (most of them being short strings)
the max(count()) of a single response is kind of low (165)
Query Failed
Error: Resources exceeded during query execution.
Job ID: teads-1307:bquijob_257ce97b_1566a6a3f27
It's a current limitation that arrays (produced by ARRAY_AGG or other means) must fit in the memory of a single machine. We've made a couple of recent improvements that should help to reduce the resources required for queries such as this, however. To confirm whether this is the issue, you could try a query such as:
SELECT
SUM(LENGTH(FORMAT("%t", response))) AS total_response_size
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
ORDER BY total_response_size DESC LIMIT 1;
This formats the structs as strings as a rough heuristic of how much memory they would take to represent. If the result is very large, then perhaps we can restructure the query to use less memory. If the result is not very large, then some other issue is at play, and we'll look into getting it fixed :) Thanks!
This query runs fails with resources exceeded:
SELECT
*,
DAY(event_timestamp) as whywontitwork,
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
But this one works fine:
SELECT
*
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
The source table is 14m rows but I've run similar queries on much larger datasets before. We have large results enabled and have tried both flattened results and not (though there are no nested fields anyway). The error also occurs if you use the DATE() function instead of DAY(), or a REGEXP_EXTRACT() function
The job id is realself-main:bquijob_69e3a888_152f1fdc205.
You've hit an internal error in BigQuery. We tweaked our query engine's configuration at around 3pm (US Pacific Time) in an effort to prevent the error.
Update: After observing the error rate, it looks like this change has fixed the problem. If you see any other issues, please let us know. Note that StackOverflow is best for usage questions, but if you suspect a bug, you can file an issue at our public issue tracker.