Debugging which row is causing a "Resources exceeded" error

How does one debug "Resources exceeded" error messages from BigQuery that are nondescript and do not point to any specific rows as the cause?
e.g.
bigquery error: Resources exceeded during query execution: ST_GeogFromGeoJSON failed: resources exhausted. Try simplifying the geometry.
I would love to simplify/fix/remove the input to this function that's causing the issue; however, BigQuery doesn't give me ANY ideas as to how to debug this other than continuously re-querying my dataset with different filters until I've eliminated the problematic row.
Any insights into how to debug these in the future would be much appreciated!
Or, if any BQ support engineers are reading: it would help if the SAFE version of ST_GeogFromGeoJSON actually handled resource-exhaustion conditions safely rather than throwing this error message.
Reported the issue through here as well.
Note: I also have this issue when trying to debug which rows are causing resources to be exceeded when using JS UDFs.
Edit:
An example query:
SELECT * EXCEPT(decoded_line),
ST_SIMPLIFY(SAFE.ST_GEOGFROMGEOJSON(decoded_line), 30) AS simple_line
FROM polylines.decoded_lines
I'll try to find an example of a polyline that generates such an error, though, as I said above, it's been difficult to identify the exact one. I do know the query succeeds when I down-sample my dataset to 95% by excluding key timeframes.
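In the meantime, a rough way to narrow down candidates, using only the columns already in the query above: rank rows by the raw size of their GeoJSON, since resource exhaustion tends to correlate with very large geometries.

SELECT
  * EXCEPT(decoded_line),
  -- Raw string length is only a crude proxy for geometry complexity, but it is
  -- usually enough to surface the handful of rows worth inspecting first.
  LENGTH(decoded_line) AS geojson_bytes
FROM polylines.decoded_lines
ORDER BY geojson_bytes DESC
LIMIT 100

Excluding (or including only) the largest rows found this way makes for a quicker loop than repeatedly re-querying with arbitrary filters.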

Related

BQ Switching to TIMESTAMP Partitioned Table

I'm attempting to migrate from ingestion-time (_PARTITIONTIME) partitioning to TIMESTAMP-partitioned tables in BQ. In doing so, I also need to add several required columns. However, when I flip the switch and redirect my Dataflow pipeline to the new TIMESTAMP-partitioned table, it breaks. Things to note:
Approximately two million rows (likely one batch) are successfully inserted. The job continues to run but doesn't insert anything after that.
The job runs in batches.
My project is entirely in Java.
When I run it as streaming, it appears to work as intended. Unfortunately, it's not practical for my use case and batch is required.
I've been investigating the issue for a couple of days and have tried to break the transition down into the smallest steps possible. It appears that the step responsible for the error is the introduction of REQUIRED fields (it works fine when the same fields are NULLABLE). To avoid any possible parsing errors, I've set default values for all of the REQUIRED fields.
At the moment, I get the following combination of errors and I'm not sure how to address any of them:
The first error repeats infrequently but usually in groups:
Profiling Agent not found. Profiles will not be available from this worker
Occurs a lot and in large groups:
Can't verify serialized elements of type BoundedSource have well defined equals method. This may produce incorrect results on some PipelineRunner
Appears to be one very large group of these:
Aborting Operations. java.lang.RuntimeException: Unable to read value from state
Towards the end, this error appears every 5 minutes, surrounded only by the mild parsing errors described below.
Processing stuck in step BigQueryIO.Write/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 20m00s without outputting or completing in state finish
Due to the sheer volume of data my project parses, there are several parsing errors such as Unexpected character. They're rare but shouldn't break data insertion. If they do, I have a bigger problem, since the data I collect changes frequently and I can only adjust the parser after I see the error and, therefore, the new data format. Additionally, these parsing errors don't break the ingestion-time table (or my other timestamp-partitioned tables). That being said, here's an example of a parsing error:
Error: Unexpected character (',' (code 44)): was expecting double-quote to start field name
EDIT:
Some relevant sample code:
public PipelineResult streamData() {
    try {
        GenericSection generic = new GenericSection(options.getBQProject(), options.getBQDataset(), options.getBQTable());
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply("Read PubSub Events", PubsubIO.readMessagesWithAttributes().fromSubscription(options.getInputSubscription()))
                .apply(options.getWindowDuration() + " Windowing", generic.getWindowDuration(options.getWindowDuration()))
                .apply(generic.getPubsubToString())
                .apply(ParDo.of(new CrowdStrikeFunctions.RowBuilder()))
                .apply(new BigQueryBuilder().setBQDest(generic.getBQDest())
                        .setStreaming(options.getStreamingUpload())
                        .setTriggeringFrequency(options.getTriggeringFrequency())
                        .build());
        return pipeline.run();
    }
    catch (Exception e) {
        LOG.error(e.getMessage(), e);
        return null;
    }
}
Writing to BQ. I did try to set the partitioning field here directly, but it didn't seem to affect anything:
BigQueryIO.writeTableRows()
    .to(BQDest)
    .withMethod(Method.FILE_LOADS)
    .withNumFileShards(1000)
    .withTriggeringFrequency(this.triggeringFrequency)
    .withTimePartitioning(new TimePartitioning().setType("DAY"))
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER);
}
After a lot of digging, I found the error. I had parsing logic (a try/catch) that returned nothing (essentially a null row) in the event of a parsing error. This would break BigQuery, as my schema had several REQUIRED fields.
Since my job ran in batches, even one null row would cause the entire batch load job to fail and insert nothing. This also explains why streaming worked just fine. I'm surprised that BigQuery didn't throw an error saying that I was attempting to insert a null into a required field.
In reaching this conclusion, I also realized that setting the partition field in my code, not just in the schema, was necessary. It can be done on the TimePartitioning object:
new TimePartitioning().setType("DAY").setField(partitionField)

ARRAY_AGG leading to OOM

I'm trying to run a pretty simple query, but it's failing with a "Resources exceeded" error.
I read in another post that the heuristic used to allocate the number of mixers could fail from time to time.
SELECT
response.auctionId,
response.scenarioId,
ARRAY_AGG(response) AS responses
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
Is there a way to fix my query, knowing that:
a response is composed of 38 fields (most of them short strings)
the maximum count of responses per group is fairly low (165)
Query Failed
Error: Resources exceeded during query execution.
Job ID: teads-1307:bquijob_257ce97b_1566a6a3f27
It's a current limitation that arrays (produced by ARRAY_AGG or other means) must fit in the memory of a single machine. We've made a couple of recent improvements that should help to reduce the resources required for queries such as this, however. To confirm whether this is the issue, you could try a query such as:
SELECT
SUM(LENGTH(FORMAT("%t", response))) AS total_response_size
FROM
rtb_response_logs.2016080515
GROUP BY
response.auctionId,
response.scenarioId
ORDER BY total_response_size DESC LIMIT 1;
This formats the structs as strings as a rough heuristic of how much memory they would take to represent. If the result is very large, then perhaps we can restructure the query to use less memory. If the result is not very large, then some other issue is at play, and we'll look into getting it fixed :) Thanks!
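If the per-group size does turn out to be large, one possible restructuring is to aggregate a narrower struct instead of the full response record, so each per-group array holds far less data. This is only a sketch: bidPrice and creativeId are hypothetical stand-ins for whichever of the 38 fields are actually needed downstream.

SELECT
  response.auctionId,
  response.scenarioId,
  -- Aggregating only the needed fields keeps the in-memory arrays small.
  ARRAY_AGG(STRUCT(response.bidPrice AS bidPrice, response.creativeId AS creativeId)) AS responses
FROM
  `rtb_response_logs.2016080515`
GROUP BY
  response.auctionId,
  response.scenarioId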

Resources Exceeded on simple query when using DAY() or DATE() functions

This query fails with "resources exceeded":
SELECT
*,
DAY(event_timestamp) as whywontitwork,
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
But this one works fine:
SELECT
*
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
The source table is 14M rows, but I've run similar queries on much larger datasets before. We have large results enabled and have tried both flattened and unflattened results (though there are no nested fields anyway). The error also occurs if you use the DATE() function instead of DAY(), or a REGEXP_EXTRACT() function.
The job id is realself-main:bquijob_69e3a888_152f1fdc205.
You've hit an internal error in BigQuery. We tweaked our query engine's configuration at around 3pm (US Pacific Time) in an effort to prevent the error.
Update: After observing the error rate, it looks like this change has fixed the problem. If you see any other issues, please let us know. Note that StackOverflow is best for usage questions, but if you suspect a bug, you can file an issue at our public issue tracker.

How to use BigQuery Slots

Hi there.
Recently, I tried to run a query in the BigQuery web UI using GROUP BY over some tables (the table names match the pattern xxx_mst_yyyymmdd). There will be over 10 million rows. Unfortunately, the query failed with this error:
Query Failed
Error: Resources exceeded during query execution.
I made some improvements to my query and the error may not happen this time, but as my data grows the error will appear again in the future. So I checked the latest BigQuery release notes; there seem to be two ways to solve this:
1. After 2016/01/01, BigQuery will change its query pricing tiers to include "High Compute Tiers", so that the resourcesExceeded error will not happen again.
2. BigQuery Slots.
I checked some Google documentation but didn't find anything on how to use BigQuery Slots. Is there any sample or use case for BigQuery Slots? Or do I have to contact the BigQuery team to enable the feature?
I hope someone can help me answer this question. Thanks very much!
A couple of points:
I'm surprised that a GROUP BY with a cardinality of 10M failed with resources exceeded. Can you provide a job id of the failed query so we can investigate? You mention that you're concerned about hitting these errors more often as your data size increases; you should be able to increase your data size by a few more orders of magnitude without seeing this. Most likely you've encountered either a bug or something strange with your query or your data.
"High Compute Tiers" won't necessarily get rid of resourcesExceeded. For the most part, resourcesExceeded means that BigQuery ran into memory limitations; high compute tiers only address CPU usage. (and note, they haven't been enabled yet).
BigQuery slots enable you to process data faster and with more reliable performance. For the most part, they also wouldn't help prevent resourcesExceeded errors.
There is currently (as of Nov 5) a bug where you may need to provide an EACH keyword with a GROUP BY. Recent changes should enable BigQuery to automatically select the execution strategy, so EACH shouldn't be needed, but there are a couple of cases where it doesn't pick the right one. When in doubt, add an EACH to your JOIN and GROUP BY operations (see the sketch after this list).
To make your project eligible for using slots, you need to contact support.
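For reference, here is a minimal sketch of what adding EACH looks like in legacy BigQuery SQL; mydataset.xxx_mst_20151105 and the grouping column user_id are hypothetical names chosen to match the table pattern in the question.

SELECT
  user_id,
  COUNT(*) AS row_count
FROM
  [mydataset.xxx_mst_20151105]
GROUP EACH BY
  user_id

GROUP EACH BY (and JOIN EACH) tell legacy BigQuery SQL to use an execution strategy suited to a large number of distinct keys; in standard SQL the engine picks the strategy automatically, so the keyword is not needed there.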

BigQuery maximum row size

Recently we've started to get errors about "Row larger than the maximum allowed size".
Although the documentation states the limit is 2MB for JSON, we have also successfully loaded 4MB (and larger) records (see job job_Xr8vR3Fyp6rlH4zYaZFbZSyQsyI for an example of a 4.6MB record).
Has there been any change in the maximum allowed row size?
The erroneous job is job_qt_sCwokO2PWKNZsGNx6mK3cCWs. Unfortunately, the error messages produced don't specify which record(s) are the problematic ones.
There hasn't been a change in the maximum row size (I double-checked and went back through change lists and didn't see anything that could affect this). The maximum is computed from the encoded row rather than the raw row, which is why you can sometimes get larger rows than the specified maximum into the system.
From looking at your failed job in the logs, it looks like the error was on line 1. Did that information not get returned in the job errors? Or is that line not the offending one?
It did look like there was a repeated field with a lot of entries that looked like "Person..durable".
Please let me know if you think that you received this in error or what we can do to make the error messages better.
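For data that has already loaded, a rough way to spot oversized rows is to sort by serialized row length. This is only a sketch in standard SQL; mydataset.mytable is a hypothetical table name, and the JSON length is only a crude proxy for the internal encoded size, but unusually large rows stand out clearly.

SELECT
  LENGTH(TO_JSON_STRING(t)) AS approx_row_bytes,  -- rough proxy for encoded row size
  t.*
FROM
  `mydataset.mytable` AS t
ORDER BY approx_row_bytes DESC
LIMIT 10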