We get an error during tabledata.list with the following message:
API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table.
This error is not listed at https://cloud.google.com/bigquery/troubleshooting-errors#errortable.
This error occurs every time.
We can export this table to GCS without problems, and the result looks normal (there are no extremely large rows).
We manage to retrieve several result pages before the actual error occurs.
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table.",
    "reason" : "apiLimitExceeded"
  } ],
  "message" : "API limit exceeded: Unable to return a row that exceeds the API limits. To retrieve the row, export the table."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145) ~[com.google.api-client.google-api-client-1.21.0.jar:1.21.0]
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) ~[com.google.api-client.google-api-client-1.21.0.jar:1.21.0]
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) ~[com.google.api-client.google-api-client-1.21.0.jar:1.21.0]
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) ~[com.google.api-client.google-api-client-1.21.0.jar:1.21.0]
What does it mean? How can we resolve this error?
Sorry about the inconvenience.
This is a known issue with the tabledata.list method.
The problem is that we have some infrastructure limitations that currently make it impossible to return very large rows from tabledata.list.
"Large" is a relative word here: unfortunately, a row that is small when represented as JSON can consume a lot of memory when represented in our internal format.
The current workaround is the one mentioned in the error message: export the table.
For the long term, we are actively working on improving our system to overcome this limitation. Stay tuned :)
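For illustration, here is a minimal sketch of that workaround: an extract (export) job that writes the table to GCS as newline-delimited JSON, using the same low-level Java api-client that appears in the stack trace above. Project, dataset, table, and bucket names are placeholders, not values from the original question.

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.*;

import java.util.Collections;

public class ExportTableExample {
  public static void main(String[] args) throws Exception {
    // Build the client with application default credentials.
    HttpTransport transport = GoogleNetHttpTransport.newTrustedTransport();
    JsonFactory jsonFactory = JacksonFactory.getDefaultInstance();
    GoogleCredential credential =
        GoogleCredential.getApplicationDefault().createScoped(BigqueryScopes.all());
    Bigquery bigquery = new Bigquery.Builder(transport, jsonFactory, credential)
        .setApplicationName("tabledata-export-example")
        .build();

    // The table that tabledata.list cannot return row by row (placeholders).
    TableReference source = new TableReference()
        .setProjectId("my-project")
        .setDatasetId("my_dataset")
        .setTableId("my_table");

    // Extract (export) job: write the table to GCS as newline-delimited JSON,
    // sharded across multiple files via the '*' wildcard.
    JobConfigurationExtract extract = new JobConfigurationExtract()
        .setSourceTable(source)
        .setDestinationFormat("NEWLINE_DELIMITED_JSON")
        .setDestinationUris(Collections.singletonList("gs://my-bucket/my_table-*.json"));

    Job job = new Job().setConfiguration(new JobConfiguration().setExtract(extract));
    Job submitted = bigquery.jobs().insert("my-project", job).execute();
    System.out.println("Started extract job: " + submitted.getJobReference().getJobId());
  }
}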
There are two row-related limits for tabledata.list:
The byte size of the proto message has to be less than 10 MB. If one row is larger than this, we cannot retrieve it.
The number of field values in the row has to be less than 350,000; that is the number of leaf fields of a row.
If you hit this problem, it usually means only that the first row in your request is too large to return; if you skip that row, retrieval of the following rows may work. You may want to look more closely at that particular row to see why.
In the future, the field-value limit can likely be removed, but the 10 MB size limit will remain due to API server limitations.
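To illustrate the skip-the-row advice, here is a hedged sketch that pages through tabledata.list by startIndex and skips one row whenever the apiLimitExceeded error comes back. It assumes a Bigquery client built as in the export sketch above; project, dataset, and table names are placeholders.

import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataList;
import com.google.api.services.bigquery.model.TableRow;

import java.io.IOException;
import java.math.BigInteger;

public class SkipOversizedRows {

  /** Pages through tabledata.list by startIndex, skipping any row that is too large to return. */
  public static void listRows(Bigquery bigquery, String project, String dataset, String table)
      throws IOException {
    BigInteger nextRow = BigInteger.ZERO;
    while (true) {
      TableDataList page;
      try {
        page = bigquery.tabledata()
            .list(project, dataset, table)
            .setStartIndex(nextRow)
            .setMaxResults(1000L)
            .execute();
      } catch (GoogleJsonResponseException e) {
        boolean rowTooLarge = e.getStatusCode() == 403
            && e.getDetails() != null
            && e.getDetails().getErrors() != null
            && !e.getDetails().getErrors().isEmpty()
            && "apiLimitExceeded".equals(e.getDetails().getErrors().get(0).getReason());
        if (rowTooLarge) {
          // The first row of this request cannot be returned; skip it and go on.
          nextRow = nextRow.add(BigInteger.ONE);
          continue;
        }
        throw e;
      }
      if (page.getRows() == null || page.getRows().isEmpty()) {
        break;  // no more rows
      }
      for (TableRow row : page.getRows()) {
        // ... process the row ...
      }
      nextRow = nextRow.add(BigInteger.valueOf(page.getRows().size()));
    }
  }
}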
Related
When loading data into BigQuery, I get the following error (copied from Job History in the BigQuery web console).
Errors:
query: Failed to load FileDescriptorProto for '_GEN_DREMEL_ONESTORE_METADATA_SCHEMA_': (error code: invalidQuery)
Field numbers 19000 through 19999 are reserved for the protocol buffer library implementation.
Field numbers 19000 through 19999 are reserved for the protocol buffer library implementation.
[... repeated a total of exactly 1000 times...]
Field numbers 19000 through 19999 are reserved for the protocol buffer library implementation.
(error code: invalidQuery)
The data is a Datastore Managed Backup. (The folks from that team sent me to BigQuery for help.)
The error occurs with one of six randomly selected Kinds; the others load successfully. In addition, loading another Kind gives the error "too many fields: 10693 (error code: invalid)".
Both the failed Kind and the successful ones have a similar size of ~15 gigabytes of data.
What can we do to load this data?
This is caused by a BigQuery limitation: a maximum of 10,000 columns per table. So the utility for loading a Datastore backup simply does not work in this case.
I'm getting 403 rateLimitExceeded errors while doing streaming inserts into BigQuery. I'm doing many streaming inserts in parallel, so while I understand this might cause some rate limiting, I'm not sure which rate limit in particular is the issue.
Here's what I get:
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Exceeded rate limits: Your table exceeded quota for rows. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
    "reason" : "rateLimitExceeded"
  } ],
  "message" : "Exceeded rate limits: Your table exceeded quota for rows. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
}
Based on BigQuery's troubleshooting docs, 403 rateLimitExceeded is caused by either concurrent rate limiting or API request limits, but the docs make it sound like neither of those applies to streaming operations.
However, the message in the error mentions table exceeded quota for rows, which sounds more like the 403 quotaExceeded error. The streaming quotas are:
Maximum row size: 1 MB - I'm under this - my average row size is in the KB and I specifically limit sizes to ensure they don't hit 1MB
HTTP request size limit: 10 MB - I'm under this - my average batch size is < 400KB and max is < 1MB
Maximum rows per second: 100,000 rows per second, per table. Exceeding this amount will cause quota_exceeded errors. - I can't imagine I'd be over this: each batch is about 500 rows and takes about 500 milliseconds. I'm running in parallel but inserting across about 2,000 tables, so while it's possible (though unlikely) that I'm doing 100k rows/second overall, there's no way that rate applies to a single table (more like 1,000 rows/sec per table at most)
Maximum rows per request: 500 - I'm right at 500
Maximum bytes per second: 100 MB per second, per table. Exceeding this amount will cause quota_exceeded errors. - Again, my insert rates are not anywhere near this volume by table.
Any thoughts/suggestions as to what this rate limiting is would be appreciated!
I suspect you are occasionally submitting more than 100,000 rows per second to a single table. Might your parallel insert processes occasionally all line up on the same table?
The reason this is reported as a rate limit error is to give a push-back signal to slow down: to handle sporadic spikes of operations on a single table, you can back off and try again to spread the load out.
This is different from a quota failure, which implies that retrying will still fail until the quota epoch rolls over (for example, daily quota limits).
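Here is a hedged sketch of that back-off-and-retry pattern with tabledata.insertAll, again using the low-level Java api-client; the retry counts and delays are illustrative assumptions, not official recommendations, and all names are placeholders.

import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataInsertAllRequest;
import com.google.api.services.bigquery.model.TableDataInsertAllResponse;

import java.io.IOException;
import java.util.List;

public class StreamingInsertWithBackoff {

  /** Inserts one batch, backing off and retrying when the table is being rate limited. */
  public static void insertBatch(Bigquery bigquery, String project, String dataset, String table,
      List<TableDataInsertAllRequest.Rows> rows) throws IOException, InterruptedException {
    // Rows should carry insertId values so retried rows can be de-duplicated.
    TableDataInsertAllRequest body = new TableDataInsertAllRequest().setRows(rows);
    long backoffMillis = 500;
    for (int attempt = 0; attempt < 6; attempt++) {
      try {
        TableDataInsertAllResponse response =
            bigquery.tabledata().insertAll(project, dataset, table, body).execute();
        // Per-row problems come back in insertErrors rather than as an HTTP error.
        if (response.getInsertErrors() == null || response.getInsertErrors().isEmpty()) {
          return;  // batch accepted
        }
      } catch (GoogleJsonResponseException e) {
        if (e.getStatusCode() != 403) {
          throw e;  // only back off on rate limiting; surface everything else
        }
      }
      // Sporadic spikes on a single table: wait, then retry with a larger delay
      // so the load spreads out over time.
      Thread.sleep(backoffMillis);
      backoffMillis *= 2;
    }
    throw new IOException("Batch still rejected after retries");
  }
}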
Recently we've started to get errors about "Row larger than the maximum allowed size".
Although the documentation states a limit of 2 MB for JSON, we have also successfully loaded 4 MB (and larger) records (see job job_Xr8vR3Fyp6rlH4zYaZFbZSyQsyI for an example with a 4.6 MB record).
Has there been any change in the maximum allowed row size?
The failing job is job_qt_sCwokO2PWKNZsGNx6mK3cCWs. Unfortunately, the error message produced doesn't specify which record(s) are problematic.
There hasn't been a change in the maximum row size (I double checked and went back through change lists and didn't see anything that could affect this). The maximum is computed from the encoded row, rather than the raw row, which is why you sometimes can get larger rows than the specified maximum into the system.
From looking at your failed job in the logs, it looks like the error was on line 1. Did that information not get returned in the job errors? Or is that line not the offending one?
It did look like there was a repeated field with a lot of entries that looked like "Person..durable".
Please let me know if you think that you received this in error or what we can do to make the error messages better.
First let me explain the problem.
I have 500 unique users. The data from each of these users is split into smaller gzip files (on average about 25 files per user). We have loaded each split gzip file as a separate table in BigQuery, so our dataset has some 13,000 tables in it.
Now we have to run time-range queries to retrieve data for each user. We have around 500-1,000 different time ranges per user, and we would like to combine all these time ranges into a single query with logical OR and AND:
WHERE (timestamp > 2 AND timestamp < 3) OR (timestamp > 4 AND timestamp < 5) OR ... and so on, 1,000 times
and run it against the 13,000 tables.
Our own tests suggest that BigQuery has a query length limit of 10,000 characters.
If we split the conditions into multiple queries, we exceed the daily quota limit of 20,000 queries.
Is there a workaround so that we can run these queries without hitting the daily quota limit?
Thanks
JR
I faced a similar issue with BigQuery's SQL query length limit of 1024K characters when passing a big array of values in a WHERE condition.
To resolve it, I used a parameterized query: https://cloud.google.com/bigquery/docs/parameterized-queries
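For illustration, here is a hedged sketch of such a parameterized query through the low-level Java api-client's jobs.query, passing the big value list as an ARRAY parameter instead of splicing it into the SQL text. This uses standard SQL; the table, column, and method names are placeholders, not the original poster's code.

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.QueryParameter;
import com.google.api.services.bigquery.model.QueryParameterType;
import com.google.api.services.bigquery.model.QueryParameterValue;
import com.google.api.services.bigquery.model.QueryRequest;
import com.google.api.services.bigquery.model.QueryResponse;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParameterizedInListQuery {

  /** Runs a query whose large IN-list is passed as an ARRAY parameter, not as SQL text. */
  public static QueryResponse query(Bigquery bigquery, String project, List<String> ids)
      throws IOException {
    // Build the ARRAY<STRING> parameter value from the id list.
    List<QueryParameterValue> values = new ArrayList<>();
    for (String id : ids) {
      values.add(new QueryParameterValue().setValue(id));
    }
    QueryParameter idList = new QueryParameter()
        .setName("id_list")
        .setParameterType(new QueryParameterType()
            .setType("ARRAY")
            .setArrayType(new QueryParameterType().setType("STRING")))
        .setParameterValue(new QueryParameterValue().setArrayValues(values));

    QueryRequest request = new QueryRequest()
        .setUseLegacySql(false)          // query parameters require standard SQL
        .setParameterMode("NAMED")
        .setQuery("SELECT * FROM `my_dataset.my_table` WHERE id IN UNNEST(@id_list)")
        .setQueryParameters(Collections.singletonList(idList));

    return bigquery.jobs().query(project, request).execute();
  }
}

Because the values travel as query parameters, the SQL text itself stays short; the request as a whole is then bounded by the payload limit discussed in the next answer rather than the query length limit.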
I can think of two things:
Try reducing the number of tables in the dataset. If they share the same schema, could they be combined (denormalised) into one table, or at least into fewer tables?
I have loaded 500000+ JSON gzip files into one table, and querying is much easier.
For the timestamp conditions, you can try to use a shorter common denominator.
For example, instead of
WHERE (timestamp > "2014-06-25 00:00:00" AND timestamp < "2014-06-26 00:00:00")
you could write
WHERE LEFT(timestamp,10) = "2014-06-25"
Hopefully this reduces your query's character count as well.
The BigQuery payload limit is increased to 10 MB instead of 1 MB when you use parameterized queries. That helped me.
This is the error message I got when I probed the payload size limit for parameterized queries:
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}
In the API Docs section Browsing Table Data, there is a reference to the "permitted response data size"; however, that link is dead. Experimentation revealed that requests with maxResults=50000 are usually successful, but as I near maxResults=100000 I begin to get errors from the BigQuery server.
This is happening while I page through a large table (or set of query results), so after each page is received, I request the next one; it thus doesn't matter to me what the page size is, but it does affect the communication with BigQuery.
What is the optimal value for this parameter?
Here is some explanation: https://developers.google.com/bigquery/docs/reference/v2/jobs/query?hl=en
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
To sum up: the maximum response size is 10 MB, and there is no row count limit by default.
You can choose the value of the maxResults parameter based on how your app uses the data.
If you want to show data in a report, set a low value so the first page displays quickly.
If you need to load the data into another app, you can use the maximum possible value (record size * row count < 10 MB).
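Along those lines, here is a hedged sketch of pageToken-based paging with a moderate maxResults, using the low-level Java api-client; the page size and names are placeholders to tune for your app.

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataList;
import com.google.api.services.bigquery.model.TableRow;

import java.io.IOException;

public class PagedTableRead {

  /** Reads a whole table page by page; a moderate page size keeps each response well under 10 MB. */
  public static void readAll(Bigquery bigquery, String project, String dataset, String table)
      throws IOException {
    String pageToken = null;
    do {
      TableDataList page = bigquery.tabledata()
          .list(project, dataset, table)
          .setMaxResults(10000L)   // placeholder; pick it so record size * row count < 10 MB
          .setPageToken(pageToken)
          .execute();
      if (page.getRows() != null) {
        for (TableRow row : page.getRows()) {
          // ... process the row ...
        }
      }
      pageToken = page.getPageToken();  // null once the last page has been returned
    } while (pageToken != null);
  }
}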
As you say, when you manually set maxResults = 100000 to page through the result set, you get errors from the BigQuery server. Which errors do you get? Could you paste the error message?