Why are my BigQuery streaming inserts being rate limited? - google-bigquery

I'm getting 403 rateLimitExceeded errors while doing streaming inserts into BigQuery. I'm doing many streaming inserts in parallel, so while I understand that this might be cause for some rate limiting, I'm not sure what rate limit in particular is the issue.
Here's what I get:
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Exceeded rate limits: Your table exceeded quota for rows. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
    "reason" : "rateLimitExceeded"
  } ],
  "message" : "Exceeded rate limits: Your table exceeded quota for rows. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
}
Based on BigQuery's troubleshooting docs, 403 rateLimitExceeded is caused by either concurrent rate limiting or API request limits, but the docs make it sound like neither of those apply to streaming operations.
However, the message in the error mentions table exceeded quota for rows, which sounds more like the 403 quotaExceeded error. The streaming quotas are:
Maximum row size: 1 MB - I'm under this - my average row size is in the KB and I specifically limit sizes to ensure they don't hit 1MB
HTTP request size limit: 10 MB - I'm under this - my average batch size is < 400KB and max is < 1MB
Maximum rows per second: 100,000 rows per second, per table. Exceeding this amount will cause quota_exceeded errors. - can't imagine I'd be over this - each batch is about 500 rows, and each batch takes about 500 milliseconds. I'm running in parallel but inserting across about 2,000 tables, so while it's possible (though unlikely) that I'm doing 100k rows/second, there's no way that's per table (more like 1,000 rows/sec per table max)
Maximum rows per request: 500 - I'm right at 500
Maximum bytes per second: 100 MB per second, per table. Exceeding this amount will cause quota_exceeded errors. - Again, my insert rates are not anywhere near this volume by table.
Any thoughts/suggestions as to what this rate limiting is would be appreciated!

I suspect you are occasionally submitting more than 100,000 rows per second to a single table. Might your parallel insert processes occasionally all line up on the same table?
The reason this is reported as a rate limit error is to give a push-back signal to slow down: to handle sporadic spikes of operations on a single table, you can back off and try again to spread the load out.
This is different from a quota failure, which implies that retrying will still fail until the quota epoch rolls over (for example, daily quota limits).
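If you are using the google-cloud-bigquery Python client, a minimal back-off sketch might look like the following (the table name, rows, and helper function are hypothetical, and the retry policy is only an illustration of the "slow down and retry" idea, not a prescribed implementation):
import random
import time

from google.api_core.exceptions import Forbidden
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # hypothetical table

def insert_with_backoff(rows, max_attempts=5):
    """Stream one batch, backing off when BigQuery pushes back with 403 rateLimitExceeded."""
    for attempt in range(max_attempts):
        try:
            errors = client.insert_rows_json(table_id, rows)
        except Forbidden:
            # Rate limited: wait with exponential backoff plus jitter, then retry the batch.
            time.sleep((2 ** attempt) + random.random())
            continue
        if errors:
            raise RuntimeError("Some rows were rejected: %s" % errors)
        return
    raise RuntimeError("Still rate limited after %d attempts" % max_attempts)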

Related

Google BigQuery Payload size limit of 10485760 bytes

We encountered an error while trying to stream data into a BigQuery table. It says: payload size limit of 10485760 bytes. Does anyone have any idea about it? According to the third-party integration vendor we use to move data across from SQL Server to the BigQuery table, it is an issue on BigQuery's side. Is that the case?
Thanks.
Best regards,
BigQuery has some maximum limits and also some quota policies, as you can see here.
The limitations for Streaming are:
If you do not populate the insertId field when you insert rows:
Maximum rows per second: 1,000,000
Maximum bytes per second: 1 GB
If you populate the insertId field when you insert rows:
Maximum rows per second: 100,000
Maximum bytes per second: 100 MB
The following additional streaming quotas apply whether or not you populate the insertId field:
Maximum row size: 1 MB
HTTP request size limit: 10 MB
Maximum rows per request: 10,000
insertId field length: 128 characters
I hope it helps
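As a rough illustration of how the insertId choice shows up in the google-cloud-bigquery Python client (the table name and rows are made up, and the row_ids behaviour is an assumption about recent client versions):
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # hypothetical table

rows = [{"user_id": 1, "event": "click"}, {"user_id": 2, "event": "view"}]

# Default: the client generates an insertId per row, so the best-effort
# de-duplication limits (100,000 rows/s and 100 MB/s per table) apply.
client.insert_rows_json(table_id, rows)

# Passing explicit None row ids skips insertId generation, trading
# de-duplication for the higher limits (1,000,000 rows/s and 1 GB/s).
client.insert_rows_json(table_id, rows, row_ids=[None] * len(rows))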
Indeed, the streaming limit is 10 MB per request.
Row size is 1 MB according to https://cloud.google.com/bigquery/quotas
What you need to do is parallelize the streaming jobs. BigQuery supports up to 1M rows per second.
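If the 10485760-byte (10 MB) error is the problem, one option is to split the payload yourself before sending it. Here is a rough sketch; the size check uses the JSON-serialized length as an approximation of the request payload, so it deliberately leaves some headroom:
import json

MAX_REQUEST_BYTES = 9 * 1024 * 1024  # stay safely under the 10 MB request limit
MAX_ROWS_PER_REQUEST = 10000         # rows-per-request limit from the list above

def chunk_rows(rows):
    """Yield batches that stay under the per-request size and row-count limits."""
    batch, batch_bytes = [], 0
    for row in rows:
        row_bytes = len(json.dumps(row).encode("utf-8"))
        if batch and (batch_bytes + row_bytes > MAX_REQUEST_BYTES
                      or len(batch) >= MAX_ROWS_PER_REQUEST):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch
Each yielded batch can then be sent as its own streaming insert request.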

Unable to increase quota

The default daily quota limit is inadequate to properly use the https://www.googleapis.com/youtube/v3/liveChat/messages endpoint. A query that returns the necessary information costs about 8 units, and you get 10,000 units per day. Even if you're only querying for new messages every few seconds, that's barely enough for an hour.
I tried to increase the quota limit, but it appears that the form to do this doesn't work:
https://support.google.com/youtube/contact/yt_api_form?hl=en
It simply gives me "There were problems sending your form. Please try again."

How to interpret query process GB in Bigquery?

I am using a free trial of Google BigQuery. This is the query that I am using:
select * from `test`.events where subject_id = 124 and id = 256064 and time >= '2166-01-15T14:00:00' and time <='2166-01-15T14:15:00' and id_1 in (3655,223762,223761,678,211,220045,8368,8441,225310,8555,8440)
This query is expected to return at most 300 records and not more than that.
However, I see a message like this below (a screenshot showing the number of GB the query processes).
But the table this query operates on is really huge. Does that number indicate the table size? I ran this query multiple times a day.
Because of this, it resulted in the error below:
Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
How long do I have to wait for this error to go away? Is the daily limit 1 TB? If yes, then I didn't use close to 400 GB.
How do I view my daily usage?
If I can edit the quota, can you let me know which option I should be editing?
Can you help me with the above questions?
According to the official documentation
"BigQuery charges for queries by using one metric: the number of bytes processed (also referred to as bytes read)", regardless of how large the output size is. What this means is that if you do a count(*) on a 1TB table, you will supposedly be charged $5, even though the final output is very minimal.
Note that due to storage optimizations that BigQuery is doing internally, the bytes processed might not equal to the actual raw table size when you created it.
For the error you're seeing, browse in the Google Cloud Console to "IAM & admin" and then "Quotas", where you can search for the quotas specific to the BigQuery service.
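You can also estimate the bytes processed before spending any quota by doing a dry run, for example with the google-cloud-bigquery Python client (the query below is just a trimmed version of the one in the question):
from google.cloud import bigquery

client = bigquery.Client()

# A dry run returns the bytes the query would process without running it
# (and without counting against the free-tier scan quota).
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query = "SELECT * FROM `test`.events WHERE subject_id = 124"
job = client.query(query, job_config=job_config)

print("This query would process {:.2f} GB".format(job.total_bytes_processed / 1e9))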
Hope this helps!
Flavien

Bigquery maximum processing data size allowance?

My question is: how much data are we allowed to process on BigQuery? I am using Stack Overflow's Kaggle dataset to analyze the data, and the text I am analyzing is around 27 GB. I just want to get the average length per entry, so I do:
query_length_text = """
SELECT
AVG(CHAR_LENGTH(title)) AS avg_title_length,
AVG(CHAR_LENGTH(body)) AS avg_body_length
FROM
`bigquery-public-data.stackoverflow.stackoverflow_posts`
"""
However, this says:
Query cancelled; estimated size of 26.847077486105263 exceeds limit of 1 GB
I am only returning one float, so I know that isn't the problem. Is the 1 GB limit on the processing too? How do I process it in batches, so I do 1 GB at a time?
Kaggle by default sets a 1 GB limit on requests (to prevent your monthly quota of 5 TB from running out). This is what causes the error. To prevent this, you can override it by using the max_gb_scanned parameter like this:
df = bq_assistant.query_to_pandas_safe(QUERY, max_gb_scanned = N)
where N is the amount of data processed by your query, or any number higher than it.
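For completeness, here is a sketch of the surrounding usage, assuming Kaggle's bq_helper package; estimate_query_size lets you pick N from an estimate instead of guessing:
import bq_helper

# Helper pointed at the public Stack Overflow dataset used in the question.
bq_assistant = bq_helper.BigQueryHelper(
    active_project="bigquery-public-data",
    dataset_name="stackoverflow",
)

# Check how much data the query would scan before running it...
estimated_gb = bq_assistant.estimate_query_size(query_length_text)
print("Estimated scan: {:.1f} GB".format(estimated_gb))

# ...then set the safety cap just above that estimate.
df = bq_assistant.query_to_pandas_safe(query_length_text, max_gb_scanned=estimated_gb + 1)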

BigQuery max query length characters work around

First let me explain the problem.
I have 500 unique users. The data from each of these users is split into smaller gzip files (let's say 25 files per user on average). We have loaded each split gzip file as a separate table in BigQuery. Therefore, our dataset has some 13,000 tables in it.
Now, we have to run time-range queries to retrieve some data from each user. We have around 500-1,000 different time ranges for a single user. We would like to combine all these time ranges into a single query with logical OR and AND:
WHERE (timestamp >2 and timestamp <3) OR (timestamp >4 and timestamp <5) OR .............. and so on 1000 times
and run them on 13000 tables
Our own tests suggest that BigQuery has a query length limit of around 10,000 characters.
If we split the conditions into multiple queries, we exceed the 20,000 daily quota limit.
Is there any way to work around this, so that we could run these queries without hitting the daily quota limit?
Thanks
JR
I faced a similar issue with the BigQuery SQL query length limit of 1024K characters when passing a big array of values in a WHERE condition.
To resolve it, I used a parameterized query: https://cloud.google.com/bigquery/docs/parameterized-queries
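For example, with the google-cloud-bigquery Python client you can pass the whole list of values as a single array parameter instead of inlining thousands of literals into the SQL text (the table and column names here are made up, and the values are just the ones from the earlier question):
from google.cloud import bigquery

client = bigquery.Client()

ids = [3655, 223762, 223761, 678, 211]  # illustrative values

query = """
    SELECT *
    FROM `my-project.my_dataset.my_table`
    WHERE id_1 IN UNNEST(@ids)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ArrayQueryParameter("ids", "INT64", ids)]
)
rows = client.query(query, job_config=job_config).result()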
I can think of two things:
Try reducing the number of tables in the dataset. If they share the same schema, could they be combined (denormalised) into one table, or at least fewer tables?
I have loaded 500,000+ JSON gzip files into one table, and querying is much easier.
For the timestamps, you can try to use a shorter common denominator.
For example, instead of
WHERE (timestamp > "2014-06-25:00:00:00" AND timestamp < "2014-06-26:00:00:00")
you could express
WHERE LEFT(timestamp,10) = "2014-06-25"
Hopefully this helps keep you under the character limit as well.
The BigQuery payload limit, when you use parameterized queries, is increased to 10 MB instead of 1 MB. That helped me.
This is the error message I got when I tried to find the payload size limit for parameterized queries:
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}