How to avoid hitting the 10 requests per second limit per user - google-bigquery

We run multiple short queries in parallel and hit the limit of 10 requests per second.
According to the docs, throttling might occur if we hit a limit of 10 API requests per second per user per project.
We send a "start query job" request and then call getQueryResults() with a timeoutMs of 60,000. However, we get a response after about 1 second. We look for jobComplete in the JSON response, and since it is not there, we have to call getQueryResults() again many times. That hits the threshold, which causes an error rather than a slowdown. The sample code is below.
Our questions are:
1. What is a "user"? Is it an App Engine user, or a user ID that we can put in the connection string or in the query itself?
2. Is it really per BigQuery API project?
3. What is the expected behavior? We got an error, "Exceeded rate limits: too many user/method api request limit for this user_method", rather than the throttling the docs describe, and all of our processes fail.
4. As seen in the code below, why do we get the response after 1 second and not according to our timeout? Are we doing something wrong?
Thanks a lot.
Here is the sample code:
while (res is None or 'jobComplete' not in res or not res['jobComplete']):
    try:
        res = self.service.jobs().getQueryResults(projectId=self.project_id,
            jobId=jobId, timeoutMs=60000, maxResults=maxResults).execute()
    except HTTPException:
        if independent:
            raise

Are you saying that even though you specify timeoutMs=60000, it is returning within 1 second but the job is not yet complete? If so, this is a bug.
The quota limits for getQueryResults are actually currently much higher than 10 requests per second. The reason the docs say only 10 is because we want to have the ability to throttle it down to that amount if someone is hitting us too hard. If you're currently seeing an error on this API, it is likely that you're calling it at a very high rate.
I'll try to reproduce the problem where we don't wait for the timeout ... if that is really what is happening it may be the root of your problems.

def query_results_long(self, jobId, maxResults, res=None):
    start_time = query_time = None
    while res is None or 'jobComplete' not in res or not res['jobComplete']:
        if start_time:
            logging.info('requested for query results ended after %s', query_time)
            time.sleep(2)
        start_time = datetime.now()
        res = self.service.jobs().getQueryResults(projectId=self.project_id,
            jobId=jobId, timeoutMs=60000, maxResults=maxResults).execute()
        query_time = datetime.now() - start_time
    return res
Then, in the App Engine log, I saw this:
requested for query results ended after 0:00:04.959110
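Whatever the cause of the early return, the polling loop is less likely to hit the rate limit if it pauses longer after each incomplete response. The following is only a minimal sketch of that idea, not the poster's code: `poll_until_complete` and its `fetch` argument are hypothetical names, and `fetch` is assumed to wrap the `getQueryResults(...).execute()` call shown above.

```python
import time


def poll_until_complete(fetch, initial_delay=1.0, max_delay=30.0,
                        max_attempts=20, sleep=time.sleep):
    """Call fetch() until its result reports jobComplete, pausing between calls.

    fetch is a hypothetical zero-argument callable, for example a lambda that
    wraps service.jobs().getQueryResults(...).execute().
    """
    delay = initial_delay
    for _ in range(max_attempts):
        res = fetch()
        if res and res.get('jobComplete'):
            return res
        # The call returned without a finished job: wait before asking
        # again, doubling the pause each time up to max_delay.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    raise RuntimeError('job did not complete after %d attempts' % max_attempts)
```

With the default arguments, the pauses between incomplete responses are 1, 2, 4, ... seconds, capped at 30 seconds, so even a call that returns immediately cannot produce more than one request per second.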

Related

Sometimes the Google Geocoding API returns a 500 server error

Sometimes the Google Maps API returns a 500 server error response for German postal codes, and I cannot understand why.
I hope it is specific enough.
Any ideas?
https://maps.googleapis.com/maps/api/geocode/json?key={api_key}&address={postal_code}&language=de&region=de&components=country:DE&sensor=false
Since you say the problem is not tied to a particular address but looks like "random" behavior, it may fall under behavior that is documented for other well-known APIs as well.
As in those cases, the recommended strategy for the Geocoding API is exponential backoff, which basically means retrying after a delay that grows with each attempt.
In case the above link goes down or changes, I'm quoting the article:
Exponential Backoff
In rare cases something may go wrong serving your request; you may receive a 4XX or 5XX HTTP response code, or the TCP connection may simply fail somewhere between your client and Google's server. Often it is worthwhile re-trying the request as the followup request may succeed when the original failed. However, it is important not to simply loop repeatedly making requests to Google's servers. This looping behavior can overload the network between your client and Google causing problems for many parties.
A better approach is to retry with increasing delays between attempts. Usually the delay is increased by a multiplicative factor with each attempt, an approach known as Exponential Backoff.
For example, consider an application that wishes to make this request to the Google Maps Time Zone API:
https://maps.googleapis.com/maps/api/timezone/json?location=39.6034810,-119.6822510&timestamp=1331161200&key=YOUR_API_KEY
The following Python example shows how to make the request with exponential backoff:
import json
import time
import urllib
import urllib2
def timezone(lat, lng, timestamp):
    # The maps_key defined below isn't a valid Google Maps API key.
    # You need to get your own API key.
    # See https://developers.google.com/maps/documentation/timezone/get-api-key
    maps_key = 'YOUR_KEY_HERE'
    timezone_base_url = 'https://maps.googleapis.com/maps/api/timezone/json'
    # This joins the parts of the URL together into one string.
    url = timezone_base_url + '?' + urllib.urlencode({
        'location': "%s,%s" % (lat, lng),
        'timestamp': timestamp,
        'key': maps_key,
    })
    current_delay = 0.1  # Set the initial retry delay to 100ms.
    max_delay = 3600  # Set the maximum retry delay to 1 hour.
    while True:
        try:
            # Get the API response.
            response = str(urllib2.urlopen(url).read())
        except IOError:
            pass  # Fall through to the retry loop.
        else:
            # If we didn't get an IOError then parse the result.
            result = json.loads(response.replace('\\n', ''))
            if result['status'] == 'OK':
                return result['timeZoneId']
            elif result['status'] != 'UNKNOWN_ERROR':
                # Many API errors cannot be fixed by a retry, e.g. INVALID_REQUEST or
                # ZERO_RESULTS. There is no point retrying these requests.
                raise Exception(result['error_message'])
        if current_delay > max_delay:
            raise Exception('Too many retry attempts.')
        print 'Waiting', current_delay, 'seconds before retrying.'
        time.sleep(current_delay)
        current_delay *= 2  # Increase the delay each time we retry.
tz = timezone(39.6034810, -119.6822510, 1331161200)
print 'Timezone:', tz
Of course this will not resolve the "false responses" you mention; I suspect that depends on data quality and does not happen randomly.
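The quoted sample targets Python 2 (urllib2, print statements). A rough Python 3 sketch of the same retry pattern follows. `fetch_with_backoff` and its parameters are illustrative names, not part of any Google library, and unlike the quoted sample it retries on HTTP 5xx responses and connection failures rather than inspecting the JSON status field.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, opener=urllib.request.urlopen, initial_delay=0.1,
                       max_delay=60.0, sleep=time.sleep):
    """Return the response body, retrying on 5xx and connection errors.

    opener is injectable so the function can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    delay = initial_delay
    while True:
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before URLError (it is a subclass).
            if err.code < 500:
                raise  # Client errors will not be fixed by retrying.
        except urllib.error.URLError:
            pass  # Connection problem: fall through to the retry.
        if delay > max_delay:
            raise RuntimeError('Too many retry attempts.')
        sleep(delay)
        delay *= 2  # Increase the delay each time we retry.
```

For the geocoding request in the question, the URL with a real key substituted in would be passed as `url`. A 500 response then triggers waits of 0.1 s, 0.2 s, 0.4 s and so on before the call gives up.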

How to get the historical data from bitfinex.com without a limit?

I am drawing a chart using data pulled from bitfinex.com via a simple API query. The end result should be a chart showing the historical data of BTCUSD for the past two years.
Docs are available right here: https://bitfinex.readme.io/v2/reference#rest-public-candles
Everything works fine except the limit of the retrieved data.
This is my request:
https://api.bitfinex.com/v2/candles/trade:1h:tBTCUSD/hist?start=1514764800000&sort=1
The result can be seen over here or you can copy the request to the browser: https://docs.google.com/document/d/1sG11Ro0X21_UFgUtdqrlitcCchoSh30NzGCgAe6M0u0/edit?usp=sharing
The problem is that I receive candles for only 5 days no matter what dates or parameters I use. I can get more candles if I add the limit parameter to the query string, but I still cannot get more than about 1000-1100 candles. Above that, I get a 500 error from the server:
Server error: GET https://api.bitfinex.com/v2/candles/trade:1h:tBTCUSD/hist?limit=1100&start=1512086400000&end=1516233600000&sort=1 resulted in a 500 Internal Server Error response:\n ["error",10020,"limit: invalid"].
What is the valid limit? There is no such information in the docs.
The author of this topic has the same question, but no solution was given there, and the last answer does not help much: Bitfinex data api
How can I get the desired amount of data for the two-year period? I do not want to break my query down into smaller pieces and go step by step; it would look ugly.
From the looks of it, the limit is set to 1000. If you need more than 1000 historical entries, you could take the last timestamp from each response and send another request, repeating until you reach the desired end time.
Keep in mind that you can only make 10-90 requests per minute, so it's smart to sleep between requests, for example for 6 seconds.
import json
import time
import requests
start = 1512086400000
end = 1516233600000
timestamp = start
last_timestamp = None
url = 'https://api.bitfinex.com/v2/trades/tBTCUSD/hist/'
historical_data = []
while timestamp <= end and timestamp != last_timestamp:
    print("Requesting " + str(timestamp))
    params = {'start': timestamp, 'limit': 1000, 'sort': 1}
    response = requests.get(url, params=params)
    trades = json.loads(response.content)
    if not trades:
        break  # Nothing more to page through.
    historical_data.extend(trades)
    last_timestamp = timestamp
    # Continue from the timestamp of the last trade returned.
    trade_id, timestamp, amount, price = trades[-1]
    time.sleep(6)  # Stay under the per-minute request limit.
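Applied to the candles endpoint from the question, the same paging idea might look like the sketch below. It is an outline under stated assumptions rather than tested client code: `fetch_page` is a hypothetical callable that performs the HTTP request, each candle is assumed to start with its millisecond timestamp, and a page shorter than `limit` is assumed to mean the range is exhausted.

```python
import time


def fetch_all_candles(fetch_page, start_ms, end_ms, limit=1000,
                      pause_s=6.0, sleep=time.sleep):
    """Collect candles between start_ms and end_ms, one page at a time.

    fetch_page(start_ms, end_ms, limit) is a hypothetical callable that
    returns a list of candles sorted oldest-first, each beginning with its
    millisecond timestamp.
    """
    candles = []
    cursor = start_ms
    while cursor <= end_ms:
        page = fetch_page(cursor, end_ms, limit)
        if not page:
            break
        candles.extend(page)
        if len(page) < limit:
            break  # Assumption: a short page means nothing is left.
        cursor = page[-1][0] + 1  # Resume just after the last candle seen.
        sleep(pause_s)  # Pause between requests to respect the rate limit.
    return candles
```

Two years of 1h candles is roughly 2 * 365 * 24 = 17,520 entries, so at 1000 per request this takes about 18 requests, or under two minutes with a 6-second pause.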

Podio Create Item rate limit after 25 calls

I have to create items in Podio using the API. When I let my program run at full speed, I noticed that after 5-6 items I get an error response from Podio saying:
{
  "error_propagate":false,
  "error":"rate_limit",
  "error_description":"You have hit the rate limit. Please wait 300 seconds before trying again",
  "request":{
    "url":"http://api.podio.com/oauth/token",
    "query_string":"",
    "method":"POST"
  }
}
I thought the rate limit was 5000 calls per hour, yet I get this error after 25 calls...
I added a thread.sleep to my code, and now it seems to be better, but even when I let the thread sleep for 10 seconds I still get this error. I have now set the thread.sleep to 20 seconds and it seems to work.
Is there a hidden rate limit on the number of calls per second?
I think you are using username/password authentication here. In my experience, the token request endpoint has a lower limit. So the best way to solve this is to store and reuse the access tokens instead of re-authenticating every time your program runs.
The Podio API client libraries provide convenience methods to do this. See these links:
http://podio.github.io/podio-dotnet/sessions/
http://podio.github.io/podio-php/sessions
The rate limit is 1000 calls per hour, so you can set the sleep accordingly.
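The store-and-reuse advice can be illustrated independently of the Podio client libraries. The sketch below is a generic in-memory cache, not Podio's API: `TokenCache` and `fetch_token` are hypothetical names, and `fetch_token` is assumed to call the token endpoint and return the access token together with its lifetime in seconds.

```python
import time


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time, margin_s=60):
        # fetch_token is a hypothetical callable returning
        # (access_token, lifetime_in_seconds) from the auth endpoint.
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin_s = margin_s
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token, lifetime_s = self._fetch_token()
            self._expires_at = now + lifetime_s
        return self._token
```

Every API call would then ask `cache.get()` for the token, so the token endpoint is hit once per token lifetime instead of once per item. To reuse tokens across program runs, as the answer suggests, the token and its expiry would also have to be persisted, for example in a file or database.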

BigQuery API Java client intermittently returning bad results

I am executing some long-running queries using the BigQuery Java client.
I construct a BigQuery job and execute it like this:
val queryRequest = new QueryRequest().setQuery(query)
val queryJob = client.jobs().query(ProjectId, queryRequest)
queryJob.execute()
The problem I am facing is that, for the same query, the client sometimes returns before the job is complete, i.e. the number of rows in the result is zero.
I tried printing the response, and it shows:
{"jobComplete":false,"jobReference":{"jobId":"job_bTLRGrw5_xR26i9Li3a9EQvuA6c","projectId":"analytics-production"},"kind":"bigquery#queryResponse"}
From that I can see that the job is not complete. Then why did the client return before the job was complete?
While building the client, I use an HttpRequestInitializer, and in its initialize method I provide the timeout parameters:
override def initialize(request: HttpRequest): Unit = {
  request.setConnectTimeout(...)
  request.setReadTimeout(...)
}
I tried giving high values for the timeout, such as 240 seconds, but no luck. The behavior is still the same: it fails intermittently.
Make sure you set the timeout on the BigQuery request body, and not on the HTTP object.
val queryRequest = new QueryRequest().setQuery(query).setTimeoutMs(10000) //10 seconds
The parameter is timeoutMs. It is documented here: https://cloud.google.com/bigquery/docs/reference/v2/jobs/query
Please also read the docs for this field: "How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with the 'jobComplete' flag set to false. You can call GetQueryResults() to wait for the query to complete and read the results. The default value is 10000 milliseconds (10 seconds)."
More about synchronous queries here:
https://cloud.google.com/bigquery/querying-data#syncqueries
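In Python, with the google-api-python-client style `service` object used in the getQueryResults snippets at the top of this page, the flow described in the docs might be sketched as follows. `run_query`, `pause_s` and the default values are illustrative assumptions, and the pause between polls is only there so that early returns cannot turn into a tight loop.

```python
import time


def run_query(service, project_id, query, timeout_ms=10000,
              pause_s=1.0, sleep=time.sleep):
    """Start a query and wait for it via getQueryResults if it is not done.

    service is assumed to be a BigQuery v2 client object built with
    google-api-python-client.
    """
    request_body = {'query': query, 'timeoutMs': timeout_ms}
    response = service.jobs().query(
        projectId=project_id, body=request_body).execute()
    job_id = response['jobReference']['jobId']
    while not response.get('jobComplete'):
        # jobs.query returned before the job finished: keep waiting on the
        # job itself instead of treating the empty response as the result.
        sleep(pause_s)
        response = service.jobs().getQueryResults(
            projectId=project_id, jobId=job_id,
            timeoutMs=timeout_ms).execute()
    return response
```

The key point is the `jobComplete` check: a response with `jobComplete` set to false carries no rows, so it should lead to a getQueryResults call rather than being read as an empty result.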

Read timed out: synchronous query via BigQuery Java API

We are using the BigQuery Java API to retrieve results for our analytics reporting frontend. We are trying to retrieve the results synchronously. A lot of the time we get a "Read timed out" error, even before the query timeout specified in the parameters has elapsed. Here's the stack trace for a sample failure:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:830)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:787)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
I am not able to retrieve the job ID of the resulting job, as the error occurs before I can retrieve a JobReference object. The timeout specified in this case was 300 seconds, and the query failed well before that. The query contains three JOINs and several GROUP EACH BY clauses. Can you suggest a possible way to debug this?
Adding the code snippet:
QueryRequest queryInfo = new QueryRequest().setQuery(sql)
        .setTimeoutMs(timeOutInSec * 1000);
// get project id
BQGameConnectionDetails details = Config
        .getBQConnectionDetails(gameId);
String projectId = details.getProjectId();
Bigquery.Jobs.Query queryRequest = getInstance(gameId).jobs()
        .query(projectId, queryInfo);
QueryResponse response = queryRequest.execute();
There are two timeouts involved. The first is the timeout on the HTTP request you've sent to BigQuery. The second is the BigQuery request timeout (timeoutMs). It sounds like you've set the latter to a large value, but the former is likely the timeout that you're hitting. If the HTTP request times out before the BigQuery timeout, the connection will be closed and BigQuery won't have a chance to respond.
There are two options. The first is to increase the HTTP request timeout (how to do that depends on the libraries you're using, but this page here may be helpful). The second is to decrease the BigQuery timeout. This means you'll have to use jobs.getQueryResults() to read the actual results, but it is the more robust method because it doesn't matter how long the query takes; you can just call getQueryResults() in a loop. I would post a link to a good Java sample that does this, but I don't know that one exists, unfortunately.
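To make the second option concrete, the total time you are willing to wait can be split into per-call timeoutMs values that each stay below the HTTP read timeout. The helper below is a hypothetical illustration in Python, not part of any BigQuery client: the function name, the safety margin, and the 20-second read timeout used in the example are all assumptions.

```python
def per_call_timeouts_ms(total_wait_s, http_read_timeout_s, margin_s=5.0):
    """Yield timeoutMs values that add up to total_wait_s.

    Each value is kept margin_s seconds below the HTTP read timeout so the
    server has time to answer before the client closes the connection.
    Raises ValueError (on first iteration) if the margin leaves no room.
    """
    per_call_s = http_read_timeout_s - margin_s
    if per_call_s <= 0:
        raise ValueError('HTTP read timeout must exceed the safety margin')
    remaining = total_wait_s
    while remaining > 0:
        chunk = min(per_call_s, remaining)
        yield int(chunk * 1000)
        remaining -= chunk
```

For example, with a 300-second budget and an assumed 20-second HTTP read timeout, this yields twenty values of 15000, one for each getQueryResults() call in the loop.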