HTTP error 500 when requesting google big query API using service account - google-bigquery

I have been using Big query to generate reports through a web service for a year now, however in the past month or so I have noticed HTTP 500 errors in response to most of my query requests even though no changes have been made to the web service. In my current setup I make 5 simultaneous queries and often 4 out of the 5 queries fail with 500 error. At times all 5 queries are returned but in recent times this rarely happens rendering my application almost unusable.
I use server to server authentication using my service account token and my big query client app is closely modeled on the example given here -
https://developers.google.com/bigquery/articles/dashboard#class
Here is the full error message -
HttpError: https://www.googleapis.com/bigquery/v2/projects/1021946877460/queries?alt=json returned "Unexpected. Please try again.">
Snippet of my bigquery client -
def generateToken():
"""
generates OAuth2.0 token/credentials for login to google big query
"""
credentials = SignedJwtAssertionCredentials(
SERVICE_ACCOUNT_EMAIL,
KEY,
"https://www.googleapis.com/auth/bigquery")
return credentials
class BigQueryClient(object):
def authenticate(self, credentials):
http = httplib2.Http(proxy_info = httplib2.ProxyInfo(
socks.PROXY_TYPE_HTTP,
PROXY_IP,
PROXY_PORT))
http = credentials.authorize(http)
return http
def __init__(self, credentials, project):
http = self.authenticate(credentials)
self.service = build('bigquery', 'v2', http=http)
Please let me know if I am doing something incorrectly here or if anything has changed on the bigquery backend such as limits to the number of query requests allowed over a certain period of time.
Thanks.

500s are always BigQuery bugs. I believe I've tracked down one of your errors in the BigQuery server logs, and am investigating.

Related

Can application in public cloud be authorized to fetch data from government tenant via graph api?

I'm trying to fetch email list from government tenant via graph api and it worked fine until last week. I'm using client credentials flow. Last week i started to get the following error when trying to authorize my app in government tenants:
oauthlib.oauth2.rfc6749.errors.InvalidClientIdError: (invalid_request) AADSTS900441: Requests to applications hosted in the public cloud are not supported for USGov tenants.
Is there a way to authorize application from public azure cloud to read data from government tenant?
EDIT: code example and debug logs
from oauthlib.oauth2 import BackendApplicationClient
client = BackendApplicationClient(client_id=config.CLIENT_ID)
MSGRAPH = requests_oauthlib.OAuth2Session(
client=client
)
token = MSGRAPH.fetch_token(
'https://login.microsoftonline.us' + '/<tenant>' + config.TOKEN_ENDPOINT,
client_id=config.CLIENT_ID,
client_secret=config.CLIENT_SECRET,
include_client_id=True,
scope=['https://graph.microsoft.us/.default'])
endpoint = config.RESOURCE + config.API_VERSION + '/users'
graphdata = MSGRAPH.get(endpoint).json()
DEBUG:requests_oauthlib.oauth2_session:Requesting url https://login.microsoftonline.us/<tenant-id>/oauth2/v2.0/token using method POST.
DEBUG:requests_oauthlib.oauth2_session:Supplying headers {u'Content-Type': u'application/x-www-form-urlencoded;charset=UTF-8', u'Accept': u'application/json'} and data {u'client_secret': u'...', u'grant_type': u'client_credentials', u'client_id': u'...', u'scope': u'https://graph.microsoft.us/.default'}
DEBUG:requests_oauthlib.oauth2_session:Passing through key word arguments {'verify': True, 'json': None, 'proxies': None, 'timeout': None, 'auth': None}.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): login.microsoftonline.us:443
DEBUG:urllib3.connectionpool:https://login.microsoftonline.us:443 "POST /<tenant-id>/oauth2/v2.0/token HTTP/1.1" 400 522
DEBUG:requests_oauthlib.oauth2_session:Prepared fetch token request body grant_type=client_credentials&client_id=...&client_secret=...&scope=https%3A%2F%2Fgraph.microsoft.us%2F.default
DEBUG:requests_oauthlib.oauth2_session:Request to fetch token completed with status 400.
Basically i see this error when i'm trying to fetch access token. Adminconsent was already given to my application by tenant admin.
This code worked for Gov tenants for month or so and suddenly stopped to work.
AAD started enforcing this about a month ago, GCC High/DoD tenants cannot use confidential apps published in commercial cloud. You need to publish your app from a GCC High/DoD tenant.

Http status code when data not found in Database

I'm trying to understand which Http Status Code to use in the following use case
The user tries to do a GET on an endpoint with an input ID.
The requested data is not available in the database.
Should the service send back:
404 - Not Found
As the data is NOT FOUND in the database
400 - Bad Request
As the data in the input request is not valid or present in the db
200 - OK with null response
200 - OK with an error message
In this case we can use a standard error message, with a contract that spans across all the 200 OK responses (like below).
BaseResponse {
Errors [{
Message: "Data Not Found"
}],
Response: null
}
Which is the right (or standard) approach to follow?
Thanks in advance.
Which is the right (or standard) approach to follow?
If you are following the REST API Architecture, you should follow these guidelines:
400 The request could not be understood by the server due to incorrect syntax. The client SHOULD NOT repeat the request without modifications.
It means that you received a bad request data, like an ID in alphanumeric format when you want only numeric IDs. Typically it refers to bad input formats or security checks (like an input array with a maxLength)
404 The server can not find the requested resource.
The ID format is valid and you can't find the resource in the data source.
If you don't follow any standard architecture, you should define how you want to manage these cases and share your thought with the team and customers.
In many legacy applications, an HTTP status 200 with errors field is very common since very-old clients were not so good to manage errors.

how to figure out how to authenticate myself using http requests

I am trying to log in to a site using requests as follows:
s = requests.Session()
login_data = {"userName":"username", "password":"pass", "loginPath":"/d2l/login"}
resp = requests.post("https://d2l.pima.edu/d2l/login?login=1", login_data)
although I am getting a 200 response, when I say
print(resp.content)
b"<!DOCTYPE html><html><head><meta charset='utf-8' /><script>var hash = window.location.hash;if( hash ) hash = '%23' + hash.substring( 1 );window.location.replace('/d2l/login?sessionExpired=0&target=%2fd2l%2ferror%2f404%2flog%3ftargetUrl%3dhttp%253A%252F%252Fd2l.pima.edu%253A80%252Fd2l%252Flogin%253Flogin%253D1' + hash );</script><title></title></head><body></body></html>"
notice it says session expired.
What I've tried:
logging back out and in in the actual browser, no success.
http basic auth, no success.
I'm thinking maybe I need to authenticate myself to this site using cookies?
If so how do I determine which cookies to send it?
I tried figuring this out by saying
resp.cookies
Out[4]: <RequestsCookieJar[]>
shouldn't this be giving me names of cookies? I'm not sure what to do with such output.
Main Point: HOW DO I FIGURE OUT HOW TO AUTHENTICATE MYSLEF TO THIS WEBSITE?
Help is appreciated.
I would rather not use selenium.
From loading this page https://d2l.pima.edu/d2l/login and viewing its source, you'll notice the POST target path is /d2l/lp/auth/login/login.d2l. Try using that as your POST path. Your other fields look consistent with the form's expectations.
Note: with python requests if you create a session object use it to make your requests:
resp = s.post(<blah blah>, login_data)
The session will hold any cookies set by the login server, and you can continue to use the s object to make requests in the authenticated session.

API of Polarion ALM occasionally does not authorize any request

I have wrote some Python code that logs in and reads some data from Polarion ALM server via API (more informarion about Polarion API: https://almdemo.polarion.com/polarion/sdk/index.html). In my code I have used zeep Python package to handle SOAP.
My algorithm is simple:
1) Log in via logIn web service (https://almdemo.polarion.com/polarion/sdk/doc/javadoc/com/polarion/alm/ws/client/session/SessionWebService.html#logIn-java.lang.String-java.lang.String-)
2) Add current session to header - so the current session remain alive.
3) Try to read some data, for example via getRootProjectGroup web service (https://almdemo.polarion.com/polarion/sdk/doc/javadoc/com/polarion/alm/ws/client/projects/ProjectWebService.html#getRootProjectGroup--).
4) Regardless of what is happening I close the current session via endSession web service (https://almdemo.polarion.com/polarion/sdk/doc/javadoc/com/polarion/alm/ws/client/session/SessionWebService.html#endSession--).
What I observed:
Ocassionally, at point 3 I receive response with Authorization Error (snippet with response):
<soapenv:Fault>\n <faultcode>soapenv:Server.generalException</faultcode>\n <faultstring>Not authorized.</faultstring>\n <detail>\n <ns1:stackTrace xmlns:ns1="http://xml.apache.org/axis/">Not authorized.\n\tat com.polarion.alm.ws.providers.DoAsUserWrapper.invoke(DoAsUserWrapper.java:37)\n\tat org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)\n\t..
or everything is good and I receive:
{
'groupURIs': {
'SubterraURI': [
'subterra:data-service:objects:/default/${ProjectGroup}Group'
]
},
'location': None,
'name': 'ROOT_CTX_NAME',
'parentURI': None,
'projectIDs': None,
'uri': 'subterra:data-service:objects:${ProjectGroup}Group',
'unresolvable': False
}
What surprises me the most:
- I always uses the same credential (username and password)
- the session ID of the in request (point 3) is the same as in the server response during log in (point 1) so the session shall remain alive
- if I put my code in the loop (for example 1000 executions), the result for all attempts is always the same (1000 successes or 1000 failures), even if I add a wait (e.g. 1s) between the attemps
I would like to know why server rejects some of the requests. Is it some kind of Polarion server issue? How could I make a work around to somehow connect with the server and be able to read some data from the server even if it reject my first request.
It appears that it is issue with SOAP client (and relatively popular one). To fix it, I have turned off TLS verification. More details in:
https://python-zeep.readthedocs.io/en/master/transport.html

Batch Request to Google API making calls before the batch executes in Python?

I am working on sending a large batch request to the Google API through Admin SDK which will add members to certain groups based upon the groups in our in-house servers(a realignment script). I am using Python and to access the Google API I am using the [apiclient library][1]. When I create my service and batch object, the creation of the service object requests a URL.
batch_count = 0
batch = BatchHttpRequest()
service = build('admin', 'directory_v1')
logs
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/discovery/v1/apis/admin/directory_v1/rest
which makes sense as the JSON object returned by that HTTP call is used to build the service object.
Now I want to add multiple requests into the batch object, so I do this;
for email in add_user_list:
if batch_count != 999:
add_body = dict()
add_body[u'email'] = email.lower()
for n in range(0, 5):
try:
batch.add(service.members().insert(groupKey=groupkey, body=add_body), callback=batch_callback)
batch_count += 1
break
except HttpError, error:
logging_message('Quota exceeded, waiting to retry...')
time.sleep((2 ** n)
continue
Every time it iterates through the outermost for loop it logs (group address redacted)
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/admin/directory/v1/groups/GROUPADDRESS/members?alt=json
Isn't the point of the batch to not send out any calls to the API until the batch request has been populated with all the individual requests? Why does it make the call to the API shown above every time? Then when I batch.execute() it carries out all the requests (which is the only area I thought would be making calls to the API?)
For a second I thought that the call was to construct the object that is returned by .members() but I tried these changes:
service = build('admin', 'directory_v1').members()
batch.add(service.insert(groupKey=groupkey, body=add_body), callback=batch_callback)
and got the same result still. The reason this is an issue is because it is doubling the number of requests to the API and I have real quota concerns at the scale this is running at.