Batch Request to Google API making calls before the batch executes in Python?

I am working on sending a large batch request to the Google API through the Admin SDK, which will add members to certain groups based upon the groups on our in-house servers (a realignment script). I am using Python, and to access the Google API I am using the [apiclient library][1]. When I create my service and batch objects, the creation of the service object requests a URL.
batch_count = 0
batch = BatchHttpRequest()
service = build('admin', 'directory_v1')
This logs:
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/discovery/v1/apis/admin/directory_v1/rest
which makes sense as the JSON object returned by that HTTP call is used to build the service object.
Now I want to add multiple requests into the batch object, so I do this:
for email in add_user_list:
    if batch_count != 999:
        add_body = dict()
        add_body[u'email'] = email.lower()
        for n in range(0, 5):
            try:
                batch.add(service.members().insert(groupKey=groupkey, body=add_body),
                          callback=batch_callback)
                batch_count += 1
                break
            except HttpError, error:
                logging_message('Quota exceeded, waiting to retry...')
                time.sleep(2 ** n)
                continue
Every time it iterates through the outermost for loop it logs (group address redacted)
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/admin/directory/v1/groups/GROUPADDRESS/members?alt=json
Isn't the point of the batch to not send any calls to the API until the batch request has been populated with all the individual requests? Why does it make the call to the API shown above every time? Then when I call batch.execute() it carries out all the requests, which I thought would be the only point at which calls to the API are made.
For a second I thought that the call was to construct the object that is returned by .members() but I tried these changes:
service = build('admin', 'directory_v1').members()
batch.add(service.insert(groupKey=groupkey, body=add_body), callback=batch_callback)
and still got the same result. The reason this is an issue is that it doubles the number of requests to the API, and I have real quota concerns at the scale this is running at.
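For reference, a minimal, self-contained sketch of the flow described above might look like this (authorized_http and the callback body are assumptions; build, BatchHttpRequest, and the insert/add/execute calls come from the question and the apiclient library):
from apiclient.discovery import build
from apiclient.http import BatchHttpRequest

def batch_callback(request_id, response, exception):
    # Invoked once per sub-request when batch.execute() runs.
    if exception is not None:
        print('request %s failed: %s' % (request_id, exception))

# authorized_http is assumed to be an OAuth-authorized httplib2.Http object
# created elsewhere; add_user_list and groupkey come from the question.
service = build('admin', 'directory_v1', http=authorized_http)
batch = BatchHttpRequest()

for email in add_user_list:
    request = service.members().insert(groupKey=groupkey,
                                       body={'email': email.lower()})
    batch.add(request, callback=batch_callback)

# This is the call that is expected to send all of the member insertions
# in a single batched HTTP request.
batch.execute(http=authorized_http)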

Related

Using a conditional while loop until all API requests get a result in Python

I am using pandarallel and requests to get results from a website, but the API's responses don't cover my whole request. So I want to write a while loop that collects the unanswered rows in to_doDf, sends requests to the API for to_doDf, and deletes rows from to_doDf as I get responses from the API. The basic logic is: while there are empty rows under the results column (i.e. until its length is zero), keep sending requests to the API until each row receives an answer. But I can't write the code that deletes the rows whose API calls have already been answered.
doneDf = pd.DataFrame()
to_doDf = macmapDf_concat[macmapDf_concat['results'].isna()]
while len(to_doDf) != 0:
    doneDf['results'] = to_doDf.parallel_apply(
        lambda x: requests.get(
            f'https://www.macmap.org/api/results/customduties'
            f'?reporter={x["destination"]}&partner={x["source"]}&product={x["hs10"]}'
        ).json(),
        axis=1)
doneDf
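One way the loop could drop answered rows is sketched below (not a definitive implementation: macmapDf_concat, the column names, and the URL come from the question; fetch_result, the plain .apply call, and the error handling are assumptions):
import pandas as pd
import requests

def fetch_result(row):
    # Hypothetical helper: one API call per row, returning None on failure
    # so that the row stays in the "to do" set for the next pass.
    url = (f'https://www.macmap.org/api/results/customduties'
           f'?reporter={row["destination"]}&partner={row["source"]}&product={row["hs10"]}')
    try:
        return requests.get(url, timeout=30).json()
    except (requests.RequestException, ValueError):
        return None

def fill_missing_results(df: pd.DataFrame) -> pd.DataFrame:
    """Repeatedly query the API until every row of `df` has a result."""
    to_doDf = df[df['results'].isna()].copy()
    while len(to_doDf) != 0:
        # Plain .apply for clarity; pandarallel's parallel_apply could be swapped in.
        to_doDf['results'] = to_doDf.apply(fetch_result, axis=1)

        # Copy the answered rows back into the main frame...
        answered = to_doDf[to_doDf['results'].notna()]
        for idx, value in answered['results'].items():
            df.at[idx, 'results'] = value

        # ...and keep only the still-unanswered rows for the next pass.
        to_doDf = to_doDf[to_doDf['results'].isna()]
    return df
In the question's setup this would be called as fill_missing_results(macmapDf_concat).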

API of Polarion ALM occasionally does not authorize any request

I have written some Python code that logs in and reads some data from a Polarion ALM server via its API (more information about the Polarion API: https://almdemo.polarion.com/polarion/sdk/index.html). In my code I use the zeep Python package to handle SOAP.
My algorithm is simple:
1) Log in via the logIn web service (https://almdemo.polarion.com/polarion/sdk/doc/javadoc/com/polarion/alm/ws/client/session/SessionWebService.html#logIn-java.lang.String-java.lang.String-)
2) Add the current session to the header, so that the current session remains alive.
3) Try to read some data, for example via the getRootProjectGroup web service (https://almdemo.polarion.com/polarion/sdk/doc/javadoc/com/polarion/alm/ws/client/projects/ProjectWebService.html#getRootProjectGroup--).
4) Regardless of what happens, close the current session via the endSession web service (https://almdemo.polarion.com/polarion/sdk/doc/javadoc/com/polarion/alm/ws/client/session/SessionWebService.html#endSession--).
What I observed:
Occasionally, at point 3, I receive a response with an authorization error (snippet of the response):
<soapenv:Fault>
 <faultcode>soapenv:Server.generalException</faultcode>
 <faultstring>Not authorized.</faultstring>
 <detail>
  <ns1:stackTrace xmlns:ns1="http://xml.apache.org/axis/">Not authorized.
   at com.polarion.alm.ws.providers.DoAsUserWrapper.invoke(DoAsUserWrapper.java:37)
   at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
   ..
or everything is good and I receive:
{
    'groupURIs': {
        'SubterraURI': [
            'subterra:data-service:objects:/default/${ProjectGroup}Group'
        ]
    },
    'location': None,
    'name': 'ROOT_CTX_NAME',
    'parentURI': None,
    'projectIDs': None,
    'uri': 'subterra:data-service:objects:${ProjectGroup}Group',
    'unresolvable': False
}
What surprises me the most:
- I always use the same credentials (username and password)
- the session ID in the request (point 3) is the same as in the server response during log in (point 1), so the session should remain alive
- if I put my code in a loop (for example 1000 executions), the result for all attempts is always the same (1000 successes or 1000 failures), even if I add a wait (e.g. 1 s) between the attempts
I would like to know why the server rejects some of the requests. Is it some kind of Polarion server issue? How could I work around this so that I can connect to the server and read some data even if it rejects my first request?
It appears that this is an issue with the SOAP client (and a relatively popular one at that). To fix it, I turned off TLS verification. More details in:
https://python-zeep.readthedocs.io/en/master/transport.html
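Per the zeep transport documentation linked above, turning off TLS verification amounts to passing a requests Session with verify=False into the Transport. A sketch (the WSDL URL and credentials are placeholders, not values from the question):
import urllib3
from requests import Session
from zeep import Client
from zeep.transports import Transport

# Silence the "insecure request" warnings that come with verify=False.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = Session()
session.verify = False  # turn off TLS certificate verification

transport = Transport(session=session)

# Placeholder WSDL URL; the logIn / getRootProjectGroup / endSession calls
# described above would then go through clients built with this transport.
session_client = Client(
    'https://almdemo.polarion.com/polarion/ws/services/SessionWebService?wsdl',
    transport=transport,
)
session_client.service.logIn('my_user', 'my_password')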

Akka HTTP Source Streaming vs regular request handling

What is the advantage of using Source Streaming vs the regular way of handling requests? My understanding is that in both cases:
- the TCP connection will be reused
- back-pressure will be applied between the client and the server
The only advantage of Source Streaming I can see is if there is a very large response and the client prefers to consume it in smaller chunks.
My use case is that I have a very long list of users (millions), and I need to call a service that performs some filtering on the users, and returns a subset.
Currently, on the server side I expose a batch API, and on the client, I just split the users into chunks of 1000, and make X batch calls in parallel using Akka HTTP Host API.
I am considering switching to HTTP streaming, but cannot quite figure out what the value would be.
You are missing one other huge benefit: memory efficiency. By having a streamed pipeline, client/server/client, all parties safely process data without running the risk of blowing up the memory allocation. This is particularly useful on the server side, where you always have to assume the clients may do something malicious...
Client Request Creation
Suppose the ultimate source of your millions of users is a file. You can create a stream source from this file:
val userFilePath : java.nio.file.Path = ???
val userFileSource = akka.stream.scaladsl.FileIO.fromPath(userFilePath)
This source can then be used to create your http request, which will stream the users to the service:
import akka.http.scaladsl.model.HttpEntity.{Chunked, ChunkStreamPart}
import akka.http.scaladsl.model.{RequestEntity, ContentTypes, HttpRequest}
val httpRequest : HttpRequest =
  HttpRequest(uri = "http://filterService.io",
              entity = Chunked.fromData(ContentTypes.`text/plain(UTF-8)`, userFileSource))
This request will now stream the users to the service without consuming the entire file into memory. Only chunks of data will be buffered at a time, therefore, you can send a request with potentially an infinite number of users and your client will be fine.
Server Request Processing
Similarly, your server can be designed to accept a request with an entity that can potentially be of infinite length.
Your question says the service will filter the users; assuming we have a filtering function:
val isValidUser : (String) => Boolean = ???
This can be used to filter the incoming request entity and create a response entity which will feed the response:
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.model.{ContentTypes, HttpResponse}
import akka.http.scaladsl.model.HttpEntity.Chunked
import akka.stream.scaladsl.Source
import akka.util.ByteString

val route = extractDataBytes { userSource =>
  val responseSource : Source[ByteString, _] =
    userSource
      .map(_.utf8String)
      .filter(isValidUser)
      .map(ByteString.apply)

  complete(HttpResponse(entity = Chunked.fromData(ContentTypes.`text/plain(UTF-8)`,
                                                  responseSource)))
}
Client Response Processing
The client can similarly process the filtered users without reading them all into memory. We can, for example, dispatch the request and send all of the valid users to the console:
import akka.http.scaladsl.Http

Http()
  .singleRequest(httpRequest)
  .map { response =>
    response
      .entity
      .dataBytes
      .map(_.utf8String)
      .runForeach(System.out.println)
  }

Mule Salesforce Create

With the Mule Salesforce Connector using sfdc:create, the documentation says we can send up to 200 records at a time (a single round-trip call). If that's the case, what benefit do we get from using a Mule Batch flow with Batch Commit and Salesforce (sfdc:create) within the Batch Commit?
An example create is below:
<sfdc:create type="Account">
    <sfdc:objects>
        <sfdc:object>
            <Name>MuleSoft</Name>
            <BillingStreet>I live here </BillingStreet>
            <BillingCity>My City</BillingCity>
            <BillingState>MA</BillingState>
            <BillingPostalCode>32423</BillingPostalCode>
            <BillingCountry>US</BillingCountry>
        </sfdc:object>
        .......200 such objects
    </sfdc:objects>
</sfdc:create>
Please keep in mind that in Salesforce, the SOAP API limit for a client application is up to 200 records in a single create() call. If a create request exceeds 200 objects, the entire operation fails.
Please refer to the Salesforce reference.
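To make the limit concrete, here is a generic chunking sketch in Python (create_accounts is a hypothetical stand-in for whatever actually issues the create() call, for example the Mule flow above; it is not part of the Salesforce connector):
def chunked(records, size=200):
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

def create_accounts(chunk):
    # Hypothetical placeholder for the actual create() call.
    print(f'creating {len(chunk)} Account records')

accounts = [{'Name': f'Account {i}'} for i in range(950)]

# Each call stays within the 200-record limit of a single create().
for chunk in chunked(accounts):
    create_accounts(chunk)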

HTTP error 500 when requesting google big query API using service account

I have been using BigQuery to generate reports through a web service for a year now; however, in the past month or so I have noticed HTTP 500 errors in response to most of my query requests, even though no changes have been made to the web service. In my current setup I make 5 simultaneous queries, and often 4 out of the 5 fail with a 500 error. At times all 5 queries are returned, but recently this rarely happens, rendering my application almost unusable.
I use server-to-server authentication with my service account token, and my BigQuery client app is closely modeled on the example given here:
https://developers.google.com/bigquery/articles/dashboard#class
Here is the full error message -
HttpError: https://www.googleapis.com/bigquery/v2/projects/1021946877460/queries?alt=json returned "Unexpected. Please try again.">
Snippet of my bigquery client -
def generateToken():
    """
    generates OAuth2.0 token/credentials for login to google big query
    """
    credentials = SignedJwtAssertionCredentials(
        SERVICE_ACCOUNT_EMAIL,
        KEY,
        "https://www.googleapis.com/auth/bigquery")
    return credentials


class BigQueryClient(object):

    def authenticate(self, credentials):
        http = httplib2.Http(proxy_info=httplib2.ProxyInfo(
            socks.PROXY_TYPE_HTTP,
            PROXY_IP,
            PROXY_PORT))
        http = credentials.authorize(http)
        return http

    def __init__(self, credentials, project):
        http = self.authenticate(credentials)
        self.service = build('bigquery', 'v2', http=http)
Please let me know if I am doing something incorrectly here, or if anything has changed on the BigQuery backend, such as limits on the number of query requests allowed over a certain period of time.
Thanks.
500s are always BigQuery bugs. I believe I've tracked down one of your errors in the BigQuery server logs, and am investigating.
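While that is being investigated, transient 500s like "Unexpected. Please try again." can be retried client-side. A sketch with exponential backoff, modeled on the client above (the project ID, query body, and retry count are assumptions, not values from the question):
import time

from apiclient.errors import HttpError

def query_with_retries(service, project_id, query, max_attempts=5):
    """Run a synchronous query, retrying transient 5xx errors with backoff."""
    body = {'query': query}
    for attempt in range(max_attempts):
        try:
            return service.jobs().query(projectId=project_id, body=body).execute()
        except HttpError as error:
            # Retry only server-side (5xx) failures; re-raise everything else.
            if error.resp.status < 500 or attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)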