Using a conditional while loop until all API requests return a result (python, api)

I am using pandarallel and requests to get results from a website, but the API's responses don't cover my whole request. So I want to write a while loop that collects the unanswered rows in to_doDf, sends requests to the API for them, and deletes rows from to_doDf as I receive responses. The basic logic is: while there are empty rows in the results column (i.e. until its length is zero), keep sending requests to the API until every row has an answer. But I can't work out how to write the code that removes the rows whose API calls have already been answered.
doneDf = pd.DataFrame()
to_doDf = macmapDf_concat[macmapDf_concat['results'].isna()]
while len(to_doDf) != 0:
    doneDf['results'] = to_doDf.parallel_apply(
        lambda x: requests.get(
            f'https://www.macmap.org/api/results/customduties'
            f'?reporter={x["destination"]}&partner={x["source"]}&product={x["hs10"]}'
        ).json(),
        axis=1)
doneDf
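One way to finish the loop described above is to write each batch of answers back into the original frame and then recompute the unanswered rows, rather than deleting from to_doDf by hand. The sketch below is only an illustration, not a drop-in answer: the fetch_row helper is hypothetical, and macmapDf_concat with its destination, source, hs10 and results columns is assumed to exist as in the question.
import requests
from pandarallel import pandarallel

pandarallel.initialize()

def fetch_row(x):
    # One API call per row; return None on failure so the row stays unanswered.
    url = (f'https://www.macmap.org/api/results/customduties'
           f'?reporter={x["destination"]}&partner={x["source"]}&product={x["hs10"]}')
    try:
        return requests.get(url, timeout=30).json()
    except (requests.RequestException, ValueError):
        return None

# Make sure the column can hold JSON objects (dicts).
macmapDf_concat['results'] = macmapDf_concat['results'].astype(object)

to_doDf = macmapDf_concat[macmapDf_concat['results'].isna()]
while len(to_doDf) != 0:
    answers = to_doDf.parallel_apply(fetch_row, axis=1)
    # Fill only the still-empty 'results' cells, aligned by index; rows whose
    # request failed remain NaN and are retried on the next pass. In practice
    # you may also want a retry cap so a permanently failing row cannot loop forever.
    macmapDf_concat['results'] = macmapDf_concat['results'].combine_first(answers)
    to_doDf = macmapDf_concat[macmapDf_concat['results'].isna()]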

Related

Scraping Twitter Data

I tried to scrape Twitter data with the academic Twitter API. The code worked for almost every case, though there are a few cases where it doesn't.
I used the code tweets <- get_all_tweets("#Honda lang: en -is:retweet", "2018-08-06T00:00:00Z", "2018-08-26T00:00:00Z", n = Inf)
but after scraping 4 pages of tweets, the following error occurred:
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, :
Too many errors.
In addition: Warning messages:
1: Recommended to specify a data path in order to mitigate data loss when ingesting large amounts of data.
2: Tweets will not be stored as JSONs or as a .rds file and will only be available in local memory if assigned to an object.
I actually don't get what the problem is, because the code works even for cases with more than 35,000 tweets. Therefore, I don't think the number of tweets is the reason.

TIKTOK API - Requesting more responses from hashtag API class

I am running the following code to retrieve account info by hashtag from the unofficial TikTok Api.
API Repository - https://github.com/davidteather/TikTok-Api
Class Definition - https://dteather.com/TikTok-Api/docs/TikTokApi/tiktok.html#TikTokApi.hashtag
However, it seems the maximum number of responses I can get for any hashtag is about 500.
Is there a way I can request more? Say...10K lines of account info?
from TikTokApi import TikTokApi
import pandas as pd

hashtag = "ugc"
count = 50000

with TikTokApi() as api:
    tag = api.hashtag(name=hashtag)
    print(tag.info())
    lst = []
    for video in tag.videos(count=count):
        lst.append(video.author.as_dict)
    df = pd.DataFrame(lst)
    print(df)
The above code for the hashtag "ugc" produces only 482 results, whereas I know there are significantly more results available on TikTok.
It's rate limiting: TikTok will block you after a certain number of requests.
Add proxy support like this:
import random

proxy_list = open('proxies.txt', 'r').read().splitlines()
proxy = random.choice(proxy_list)
proxies = {'http': f'http://{proxy}', 'https': f'https://{proxy}'}
It could be more precise, but it's one way to rotate them.
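Exactly how the proxy gets plugged into TikTokApi depends on the library version, but the proxies mapping built above is already in the shape the requests library expects, so one way to sanity-check a rotated proxy (httpbin is used here purely as an illustrative test URL) is:
import random
import requests

# Same rotation as above: pick one proxy per attempt from proxies.txt.
proxy_list = open('proxies.txt', 'r').read().splitlines()
proxy = random.choice(proxy_list)
proxies = {'http': f'http://{proxy}', 'https': f'https://{proxy}'}

# The echoed origin IP should be the proxy's address rather than your own.
print(requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10).json())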

Coinbase Pro - Get Account Hold Pagination Requests

With reference to https://docs.pro.coinbase.com/#get-account-history
HTTP REQUEST
GET /accounts/<account-id>/holds
I am struggling to produce Python code that gets the account holds via a paginated API request, and I could not find any example of implementing it.
Could you please advise me on how to proceed with this one?
According to Coinbase Pro documentation, pagination works like so (example with the holds endpoint):
import requests
account_id = ...
url = f'https://api.pro.coinbase.com/accounts/{account_id}/holds'
response = requests.get(url)
assert response.ok
first_page = response.json() # here is the first page
cursor = response.headers['CB-AFTER']
response = requests.get(url, params={'after': cursor})
assert response.ok
second_page = response.json() # then the next one
cursor = response.headers['CB-AFTER']
# and so on, repeat until response.json() is an empty list
You should wrap this properly into a helper function or class, or, even better, use an existing library and save yourself considerable time.
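As a rough sketch of that wrapping (same endpoint and CB-AFTER header as above; the iter_holds name is mine, and the authentication headers a real Coinbase Pro request needs are omitted):
import requests

def iter_holds(account_id, session=None):
    # Yields every hold on the account, following the CB-AFTER cursor page by page.
    session = session or requests.Session()
    url = f'https://api.pro.coinbase.com/accounts/{account_id}/holds'
    params = {}
    while True:
        response = session.get(url, params=params)
        response.raise_for_status()
        page = response.json()
        if not page:                  # an empty list means there are no more pages
            return
        yield from page
        cursor = response.headers.get('CB-AFTER')
        if cursor is None:            # no cursor header: we are on the last page
            return
        params['after'] = cursor

# Usage: all_holds = list(iter_holds(account_id))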

How to use data in csv file row wise to be used per request via newman?

I have a bunch of requests in my Postman collection, for example:
Request 1
Request 2
...
Request N
For each of these requests, I want to pass a client ID which is unique per request. I have created a data file with those client IDs. So the data in the CSV file is as follows:
Client Id
1
2
..
N
My requirement is to use Client ID 1 in Request 1, Client ID 2 in Request 2, and so on, instead of iterating Client ID 1 through the entire collection.
So basically, the data in the CSV file should be used row-wise across all the requests.
Would really appreciate suggestions on how this can be achieved.
I tried using the Collection Runner, but it doesn't fit my requirement.
Maybe it would be easier not to use a .csv file here, but Postman environment variables instead.
If the number of client IDs matches the number of requests, you can do something like this:
In the pre-request script of the first request, you have to initialize an array of client IDs:
const clientIdArr = [1,2,3,4,5,6,7,8,9,10];
pm.environment.set('clientIdArr', clientIdArr);
Then, in every subsequent request of the collection, we shift the first value off the client ID array:
const currentArr = pm.environment.get('clientIdArr');
const currentValue = currentArr.shift();
pm.environment.set('clientIdArr', currentArr);
pm.environment.set('currentClientId', currentValue);
Then you can use the {{currentClientId}} environment variable in your actual request and execute the Postman collection via the Collection Runner.
For more details on how Array.prototype.shift() works, please refer to its documentation.
If you have a large number of requests in your Postman collection, you might consider keeping those scripts as Postman Global Functions.

Batch Request to Google API making calls before the batch executes in Python?

I am working on sending a large batch request to the Google API through the Admin SDK, which will add members to certain groups based upon the groups on our in-house servers (a realignment script). I am using Python, and to access the Google API I am using the apiclient library. When I create my service and batch objects, the creation of the service object requests a URL.
batch_count = 0
batch = BatchHttpRequest()
service = build('admin', 'directory_v1')
logs
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/discovery/v1/apis/admin/directory_v1/rest
which makes sense as the JSON object returned by that HTTP call is used to build the service object.
Now I want to add multiple requests into the batch object, so I do this:
for email in add_user_list:
    if batch_count != 999:
        add_body = dict()
        add_body[u'email'] = email.lower()
        for n in range(0, 5):
            try:
                batch.add(service.members().insert(groupKey=groupkey, body=add_body),
                          callback=batch_callback)
                batch_count += 1
                break
            except HttpError as error:
                logging_message('Quota exceeded, waiting to retry...')
                time.sleep(2 ** n)
                continue
Every time it iterates through the outermost for loop it logs (group address redacted)
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/admin/directory/v1/groups/GROUPADDRESS/members?alt=json
Isn't the point of the batch not to send out any calls to the API until the batch request has been populated with all the individual requests? Why does it make the call to the API shown above every time? Then, when I call batch.execute(), it carries out all the requests (which is the only place I thought would be making calls to the API).
For a second I thought that the call was to construct the object that is returned by .members() but I tried these changes:
service = build('admin', 'directory_v1').members()
batch.add(service.insert(groupKey=groupkey, body=add_body), callback=batch_callback)
and still got the same result. The reason this is an issue is that it doubles the number of requests to the API, and I have real quota concerns at the scale this is running at.