Scraping Twitter Data - API

I tried to scrape Twitter data with the academic Twitter API. The code works for almost every case, but there are a few cases where it doesn't.
I used the code
tweets <- get_all_tweets("#Honda lang:en -is:retweet",
                         "2018-08-06T00:00:00Z",
                         "2018-08-26T00:00:00Z",
                         n = Inf)
but after scraping 4 pages of tweets, the following error occurred:
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, :
Too many errors.
In addition: Warning messages:
1: Recommended to specify a data path in order to mitigate data loss when ingesting large amounts of data.
2: Tweets will not be stored as JSONs or as a .rds file and will only be available in local memory if assigned to an object.
I actually don't get what the problem is, because the code works even for cases with more than 35,000 tweets. Therefore, I don't think the number of tweets is the reason.
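The second warning actually points at a mitigation: persist each page to disk as it arrives, so a mid-stream failure doesn't lose everything already fetched. As a hedged sketch, here is the same full-archive search made directly against the Twitter v2 API with Python requests rather than academictwitteR (the query, endpoint, and time window come from the question; the bearer token and file names are placeholders):

import json
import os
import time
import requests

# Placeholder: your academic-track bearer token
BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
SEARCH_URL = "https://api.twitter.com/2/tweets/search/all"
params = {
    "query": "#Honda lang:en -is:retweet",
    "start_time": "2018-08-06T00:00:00Z",
    "end_time": "2018-08-26T00:00:00Z",
    "max_results": 500,
}

page = 0
while True:
    r = requests.get(SEARCH_URL, params=params,
                     headers={"Authorization": "Bearer " + BEARER_TOKEN})
    if r.status_code == 429:  # rate limited: back off, then retry this page
        time.sleep(15)
        continue
    r.raise_for_status()
    body = r.json()
    # Write every page immediately so a later failure loses at most one page
    with open("tweets_page_%04d.json" % page, "w") as f:
        json.dump(body, f)
    page += 1
    next_token = body.get("meta", {}).get("next_token")
    if not next_token:
        break
    params["next_token"] = next_token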


How to get user online status through browser (URL needed)

I used the link below:
https://api.roblox.com/users/$UserID/onlinestatus
for example:
https://api.roblox.com/users/543226965/onlinestatus
I have been receiving an error message for some time now. The error message is given below.
{"errors":[{"code":404,"message":"NotFound"}]}
I heard the Roblox API has been changed, but I cannot find the right solution, so I would be grateful for any answer.
Thanks.
It appears that you're trying to access the user's online status. As of right now, you can access this information by querying https://api.roblox.com/users/<user_id>, which returns a JSON object. Looking up IsOnline in the dictionary should return what you're trying to get.
Here's an example I coded in Python:
import requests

# Fetch the user object; the returned JSON includes an IsOnline field
res = requests.get(url='https://api.roblox.com/users/543226965')
data = res.json()
print(data['IsOnline'])
# >>> True/False
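Since the original /onlinestatus route already disappeared once, it may be worth guarding against the endpoint changing again. A minimal sketch under that assumption (the is_online helper is mine, not part of any Roblox SDK):

import requests

def is_online(user_id):
    # Hypothetical helper: query the user endpoint and surface a clear
    # error if the route starts returning 404 again after an API change
    res = requests.get('https://api.roblox.com/users/{}'.format(user_id))
    if res.status_code == 404:
        raise RuntimeError('User not found, or the endpoint moved again')
    res.raise_for_status()
    return res.json()['IsOnline']

print(is_online(543226965))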

amadeus.shopping.flight_dates.get(origin='JTR', destination='SFO', ...) fails

response = amadeus.shopping.flight_dates.get(origin='JTR', destination='SFO', oneWay='true', departureDate='2019-05-01,2019-06-01', nonStop=False)
This returns an error.
*** amadeus.client.errors.ServerError: [500]
This is not an auth error or some other parameter error, since the exact same code with different airport codes works.
response = amadeus.shopping.flight_dates.get(origin='NYC', destination='SFO', oneWay='true', departureDate='2019-05-01,2019-06-01', nonStop=False)
The client is using a production key and has hostname set to production.
Client(client_id=get_api_key(), client_secret=get_api_secret(), hostname='production')
The Flight Cheapest Date Search API is built on top of a pre-computed cache, so it doesn't contain all origins and destinations. The Flight Low-Fare Search API will give you coverage of (almost) all airports in the world.
We will soon update the data coverage of this API to drastically improve the list of origins and destinations.
That's why:
response = amadeus.shopping.flight_dates.get(origin='JTR', destination='SFO', oneWay='true', departureDate='2019-05-01,2019-06-01', nonStop=False)
doesn't return any data.
For:
response = amadeus.shopping.flight_dates.get(origin='NYC', destination='SFO', oneWay='true', departureDate='2019-05-01,2019-06-01', nonStop=False)
it works: in production it returns a list of flight dates.
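A minimal sketch of that fallback, assuming the 2019-era amadeus Python SDK in which Flight Low-Fare Search is exposed as shopping.flight_offers (check the method name against your SDK version; credentials are placeholders):

from amadeus import Client, ResponseError

amadeus = Client(client_id='...', client_secret='...', hostname='production')

try:
    # Cheapest Date Search is served from a pre-computed cache, so small
    # origins such as JTR may simply not be covered
    response = amadeus.shopping.flight_dates.get(
        origin='JTR', destination='SFO', oneWay='true',
        departureDate='2019-05-01,2019-06-01', nonStop=False)
except ResponseError:
    # Fall back to Flight Low-Fare Search, which covers (almost) all airports
    response = amadeus.shopping.flight_offers.get(
        origin='JTR', destination='SFO', departureDate='2019-05-01')

print(response.data)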

How can I know that my YouTube API data is correct?

I am having some trouble understanding something related to the YouTube API.
My code is basically very simple:
import json
import urllib.request

name = input("enter the username: ")
key = "MY API KEY"
url = ("https://www.googleapis.com/youtube/v3/channels?"
       "part=statistics&forUsername=" + name + "&key=" + key)
data = urllib.request.urlopen(url).read()
subs = json.loads(data)["items"][0]["statistics"]["subscriberCount"]
print(name + " has " + "{:,d}".format(int(subs)) + " subscribers!🎉")
It just prints the number of subscribers for a given YouTube username.
The thing is that for some usernames (for example: Vsauce / Veritasium / Unbox Therapy), which have many subscribers, the API URL gives me wrong data:
Vsauce - returns 72 subscribers
Veritasium / Unbox Therapy - returns no number at all
BUT, the channel "Computerphile" gives me exactly the subscriber count they have.
How come a few usernames work and a few do not?
I tested using the try-it functionality available in the YouTube Data API official documentation and in the Google API Explorer, and on both sites the results are closely similar.¹
For example, when the statistics of the YouTube user vsauce are requested via the YouTube API, the value in subscriberCount is 14220819, while his YouTube channel says: 14,220,825.
Here is the example request for the statistics of the YouTube user vsauce (using the try-it).
And here is the demo for requesting the statistics of the YouTube user vsauce (using the Google API Explorer).
I didn't see any differences in the subscriberCount values when requesting the other channels you mentioned in your question.
¹ You need to consider that some channels have more changes in their subscriber counts than others, and such results vary in the API responses as well.
For some reason, if you change forUsername= to id= in the URL, it gives you the correct numbers. (The likely explanation: forUsername matches a channel's legacy YouTube username, which often differs from the channel's display name or doesn't exist at all, whereas id matches the channel ID exactly.)
TED channel:
https://www.googleapis.com/youtube/v3/channels?part=statistics&id=UCAuUUnT6oDeKwE6v1NGQxug&key=YOUR_API_KEY
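To apply the same fix in the script from the question, swap the forUsername parameter for id. A short sketch (the channel ID is TED's, taken from the URL above; the key is a placeholder):

import json
import urllib.request

key = "MY API KEY"  # placeholder
channel_id = "UCAuUUnT6oDeKwE6v1NGQxug"  # TED, from the URL above
url = ("https://www.googleapis.com/youtube/v3/channels?"
       "part=statistics&id=" + channel_id + "&key=" + key)
data = json.loads(urllib.request.urlopen(url).read())
subs = data["items"][0]["statistics"]["subscriberCount"]
print("TED has " + "{:,d}".format(int(subs)) + " subscribers!")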

Oauth2client error when trying the Apache Beam example code

I am trying to run the Apache Beam Python examples (for example, https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/user_score.py). When I run the script I get:
(py27) XXX-MBP:my_path$ python user_score.py --output /my_path/user_score_output
No handlers could be found for logger "oauth2client.contrib.multistore_file"
INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
/my_path_2/miniconda3/envs/py27/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py:121: DeprecationWarning: object() takes no parameters
super(GcsIO, cls).__new__(cls, storage_client))
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.453105211258 seconds
/my_path_2/miniconda3/envs/py27/lib/python2.7/site-packages/apache_beam/coders/typecoders.py:135: UserWarning: Using fallback coder for typehint: Any.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
/my_path_2/miniconda3/envs/py27/lib/python2.7/site-packages/apache_beam/coders/typecoders.py:135: UserWarning: Using fallback coder for typehint: <type 'NoneType'>.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
INFO:root:Running pipeline with DirectRunner.
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Finished the size estimation of the input at 2 files. Estimation took 0.477814912796 seconds
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
The problem seems to be with oauth2client and getting an access token. I googled a bit, and people are talking about setting consent='prompt', but that is in the domain of asking users for consent; here I am just downloading a dataset. Is it access to my Google Cloud service that might be struggling? Other programs of mine can write to Google Cloud Storage without problems. I am looking at https://developers.google.com/identity/protocols/OAuth2, but I don't really understand what I am supposed to be doing ...
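For what it's worth, the first line of that output ("No handlers could be found for logger ...") is not an auth failure in itself: it is Python 2's standard complaint when no logging handler is configured, so oauth2client's messages have nowhere to go. A minimal sketch that configures logging before running the pipeline (this only makes the real log records visible; it does not fix any underlying credential problem):

import logging

# Give the root logger a handler so oauth2client/Beam records are printed
# instead of triggering the "No handlers could be found" warning (Python 2)
logging.basicConfig(level=logging.INFO)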

Batch Request to Google API making calls before the batch executes in Python?

I am working on sending a large batch request to the Google API through the Admin SDK, which will add members to certain groups based upon the groups in our in-house servers (a realignment script). I am using Python, and to access the Google API I am using the apiclient library. When I create my service and batch objects, the creation of the service object requests a URL.
batch_count = 0
batch = BatchHttpRequest()
service = build('admin', 'directory_v1')
This logs:
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/discovery/v1/apis/admin/directory_v1/rest
which makes sense as the JSON object returned by that HTTP call is used to build the service object.
Now I want to add multiple requests into the batch object, so I do this:
for email in add_user_list:
    if batch_count != 999:
        add_body = dict()
        add_body[u'email'] = email.lower()
        for n in range(0, 5):
            try:
                batch.add(service.members().insert(groupKey=groupkey, body=add_body),
                          callback=batch_callback)
                batch_count += 1
                break
            except HttpError as error:
                logging_message('Quota exceeded, waiting to retry...')
                time.sleep(2 ** n)  # exponential backoff
                continue
Every time it iterates through the outermost for loop, it logs (group address redacted):
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/admin/directory/v1/groups/GROUPADDRESS/members?alt=json
Isn't the point of the batch to not send any calls to the API until the batch request has been populated with all the individual requests? Why does it make the API call shown above every time? Then, when I call batch.execute(), it carries out all the requests (which is the only place I thought calls to the API would be made).
For a second I thought the call was made to construct the object returned by .members(), but I tried these changes:
service = build('admin', 'directory_v1').members()
batch.add(service.insert(groupKey=groupkey, body=add_body), callback=batch_callback)
and still got the same result. The reason this is an issue is that it doubles the number of requests to the API, and I have real quota concerns at the scale this is running at.
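One hedged way to check whether those log lines correspond to real HTTP traffic: in the apiclient library, the "URL being requested" message appears to be emitted while the HttpRequest object is being built, not when it is sent, so counting calls on the underlying Http object should show nothing going over the wire until batch.execute(). A minimal diagnostic sketch (CountingHttp and the example addresses are mine; credential wiring is omitted):

import httplib2
from apiclient.discovery import build
from apiclient.http import BatchHttpRequest

class CountingHttp(httplib2.Http):
    # Hypothetical wrapper: counts actual HTTP round trips
    calls = 0
    def request(self, *args, **kwargs):
        CountingHttp.calls += 1
        return super(CountingHttp, self).request(*args, **kwargs)

http = CountingHttp()  # in practice, wrap an authorized Http object
service = build('admin', 'directory_v1', http=http)
before = CountingHttp.calls  # discovery itself costs one request

batch = BatchHttpRequest()
batch.add(service.members().insert(groupKey='group@example.com',
                                   body={'email': 'user@example.com'}))
print(CountingHttp.calls - before)  # expected 0: nothing sent yet

batch.execute(http=http)
print(CountingHttp.calls - before)  # now the single batched call appears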