I am new to Twitter development. I am trying to download the tweets of major news agencies. I followed the guidelines at http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively to download the tweets. I know that the Twitter API limits the number of requests (180 requests per 15 minutes) and that each request can fetch at most 100 tweets, so I expected the following code to get 18K tweets the first time I ran it. However, I only get around 3,000 tweets per news agency, for example 3234 for nytimes and 3207 for cnn.
I'll be thankful if you can take a look at my code and let me know what the problem is.
import tweepy
import jsonpickle

def get_tweets(api, username, sinceId):
    max_id = -1L
    maxTweets = 1000000  # Some arbitrary large number
    tweetsPerReq = 100   # the max the API permits
    tweetCount = 0
    print "writing to {0}_tweets.txt".format(username)
    with open("{0}_tweets.txt".format(username), 'w') as f:
        while tweetCount < maxTweets:
            try:
                if (max_id <= 0):
                    if (not sinceId):
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq)
                    else:
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, since_id=sinceId)
                else:
                    if (not sinceId):
                        new_tweets = api.user_timeline(screen_name=username, count=tweetsPerReq, max_id=str(max_id - 1))
                    else:
                        new_tweets = api.search(screen_name=username, count=tweetsPerReq, max_id=str(max_id - 1), since_id=sinceId)
                if not new_tweets:
                    print "no new tweet"
                    break
                # write tweet information: username, tweet id, date/time, text
                for tweet in new_tweets:
                    f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')
                tweetCount += len(new_tweets)
                print("Downloaded {0} tweets".format(tweetCount))
                max_id = new_tweets[-1].id
            except tweepy.TweepError as e:
                # Just exit if any error
                print("some error : " + str(e))
                break
    print("Downloaded {0} tweets, Saved to {1}_tweets.txt".format(tweetCount, username))
Those are the limitations imposed by the API.
If you read the documentation, you will see that it says
This method can only return up to 3,200 of a user’s most recent Tweets.
So the answer is: normal API users cannot access tweets beyond that limit.
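For reference, here is a minimal sketch (not from the question; it assumes an authenticated tweepy API object named api) that pages through a user timeline with tweepy.Cursor. It will still stop near the ~3,200-tweet ceiling, no matter how many requests the rate limit allows:

import tweepy

def fetch_timeline(api, username):
    # Cursor handles max_id paging internally; the API still caps the
    # timeline at roughly 3,200 of the user's most recent tweets.
    tweets = []
    for status in tweepy.Cursor(api.user_timeline, screen_name=username, count=200).items():
        tweets.append(status._json)
    return tweets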
I am currently trying to implement a login to Shopify over the Storefront API via Multipass.
However, it isn't clear to me from the documentation on that page how the "created_at" field is used, since it only states that this field should be filled with the current timestamp.
But what if the same user logs in a second time via Multipass: should it be filled with the timestamp of the second login?
Or should the original Multipass token be stored somewhere and reused at a second login, instead of generating a new one?
Yes, you need to always set it to the current time. I guess it stands for "token created at".
This is the code I use in Python:
import datetime
import json
from base64 import urlsafe_b64encode

from Crypto.Cipher import AES
from Crypto.Hash import HMAC, SHA256
from Crypto.Random import get_random_bytes

class Multipass:
    def __init__(self, secret):
        key = SHA256.new(secret.encode('utf-8')).digest()
        self.encryptionKey = key[0:16]
        self.signatureKey = key[16:32]

    def generate_token(self, customer_data_hash):
        customer_data_hash['created_at'] = datetime.datetime.utcnow().isoformat()
        cipher_text = self.encrypt(json.dumps(customer_data_hash))
        return urlsafe_b64encode(cipher_text + self.sign(cipher_text))

    def generate_url(self, customer_data_hash, url):
        token = self.generate_token(customer_data_hash).decode('utf-8')
        return '{0}/account/login/multipass/{1}'.format(url, token)

    def encrypt(self, plain_text):
        plain_text = self.pad(plain_text)
        iv = get_random_bytes(AES.block_size)
        cipher = AES.new(self.encryptionKey, AES.MODE_CBC, iv)
        return iv + cipher.encrypt(plain_text.encode('utf-8'))

    def sign(self, secret):
        return HMAC.new(self.signatureKey, secret, SHA256).digest()

    @staticmethod
    def pad(s):
        return s + (AES.block_size - len(s) % AES.block_size) * chr(AES.block_size - len(s) % AES.block_size)
And so
...
customer_object = {
    **user,  # customer data
    "verified_email": True
}
multipass = Multipass(multipass_secret)
return multipass.generate_url(customer_object, environment["url"])
How can someone log in a second time? If they are already logged in, they essentially cannot re-login without logging out. If they logged out, the Multipass would assign a new timestamp. When would a user log in a second time without being issued a brand-new login? How would they do this?
I want to get a user's nickname and after that the user's screenshot. Then all the data will be sent to a Google Sheet. But I have a problem: when multiple users (at least 2) are using the bot at the same time, their data get mixed up. For example, the first user's nickname becomes the second user's nickname, or the first user's id becomes the second user's id. Can I make a unique user session to store data in a unique user class, so that each user gets their own user object? Here is the code:
import requests
from io import BytesIO
from googleapiclient.http import MediaIoBaseUpload

# bot, token, folder_id and serviceDrive are defined elsewhere

@bot.message_handler(commands=['start'])
def send_hello(message):
    bot.send_message(message.chat.id, "Hi! Let`s start verification.")
    msg = bot.send_message(message.chat.id, "Enter your nickname:")
    bot.register_next_step_handler(msg, process_nickname)

def process_nickname(message):
    user.name = message.text
    user.id = message.from_user.id
    msg = bot.send_message(message.chat.id, 'Super! Now send screenshot:')
    bot.register_next_step_handler(msg, process_screenshot)

def process_screenshot(message):
    fileID = message.photo[-1].file_id
    file = bot.get_file(fileID)
    file_path = file.file_path
    metadata = {
        'name': user.name,
        'parents': [folder_id]
    }
    url = (f"https://api.telegram.org/file/bot{token}/{file_path}")
    response = requests.get(url)
    image_data = BytesIO(response.content)
    media = MediaIoBaseUpload(image_data, 'image/jpeg')
    serviceDrive.files().create(body=metadata, media_body=media, fields='id').execute()
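A minimal sketch of the per-user storage the question describes (the user_data dict and User class are hypothetical names, not part of the original code): keying the state by chat id keeps concurrent users from overwriting each other.

user_data = {}  # hypothetical per-chat storage, keyed by chat id

class User:
    def __init__(self):
        self.name = None
        self.id = None

def process_nickname(message):
    # each chat gets its own User instance, so two users running the
    # bot at the same time no longer share one global object
    current = user_data.setdefault(message.chat.id, User())
    current.name = message.text
    current.id = message.from_user.id
    msg = bot.send_message(message.chat.id, 'Super! Now send screenshot:')
    bot.register_next_step_handler(msg, process_screenshot)
    # process_screenshot would look up user_data[message.chat.id] the same way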
In Shopify ORDERS API, I use
/admin/api/2021-01/orders/count.json
in order to get the orders count. Then I wanted to get all the orders, and by following the REST API documentation, I used two endpoints to do this:
/admin/api/2021-01/orders.json?status=any
/admin/api/2021-01/orders.json?limit=250&status=any; rel=next
First I request the orders using the first endpoint, where I get up to 50 orders/items in a list.
Then, using the count as the limit (let's say I have 550 orders according to the response of orders/count.json),
I do:
accumulated = []
iter = 0
while True:
    if len(accumulated) > count:
        break
    if iter != 1:
        url = # use first url
    else:
        url = # use second url that has next
    items = # make a request here for that url and save each order item
    accumulated += items  # this saves each list to the accumulated list so we know that we hit the count
But for some reason I'm only getting a fraction of the count. Let's say out of a count of 550, I only get 350 that are not duplicates of each other. I'm thinking that maybe the second URL only requests the second page and doesn't proceed to the third page, hence I was getting:
first iteration = first page
second iteration = second page
third iteration = second page
All of those go into the accumulated list, and the loop stops because of the condition that it ends once accumulated exceeds count.
How can I make it so that when I request the ORDERS endpoint in Shopify, I move through the next pages properly?
I tried following Shopify's tutorial on making paginated requests, but it's unclear to me how to use it. There's this page_info variable that is hard for me to understand: where do I find it and how do I use it?
Hi! In the Shopify REST API you can get at most 250 orders per API call, and if there are more orders you get a LINK in the response headers which contains the URL for your next page request, like this:
Here you can see I have the LINK variable in my response headers; you just need to get this LINK and check for the rel='next' flag.
But keep in mind that when you hit the new URL and still have more orders to fetch, the header sends a LINK with two URLs: one for the previous page and one for the next.
Run this snippet to get the LINK from the headers:
var flag = false;
var next_url = order.headers.link;
if (next_url) {
    flag = true;
    next_url = next_url.replace("<", "");
    next_url = next_url.replace(">", "");
    var next_url_array = next_url.split('; ');
    var link_counter_start = next_url_array[0].indexOf("page_info=") + 10;
    var link_counter_length = (next_url_array[0].length);
    var next_cursor = "";
    var link_counter;
    for (link_counter = link_counter_start; link_counter < link_counter_length; link_counter++) {
        next_cursor += (next_url_array[0][link_counter]);
    }
}
That works for the very first API call. But if you have more than two pages, use the following code to separate the next link from the previous one and check the next flag:
next_url = order.headers.link;
var next_url_array,
    link_counter_start, link_counter_length,
    link_counter;
if (next_url.includes(',')) {
    next_url = next_url.split(',');
    next_url = next_url[1];
}
next_url = next_url.replace("<", "");
next_url = next_url.replace(">", "");
next_url_array = next_url.split('; ');
link_counter_start = next_url_array[0].indexOf("page_info=") + 10;
link_counter_length = (next_url_array[0].length);
next_cursor = "";
for (link_counter = link_counter_start; link_counter < link_counter_length; link_counter++) {
    next_cursor += (next_url_array[0][link_counter]);
}
if (next_url_array[1] != 'rel="next"') {
    flag = false;
}
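If it helps, here is a minimal Python sketch of the same cursor-based pagination. It assumes token-based auth; the shop domain and token are placeholders, and it relies on the requests library's built-in parsing of the Link header (resp.links) instead of splitting the string by hand:

import requests

# placeholders: fill in your shop domain and Admin API access token
shop = "https://YOUR_SHOP.myshopify.com"
headers = {"X-Shopify-Access-Token": "YOUR_TOKEN"}

def fetch_all_orders():
    orders = []
    url = shop + "/admin/api/2021-01/orders.json?limit=250&status=any"
    while url:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        orders.extend(resp.json()["orders"])
        # requests parses the Link header into resp.links;
        # follow rel="next" until it disappears (last page).
        next_link = resp.links.get("next")
        url = next_link["url"] if next_link else None
    return orders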
I'm sending InlineQueryResultArticle to clients and I'm wondering how to get the chosen result and its data (like result_id, ...).
Here is the code to send the results:
import telegram
from telegram import InlineQueryResultArticle, InputTextMessageContent
from telegram.ext import Updater, InlineQueryHandler

token = 'Bot token'
bot = telegram.Bot(token)
updater = Updater(token)
dispatcher = updater.dispatcher

def get_inline_results(bot, update):
    query = update.inline_query.query
    results = list()
    results.append(InlineQueryResultArticle(id='1000',
                                            title="Book 1",
                                            description='Description of this book, author ...',
                                            thumb_url='https://fakeimg.pl/100/?text=book%201',
                                            input_message_content=InputTextMessageContent(
                                                'chosen book:')))
    results.append(InlineQueryResultArticle(id='1001',
                                            title="Book 2",
                                            description='Description of the book, author...',
                                            thumb_url='https://fakeimg.pl/300/?text=book%202',
                                            input_message_content=InputTextMessageContent(
                                                'chosen book:')))
    update.inline_query.answer(results)

inline_query_handler = InlineQueryHandler(get_inline_results)
dispatcher.add_handler(inline_query_handler)
I'm looking for a method like on_inline_chosen(data) to get the id of the chosen item (1000 or 1001 for the snippet above) and then send the appropriate response to the user.
You should set /setinlinefeedback in @BotFather, then you will get this update.
OK, I got my answer from here.
Handling the user's chosen result:
from telegram.ext import ChosenInlineResultHandler

def on_result_chosen(bot, update):
    print(update.to_dict())
    result = update.chosen_inline_result
    result_id = result.result_id
    query = result.query
    user = result.from_user.id
    print(result_id)
    print(user)
    print(query)
    print(result.inline_message_id)
    bot.send_message(user, text='fetching book data with id: ' + result_id)

result_chosen_handler = ChosenInlineResultHandler(on_result_chosen)
dispatcher.add_handler(result_chosen_handler)
I've hit a wall with the way I would like to use the YouTube data API. I have a user account that is trying to act as an 'aggregator', by adding videos from various other channels into one of about 15 playlists, based on categories. My problem is, I can't get all these videos into a single feed, because they belong to various YouTube users. I'd like to get them all into a single list, so I could sort that master list by most recent and most popular, to populate different views in my web app.
How can I get a list of all the videos that a user has added to any of their playlists?
YouTube must track this kind of stuff, because if you go into the "Feed" section of any user's page at http://www.youtube.com/ it gives you a stream of activity that includes videos added to playlists.
To be clear, I don't want to fetch a list of videos uploaded by just this user, so http://gdata.../<user>/uploads won't work. Since there are a number of different playlists, http://gdata.../<user>/playlists won't work either, because I would need to make about 15 requests each time I wanted to check for new videos.
There seems to be no way to retrieve a list of all videos that a user has added to all of their playlists. Can somebody think of a way to do this that I might have overlooked?
Something like this works for retrieving YouTube links from a playlist. It still needs improvement.
import urllib2
import xml.etree.ElementTree as et
import re
import os

more = 1
id_playlist = raw_input("Enter youtube playlist id: ")
number_of_iteration = input("How much video links: ")
number = number_of_iteration / 50
number2 = number_of_iteration % 50
if (number2 != 0):
    number3 = number + 1
else:
    number3 = number
start_index = 1

while more <= number3:
    # reading youtube playlist page
    if (more != 1):
        start_index += 50
    str_start_index = str(start_index)
    req = urllib2.Request('http://gdata.youtube.com/feeds/api/playlists/' + id_playlist + '?v=2&&start-index=' + str_start_index + '&max-results=50')
    response = urllib2.urlopen(req)
    the_page = response.read()

    # writing page in .xml
    dat = open("web_content.xml", "w")
    dat.write(the_page)
    dat.close()

    # searching page for links
    tree = et.parse('web_content.xml')
    all_links = tree.findall('*/{http://www.w3.org/2005/Atom}link[@rel="alternate"]')

    # writing links + attributes to .txt
    if (more == 1):
        till_links = 50
    else:
        till_links = start_index + 50
    str_till_links = str(till_links)
    dat2 = open("links-" + str_start_index + "to" + str_till_links + ".txt", "w")
    for links in all_links:
        str1 = (str(links.attrib) + "\n")
        dat2.write(str1)
    dat2.close()

    # getting only links
    f = open("links-" + str_start_index + "to" + str_till_links + ".txt", "r")
    link_all = f.read()
    new_string = link_all.replace("{'href': '", "")
    new_string2 = new_string.replace("', 'type': 'text/html', 'rel': 'alternate'}", "")
    f.close()

    # writing links to .txt
    f = open("links-" + str_start_index + "to" + str_till_links + ".txt", "w")
    f.write(new_string2)
    f.close()
    more += 1

os.remove('web_content.xml')
print "Finished!"