Twitter REST API: tweet extraction

I wrote the following code to extract tweets that carry specific hashtags.
import json
import oauth2
import time
import io

Consumer_Key = ""
Consumer_Secret = ""
access_token = ""
access_token_secret = ""

def oauth_req(url, key, secret, http_method="GET", post_body="", http_headers=None):
    # Use the consumer credentials defined above.
    consumer = oauth2.Consumer(key=Consumer_Key, secret=Consumer_Secret)
    token = oauth2.Token(key=key, secret=secret)
    client = oauth2.Client(consumer, token)
    # client.request() returns a (headers, body) tuple.
    content = client.request(url, method=http_method, body=post_body, headers=http_headers)
    return content

tweet_url = 'https://twitter.com/search.json?q=%23IPv4%20OR%20%23ISP%20OR%20%23WiFi%20OR%20%23Modem%20OR%20%23Internet%20OR%20%23IPV6'
jsn = oauth_req(tweet_url, access_token, access_token_secret)
print jsn
My hashtags are IPv4, IPv6, ISP, Internet, and Modem. If a tweet contains at least one of these hashtags, I want that tweet written to my file.
But unfortunately, the request is returning HTML instead.
The output is as follows:
({'content-length': '338352', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff',........................
.............................-post-iframe" name="tweet-post-iframe"></iframe>\n <iframe aria-hidden="true" class="dm-post-iframe" name="dm-post-iframe"></iframe>\n\n</div>\n\n </body>\n</html>\n')
Any lead in this regard would be appreciated.

Take a look at your tweet URL:
tweet_url = 'https://twitter.com/search.json?q=%23IPv4%20OR%20%23ISP%20OR%20%23WiFi%20OR%20%23Modem%20OR%20%23Internet%20OR%20%23IPV6'
This is the URL of the website, which is why you get an HTML page back. If you want to extract tweets through the Twitter API, replace it with the v1.1 search endpoint:
tweet_url = 'https://api.twitter.com/1.1/search/tweets.json?q=%23IPv4%20OR%20%23ISP%20OR%20%23WiFi%20OR%20%23Modem%20OR%20%23Internet%20OR%20%23IPV6'
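From there, filtering for your hashtags and writing matches to a file could look roughly like the sketch below. It reuses the credentials and the oauth_req() helper from the question; the tweets.txt filename is a stand-in, and the response handling assumes the v1.1 search API's usual shape (a JSON object with a "statuses" list, each status carrying entities.hashtags):
# A minimal sketch, not a drop-in solution: assumes oauth_req() and the
# credentials from the question; error handling is omitted.
import io
import json

search_url = ('https://api.twitter.com/1.1/search/tweets.json'
              '?q=%23IPv4%20OR%20%23ISP%20OR%20%23WiFi%20OR%20%23Modem'
              '%20OR%20%23Internet%20OR%20%23IPV6')

# oauth_req() passes through the (headers, body) tuple from client.request().
headers, body = oauth_req(search_url, access_token, access_token_secret)
result = json.loads(body)

wanted = {'ipv4', 'ipv6', 'isp', 'internet', 'modem', 'wifi'}
with io.open('tweets.txt', 'w', encoding='utf-8') as out:  # hypothetical output file
    for status in result.get('statuses', []):
        tags = {h['text'].lower() for h in status['entities']['hashtags']}
        if tags & wanted:  # tweet has at least one wanted hashtag
            out.write(status['text'] + u'\n')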

Related

Encrypted access token request to Google API failed with 400 code

Recently I came across a scenario where I need to encrypt a web API request and response using PyCryptodome inside a Synapse notebook activity. I am trying to make a call to a Google API, but the request should be encrypted, and similarly the response should be encrypted. After making the call with the encrypted data, I get the error below.
Error:
error code: 400, message: Invalid JSON Payload received. Unexpected Token, Status: Invalid argument.
I have written the code below:
import os
import requests
import json
import base64
from Crypto import Random
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Util.padding import pad, unpad
import secrets

key = os.urandom(16)
iv = Random.new().read(AES.block_size)

def encrypt_data(key, data):
    BS = AES.block_size
    pad = lambda s: s + ((BS - len(s) % BS) * chr(BS - len(s) % BS)).encode()
    cipher = AES.new(key, AES.MODE_CBC, iv)
    encrypted_data = base64.b64encode(cipher.encrypt(pad(data)))
    return encrypted_data

url = "https://accounts.google.com/o/oauth2/token"
client_Id = "XXXXX"
client_secret = "YYYYY"
grant_type = "refresh_token"
refresh_token = "ZZZZZZ"
access_type = "offline"

data = {"grant_type": grant_type,
        "client_id": client_Id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "access_type": access_type}

encode_data = json.dumps(data).encode("utf-8")
encrypt_data = encrypt_data(key, encode_data)
response = requests.post(url, data=encrypt_data)
print(response.content)
It would be really helpful if someone could give me an idea or guide me on how I can achieve this.
Thank You!
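For reference, here is a minimal PyCryptodome round-trip sketch using the pad/unpad helpers the snippet already imports (but then shadows with a local lambda). This only illustrates the cipher usage with stand-in key, IV, and payload; note that Google's OAuth token endpoint expects a plain form-encoded body, so it will reject an encrypted payload regardless:
# Illustrative AES-CBC round trip with PyCryptodome's padding helpers;
# key, iv, and payload are stand-ins, not values from the question.
import base64
import json
import os
from Crypto.Cipher import AES
from Crypto.Util.padding import pad, unpad

key = os.urandom(16)
iv = os.urandom(16)
payload = json.dumps({"example": "data"}).encode("utf-8")

cipher = AES.new(key, AES.MODE_CBC, iv)
encrypted = base64.b64encode(cipher.encrypt(pad(payload, AES.block_size)))

# Decryption needs a fresh cipher object with the same key and IV.
decipher = AES.new(key, AES.MODE_CBC, iv)
decrypted = unpad(decipher.decrypt(base64.b64decode(encrypted)), AES.block_size)
assert decrypted == payload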

How to extract all data from an XHR request

I found an XHR request with all the street address info I want to scrape in it.
However, I do not know how to extract it into a pandas DataFrame or a Python list. Any ideas? Thank you very much!
Since it's GraphQL, you can formulate the query string however you like, but here I've written it the same way it's sent when the browser makes a request:
def main():
    import requests

    url = "https://api-endpoint.cons-prod-us-central1.kw.com/graphql"

    headers = {
        "x-shared-secret": "MjFydHQ0dndjM3ZAI0ZHQCQkI0BHIyM="
    }

    query = """{
      ListOfficeQuery {
        id
        name
        address
        subAddress
        phone
        fax
        lat
        lng
        url
        contacts {
          name
          email
          phone
          __typename
        }
        __typename
      }
    }
    """

    payload = {
        "operationName": None,
        "variables": {},
        "query": query
    }

    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()

    offices = response.json()["data"]["ListOfficeQuery"]
    print(f"There are {len(offices)} offices, and the first one's address is \"{offices[0]['address']}\"")

    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
There are 1173 offices, and the first one's address is "1801 South Mo-Pac Expressway, Suite 100"
You can replicate the XHR using Python's requests library with the information in the headers tab (more info here), then parse the data using the json library and extract the information.
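To get from there to the pandas DataFrame the question asks about, a short sketch along these lines should work. The endpoint and header are taken from the answer above; the trimmed-down query is an assumption, so extend it with whatever fields you need:
# Sketch: load the GraphQL result into a pandas DataFrame.
import pandas as pd
import requests

url = "https://api-endpoint.cons-prod-us-central1.kw.com/graphql"
headers = {"x-shared-secret": "MjFydHQ0dndjM3ZAI0ZHQCQkI0BHIyM="}
payload = {
    "operationName": None,
    "variables": {},
    "query": "{ ListOfficeQuery { name address phone } }",  # trimmed query
}

offices = requests.post(url, headers=headers, json=payload).json()["data"]["ListOfficeQuery"]
df = pd.DataFrame(offices)  # one row per office; pd.json_normalize() helps with nested fields
print(df.head())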

Get a list of Spaces where the Google Hangouts Chat bot is installed?

https://developers.google.com/hangouts/chat/reference/rest/v1/spaces/list
I am using this to get a list of all Spaces where the bot is installed, but I am not able to get a response.
Here is the code snippet:
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'service-account.json', scopes)
http = Http()
credentials.authorize(http)
chat = build('chat', 'v1', http=http)
res = chat.spaces().list()
print res.body  # this gives null
Am I missing something here?
For anyone still facing this issue, use
res = chat.spaces().list().execute()
instead of
res = chat.spaces().list()
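Put together, the corrected call could look like the sketch below. It assumes the same service-account setup as in the question, and that the response is a dict whose "spaces" key holds the list:
# list() only builds the request; execute() actually sends it.
res = chat.spaces().list().execute()
for space in res.get('spaces', []):
    print space['name'], space.get('displayName', '')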

Sending form data with an HTTP PUT request using Grinder API

I'm trying to replicate the following successful cURL operation with Grinder.
curl -X PUT -d "title=Here%27s+the+title&content=Here%27s+the+content&signature=myusername%3A3ad1117dab0ade17bdbd47cc8efd5b08" http://www.mysite.com/api
Here's my script:
from net.grinder.script import Test
from net.grinder.script.Grinder import grinder
from net.grinder.plugin.http import HTTPRequest
from HTTPClient import NVPair
import hashlib

test1 = Test(1, "Request resource")
request1 = HTTPRequest(url="http://www.mysite.com/api")
test1.record(request1)

log = grinder.logger.info
test1.record(log)

m = hashlib.md5()

class TestRunner:
    def __call__(self):
        params = [NVPair("title", "Here's the title"),
                  NVPair("content", "Here's the content")]
        params.sort(key=lambda param: param.getName())

        ps = ""
        for param in params:
            ps = ps + param.getValue() + ":"
        ps = ps + "myapikey"
        m.update(ps)

        params.append(NVPair("signature", ("myusername:" + m.hexdigest())))
        request1.setFormData(tuple(params))

        result = request1.PUT()
The test runs okay, but it seems that my script doesn't actually send any of the params data to the API, and I can't work out why. No errors are generated, but I get a 401 Unauthorized response from the API, indicating that the PUT request reached it but, lacking a signature, was rejected.
This isn't exactly an answer, more of a workaround that I came up with. I've decided to post it since this question hasn't yet received any responses, and it may help anyone else trying to achieve the same thing.
The workaround is basically to use the httplib and urllib modules to build and make the PUT request instead of the HTTPClient module.
import hashlib
import httplib, urllib
....
params = [("title", "Here's the title"), ("content", "Here's the content")]
params.sort(key=lambda param: param[0])

ps = ""
for param in params:
    ps = ps + param[1] + ":"
ps = ps + "myapikey"

m = hashlib.md5()
m.update(ps)
params.append(("signature", "myusername:" + m.hexdigest()))

params = urllib.urlencode(params)
print params

headers = {"Content-type": "application/x-www-form-urlencoded"}
conn = httplib.HTTPConnection("www.mysite.com:80")
conn.request("PUT", "/api", params, headers)
response = conn.getresponse()
print response.status, response.reason
print response.read()
conn.close()
(Based on the example at the bottom of this documentation page.)
Refer to the multipart-form posting example in the Grinder script gallery, changing the POST to a PUT. It works for me.
# Imports as in the script gallery's form example.
from net.grinder.script.Grinder import grinder
from HTTPClient import Codecs, NVPair
from jarray import zeros

files = (NVPair("self", "form.py"),)
parameters = (NVPair("run number", str(grinder.runNumber)),)

# This is the Jython way of creating an NVPair[] Java array
# with one element.
headers = zeros(1, NVPair)

# Create a multi-part form encoded byte array.
data = Codecs.mpFormDataEncode(parameters, files, headers)
grinder.logger.output("Content type set to %s" % headers[0].value)

# Call the version of PUT that takes a byte array.
result = request1.PUT("/upload", data, headers)

Google Reader API Unread Count

Does Google Reader have an API, and if so, how can I get the number of unread posts for a specific user, knowing their username and password?
This URL will give you a count of unread posts per feed. You can then iterate over the feeds and sum up the counts.
http://www.google.com/reader/api/0/unread-count?all=true
Here is a minimalist example in Python; parsing the XML/JSON and summing the counts is left as an exercise for the reader:
import urllib
import urllib2

username = 'username@gmail.com'
password = '******'

# Authenticate to obtain an Auth token
auth_url = 'https://www.google.com/accounts/ClientLogin'
auth_req_data = urllib.urlencode({'Email': username,
                                  'Passwd': password,
                                  'service': 'reader'})
auth_req = urllib2.Request(auth_url, data=auth_req_data)
auth_resp = urllib2.urlopen(auth_req)
auth_resp_content = auth_resp.read()
auth_resp_dict = dict(x.split('=') for x in auth_resp_content.split('\n') if x)
auth_token = auth_resp_dict["Auth"]

# Pass the Auth token in the Authorization header
header = {}
header['Authorization'] = 'GoogleLogin auth=%s' % auth_token

reader_base_url = 'http://www.google.com/reader/api/0/unread-count?%s'
reader_req_data = urllib.urlencode({'all': 'true',
                                    'output': 'xml'})
reader_url = reader_base_url % (reader_req_data)
reader_req = urllib2.Request(reader_url, None, header)
reader_resp = urllib2.urlopen(reader_req)
reader_resp_content = reader_resp.read()

print reader_resp_content
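Since the summing is left as an exercise, here is a hedged sketch of that step. It assumes the output=xml response wraps the per-feed counts in <number name="count"> elements under an "unreadcounts" list, which matches the Reader API's usual JSON-in-XML layout but should be verified against a real response:
# Sketch: sum the per-feed unread counts from reader_resp_content above.
import xml.etree.ElementTree as ET

root = ET.fromstring(reader_resp_content)
counts = [int(n.text) for n in
          root.findall('.//list[@name="unreadcounts"]/object/number[@name="count"]')]
print 'Total unread: %d' % sum(counts)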
And some additional links on the topic:
http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
How do you access an authenticated Google App Engine service from a (non-web) python client?
http://blog.gpowered.net/2007/08/google-reader-api-functions.html
It is there. Still in beta, though.
Here is an update to the answer above:
import urllib
import urllib2

username = 'username@gmail.com'
password = '******'

# Authenticate to obtain Auth
auth_url = 'https://www.google.com/accounts/ClientLogin'
#auth_req_data = urllib.urlencode({'Email': username,
#                                  'Passwd': password})
auth_req_data = urllib.urlencode({'Email': username,
                                  'Passwd': password,
                                  'service': 'reader'})
auth_req = urllib2.Request(auth_url, data=auth_req_data)
auth_resp = urllib2.urlopen(auth_req)
auth_resp_content = auth_resp.read()
auth_resp_dict = dict(x.split('=') for x in auth_resp_content.split('\n') if x)
# SID = auth_resp_dict["SID"]
AUTH = auth_resp_dict["Auth"]

# Create the Authorization header using the Auth token
header = {}
#header['Cookie'] = 'Name=SID;SID=%s;Domain=.google.com;Path=/;Expires=160000000000' % SID
header['Authorization'] = 'GoogleLogin auth=%s' % AUTH

reader_base_url = 'http://www.google.com/reader/api/0/unread-count?%s'
reader_req_data = urllib.urlencode({'all': 'true',
                                    'output': 'xml'})
reader_url = reader_base_url % (reader_req_data)
reader_req = urllib2.Request(reader_url, None, header)
reader_resp = urllib2.urlopen(reader_req)
reader_resp_content = reader_resp.read()

print reader_resp_content
Google Reader removed SID-based auth around June 2010 (I think); using the new Auth token from ClientLogin is the new way, and it's a bit simpler (the header is shorter). You have to include service in the data when requesting Auth; I noticed that no Auth is returned if you don't send service=reader.
You can read more about the change of authentication method in this thread.
In the API posted in [1], the "token" field should be "T".
[1] http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI