Oauth2client error when trying Apache Beam example code

I am trying to run the Apache Beam Python examples (for example https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/game/user_score.py). When I run the script I get:
(py27) XXX-MBP:my_path$ python user_score.py --output /my_path/user_score_output
No handlers could be found for logger "oauth2client.contrib.multistore_file"
INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
/my_path_2/miniconda3/envs/py27/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py:121: DeprecationWarning: object() takes no parameters
super(GcsIO, cls).__new__(cls, storage_client))
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.453105211258 seconds
/my_path_2/miniconda3/envs/py27/lib/python2.7/site-packages/apache_beam/coders/typecoders.py:135: UserWarning: Using fallback coder for typehint: Any.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
/my_path_2/miniconda3/envs/py27/lib/python2.7/site-packages/apache_beam/coders/typecoders.py:135: UserWarning: Using fallback coder for typehint: <type 'NoneType'>.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
INFO:root:Running pipeline with DirectRunner.
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Finished the size estimation of the input at 2 files. Estimation took 0.477814912796 seconds
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
The problem seems to be with oauth2client and getting an access token. I googled a bit and people are talking about setting the prompt='consent' parameter, but that is in the domain of asking end users for consent; here I am just downloading a dataset. Could it be access to my Google Cloud project that is struggling? Other programs of mine can write to Google Cloud Storage just fine. I am looking at https://developers.google.com/identity/protocols/OAuth2 but I don't really understand what I am supposed to be doing ...
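A minimal sketch for checking which identity oauth2client resolves before running the pipeline, assuming the DirectRunner picks up the application-default credentials (the key-file path below is only a placeholder):
import os
from oauth2client.client import GoogleCredentials

# Placeholder path; point this at a real service-account key if you use one,
# or rely on `gcloud auth application-default login` instead.
os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS",
                      "/my_path/service-account-key.json")

credentials = GoogleCredentials.get_application_default()
print(credentials)  # shows which credentials the Google client libraries will use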

Related

Scraping Twitter Data

I tried to scrape Twitter data with an academic Twitter API. The code works for almost every case, though there are a few cases where it doesn't.
I used the code tweets <- get_all_tweets("#Honda lang: en -is:retweet", "2018-08-06T00:00:00Z", "2018-08-26T00:00:00Z", n = Inf)
but after scraping 4 pages of tweets, the following error occurred:
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, :
Too many errors.
In addition: Warning messages:
1: Recommended to specify a data path in order to mitigate data loss when ingesting large amounts of data.
2: Tweets will not be stored as JSONs or as a .rds file and will only be available in local memory if assigned to an object.
I actually don't get what the problem is, because the code works even for cases with more than 35,000 tweets. Therefore, I don't think the number of tweets is the reason.

Getting started with OFX: how to download transaction statements

I would like to upgrade a macOS application I built to allow users to directly download their transaction statements via OFX data direct connect.
So far, my research has found two good sources of info
OfxTools: https://ofxtools.readthedocs.io/en/latest/index.html
OfxHome: https://www.ofxhome.com/index.php/home/ofxget
I've downloaded and installed both (the command-line tool from OfxHome and the Python package from OfxTools), then tried to download statements from two of my accounts (TD Ameritrade and Fidelity). Both failed.
TD Ameritrade fails with "Signon invalid".
Fidelity fails with "Access Denied" (cannot access from this server).
I initially thought the issue was that I am overseas and running from my own computer, so I installed the same two packages on a hosted server running out of San Francisco. Both still failed with the same errors.
I happen to be a member of an application (Profit.ly) that allows you to download your transaction statements (they use OFX, just like Quicken, etc). I can enter my TD Ameritrade and Fidelity credentials on Profit.ly, and it can download my transaction data successfully.
So what am I missing? Below is my request from the OfxHome command line for Fidelity.
./ofxget -institution 449 -request accounts.txt
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:20221211035100.000
<OFX>
<SIGNONMSGSRQV1>
<SONRQ>
<DTCLIENT>20221211035100.000
<USERID>MyUserName
<USERPASS>MyPassword
<LANGUAGE>ENG
<FI>
<ORG>fidelity.com
<FID>7776
</FI>
<APPID>QBW
<APPVER>1800
</SONRQ>
</SIGNONMSGSRQV1>
<SIGNUPMSGSRQV1>
<ACCTINFOTRNRQ>
<TRNUID>20221211035100.000
<CLTCOOKIE>1
<ACCTINFORQ>
<DTACCTUP>19691231
</ACCTINFORQ>
</ACCTINFOTRNRQ>
</SIGNUPMSGSRQV1>
</OFX>
Fidelity Response
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://ofx.fidelity.com/ftgw/OFX/clients/download" on this server.<P>
Reference #18.65f466ab.1670748661.20bf6f3d
</BODY>
</HTML>
TD Ameritrade OFX request
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:20221211040449.000
<OFX>
<SIGNONMSGSRQV1>
<SONRQ>
<DTCLIENT>20221211040449.000
<USERID>MyUserName
<USERPASS>MyPassword
<LANGUAGE>ENG
<FI>
<ORG>ameritrade.com
<FID>5024
</FI>
<APPID>QBW
<APPVER>1800
</SONRQ>
</SIGNONMSGSRQV1>
<INVSTMTMSGSRQV1>
<INVSTMTTRNRQ>
<TRNUID>20221211040449.000
<CLTCOOKIE>4
<INVSTMTRQ>
<INVACCTFROM>
<BROKERID>ameritrade.com
<ACCTID>MyAcctNumber
</INVACCTFROM>
<INCTRAN>
<DTSTART>19000101
<INCLUDE>Y
</INCTRAN>
<INCOO>Y
<INCPOS>
<DTASOF>20221211040449.000
<INCLUDE>Y
</INCPOS>
<INCBAL>Y
</INVSTMTRQ>
</INVSTMTTRNRQ>
</INVSTMTMSGSRQV1>
</OFX>
TD Ameritrade Response
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:20221211040449.000
<OFX><SIGNONMSGSRSV1><SONRS><STATUS><CODE>15500</CODE><SEVERITY>ERROR</SEVERITY><MESSAGE>Signon invalid</MESSAGE></STATUS><DTSERVER>20221211040450</DTSERVER><LANGUAGE>ENG</LANGUAGE><FI><ORG>ameritrade.com</ORG><FID>5024</FID></FI></SONRS></SIGNONMSGSRSV1><INVSTMTMSGSRSV1><INVSTMTTRNRS><TRNUID>20221211040449.000</TRNUID><STATUS><CODE>15500</CODE><SEVERITY>ERROR</SEVERITY><MESSAGE>pr-txlvofx-pp02-clientsys Signon invalid</MESSAGE></STATUS><CLTCOOKIE>4</CLTCOOKIE></INVSTMTTRNRS></INVSTMTMSGSRSV1></OFX>
Why can I not use my own computer with my own bank and credentials to download my data via command line using ofxtools or ofxget?
Is there some sort of authorization token, SSL cert, or something I am missing?
Is there a list somewhere of servers that banks trust to fetch OFX data?
Am I asking the wrong questions? lol
Possible deprecation: OFX merging with FDX (https://financialdataexchange.org/ofx)
Update
I got TD Ameritrade to respond successfully by changing the APPID and APPVER
From
<APPID>QBW
<APPVER>1800
To
<APPID>MDNC
<APPVER>2021
Using OfxHome commands, I was able to fetch my TD Ameritrade transaction data; however, I'm still stuck on Fidelity.
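A rough sketch (not something from the original post) of testing just the signon with the changed APPID/APPVER by posting the raw request; the URL is a placeholder, so look up the institution's current OFX endpoint on OFX Home, and the credentials are the same placeholders as above:
import datetime
import requests

now = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S") + ".000"

header = ("OFXHEADER:100\r\nDATA:OFXSGML\r\nVERSION:102\r\nSECURITY:NONE\r\n"
          "ENCODING:USASCII\r\nCHARSET:1252\r\nCOMPRESSION:NONE\r\n"
          "OLDFILEUID:NONE\r\nNEWFILEUID:%s\r\n\r\n" % now)

body = ("<OFX><SIGNONMSGSRQV1><SONRQ>"
        "<DTCLIENT>%s<USERID>MyUserName<USERPASS>MyPassword<LANGUAGE>ENG"
        "<FI><ORG>ameritrade.com<FID>5024</FI>"
        "<APPID>MDNC<APPVER>2021"   # the APPID/APPVER pair that worked
        "</SONRQ></SIGNONMSGSRQV1></OFX>" % now)

resp = requests.post("https://ofx.example-institution.com/ofx",   # placeholder URL
                     data=header + body,
                     headers={"Content-Type": "application/x-ofx"})
print(resp.status_code)
print(resp.text[:500])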
This raises a bigger issue, which is probably core to the whole OFX thing. I'm trying to test OFX data fetching by using two brokers that I have. Straight off the bat I ran into problems and have spent all day just to figure out how to get one broker working. This is not scalable.
I guess the core of the problem is finding a database that contains the most up-to-date data for all the financial institutions, then being able to read that database, parse the data into the correct OFX format (with the correct tags, version number, and data), and send the request. This is the most challenging part.
I've reviewed 5 OFX libraries and tested 3 of them. All failed out-of-the-box to format a proper OFX request for TD and Fidelity. After hours of work I was able to piece together what is required by TD.
So the questions are:
Who has the most up-to-date OFX request data for each financial institution?
Where is there code to compile that data into a proper request for each OFX version?

Credentials Error when integrating Google Drive with BigQuery

I am using Google BigQuery and I want to integrate BigQuery with Google Drive. In BigQuery I am giving the Google spreadsheet URL to upload my data, and it is updating well, but when I write the query in the Google add-on (OWOX BI BigQuery Reports):
Select * from [datasetName.TableName]
I am getting an error:
Query failed: tableUnavailable: No suitable credentials found to access Google Drive. Contact the table owner for assistance.
I just faced the same issue in some code I was writing. It might not directly help you here, since it looks like you are not responsible for the code, but it might help someone else, or you can ask the person who does write the code you're using to read this :-)
So I had to do a couple of things:
Enable the Drive API for my Google Cloud Platform project in addition to BigQuery.
Make sure that your BigQuery client is created with both the BigQuery scope AND the Drive scope.
Make sure that the Google Sheets you want BigQuery to access are shared with the "...@appspot.gserviceaccount.com" account that your Google Cloud Platform project identifies itself as.
After that I was able to successfully query the Google Sheets backed tables from BigQuery in my own project.
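If you are constructing the client yourself, here is a minimal sketch (mine, not from the steps above) of creating a BigQuery client whose credentials carry both scopes, using the newer google-cloud-bigquery and google-auth libraries; the key-file path, project ID and table name are placeholders.
from google.cloud import bigquery
from google.oauth2 import service_account

SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
]

# Placeholder key file and project; the service account must also be granted
# access to the spreadsheet and the dataset, as described above.
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json", scopes=SCOPES)
client = bigquery.Client(project="my-project", credentials=credentials)

for row in client.query("SELECT * FROM `my_dataset.sheet_backed_table`").result():
    print(row)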
What was previously said is right:
Make sure that your dataset in BigQuery is also shared with the Service Account you will use to authenticate.
Make sure your Federated Google Sheet is also shared with the service account.
The Drive API should be active as well
When using the OAuth client you need to inject both scopes, for Drive and for BigQuery
If you are writing Python:
credentials = GoogleCredentials.get_application_default() won't do, because you can't inject scopes that way (at least I didn't find a way :D).
Build your request from scratch:
# Imports added so the snippet runs on its own.
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

# Service-account credentials carrying both the Drive and cloud-platform scopes.
scopes = (
    'https://www.googleapis.com/auth/drive.readonly',
    'https://www.googleapis.com/auth/cloud-platform')
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '/client_secret.json', scopes)
http = credentials.authorize(Http())

bigquery_service = build('bigquery', 'v2', http=http)
query_request = bigquery_service.jobs()

query_data = {
    'query': 'SELECT * FROM [test.federated_sheet]'
}

query_response = query_request.query(
    projectId='hello_world_project',
    body=query_data).execute()

print('Query Results:')
for row in query_response['rows']:
    print('\t'.join(field['v'] for field in row['f']))
This likely has the same root cause as:
BigQuery Credential Problems when Accessing Google Sheets Federated Table
Accessing federated tables in Drive requires additional OAuth scopes and your tool may only be requesting the bigquery scope. Try contacting your vendor to update their application?
If you're using pd.read_gbq() as I was, then this would be the best place to get your answer: https://github.com/pydata/pandas-gbq/issues/161#issuecomment-433993166
import pandas_gbq
import pydata_google_auth
import pydata_google_auth.cache

# Instead of get_user_credentials(), you could do default(), but that may not
# be able to get the right scopes if running on GCE or using credentials from
# the gcloud command-line tool.
credentials = pydata_google_auth.get_user_credentials(
    scopes=[
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
    # Use reauth to get new credentials if you haven't used the drive scope
    # before. You only have to do this once.
    credentials_cache=pydata_google_auth.cache.REAUTH,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as with Google Colab.
    auth_local_webserver=True,
)

sql = """SELECT state_name
FROM `my_dataset.us_states_from_google_sheets`
WHERE post_abbr LIKE 'W%'
"""

df = pandas_gbq.read_gbq(
    sql,
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
    dialect='standard',
)

print(df)

How do I upload a video to Youtube directly from my server?

I'm setting up a (headless) web server that lets people build their own custom time-lapse movies.
Several people want to upload the time-lapse videos they make to YouTube.
Rather than downloading the video to that person's laptop and then having that person manually upload it to YouTube, is there a way I can write some software on my web server that takes the video file on my web server and uploads it directly to that user's account on YouTube?
I've been told that asking my users for their YouTube handle and password is the Wrong Thing To Do, and that I should be using the YouTube V3 API with OAuth.
I tried the techniques listed at
" I want to upload a video from my web page to youtube by using javascript youtube API ",
which seems to "work", but every time I had to download the video to that person's laptop and then upload it from the laptop to YouTube. Is there a way to tweak that system to upload directly from my server to YouTube?
I found some python code that (after I set up my client_secrets.json) lets me upload videos directly from my server directly to someone's YouTube account after that person did the Oauth authentication.
But the first time some new person tries to upload a video to some new YouTube account that my server has never dealt with before, it either
(a) pops open a web browser on my server, and then if I VNC to the server and type in a YouTube handle and password into that web browser, it gets authenticated -- but I'd rather not do that for every user.
(b) with the "--noauth_local_webserver" option, spits out a URL on the command line and waits. Then, if I manually copy that URL, paste it into a web browser, log in to YouTube, and copy-and-paste the resulting token back into the application still waiting for input on the command line, that person gets authenticated -- but I'd rather not do that for every user either. I guess that would be OK if I could capture that URL in my cgi-bin script and stick it in a web page, and then later somehow get the authentication response and feed it back into this program, but how? I don't even see the print statement or the raw_input statement in this code.
#!/usr/bin/python
# https://developers.google.com/youtube/v3/code_samples/python#upload_a_video
# which is identical to the code sample at
# https://developers.google.com/youtube/v3/docs/videos/insert
import httplib
import httplib2
import os
import random
import sys
import time
from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import MediaFileUpload
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow
# Explicitly tell the underlying HTTP transport library not to retry, since
# we are handling retry logic ourselves.
httplib2.RETRIES = 1
# Maximum number of times to retry before giving up.
MAX_RETRIES = 10
# Always retry when these exceptions are raised.
RETRIABLE_EXCEPTIONS = (httplib2.HttpLib2Error, IOError, httplib.NotConnected,
httplib.IncompleteRead, httplib.ImproperConnectionState,
httplib.CannotSendRequest, httplib.CannotSendHeader,
httplib.ResponseNotReady, httplib.BadStatusLine)
# Always retry when an apiclient.errors.HttpError with one of these status
# codes is raised.
RETRIABLE_STATUS_CODES = [500, 502, 503, 504]
# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the Google Developers Console at
# https://console.developers.google.com/.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
# https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
# https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"
# This OAuth 2.0 access scope allows an application to upload files to the
# authenticated user's YouTube channel, but doesn't allow other types of access.
YOUTUBE_UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"
# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0
To make this sample run you will need to populate the client_secrets.json file
found at:
%s
with information from the Developers Console
https://console.developers.google.com/
For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
CLIENT_SECRETS_FILE))
VALID_PRIVACY_STATUSES = ("public", "private", "unlisted")
def get_authenticated_service(args):
  flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE,
    scope=YOUTUBE_UPLOAD_SCOPE,
    message=MISSING_CLIENT_SECRETS_MESSAGE)
  storage = Storage("%s-oauth2.json" % sys.argv[0])
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    credentials = run_flow(flow, storage, args)
  return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
    http=credentials.authorize(httplib2.Http()))
def initialize_upload(youtube, options):
  tags = None
  if options.keywords:
    tags = options.keywords.split(",")
  body=dict(
    snippet=dict(
      title=options.title,
      description=options.description,
      tags=tags,
      categoryId=options.category
    ),
    status=dict(
      privacyStatus=options.privacyStatus
    )
  )
  # Call the API's videos.insert method to create and upload the video.
  insert_request = youtube.videos().insert(
    part=",".join(body.keys()),
    body=body,
    # The chunksize parameter specifies the size of each chunk of data, in
    # bytes, that will be uploaded at a time. Set a higher value for
    # reliable connections as fewer chunks lead to faster uploads. Set a lower
    # value for better recovery on less reliable connections.
    #
    # Setting "chunksize" equal to -1 in the code below means that the entire
    # file will be uploaded in a single HTTP request. (If the upload fails,
    # it will still be retried where it left off.) This is usually a best
    # practice, but if you're using Python older than 2.6 or if you're
    # running on App Engine, you should set the chunksize to something like
    # 1024 * 1024 (1 megabyte).
    media_body=MediaFileUpload(options.file, chunksize=-1, resumable=True)
  )
  resumable_upload(insert_request)
# This method implements an exponential backoff strategy to resume a
# failed upload.
def resumable_upload(insert_request):
  response = None
  error = None
  retry = 0
  while response is None:
    try:
      print "Uploading file..."
      status, response = insert_request.next_chunk()
      if 'id' in response:
        print "Video id '%s' was successfully uploaded." % response['id']
      else:
        exit("The upload failed with an unexpected response: %s" % response)
    except HttpError, e:
      if e.resp.status in RETRIABLE_STATUS_CODES:
        error = "A retriable HTTP error %d occurred:\n%s" % (e.resp.status,
                                                             e.content)
      else:
        raise
    except RETRIABLE_EXCEPTIONS, e:
      error = "A retriable error occurred: %s" % e
    if error is not None:
      print error
      retry += 1
      if retry > MAX_RETRIES:
        exit("No longer attempting to retry.")
      max_sleep = 2 ** retry
      sleep_seconds = random.random() * max_sleep
      print "Sleeping %f seconds and then retrying..." % sleep_seconds
      time.sleep(sleep_seconds)
if __name__ == '__main__':
  argparser.add_argument("--file", required=True, help="Video file to upload")
  argparser.add_argument("--title", help="Video title", default="Test Title")
  argparser.add_argument("--description", help="Video description",
    default="Test Description")
  argparser.add_argument("--category", default="22",
    help="Numeric video category. " +
      "See https://developers.google.com/youtube/v3/docs/videoCategories/list")
  argparser.add_argument("--keywords", help="Video keywords, comma separated",
    default="")
  argparser.add_argument("--privacyStatus", choices=VALID_PRIVACY_STATUSES,
    default=VALID_PRIVACY_STATUSES[0], help="Video privacy status.")
  args = argparser.parse_args()
  if not os.path.exists(args.file):
    exit("Please specify a valid file using the --file= parameter.")
  youtube = get_authenticated_service(args)
  try:
    initialize_upload(youtube, args)
  except HttpError, e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
Use "client_secrets.json".
Configure credentials to generate it at
https://console.developers.google.com/apis/credentials
{
  "web": {
    "client_id": "xxxxxxxxxxxxxx",
    "project_id": "xxxxxxxxxxxxxx",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://accounts.google.com/o/oauth2/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "xxxxxxxxxxxxxxxx",
    "redirect_uris": ["http://localhost:8090/", "http://localhost:8090/Callback"],
    "javascript_origins": ["http://localhost"]
  }
}
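With a "web" client like the one above, here is one possible way (a sketch only, using oauth2client) to drive the OAuth flow from your own server instead of a local browser; the redirect URI must match one of the redirect_uris registered for the client:
from oauth2client.client import flow_from_clientsecrets

flow = flow_from_clientsecrets(
    "client_secrets.json",
    scope="https://www.googleapis.com/auth/youtube.upload",
    redirect_uri="http://localhost:8090/Callback")

# Step 1: send the user to this URL (for example, render it as a link in your page).
auth_url = flow.step1_get_authorize_url()

# Step 2: in the handler behind the redirect URI, Google returns a ?code=...
# parameter; exchange it for credentials and persist them per user.
def oauth2callback(code):
    credentials = flow.step2_exchange(code)
    return credentials  # e.g. save with oauth2client.file.Storage or a database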
A very useful step-by-step guide on how to get access and refresh tokens and save them for future use with the YouTube OAuth API v3: PHP server-side YouTube V3 OAuth API video upload guide.
https://www.domsammut.com/code/php-server-side-youtube-v3-oauth-api-video-upload-guide

Batch Request to Google API making calls before the batch executes in Python?

I am working on sending a large batch request to the Google API through the Admin SDK, which will add members to certain groups based upon the groups on our in-house servers (a realignment script). I am using Python, and to access the Google API I am using the apiclient library. When I create my service and batch objects, the creation of the service object requests a URL.
batch_count = 0
batch = BatchHttpRequest()
service = build('admin', 'directory_v1')
This logs:
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/discovery/v1/apis/admin/directory_v1/rest
which makes sense as the JSON object returned by that HTTP call is used to build the service object.
Now I want to add multiple requests into the batch object, so I do this;
for email in add_user_list:
    if batch_count != 999:
        add_body = dict()
        add_body[u'email'] = email.lower()
        for n in range(0, 5):
            try:
                batch.add(service.members().insert(groupKey=groupkey, body=add_body), callback=batch_callback)
                batch_count += 1
                break
            except HttpError, error:
                logging_message('Quota exceeded, waiting to retry...')
                time.sleep(2 ** n)
                continue
Every time it iterates through the outermost for loop it logs (group address redacted)
INFO:apiclient.discovery:URL being requested:
https://www.googleapis.com/admin/directory/v1/groups/GROUPADDRESS/members?alt=json
Isn't the point of the batch to not send any calls to the API until the batch request has been populated with all the individual requests? Why does it make the call to the API shown above every time? Then, when I call batch.execute(), it carries out all the requests (which I thought was the only place calls to the API would be made).
For a second I thought that the call was to construct the object that is returned by .members() but I tried these changes:
service = build('admin', 'directory_v1').members()
batch.add(service.insert(groupKey=groupkey, body=add_body), callback=batch_callback)
and still got the same result. The reason this is an issue is that it doubles the number of requests to the API, and I have real quota concerns at the scale this is running at.
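For comparison, here is a sketch of the same loop with the retry/backoff moved around batch.execute() rather than batch.add() (reusing add_user_list, groupkey and batch_callback from above; this is only a sketch, not a confirmed answer to the logging question):
import random
import time

from apiclient.discovery import build
from apiclient.errors import HttpError
from apiclient.http import BatchHttpRequest

service = build('admin', 'directory_v1')
batch = BatchHttpRequest(callback=batch_callback)

# Queue up to 999 inserts; batch.add() only registers the request locally.
for email in add_user_list[:999]:
    batch.add(service.members().insert(groupKey=groupkey,
                                       body={u'email': email.lower()}))

# Only this call goes over the wire, so the exponential backoff belongs here.
for n in range(0, 5):
    try:
        batch.execute()
        break
    except HttpError:
        time.sleep((2 ** n) + random.random())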