Yahoo API into Pandas Dataframe

I am trying to pull stock data from Yahoo and put it into a pandas DataFrame. If I put the API call into a web browser it returns results, so I think the API still works, but I don't know how to get the data into pandas.
Thx
input
import pandas as pd
api = 'https://query1.finance.yahoo.com/v8/finance/chart/AAPL?interval=5m'
df = pd.read_csv(api, skiprows=8, header=None)

I think Yahoo Finance has already shut down its API service to prevent data abuse; a valid token is required to initiate data transfer. You need to explore other service providers, such as Alpha Vantage, as an alternative:
https://www.alphavantage.co/
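If you do switch to Alpha Vantage, a rough sketch of pulling 5-minute AAPL bars into pandas could look like the following; it assumes the TIME_SERIES_INTRADAY endpoint with datatype=csv (so read_csv can consume the response directly) and a placeholder API key.
import pandas as pd

# 'YOUR_API_KEY' is a placeholder; Alpha Vantage issues free keys on their site.
api_key = 'YOUR_API_KEY'
url = ('https://www.alphavantage.co/query'
       '?function=TIME_SERIES_INTRADAY'
       '&symbol=AAPL'
       '&interval=5min'
       '&datatype=csv'
       f'&apikey={api_key}')

# With datatype=csv the endpoint returns plain CSV, so pandas can read the URL directly.
df = pd.read_csv(url)
print(df.head())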


Parsing Requests Module Response JSON

I am trying to use the [OpenMeteo API][1] to download CSVs via a script. Using their API Builder and their clickable site, I am able to do it for a single site just fine.
In trying to write a function to accomplish this in a loop via the requests library, I am stumped. I cannot get the JSON data out of the requests.models.Response object and into JSON, or better yet, CSV/pandas format.
I cannot parse the requests module response into anything. The request appears to succeed (a 200 response) and to contain data, but I cannot get it out of the response format!
import requests

latitude = 60.358
longitude = -148.939
request_txt = r"https://archive-api.open-meteo.com/v1/archive?latitude=" + str(latitude) + "&longitude=" + str(longitude) + "&start_date=1959-01-01&end_date=2023-02-10&models=era5_land&daily=temperature_2m_max,temperature_2m_min,temperature_2m_mean,shortwave_radiation_sum,precipitation_sum,rain_sum,snowfall_sum,et0_fao_evapotranspiration&timezone=America%2FAnchorage&windspeed_unit=ms"
r = requests.get(request_txt).json()
[1]: https://open-meteo.com/en/docs/historical-weather-api
The URL needs format=csv added as a parameter. StringIO is used to load the downloaded text into memory as a file-like object for read_csv.
from io import StringIO

import pandas as pd
import requests

latitude = 60.358
longitude = -148.939

url = ("https://archive-api.open-meteo.com/v1/archive?"
       f"latitude={latitude}&"
       f"longitude={longitude}&"
       "start_date=1959-01-01&"
       "end_date=2023-02-10&"
       "models=era5_land&"
       "daily=temperature_2m_max,temperature_2m_min,temperature_2m_mean,"
       "shortwave_radiation_sum,precipitation_sum,rain_sum,snowfall_sum,"
       "et0_fao_evapotranspiration&"
       "timezone=America%2FAnchorage&"
       "windspeed_unit=ms&"
       "format=csv")

with requests.Session() as session:
    response = session.get(url, timeout=30)
    if response.status_code != 200:
        response.raise_for_status()

df = pd.read_csv(StringIO(response.text), sep=",", skiprows=2)
print(df)
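As a small variation (not part of the original answer), the same query could be expressed as a dict passed through requests' params argument, which handles the URL encoding for you; a sketch under that assumption:
from io import StringIO

import pandas as pd
import requests

params = {
    'latitude': 60.358,
    'longitude': -148.939,
    'start_date': '1959-01-01',
    'end_date': '2023-02-10',
    'models': 'era5_land',
    'daily': ('temperature_2m_max,temperature_2m_min,temperature_2m_mean,'
              'shortwave_radiation_sum,precipitation_sum,rain_sum,snowfall_sum,'
              'et0_fao_evapotranspiration'),
    'timezone': 'America/Anchorage',
    'windspeed_unit': 'ms',
    'format': 'csv',
}

# requests URL-encodes the parameters (e.g. the slash in the timezone) automatically.
response = requests.get('https://archive-api.open-meteo.com/v1/archive',
                        params=params, timeout=30)
response.raise_for_status()

df = pd.read_csv(StringIO(response.text), skiprows=2)
print(df)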

TIKTOK API - Requesting more responses from hashtag API class

I am running the following code to retrieve account info by hashtag from the unofficial TikTok Api.
API Repository - https://github.com/davidteather/TikTok-Api
Class Definition - https://dteather.com/TikTok-Api/docs/TikTokApi/tiktok.html#TikTokApi.hashtag
However, it seems the maximum number of responses I can get for any hashtag is 500 or so.
Is there a way I can request more? Say...10K lines of account info?
from TikTokApi import TikTokApi
import pandas as pd

hashtag = "ugc"
count = 50000

with TikTokApi() as api:
    tag = api.hashtag(name=hashtag)
    print(tag.info())

    lst = []
    for video in tag.videos(count=count):
        lst.append(video.author.as_dict)

    df = pd.DataFrame(lst)
    print(df)
The above code for hashtag "ugc" produces only 482 results, whereas I know there are significantly more results available on TikTok.
It's a rate limit; TikTok will block you after a certain number of requests.
Add proxy support like this:
import random

proxies = open('proxies.txt', 'r').read().splitlines()
proxy = random.choice(proxies)
proxies = {'http': f'http://{proxy}', 'https': f'https://{proxy}'}
It could be more precise, but it's one way to rotate them.
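TikTokApi's own proxy handling may differ, but as a rough illustration of the idea with the requests library, picking a fresh proxy for each call might look like this (proxies.txt and the target URL are placeholders):
import random
import requests

def get_with_random_proxy(url, proxy_file='proxies.txt'):
    # Pick a different proxy for each call so no single IP absorbs all the requests.
    proxy_list = open(proxy_file).read().splitlines()
    proxy = random.choice(proxy_list)
    proxies = {'http': f'http://{proxy}', 'https': f'https://{proxy}'}
    return requests.get(url, proxies=proxies, timeout=30)

response = get_with_random_proxy('https://example.com/some-endpoint')
print(response.status_code)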

What data can I save from the spotify API?

I'm building a website and I'm using the Spotify API as a music library. I would like to add more filters and ordering options for searching tracks than the API allows me, so I was wondering what track/song data I can save to my DB from the API, like artist name or popularity.
I would like to save: Name, Artists, Album and some other stuff. Is that possible, or is it against the terms and conditions?
Thanks in advance!
Yes, it is possible.
Data in the Spotify API is exposed through endpoints.
Spotify API endpoint reference here.
Each endpoint deals with the specific kind of data being requested by the client (you).
I'll give you one example. The same logic applies to all other endpoints.
import json
import requests
"""
Import libraries in order to make API calls.
Alternatively, you can also use a wrapper like "Spotipy"
instead of calling requests directly.
"""

# hit the desired endpoint
SEARCH_ENDPOINT = 'https://api.spotify.com/v1/search'

# define your call
def search_by_track_and_artist(artist, track):
    path = 'token.json'  # you need to get a token for this call
    # the endpoint reference page will provide you with one
    # you can store it in a file
    with open(path) as t:
        token = json.load(t)
    # call the API with authentication
    myparams = {'type': 'track'}
    myparams['q'] = "artist:{} track:{}".format(artist, track)
    resp = requests.get(SEARCH_ENDPOINT, params=myparams,
                        headers={"Authorization": "Bearer {}".format(token)})
    return resp.json()
try it:
search_by_track_and_artist('Radiohead', 'Karma Police')
Store the data and process it as you wish. But you must comply with Spotify terms in order to make it public.
sidenote: Spotipy docs.
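Since Spotipy was mentioned, a rough equivalent with that wrapper might look like the sketch below; the client ID and secret are placeholders you would get from the Spotify developer dashboard.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Placeholder credentials from the Spotify developer dashboard.
auth_manager = SpotifyClientCredentials(client_id='YOUR_CLIENT_ID',
                                        client_secret='YOUR_CLIENT_SECRET')
sp = spotipy.Spotify(auth_manager=auth_manager)

results = sp.search(q='artist:Radiohead track:Karma Police', type='track')
for item in results['tracks']['items']:
    print(item['name'], '-', item['artists'][0]['name'], '-', item['album']['name'])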

Pyspark: how to stream data from a given API URL

I was given an API URL and a method getUserPost() which returns the data needed for my data processing function. I am able to get the data by using Client from suds.client as follows:
from suds.client import Client
from suds.xsd.doctor import ImportDoctor, Import
url = 'url'
imp = Import('http://schemas.xmlsoap.org/soap/encoding/')
imp.filter.add('filter')
d = ImportDoctor(imp)
client = Client(url, doctor=d)
tempResult = client.service.getUserPosts(user_ids = '',date_from='2016-07-01 03:19:57', date_to='2016-08-01 03:19:57', limit=100, offset=0)
Now, each tempResult will contain 100 records. I want to stream the data from the given API URL into an RDD for parallelized processing. However, after reading the pySpark Streaming documentation I can't find a streaming method for a custom data source. Could anyone give me an idea of how to do so?
Thank you.
After a while of digging, I found out how to solve the problem. I employed Kafka streaming: basically you need to create a producer from the given API, specifying a topic and port for communication, and then a consumer that listens on that topic and port to start streaming the data.
Note that the producer and the consumer must run as different threads in order to achieve real-time streaming.
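A minimal sketch of that setup, assuming the kafka-python package on the producer side and an older Spark release where pyspark.streaming.kafka is still available; the topic name, broker address, and the suds client from the question are assumptions/placeholders:
import json
from kafka import KafkaProducer

# Producer side (one thread/process): page through the API and push each batch to Kafka.
producer = KafkaProducer(bootstrap_servers='localhost:9092',
                         value_serializer=lambda d: json.dumps(d).encode('utf-8'))
offset = 0
while True:
    batch = client.service.getUserPosts(user_ids='', date_from='2016-07-01 03:19:57',
                                        date_to='2016-08-01 03:19:57',
                                        limit=100, offset=offset)
    if not batch:
        break
    # str() is a crude serialization of the suds result; adapt to your schema.
    producer.send('user_posts', {'offset': offset, 'posts': str(batch)})
    offset += 100
producer.flush()

# Consumer side (another thread/process): read the topic as a DStream in Spark Streaming.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName='UserPostsStream')
ssc = StreamingContext(sc, 10)  # 10-second micro-batches
stream = KafkaUtils.createDirectStream(ssc, ['user_posts'],
                                       {'metadata.broker.list': 'localhost:9092'})
stream.map(lambda kv: json.loads(kv[1])).pprint()
ssc.start()
ssc.awaitTermination()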

Credentials Error when integrating Google Drive with BigQuery

I am using Google BigQuery and I want to integrate it with Google Drive. In BigQuery I am giving the Google spreadsheet URL to upload my data, and it updates well, but when I write the query in the Google add-on (OWOX BI BigQuery Reports):
Select * from [datasetName.TableName]
I am getting an error:
Query failed: tableUnavailable: No suitable credentials found to access Google Drive. Contact the table owner for assistance.
I just faced the same issue in some code I was writing - it might not directly help you here since it looks like you are not responsible for the code, but it might help someone else, or you can ask the person who does write the code you're using to read this :-)
So I had to do a couple of things:
Enable the Drive API for my Google Cloud Platform project in addition to BigQuery.
Make sure that your BigQuery client is created with both the BigQuery scope AND the Drive scope.
Make sure that the Google Sheets you want BigQuery to access are shared with the "...#appspot.gserviceaccount.com" account that your Google Cloud Platform identifies itself as.
After that I was able to successfully query the Google Sheets backed tables from BigQuery in my own project.
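A sketch of that scope setup on the client side, assuming the google-cloud-bigquery and google-auth packages; the key-file name and table name are placeholders:
from google.cloud import bigquery
from google.oauth2 import service_account

# Both scopes are needed: BigQuery for the query itself, Drive for the Sheets-backed table.
scopes = [
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/drive',
]

# 'service_account.json' is a placeholder for your own service-account key file.
credentials = service_account.Credentials.from_service_account_file(
    'service_account.json', scopes=scopes)

client = bigquery.Client(credentials=credentials, project=credentials.project_id)
for row in client.query('SELECT * FROM `my_dataset.my_sheet_backed_table`').result():
    print(dict(row))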
What was previously said is right:
Make sure that your dataset in BigQuery is also shared with the Service Account you will use to authenticate.
Make sure your Federated Google Sheet is also shared with the service account.
The Drive API should be enabled as well.
When using the OAuth client you need to inject both scopes, for Drive and for BigQuery.
If you are writing Python:
credentials = GoogleCredentials.get_application_default() does not let you inject scopes (at least I didn't find a way).
Build your request from scratch:
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

scopes = (
    'https://www.googleapis.com/auth/drive.readonly',
    'https://www.googleapis.com/auth/cloud-platform')

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '/client_secret.json', scopes)

http = credentials.authorize(Http())
bigquery_service = build('bigquery', 'v2', http=http)
query_request = bigquery_service.jobs()

query_data = {
    'query': (
        'SELECT * FROM [test.federated_sheet]')
}

query_response = query_request.query(
    projectId='hello_world_project',
    body=query_data).execute()

print('Query Results:')
for row in query_response['rows']:
    print('\t'.join(field['v'] for field in row['f']))
This likely has the same root cause as:
BigQuery Credential Problems when Accessing Google Sheets Federated Table
Accessing federated tables in Drive requires additional OAuth scopes and your tool may only be requesting the bigquery scope. Try contacting your vendor to update their application?
If you're using pd.read_gbq() as I was, then this would be the best place to get your answer: https://github.com/pydata/pandas-gbq/issues/161#issuecomment-433993166
import pandas_gbq
import pydata_google_auth
import pydata_google_auth.cache

# Instead of get_user_credentials(), you could do default(), but that may not
# be able to get the right scopes if running on GCE or using credentials from
# the gcloud command-line tool.
credentials = pydata_google_auth.get_user_credentials(
    scopes=[
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
    # Use reauth to get new credentials if you haven't used the drive scope
    # before. You only have to do this once.
    credentials_cache=pydata_google_auth.cache.REAUTH,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as with Google Colab.
    auth_local_webserver=True,
)

sql = """SELECT state_name
FROM `my_dataset.us_states_from_google_sheets`
WHERE post_abbr LIKE 'W%'
"""

df = pandas_gbq.read_gbq(
    sql,
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
    dialect='standard',
)

print(df)