pandas gbq - DistributionNotFound error

I am having trouble using the Google BigQuery package in pandas. I have installed google-api-python-client as well as pandas-gbq. But for some reason, when I go to query a table, I get DistributionNotFound: The 'google-api-python-client' distribution was not found and is required by the application. Here is a snippet of my code:
import pandas as pd
from pandas.io import gbq
projectid = '<your-project-id>'  # placeholder: set this to your GCP project ID
count_block = gbq.read_gbq('SELECT count(int64_field_0) as count_blocks FROM Data.bh', projectid)

Using a virtual environment in this scenario lets you rule out problems with your library installations.
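For example, a minimal sketch (assuming virtualenv is available; package names taken from the question):
virtualenv gbq-env
source gbq-env/bin/activate  (on Windows: gbq-env\Scripts\activate)
pip install pandas google-api-python-client pandas-gbq
python -c "from pandas.io import gbq"
If the import succeeds in the fresh environment, the DistributionNotFound error points at a broken system-wide installation rather than at your code.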


pandas_datareader.yahoo.daily not working suddenly [duplicate]

This question already has answers at: "TypeError: string indices must be integers" when getting data of a stock from Yahoo Finance using Pandas Datareader.
One week ago, I ran the following code and did not get an error.
import datetime as dt
import pandas_datareader.yahoo.daily as yd
df1 = yd.YahooDailyReader("SPY", interval='d', start=dt.date(2022,7,1),end=dt.date.today()).read()
However, when I tried the same code today, it failed with the "TypeError: string indices must be integers" error referenced in the duplicate above.
Does anyone know how to solve this problem?
It seems Yahoo Finance has changed its API, or the service is down.
You can use the Tiingo API instead (you need to make an account to get an API token):
import os
import pandas_datareader as web
# the token is read from an environment variable; the variable name is your choice
r = web.get_data_tiingo("SPY", api_key=os.environ.get("NEWS_TOKEN"))
see: Remote Data Access#Tiingo
I have the same problem; yfinance is still working:
import yfinance as yf
# download daily OHLCV data for TSLA over one year
my_data = yf.download('TSLA', start='2021-12-17', end='2022-12-17', progress=False)

Noob at SQL scripting. Please help

I am trying to access my table in a SQL database. However, I am getting an unusual error. Can someone please help me? I am very new at this.
import sqlite3
import pandas as pd
com = sqlite3.connect('Reporting.db')
Note: the pandas DataFrame df is already defined above, which is why I am not including it here.
df.to_sql('tblReporting', com, index=False, if_exists='replace')
print('tblReporting loaded \n')
%load_ext sql
%sql sqlite:///Reporting.db
%%sql
SELECT *
FROM tblReporting
This is the error I am getting:
SELECT *
^
SyntaxError: invalid syntax
Note #2: I am using Anaconda Navigator for writing scripts
Solved it! Here is my syntax:
import sqlite3
import pandas as pd
com = sqlite3.connect('Reporting.db')
df.to_sql('tblReporting', com, index=False, if_exists='replace')
print('tblReporting loaded \n')
org_query = '''SELECT * FROM tblReporting'''
df = pd.read_sql_query(org_query, com)
df.head()
Note: adding ''' before and after my org_query helped me resolve this.
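A hedged side note on the original SyntaxError: the %%sql cell magic comes from the ipython-sql extension and only works inside IPython/Jupyter, and it must be the very first line of its own cell, for example:
%load_ext sql
%sql sqlite:///Reporting.db
and then, in a separate cell:
%%sql
SELECT *
FROM tblReporting
Running those magic lines in a plain Python script will fail with exactly this kind of SyntaxError.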

Making a Google BigQuery from Python on Windows

I am trying to do something that is very simple in other data services: make a relatively simple SQL query and return it as a DataFrame in Python. I am on Windows 10 and using Python 2.7 (specifically Canopy 1.7.4).
Typically this would be done with pandas.read_sql_query, but due to some specifics of BigQuery it requires a different method, pandas.io.gbq.read_gbq.
This method works fine unless the query result is large. If you run a big query on BigQuery, you get the error:
GenericGBQException: Reason: responseTooLarge, Message: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
This was asked and answered before in this ticket, but neither of the solutions is relevant to my case:
Python BigQuery allowLargeResults with pandas.io.gbq
One solution is for Python 3, so it is a nonstarter. The other gives an error because I am unable to set my credentials as an environment variable in Windows:
ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
I was able to download the JSON credentials file, and I have set it as an environment variable in the few ways I know how, but I still get the above error. Do I need to load it in some way in Python? It seems to be looking for the file but unable to find it correctly. Is there a special way to set it as an environment variable in this case?
You can do it in Python 2.7 by changing the default dialect from legacy to standard in the pd.read_gbq function:
pd.read_gbq(query, 'my-super-project', dialect='standard')
Indeed, the BigQuery documentation says of the allowLargeResults parameter:
allowLargeResults: For standard SQL queries, this flag is ignored and large results are always allowed.
I have found two ways of directly importing the JSON credentials file, both based on the original answer in Python BigQuery allowLargeResults with pandas.io.gbq.
1) Credit to Tim Swast
First:
pip install google-api-python-client
pip install google-auth
pip install google-cloud-core
Then, in create_service(), replace
credentials = GoogleCredentials.get_application_default()
with:
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('path/file.json')
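As a hedged aside beyond the original answer: recent pandas-gbq releases accept such a credentials object directly, so on a current stack the create_service() patch is unnecessary (sketch assumes a recent pandas-gbq; the key path is a placeholder):
import pandas_gbq
from google.oauth2 import service_account
# placeholder path to your downloaded service-account JSON key
credentials = service_account.Credentials.from_service_account_file('path/file.json')
df = pandas_gbq.read_gbq('SELECT 1 AS x', project_id='<your project id>', credentials=credentials)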
2) Set the environment variable manually in the code, like:
import os, os.path
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.expanduser('path/file.json')
I prefer method 2, since it does not require new modules to be installed and is closer to the intended use of the JSON credentials. Note that os.environ only affects the current process, so the variable must be set before the first read_gbq call.
Note: you must create a destinationTable and add that information to run_query().
Here is code that runs fully within Python 2.7 on Windows:
import pandas as pd
my_qry="<insert your big query here>"
### Here Put the data from your credentials file of the service account - all fields are available from there###
my_file="""{
"type": "service_account",
"project_id": "cb4recs",
"private_key_id": "<id>",
"private_key": "<your private key>\n",
"client_email": "<email>",
"client_id": "<id>",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "<x509 url>"
}"""
df = pd.read_gbq(my_qry, project_id='<your project id>', private_key=my_file)
That's it :)

Why are the plotting functions of the "ControlSystems" package in Julia giving me "UndefVarError: subplot not defined"?

I use Julia 0.4.3 and I have updated all packages using Pkg.update().
From the "ControlSystems" documentation, it is explicitly stated that plotting requires extra care in that the user is free to choose plotting back-end (I guess back-end means which plotting-package is used by "ControlSystems"). I have installed and like to use pyplot - hence I try the following code:
using ControlSystems
Plots.pyplot()
s = tf("s");
G = 1/(s+1);
stepplot(G);
This gives the error message:
ERROR: UndefVarError: subplot not defined
in stepplot at C:\folder\.julia\v0.4\ControlSystems\src\plotting.jl:81
in stepplot at C:\folder\.julia\v0.4\ControlSystems\src\plotting.jl:103
I have also tried the same code without the Plots.pyplot() command.

Pandas in python 2.7 for ArcGIS

I have found that pandas v0.13.0 for Python 2.7 win32 works for most code I have written in which I want to use both arcpy and pandas. I put that pandas version into the C:\Python27\ArcGIS10.2\Lib\site-packages directory. I tried other versions, but got miscellaneous errors when trying to run them.
Today, however, I wrote new code that does not work. It gives the error:
Access violation at address 1E0ACF39 in module 'python27.dll'. Read of
address 9807D3AF.
with the following code:
cond = dfDSS['OBSERVATION NAME']=='A413011CC1'
dfDSS['GROUP'][cond]='HA273UTheads'
All the code before this, which creates dfDSS using pd.read_csv and inserts the column 'GROUP' with the value 'other' everywhere, works fine. The crash happens only when I try to reset the values using the conditional statement.
The code to this point was written in an IPython Notebook using Anaconda, but I now want to do some arcpy work with it. Any suggestions for getting the different versions of Python to work together are appreciated.
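As a hedged aside rather than a full answer: the failing line uses chained indexing (dfDSS['GROUP'][cond] = ...), which pandas recommends replacing with a single .loc assignment. It may be worth trying that form before digging into the version mismatch:
# single-step assignment with .loc instead of chained indexing,
# using the same condition, column, and value as the question
dfDSS.loc[cond, 'GROUP'] = 'HA273UTheads'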