I am trying to access a table in my SQL database, but I am getting an unusual error. Can someone please help? I am very new at this.
import sqlite3
import pandas as pd
com = sqlite3.connect('Reporting.db')
Note: the pandas DataFrame df is already defined above, which is why I am not including it here.
df.to_sql('tblReporting', com, index=False, if_exists='replace')
print('tblReporting loaded \n')
%load_ext sql
%sql sqlite:///Reporting.db
%%sql
SELECT *
FROM tblReporting
This is the error I am getting
SELECT *
^
SyntaxError: invalid syntax
Note #2: I am using Anaconda Navigator for writing scripts
Solved it! This is my working syntax:
import sqlite3
import pandas as pd
com = sqlite3.connect('Reporting.db')
df.to_sql('tblReporting', com, index=False, if_exists='replace')
print('tblReporting loaded \n')
org_query = '''SELECT * FROM tblReporting'''
df = pd.read_sql_query(org_query, com)
df.head()
Note: adding ''' before and after my org_query helped me resolve this.
Related
I am trying to make sense of the following error, which I started getting when I set up my Python code to run on a VM server that has Python 3.9.5 installed instead of the 3.8.5 on my desktop. Not sure that matters, but it could be part of the reason.
The error
C:\ProgramData\Miniconda3\lib\site-packages\pandas\io\sql.py:758: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) or
database string URI or sqlite3 DBAPI2 connection
other DBAPI2 objects are not tested, please consider using SQLAlchemy
warnings.warn(
This is within a fairly simple .py file that imports pyodbc and sqlalchemy, fwiw. A fairly generic/simple version of the SQL calls that yields the warning is:
# pandas and pyodbc imports added for completeness (the file also imports sqlalchemy, per the text above)
import pandas as pd
import pyodbc

myserver_string = "xxxxxxxxx,nnnn"
db_string = "xxxxxx"
cnxn = "Driver={ODBC Driver 17 for SQL Server};Server=tcp:"+myserver_string+";Database="+db_string +";TrustServerCertificate=no;Connection Timeout=600;Authentication=ActiveDirectoryIntegrated;"

def readAnyTable(tablename, date):
    conn = pyodbc.connect(cnxn)
    query_result = pd.read_sql_query(
        '''
        SELECT *
        FROM [{0}].[dbo].[{1}]
        where Asof >= '{2}'
        '''.format(db_string, tablename, date), conn)
    conn.close()
    return query_result
All the examples I have seen using pyodbc in python look fairly similar. Is pyodbc becoming deprecated? Is there a better way to achieve similar results without warning?
Is pyodbc becoming deprecated?
No. For at least the last couple of years pandas' documentation has clearly stated that it wants either a SQLAlchemy Connectable (i.e., an Engine or Connection object) or a SQLite DBAPI connection. (The switch-over to SQLAlchemy was almost universal, but they continued supporting SQLite connections for backwards compatibility.) People have been passing other DBAPI connections (like pyodbc Connection objects) for read operations and pandas hasn't complained … until now.
Is there a better way to achieve similar results without warning?
Yes. You can take your existing ODBC connection string and use it to create a SQLAlchemy Engine object as described in the SQLAlchemy 1.4 documentation:
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dagger;DATABASE=test;UID=user;PWD=password"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
Then pass engine to the pandas methods you need to use.
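For example (a minimal sketch, with placeholder table and column names), the same kind of query from the question could then be run as:
import pandas as pd

# Hypothetical table/column names; engine is the SQLAlchemy Engine created above.
df = pd.read_sql_query("SELECT * FROM dbo.SomeTable WHERE Asof >= '2022-01-01'", engine)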
It works for me.
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import pyodbc
import sqlalchemy as sa
import urllib
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
server = 'IP ADDRESS or Server Name'
database = 'AdventureWorks2014'
username = 'xxx'
password = 'xxx'
params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                 "SERVER="+server+";"
                                 "DATABASE="+database+";"
                                 "UID="+username+";"
                                 "PWD="+password+";")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
qry = "SELECT t.[group] as [Region], t.name as [Territory], C.[AccountNumber] "
qry = qry + "FROM [Sales].[Customer] C INNER JOIN [Sales].SalesTerritory t on t.TerritoryID = c.TerritoryID "
qry = qry + "where StoreID is not null and PersonID is not null"
with engine.connect() as con:
    rs = con.execute(qry)
    for row in rs:
        print(row)
You can use the SQL Server name or the IP address, but this requires a basic DNS listing. Most corporate servers should already have this listing though. You can check the server name or IP address using the nslookup command in the command prompt followed by the server name or IP address.
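If you prefer to check the DNS resolution from Python instead of the command prompt, here is a small sketch (the server name and IP address below are placeholders):
import socket

# Forward lookup: server name -> IP address (hypothetical host name)
print(socket.gethostbyname("my-sql-server"))
# Reverse lookup: IP address -> host name (hypothetical address)
print(socket.gethostbyaddr("10.0.0.25"))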
I'm using SQL 2017 on Ubuntu server running on VMWare. I'm connecting with IP Address here as part of a wider "running MSSQL on Ubuntu" project.
If you are connecting with your Windows credentials, you can replace the params with the trusted_connection parameter.
params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                 "SERVER="+server+";"
                                 "DATABASE="+database+";"
                                 "trusted_connection=yes")
Since it's a warning, I suppressed the message using the warnings Python library. Hope this helps.
import warnings
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
    # your code goes here
My company doesn't use SQLAlchemy, preferring to use Postgres connections based on psycopg2 and incorporating other features. If you can run your script directly from a command line, then turning warnings off will solve the problem: start it with python3 -W ignore
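If you would rather not silence every warning, a more targeted sketch (assuming the warning text quoted earlier in this question) filters only that particular pandas UserWarning:
import warnings

# Suppress only the "pandas only support SQLAlchemy connectable" UserWarning;
# all other warnings remain visible.
warnings.filterwarnings(
    "ignore",
    message="pandas only support SQLAlchemy connectable",
    category=UserWarning,
)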
The correct way to import for SQLAlchemy 1.4.36 is using:
import pandas as pd
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
#...
conn_str = set_db_info() # see above
conn_url = URL.create("mssql+pyodbc", query={"odbc_connect": conn_str})
engine = create_engine(conn_url)
df = pd.read_sql(SQL, engine)
df.head()
This question already has answers here:
"TypeError: string indices must be integers" when getting data of a stock from Yahoo Finance using Pandas Datareader
One week ago, I ran the following code and did not get an error.
import datetime as dt
import pandas_datareader.yahoo.daily as yd
df1 = yd.YahooDailyReader("SPY", interval='d', start=dt.date(2022,7,1),end=dt.date.today()).read()
However, when I tried the same code today, I got the following error message:
Does anyone know how to solve this problem?
It seems Yahoo Finance has changed its API, or the service is down.
You can use the Tiingo API instead (you need to make an account to get an API token):
import pandas_datareader as web
r = web.get_data_tiingo("SPY", api_key=ENV('NEWS_TOKEN'))  # ENV('NEWS_TOKEN') is assumed to return the Tiingo API token from the author's environment/config
see: Remote Data Access#Tiingo
I have the same problem; yfinance is still working:
import yfinance as yf
from yahoofinancials import YahooFinancials
my_data = yf.download('TSLA', start='2021-12-17', end='2022-12-17', progress=False)
I'm trying to execute the following DAG in Airflow Composer on Google Cloud and I keep getting the same error:
The conn_id hard_coded_project_name isn't defined
Maybe someone can point me in the right direction?
from airflow.models import DAG
import os
from airflow.operators.dummy import DummyOperator
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
import datetime
import pandas as pd
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.providers.google.cloud.operators import bigquery
from airflow.contrib.hooks.bigquery_hook import BigQueryHook
default_args = {
    'start_date': datetime.datetime(2020, 1, 1),
}

PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "hard_coded_project_name")

def list_dates_in_df():
    hook = BigQueryHook(bigquery_conn_id=PROJECT_ID,
                        use_legacy_sql=False)
    bq_client = bigquery.Client(project=hook._get_field("project"),
                                credentials=hook._get_credentials())
    query = "select count(*) from LP_RAW.DIM_ACCOUNT;"
    df = bq_client.query(query).to_dataframe()

with DAG(
    'df_test',
    schedule_interval=None,
    catchup=False,
    default_args=default_args
) as dag:
    list_dates = PythonOperator(
        task_id='list_dates',
        python_callable=list_dates_in_df
    )
    list_dates
It means that PROJECT_ID, as seen in the line
PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "hard_coded_project_name")
was assigned the value hard_coded_project_name because the GCP_PROJECT_ID environment variable has no value.
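In other words, os.environ.get falls back to its second argument when the environment variable is not set:
import os

# When GCP_PROJECT_ID is not set, the fallback string is returned.
print(os.environ.get("GCP_PROJECT_ID", "hard_coded_project_name"))
# -> hard_coded_project_name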
Then at line
hook = BigQueryHook(bigquery_conn_id=PROJECT_ID...
the string hard_coded_project_name is treated as an Airflow connection id, and no connection with that id exists (or it has no value).
To avoid this error, you can take either of the following steps to fix it.
Create a connection id for both GCP_PROJECT_ID and hard_coded_project_name, just so we are sure that both have values. But if we don't want to create a connection for GCP_PROJECT_ID, make sure that hard_coded_project_name has a value so there is a fallback option. You can do this by:
Opening your Airflow instance.
Clicking "Admin" > "Connections".
Clicking "Create".
Filling in "Conn Id" and "Conn Type" as "hard_coded_project_name" and "Google Cloud Platform" respectively.
Filling in "Project Id" with your actual project id value.
Repeating these steps to create GCP_PROJECT_ID.
The connection should look like this (at minimum, providing the project id will work, but feel free to add the keyfile or its content and scope so you won't have problems with authentication moving forward).
You can use bigquery_default instead of hard_coded_project_name so by default it will point to the project that runs the Airflow instance.
Your updated PROJECT_ID assignment code will be
PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "bigquery_default")
Also, when testing your code, you might encounter an error at the line
bq_client = bigquery.Client(project = hook._get_field("project")...
since Client() does not exist in airflow.providers.google.cloud.operators.bigquery; you should use from google.cloud import bigquery instead.
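As a rough sketch of the corrected callable (keeping the hook and query from the question, and assuming a connection id such as bigquery_default exists):
from airflow.contrib.hooks.bigquery_hook import BigQueryHook
from google.cloud import bigquery  # the client library, not the Airflow operators module

def list_dates_in_df():
    # "bigquery_default" is assumed to be a valid Airflow connection id
    hook = BigQueryHook(bigquery_conn_id="bigquery_default", use_legacy_sql=False)
    bq_client = bigquery.Client(project=hook._get_field("project"),
                                credentials=hook._get_credentials())
    query = "select count(*) from LP_RAW.DIM_ACCOUNT;"
    df = bq_client.query(query).to_dataframe()
    return df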
Here is a snippet of the test where I only created hard_coded_project_name, so PROJECT_ID uses this connection. I got the count of a table of mine and it worked.
Here is a snippet of the test I made when I used bigquery_default, where I got the count of a table of mine and it worked.
I am new to PostgreSQL. I would like to insert information from a .json file and create a new table in PostgreSQL using Python/psycopg2. I have looked over some Stack Overflow posts and the psycopg2 documentation without getting much further.
The closest question is here, from which I derived the following:
The test .json file is as follows (it only has one level, i.e. no nested .json structure):
[{"last_update": "2019-02-01"}]
Attempted Python code:
import psycopg2
from psycopg2.extras import Json
from psycopg2 import Error
from unipath import Path
import io
def insert_into_table(json_data):
    try:
        with psycopg2.connect(user = "thisuser",
                              password = "somePassword",
                              host = "127.0.0.654165",
                              port = '5455',
                              database = "SqlTesting") as conn:
            cursor = conn.cursor()
            read_json = io.open(data_path, encoding='utf-8')
            read_json_all = read_json.readlines()
            query = "INSERT INTO new_table VALUES (%s)"
            cursor.executemany(query, (read_json_all,))
            conn.commit()
            print("Json data import successful")
    except (Exception, psycopg2.Error) as error:
        print("Failed json import: {}".format(error))

insert_into_table(data_path)
The above code didn't work regardless of whether new_table didn't exist or was created manually as a placeholder.
Rather, it produced the following error message:
Failed json import: relation "new_table" does not exist
LINE 1: INSERT INTO new_table VALUES ('[{"last_update": "2019-02-01"...
During debugging, I saw:
for i in read_json:
    print(i)
# will result in
# [{"last_update": "2019-02-01"}]
And
print(read_json_all)
# will result in
# ['[{"last_update": "2019-02-01"}]']
I think you might want to use SQLAlchemy to put your data into the Postgres DB. Below, I used a very simple JSON file and created a pandas DataFrame. I then used SQLAlchemy to place it into the DB. Check the code below; it should get you where you want to go.
import psycopg2
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
import json
from pandas.io.json import json_normalize
with open('example_1.json') as data_file:
    d = json.load(data_file)

def create_table():
    conn = psycopg2.connect("dbname='SqlTesting' user='thisuser' password='somePassword' host='localhost' port='5432' ")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS new_table (color TEXT, fruit TEXT, size TEXT)")
    conn.commit()
    conn.close()

create_table()

df = json_normalize(d)

# note: '@' (not '#') separates the credentials from the host in the connection URL
engine = create_engine("postgresql+psycopg2://thisuser:somePassword@localhost:5432/SqlTesting")
df.to_sql("new_table", engine, index=False, if_exists='append')
print("Done")
I am having trouble using the Google BigQuery package in pandas. I have installed google-api-python-client as well as pandas-gbq. But for some reason, when I go to query a table I get a DistributionNotFound: The 'google-api-python-client' distribution was not found and is required by the application error. Here is a snippet of my code:
import pandas as pd
from pandas.io import gbq
count_block = gbq.read_gbq('SELECT count(int64_field_0) as count_blocks FROM Data.bh', projectid)
Using a virtual environment in this scenario can allow you to rule out problems with your library installations.
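For example, one quick way to check which interpreter is active and whether the packages are visible to it (a minimal sketch):
import sys
import pkg_resources

print(sys.executable)  # shows which Python environment is actually running
# get_distribution raises DistributionNotFound if the package is missing in this environment
print(pkg_resources.get_distribution("google-api-python-client"))
print(pkg_resources.get_distribution("pandas-gbq"))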