How do I pass a pandas DataFrame to a PL/SQL procedure as a parameter using cx_Oracle

I need to pass a pandas DataFrame to a PL/SQL procedure as a SYS_REFCURSOR input. I tried the code below, but I get an error.
import cx_Oracle
import pandas as pd
import numpy as np
# create a Pandas DataFrame
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': ['a', 'b', 'c']})
# establish a connection to Oracle
dsn_tns = cx_Oracle.makedsn('hostname', '1521', service_name='servicename')
conn = cx_Oracle.connect(user=r'username', password='password', dsn=dsn_tns)  # if a value contains special characters such as '\', prefix it with 'r', e.g. user=r'User Name'
cur = conn.cursor()
cur.execute("""
    CREATE OR REPLACE PROCEDURE my_proc (
        p_data IN SYS_REFCURSOR
    ) AS
    BEGIN
        null;
    END;
""")
cur.callproc('my_proc', df)
conn.commit()
cur.close()
conn.close()
I get the error: NotSupportedError: Python value of type DataFrame not supported.
If this is not supported, how do I pass the DataFrame to the Oracle procedure?
I tried to save the pandas dataframe to some temp table using
df.to_sql('my_temp_table', con=conn, index=False, if_exists="replace")
so that I can just call the procedure and use this table inside it.
But it looks like I have to pass a SQLAlchemy connection to to_sql, not a cx_Oracle one.
Any help is greatly appreciated!
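In case it helps, here is a minimal sketch of the two-step approach hinted at above: stage the DataFrame rows into a table, then open a separate cursor over that table and bind it to the SYS_REFCURSOR parameter (cx_Oracle can bind an open cursor to a REF CURSOR argument of callproc). The staging table my_temp_table and its columns are placeholders and are assumed to already exist.

import cx_Oracle
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': ['a', 'b', 'c']})

dsn_tns = cx_Oracle.makedsn('hostname', '1521', service_name='servicename')
conn = cx_Oracle.connect(user='username', password='password', dsn=dsn_tns)
cur = conn.cursor()

# stage the DataFrame rows into a table the ref cursor can select from
rows = [tuple(x) for x in df.values]
cur.executemany("INSERT INTO my_temp_table (column1, column2) VALUES (:1, :2)", rows)

# open a second cursor over the staged rows and bind it as the SYS_REFCURSOR
ref_cursor = conn.cursor()
ref_cursor.execute("SELECT column1, column2 FROM my_temp_table")
cur.callproc('my_proc', [ref_cursor])

conn.commit()
cur.close()
conn.close()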

Related

Writing a scalable INSERT statement using cx_Oracle

I am attempting to write a script that will allow me to insert values from an uploaded dataframe into a table inside an Oracle DB, but my issue lies with:
too many columns to hard-code
columns aren't one-to-one
What I'm hoping for is a way to write out the columns, check to see if they sync with the columns of my dataframe and from there use an INSERT VALUES sql statement to input the values from the dataframe to the ODS table.
So far, these are the important parts of my script:
import pandas as pd
import cx_Oracle
import config
df = pd.read_excel("Employee_data.xlsx")
conn = None
try:
    conn = cx_Oracle.connect(config.username, config.password, config.dsn, encoding=config.encoding)
except cx_Oracle.Error as error:
    print(error)
finally:
    cursor = conn.cursor()
    sql = "SELECT * FROM ODSMGR.EMPLOYEE_TABLE"
    cursor.execute(sql)
    data = cursor.fetchall()
    col_names = []
    for i in range(0, len(cursor.description)):
        col_names.append(cursor.description[i][0])
    # instead of using df.columns I use:
    rows = [tuple(x) for x in df.values]
which prints my ODS column names, and allows me to conveniently store my rows from the df in an array but I'm at a loss for how to import these to the ODS. I found something like:
cursor.execute("insert into ODSMGR.EMPLOYEE_TABLE(col1,col2) values (:col1, :col2)", {":col1df":df, "col2df:df"})
but that'll mean I'll have to hard-code everything, which wouldn't be scalable. I'm hoping I can get some sort of insight to help. It's just difficult since the columns aren't 1-to-1 and there is some compression/collapsing of columns from the DF to the ODS, but any help is appreciated.
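As a rough sketch (not a full solution, since it assumes the shared column names match exactly and ignores the compression/collapsing of columns), the INSERT could be built dynamically from the common columns and fed to executemany, reusing df, col_names, cursor and conn from the snippet above:

# columns shared by the DataFrame and the target table
common = [c for c in df.columns if c in col_names]

# build the INSERT dynamically so nothing is hard-coded
placeholders = ", ".join(":{}".format(i + 1) for i in range(len(common)))
sql = "INSERT INTO ODSMGR.EMPLOYEE_TABLE ({}) VALUES ({})".format(", ".join(common), placeholders)

rows = [tuple(x) for x in df[common].values]
cursor.executemany(sql, rows)
conn.commit()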
NOTE: I've also attempted to use SQLalchemy but I am always given an error "ORA-12505: TNS:listener does not currently know of SID given in connect descriptor" which is really strange given that I am able to connect with cx_Oracle
EDIT 1:
I was able to get a list of columns that share the same name; so after running:
import numpy as np
a = np.intersect1d(df.columns, col_names)
print("common columns:", a)
I was able to get a list of columns that the two datasets share.
I also tried to use this as my engine:
engine = create_engine("oracle+cx_oracle://username:password#ODS-test.domain.com:1521/?ODS-Test")
dtyp = {c: types.VARCHAR(df[c].str.len().max())
        for c in df.columns[df.dtypes == 'object'].tolist()}
df.to_sql('ODS.EMPLOYEE_TABLE', con = engine, dtype=dtyp, if_exists='append')
which has given me nothing but errors.
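One hedged guess about both errors above: ORA-12505 usually means the trailing part of the SQLAlchemy URL is being treated as a SID, and passing a dotted 'ODS.EMPLOYEE_TABLE' name to to_sql creates a table literally named that instead of targeting the ODS schema. Assuming ODS-Test is a service name rather than a SID, the attempt might look like:

from sqlalchemy import create_engine, types

engine = create_engine("oracle+cx_oracle://username:password@ODS-test.domain.com:1521/?service_name=ODS-Test")

# cap VARCHAR lengths at the longest value per text column
dtyp = {c: types.VARCHAR(df[c].str.len().max())
        for c in df.columns[df.dtypes == 'object'].tolist()}

# pass the schema separately instead of embedding it in the table name
df.to_sql('employee_table', con=engine, schema='ODS', dtype=dtyp, if_exists='append', index=False)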

snowflake.connector SQL compilation error invalid identifier from pandas dataframe

I'm trying to ingest a df I created from a json response into an existing table (the table is currently empty because I can't seem to get this to work)
The df looks something like the below table:
index    clicks_affiliated
0        3214
1        2221
but I'm seeing the following error:
snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL
compilation error: error line 1 at position 94
invalid identifier '"clicks_affiliated"'
and the column names in Snowflake match the columns in my dataframe.
This is my code:
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas, pd_writer
from pandas import json_normalize
import requests
# json_response comes from an earlier requests call (not shown)
df_norm = json_normalize(json_response, 'reports')
# I've tried also adding the below line (and removing it) but I see the same error
df = df_norm.reset_index(drop=True)

def create_db_engine(db_name, schema_name):
    engine = URL(
        account="ab12345.us-west-2",
        user="my_user",
        password="my_pw",
        database="DB",
        schema="PUBLIC",
        warehouse="WH1",
        role="DEV"
    )
    return engine

def create_table(out_df, table_name, idx=False):
    url = create_db_engine(db_name="DB", schema_name="PUBLIC")
    engine = create_engine(url)
    connection = engine.connect()
    try:
        out_df.to_sql(
            table_name, connection, if_exists="append", index=idx, method=pd_writer
        )
    except ConnectionError:
        print("Unable to connect to database!")
    finally:
        connection.close()
        engine.dispose()
    return True

print(df.head())
create_table(df, "reporting")
So... it turns out I needed to change the columns in my dataframe to uppercase: Snowflake stores unquoted identifiers in uppercase, while pd_writer quotes the DataFrame's column names, so lowercase names no longer match the table's columns.
I added this after the dataframe creation and it worked:
df.columns = map(lambda x: str(x).upper(), df.columns)
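A minimal sketch of where that normalization fits in the flow above (json_response and the reporting table come from the question; the uppercase step assumes the Snowflake table was created with unquoted, i.e. uppercase, identifiers):

df = json_normalize(json_response, 'reports').reset_index(drop=True)

# unquoted Snowflake identifiers are stored uppercase, while pd_writer quotes
# the DataFrame column names, so the case has to match exactly
df.columns = map(lambda x: str(x).upper(), df.columns)

create_table(df, "reporting")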

psycopg2.errors.InvalidTextRepresentation while using COPY in postgresql

I am using a custom callable with pandas.to_sql(). The snippet below is from the pandas documentation for using it:
import csv
from io import StringIO
def psql_insert_copy(table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)
but while using this copy functionality, I am getting the error
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for integer: "3.0"
This is not a problem with the input: the same table schema and values worked initially, when I used the to_sql() function without the custom callable psql_insert_copy(). I am using a SQLAlchemy engine to get the connection cursor.
I would recommend using string fields in the table for such loads, or writing the entire SQL script manually, specifying the types of the table fields.
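A hedged sketch of how the callable is usually wired into to_sql, with the offending column cast to an integer dtype first so COPY receives "3" rather than "3.0" (the engine URL, table and column names here are placeholders, not from the question):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@host:5432/dbname")
df = pd.DataFrame({"int_col": [1.0, 2.0, 3.0], "name": ["a", "b", "c"]})

# COPY will not coerce "3.0" into an integer column, so cast explicitly;
# the nullable Int64 dtype also tolerates missing values
df["int_col"] = df["int_col"].astype("Int64")

# psql_insert_copy is the callable defined above
df.to_sql("my_table", engine, if_exists="append", index=False,
          method=psql_insert_copy)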

The type of <field> is not a SQLAlchemy type with Pandas to_sql to an Oracle database

I have a pandas dataframe that has several categorical fields.
SQLAlchemy throws an exception: "The type of <field> is not a SQLAlchemy type".
I've tried converting the object fields back to string, but get the same error.
dfx = pd.DataFrame()
for col_name in df.columns:
    if df[col_name].dtype == 'object':
        dfx[col_name] = df[col_name].astype('str').copy()
    else:
        dfx[col_name] = df[col_name].copy()
    print(col_name, dfx[col_name].dtype)

dfx.to_sql('results', con=engine, dtype=my_dtypes, if_exists='append', method='multi', index=False)
The new dfx seems to have the same categoricals despite being rebuilt column by column with .copy().
Also, as a side note, why does to_sql() generate a CREATE TABLE with CLOBs?
No need to use the copy() function here, and you should not have to convert from 'object' to 'str' either.
Are you writing to an Oracle database? The default output type for text data (including 'object') is CLOB. You can get around it by specifying the dtype to use. For example:
import pandas as pd
from sqlalchemy import types, create_engine
from sqlalchemy.exc import InvalidRequestError
conn = create_engine(...)
testdf = pd.DataFrame({'pet': ['dog', 'cat', 'mouse', 'dog', 'fish', 'pony', 'cat'],
                       'count': [2, 6, 12, 1, 45, 1, 3],
                       'x': [105.3, 98.7, 112.4, 3.6, 48.9, 208.9, -1.7]})
test_types = dict(zip(
    testdf.columns.tolist(),
    (types.VARCHAR(length=20), types.Integer(), types.Float())))
try:
    testdf.to_sql(name="test", schema="myschema",
                  con=conn,
                  if_exists='replace',  # or 'append'
                  index=False,
                  dtype=test_types)
    print("Wrote final input dataset to table myschema.test")
except (ValueError, InvalidRequestError):
    print("Could not write to table 'test'.")
If you are not writing to Oracle, please specify your target database - perhaps someone else with experience in that DBMS can advise you.
What @eknumbat said is absolutely correct. For AWS Redshift, you can do the following. Note that you can find all of the SQLAlchemy datatypes here: https://docs.sqlalchemy.org/en/14/core/types.html
import pandas as pd
from sqlalchemy.types import VARCHAR, INTEGER, FLOAT
from sqlalchemy import create_engine
conn = create_engine(...)
testdf = pd.DataFrame({'pet': ['dog', 'cat', 'mouse', 'dog', 'fish', 'pony', 'cat'],
                       'count': [2, 6, 12, 1, 45, 1, 3],
                       'x': [105.3, 98.7, 112.4, 3.6, 48.9, 208.9, -1.7]})
test_types = {'pet': VARCHAR, 'count': INTEGER, 'x': FLOAT}
testdf.to_sql(name="test",
              schema="myschema",
              con=conn,
              if_exists='replace',
              index=False,
              dtype=test_types)

How to convert a Python generator to a pandas dataframe

I'm very new to Python and pandas dataframes, and I'm struggling to wrap my head around how to convert a Python generator to a pandas dataframe.
What I want to do is fetch a large table in chunks with this function, which yields a generator:
def fetch_data_into_chunks(cursor, arraysize=10**5):
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result
Then I want to append or concat the results to a pandas dataframe:
for data in fetch_data_into_chunks(cursor):
    df.append(data)
But this doesn't work and gives me the error message:
TypeError: cannot concatenate object of type "<class 'pyodbc.Row'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Thanks for the help!
Assuming you have a connection to a SQL database, you can use pandas' built-in read_sql method and specify a chunksize. The result is itself a generator, which you can iterate through to create a single dataframe.
In this example, sql is your sql query and conn is the connection to your database.
def fetch_data(sql, chunksize=10**5):
    df = pd.DataFrame()
    reader = pd.read_sql(sql,
                         conn,
                         chunksize=chunksize)
    for chunk in reader:
        df = pd.concat([df, chunk], ignore_index=True)
    return df
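As a side note on the design, concatenating inside the loop copies the accumulated frame on every iteration; a variant that collects the chunks in a list and concatenates once at the end is usually faster for large tables (same sql and conn as above):

def fetch_data(sql, chunksize=10**5):
    # read_sql with a chunksize yields DataFrames; gather them and
    # concatenate once instead of growing the frame inside the loop
    chunks = list(pd.read_sql(sql, conn, chunksize=chunksize))
    return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()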