Load Teradata table from Python pandas DataFrame

I am getting the error below while trying to write a pandas DataFrame into a Teradata table using teradatasql. Any ideas?
Error 3707 - Syntax error, expected something like '(' between the 'type' keyword and '='
import teradatasql
import pandas as pd
conTD = teradatasql.connect(host=Host, user=User, password=Passwd, logmech="LDAP", encryptdata="true")
df.to_sql(tableName, conTD, schema=schemaName, if_exists='fail', index=False)
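For what it's worth, pandas' to_sql documents support only for SQLAlchemy connectables and sqlite3 connections, so passing a raw teradatasql DBAPI connection makes pandas fall back to SQLite-flavored SQL, which is the likely source of the 3707 syntax error. A minimal sketch of the SQLAlchemy route, assuming the teradatasqlalchemy dialect is installed and that Host, User, Passwd, tableName and schemaName are the same placeholders as above:
import pandas as pd
from sqlalchemy import create_engine

# Assumption: the teradatasqlalchemy dialect is installed (pip install teradatasqlalchemy)
# and the LDAP/encryption options can be passed as URL query parameters.
engine = create_engine(f"teradatasql://{User}:{Passwd}@{Host}/?logmech=LDAP&encryptdata=true")
df.to_sql(tableName, engine, schema=schemaName, if_exists='fail', index=False)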

Related

How to migrate pandas read_sql from psycopg2 to sqlalchemy with a tuple as one of the query params

With pandas 1.4.0, read_sql emits a warning about using a psycopg2 connection directly and recommends SQLAlchemy instead. While attempting that migration, I cannot work out how to pass a tuple as one of the query parameters. For example, this currently works:
import pandas as pd
import psycopg2

pd.read_sql(
    "SELECT * from news where id in %s",
    psycopg2.connect("dbname=mydatabase"),
    params=[(1, 2, 3)],
)
Attempting to migrate this to SQLAlchemy like so:
import pandas as pd

pd.read_sql(
    "SELECT * from news where id in %s",
    "postgresql://localhost/mydatabase",
    params=[(1, 2, 3)],
)
results in
...snipped...
File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
self.dialect.do_execute(
File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
cursor.execute(statement, parameters)
TypeError: not all arguments converted during string formatting
So how do I pass a tuple as a params argument within pandas read_sql?
Wrap your query with a SQLAlchemy text object, use named parameters and pass the parameter values as a dictionary:
import pandas as pd
from sqlalchemy import text

pd.read_sql(
    text("SELECT * from news where id in :ids"),
    "postgresql://localhost/mydatabase",
    params={'ids': (1, 2, 3)},
)
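The same pattern works if you prefer to create the engine explicitly rather than hand read_sql a URL string; a short sketch, assuming the same database and table as above:
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/mydatabase")
with engine.connect() as conn:
    df = pd.read_sql(
        text("SELECT * from news where id in :ids"),
        conn,
        params={'ids': (1, 2, 3)},
    )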

snowflake.connector SQL compilation error invalid identifier from pandas dataframe

I'm trying to ingest a DataFrame I created from a JSON response into an existing table (the table is currently empty because I can't get this to work).
The df looks something like the table below:
index  clicks_affiliated
0      3214
1      2221
but I'm seeing the following error:
snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL
compilation error: error line 1 at position 94
invalid identifier '"clicks_affiliated"'
The column names in Snowflake match the columns in my DataFrame.
This is my code:
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas, pd_writer
from pandas import json_normalize
import requests

df_norm = json_normalize(json_response, 'reports')
# I've tried also adding the below line (and removing it) but I see the same error
df = df_norm.reset_index(drop=True)

def create_db_engine(db_name, schema_name):
    engine = URL(
        account="ab12345.us-west-2",
        user="my_user",
        password="my_pw",
        database="DB",
        schema="PUBLIC",
        warehouse="WH1",
        role="DEV"
    )
    return engine

def create_table(out_df, table_name, idx=False):
    url = create_db_engine(db_name="DB", schema_name="PUBLIC")
    engine = create_engine(url)
    connection = engine.connect()
    try:
        out_df.to_sql(
            table_name, connection, if_exists="append", index=idx, method=pd_writer
        )
    except ConnectionError:
        print("Unable to connect to database!")
    finally:
        connection.close()
        engine.dispose()
    return True

print(df.head())
create_table(df, "reporting")
So... it turns out I needed to change the columns in my DataFrame to uppercase.
I added this line after creating the DataFrame and it worked:
df.columns = map(lambda x: str(x).upper(), df.columns)
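The likely reason this matters (an assumption based on how Snowflake treats identifiers): pd_writer quotes column names, and quoted identifiers in Snowflake are case-sensitive, while the existing table's columns were created unquoted and therefore stored in uppercase, so a quoted lowercase "clicks_affiliated" does not match. An equivalent, slightly more compact way to do the same renaming:
# rename all columns to uppercase so the quoted identifiers match the table
df = df.rename(columns=str.upper)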

Unable to write dataframe of pyspark into mysql database [duplicate]

I am attempting to insert records into a MySQL table. The table has id and name as columns.
I am doing the following in a pyspark shell.
name = 'tester_1'
id = '103'
import pandas as pd
l = [id,name]
df = pd.DataFrame([l])
df.write.format('jdbc').options(
    url='jdbc:mysql://localhost/database_name',
    driver='com.mysql.jdbc.Driver',
    dbtable='DestinationTableName',
    user='your_user_name',
    password='your_password').mode('append').save()
I am getting the below attribute error
AttributeError: 'DataFrame' object has no attribute 'write'
What am I doing wrong? What is the correct way to insert records into a MySQL table from PySpark?
Use a Spark DataFrame instead of a pandas one, as .write is available only on Spark DataFrames.
So the final code could be:
data = [('103', 'tester_1')]
df = sc.parallelize(data).toDF(['id', 'name'])
df.write.format('jdbc').options(
    url='jdbc:mysql://localhost/database_name',
    driver='com.mysql.jdbc.Driver',
    dbtable='DestinationTableName',
    user='your_user_name',
    password='your_password').mode('append').save()
Just to add to @mrsrinivas's answer:
Make sure the MySQL connector JAR is available to your Spark session. This code helps:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .config("spark.jars", "/Users/coder/Downloads/mysql-connector-java-8.0.22.jar") \
    .master("local[*]") \
    .appName("pivot and unpivot") \
    .getOrCreate()
Otherwise it will throw an error.
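If the data is already in a pandas DataFrame, as in the question, another option is to convert it to a Spark DataFrame with spark.createDataFrame; a sketch, assuming the same columns and JDBC settings as above:
import pandas as pd

pdf = pd.DataFrame([{'id': '103', 'name': 'tester_1'}])
df = spark.createDataFrame(pdf)  # a Spark DataFrame, so .write is available
df.write.format('jdbc').options(
    url='jdbc:mysql://localhost/database_name',
    driver='com.mysql.jdbc.Driver',
    dbtable='DestinationTableName',
    user='your_user_name',
    password='your_password').mode('append').save()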

Merge multiple csv files in python

Need help with merging multiple CSV files.
import pandas as pd
import glob
import csv

r1 = glob.glob("path/*.csv")
wr1 = csv.writer(open("path/merge.csv", 'wb'), delimiter=',')
for files in r1:
    rd = csv.reader(open(files, 'r'), delimiter=',')
    for row in rd:
        print(row)
        wr1.writerow(row)
I am getting a type error:
TypeError: a bytes-like object is required, not 'str'
Not sure how to resolve this.
Using pandas you can do it like this:
import glob
import pandas as pd

csv_files = glob.glob('path/*.csv')
result = pd.concat([pd.read_csv(f) for f in csv_files], ignore_index=True)
result.to_csv('path/merge.csv', index=False)
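The TypeError in the question comes from opening the output file in binary mode ('wb') while csv.writer writes strings in Python 3. If you would rather stay with the csv module, a sketch of the fix, using the same paths as the question:
import csv
import glob

# open the output in text mode; newline='' avoids blank lines on Windows
with open("path/merge.csv", 'w', newline='') as out_file:
    wr1 = csv.writer(out_file, delimiter=',')
    for path in glob.glob("path/*.csv"):
        with open(path, 'r', newline='') as in_file:
            for row in csv.reader(in_file, delimiter=','):
                wr1.writerow(row)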

Performing math operations when plotting Pandas Dataframe columns

I'd like to plot products, ratios, etc. of columns in a pandas DataFrame without first creating a new column containing that product or ratio. E.g.,
[df['A']/df['A']].plot()
doesn't work. For the following code:
x = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(x,columns=['A','B','C'])
[df['A']/df['B']].plot()
I get the following error message: "AttributeError: 'list' object has no attribute 'plot' "
The expression in this line:
[df['A']/df['B']].plot()
wraps the result of the division in square brackets, producing a Python list rather than a pandas object, and a list has no .plot method.
If you want to plot the result of the operation without first adding it as a column to the DataFrame, you can try this:
import pandas as pd
import numpy as np
x = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(x,columns=['A','B','C'])
df['A'].div(df['B']).plot()
which returns a <matplotlib.axes._subplots.AxesSubplot> object
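The same pattern works for the other operations mentioned in the question; a small sketch, assuming the df defined above and a matplotlib backend:
import matplotlib.pyplot as plt

(df['A'] / df['B']).plot()  # plain operators also return a Series, which has .plot
(df['A'] * df['B']).plot()  # a product plots the same way
plt.show()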