Unable to Insert Dataframe into Database Table - sql

I am trying to insert my dataframe into a newly created table in Teradata. My connection works and creating the table using SQLAlchemy succeeds, but I am unable to insert the data. I keep getting the same error that the schema columns do not exist.
Here is my code:
import sqlalchemy
import pandas as pd
import numpy as np

username = '..'
password = '..'
server = '...'
database = '..'
driver = 'Aster ODBC Driver'
engine_stmt = ("mssql+pyodbc://%s:%s@%s/%s?driver=%s"
               % (username, password, server, database, driver))
engine = sqlalchemy.create_engine(engine_stmt)
conn = engine.raw_connection()
# create table function
def create_sql_tbl_schema(conn):
    #tbl_cols_sql = gen_tbl_cols_sql(df)
    sql = ("CREATE TABLE so_sandbox.mn_testCreation3 "
           "(A INTEGER NULL, B INTEGER NULL, C INTEGER NULL, D INTEGER NULL) "
           "DISTRIBUTE BY HASH (A) STORAGE ROW COMPRESS LOW;")
    cur = conn.cursor()  # was conn2, an undefined name
    cur.execute('rollback')
    cur.execute(sql)
    cur.close()
    conn.commit()

create_sql_tbl_schema(conn)  # this works and the table is created
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('abcd'))
df.to_sql('mn_testCreation3', con=engine,
          schema='so_sandbox', index=False, if_exists='append')  # this is giving me problems
Error message returned is:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [AsterData][nCluster] (34) ERROR: relation "INFORMATION_SCHEMA"."COLUMNS" does not exist. (34) (SQLPrepare)') [SQL: 'SELECT [INFORMATION_SCHEMA].[COLUMNS].[TABLE_SCHEMA], [INFORMATION_SCHEMA].[COLUMNS].[TABLE_NAME], [INFORMATION_SCHEMA].[COLUMNS].[COLUMN_NAME], [INFORMATION_SCHEMA].[COLUMNS].[IS_NULLABLE], [INFORMATION_SCHEMA].[COLUMNS].[DATA_TYPE], [INFORMATION_SCHEMA].[COLUMNS].[ORDINAL_POSITION], [INFORMATION_SCHEMA].[COLUMNS].[CHARACTER_MAXIMUM_LENGTH], [INFORMATION_SCHEMA].[COLUMNS].[NUMERIC_PRECISION], [INFORMATION_SCHEMA].[COLUMNS].[NUMERIC_SCALE], [INFORMATION_SCHEMA].[COLUMNS].[COLUMN_DEFAULT], [INFORMATION_SCHEMA].[COLUMNS].[COLLATION_NAME] \nFROM [INFORMATION_SCHEMA].[COLUMNS] \nWHERE [INFORMATION_SCHEMA].[COLUMNS].[TABLE_NAME] = ? AND [INFORMATION_SCHEMA].[COLUMNS].[TABLE_SCHEMA] = ?'] [parameters: ('mn_testCreation3', 'so_sandbox')] (Background on this error at: http://sqlalche.me/e/f405)
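The traceback shows the failure happens before any row is inserted: under the mssql+pyodbc dialect, to_sql first reflects the target table through SQL Server's INFORMATION_SCHEMA views, which Aster does not expose. A minimal workaround sketch, assuming the table and raw connection created above (and pyodbc's qmark parameter style), is to skip to_sql and insert through the DBAPI cursor so no reflection query is ever issued:

# Sketch: insert via the raw pyodbc cursor instead of df.to_sql,
# bypassing the dialect's INFORMATION_SCHEMA reflection entirely.
insert_sql = ("INSERT INTO so_sandbox.mn_testCreation3 (A, B, C, D) "
              "VALUES (?, ?, ?, ?)")
rows = [tuple(int(v) for v in row) for row in df.values]  # cast numpy ints
cur = conn.cursor()
cur.executemany(insert_sql, rows)
cur.close()
conn.commit()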

Related

pandas read_sql used for CREATE in Postgres db giving TypeError for SQLite

I am trying to use pandas to create functions and tables in postgres.
When I try this query:
import psycopg2

conn = psycopg2.connect(host='localhost', database='my_data',
                        port='5566', user='postgres', password='postgres')
create_example = '''
create temporary table example(id int primary key, str text, val integer);
insert into example values
(1, 'a', 1),
(2, 'a', 2),
(3, 'b', 2);
'''
pd.read_sql(create_example,conn)
I get the following error: TypeError: 'NoneType' object is not iterable
The connection and read_sql work for simple queries, but I get this error whenever I try to CREATE a function or table. Why am I getting the error, and why does it seem to come from SQLite when the connection is to a Postgres database? tia
UPDATE: Per the observation by @AdrianKlaver I imported sqlalchemy as follows.
from sqlalchemy import create_engine

def db_connect():
    db_connect_string = "postgresql+psycopg2://{user}:{passwd}@{server}:{port}/{db}" \
        .format(user="postgres", passwd="postgres",
                server="localhost", db="my_data", port="5566")
    return create_engine(db_connect_string)

alchemy_conn = db_connect()
So using this alchemy connection:
pd.read_sql(create_example,alchemy_conn.raw_connection())
>>TypeError: 'NoneType' object is not iterable
If I try to use the cursor function I get the error AttributeError: 'psycopg2.extensions.cursor' object has no attribute 'cursor'
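The root of both errors is that pd.read_sql is only for statements that return rows; a CREATE/INSERT script returns no result set, so pandas trips over the None (and with a plain DBAPI connection, pandas falls back to its generic SQLite-oriented code path, which is why SQLite shows up in the message; the AttributeError arises because read_sql expects a connection, not a cursor). A minimal sketch of the usual fix, assuming conn is the psycopg2 connection and create_example the script defined above: run the DDL through a cursor and keep read_sql for the SELECT.

# Run the DDL/INSERT script through a plain cursor; nothing is fetched.
with conn.cursor() as cur:
    cur.execute(create_example)
conn.commit()

# The temporary table only exists on this connection, so query it here.
df = pd.read_sql("select * from example", conn)
print(df)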

Unable to store bytes data to sqlserver

I am planning to store the hashed value of a password in a SQL Server database when a user signs up, and then compare the stored hash against the user-entered password when the same user logs in.
I am using the following piece of code to generate the hashed password and want to insert that value into a column of datatype varbinary(1000).
I have used the following code snippets to insert into the database, and both options have failed.
insert into users.dbo.allusers values (123456789,
    b'\xc8\xc2\x06\x9f\x8e\x96\xad\xb3\x14r\x97Rm"\'\xfdbt\x03\xc81F\xc59\xd03\xcfXs\x88\xff\x95bg\x7f\xd1\xf6\xfc\x98\xe5x~c\x9eb\x91\x89\x80{\x14i0\x99f&\xa5\\e?\xf2\xbd\x06\xf7\xd0',
    'a@a.com',
    'a',
    'b'
)
insert into users.dbo.allusers values (123456789,
    convert(varbinary(1000), b'\xc8\xc2\x06\x9f\x8e\x96\xad\xb3\x14r\x97Rm"\'\xfdbt\x03\xc81F\xc59\xd03\xcfXs\x88\xff\x95bg\x7f\xd1\xf6\xfc\x98\xe5x~c\x9eb\x91\x89\x80{\x14i0\x99f&\xa5\\e?\xf2\xbd\x06\xf7\xd0', 1),
    'a@a.com',
    'a',
    'b'
)
The error I am getting is
SQL Error [102] [S0001]: Incorrect syntax near '\xc8\xc2\x06\x9f\x8e\x96\xad\xb3\x14r\x97Rm"'.
I am using Cloud SQL (a GCP product) with SQL Server 2017 Standard and the DBeaver client to insert the data. Any help is really appreciated.
Based on the comments I am editing my question. I also used Python to insert the data into SQL Server, with the following Flask code:
def generate_password(password_value):
    salt = os.urandom(32)
    key = hashlib.pbkdf2_hmac('sha256', password_value.encode('utf-8'), salt, 100000)
    # Store them as:
    storage = salt + key
    return storage

@app.route('/add_new_user', methods=['POST'])
def add_new_user():
    data = request.get_json(silent=True, force=True)
    cpf = data.get('cpf')
    password = data.get('password')
    email = data.get('email')
    fname = data.get('fname')
    lname = data.get('lname')
    password = generate_password(password)
    mssqlhost = '127.0.0.1'
    mssqluser = 'sqlserver'
    mssqlpass = 'sqlserver'
    mssqldb = 'users'
    try:
        # - [x] Establish Connection to db
        mssqlconn = pymssql.connect(
            mssqlhost, mssqluser, mssqlpass, mssqldb)
        print("Connection Established to MS SQL server.")
        cursor = mssqlconn.cursor()
        stmt = "insert into users.dbo.allusers (cpf, password, email, fname, lname) values (%s,%s,%s,%s,%s)"
        data = f'({cpf}, {password}, {email}, {fname}, {lname})'
        print(data)
        cursor.execute(stmt)
        mssqlconn.commit()
        mssqlconn.close()
        return {"success": "true"}
    except Exception as e:
        print(e)
        return {"success": "false"}
I get a different error in the command prompt:
more placeholders in sql than params available
I assume this is because the data already contains quotes from the hash value (printed data):
(123456789, b'6\x17DnOP\xbb\xd0\xdbL\xb6"}\xda6M\x1dX\t\xdd\x12\xec\x059\xbb\xe1/\x1c|\xea\x038\xfd\r\xd1\xcbt\xd6Pe\xcd<W\n\x9f\x89\xd7J\xc1\xbb\xe1\xd0\xd2n\xa7j}\xf7\xf5:\xba0\xab\xbe', a@a.com, a, b)
A binary literal in T-SQL looks like 0x0A23...:
insert into dbo.allusers (cpf, password, email, fname, lname)
values
(
    123456789,
    0xC8C2069F8E96...,
    'a@a.com',
    'a',
    'b'
)
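On the Python side, the "more placeholders in sql than params available" error comes from never binding the values: the statement has five %s placeholders, but the tuple was only printed, never passed to execute(). A minimal sketch of the fix for the Flask handler above, letting pymssql bind the raw bytes from generate_password() itself:

stmt = ("insert into users.dbo.allusers (cpf, password, email, fname, lname) "
        "values (%s, %s, %s, %s, %s)")
# Pass the values as the second argument; pymssql quotes the strings and
# sends the bytes value as binary, so no manual formatting is needed.
cursor.execute(stmt, (cpf, password, email, fname, lname))
mssqlconn.commit()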

PySpark - Querying a Oracle Database Directly without loading it entirely beforehand

I am trying to query a database directly:
file_df.createOrReplaceTempView("file_contents")
QUERY = "SELECT * FROM TABLE1 INNER JOIN file_contents on TABLE1.ID = file_contents.ID"
df = sqlContext.read.format("jdbc").options(
    url=URL,
    driver=DRIVER,
    query=QUERY,
    user=USER,
    password=PASSWORD
).load()
TABLE1 is in the Oracle Database.
However, this code results in the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o343.load.
: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
How can I fix this? I want to avoid loading the entire large database table; instead I want to query it directly and load only the rows that result from the inner join with the temp view file_contents.
You cannot run that join without bringing both datasets onto the same platform: the JDBC query executes inside Oracle, which knows nothing about the Spark temp view file_contents, hence the ORA-00942.
Option 1 - Preferred
spark = SparkSession.builder.getOrCreate()
jdbcUrl = "jdbc:oracle:thin:@{0}:{1}/{2}".format("asdas", "1521", "asdasd")
connectionProperties = {
    "user": "asdasd",
    "password": "asdasda",
    "driver": "oracle.jdbc.driver.OracleDriver",
    "fetchsize": "100000"
}
pushdown_query = "(SELECT * FROM TABLE1) aliasname"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
file_df.createOrReplaceTempView("file_contents")
df.createOrReplaceTempView("table1")
spark.sql("SELECT * FROM table1 INNER JOIN file_contents ON table1.ID = file_contents.ID")
Option 2
Stage a temporary table on the Oracle side, say filecontent, write file_df into it, and then extract only the required rows (see the pushdown sketch after the write below).
file_df.write.format('jdbc').options(url=tgt_url, driver=tgtdriver, dbtable="filecontent",
                                     user=tgt_username, password=tgt_password)\
    .mode("overwrite").option("truncate", "true").save()
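Once file_df is staged in Oracle, the join can be pushed down entirely, reusing jdbcUrl and connectionProperties from Option 1. A hedged sketch, assuming the staging table is named filecontent:

# The whole join now runs inside Oracle; only matching rows come back.
pushdown_query = ("(select t.* from TABLE1 t "
                  "inner join filecontent f on t.ID = f.ID) aliasname")
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query,
                     properties=connectionProperties)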
Option 3 - If the file contents can be collected as a list and passed into an IN clause
file_df_id = file_df.select("ID").rdd.flatMap(lambda x: x).collect()
query_param = ",".join(str(i) for i in file_df_id)
pushdown_query = f'(select * from table1 where table1.ID in ({query_param})) query_temp'
print(pushdown_query)
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)

SQL error when using format() function with pyodbc in Django

I want to execute a command using pyodbc in my Django app. When I do simple update with one column it works great:
cursor.execute("UPDATE dbo.Table SET attr = 1 WHERE id = {}".format(id))
However when I try to use a string as a column value it throws error:
cursor.execute("UPDATE dbo.Table SET attr = 1, user = '{}' WHERE id = {}".format(id, str(request.user.username)))
Here's error message:
('42S22', "[42S22] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid column name 'Admin'. (207) (SQLExecDirectW)")
Surprisingly, this method works:
cursor.execute("UPDATE dbo.Table SET attr = 1, user = 'Admin' WHERE id = {}".format(id))
What seems to be the problem? Why is SQL mistaking the column value for a column name?
As the other answer notes, you have your format arguments backwards, but if you're going to use cursor.execute(), the far more important thing is to use positional parameters (%s). This passes the SQL and the values to the database backend separately, and protects you from SQL injection:
from django.db import connection
cursor = connection.cursor()
cursor.execute("""
UPDATE dbo.Table
SET attr = 1,
user = %s
WHERE id = %s
""", [
request.user.username,
id,
])
You've got your format arguments backwards: you're passing id to the user placeholder and the username to the id in the WHERE clause.
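For completeness, a sketch of the original call with the arguments in the correct order; it runs, but it still interpolates values into the SQL string, so the parameterized version above remains the safer choice:

# Arguments now match the placeholders: username first, id second.
cursor.execute("UPDATE dbo.Table SET attr = 1, user = '{}' WHERE id = {}"
               .format(request.user.username, id))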

saved data frame is not shown correctly in sql server

I have a data frame named distTest which has columns containing UTF-8 text. I want to save distTest as a table in my SQL database. My code is as follows:
library(RODBC)
load("distTest.RData")
Sys.setlocale("LC_CTYPE", "persian")
dbhandle <- odbcDriverConnect('driver={SQL Server};server=****;database=TestDB;
                               trusted_connection=true', DBMSencoding = "UTF-8")
Encoding(distTest$regsub) <- "UTF-8"
Encoding(distTest$subgroup) <- "UTF-8"
sqlSave(dbhandle, distTest,
        tablename = "DistBars", verbose = T, rownames = FALSE, append = TRUE)
I set DBMSencoding for my connection and marked the encodings Encoding(distTest$regsub) <- "UTF-8" and Encoding(distTest$subgroup) <- "UTF-8" for my columns. However, when I save the data to SQL, the columns are not shown in the correct format (the screenshot of the garbled output is omitted here).
When I set fast in the sqlSave function to FALSE, I got this error:
Error in sqlSave(dbhandle, Distbars, tablename = "DistBars", verbose = T, :
  22001 8152 [Microsoft][ODBC SQL Server Driver][SQL Server]String or binary data would be truncated.
  01000 3621 [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated.
  [RODBC] ERROR: Could not SQLExecDirect 'INSERT INTO "DistBars" ( "regsub", "week", "S", "A", "F", "labeled_cluster", "subgroup", "windows" ) VALUES ( 'ظâ€', 5, 4, 2, 3, 'cl1', 'ط­ظ…ظ„ ط²ط¨ط§ظ„ظ‡', 1 )'
I also tried NVARCHAR(MAX) for the UTF-8 columns in the table design; with fast=FALSE the truncation error went away, but the encoding problem remained.
By the way, a part of the data is exported as RData here.
I want to know why the data format is not shown correctly in sql server 2016?
UPDATE
I am now fairly certain that there is something wrong with the RODBC package.
I tried inserting into the table with
sqlQuery(channel = dbhandle, "insert into DistBars
         values(N'7من', NULL, NULL, NULL, NULL, NULL, NULL, NULL)")
as a test, and the format is still wrong. Unfortunately, adding CharSet=utf8; to the connection string does not work either.
I had the same issue in my code and managed to fix it by removing rows_at_time = 1 from my connection configuration.