Execute SQL from file in SQLAlchemy

How can I execute whole sql file into database using SQLAlchemy? There can be many different sql queries in the file including begin and commit/rollback.

sqlalchemy.text or sqlalchemy.sql.text
The text construct provides a straightforward method to directly execute .sql files.
from sqlalchemy import create_engine
from sqlalchemy import text
# or from sqlalchemy.sql import text
engine = create_engine('mysql://{USR}:{PWD}@localhost:3306/db', echo=True)
with engine.connect() as con:
    with open("src/models/query.sql") as file:
        query = text(file.read())
        con.execute(query)
SQLAlchemy: Using Textual SQL
text()

I was able to run .sql schema files using pure SQLAlchemy and some string manipulations. It surely isn't an elegant approach, but it works.
# Open the .sql file
sql_file = open('file.sql', 'r')
# Create an empty command string
sql_command = ''
# Iterate over all lines in the sql file
for line in sql_file:
    # Ignore commented lines and empty lines
    if not line.startswith('--') and line.strip('\n'):
        # Append line to the command string, keeping a space between lines
        sql_command += ' ' + line.strip()
        # If the command string ends with ';', it is a full statement
        if sql_command.endswith(';'):
            # Try to execute statement and commit it
            try:
                session.execute(text(sql_command))
                session.commit()
            # Report the error and roll back so the session stays usable
            except Exception as exc:
                print('Ops', exc)
                session.rollback()
            # Finally, clear command string
            finally:
                sql_command = ''
It iterates over all lines in a .sql file ignoring commented lines.
Then it concatenates lines that form a full statement and tries to execute the statement. You just need a file handler and a session object.

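You can do it with SQLAlchemy and psycopg2.
import sqlalchemy
engine = sqlalchemy.create_engine(db_url)
with open(path) as file:
    escaped_sql = sqlalchemy.text(file.read())
# engine.begin() commits on success; Engine.execute() was removed in SQLAlchemy 2.0
with engine.begin() as connection:
    connection.execute(escaped_sql)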

Unfortunately I'm not aware of a good general answer for this. Some dbapi's (psycopg2 for instance) support executing many statements at a time. If the files aren't huge you can just load them into a string and execute them on a connection. For others, I would try to use a command-line client for that db and pipe the data into that using the subprocess module.
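The command-line route can be sketched roughly like this. It is only an illustration: the helper name pipe_sql_file is made up here, and the client command you pass in depends entirely on your database.

```python
import subprocess


def pipe_sql_file(client_argv, sql_path):
    """Feed the contents of sql_path to a command-line client on stdin.

    client_argv is the client command as a list of arguments. Which client
    and which flags to use is an assumption you have to fill in yourself.
    """
    with open(sql_path, "rb") as sql_file:
        completed = subprocess.run(
            client_argv,
            stdin=sql_file,
            capture_output=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    return completed.stdout.decode("utf-8", errors="replace")
```

For PostgreSQL the call might look like pipe_sql_file(["psql", "--dbname", "mydb", "--set", "ON_ERROR_STOP=1"], "schema.sql"), where the database name and file name are placeholders. The upside of this approach is that the client's own parser deals with statement splitting, so you don't have to.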
If those approaches aren't acceptable, then you'll have to go ahead and implement a small SQL parser that can split the file apart into separate statements. This is really tricky to get 100% correct, as you'll have to factor in database dialect specific literal escaping rules, the charset used, any database configuration options that affect literal parsing (e.g. PostgreSQL standard_conforming_strings).
If you only need to get this 99.9% correct, then some regexp magic should get you most of the way there.
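As a rough idea of what such a "mostly correct" splitter could look like, here is a small hand-written scanner instead of a regexp. It is a sketch under clear assumptions: it only understands single- and double-quoted literals with doubled-quote escaping and "--" line comments. It does not handle /* */ comments, backslash escapes, PostgreSQL dollar quoting, or function bodies, and the name split_sql_statements is made up for this example.

```python
def split_sql_statements(script):
    """Split a SQL script on semicolons that sit outside quotes and -- comments."""
    statements = []
    current = []
    in_quote = None  # the active quote character, or None when outside a literal
    i = 0
    while i < len(script):
        ch = script[i]
        if in_quote:
            current.append(ch)
            if ch == in_quote:
                # A doubled quote ('' or "") is an escaped quote, not the end.
                if i + 1 < len(script) and script[i + 1] == in_quote:
                    current.append(script[i + 1])
                    i += 1
                else:
                    in_quote = None
        elif ch in ("'", '"'):
            in_quote = ch
            current.append(ch)
        elif script.startswith("--", i):
            # Skip a line comment up to (but not including) the newline.
            newline = script.find("\n", i)
            i = len(script) if newline == -1 else newline
            continue
        elif ch == ";":
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Each returned string can then be executed on its own. A semicolon inside a string literal or after "--" does not end a statement, which is the case a plain split(";") gets wrong.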

If you are using sqlite3 it has a useful extension to dbapi called conn.executescript(str), I've hooked this up via something like this and it seemed to work: (Not all context is shown but it should be enough to get the drift)
def init_from_script(script):
    Base.metadata.drop_all(db_engine)
    Base.metadata.create_all(db_engine)
    # HACK ALERT: we can do this using sqlite3 low level api, then reopen session.
    f = open(script)
    script_str = f.read().strip()
    global db_session
    db_session.close()
    import sqlite3
    conn = sqlite3.connect(db_file_name)
    conn.executescript(script_str)
    conn.commit()
    db_session = Session()
Is this pure evil I wonder? I looked in vain for a 'pure' sqlalchemy equivalent, perhaps that could be added to the library, something like db_session.execute_script(file_name) ? I'm hoping that db_session will work just fine after all that (ie no need to restart engine) but not sure yet... further research needed (ie do we need to get a new engine or just a session after going behind sqlalchemy's back?)
FYI sqlite3 includes a related routine: sqlite3.complete_statement(sql) if you roll your own parser...
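For what it's worth, a home-made splitter built on sqlite3.complete_statement could look something like the sketch below. The generator name iter_complete_statements is made up, and since the check follows SQLite's own rules, I would only assume it fits SQLite-flavoured scripts.

```python
import sqlite3


def iter_complete_statements(lines):
    """Yield statements, buffering lines until SQLite considers them complete."""
    buffer = ""
    for line in lines:
        buffer += line
        # True once the buffer ends in a semicolon that is not inside a literal.
        if sqlite3.complete_statement(buffer):
            statement = buffer.strip()
            buffer = ""
            if statement:
                yield statement
    # Whatever is left over had no terminating semicolon.
    if buffer.strip():
        yield buffer.strip()
```

You would feed it the open file object (which yields lines with their newlines intact) and pass each yielded statement to execute().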

You can access the raw DBAPI connection through this
raw_connection = mySqlAlchemyEngine.raw_connection()
raw_cursor = raw_connection() #get a hold of the proxied DBAPI connection instance
but then it will depend on which dialect/driver you are using which can be referred to through this list.
For psycopg2, you can just do
raw_cursor.execute(open("my_script.sql").read())
but for pysqlite you would need to do
raw_cursor.executescript(open("my_script").read())
and in line with that you would need to check the documentation of whichever DBAPI driver you are using to see if multiple statements are allowed in one execute or if you would need to use a helper like executescript which is unique to pysqlite.
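To make the driver difference concrete, here is a small sketch that uses the standard library sqlite3 module directly, so it runs without SQLAlchemy or a server. The table and the script contents are invented for the illustration.

```python
import sqlite3

SCRIPT = """
CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO t (name) VALUES ('a');
INSERT INTO t (name) VALUES ('b');
"""

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# execute() accepts exactly one statement and rejects the whole script.
# Depending on the Python version this is sqlite3.Warning or
# sqlite3.ProgrammingError, so both are caught here.
try:
    cur.execute(SCRIPT)
    multi_ok = True
except (sqlite3.Warning, sqlite3.ProgrammingError):
    multi_ok = False

# executescript() is the sqlite3-specific helper for multi-statement strings.
cur.executescript(SCRIPT)
count = cur.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(multi_ok, count)
```

Other drivers behave differently, so the same check against your own DBAPI driver is the quickest way to find out which call it wants.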

Here's how to run the script splitting the statements, and running each statement directly with a "connectionless" execution with the SQLAlchemy Engine. This assumes that each statement ends with a ; and that there's no more than one statement per line.
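import re
from sqlalchemy import create_engine, text

engine = create_engine(url)
with open('script.sql') as file:
    statements = re.split(r';\s*$', file.read(), flags=re.MULTILINE)
for statement in statements:
    # skip the empty or whitespace-only piece left after the last ';'
    if statement.strip():
        # connectionless execution; Engine.execute() exists only before SQLAlchemy 2.0
        engine.execute(text(statement))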

In the current answers, I did not find a solution that works when the .sql file combines these features:
Comments with "--"
Multi-line statements with additional comments after "--"
Function definitions which contain multiple SQL queries ending with ";" but must be executed as a whole statement
I found a rather simple solution:
# check for /* */
with open(file, 'r') as f:
    assert '/*' not in f.read(), 'comments with /* */ not supported in SQL file python interface'
# we check out the SQL file line-by-line into a list of strings (without \n, ...)
with open(file, 'r') as f:
    queries = [line.strip() for line in f.readlines()]
# from each line, remove all text which is behind a '--'
def cut_comment(query: str) -> str:
    idx = query.find('--')
    if idx >= 0:
        query = query[:idx]
    return query
# join all in a single line code with blank spaces
queries = [cut_comment(q) for q in queries]
sql_command = ' '.join(queries)
# execute in connection (e.g. sqlalchemy)
conn.execute(sql_command)

The code below works for me in Alembic migrations:
from alembic import op
import sqlalchemy as sa
from ekrec.common import get_project_root
def upgrade():
    path = f'{get_project_root()}/migrations/versions/fdb8492f75b2_.sql'
    op.execute(open(path).read())

I had success with David's answer here, with two slight modifications:
Use get_bind() as I was working with a Session rather than an Engine
Call cursor() on the raw connection
raw_connection = myDbSession.get_bind().raw_connection()
raw_cursor = raw_connection.cursor()
raw_cursor.execute(open("my_script.sql").read())

Related

Issue automating CSV import to an RSQLite DB

I'm trying to automate writing CSV files to an RSQLite DB.
I am doing so by indexing csvFiles, which is a list of data.frame variables stored in the environment.
I can't seem to figure out why my dbWriteTable() code works perfectly fine when I enter it manually but not when I try to index the name and value fields.
### CREATE DB ###
mydb <- dbConnect(RSQLite::SQLite(),"")
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in 1:length(csvFiles)) {
  dbWriteTable(mydb,name = csvFiles[i], value = csvFiles[i], overwrite=T)
  i=i+1
}
# EXAMPLE CODE THAT SUCCESSFULLY MANUAL IMPORTS INTO mydb
dbWriteTable(mydb,"DEPARTMENT",DEPARTMENT)
When I run the for loop above, I'm given this error:
"Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'DEPARTMENT': No such file or directory
# note that 'DEPARTMENT' is the value of csvFiles[1]
Here's the dput output of csvFiles:
c("DEPARTMENT", "EMPLOYEE_PHONE", "PRODUCT", "EMPLOYEE", "SALES_ORDER_LINE",
"SALES_ORDER", "CUSTOMER", "INVOICES", "STOCK_TOTAL")
I've researched this error and it seems to be related to my working directory; however, I don't really understand what to change, as I'm not even trying to manipulate files from my computer, simply data.frames already in my environment.
Please help!
Simply use get() for the value argument as you are passing a string value when a dataframe object is expected. Notice your manual version does not have DEPARTMENT quoted for value.
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in seq_along(csvFiles)) {
  dbWriteTable(mydb,name = csvFiles[i], value = get(csvFiles[i]), overwrite=T)
}
Alternatively, consider building a list of named dataframes with mget and loop element-wise between list's names and df elements with Map:
dfs <- mget(csvFiles)
output <- Map(function(n, d) dbWriteTable(mydb, name = n, value = d, overwrite=T), names(dfs), dfs)

Execute multiple statements separated by semicolons in RODBC

I have a fairly complex SQL query that I am trying to run through RODBC that involves defining variables. A simplified version looks like this:
DECLARE @VARX CHAR = 'X';
SELECT * FROM TABLE WHERE TYPE = @VARX;
Running this code works just fine. This fails:
library(RODBC)
q <- "DECLARE @VARX CHAR = 'X';\nSELECT * FROM TABLE WHERE TYPE = @VARX;"
sqlQuery(ch, q)
# returns character(0)
I have found through experimentation that the first statement before the semicolon is executed, but the rest is not. There is no error--it just seems that everything after the semicolon is ignored. Is there a way to execute the full query?
I'm using SQL server by the way.
NOTE: I asked this question before and it was marked as a duplicate of this question, but they are asking completely different things. In this question I would like to execute a script that contains multiple statements, and in the other the author is only trying to execute a single statement.
You can try this:
library(RODBC)
library(stringr)
filename = "filename.sql" ### file where the sql code is stored
queries <- readLines(filename) ### read the sql file into R
queries1 = str_replace_all(queries,'--.*$'," ") ### remove any commented lines
queries2 = paste(queries1, collapse = '\n') ### collapse with new lines
queries3 = unlist(str_split(queries2,"(?<=;)")) ### separate individual queries
Set up the ODBC connection at this point and run the for loop below. You can also modify the queries to add or change variables within them before running the loop.
for (i in seq_along(queries3)) {
  print(i)
  sqlQuery(conn, queries3[i])
}
After the for loop is done, you can pull any volatile or regular tables generated in your session into R using sqlQuery(). I haven't tested this extensively and there might be cases where it fails, but it worked for what I was doing.

how to insert utf8 characters into oracle database using robotframework database library

I have a robot script which inserts some sql statements from a sql file; some of these statements contain utf8 characters. If I insert this file manually into database using navicat tool, everything's fine. But when I try to execute this file using database library of robot framework, utf8 characters go crazy!
This is my utf8 included sql statement:
INSERT INTO "MY_TABLE" VALUES (2, 'تست1');
This is how I use database library:
Connect To Database Using Custom Params    cx_Oracle    ${dbConnection}
Execute Sql Script    ${sqlFile}
Disconnect From Database
This is what I get in the database:
������������ 1
I have tried to execute the SQL file using cx_Oracle directly and it's still failing! It seems there is a problem in the original library. This is what I've used for importing SQL file:
import cx_Oracle
if __name__ == "__main__":
    dsn_tns = cx_Oracle.makedsn(ip, port, sid)
    db = cx_Oracle.connect(username, password, dsn_tns)
    sql_commands = open(sql_file_addr, 'r').read().split(";")
    cr = db.cursor()
    for command in sql_commands:
        if not command in ["", "\t", "\n", "\r", "\n\r", "\r\n", None]:
            print "Executing SQL command:", command
            cr.execute(command)
    db.commit()
I have found that I can define the character set in the connection string. I've done it for a MySQL database and the framework successfully inserted UTF-8 characters into the database; this is my connection string for MySQL:
database='db_name', user='db_username', password='db_password', host='db_ip', port=3306, charset='utf8'
But I don't know how to define character-set for Oracle connection string. I have tried this:
'db_username','db_password','db_ip:1521/db_sid','utf8'
And I've got this error:
TypeError: an integer is required
As @Yu Zhang suggested, I read the discussion in this link and found out that I should set the environment variable NLS_LANG in order to have a UTF-8 connection to the database. So I've added the line below to my test setup:
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"
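Put together with the file handling, the setup might look like the sketch below. The helper name read_sql_commands is made up, and the naive split on ";" is kept from the script in the question. The cx_Oracle connection itself is left out so that the snippet runs without an Oracle client.

```python
import os

# Set this before the first Oracle connection is opened, so the client
# libraries pick it up when they are initialised.
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"


def read_sql_commands(path):
    """Read a UTF-8 encoded SQL file and return its non-empty statements."""
    # Decode explicitly instead of relying on the platform default encoding.
    with open(path, encoding="utf-8") as sql_file:
        script = sql_file.read()
    return [part.strip() for part in script.split(";") if part.strip()]
```

Reading the file with an explicit encoding matters as much as the environment variable: if the text is decoded wrongly on the way in, the database only ever sees the mangled characters.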
Would any of the links below help?
http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch6unicode.htm#i1006779
http://www.theserverside.com/news/thread.tss?thread_id=39575
https://community.oracle.com/thread/502949
There can be several problems in here...
The first problem might be that you don't save the test files using UTF-8 encoding.
Robot framework expects plain text test files to be saved using UTF-8 encoding, yet most text editors will not save by default using UTF-8.
Verify that your editor saves that way - for example, by opening the file using NotePad++ and choosing Encoding -> UTF-8
Another problem might be the connection to the Oracle database. It doesn't seem like you can configure the connection custom properties to explicitly state UTF-8
This means you probably need to state that the database schema itself is UTF-8

Trouble running SQL queries via RODBC

I have a file called q_cleanup.sql that I am reading into R via readLines(). This file has lots of little queries we wrote to clean up some really ugly data. Once I read them into R and process the text, I run each query in the file.
All of the queries work when run directly through Oracle's SQL Developer and Tora.
Some of the queries fail when run via RODBC.
For example, the file contains the following two queries (cut and pasted out of the file):
update T_HH_TMP
set program_type = 'not able to contact'
where
program_type like '%n0t%'
or program_type like '%not able to%'
;
update T_HH_TMP
set program_type = 'hh substance use'
where program_type like '%hh substance abuse%'
;
The first query runs. The second query errors. Below is the relevant section out of my cleanup.R file. The command odbcStart() is a function I built to simplify opening and closing rodbc connections. It is not the problem.
odbcStart()
qry <- readLines("sql/q_cleanup.sql")
qry <- paste(qry[-grep("--", qry)] , collapse=" ")
qry <- unlist(strsplit(qry, ";"))
for(i in seq_along(qry)) {
  print("------------------------------------------------------------")
  print(qry[i])
  print(sqlQuery(con, qry[i]))
}
odbcClose(com)
I am stripping off anything / everything that I can think of that might cause a problem and my string is wrapped in double quotes and my query contains ONLY single quotes. Yet, the output looks like this:
[1] "------------------------------------------------------------"
[1] " update T_HH_TMP set program_type = 'not able to contact' where program_type like '%n0t%' or program_type like '%not able to%' "
character(0)
[1] "------------------------------------------------------------"
[1] " update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' "
[1] "[RODBC] ERROR: Could not SQLExecDirect ' update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' '"
I do not feel that the % is the problem because the first query runs just fine.
Any help? I really would prefer to script the running of all these queries in R.
I thought I would share what I know. I have a solution, even though I consider it sub-optimal because it complicates my workflow unnecessarily.
I do not know if the problem is caused by Oracle server, SQL Plus or if it has something to do with R / Emacs on Windows. I am not an Oracle expert and the office I work for is moving to Vertica by the end of the summer, so I am not going to invest much more effort in fixing this.
I am using sqlplus.exe to run SQL syntax that creates either a view or stored procedure and I am then running the view / SP via R. Thus, the command I have to pass to Oracle via R is SIMPLE and it can handle it.
To script sqlplus from R, I am using the following function that I will someday improve. It has no error handling and it basically assumes you are being nice, but it does work.
#' queryFile() runs a longish series of queries in a .sql file.
#' It is very important to understand that the path to sqlplus is hardcoded
#' because Windows has a shitty path system. It may not run on another system
#' without being edited.
#'
#' @param file - The relative path to the .sql file.
#' @return output - Vector containing the results from sqlplus
#'
queryFile <- function(file){
  cmd <- "c:/Oracle/app/product/11.2.0/client_1/sqlplus.exe %user/%password@%db @%file"
  cmd <- gsub("%user", getOption("DataMart")$uid, cmd )
  cmd <- gsub("%password", getOption("DataMart")$pwd, cmd )
  cmd <- gsub("%db", getOption("DataMart")$db, cmd )
  cmd <- gsub("%file", file, cmd )
  print(cmd)
  output <- system(cmd, intern=TRUE)
  return(output)
}
Apparently Markdown does not like my Roxygen style comments. Sorry.
The point of this function is that you pass it the file with the SQL syntax. It uses SQL Plus to run the syntax. To store / access user name, password, etc. I use a file called ~/passwords.R. It has a series of options() commands that look like this:
## Fake example.
options( DataMart = list(
   uid = "user_name"
  ,pwd = "user_password"
  ,db = "TNS Database"
  ,con_type = "ODBC"
  ,srvr_type = "Oracle"
  )
)
The last two (con_type and srvr_type) are just things that I like to have documented. They are not really needed. I have ~ 10 of these in my file and I use this to remind me which db server I am writing against. I have to write against SQL Server, Vertica, MySQL and Oracle (different projects / employers) and this helps me.
The function I provided uses options() to access that necessary information and then runs SQLPlus.exe. I could have added SQLPlus to my Windows path, but I was trying to make this function semi-independent and it seems like our IT people are consistent about where SQL Plus lives (of course there are different versions running around, but at least I don't have to explain the idea of path to someone who is not really a programmer.)

django-admin.py dumpdata to SQL statements

I'm trying to dump my data to SQL statements.
the django-admin.py dumpdata provides only json,xml,yaml.
so:
does someone know a good way to do it?!
I tried that:
def sqldumper(model):
    result = ""
    units = model.objects.all().values()
    for unit in units:
        statement = "INSERT INTO myapp.model "+str(tuple(unit.keys())).replace("'", "")+" VALUES " + str(tuple(unit.values()))+"\r\n"
        result+=statement
    return result
so I'm going over the model values myself, and make the INSERT statement myself.
then I thought of using "django-admin.py sql" to get the "CREATE" statement.. but then I don't know how to use this line from inside my code (and not through the command-line).
I tried os.popen and os.system, but it doesn't really work..
any tips about that?
I'll put it clearly:
how do you use the "manage.py sql " from inside your code?
I add something like this to my view:
import os, sys
import imp
from django.core.management import execute_manager
sys_argv_backup = sys.argv
imp.find_module("settings")
import settings
sys.argv = ['','sql','myapp']
execute_manager(settings)
sys.argv = sys_argv_backup
the thing is - it works.. but it writes the statements to the stdout...
it's something, but not perfect. I'll try using django.core.management.sql.sql_create directly, we'll see how it goes..
thanks
I suggest using a database-specific dump program (e.g. mysqldump for MySQL).
For the sqlite3 module that ships with Python, you can look at this example, which does not involve Django:
# Convert file existing_db.db to SQL dump file dump.sql
import sqlite3, os
con = sqlite3.connect('existing_db.db')
with open('dump.sql', 'w') as f:
    for line in con.iterdump():
        f.write('%s\n' % line)