Trouble running SQL queries via RODBC - sql

I have a file called q_cleanup.sql that I am reading into R via readLines(). This file has lots of little queries we wrote to clean up some really ugly data. Once I read the file into R and process the text, I run each query in the file.
All of the queries work when run directly through Oracle's SQL Developer and Tora.
Some of the queries fail when run via RODBC.
For example, the file contains the following two queries (cut and pasted out of the file):
update T_HH_TMP
set program_type = 'not able to contact'
where
program_type like '%n0t%'
or program_type like '%not able to%'
;
update T_HH_TMP
set program_type = 'hh substance use'
where program_type like '%hh substance abuse%'
;
The first query runs. The second query errors. Below is the relevant section out of my cleanup.R file. The command odbcStart() is a function I built to simplify opening and closing rodbc connections. It is not the problem.
odbcStart()
qry <- readLines("sql/q_cleanup.sql")
qry <- paste(qry[-grep("--", qry)] , collapse=" ")
qry <- unlist(strsplit(qry, ";"))
for(i in seq_along(qry)) {
print("------------------------------------------------------------")
print(qry[i])
print(sqlQuery(con, qry[i]))
}
odbcClose(con)
I am stripping out everything I can think of that might cause a problem. My string is wrapped in double quotes and my query contains ONLY single quotes. Yet the output looks like this:
[1] "------------------------------------------------------------"
[1] " update T_HH_TMP set program_type = 'not able to contact' where program_type like '%n0t%' or program_type like '%not able to%' "
character(0)
[1] "------------------------------------------------------------"
[1] " update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' "
[1] "[RODBC] ERROR: Could not SQLExecDirect ' update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' '"
I do not feel that the % is the problem because the first query runs just fine.
Any help? I really would prefer to script the running of all these queries in R.

I thought I would share what I know. I have a solution, even though I consider it sub-optimal because it complicates my workflow unnecessarily.
I do not know if the problem is caused by Oracle server, SQL Plus or if it has something to do with R / Emacs on Windows. I am not an Oracle expert and the office I work for is moving to Vertica by the end of the summer, so I am not going to invest much more effort in fixing this.
I am using sqlplus.exe to run SQL syntax that creates either a view or stored procedure and I am then running the view / SP via R. Thus, the command I have to pass to Oracle via R is SIMPLE and it can handle it.
To script sqlplus from R, I am using the following function that I will someday improve. It has no error handling and it basically assumes you are being nice, but it does work.
#' queryFile() runs a longish series of queries in a .sql file.
#' It is very important to understand that the path to sqlplus is hardcoded
#' because Windows has a shitty path system. It may not run on another system
#' without being edited.
#'
#' @param file - The relative path to the .sql file.
#' @return output - Vector containing the results from sqlplus
#'
queryFile <- function(file){
cmd <- "c:/Oracle/app/product/11.2.0/client_1/sqlplus.exe %user/%password@%db @%file"
cmd <- gsub("%user", getOption("DataMart")$uid, cmd )
cmd <- gsub("%password", getOption("DataMart")$pwd, cmd )
cmd <- gsub("%db", getOption("DataMart")$db, cmd )
cmd <- gsub("%file", file, cmd )
print(cmd)
output <- system(cmd, intern=TRUE)
return(output)
}
Apparently Markdown does not like my Roxygen style comments. Sorry.
The point of this function is that you pass it the file with the SQL syntax. It uses SQL Plus to run the syntax. To store / access user name, password, etc. I use a file called ~/passwords.R. It has a series of options() commands that look like this:
## Fake example.
options( DataMart = list(
uid = "user_name"
,pwd = "user_password"
,db = "TNS Database"
,con_type = "ODBC"
,srvr_type = "Oracle"
)
)
The last two (con_type and srvr_type) are just things that I like to have documented. They are not really needed. I have ~ 10 of these in my file and I use this to remind me which db server I am writing against. I have to write against SQL Server, Vertica, MySQL and Oracle (different projects / employers) and this helps me.
The function I provided uses options() to access that necessary information and then runs SQLPlus.exe. I could have added SQLPlus to my Windows path, but I was trying to make this function semi-independent, and our IT people seem to be consistent about where SQL Plus lives (of course there are different versions running around, but at least I don't have to explain the idea of a path to someone who is not really a programmer).

Related

How to run an SQL file with sqlplus in Powershell ISE

I want to execute an SQL file with sqlplus, but when I try to in Powershell ISE, the output is just the sqlplus usage message.
The code I used in ISE is:
sqlplus "username/password@database @C:Path\To\file.sql"
But when I run this code in CMD or regular Powershell it works without problems. The result is just some dummy Select 1 from dual.
I have tried putting the path in single quotes ( ' ), with and without the @ (inside and outside of the quotes), but nothing works. I also didn't find much when googling the issue.
I also tried just connecting, and that works without problems, although I can't type anything after it connects.
Your syntax is wrong. The correct syntax is:
sqlplus username/password@TnsAlias 'c:\path\to\DBscript.sql' | out-file 'c:\temp\sql-output.txt'
I think you open the quote (') too early.
Or try this without out-file, storing the output in a variable:
$output = sqlplus username/password@TnsAlias 'c:\path\to\DBscript.sql'

How to create a SQL view when using multiple go statements? [duplicate]

How can I execute the following SQL inside a single command (single execution) through ADO.NET?
ALTER TABLE [MyTable]
ADD NewCol INT
GO
UPDATE [MyTable]
SET [NewCol] = 1
The batch separator GO is not supported, and without it the second statement fails.
Are there any solutions to this other than using multiple command executions?
The GO keyword is not T-SQL, but a SQL Server Management Studio artifact that allows you to split the execution of a script file into multiple batches. That is, when you run a T-SQL script file in SSMS, the statements are run in batches separated by the GO keyword. More details can be found here: https://msdn.microsoft.com/en-us/library/ms188037.aspx
If you read that, you'll see that sqlcmd and osql do also support GO.
SQL Server doesn't understand the GO keyword. So if you need an equivalent, you need to separate and run the batches individually on your own.
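To illustrate what "separate and run the batches individually" can look like, here is a minimal sketch in Python. The function name split_batches is made up for this example, and the sketch assumes GO only ever appears on a line of its own as a separator (a GO line inside a string literal or a block comment would be split on by mistake). An optional repeat count after GO is recognised but ignored.

```python
import re

# A line holding only GO (any case), optionally followed by a repeat count.
# The \r? lets the pattern cope with CR LF line endings as well.
GO_LINE = re.compile(r"^[ \t]*GO(?:[ \t]+\d+)?[ \t]*\r?$",
                     re.IGNORECASE | re.MULTILINE)


def split_batches(script):
    """Split a T-SQL script into batches at lines that contain only GO."""
    return [part.strip() for part in GO_LINE.split(script) if part.strip()]


script = """ALTER TABLE [MyTable]
ADD NewCol INT
GO
UPDATE [MyTable]
SET [NewCol] = 1
"""

# Each batch would then be sent to the server as its own command execution.
batches = split_batches(script)
for batch in batches:
    print(batch)
    print("-- end of batch --")
```

Note that a GO sharing a line with other text (for example ";GO") is deliberately not treated as a separator here, which matches how the client tools behave.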
Remove the GO:
String sql = "ALTER TABLE [MyTable] ADD NewCol INT;";
cmd = new SqlCommand(sql, conn);
cmd.ExecuteNonQuery();
sql = "UPDATE [MyTable] SET [NewCol] = 1";
cmd = new SqlCommand(sql, conn);
cmd.ExecuteNonQuery();
It seems that you can use the Server class for that. Here is an article:
C#: Executing batch T-SQL Scripts containing GO statements
In SSMS (SQL Server Management Studio), you can run GO after any query, but there's a catch. You can't have the semicolon and the GO on the same line. Go figure.
This works:
SELECT 'This Works';
GO
This works too:
SELECT 'This Too'
;
GO
But this doesn't:
SELECT 'This Doesn''t Work'
;GO
This can also happen when your batch separator has been changed in your settings. In SSMS click on Tools --> Options and go to Query Execution/SQL Server/General to check that batch separator.
I've just had this fail with a script that didn't have CR LF line endings. Closing and reopening the script seems to prompt a fix. Just another thing to check for!
Came across this trying to determine why my query was not working in SSRS. You don't use GO in SSRS, instead use semicolons between your different statements.
I placed a semicolon ; after the GO, which was the cause of my error.
You will also get this error if you have used IF statements and closed them incorrectly.
Remember that you must use BEGIN/END if your IF statement is longer than one line.
This works:
IF @@ROWCOUNT = 0
PRINT 'Row count is zero.'
But if you have two lines, it should look like this:
IF @@ROWCOUNT = 0
BEGIN
PRINT 'Row count is zero.'
PRINT 'You should probably do something about that.'
END
I got this error message when I placed the 'GO' keyword after a sql query in the same line, like this:
insert into fruits (Name) values ('Apple'); GO
Writing this on two separate lines runs fine. Maybe this will help someone...
I first tried to remove GO statements by pattern matching on (?:\s|\r?\n)+GO(?:\s|\r?\n)+ regex but found more issues with our SQL scripts that were not compatible for SQL Command executions.
However, thanks to @tim-schmelter's answer, I ended up using the Microsoft.SqlServer.SqlManagementObjects package.
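string sqlText = System.IO.File.ReadAllText("script.sql"); // hypothetical path; the script may contain GO separators
string connectionString = @"Data Source=(localdb)\MSSQLLocalDB;Initial Catalog=FOO;Integrated Security=True;";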
var sqlConnection = new System.Data.SqlClient.SqlConnection(connectionString);
var serverConnection = new Microsoft.SqlServer.Management.Common.ServerConnection(sqlConnection);
var server = new Microsoft.SqlServer.Management.Smo.Server(serverConnection);
int result = server.ConnectionContext.ExecuteNonQuery(sqlText);

Execute multiple statements separated by semicolons in RODBC

I have a fairly complex SQL query that I am trying to run through RODBC that involves defining variables. A simplified version looks like this:
DECLARE @VARX CHAR = 'X';
SELECT * FROM TABLE WHERE TYPE = @VARX;
Running this code works just fine. This fails:
library(RODBC)
q <- "DECLARE @VARX CHAR = 'X';\nSELECT * FROM TABLE WHERE TYPE = @VARX;"
sqlQuery(ch, q)
# returns character(0)
I have found through experimentation that the first statement before the semicolon is executed, but the rest is not. There is no error--it just seems that everything after the semicolon is ignored. Is there a way to execute the full query?
I'm using SQL server by the way.
NOTE: I asked this question before and it was marked as a duplicate of this question, but they are asking completely different things. In this question I would like to execute a script that contains multiple statements, and in the other the author is only trying to execute a single statement.
You can try this:
library(RODBC)
library(stringr)
filename = "filename.sql" ### file where the sql code is stored
queries <- readLines(filename) ### read the sql file into R
queries1 = str_replace_all(queries,'--.*$'," ") ### remove any commented lines
queries2 = paste(queries1, collapse = '\n') ### collapse with new lines
queries3 = unlist(str_split(queries2,"(?<=;)")) ### separate individual queries
Set up the ODBC connection at this point and run the for loop below. You can also modify the queries to add or change variables within them before running the loop.
for (i in seq_along(queries3)) {
print(i)
sqlQuery(conn, queries3[i])
}
After the for loop is done, you can pull any volatile or regular tables generated in your session into R using sqlQuery(). I haven't tested this extensively and there might be cases where it fails, but it worked for what I was doing.

How to read the contents of an .sql file into an R script to run a query?

I have tried the readLines and the read.csv functions but they don't work.
Here is the contents of the my_script.sql file:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'
and it is saved on my Desktop.
Now I want to run this query from my R script. Here is what I have:
conn = connectDb()
fileName <- "C:\\Users\\me\\Desktop\\my_script.sql"
query <- readChar(fileName, file.info(fileName)$size)
query <- gsub("\r", " ", query)
query <- gsub("\n", " ", query)
query <- gsub("", " ", query)
recordSet <- dbSendQuery(conn, query)
rate <- fetch(recordSet, n = -1)
print(rate)
disconnectDb(conn)
And I am not getting anything back in this case. What can I try?
I've had trouble with reading sql files myself, and have found that often times the syntax gets broken if there are any single line comments in the sql. Since in R you store the sql statement as a single line string, if there are any double dashes in the sql it will essentially comment out any code after the double dash.
This is a function that I typically use whenever I am reading in a .sql file to be used in R.
getSQL <- function(filepath){
con = file(filepath, "r")
sql.string <- ""
while (TRUE){
line <- readLines(con, n = 1)
if ( length(line) == 0 ){
break
}
line <- gsub("\\t", " ", line)
if(grepl("--",line) == TRUE){
line <- paste(sub("--","/*",line),"*/")
}
sql.string <- paste(sql.string, line)
}
close(con)
return(sql.string)
}
I've found for queries with multiple lines, the read_file() function from the readr package works well. The only thing you have to be mindful of is to avoid single quotes (double quotes are fine). You can even add comments this way.
Example query, saved as query.sql
SELECT
COUNT(1) as "my_count"
-- comment goes here
FROM -- tabs work too
my_table
I can then store the results in a data frame with
df <- dbGetQuery(con, statement = read_file('query.sql'))
You can use the read_file() function from the readr package.
fileName = read_file("C:/Users/me/Desktop/my_script.sql")
You will get a string variable fileName with the desired text.
Note: Use / instead of \\ in the path.
The answer by Matt Jewett is quite useful, but I wanted to add that I sometimes encounter the following warning when trying to read .sql files generated by sql server using that answer:
Warning message: In readLines(con, n = 1) : line 1 appears to contain
an embedded nul
The first line returned by readLines is often "ÿþ" in these cases (i.e. the UTF-16 byte order mark) and subsequent lines are not read properly. I solved this by opening the sql file in Microsoft SQL Server Management Studio and selecting
File -> Save As ...
then on the small downarrow next to the save button selecting
Save with Encoding ...
and choosing
Unicode (UTF-8 without signature) - Codepage 65001
from the Encoding dropdown menu.
If you do not have Microsoft SQL Server Management Studio and are using a Windows machine, you could also try opening the file with the default text editor and then selecting
File -> Save As ...
Encoding: UTF-8
to save with a .txt file extension.
Interestingly changing the file within Microsoft SQL Server Management Studio removes the BOM (byte order mark) altogether, whereas changing the file within the text editor converts the BOM to the UTF-8 BOM but nevertheless causes the query to be properly read using the referenced answer.
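For anyone hitting the same BOM problem from Python rather than R, the check can be done in code instead of re-saving the file. This is only a sketch: the function names decode_sql_bytes and read_sql_file are made up here, a file without a BOM is assumed to be UTF-8, and UTF-32 files are not handled.

```python
import codecs


def decode_sql_bytes(raw):
    """Decode the raw bytes of a .sql file, honouring a leading BOM if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
        return raw.decode("utf-16")
    # Assumption: a file without a BOM is UTF-8.
    return raw.decode("utf-8")


def read_sql_file(path):
    """Read a .sql file as text, whichever of the encodings above it uses."""
    with open(path, "rb") as handle:
        return decode_sql_bytes(handle.read())
```

The "ÿþ" seen in the warning case above is the byte pair FF FE, i.e. the little-endian UTF-16 BOM, which is the second branch here.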
The combination of readr and textclean works well without having to create any new functions. read_file() reads the file into a character vector and replace_white() ensures all escape sequence characters are removed from your .sql file. Note: Does cause problems if you have comments in your SQL string !!
library(readr)
library(textclean)
SQL <- replace_white(read_file("file_path"))

Execute SQL from file in SQLAlchemy

How can I execute whole sql file into database using SQLAlchemy? There can be many different sql queries in the file including begin and commit/rollback.
sqlalchemy.text or sqlalchemy.sql.text
The text construct provides a straightforward method to directly execute .sql files.
from sqlalchemy import create_engine
from sqlalchemy import text
# or from sqlalchemy.sql import text
engine = create_engine('mysql://{USR}:{PWD}@localhost:3306/db', echo=True)
with engine.connect() as con:
    with open("src/models/query.sql") as file:
        query = text(file.read())
        con.execute(query)
SQLAlchemy: Using Textual SQL
text()
I was able to run .sql schema files using pure SQLAlchemy and some string manipulations. It surely isn't an elegant approach, but it works.
# Open the .sql file
sql_file = open('file.sql','r')
# Create an empty command string
sql_command = ''
# Iterate over all lines in the sql file
for line in sql_file:
    # Ignore commented lines
    if not line.startswith('--') and line.strip('\n'):
        # Append line to the command string
        sql_command += line.strip('\n')
        # If the command string ends with ';', it is a full statement
        if sql_command.endswith(';'):
            # Try to execute statement and commit it
            try:
                session.execute(text(sql_command))
                session.commit()
            # Assert in case of error
            except:
                print('Ops')
            # Finally, clear command string
            finally:
                sql_command = ''
It iterates over all lines in a .sql file ignoring commented lines.
Then it concatenates lines that form a full statement and tries to execute the statement. You just need a file handler and a session object.
You can do it with SQLAlchemy and psycopg2.
file = open(path)
engine = sqlalchemy.create_engine(db_url)
escaped_sql = sqlalchemy.text(file.read())
engine.execute(escaped_sql)
Unfortunately I'm not aware of a good general answer for this. Some dbapi's (psycopg2 for instance) support executing many statements at a time. If the files aren't huge you can just load them into a string and execute them on a connection. For others, I would try to use a command-line client for that db and pipe the data into that using the subprocess module.
If those approaches aren't acceptable, then you'll have to go ahead and implement a small SQL parser that can split the file apart into separate statements. This is really tricky to get 100% correct, as you'll have to factor in database dialect specific literal escaping rules, the charset used, any database configuration options that affect literal parsing (e.g. PostgreSQL standard_conforming_strings).
If you only need to get this 99.9% correct, then some regexp magic should get you most of the way there.
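As a rough illustration of the "good enough" splitter described above, here is a small character scanner rather than a regexp. It is a sketch, not a real parser: split_statements is a made-up name, and it only knows about quoted text, "--" line comments and /* */ block comments. It does not handle dialect-specific things such as backslash escapes, PostgreSQL dollar quoting, or procedure bodies that contain semicolons.

```python
def split_statements(script):
    """Split SQL text on ';' that sits outside quotes and comments (a sketch)."""
    statements, buf = [], []
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        two = script[i:i + 2]
        if two == "--":
            # Line comment: skip to the end of the line (the newline is kept).
            end = script.find("\n", i)
            i = n if end == -1 else end
        elif two == "/*":
            # Block comment: replace it with a single space.
            end = script.find("*/", i + 2)
            i = n if end == -1 else end + 2
            buf.append(" ")
        elif ch in ("'", '"'):
            # Quoted text: copy verbatim up to the closing quote. A doubled
            # quote ('') closes and immediately reopens, so it stays intact.
            end = script.find(ch, i + 1)
            end = n - 1 if end == -1 else end
            buf.append(script[i:end + 1])
            i = end + 1
        elif ch == ";":
            statements.append("".join(buf).strip())
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    statements.append("".join(buf).strip())
    return [s for s in statements if s]


script = "update t set a = 'x;y' -- trailing; comment\nwhere b like '%it''s%';\nselect 1;"
for statement in split_statements(script):
    print(repr(statement))
```

The semicolons inside the string literal and inside the comment are left alone, so the example script yields two statements.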
If you are using sqlite3 it has a useful extension to dbapi called conn.executescript(str), I've hooked this up via something like this and it seemed to work: (Not all context is shown but it should be enough to get the drift)
def init_from_script(script):
    Base.metadata.drop_all(db_engine)
    Base.metadata.create_all(db_engine)
    # HACK ALERT: we can do this using sqlite3 low level api, then reopen session.
    f = open(script)
    script_str = f.read().strip()
    global db_session
    db_session.close()
    import sqlite3
    conn = sqlite3.connect(db_file_name)
    conn.executescript(script_str)
    conn.commit()
    db_session = Session()
Is this pure evil I wonder? I looked in vain for a 'pure' sqlalchemy equivalent, perhaps that could be added to the library, something like db_session.execute_script(file_name) ? I'm hoping that db_session will work just fine after all that (ie no need to restart engine) but not sure yet... further research needed (ie do we need to get a new engine or just a session after going behind sqlalchemy's back?)
FYI sqlite3 includes a related routine: sqlite3.complete_statement(sql) if you roll your own parser...
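A minimal sketch of how sqlite3.complete_statement() could drive such a home-made splitter, assuming SQLite's own idea of a complete statement is acceptable for your script and that no line holds more than one statement. The helper name iter_statements and the fruits table are made up for the example.

```python
import sqlite3


def iter_statements(lines):
    """Yield complete SQL statements, as judged by sqlite3.complete_statement()."""
    buffer = ""
    for line in lines:
        buffer += line
        if sqlite3.complete_statement(buffer):
            statement = buffer.strip()
            buffer = ""
            if statement:
                yield statement
    if buffer.strip():
        # Trailing text that never reached a terminating semicolon.
        yield buffer.strip()


script = """CREATE TABLE fruits (name TEXT);
INSERT INTO fruits (name)
VALUES ('Apple; Pear');
"""

conn = sqlite3.connect(":memory:")
for statement in iter_statements(script.splitlines(keepends=True)):
    conn.execute(statement)
rows = conn.execute("SELECT name FROM fruits").fetchall()
print(rows)
```

The semicolon inside the string literal does not end the statement early, because the completeness check is done by SQLite's tokenizer rather than by searching for ";".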
You can access the raw DBAPI connection through this
raw_connection = mySqlAlchemyEngine.raw_connection()
raw_cursor = raw_connection() #get a hold of the proxied DBAPI connection instance
but then it will depend on which dialect/driver you are using which can be referred to through this list.
For psycopg2, you can just do
raw_cursor.execute(open("my_script.sql").read())
but for pysqlite you would need to do
raw_cursor.executescript(open("my_script").read())
and in line with that you would need to check the documentation of whichever DBAPI driver you are using to see if multiple statements are allowed in one execute or if you would need to use a helper like executescript which is unique to pysqlite.
Here's how to run the script splitting the statements, and running each statement directly with a "connectionless" execution with the SQLAlchemy Engine. This assumes that each statement ends with a ; and that there's no more than one statement per line.
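import re
from sqlalchemy import create_engine, text
engine = create_engine(url)
with open('script.sql') as file:
    statements = re.split(r';\s*$', file.read(), flags=re.MULTILINE)
    for statement in statements:
        # skip the empty or whitespace-only pieces left over by the split
        if statement.strip():
            engine.execute(text(statement))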
In the current answers, I did not find a solution that works when the .SQL file combines these features:
Comments with "--"
Multi-line statements with additional comments after "--"
Function definitions which contain multiple SQL queries ending with ";" but must be executed as a single statement
I found a rather simple solution:
# check for /* */
with open(file, 'r') as f:
    assert '/*' not in f.read(), 'comments with /* */ not supported in SQL file python interface'
# we check out the SQL file line-by-line into a list of strings (without \n, ...)
with open(file, 'r') as f:
    queries = [line.strip() for line in f.readlines()]
# from each line, remove all text which is behind a '--'
def cut_comment(query: str) -> str:
    idx = query.find('--')
    if idx >= 0:
        query = query[:idx]
    return query
# join all in a single line code with blank spaces
queries = [cut_comment(q) for q in queries]
sql_command = ' '.join(queries)
# execute in connection (e.g. sqlalchemy)
conn.execute(sql_command)
The code below works for me in Alembic migrations:
from alembic import op
import sqlalchemy as sa
from ekrec.common import get_project_root
def upgrade():
    path = f'{get_project_root()}/migrations/versions/fdb8492f75b2_.sql'
    op.execute(open(path).read())
I had success with David's answer here, with two slight modifications:
Use get_bind() as I was working with a Session rather than an Engine
Call cursor() on the raw connection
raw_connection = myDbSession.get_bind().raw_connection()
raw_cursor = raw_connection.cursor()
raw_cursor.execute(open("my_script.sql").read())