SQL code in Rnw document with knitr - sql

I used the following sql code in .Rmd document. However, I want to use the same SQL code in .Rnw document.
```{r label = setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, max.print = NA)
```
```{r, echo=FALSE, results='hide'}
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = "survey.db")
dbListTables(db)
```
```{sql, label = Q1, connection=db, tab.cap = "Table Caption"}
SELECT *
FROM Person;
```
Would prefer to get code formatting and output printing facility.

Porting the RMarkdown to RNW requires some tweaking:
Of course, chunk delimiters need to be adjusted: The RNW equivalent of ```{r, echo=FALSE} is <<echo=FALSE>>= and RNW chunks end with #. (See the minimal RNW example.)
Importantly, while chunks in RMarkdown documents always specify an engine, the engine in RNW is implicitly R unless the option engine is set. So ```{r} becomes simply <<>>=, but the equivalent of ```{sql} is <<engine="sql">>=.
RMarkdown includes some very useful magic when embedding SQL chunks, see knitr Language Engines: SQL on rmarkdown.rstudio.com. By default, results are rendered as a nice table and only the first 10 results are printed. In RNW, we need to take care of this on our own.
For embedding SQL in RMarkdown, note that the SQL connection must be passed to the SQL chunk via the connection option. The option output.var can be used to specify the name of the object to which the result of the query will be assigned.
A simple solution (see previous revision) would just assign the SQL result to an object, say res, using output.var and add another R chunk that prints res nicely, e.g. using xtable. However, there is a more elegant approach using hooks:
The example uses the SQLite sample database from sqlitetutorial.net. Unzip it to your working directory before running the code.
\documentclass{article}
\begin{document}
\thispagestyle{empty}
<<include=FALSE>>=
library(knitr)
library(DBI)
knit_hooks$set(formatSQL = function(before, options, envir) {
if (!before && opts_current$get("engine") == "sql") {
sqlData <- get(x = opts_current$get("output.var"))
max.print <- min(nrow(sqlData), opts_current$get("max.print"))
myxtable <- do.call(xtable::xtable, c(list(x = sqlData[1:max.print, ]), opts_current$get("xtable.args")))
capture.output(myoutput <-do.call(xtable::print.xtable, c(list(x = myxtable, file = "test.txt"), opts_current$get("print.xtable.args"))))
return(asis_output(paste(
"\\end{kframe}",
myoutput,
"\\begin{kframe}")))
}
})
opts_chunk$set(formatSQL = TRUE)
opts_chunk$set(output.var = "formatSQL_result")
opts_chunk$set(max.print = getOption("max.print"))
#
<<echo=FALSE, results="hide">>=
db <- dbConnect(RSQLite::SQLite(), dbname = "chinook.db")
#
<<engine = "sql", connection=db, max.print = 8, xtable.args=list(caption = "My favorite artists?", label="tab:artist"), print.xtable.args=list(comment=FALSE, caption.placement="top")>>=
SELECT * FROM artists;
#
\end{document}
A new chunk hook formatSQL is added. (Chunk hooks run whenever the corresponding chunk option is not NULL.) After a chunk with engine="sql", it reads the SQL results into sqlData. Then, it uses xtable to print the first max.print rows of the result.
By default, the chunk hook formatSQL is activated (i.e. it is globally set to TRUE) and SQL results are stored in formatSQL_result. The chunk option max.print controls the number of rows to be printed (set it to Inf to print all rows, always).
The table produced by xtable is highly customizable. The chunk option xtable.args is passed to xtable and print.xtable.args is passed to print.xtable. In the example these options are used to set a caption, a label and to suppress xtable's default comment.
Below the generated PDF. Note that syntax highlighting for non-R code in RNW requires installing highlight and adding the directory to path (Windows).

Related

Issue automating CSV import to an RSQLite DB

I'm trying to automate writing CSV files to an RSQLite DB.
I am doing so by indexing csvFiles, which is a list of data.frame variables stored in the environment.
I can't seem to figure out why my dbWriteTable() code works perfectly fine when I enter it manually but not when I try to index the name and value fields.
### CREATE DB ###
mydb <- dbConnect(RSQLite::SQLite(),"")
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in 1:length(csvFiles)) {
dbWriteTable(mydb,name = csvFiles[i], value = csvFiles[i], overwrite=T)
i=i+1
}
# EXAMPLE CODE THAT SUCCESSFULLY MANUAL IMPORTS INTO mydb
dbWriteTable(mydb,"DEPARTMENT",DEPARTMENT)
When I run the for loop above, I'm given this error:
"Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'DEPARTMENT': No such file or directory
# note that 'DEPARTMENT' is the value of csvFiles[1]
Here's the dput output of csvFiles:
c("DEPARTMENT", "EMPLOYEE_PHONE", "PRODUCT", "EMPLOYEE", "SALES_ORDER_LINE",
"SALES_ORDER", "CUSTOMER", "INVOICES", "STOCK_TOTAL")
I've researched this error and it seems to be related to my working directory; however, I don't really understand what to change, as I'm not even trying to manipulate files from my computer, simply data.frames already in my environment.
Please help!
Simply use get() for the value argument as you are passing a string value when a dataframe object is expected. Notice your manual version does not have DEPARTMENT quoted for value.
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in seq_along(csvFiles)) {
dbWriteTable(mydb,name = csvFiles[i], value = get(csvFiles[i]), overwrite=T)
}
Alternatively, consider building a list of named dataframes with mget and loop element-wise between list's names and df elements with Map:
dfs <- mget(csvfiles)
output <- Map(function(n, d) dbWriteTable(mydb, name = n, value = d, overwrite=T), names(dfs), dfs)

knitr sql chunk not saving data into variable

My RMarkdown notebook with a SQL chunk runs fine when I run all the chunks one by one interactively, but when I try to knit, the SQL chunk does not have save the data into the specified variable. When the dataset that was supposed to be generated using the SQL chunk is referenced in later R chunks, the dataset variable is simply empty.
Here's an example
{r setup, include=FALSE, warning=FALSE, message=FALSE}
# load necessary libraries
library(bigrquery)
library(knitr)
library(tidyverse)
db <- dbConnect(dbi_driver(), dataset = 'sandbox', project = 'project_id', use_legacy_sql = FALSE)
df <- NULL
```
```{sql, connection=db, output.var=df}
select * from example_dataset
limit 10
```
returns dataset
```{r}
head(df)
```
NULL
I've tried the solution here (R: Knitr gives error for SQL-chunk), but it didn't solve my problem.
Just ran into the same problem and it looks like you need to quote the variable you are assigning.
```{sql, connection=db, output.var="df"}
select * from example_dataset
limit 10
```
Source: http://rmarkdown.rstudio.com/authoring_knitr_engines.html#sql

How to read the contents of an .sql file into an R script to run a query?

I have tried the readLines and the read.csv functions but then don't work.
Here is the contents of the my_script.sql file:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'
and it is saved on my Desktop.
Now I want to run this query from my R script. Here is what I have:
conn = connectDb()
fileName <- "C:\\Users\\me\\Desktop\\my_script.sql"
query <- readChar(fileName, file.info(fileName)$size)
query <- gsub("\r", " ", query)
query <- gsub("\n", " ", query)
query <- gsub("", " ", query)
recordSet <- dbSendQuery(conn, query)
rate <- fetch(recordSet, n = -1)
print(rate)
disconnectDb(conn)
And I am not getting anything back in this case. What can I try?
I've had trouble with reading sql files myself, and have found that often times the syntax gets broken if there are any single line comments in the sql. Since in R you store the sql statement as a single line string, if there are any double dashes in the sql it will essentially comment out any code after the double dash.
This is a function that I typically use whenever I am reading in a .sql file to be used in R.
getSQL <- function(filepath){
con = file(filepath, "r")
sql.string <- ""
while (TRUE){
line <- readLines(con, n = 1)
if ( length(line) == 0 ){
break
}
line <- gsub("\\t", " ", line)
if(grepl("--",line) == TRUE){
line <- paste(sub("--","/*",line),"*/")
}
sql.string <- paste(sql.string, line)
}
close(con)
return(sql.string)
}
I've found for queries with multiple lines, the read_file() function from the readr package works well. The only thing you have to be mindful of is to avoid single quotes (double quotes are fine). You can even add comments this way.
Example query, saved as query.sql
SELECT
COUNT(1) as "my_count"
-- comment goes here
FROM -- tabs work too
my_table
I can then store the results in a data frame with
df <- dbGetQuery(con, statement = read_file('query.sql'))
You can use the read_file() function from the readr package.
fileName = read_file("C:/Users/me/Desktop/my_script.sql")
You will get a string variable fileName with the desired text.
Note: Use / instead of \\\
The answer by Matt Jewett is quite useful, but I wanted to add that I sometimes encounter the following warning when trying to read .sql files generated by sql server using that answer:
Warning message: In readLines(con, n = 1) : line 1 appears to contain
an embedded nul
The first line returned by readLines is often "ÿþ" in these cases (i.e. the UTF-16 byte order mark) and subsequent lines are not read properly. I solved this by opening the sql file in Microsoft SQL Server Management Studio and selecting
File -> Save As ...
then on the small downarrow next to the save button selecting
Save with Encoding ...
and choosing
Unicode (UTF-8 without signature) - Codepage 65001
from the Encoding dropdown menu.
If you do not have Microsoft SQL Server Management Studio and are using a Windows machine, you could also try opening the file with the default text editor and then selecting
File -> Save As ...
Encoding: UTF-8
to save with a .txt file extension.
Interestingly changing the file within Microsoft SQL Server Management Studio removes the BOM (byte order mark) altogether, whereas changing the file within the text editor converts the BOM to the UTF-8 BOM but nevertheless causes the query to be properly read using the referenced answer.
The combination of readr and textclean works well without having to create any new functions. read_file() reads the file into a character vector and replace_white() ensures all escape sequence characters are removed from your .sql file. Note: Does cause problems if you have comments in your SQL string !!
library(readr)
library(textclean)
SQL <- replace_white(read_file("file_path")))

knitr and SQL query

I am having difficulty querying a SQL database from a knitr chunk. I can establish a connection and the query works in an R session but hangs indefinitely when knitting from RStudio.
---
title: "Untitled"
author: "XXXXX XXXXXXXXX"
date: "Monday, April 27, 2015"
output: html_document
---
TEST TEST
```{r}
library(RJDBC)
jd<-JDBC(driverClass = "com.osisoft.jdbc.Driver",classPath = "C://Program Files (x86)//PIPC//JDBC//PIJDBCDriver.jar")
piDB<-dbConnect(drv = jd,"jdbc:pisql://XX.XXX.XX.XX/Data Source=XXX;Integrated Security=SSPI")
sql1<-"SELECT * FROM pipoints"
sql.dat <- dbGetQuery(piDB, sql1)
dbDisconnect(piDB)
print('Success')
```
If you can use a different connection driver, try ODBC.
RODBC works fine with knitr in RStudio:
```{r}
library(RODBC)
myconn = odbcConnect('myServer')
myquery = paste0("") #add some query
data = sqlQuery(myconn, myquery)
head(data)
```
With RStudio v1.0 you can now use sql chunks directly from your RMarkdown or RNotebook. I use the odbc package for this. I like this because it avoids hard-coding login details into your projects while still creating projects that run end-to-end without user input.
An RMarkdown example below:
```{r}
# Unfortunately, odbc is not on CRAN yet
# So we will need devtools
# install.packages(devtools)
library(devtools)
devtools::install_github("rstats-db/odbc")
# Get connection info from the Windows ODBC Data Source Administrator using the name you set manually.
# If you don't know what this is, just search in the windows start menu for "ODBC Data Source Administrator"
con <- dbConnect(odbc::odbc(), 'MyDataWarehouse')
```
```{sql connection = con, output.var = result}
-- This is sql code, comments need to be marked accordingly
SELECT * FROM SOMETABLE LIMIT 200;
```
```{R}
# And the result is available in the next chunk!
result
```

Execute SQL from file in SQLAlchemy

How can I execute whole sql file into database using SQLAlchemy? There can be many different sql queries in the file including begin and commit/rollback.
sqlalchemy.text or sqlalchemy.sql.text
The text construct provides a straightforward method to directly execute .sql files.
from sqlalchemy import create_engine
from sqlalchemy import text
# or from sqlalchemy.sql import text
engine = create_engine('mysql://{USR}:{PWD}#localhost:3306/db', echo=True)
with engine.connect() as con:
with open("src/models/query.sql") as file:
query = text(file.read())
con.execute(query)
SQLAlchemy: Using Textual SQL
text()
I was able to run .sql schema files using pure SQLAlchemy and some string manipulations. It surely isn't an elegant approach, but it works.
# Open the .sql file
sql_file = open('file.sql','r')
# Create an empty command string
sql_command = ''
# Iterate over all lines in the sql file
for line in sql_file:
# Ignore commented lines
if not line.startswith('--') and line.strip('\n'):
# Append line to the command string
sql_command += line.strip('\n')
# If the command string ends with ';', it is a full statement
if sql_command.endswith(';'):
# Try to execute statement and commit it
try:
session.execute(text(sql_command))
session.commit()
# Assert in case of error
except:
print('Ops')
# Finally, clear command string
finally:
sql_command = ''
It iterates over all lines in a .sql file ignoring commented lines.
Then it concatenates lines that form a full statement and tries to execute the statement. You just need a file handler and a session object.
You can do it with SQLalchemy and psycopg2.
file = open(path)
engine = sqlalchemy.create_engine(db_url)
escaped_sql = sqlalchemy.text(file.read())
engine.execute(escaped_sql)
Unfortunately I'm not aware of a good general answer for this. Some dbapi's (psycopg2 for instance) support executing many statements at a time. If the files aren't huge you can just load them into a string and execute them on a connection. For others, I would try to use a command-line client for that db and pipe the data into that using the subprocess module.
If those approaches aren't acceptable, then you'll have to go ahead and implement a small SQL parser that can split the file apart into separate statements. This is really tricky to get 100% correct, as you'll have to factor in database dialect specific literal escaping rules, the charset used, any database configuration options that affect literal parsing (e.g. PostgreSQL standard_conforming_strings).
If you only need to get this 99.9% correct, then some regexp magic should get you most of the way there.
If you are using sqlite3 it has a useful extension to dbapi called conn.executescript(str), I've hooked this up via something like this and it seemed to work: (Not all context is shown but it should be enough to get the drift)
def init_from_script(script):
Base.metadata.drop_all(db_engine)
Base.metadata.create_all(db_engine)
# HACK ALERT: we can do this using sqlite3 low level api, then reopen session.
f = open(script)
script_str = f.read().strip()
global db_session
db_session.close()
import sqlite3
conn = sqlite3.connect(db_file_name)
conn.executescript(script_str)
conn.commit()
db_session = Session()
Is this pure evil I wonder? I looked in vain for a 'pure' sqlalchemy equivalent, perhaps that could be added to the library, something like db_session.execute_script(file_name) ? I'm hoping that db_session will work just fine after all that (ie no need to restart engine) but not sure yet... further research needed (ie do we need to get a new engine or just a session after going behind sqlalchemy's back?)
FYI sqlite3 includes a related routine: sqlite3.complete_statement(sql) if you roll your own parser...
You can access the raw DBAPI connection through this
raw_connection = mySqlAlchemyEngine.raw_connection()
raw_cursor = raw_connection() #get a hold of the proxied DBAPI connection instance
but then it will depend on which dialect/driver you are using which can be referred to through this list.
For pyscog2, you can just do
raw_cursor.execute(open("my_script.sql").read())
but pysqlite you would need to do
raw_cursor.executescript(open("my_script").read())
and in line with that you would need to check the documentation of whichever DBAPI driver you are using to see if multiple statements are allowed in one execute or if you would need to use a helper like executescript which is unique to pysqlite.
Here's how to run the script splitting the statements, and running each statement directly with a "connectionless" execution with the SQLAlchemy Engine. This assumes that each statement ends with a ; and that there's no more than one statement per line.
engine = create_engine(url)
with open('script.sql') as file:
statements = re.split(r';\s*$', file.read(), flags=re.MULTILINE)
for statement in statements:
if statement:
engine.execute(text(statement))
In the current answers, I did not found a solution which works when a combination of these features in the .SQL file is present:
Comments with "--"
Multi-line statements with additional comments after "--"
Function definitions which have multiple SQL-queries ending with ";" butmust be executed as a whole statement
A found a rather simple solution:
# check for /* */
with open(file, 'r') as f:
assert '/*' not in f.read(), 'comments with /* */ not supported in SQL file python interface'
# we check out the SQL file line-by-line into a list of strings (without \n, ...)
with open(file, 'r') as f:
queries = [line.strip() for line in f.readlines()]
# from each line, remove all text which is behind a '--'
def cut_comment(query: str) -> str:
idx = query.find('--')
if idx >= 0:
query = query[:idx]
return query
# join all in a single line code with blank spaces
queries = [cut_comment(q) for q in queries]
sql_command = ' '.join(queries)
# execute in connection (e.g. sqlalchemy)
conn.execute(sql_command)
Code bellow works for me in alembic migrations
from alembic import op
import sqlalchemy as sa
from ekrec.common import get_project_root
def upgrade():
path = f'{get_project_root()}/migrations/versions/fdb8492f75b2_.sql'
op.execute(open(path).read())
I had success with David's answer here, with two slight modifications:
Use get_bind() as I was working with a Session rather than an Engine
Call cursor() on the raw connection
raw_connection = myDbSession.get_bind().raw_connection()
raw_cursor = raw_connection.cursor()
raw_cursor.execute(open("my_script.sql").read())