Is there any way to pass a variable defined within R to the sqlQuery function within the RODBC package?
Specifically, I need to pass such a variable to either a scalar/table-valued function, a stored procedure, and/or perhaps the WHERE clause of a SELECT statement.
For example, let:
x <- 1 ## user-defined
Then,
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
Or...
example2 <- sqlQuery(myDB,"SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = x")
Or...
example3 <- sqlQuery(myDB,"EXEC dbo.my_stored_proc (x)")
Obviously, none of these work, but I'm thinking that there's something that enables this sort of functionality.
Build the string you intend to pass. So instead of
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
do
example <- sqlQuery(myDB, paste("SELECT * FROM dbo.my_table_fn (",
x, ")", sep=""))
which will fill in the value of x.
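The same string-building covers the other two cases from the question; a quick sketch, assuming the same myDB connection and x as above (note that T-SQL passes procedure parameters without parentheses):
# WHERE clause: paste the value into the condition
example2 <- sqlQuery(myDB, paste0("SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = ", x))
# stored procedure: parameters follow the procedure name
example3 <- sqlQuery(myDB, paste0("EXEC dbo.my_stored_proc ", x))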
If you use sprintf, you can very easily build the query string using variable substitution. For extra ease of use, if you pre-parse that query string (I'm using stringr), you can write it over multiple lines in your code.
e.g.
q1 <- sprintf("
SELECT basketid, count(%s)
FROM %s
GROUP BY basketid
"
,item_barcode
,dbo.sales
)
q1 <- str_replace_all(str_replace_all(q1,"\n",""),"\\s+"," ")
df <- sqlQuery(shopping_database, q1)
Side-note and hat-tip to another R chap
Recently, I found I wanted to make the variable substitution even simpler by using something like Python's string.format() function, which lets you reuse and reorder variables within the string, e.g.:
$: w = "He{0}{0}{1} W{1}r{0}d".format("l","o")
$: print(w)
"Hello World"
However, this function doesn't appear to exist in R, so I asked around on Twitter, and a very helpful chap #kevin_ushey replied with his own custom function to be used in R. Check it out!
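As a rough stand-in, here is a minimal sketch of such a helper (my own, not the custom function mentioned above) that substitutes positional {0}, {1}, ... placeholders:
str_format <- function(fmt, ...) {
  args <- list(...)
  for (i in seq_along(args)) {
    # replace every occurrence of {i-1} with the i-th argument
    fmt <- gsub(sprintf("\\{%d\\}", i - 1), args[[i]], fmt)
  }
  fmt
}
str_format("He{0}{0}{1} W{1}r{0}d", "l", "o")
#> [1] "Hello World"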
With more variables, you can do this:
aaa <- "
SELECT ColOne, ColTwo
FROM TheTable
WHERE HpId = AAAA and
      VariableId = BBBB and
      convert(date, date) < 'CCCC'
"
aaa <- gsub("AAAA", toString(111), aaa)
aaa <- gsub("BBBB", toString(2222), aaa)
aaa <- gsub("CCCC", "2016-01-01", aaa)  # quote the date: toString(2016-01-01) would evaluate the arithmetic and give "2014"
Try this, using the fn$ prefix from the gsubfn package, which substitutes $x directly into the string:
library(gsubfn)
x <- 1
example2 <- fn$sqlQuery(myDB, "SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = '$x'")
I'm trying to import data from a table in SQL Server and then write it to a .txt file. I'm doing it in the following way. However, when I do that, all numbers with leading 0s seem to get trimmed.
For example, if I have 000124 in the database, it's shown as 124 in the .txt file, and if I check x_1 it's 124 in there as well.
How can I avoid this? I want to keep the leading 0s in x_1 and also need them in the output .txt file.
library(RODBC)
library(lubridate)
library(data.table)
cn_1 <- odbcConnect('channel_name')
qry <- "
select
*
from table_name
"
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE)
rm(qry)
setDT(x_1)
## export_location and date_today are assumed to be defined elsewhere in the script
fwrite(x=x_1, file=paste0(export_location, "file_name", date_today, ".txt"),
       sep="|", quote=TRUE, row.names=FALSE, na="")
Assuming that the underlying data in the DBMS is indeed "string"-like ...
RODBC::sqlQuery has the as.is= argument that can prevent it from trying to convert values. The default is FALSE; when it is FALSE and the column is not a clear type like "date" or "timestamp", RODBC calls type.convert, which will see the number-like field and convert it to integers or numbers.
Try:
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = TRUE)
and that will stop auto-conversion of all columns.
That is a bit nuclear, to be honest, and will stop conversion of dates/times, and perhaps other columns that should be converted. We can narrow this down; ?sqlQuery says that read.table's documentation on as.is is relevant, and it says:
as.is: controls conversion of character variables (insofar as they
are not converted to logical, numeric or complex) to factors,
if not otherwise specified by 'colClasses'. Its value is
either a vector of logicals (values are recycled if
necessary), or a vector of numeric or character indices which
specify which columns should not be converted to factors.
so if you know which column (by name or column index) is being unnecessarily converted, then you can include it directly. Perhaps
## by column name
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = "somename")
## or by column index
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = 7)
(Side note: while I use select * ... on occasion as well, referring to columns by number presupposes knowing all of the columns included in that table/query. If anything changes, perhaps because it's actually a SQL view and somebody updates it, or because somebody reorders the columns, then your assumptions about column indices become fragile. All of my "production" queries in my internal packages have all columns spelled out, with no use of select *. I was bitten once when I used it, which is why I'm a little defensive about it.)
If you don't know which columns those are, a hastily-dynamic way (which unfortunately double-taps the query) could be something like
## sample a few rows with all conversion disabled
## (note: 'limit 10' is not SQL Server syntax; use 'select top 10 *' there)
qry10 <- "
select
*
from table_name
limit 10"
x_1 <- sqlQuery(channel=cn_1, query=qry10, stringsAsFactors=FALSE, as.is = TRUE)
## find columns whose sampled values all start with leading zeros
leadzero <- sapply(x_1, function(z) all(grepl("^0+[1-9]", z)))
## re-run the full query, disabling conversion only for those columns
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = which(leadzero))
Caveat: I don't use RODBC, nor have I set up a temporary database with appropriately-fashioned values, so this is untested.
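If you do go the as.is = TRUE route, one follow-up sketch (same caveat: untested, and "somename" is a stand-in for your leading-zero column) is to re-convert everything else yourself with type.convert:
x_1 <- sqlQuery(channel=cn_1, query=qry, stringsAsFactors=FALSE, as.is = TRUE)
keep_as_char <- "somename"   # hypothetical leading-zero column(s)
for (nm in setdiff(names(x_1), keep_as_char)) {
  x_1[[nm]] <- type.convert(x_1[[nm]], as.is = TRUE)   # as.is=TRUE keeps character, not factor
}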
Let x_1 be the result data.table from your SQL query. Then you can convert numeric columns (e.g. value) to formatted strings using sprintf to get leading zeros:
library(data.table)
x_1 <- data.table(value = c(1,12,123,1234))
x_1
#> value
#> 1: 1
#> 2: 12
#> 3: 123
#> 4: 1234
x_1$value <- x_1$value |> sprintf(fmt = "%04d")
x_1
#> value
#> 1: 0001
#> 2: 0012
#> 3: 0123
#> 4: 1234
Created on 2021-10-08 by the reprex package (v2.0.1)
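An equivalent using formatC, assuming the same four-digit target width:
x_1$value <- formatC(x_1$value, width = 4, format = "d", flag = "0")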
I am using read.csv.sql to conditionally read in data (my data set is extremely large, so this was the solution I chose to filter it and reduce it in size prior to reading it in). I was running into memory issues when reading in the full data and then filtering it, so it is important that I use the conditional read so that only the subset is read in, rather than the full data set.
Here is a small data set so my problem can be reproduced:
write.csv(iris, "iris.csv", row.names = F)
library(sqldf)
csvFile <- "iris.csv"
I am finding that the notation you have to use with read.csv.sql is extremely awkward. The following is how I am reading in the file:
# Step 1 (Assume these values are coming from UI)
spec <- 'setosa'
petwd <- 0.2
# Add quotes and make comma-separated:
spec <- toString(sprintf("'%s'", spec))
petwd <- toString(sprintf("'%s'", petwd))
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile,
                     sql = 'select * from file where
                            "Species" in ($spec)
                            and "Petal.Width" in ($petwd)',
                     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
My main problem is that if any of the values above (from the UI) are null, the data won't be read in properly, because this chunk of code is all hard-coded.
I would like to change this so that Step 1 checks which values are null and skips filtering on them, and then read.csv.sql filters on the corresponding columns for all non-null values.
Note: I am reusing the code from this similar question within this question.
UPDATE
I want to clear up what I am asking. This is what I am trying to do:
If a field, say spec comes through as NA (meaning the user did not pick input) then I want it to filter as such (default to spec == EVERY SPEC):
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile,
                     sql = 'select * from file where
                            "Petal.Width" in ($petwd)',
                     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
Since spec is NA, trying to filter/read in the file matching spec == NA reads in an empty data set (there are no NA values in my data), hence breaking the code and program. Hope this clears it up.
There are several problems:
some of the simplifications provided in the link in the question were not followed.
spec is a scalar so one can just use '$spec'
petwd is a numeric scalar and SQL does not require quotes around numbers so just use $petwd
the question states you want to handle empty fields but not how, so we have used csvfix to map them to -1 and also strip off quotes. (Alternatively, let them come through and handle them in R: empty numeric fields will come through as 0 and empty character fields as zero-length character fields.)
you can use [...] in place of "..." in SQL
The code below worked for me in both Windows and Ubuntu Linux with the bash shell.
library(sqldf)
spec <- 'setosa'
petwd <- 0.2
d <- fn$read.csv.sql(
"iris.csv",
sql = "select * from file where [Species] = '$spec' and [Petal.Width] = $petwd",
verbose = TRUE,
filter = 'csvfix map -smq -fv "" -tv -1'
)
Update
Regarding the update at the end of the question, it was clarified that the NA could be in spec (as opposed to being in the data being read in), and that if spec is NA then the condition involving spec should be regarded as TRUE. In that case, just expand the SQL where condition to handle that, as follows.
spec <- NA
petwd <- 0.2
d <- fn$read.csv.sql(
"iris.csv",
sql = "select * from file
where ('$spec' == 'NA' or [Species] = '$spec') and [Petal.Width] = $petwd",
verbose = TRUE,
filter = 'csvfix echo -smq'
)
The above will return all rows for which Petal.Width is 0.2.
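An alternative sketch (not from the original answer) is to assemble the WHERE clause in R, dropping any NA filter before the query string is built:
## keep only the conditions whose inputs are not NA
conds <- c(
  if (!is.na(spec))  sprintf("[Species] = '%s'", spec),
  if (!is.na(petwd)) sprintf("[Petal.Width] = %s", petwd)
)
where <- if (length(conds) > 0) paste("where", paste(conds, collapse = " and ")) else ""
d <- read.csv.sql("iris.csv",
                  sql = paste("select * from file", where),
                  filter = 'csvfix echo -smq')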
I am trying to read a big csv file. In fact, I want to select a subset using a particular column whose name is Race Color. Reading the file via read.csv, I have the head:
library(sqldf)
df <- read.csv(file = 'df.txt', header = T, sep = ";")
head(df)
id Region Race Color ....
 1      1          1
 2      1          1
 3      2          1
 4      3          2
 5      4          1
 6      4          1
I would like to use read.csv.sql to select a subset of df without using read.csv. For example, I want all the people with Race Color equal to 1.
Using read.csv.sql, I have something like
>df <- read.csv.sql("df.txt", sql = "select * from file where Race Color = 1", sep=";", header=T, eol="\n")
but I get the following error:
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: near "Color": syntax error
Trying
>df <- read.csv.sql("df.txt", sql = "select * from file where 'Race Color' = 1", sep=";", header=T, eol="\n")
I get a df with zero rows.
Any solution?
R automatically adds a . to column names containing a space when reading in the data, producing Race.Color, but a . has a special meaning in SQL, so that will screw things up.
sqldf has a built-in way to name such columns explicitly using square brackets ([Race.Color]), so we don't run into that problem. You can also use escaped double quotes: \"Race.Color\"
This should work:
library(sqldf)
read.csv.sql("test.csv", sql = "select * from file where [Race.Color] = 1", sep=";", header=T, eol="\n")
A few posters have asked similar questions on here, and these have taken me 80% of the way toward reading text files containing SQL queries into R to use as input to RODBC:
Import multiline SQL query to single string
RODBC Temporary Table Issue when connecting to MS SQL Server
However, my SQL files have quite a few comments in them (as -- comment on this and that). My question is: how would one go about either stripping comment lines from the query on import, or making sure that the resulting string keeps line breaks, thus not appending actual query text to comments?
For example, query6.sql:
--query 6
select a6.column1,
a6.column2,
count(a6.column3) as counts
--count the number of occurences in table 1
from data.table a6
group by a6.column1
becomes:
sqlStr <- gsub("\t","", paste(readLines(file('SQL/query6.sql', 'r')), collapse = ' '))
sqlStr
"--query 6select a6.column1, a6.column2, count(a6.column3) as counts --count the number of occurences in table 1from data.table a6 group by a6.column1"
when read into R.
Are you sure you can't just use it as is? This works despite taking up multiple lines and having a comment:
> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
This works too:
> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
I had trouble with the other answer, so I modified Roman's and made a little function. This has worked for all my test cases, including multiple comments, single-line and partial-line comments.
read.sql <- function(filename, silent = TRUE) {
q <- readLines(filename, warn = !silent)
q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
q <- sub(pattern = "--.*", replacement="", x = q) # remove midline comments
q <- paste(q, collapse = " ")
return(q)
}
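Usage might look like this (a sketch: SQL/query6.sql and myDB are the query file and RODBC connection from earlier in the thread):
sqlStr <- read.sql("SQL/query6.sql")
res <- sqlQuery(myDB, sqlStr)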
Summary
Function clean_query:
Removes all mixed comments
Creates single string output
Takes a SQL path or text string
Is simple
Function
require(tidyverse)
# pass in either a text query or path to a sql file
clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){
# if sql path, read, otherwise assume text input
if( str_detect(text_or_path, "(?i)\\.sql$") ){
text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")
}
# echo original query to the console
# (unnecessary, but helpful for status if passing sequential queries to a db)
cat("\nThe query you're processing is: \n", text_or_path, "\n\n")
# return
text_or_path %>%
# remove all demarked /* */ sql comments
gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>%
# remove all demarked -- comments
gsub(pattern = '--[^\r\n]*', replacement = ' ') %>%
# remove everything after the query-end semicolon
gsub(pattern = ';.*', replacement = ' ') %>%
#remove any line break, tab, etc.
gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%
# remove extra whitespace
gsub(pattern = ' +', replacement = ' ')
}
You could attach regexps together if you want incomprehensibly long expressions, but I recommend readable code.
Output for "query6.sql"
[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "
Additional Text Input Example
query <- "
/* this query has
intentionally messy
comments
*/
Select
COL_A -- with a comment here
,COL_B
,COL_C
FROM
-- and some helpful comment here
Database.Datatable
;
-- or wherever
/* and some more comments here */
"
Call function:
clean_query(query)
Output:
[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "
If you want to test reading from a .sql file:
temp_path <- path.expand("~/query.sql")
cat(query, file = temp_path)
clean_query(temp_path)
file.remove(temp_path)
Something like this?
> cat("--query 6
+ select a6.column1,
+ a6.column2,
+ count(a6.column3) as counts
+ --count the number of occurences in table 1
+ from data.table a6
+ group by a6.column1", file = "query6.sql")
>
> my.q <- readLines("query6.sql")
Warning message:
In readLines("query6.sql") : incomplete final line found on 'query6.sql'
> my.q
[1] "--query 6" "select a6.column1, "
[3] "a6.column2," "count(a6.column3) as counts"
[5] "--count the number of occurences in table 1 " "from data.table a6"
[7] "group by a6.column1"
> find.com <- grepl("--", my.q)
>
> my.q <- my.q[!find.com]
> paste(my.q, collapse = " ")
[1] "select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1"
>
> unlink("query6.sql")
> rm(list = ls())
Had to solve a similar problem lately using another language, and I still find R easier to implement:
readSQLFile <- function(fname, retainNewLines=FALSE) {
lines <- readLines(fname)
#remove -- type comments
lines <- vapply(lines, function(x) {
#handle /* -- */ type comments
if (grepl("/\\*(.*)--", x))
return(x)
strsplit(x,"--")[[1]][1]
}, character(1))
#remove /* */ type comments
sqlstr <- paste(lines, collapse=ifelse(retainNewLines, "&&&&&&&&&&" , " "))
sqlstr <- gsub("/\\*(.|\n)*?\\*/","",sqlstr)
if (retainNewLines) {
sqlstr <- strsplit(sqlstr, "&&&&&&&&&&")[[1]]
sqlstr <- sqlstr[sqlstr!=""]
}
sqlstr
} #readSQLFile
#example
fname <- tempfile("sql",fileext=".sql")
cat("--query 6
select a6.column1, --trailing comments
a6.column2, ---test triple -
count(a6.column3) as counts, --/* funny comment */
a6.column3 - a6.column4 ---test single -
/*count the number of occurences in table 1;
test another comment style
*/
from data.table a6 /* --1st weirdo comment */
/* --2nd weirdo comment */
group by a6.column1\n", file=fname)
#remove new lines
readSQLFile(fname)
#retain new lines
readSQLFile(fname, TRUE)
unlink(fname)
It's possible to use readChar() instead of readLines(). I had an ongoing issue with mixed commenting (-- or /* */), and this has always worked well for me.
sql <- readChar(path.to.file, file.size(path.to.file))
query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
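Wrapped up as a tiny helper along the same lines (a sketch; the path refers to the hypothetical query file from earlier in the thread):
read_sql_file <- function(path) readChar(path, file.size(path))
sql <- read_sql_file("SQL/query6.sql")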