pass a column name correctly as a function argument for SQL query in R - sql

I have a datafile sales_history. I want to query it in the following way.
my_df<-sqldf("SELECT *
FROM sales_history
WHERE Business_Unit=='RETAIL'"")
Now I want to write a function with argument datafile and column name to do the above job. So something like:
pick_column<-function(df, column_name){
my_df<-sqldf("SELECT *
FROM df
WHERE Business_Unit==column_name"
return(my_df)
}
Ideally, after running the above function definition, I should then be able to run
pick_column(sales_history,'RETAIL'). But when I do this, the second argument 'RETAIL' is not passed to the function correctly. What's the correct way to do this then?
I know that for this example, there are other ways to do this other than using "sqldf" for SQL query. But the point of my question here is how to pass the column_name correctly as a function argument.

the sqldf package uses gsubfn to allow you to add names of R variables into your SQL commands by prefixing them with the "$" character. So you can write
sales_history <- data.frame(
price=c(12,10),
Business_Unit=c("RETAIL","BUSINESS"),
stringsAsFactors=F
)
pick_column <- function(df, columnname) {
fn$sqldf("SELECT * FROM $df WHERE Business_Unit='$columnname'")
}
pick_column("sales_history","RETAIL")

Related

R: Summary of SQL Tables

I am working with the R programming language.
Normally, when I want to get the summary of a table, I can use something like the "str()" function or the "summary()" function:
str(my_table)
summary(my_table)
However, now I am trying to do this with tables on a server.
For instance, I am trying to get the summaries of variable types for a specific table (e.g. "my_table") on a server. I found a very indirect way to do this:
#load libraries
library(OBDC)
library(RODBC)
library(dbi)
#establish a connection and name it as "dbhandle"
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
dbColumnInfo(rs)
My Question: Is there a more "direct" way to do this? For example, can I get information about each column (e.g. whether the column is integer, character, date, etc.) in a table without first sending the query and then requesting the information? Can I do this directly?
Thanks!
You could try using fetch() from "RMySQL" to turn your SQL query into an R object (e.g. data frame)
library(RMySQL)
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
# Get the results from MySQL into R
my_table = fetch(rs, n=-1)
# clear result
dbClearResult(rs)
rm(rs)
Then use the functions you describe.
str(my_table)
summary(my_table)

Rmarkdown - Use table name as variable in dynamic sql chunk?

I need to execute an SQL-engine chunk in my Rmarkdown, where the table which is queried has a dynamic name, defined by R code.
I know that linking variables to the current R-environment is doable by using ?, but this works only for strings and numerics, not for "objects".
Of course I could just run the SQL query with DBI::dbGetQuery() but this would imply building all my request (which is very long) as a string which is not comfortable (I have many chunks to run).
Basically what I would need is :
`` {r}
mytable <- "name_of_table_on_sql_server"
``
then
`` {sql}
select * from ?mytable
``
This fails because the created query is select * from "name_of_table_on_sql_server" where SQL would need select * from name_of_table_on_sql_server (without quotes).
Using glue for defining mytable as mytable <- glue("name_of_table_on_sql_server") is not working neither.
Any idea ?
A slight variant on what you posted works for me (I don't have SQL Server so I tested with sqlite):
`` {r}
library(glue)
mytable <- glue_sql("name_of_table_on_sql_server")
``
then
`` {sql}
select * from ?mytable;
``
My only real changes were to use the function glue_sql and add a semicolon (;) to the end of the SQL chunk.

Putting output from sql query into another query using R environment

I am wondering what approach should have been selected to perform action from title. I am using ODBC connection and what I get from first sql query are like 40-50 rows in one column. What I want is to put this output as a values in to search for.
How should i treat this? Like a array or separated variables? I still do not know R well so just need to know where to search for.
Regards
------more explanation below----
I have list of 40-50 numbers of 10 digits each, organized in a column.
I am trying to do this:
list <- c(my_input)
sql_in <- paste0(list, collapse="")
and characters are organized like this after this operations:
'c(1234567890, , 1234567890, 1234567890)'
and almost all looks fine and fit into my query besides additional c character at the beginning and missing apostrophes.I try to use gsub function but did not work in way I want.
You may likely do this in one SQL call using a subquery. Notice in the call below that the result of
SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4)
Is passed to the WHERE clause of the primary query. This is perfectly valid and will allow your query to execute entirely in SQL without having to do any intermediate steps in R.
(I use sqldf for simplicity of illustration, but this should work through just about any ODBC connection)
library(sqldf)
Gear <- data.frame(n_gear = 1:5)
sqldf(
"SELECT mpg, qsec, gear, wt
FROM mtcars
WHERE gear IN (SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4))"
)
Try something like this:
list<-c("try","this") #The output from your first query
sql_in<-paste0(list, collapse="','")
The Output
paste("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
[1] "select * from table where table.var in ('try','this')"
If yuo have space as first or last element of the string you can use this code:
`list<-c(" first element is a space","try","this","last element is a space ")` #The output from your first query
Find space at first or last character
first_space<-substr(list, start = 1, stop = 1)==" "
last_space<-substr(list, start = nchar(list), stop = nchar(list))==" "
Remove spaces
list[first_space]<-substr(list[first_space], start = 2, stop = nchar(list[first_space]))
list[last_space]<-substr(list[last_space], start = 1, stop = nchar(list[last_space])-1)
sql_in<-paste0(list, collapse="','")
Your output
paste0("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
"select * from table where table.var in ('first element is a space','try','this','last element is a space')"
I think You are expecting some thing like shown below code,
data <- dbGetQuery(con, "select column from yourfirsttable")
list <- paste(data$column, collapse="','")
result <- dbGetQuery(con, statement = sprintf("select * from yourresulttable where inv in ('%s')",list))
It's not entirely clear exactly what you're wanting to achieve here. For example, one use case just means you can do it all with a join. But I have cases where I don't know the values for the test without doing some computation. Then I do a separate query having created a query string thus:
> id <- 1:5
> paste0("SELECT * FROM table WHERE ID IN (", paste0(id, collapse = ","), ")")
[1] "SELECT * FROM table WHERE ID IN (1,2,3,4,5)"

IPython SQL Magic - Generate Query String Programmatically

I'm generating SQL programmatically so that, based on certain parameters, the query that needs to be executed could be different (i.e., tables used, unions, etc). How can I insert a string like this: "select * from table", into a %%sql block? I know that using :variable inserts variable into the %%sql block, but it does so as a string, rather than sql code.
The answer was staring me in the face:
query="""
select
*
from
sometable
"""
%sql $query
If you want to templatize your queries, you can use string.Template:
from string import Template
template = Template("""
SELECT *
FROM my_data
LIMIT $limit
""")
limit_one = template.substitute(limit=1)
limit_two = template.substitute(limit=2)
%sql $limit_one
Source: JupySQL documentation.
Important: If you use this approach, ensure you trust/sanitize the input!

RODBC Multiple Inputs from Shiny

I have a Shiny app that has a checkbox group input. The user can select multiple inputs. I also have an ODBC connection linked to a database. The process would be that when a user selects items from the check box group, that user input would be part of a string in the sql query to filter the data.
UI.R (partial to show example)
checkboxGroupInput('Type', 'Type', c(
"AX"="AX",
"AY"="AY",
"AZ"="AZ",
"BGB"="BGB",
"BT"="BT",
"BX"="BX",
"BXT"="BXT",
"C"="C",
"CNT"="CNT")),
The column in the table where the "Type" information is in is called COMPONENT, so my sql query using RODBC is
data <- odbcConnect("database", uid="username", pwd="password")
query <- (SELECT ID, NAME, TYPE FROM COMPONENT WHERE TYPE LIKE Input$Type)
df <- odbcQuery(data, query)
The query line would not work, but I have no idea how to take multiple inputs and place them properly in the query. Also, there is an added level of complexity that I am not sure how to handle. The data in the database is alpha numeric, so instead of AX, it might be listed as AX14 or AX 71. Also, because there are some one letter types, using a wildcard seems a little difficult.
To answer your initial question regarding "multiple inputs in the query", I use concatenation to achieve this.
Using paste0(), I write something as follows:
type = "AX14"
myQuery <- paste0("Select variable1, variable2 from my_table where type like ",type)
myQuery
[1] "Select variable1, variable2 from my_table where type like AX14"
You can add little things like single quotes or wildcard operators as follows:
myQuery <- paste0("Select variable1, variable2 from my_table where type like '%",type,"%'")
myQuery
[1] "Select variable1, variable2 from my_table where type like '%AX14%'"
Then proceed with actually running the query:
df <- odbcQuery(data, myQuery)