I have a Shiny app where the user can filter a SQL database of movies. So far, you can only filter by country.
library(shiny)
library(DBI)

con <- dbConnect(RSQLite::SQLite(), 'Movies.db')
movies_data <- dbReadTable(con, 'Movies')

ui <- fluidPage(
  fluidRow(
    selectInput(
      inputId = "country",
      label = "Country:",
      choices = unique(movies_data$country),
      multiple = TRUE
    ),
    br(),
    fluidRow(
      width = "100%",
      dataTableOutput("table")
    )
  )
)
server <- function(input, output, session) {
  output$table <- renderDataTable({
    dbGetQuery(
      conn = con,
      statement = 'SELECT * FROM movies WHERE country IN ( ? )',
      params = list(input$country)
    )
  })
}
shinyApp(ui = ui, server = server)
Now I want to give the user more filters, for example actor or genre. All filters are multi-select and optional. How can I build the statement dynamically? Would I use some switch statement for every possible combination (i.e. no filter on country but only action movies)? That seems a bit exhausting to me.
First off, you say the filter is optional but I see no way to disable it in your code. I'm assuming that deselecting all options is your way of disabling the filter, or at least that it's intended to work that way. If all options are selected for any filter, then the current approach should work fine, and will just show all films.
You can probably just construct the overall query piece by piece, and then paste it all together at the end.
Base query: 'SELECT * FROM movies'
Country filter: 'country IN (...)' built from input$country
Actor filter: 'actor IN (...)' built from input$actor
Genre filter: 'genre IN (...)' built from input$genre
Then you put it all together with paste.
To summarize: Base query. Then, if any of the filters are active, add a WHERE. Join all filters together, separating by AND. Pass the final query in as a direct string.
You can even put the filters into a list for easier parsing.
# Here, filterList is a list containing input$country, input$actor, input$genre
# and filterNames contains the corresponding column names in the database
# e.g. filterList <- list("c1", list("a1", "a2"), "g1")
# filterNames <- list("c", "a", "g")

baseQuery <- "SELECT * FROM movies"

# Check which filters are active, i.e. have at least one selected value
# NOTE: If you have a different selection available for None,
# just modify this check accordingly
active <- sapply(filterList, length) > 0

if(any(active))
{
  baseQuery <- paste(baseQuery, "WHERE")
  # Keep only the active filters and their column names, so that an empty
  # filter never produces an invalid "x IN ()" clause
  filterList <- filterList[active]
  filterNames <- filterNames[active]
  # Quote each value and collapse multiselects for a filter into a single
  # comma-separated string, e.g. "'a1', 'a2'"
  filterList <- sapply(filterList, function(v)
    paste(sprintf("'%s'", unlist(v)), collapse = ", "))
  # Now construct one "column IN (...)" clause per active filter
  filterList <- sapply(seq_along(filterList), function(x)
    paste0(filterNames[x], " IN (", filterList[x], ")"))
  # Paste the clauses together
  filterList <- paste(filterList, collapse = " AND ")
  baseQuery <- paste(baseQuery, filterList)
}

# Final output using the sample input above:
# "SELECT * FROM movies WHERE c IN ('c1') AND a IN ('a1', 'a2') AND g IN ('g1')"
Now use baseQuery as the direct query statement
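For completeness, here is a minimal sketch of how this could plug into the Shiny server function, assuming database columns named country, actor and genre with matching selectInputs for each:

server <- function(input, output, session) {
  output$table <- renderDataTable({
    filterList  <- list(input$country, input$actor, input$genre)
    filterNames <- list("country", "actor", "genre")
    query <- "SELECT * FROM movies"
    active <- sapply(filterList, length) > 0
    if (any(active)) {
      # One "column IN (...)" clause per active filter, with quoted values
      clauses <- mapply(function(vals, col)
        paste0(col, " IN (", paste(sprintf("'%s'", vals), collapse = ", "), ")"),
        filterList[active], filterNames[active])
      query <- paste(query, "WHERE", paste(clauses, collapse = " AND "))
    }
    dbGetQuery(con, query)
  })
}

One caveat: because the selected values are pasted straight into the SQL string, this is only safe as long as the choices come from the database itself; for free-text inputs you would want parameterized queries instead.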
Related
I am trying to pass along values from a Shiny input parameter into a valueBox. I have seen versions of this, however I cannot seem to get it to work. Below is a sample of what I am trying to do. I would like to have the selections from the input field passed along to the valueBox, and the value box returns another column (the sum is fine, or just the value of the column). It's hard with the mpg data because there is not a one-to-one relationship between manufacturer and the city column, but in my dataset one entry in the drop-down aligns with one entry in another table. I cannot seem to pass the input list along to the valueBox.
library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  dashboardHeader(title = "Dynamic boxes"),
  dashboardSidebar(
    selectInput('column', 'Column:', mpg$manufacturer)
  ),
  dashboardBody(
    valueBoxOutput("vbox")
  )
)

server <- function(input, output) {
  output$vbox <- renderValueBox({
    valueBox(
      paste('Sum', input$column),
      sum(mpg$cty[[input$column]])
    )
  })
}

shinyApp(ui, server)
Is this what you're looking for?
library(shiny)
library(shinydashboard)
library(ggplot2)

ui <- dashboardPage(
  dashboardHeader(title = "Dynamic boxes"),
  dashboardSidebar(
    selectInput('column', 'Column:', unique(mpg$manufacturer))
  ),
  dashboardBody(
    valueBoxOutput("vbox")
  )
)

server <- function(input, output) {
  output$vbox <- renderValueBox({
    valueBox(
      paste('Sum', input$column),
      sum(mpg$cty[mpg$manufacturer %in% input$column])
    )
  })
}

shinyApp(ui, server)
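The key change is the subsetting: mpg$cty[[input$column]] tries to index the cty vector by the manufacturer name, which is not what you want, while the logical subset picks out the cty values on the rows whose manufacturer matches. A quick check outside Shiny, with a made-up selection of "audi":

library(ggplot2)
# Logical subset: cty values on rows where the manufacturer matches the selection
sum(mpg$cty[mpg$manufacturer %in% "audi"])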
I am using read.csv.sql to conditionally read in data (my data set is extremely large, so I chose this approach to filter it and reduce its size before reading it in). I was running into memory issues when reading in the full data and then filtering it, which is why it is important that I use the conditional read so that only the subset is read in, rather than the full data set.
Here is a small data set so my problem can be reproduced:
write.csv(iris, "iris.csv", row.names = F)
library(sqldf)
csvFile <- "iris.csv"
I am finding the notation you have to use with read.csv.sql extremely awkward. The following is how I am reading in the file:
# Step 1 (Assume these values are coming from UI)
spec <- 'setosa'
petwd <- 0.2

# Add quotes and make comma-separated:
spec <- toString(sprintf("'%s'", spec))
petwd <- toString(sprintf("'%s'", petwd))

# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile,
                     sql = 'select * from file where
                            "Species" in ($spec)
                            and "Petal.Width" in ($petwd)',
                     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
My main problem is that if any of the values above (from UI) are null then it won't read in the data properly, because this chunk of code is all hard coded.
I would like to change this into: Step 1 - check which values are null and do not filter off of them, then filter using read.csv.sql for all non-null values on corresponding columns.
Note: I am reusing the code from this similar question within this question.
UPDATE
I want to clear up what I am asking. This is what I am trying to do:
If a field, say spec, comes through as NA (meaning the user did not pick an input), then I want the filter to default to matching every species:
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile,
                     sql = 'select * from file where
                            "Petal.Width" in ($petwd)',
                     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
Since spec is NA, trying to filter/read in the file matching spec == NA reads in an empty data set (there are no NA values in my data), which breaks the program. Hope this clears it up.
There are several problems:
- some of the simplifications provided in the link in the question were not followed
- spec is a scalar, so one can just use '$spec'
- petwd is a numeric scalar, and SQL does not require quotes around numbers, so just use $petwd
- the question states you want to handle empty fields but not how, so we have used csvfix to map them to -1 and also strip off quotes (alternately, let them through and handle it in R: empty numerics come through as 0 and empty character fields as zero-length character fields)
- you can use [...] in place of "..." in SQL
The code below worked for me in both Windows and Ubuntu Linux with the bash shell.
library(sqldf)
spec <- 'setosa'
petwd <- 0.2
d <- fn$read.csv.sql(
  "iris.csv",
  sql = "select * from file where [Species] = '$spec' and [Petal.Width] = $petwd",
  verbose = TRUE,
  filter = 'csvfix map -smq -fv "" -tv -1'
)
Update
Regarding the update at the end of the question, it was clarified that the NA could be in spec, as opposed to being in the data being read in, and that if spec is NA then the condition involving spec should be regarded as TRUE. In that case, just expand the SQL where condition to handle that, as follows.
spec <- NA
petwd <- 0.2
d <- fn$read.csv.sql(
  "iris.csv",
  sql = "select * from file
         where ('$spec' == 'NA' or [Species] = '$spec') and [Petal.Width] = $petwd",
  verbose = TRUE,
  filter = 'csvfix echo -smq'
)
The above will return all rows for which Petal.Width is 0.2.
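If more optional filters are added later, the same idea can also be handled on the R side by assembling the where clause from the non-NA inputs only, and then handing the finished string to read.csv.sql. A rough sketch under the same assumptions as above (character filters quoted, numeric ones not, csvfix available):

library(sqldf)

spec  <- NA      # user made no selection
petwd <- 0.2

# Build one condition per non-NA input; NULLs from the if() calls are dropped by c()
conds <- c(
  if (!is.na(spec))  sprintf("[Species] = '%s'", spec),
  if (!is.na(petwd)) sprintf("[Petal.Width] = %s", petwd)
)
where <- if (length(conds)) paste("where", paste(conds, collapse = " and ")) else ""

d <- read.csv.sql("iris.csv",
                  sql = paste("select * from file", where),
                  filter = 'csvfix echo -smq')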
I'm using RJDBC in RStudio to pull a set of data from an Oracle database into R.
After loading the RJDBC package I have the following lines:
drv = JDBC("oracle.jdbc.OracleDriver", classPath="C:/R/ojdbc7.jar", identifier.quote = " ")
conn = dbConnect(drv,"jdbc:oracle:thin:#private_server_info", "804301", "password")
rs = dbSendQuery(conn, statement= paste("LONG SQL QUERY TO SELECT REQUIRED DATA INCLUDING REQUEST FOR VARIABLE x"))
masterdata = fetch(rs, n = -1) # extract all rows
Run as part of the usual script, these lines always execute without fail; they can sometimes take a few minutes depending on variable x, e.g. they may result in 100K or 1M rows being pulled. masterdata returns everything in a data frame.
I'm now trying to place all of the above into a function with one required argument, variable x, which is a text argument (a city name); this input, however, is also part of the LONG SQL QUERY.
The function I wrote called Data_Grab is as follows:
Data_Grab = function(x) {
  drv = JDBC("oracle.jdbc.OracleDriver", classPath="C:/R/ojdbc7.jar", identifier.quote = " ")
  conn = dbConnect(drv, "jdbc:oracle:thin:#private_server_info", "804301", "password")
  rs = dbSendQuery(conn, statement = paste("LONG SQL QUERY TO SELECT REQUIRED DATA,
                                            INCLUDING REQUEST FOR VARIABLE x"))
  masterdata = fetch(rs, n = -1) # extract all rows
  return(masterdata)
}
My function appears to execute in seconds (no error is produced) however I get just the 21 column headings for the dataframe and the line
<0 rows> (or 0-length row.names)
Not sure what is wrong here; I would expect the function to still take minutes to execute, since the data being pulled is large, but no actual data frame is being returned.
Help is appreciated!
If you want to parameterize your query to a JDBC database, try also using the gsubfn package. The code might look like this:
library(gsubfn)
library(RJDBC)

Data_Grab = function(x) {
  rd1 = x
  df <- fn$dbGetQuery(conn, "SELECT BLAH1, BLAH2
                             FROM TABLENAME
                             WHERE BLAH1 = '$rd1'")
  return(df)
}
Basically, you need to put a $ before the name of the variable that stores the parameter you wish to pass.
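A hypothetical call, assuming conn has already been created as in the question and that "Chicago" is a valid city value:

masterdata <- Data_Grab("Chicago")
head(masterdata)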
I'm a beginner to R from a SAS background trying to do a basic "case when" match on two tables to get a flag where I have and have not found a match. Please see the SAS code I have in mind below. I just need something analogous to this in R. Thanks in advance.
proc sql;
  create table x as
  select
    a.*,
    b.*,
    case when a.first_column = b.column_first and
              a.second_column = b.column_second
         then 1 else 0 end as matched_flag
  from table1 as a
  left join table2 as b
    on a.first_column = b.column_first and a.second_column = b.column_second;
quit;
I'm not familiar with SAS, but I think I understand what you are trying to do. To see how many rows/columns are similar between two tables, you can use %in% and the length function.
For example, initialize two matrices of different dimensions and given them similar row names and column names:
mat.a <- matrix(1, nrow=3, ncol = 2)
mat.b <- matrix(1, nrow=2, ncol = 3)
rownames(mat.a) <- c('a','b','c')
rownames(mat.b) <- c('a','d')
colnames(mat.a) <- c('g','h')
colnames(mat.b) <- c('h','i')
mat.a and mat.b now exist with different row and column names. To match the rows by names, you can use:
row.match <- rownames(mat.a)[rownames(mat.a) %in% rownames(mat.b)]
num.row.match <- length(row.match)
Note that row.match can now be used to index into both of the matrices. The %in% operator returns a logical of the same length of the first argument (in this case, rownames(mat.a)) that indicates if the ith element of the first argument was found anywhere in the elements of the second argument. This nature of %in% means that you have to be sensitive to how you order the arguments for your indexing.
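For example, with the matrices above, swapping the arguments changes both the length and the meaning of the result:

rownames(mat.a) %in% rownames(mat.b)   # TRUE FALSE FALSE, along mat.a's rows
rownames(mat.b) %in% rownames(mat.a)   # TRUE FALSE, along mat.b's rows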
If you simply want to quantify how many rows or columns are the same between the two matrices, then you can use the sum function with the %in% operator:
sum(rownames(mat.a) %in% rownames(mat.b))
With the sum function used like this, you do not need to be sensitive to how you order the arguments, because the number of row names of mat.a in row names of mat.b is equivalent to the number of row names of mat.b in row names of mat.a. That is to say that this usage of %in% is commutative.
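With the matrices above, both orderings count the single shared row name 'a':

sum(rownames(mat.a) %in% rownames(mat.b))   # 1
sum(rownames(mat.b) %in% rownames(mat.a))   # 1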
I hope this helps!
You will want to use data frame objects. These are like data sets in SAS. You can use cbind to put two data frame objects together side by side. Then you can select rows based on conditions and set the flag accordingly. In the code below you will see that I did this twice: once to set the 1 flag and once to set the 0 flag.
To select the rows where all fields match you can do something similar, but instead of assigning a new column you can assign all the results back to the name of the table you are working on.
Here's the code:
# make up example a and b data frames
table1 <- data.frame(list(a.first_column=c(1,2,3),a.second_column=c(4,5,6)))
table2 <- data.frame(list(b.first_column=c(1,3,6),b.second_column=c(4,5,9)))
# Combine columns (horizontally)
x <- cbind(table1, table2)
print("Combined Data Frames")
print(x)
# create matched flag (1 when the first columns match)
x$matched_flag[x$a.first_column==x$b.first_column] <- 1
x$matched_flag[!x$a.first_column==x$b.first_column] <- 0
# only select records that match both data frames
x <- x[x$a.first_column==x$b.first_column & x$a.second_column==x$b.second_column,]
print("Matched Data Frames")
print(x)
BTW: since you are used to using SQL, you might want to try the sqldf package in R. It will let you use the same techniques that you are used to but in R and on data frames.
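For instance, here is a rough sketch of the SAS query translated almost literally with sqldf; it assumes plain column names (first_column/second_column on table1, column_first/column_second on table2) rather than the prefixed names used in the example above:

library(sqldf)

table1 <- data.frame(first_column = c(1, 2, 3), second_column = c(4, 5, 6))
table2 <- data.frame(column_first = c(1, 3, 6), column_second = c(4, 5, 9))

x <- sqldf("select a.*, b.*,
              case when a.first_column = b.column_first
                    and a.second_column = b.column_second
                   then 1 else 0 end as matched_flag
            from table1 a
            left join table2 b
              on  a.first_column = b.column_first
              and a.second_column = b.column_second")

Rows from table1 with no match come back with NAs in the table2 columns and a matched_flag of 0, just as in the SAS left join.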
I am new to the stringr package. In my df I have a column named Sentences which contains a single sentence in each row. Now I want to find the position of a word, plus the 3 words before and after it.
For example:
string <- "We have a three step process to validate
claims data we use in our analysis."
If we search for the word validate, it should return 8, and the words 'step', 'process', 'to', 'claims', 'data', 'we'. I tried str_match and str_extract.
Use strsplit and grep:
myString <- "We have a three step process to validate claims data we use in our analysis."
# Split the string into individual words
splitString <- strsplit(myString, " ")[[1]]
# Find the location of the word of interest
loc <- grep("validate", splitString)
# Subset as you normally would
splitString[(loc-3):(loc+3)]
# [1] "step" "process" "to" "validate" "claims" "data" "we"
Update
If you have multiple strings in a vector, you can try something like the following. I've modified it a bit to be on the safer side and not try to extract non-existent positions.
words <- c("How data is Validated?",
           "We have a three step process to validate claims data we use in our analysis.",
           "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region")
x <- strsplit(words, " ")
lapply(x, function(y) {
  len <- length(y)
  locs <- grep("validate", y, ignore.case = TRUE)
  min <- ifelse((locs - 3) <= 0, 1, locs - 3)
  max <- ifelse((locs + 3) >= len, len, locs + 3)
  y[min:max]
})
# [[1]]
# [1] "How" "data" "is" "Validated?"
#
# [[2]]
# [1] "step" "process" "to" "validate" "claims" "data" "we"
#
# [[3]]
# [1] "Sample" "Validate:" "Since" "No" "One"
The result, as you can see, is a list of vectors.
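Since the question mentions stringr, here is a sketch of the same window logic using that package. context_words is a made-up helper name, and it assumes the search word occurs at most once per sentence:

library(stringr)

context_words <- function(sentence, word, n = 3) {
  # Tokenize on spaces, then locate the matching word (case-insensitive)
  tokens <- str_split(sentence, " ")[[1]]
  loc <- str_which(tokens, regex(word, ignore_case = TRUE))
  tokens[max(loc - n, 1):min(loc + n, length(tokens))]
}

context_words(myString, "validate")
# Same "step" ... "we" window as the strsplit/grep version above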