R readline prompt to use in SQL script within R - sql

I am running a sql script within R. There is a date filter within the script and right now it is hardcoded in. Using the extract I run some analysis, etc. My final goal is to turn this script into Shiny.
I want to be able to make the date filter a prompt using readline(). Does anyone know if I can insert the date into the SQL script using that readline() output?
For example:
Readline asks Start date?
Input as 2020-10-01 and gets set as X
The sql code reads:
SELECT * from database
WHERE DATE= 'X'
Thank you!

Using Parameterised Query
As pointed out in the comments, using paste is unsafe because it leaves the system vulnerable to exploits such as SQL injection attacks. This can be mitigated by using parameterised queries.
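# Get input
r <- readline("Input Date: ")
# Parameterised query (the ? placeholder syntax is backend-specific);
# conn_string is an open DBI connection object
sql <- "SELECT * FROM database WHERE DATE = ?"
# Send the parameterised query
query <- DBI::dbSendQuery(conn_string, sql)
# Bind the parameter
DBI::dbBind(query, list(r))
# Fetch the result
DBI::dbFetch(query)
# Release the result set
DBI::dbClearResult(query)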
You can also use sqlInterpolate() if you have trouble with the parameterised query:
query <-
sqlInterpolate(conn_string,
"SELECT * FROM database WHERE DATE = ?date",
date = r
)
# Get Results
dbGetQuery(conn_string, query)
Unsafe way
You can use paste0() to process the date string using single quotes correctly.
# Get input
r <- readline("Input Date: ")
# Paste input with query string noting the use of single quotes '
query <- paste0("SELECT * FROM database WHERE DATE = '", r, "'")
# Make query
dbGetQuery(conn, query)
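Whichever approach is used, it may help to check that the typed text really is a date before it reaches the database. The sketch below is one possible way to do that in base R, assuming the expected input format is YYYY-MM-DD as in the question; parse_date_input() is a hypothetical helper name, not part of DBI.

```r
# Hypothetical helper: parse user input as an ISO date or stop with an error.
parse_date_input <- function(x) {
  # Require exactly YYYY-MM-DD; as.Date() alone would ignore trailing text
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) {
    stop("Expected a date in YYYY-MM-DD format, got: ", x)
  }
  d <- as.Date(x, format = "%Y-%m-%d")
  # as.Date() returns NA for impossible dates such as month 13
  if (is.na(d)) {
    stop("Not a valid calendar date: ", x)
  }
  d
}

# In an interactive session the argument would be readline("Input Date: ")
start_date <- parse_date_input("2020-10-01")

# Clean string form, suitable for binding as the query parameter
param <- format(start_date, "%Y-%m-%d")
```

The validated value can then be bound with dbBind() or passed to sqlInterpolate() in place of the raw readline() output. This is a complement to parameterised queries, not a replacement for them.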

Related

List of objects in SQL-Server using R [duplicate]

I'm trying to catalog the structure of a MSSQL 2008 R2 database using R/RODBC. I have set up a DSN, connected via R and used the sqlTables() command but this is only getting the 'system databases' info.
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlTables(conn1)
However if I do this:
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlQuery(conn1, 'USE my_db_1')
sqlTables(conn1)
I get the tables associated with the my_db_1 database. Is there a way to see all of the databases and tables without manually typing in a separate USE statement for each?
There may or may not be a more idiomatic way to do this directly in SQL, but we can piece together a data set of all tables from all databases (a bit more programmatically than repeated USE xyz; statements) by getting a list of databases from master..sysdatabases and passing these as the catalog argument to sqlTables, e.g.
library(RODBC)
## Open the connection
tcon <- RODBC::odbcConnect(
  dsn = "my_dsn",
  uid = "my_uid",
  pwd = "my_pwd"
)
## List every database on the server
db_list <- RODBC::sqlQuery(
  channel = tcon,
  query = "SELECT name FROM master..sysdatabases")
## List the tables in one database (here, the 14th row of db_list)
RODBC::sqlTables(
  channel = tcon,
  catalog = db_list[14, 1]
)
(I can't show any of the output for confidentiality reasons, but it produces the correct results.) Of course, in your case you probably want to do something like
all_metadata <- lapply(db_list$name, function(DB) {
RODBC::sqlTables(
channel = tcon,
catalog = DB
)
})
# or some more efficient variant of data.table::rbindlist...
meta_df <- do.call("rbind", all_metadata)
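If some databases are not readable by the connecting user, a single failure inside the lapply() would abort the whole loop. The sketch below shows one way to skip such databases. It is only an illustration: fake_sql_tables() is a hypothetical stand-in for RODBC::sqlTables(channel = tcon, catalog = DB), and the database names and rows are mock values, not real output.

```r
# Hypothetical stand-in for RODBC::sqlTables(channel = tcon, catalog = DB).
# It returns mock rows and fails for one database to mimic a permissions error.
fake_sql_tables <- function(DB) {
  if (DB == "locked_db") stop("access denied")
  data.frame(
    TABLE_CAT  = DB,
    TABLE_NAME = c("t1", "t2"),
    TABLE_TYPE = c("TABLE", "VIEW"),
    stringsAsFactors = FALSE
  )
}

# Mock database names (in practice these would come from db_list$name)
db_names <- c("master", "locked_db", "my_db_1")

all_metadata <- lapply(db_names, function(DB) {
  # Return NULL on error so one inaccessible database does not stop the loop
  tryCatch(fake_sql_tables(DB), error = function(e) NULL)
})

# Drop the failed entries, then stack the remaining data frames
all_metadata <- Filter(Negate(is.null), all_metadata)
meta_df <- do.call("rbind", all_metadata)
```

With the real sqlTables() call substituted in, meta_df would hold one row per table across every database that could be read.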

R RDB Query of date using RODBC Connection

I am trying to query an old Oracle RDB database using something like Sys.Date()-1 as that works in R, but I have not been able to find the right syntax.
The following works but I want to run it on a regular schedule so a fixed time frame will not work :
Output <- SQLQuery ("SELECT * FROM tablename WHERE productionDate BETWEEN 20-FEB-2017 00:00:00 AND 21-FEB-2017 23:59:00")
I would like to have something like:
Output <- SQLQuery ("SELECT * FROM tablename WHERE productionDate >= Today()-1")
I have also tried assigning macro variables outside of the query and then calling them inside the query, with no success. The entries go back several years, so pulling everything takes a couple of minutes, and then I have to subset the data after the fact. I would hope there is a better way for R to query a database, even an old one, based on dates.
Thank you for any help.
Something like this?
startDt <- as.Date("2016-02-12")
endDate <- as.Date("2016-03-12")
sqlstr <- paste0("SELECT * FROM tablename WHERE productionDate BETWEEN '",
paste(format(c(startDt, endDate), "%d %B %Y"), collapse="' and '"),"'")
sqlstr
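For a scheduled job, the same idea can be driven from Sys.Date() instead of fixed dates. The following is a minimal sketch, assuming the database accepts quoted DD-MON-YYYY HH:MM:SS literals like the ones in the question; fmt_rdb() is a hypothetical helper name, and the table and column names are taken from the question.

```r
# Rolling window: yesterday 00:00:00 through today 23:59:00
end_dt   <- Sys.Date()
start_dt <- end_dt - 1

# Hypothetical helper: format a Date as DD-MON-YYYY, e.g. 20-FEB-2017.
# month.abb is used instead of "%b" so the result does not depend on the locale.
fmt_rdb <- function(d) {
  sprintf(
    "%s-%s-%s",
    format(d, "%d"),
    toupper(month.abb[as.integer(format(d, "%m"))]),
    format(d, "%Y")
  )
}

# Assemble the query string with the two computed bounds
sqlstr <- paste0(
  "SELECT * FROM tablename WHERE productionDate BETWEEN '",
  fmt_rdb(start_dt), " 00:00:00' AND '",
  fmt_rdb(end_dt), " 23:59:00'"
)
```

The resulting sqlstr can then be passed to sqlQuery() along with the open channel. Because the dates are computed in R rather than typed by a user, pasting them into the string carries little injection risk here.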

How do I export a SQL query into SPSS?

I have this monster query written in T-SQL that pulls together and crunches data from multiple tables. I can export the result to CSV or Excel easily enough, but would like to send it right into SPSS. The ODBC drivers in SPSS only recognize Tables and Views in my SQL database. Any ideas how to get the results of my query into SPSS?
Options:
Export to Excel then import to SPSS... formatting things like dates become unwieldy
Save query as a table in my database... but then I would have to make a new table every time I run the query, yes?
As recommended below, simply run my SQL statement in the GET DATA statement of my SPSS syntax, but I am struggling with that...
UPDATE: In an attempt to use SPSS to run my SQL query I edited this code and get this error indicating that SPSS doesn't like my declaration of nvarchar (currently investigating how to handle this using alternative method). I have tested my connection between SPSS and SQL and the connection is good:
SQLExecDirect failed :[Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near 'N'.
Here is my query simplified to pull just one field from one table:
GET DATA
/TYPE=ODBC
/CONNECT='DSN=temp_Hisp;Description=tempHisp;UID=;Trusted_Connection=Yes;APP=IBM SPSS '+
'Products: Statistics Common;WSID=ARCH5-50;DATABASE=temp_HispTreat'
/SQL='With CTE_BASENG As (Select StudyID, Visit, Question, CAST(Response As Int) As RESPONSE from temp_HispTreat.dbo.BAS AS PVTable outer apply (values (N'BAS1',BAS1), +'
'(N'BAS24',BAS24)) P(Question, Response)) select SubVis.IRB#, SubVis.StudyID, SubVis.Clin_num, Subvis.Visit, BASENG.BAS_ENGTOT From (Select Distinct IRB#, StudyID, +'
'Clin_Num, Visit_ID As Visit from temp_HispTreat.dbo.Subjects, temp_HispTreat.dbo.StudyStructure where subjects.IRB# = 5516 and StudyStructure.IRB = 5516) As SubVis left join (Select StudyID, +'
'Visit, SUM (Scoring.dbo.GetValue9(response)) As BAS_ENGTOT from CTE_BASENG group by StudyID, Visit) AS BASENG On SubVis.Studyid = BASENG.StudyID And SubVis.Visit = BASENG.Visit'
/ASSUMEDSTRWIDTH=255.
CACHE.
EXECUTE.
Thanks all: Solved. There is quite a bit of tweaking necessary to get SPSS to run SQL query, but this is the best way to export SQL data into SPSS. In my case (values (N'BAS1',BAS1) had to be changed to (values ("BAS1",BAS1) but all of my commands, e.g. outer apply, union, etc, ran like champs! Appreciate the help.
You can use GET DATA procedure to import data from SQL directly in SPSS. See the SQL subcommand. You can use your complicated query here. For example:
GET DATA
/TYPE = ODBC
/CONNECT = "DSN = DSNname"
/SQL = "SELECT * FROM empl_data "
"WHERE ((bdate>=#1/1/1960# and edate<=#12/31/1960#) or bdate is null)".
It is clear why (values (N'BAS1',BAS1) caused the error: you are using single quotes for the argument of the SQL subcommand (/SQL = ' '), so the first single quote in (values (N'BAS1',BAS1) ends the argument. Switching to double quotes around the argument solves it.
I tried to rearrange your code. I cannot test it, but I believe it should work:
GET DATA
/TYPE = ODBC
/CONNECT = "DSN=temp_Hisp;DATABASE=temp_HispTreat"
/SQL = "With CTE_BASENG As (Select StudyID, Visit, Question, "
"CAST(Response As Int) As RESPONSE "
"from temp_HispTreat.dbo.BAS AS PVTable "
"outer apply (values (N'BAS1',BAS1), (N'BAS24',BAS24)) "
"P(Question, Response)) "
"select SubVis.IRB#, SubVis.StudyID, SubVis.Clin_num, Subvis.Visit, "
"BASENG.BAS_ENGTOT "
"From (Select Distinct IRB#, StudyID, Clin_Num, Visit_ID As Visit "
"from temp_HispTreat.dbo.Subjects, temp_HispTreat.dbo.StudyStructure "
"where subjects.IRB# = 5516 and StudyStructure.IRB = 5516) As SubVis "
"left join (Select StudyID, Visit, "
"SUM(Scoring.dbo.GetValue9(response)) As BAS_ENGTOT "
"from CTE_BASENG group by StudyID, Visit) AS BASENG On "
"SubVis.Studyid = BASENG.StudyID And SubVis.Visit = BASENG.Visit".
The SQL is processed by the ODBC driver, so the capabilities of that driver will determine what sort of SQL can be issued. The capabilities may be database specific. Sometimes there are multiple drivers available for a particular database, some from the IBM SPSS Data Access Pack and some from a database vendor directly, so you may want to investigate what is available for your particular database.

SQL String in VBA in Excel 2010 with Dates

I've had a look around but cannot find the issue with this SQL Statement:
strSQL = "SELECT Directory.DisplayName, Department.DisplayName, Call.CallDate, Call.Extension, Call.Duration, Call.CallType, Call.SubType FROM (((Department INNER JOIN Directory ON Department.DepartmentID = Directory.DepartmentID) INNER JOIN Extension ON (Department.DepartmentID = Extension.DepartmentID) AND (Directory.ExtensionID = Extension.ExtensionID)) INNER JOIN Site ON Extension.SiteCode = Site.SiteCode) INNER JOIN Call ON Directory.DirectoryID = Call.DirectoryID WHERE (Call.CallDate)>=27/11/2012"
Regardless of how I change the WHERE clause, it always returns every single row in the database (at least I assume it does, since Excel completely hangs when I attempt this). The same SQL statement works perfectly fine in Access (if the dates have # # around them). Any idea how to fix this? I am currently trying to create a SQL statement that allows user input of different dates, but I have to get over this hurdle first.
EDIT: The date field in the SQL Database is a DD/MM/YY HH:MM:SS format, and this query is done in VBA - EXCEL 2010.
Also to avoid confusion have removed TOP 10 from the statement, that was to stop excel from retrieving every single row in the database.
The reference I currently have activated is: Microsoft ActiveX Data Objects 2.8 Library
Database is a MSSQL, using the connection string:
Provider=SQLOLEDB;Server=#######;Database=#######;User ID=########;Password=########;
WHERE (Call.CallDate) >= #27/11/2012#
Surround the date variable with #.
EDIT: Please make date string unambiguous, such as 27-Nov-2012
strSQL = "SELECT ........ WHERE myDate >= #" & Format(dateVar, "dd-mmm-yyyy") & "# "
If you are using ADO, you should look at Parameters instead of building a dynamic query.
EDIT2: Thanks to @ElectricLlama for pointing out that it is SQL Server, not MS-Access
strSQL = "SELECT ........ WHERE myDate >= '" & Format(dateVar, "mm/dd/yyyy") & "' "
Please verify that the field Call.CallDate is of datatype DATETIME or DATE
If you are indeed running this against SQL Server, try this syntax for starters:
SELECT Directory.DisplayName, Department.DisplayName, Call.CallDate,
Call.Extension, Call.Duration, Call.CallType, Call.SubType
FROM (((Department INNER JOIN Directory
ON Department.DepartmentID = Directory.DepartmentID)
INNER JOIN Extension ON (Department.DepartmentID = Extension.DepartmentID)
AND (Directory.ExtensionID = Extension.ExtensionID))
INNER JOIN Site ON Extension.SiteCode = Site.SiteCode)
INNER JOIN Call ON Directory.DirectoryID = Call.DirectoryID
WHERE (Call.CallDate)>= '2012-11-27'
The date format you see is simply whatever format your client tool decides to show it in. Dates are not stored in any display format; they are effectively stored as a duration since some reference point.
By default, SQL uses the format YYYY-MM-DD for date literals.
But you are much better off defining a parameter of type date in your code and keeping your date as a date data type for as long as possible. This may include only allowing users to enter the date through a calendar control, to avoid ambiguity.

R RODBC putting list of numbers into an IN() statement

I've looked at the 'Pass R variable to RODBC's sqlQuery with multiple entries? ' already but can't seem to get it to work. I'm trying to do an sqlQuery() from R on a SQL Server 2008 R2 db. I'm trying to get a sample from a large db based on row numbers. First I created a list of random numbers:
sampRowNum <- sample(seq(1,100000,1), 5000)
Then I try to use those numbers in a query using:
query1 <- sqlQuery(channel, paste("select *
FROM db where row_id in (", sampRowNum,")", sep=""))
I get just the results from the db where the row_id is equal to the first number in sampRowNum. Any suggestions?
You're not pasting your query together correctly.
If you run the paste statement in isolation, you'll see that you get a vector of length 5000, so sqlQuery is only executing the first one of those, corresponding to the first element in sampRowNum.
What you want to do is something more like this:
paste("select * FROM db where row_id in (",
paste(sampRowNum,collapse = ","),")", sep="")
Just as an added note (and since I've had to do stuff like this a lot...) constructing sql queries with an IN clause with strings is a bit more of a nuisance, since you have to tack on all the single quotes:
vec <- letters[1:5]
paste("SELECT * FROM db WHERE col IN ('",
paste(vec,collapse = "','"),"')",sep = "")
[1] "SELECT * FROM db WHERE col IN ('a','b','c','d','e')"
If you do this a lot, you'll end up writing a little function that does that pasting of character vectors for you.
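Such a helper might look like the following sketch. The name sql_in() is hypothetical, and doubling embedded single quotes is only a basic precaution that assumes standard SQL string escaping; it is not a substitute for parameterised queries.

```r
# Hypothetical helper: build the body of an IN (...) clause from a vector.
sql_in <- function(x) {
  if (is.numeric(x)) {
    # Avoid scientific notation (e.g. 1e+05) and padding in numeric values
    paste(format(x, scientific = FALSE, trim = TRUE), collapse = ",")
  } else {
    # Double any embedded single quotes, then wrap each value in quotes
    escaped <- gsub("'", "''", as.character(x), fixed = TRUE)
    paste0("'", escaped, "'", collapse = ",")
  }
}

# Character values are quoted
query_chr <- paste0("SELECT * FROM db WHERE col IN (", sql_in(letters[1:5]), ")")

# Numeric values are left bare
query_num <- paste0("SELECT * FROM db WHERE row_id IN (", sql_in(c(1, 100000)), ")")
```

The numeric branch matters for the sampling example above: pasting a double such as 100000 directly can produce 1e+05 in the query text.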
As always, this kind of SQL string manipulation is Not Good if you are dealing with user inputs (e.g. in a web app), due to SQL injection attacks. In my particular situation this isn't much of a concern, but in general people will prefer parametrized queries if you don't have much control over the input values.