I am trying to solve an issue with the wrong display of national (Polish) characters in the results of a query against an MS SQL database.
The script is pretty standard.
First, the definition of the connection object:
library(DBI)
db.conn <- DBI::dbConnect(odbc::odbc(),
  Driver   = "SQL Server Native Client 11.0",
  Server   = "10.0.0.100",
  Port     = 1433,
  Database = "DB",
  UID      = "user",
  PWD      = rstudioapi::askForPassword("Database password"),
  encoding = "latin1"
)
Then the SQL statement:
db_sql <- "
  select *
  from test
  where active = 'ACTIVE'
  order by name_id"
Then the execution of the SQL:
db_query <- dbSendQuery(db.conn, db_sql)
db_data <- dbFetch(db_query)
or
db_data <- dbGetQuery(db.conn, db_sql)
It does not matter whether I use "latin1", "windows-1250", or "utf-8" for the encoding parameter in the connection object definition; the results are always the same:
strings containing U+009C or similar control characters.
It also does not matter which code page I select in the RStudio global options.
Problem solved.
First, it is necessary to change the locale to Polish:
Sys.setlocale(category = "LC_ALL", locale = "Polish")
Then set the proper encoding in the connection definition:
DBI::dbConnect(odbc::odbc(),
  ...
  encoding = "windows-1250"
)
and voilà, it works.
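Putting the two pieces together, the full working setup looks roughly like this (the server address, credentials, and driver name are the placeholders from the question, not verified values):

```r
library(DBI)

# Switch the session locale to Polish first, so R renders
# Central European characters correctly
Sys.setlocale(category = "LC_ALL", locale = "Polish")

# Then tell the odbc driver which code page the server sends
db.conn <- DBI::dbConnect(odbc::odbc(),
  Driver   = "SQL Server Native Client 11.0",
  Server   = "10.0.0.100",
  Port     = 1433,
  Database = "DB",
  UID      = "user",
  PWD      = rstudioapi::askForPassword("Database password"),
  encoding = "windows-1250"   # Central European (Polish) code page
)
```

Both pieces matter: the locale controls how R displays the strings, while the encoding argument controls how the odbc package decodes the bytes coming from the server.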
Related
I'm trying to catalog the structure of an MSSQL 2008 R2 database using R/RODBC. I have set up a DSN and connected via R, and used the sqlTables() command, but this only returns the 'system databases' info.
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlTables(conn1)
However if I do this:
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlQuery(conn1, 'USE my_db_1')
sqlTables(conn1)
I get the tables associated with the my_db_1 database. Is there a way to see all of the databases and tables without manually typing in a separate USE statement for each?
There may or may not be a more idiomatic way to do this directly in SQL, but we can piece together a data set of all tables from all databases (a bit more programmatically than repeated USE xyz; statements) by getting a list of databases from master..sysdatabases and passing these as the catalog argument to sqlTables, e.g.
library(RODBC)
library(DBI)
##
tcon <- RODBC::odbcConnect(
  dsn = "my_dsn",
  uid = "my_uid",
  pwd = "my_pwd"
)
##
db_list <- RODBC::sqlQuery(
  channel = tcon,
  query = "SELECT name FROM master..sysdatabases")
##
RODBC::sqlTables(
  channel = tcon,
  catalog = db_list[14, 1]
)
(I can't show any of the output for confidentiality reasons, but it produces the correct results.) Of course, in your case you probably want to do something like
all_metadata <- lapply(db_list$name, function(DB) {
  RODBC::sqlTables(
    channel = tcon,
    catalog = DB
  )
})
# or some more efficient variant, e.g. data.table::rbindlist(...)
meta_df <- do.call("rbind", all_metadata)
I am new to R Shiny and SQL.
I have made some reactive dashboards, but none yet using a SQL database connection.
Here is my toy example:
The database is the MySQL world database.
I want to join various tables and show some columns from each, but I want to be able to filter by Language found in the CountryLanguage table.
My WHERE statement doesn't work.
Current code:
library(shiny)
library(DBI)

ui <- fluidPage(
  numericInput("nrows", "Enter the number of rows to display:", 5),
  selectizeInput("inputlang", label = "Language", choices = NULL, selected = NULL,
                 options = list(placeholder = "Please type a language")),
  tableOutput("tbl")
)

server <- function(input, output, session) {
  output$tbl <- renderTable({
    conn <- dbConnect(
      drv = RMySQL::MySQL(),
      dbname = "shinydemo",
      host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
      username = "guest",
      password = "guest")
    on.exit(dbDisconnect(conn), add = TRUE)
    dbGetQuery(conn, paste0(
      "SELECT City.Name, City.Population, Country.Name, Country.Continent,
              CountryLanguage.Language, CountryLanguage.Percentage
       FROM City
       INNER JOIN Country on City.CountryCode = Country.Code
       INNER JOIN CountryLanguage on Country.Code = CountryLanguage.CountryCode
       WHERE CountryLanguage.Language = reactive({get(input$Selectize)})
       LIMIT ", input$nrows, ";"))
  })
}

shinyApp(ui, server)
I did not expect that code to work, but tried anyway. I suspect I can't pass an R command from within dbGetQuery() because it is expecting SQL syntax only. Is that correct?
So... what is the best way to set something like this up? I imagine I could pull the joined selection into a data frame and filter it afterwards, like
df <- dbGetQuery(conn, "SELECT ... JOIN ...")
dffilter <- df %>% filter(...)
But is that going to make things super slow if the dataset is still quite large?
What would be the best practice here?
Having reactive(...) inside a string is not evaluated; it's just a string. Further, DBI does not run glue on the query, so {get(...)} will do nothing.
You define the input as input$inputlang, but in your query you reference input$Selectize; I think that's a mistake.
You may want to consider parameterized queries instead of constructing query strings manually. While there are security concerns about malicious SQL injection (e.g., XKCD's Exploits of a Mom, aka "Little Bobby Tables"), parameterization also guards against malformed strings or Unicode-vs-ANSI mistakes, even if it's a single data analyst running the query. Both DBI (with odbc) and RODBC support parameterized queries, either natively or via add-ons.
While this does not work for the LIMIT portion, it is useful for most other parts of a query. For the limit part, req(is.numeric(input$nrows)) is a reasonable check to guard against inadvertent injection problems.
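To illustrate the risk with a hypothetical input: if the interpolated value contains a quote, plain paste0() silently changes the meaning of the WHERE clause, whereas a bound parameter would treat the whole string as data:

```r
# A hypothetical malicious (or simply malformed) language value
lang_input <- "English' OR '1'='1"

# Naive string interpolation: the embedded quote closes the literal,
# and the condition now matches every row
unsafe <- paste0(
  "SELECT * FROM CountryLanguage WHERE Language = '", lang_input, "'"
)
cat(unsafe, "\n")
#> SELECT * FROM CountryLanguage WHERE Language = 'English' OR '1'='1'

# With a parameterized query the value is sent separately and is
# never parsed as SQL:
#   dbGetQuery(conn, "SELECT * FROM CountryLanguage WHERE Language = ?",
#              params = list(lang_input))
```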
Try this:
output$tbl <- renderTable({
  req(is.numeric(input$nrows), input$inputlang)
  conn <- dbConnect(
    drv = RMySQL::MySQL(),
    dbname = "shinydemo",
    host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
    username = "guest",
    password = "guest")
  on.exit(dbDisconnect(conn), add = TRUE)
  dbGetQuery(conn, paste("
    SELECT City.Name, City.Population, Country.Name, Country.Continent,
           CountryLanguage.Language, CountryLanguage.Percentage
    FROM City
    INNER JOIN Country on City.CountryCode = Country.Code
    INNER JOIN CountryLanguage on Country.Code = CountryLanguage.CountryCode
    WHERE CountryLanguage.Language = ?
    LIMIT ", input$nrows),
    params = list(input$inputlang))
})
I am running a SQL script within R. There is a date filter within the script, and right now it is hardcoded. Using the extract, I run some analysis, etc. My final goal is to turn this script into a Shiny app.
I want to make the date filter a prompt using readline(). Does anyone know if I can insert the readline() output into the SQL script?
For example:
readline() asks "Start date?"
The input 2020-10-01 gets stored as X.
The SQL code reads:
SELECT * FROM database
WHERE DATE = 'X'
Thank you!
Using a parameterised query
As pointed out in the comments, using paste is unsafe, as it leaves the system vulnerable to exploits such as SQL injection attacks. This can be mitigated by using parameterised queries; this is a helpful document for reference.
# Get input
r <- readline("Input Date: ")
# The parameterised query
sql <- "SELECT * FROM database WHERE DATE = ?"
# Send the parameterised query
query <- DBI::dbSendQuery(conn_string, sql)
# Bind the parameter
DBI::dbBind(query, list(r))
# Fetch the result
DBI::dbFetch(query)
# Release the result set when done
DBI::dbClearResult(query)
You can also use sqlInterpolate() if you are having trouble with the parameterised query:
query <- sqlInterpolate(
  conn_string,
  "SELECT * FROM database WHERE DATE = ?date",
  date = r
)
# Get the results
dbGetQuery(conn_string, query)
The unsafe way
You can use paste0() to build the query string, taking care to wrap the date in single quotes:
# Get input
r <- readline("Input Date: ")
# Paste input with query string noting the use of single quotes '
query <- paste0("SELECT * FROM database WHERE DATE = '", r, "'")
# Make query
dbGetQuery(conn, query)
A similar question was asked.
The goal is to create the function described here:
import pyodbc
import pandas as pd

def DB_Query(d1, d2):
    conn = pyodbc.connect('DSN=DATASOURCE')
    tbl = """SELECT TableA.Field_1
             FROM TableA
             WHERE TableA.Date >= ? AND TableA.Date <= ?"""
    df_tbl = pd.read_sql_query(tbl, conn, params=(d1, d2))
    conn.close()
    return df_tbl
This worked on a database with the SQL Server driver, but it won't work with the Microsoft ODBC for Oracle driver.
When I pass, for example, d1 = '2020-02-20' and d2 = '2020-02-25', I get the error
('HY004', '[HY004] [Microsoft][ODBC Driver Manager] SQL data type out of range (0) (SQLBindParameter)')
I understand that in Oracle you need DATE 'YYYY-MM-DD' to express a date, which is different from SQL Server, where you can just use 'YYYY-MM-DD'.
I've tried adding DATE in front of the ?, but it doesn't work. Any ideas?
Found the solution here: just pass d1 as a datetime.date object, e.g. datetime.date(2020, 2, 20), instead of a string.
How to parameterize datestamp in pyODBC query?
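The same issue can bite from R: when an ODBC driver rejects a character parameter for a DATE column, a sketch of the analogous fix with DBI is to bind a real Date object rather than a "YYYY-MM-DD" string (the table and column names below are the hypothetical ones from the question):

```r
library(DBI)

# Bind actual Date objects, not character strings, so the
# driver can map them to the SQL DATE type
d1 <- as.Date("2020-02-20")
d2 <- as.Date("2020-02-25")

# Assuming `conn` is an open DBI/odbc connection to the Oracle database:
# df_tbl <- dbGetQuery(conn,
#   "SELECT TableA.Field_1 FROM TableA
#    WHERE TableA.Date >= ? AND TableA.Date <= ?",
#   params = list(d1, d2))
```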
In R, I am using the following function, which performs 3 or 4 database operations. But I get an error message like this:
Error in sqliteExecStatement(conn, statement, ...) :
RS-DBI driver: (RS_SQLite_exec: could not execute1: database is locked)
What modification do I need to make? My code is as follows:
library(RSQLite)

test <- function(portfolio, date, frame) {
  lite <- dbDriver("SQLite", max.con = 25)
  db <- dbConnect(lite, dbname = "portfolioInfo1.db")
  sql <- paste("SELECT * from ", portfolio, " where portDate='", date, "' ", sep = "")
  res <- dbSendQuery(db, sql)
  data <- fetch(res)
  frame1 <- data.frame(portDate = date, frame)

  lite <- dbDriver("SQLite", max.con = 25)
  db <- dbConnect(lite, dbname = "portfolioInfo1.db")
  sql <- paste("delete from ", portfolio, " where portDate='", date, "' ", sep = "")
  res <- dbSendQuery(db, sql)

  lite <- dbDriver("SQLite", max.con = 25)
  db <- dbConnect(lite, dbname = "portfolioInfo1.db")
  dbWriteTable(db, portfolio, frame1, append = TRUE, row.names = FALSE)
}

tick <- c("AAPL", "TH", "YHOO")
quant <- c("121", "1313", "131313131")
frame <- data.frame(ticker = tick, quantities = quant)
# print(frame)
test("RUSEG", "2006-02-28", frame)
It seems that you connect several times to the same database without disconnecting. The database probably takes a lock when a connection is made, to prevent anyone else from editing a database that is already being edited.
Either disconnect after each connect, or simply connect once, perform all the queries, and then finally disconnect.
More precisely, multiple processes can read an SQLite database file simultaneously, but only one process at a time can write; see the SQLite documentation on file locking.
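Following that advice, here is a sketch of the function rewritten to use a single connection, disconnected automatically when the function exits (same table layout as in the question; the DELETE is run with dbExecute() rather than dbSendQuery(), which also avoids leaving an open result set):

```r
library(DBI)
library(RSQLite)

test <- function(portfolio, date, frame) {
  # One connection for all three operations, released on function exit
  db <- dbConnect(SQLite(), dbname = "portfolioInfo1.db")
  on.exit(dbDisconnect(db), add = TRUE)

  # Read the existing rows for this date
  sql  <- paste0("SELECT * FROM ", portfolio, " WHERE portDate='", date, "'")
  data <- dbGetQuery(db, sql)

  frame1 <- data.frame(portDate = date, frame)

  # Replace the rows for this date with the new frame
  dbExecute(db, paste0("DELETE FROM ", portfolio, " WHERE portDate='", date, "'"))
  dbWriteTable(db, portfolio, frame1, append = TRUE, row.names = FALSE)

  invisible(data)
}
```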
In my case I was using DB Browser to add the table. I hadn't saved the changes, which is why trying to connect from RStudio (Shiny) did not work.