Query SQL Server from R with ETLUtils for big tables

Normally, to query a SQL Server database from R, I'd use:
library(RODBC)
con <- odbcConnect(dsn = "ESTUDIOS", uid = "estudios", pwd = "yyyy")
sql_trx <- "SELECT [Fecha], [IDServicio]
FROM [ESTUDIOS].[dbo].[TRX] where MONTH(Fecha) = MONTH('2016-08-01') and YEAR(Fecha) = YEAR('2016-08-01');"
trx.server <- sqlQuery(channel = con, sql_trx)
odbcClose(con)
But when the database table is too big, I can use the ff and ETLUtils packages.
So I assumed the natural thing to do would be:
library(RODBC)
library(ff)
library(ETLUtils)
sql2_trx <- read.odbc.ffdf(query = sql_trx, odbcConnect.args = list(con))
But this doesn't give me the desired result; instead it returns the following error:
1: In RODBC::odbcDriverConnect("DSN=11") :
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
2: In RODBC::odbcDriverConnect("DSN=11") : ODBC connection failed
Can you point out what is wrong with the use of read.odbc.ffdf?

You are currently passing the previous RODBC connection object, con, into read.odbc.ffdf(), but the function opens its own ODBC connection and runs the query itself. The R docs describe the proper use of odbcConnect.args:
odbcConnect.args a list of arguments to pass to ODBC's odbcConnect
(like dsn, uid, pwd)
Consider passing your original DSN and credentials, as you did in the regular RODBC connection:
sql2_trx <- read.odbc.ffdf(query = sql_trx, odbcConnect.args = list(dsn = "ESTUDIOS", uid = "estudios", pwd = "yyyy"))
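The same pattern appears in other ecosystems: chunked readers take the connection *arguments* and open (and possibly reopen) connections themselves, rather than accepting an already-open handle. A minimal Python sketch of the idea, with a stub connector standing in for odbcConnect (all names here are illustrative, not a real API):

```python
def odbc_connect(dsn, uid=None, pwd=None):
    # Stub standing in for a real ODBC connector; returns a fake handle.
    return {"dsn": dsn, "uid": uid, "pwd": pwd, "open": True}

def read_chunked(query, connect_args):
    # Like read.odbc.ffdf: the reader opens its own connection from the
    # argument list, so it can reconnect between chunks if it needs to.
    con = odbc_connect(**connect_args)
    return f"ran {query!r} on DSN {con['dsn']}"

# Pass the arguments, not an already-open connection object:
result = read_chunked("SELECT 1",
                      {"dsn": "ESTUDIOS", "uid": "estudios", "pwd": "yyyy"})
```

Passing a live connection into such a function fails for the same reason the R code did: the reader tries to build a fresh connection out of whatever it was given.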

Related

Login failed for user token-identified principal when connecting via sql_driver_manager.getConnection

I am trying to connect to Azure SQL using a Service Principal to create views, but it says
com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user ClientConnectionId: XXXXX-XXXX-XXXX
However, with the same SPN I was able to connect, create tables, and read tables.
import adal
resource_app_id_url = "https://database.windows.net/"
service_principal_id = dbutils.secrets.get(scope = "XX", key = "XXX")
service_principal_secret = dbutils.secrets.get(scope = "XX", key = "spn-XXXX")
tenant_id = dbutils.secrets.get(scope = "XX", key = "xxId")
authority = "https://login.windows.net/" + tenant_id
azure_sql_url = "jdbc:sqlserver://xxxxxxx.windows.net"
database_name = "testDatabase"
encrypt = "true"
host_name_in_certificate = "*.database.windows.net"
context = adal.AuthenticationContext(authority)
token = context.acquire_token_with_client_credentials(resource_app_id_url, service_principal_id, service_principal_secret)
access_token = token["accessToken"]
Using the above code I am able to create and read tables. There is a requirement to create views, so I am using sql_driver_manager to connect to Azure SQL:
properties = spark._sc._gateway.jvm.java.util.Properties()
properties.setProperty("accessToken", access_token)
sql_driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
sql_con = sql_driver_manager.getConnection(azure_sql_url, properties)
query = """
create or alter view test_view as select * from dbo.test_table
"""
stmt = sql_con.createStatement()
stmt.executeUpdate(query)
stmt.close()
This results in an error:
Py4JJavaError: An error occurred while calling
z:java.sql.DriverManager.getConnection. :
com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user token-identified principal. ClientConnectionId:
If I try the same with a username and password instead of the token, it works, but I need to use the SPN token for authentication.
Working code:
sql_driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
sql_con = sql_driver_manager.getConnection(azure_sql_url, username, password)
query = """
create or alter view test_view as select * from dbo.test_table
"""
stmt = sql_con.createStatement()
stmt.executeUpdate(query)
stmt.close()
What is it that I am missing? Can someone help me understand the issue? Thanks.
You cannot use that method here, since executeUpdate() was built to execute an update statement and return a row count.
See the documentation. Use the prepareCall() and execute() methods instead.
https://learn.microsoft.com/mt-mt/sql/connect/jdbc/reference/executeupdate-method-java-lang-string?view=azuresqldb-current
#
# 5 - Upsert from stage to active table.
# Grab driver manager conn, exec sp
#
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
connection = driver_manager.getConnection(url, user_name, password)
connection.prepareCall("exec [stage].[UpsertFactInternetSales]").execute()
connection.close()
This is sample code from an article I wrote. It calls a stored procedure to execute an UPSERT. However, any DML or DDL will work as long as it does not return a result set.
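The "no result set" constraint can be sketched outside JDBC as well. Using Python's sqlite3 as a stand-in (not Azure SQL; table and view names are illustrative), DDL such as CREATE VIEW runs fine through a plain execute because it produces no result set:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
con.execute("INSERT INTO test_table VALUES (1, 'a')")

# DDL: creating a view returns no result set, so execute() is fine here.
con.execute("CREATE VIEW test_view AS SELECT * FROM test_table")

# Reading through the new view to confirm it exists.
rows = con.execute("SELECT * FROM test_view").fetchall()
con.close()
```

The same holds for stored-procedure-style calls: any DML or DDL goes through, as long as nothing comes back as a result set.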

Why am I getting an error in the Databricks connection with a SQL database?

I am trying to connect to SQL Server from Databricks using Python, but somehow I'm getting the error below when connecting to the db:
Error:java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbd.SQLServerDriver
My connection code is the following:
jdbcHostname = "hostname"
jdbcDatabase = "databasename"
jdbcPort = port
username = 'username'
password = 'password'
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
"user" : username,
"password" : password,
"driver" : "com.microsoft.sqlserver.jdbd.SQLServerDriver"
}
That code works, but when I try to execute a query I get the error above, raised by the second line of the next block:
pushdown_query = "select * from table"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
display(df)
I tried installing different connectors but have had no luck with it. Could you help me?

Character encoding in R MySQL on Linux machine

I'm trying to fetch data that includes German words with umlaut characters. Following the structure below, everything is fine on a Windows machine:
Sys.setlocale('LC_ALL','C')
library(RMySQL)
conn <- dbConnect(MySQL(), user = "user", dbname = "database",
host = "host", password = "pass")
sql.query <- paste0("some query")
df <- dbSendQuery(conn, sql.query)
names <- fetch(df, -1)
dbDisconnect(conn)
As an example I have :
names[1230]
[1] "Strübbel"
What should I change in order to get the same result on Ubuntu Linux?
The query runs without problems, but the result is:
names[1230]
[1] "Str\374bbel"
I have checked this solution, but when I put 'set character set "utf8"' inside the query I get the following error:
df <- dbSendQuery(conn, sql.query, 'set character set "utf8"')
names <- fetch(df, -1)
Error in .local(conn, statement, ...) :
unused argument ("set character set \"utf8\"")
I should mention that the encoding of the result is unknown:
Encoding(names[1230])
[1] "unknown"
and doing:
Encoding(names[1230]) <- "UTF-8"
names[1230]
[1] "Str<fc>bbel"
does not solve the problem!
Instead of:
Sys.setlocale('LC_ALL','C')
you have to use:
Sys.setlocale('LC_ALL','en_US.UTF-8')
and in the SQL session, set the character set before running the query:
library(RMySQL)
conn <- dbConnect(MySQL(), user = "user", dbname = "database",
host = "host", password = "pass")
sql.query <- paste0("some query")
dbSendQuery(conn,'set character set "utf8"')
df <- dbSendQuery(conn, sql.query)
names <- fetch(df, -1)
dbDisconnect(conn)
Not sure if this solution will help you, but you could try this approach:
con <- dbConnect(MySQL(), user = "user", dbname = "database",
host = "host", password = "pass", encoding = "ISO-8859-1")
If this encoding doesn't work, then try brute force with different variants.
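As background on what the escape means: `\374` is octal for byte 0xFC, which is "ü" in latin-1 (ISO-8859-1); under the C locale R cannot render it, so it prints the raw byte instead. A quick Python sketch of the byte-level relationship:

```python
# The octal escape \374 seen in R is byte 0xFC, i.e. "ü" in latin-1.
raw = b"Str\xfcbbel"

# Decoding those bytes as latin-1 recovers the intended string.
decoded = raw.decode("latin-1")

# The same character re-encoded as UTF-8 becomes the two bytes 0xC3 0xBC.
utf8_bytes = decoded.encode("utf-8")
```

This is why both fixes matter: the session must speak UTF-8 (the locale) and the server must send UTF-8 (the character set), or the client ends up holding latin-1 bytes it cannot display.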

shinyapps.io does not work when my Shiny app uses RODBC to link to a SQL database

On my local computer, I use Shiny to design a web page that shows the analysis results. The data is extracted from the company's SQL database using RODBC to link the database to R. The code is like this:
library(shiny)
library(shinydashboard)
library(DT)
library(RODBC)
library(stringr)
library(dplyr)
DNS <- '***'
uid <- '***'
pwd <- '***'
convertMenuItem <- function(mi,tabName) {
mi$children[[1]]$attribs['data-toggle']="tab"
mi$children[[1]]$attribs['data-value'] = tabName
mi
}
sidebar <- dashboardSidebar(
sidebarMenu(
convertMenuItem(menuItem("Query1",tabName="Query1",icon=icon("table"),
dateRangeInput('Date1','Date Range',start = Sys.Date()-1, end = Sys.Date()-1,
separator=" - ",format="dd/mm/yy"),
textInput('Office1','Office ID','1980'),
submitButton("Submit")), tabName = "Query1"),
convertMenuItem(menuItem("Query2",tabName="Query2",icon=icon("table"),
dateRangeInput('Date2','Date Range',start = Sys.Date()-1, end = Sys.Date()-1,
separator=" - ",format="dd/mm/yy"),
textInput('Office2','Office ID','1980'),
submitButton("Submit")), tabName = "Query2"),
)
)
body <- dashboardBody(
tabItems(
tabItem(tabName="Query1",
helpText('********************************'),
fluidRow(
column(12,DT::dataTableOutput('table1'))
)
),
tabItem(tabName = "Query2",h2("Widgets tab content"))
)
)
dashboardheader <- dashboardHeader(
title = 'LOSS PREVENTION'
)
ui <- dashboardPage(
skin='purple',
dashboardheader,
sidebar,
body
)
server <- function(input, output) {
output$table1 <- DT::renderDataTable({
ch<-odbcConnect(DNS,uid=uid,pwd=pwd)
a <- sqlQuery(ch,paste(' ***'))
odbcClose(ch)
DT::datatable(a,options = list(scrollX=T))
})
}
shinyApp(ui, server)
Then I used rsconnect to deploy this program from my shinyapps.io account, and the deployment was successful.
But when I open https://myAccount.shinyapps.io/myshiny/, I get the following error:
2018-05-10T00:57:38.473259+00:00 shinyapps[340325]: Warning in RODBC::odbcDriverConnect("DSN=****;UID=****;PWD=****") :
2018-05-10T00:57:38.473262+00:00 shinyapps[340325]: [RODBC] ERROR: state IM002, code 0, message [unixODBC][Driver Manager]Data source name not found, and no default driver specified
But if no RODBC or SQL database is involved, the code works fine.
So the problem is that shinyapps.io cannot access my company's SQL database. How can I deal with this?
The app works on your computer because the Data Source Name (DSN) has been configured there. It is not configured on shinyapps.io. According to this help article, you can instead use a DSN-less connection string, for example:
odbcDriverConnect('Driver=FreeTDS;TDS_Version=7.0;Server=<server>;Port=<port>;Database=<db>;Uid=<uid>;Pwd=<pw>;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;')
A more complete treatment can be found in the documentation.
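The fix amounts to moving every piece of information the DSN used to carry into the connection string itself. A hedged Python sketch of assembling such a DSN-less string (server, port, and credentials are placeholders, not real values):

```python
def build_conn_string(server, port, db, uid, pwd):
    # Spell out the driver and every connection detail inline, since the
    # hosting environment has no preconfigured Data Source Name.
    parts = {
        "Driver": "FreeTDS",
        "TDS_Version": "7.0",
        "Server": server,
        "Port": port,
        "Database": db,
        "Uid": uid,
        "Pwd": pwd,
        "Encrypt": "yes",
        "TrustServerCertificate": "no",
        "Connection Timeout": "30",
    }
    return ";".join(f"{k}={v}" for k, v in parts.items()) + ";"

conn_str = build_conn_string("myserver", "1433", "mydb", "user", "secret")
```

The resulting string is what odbcDriverConnect() would receive in place of "DSN=...".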

sqlSave and sqlDrop in R

Here is my code:
library('RODBC')
db.handle <- odbcDriverConnect('driver={SQL Server Native Client 11.0};server=server_name;database = db_name;trusted_connection=yes')
sql_table <- 'db_name.table_name'
sqlDrop(db.handle, sql_table, errors = TRUE)
sqlSave(db.handle,df_small,tablename = sql_table,safer=FALSE,append=TRUE,
rownames = FALSE)
close(db.handle)
When I execute line:
sqlDrop(db.handle, sql_table, errors = TRUE)
I get error message:
Error in odbcTableExists(channel, sqtable, abort = errors) :
‘db_name.table_name’: table not found on channel
When I execute line:
sqlSave(db.handle,df_small,tablename = sql_table,safer=FALSE,append=TRUE,
rownames = FALSE)
I get the following error message:
Error in sqlSave(db.handle, df_small, tablename = sql_table, safer =
FALSE, : 42S01 2714 [Microsoft][SQL Server Native Client 11.0][SQL
Server]
There is already an object named 'table_name' in the database.
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE
db_name.table_name ("State_rename" varchar(255), "CoverageType"
varchar(255))'
I execute the code consecutively and cannot understand how both error messages can be true at the same time.
Consider removing the period-qualified prefix from the sql_table variable. Specifically, change db_name.table_name to table_name. You do not need the qualifier because your connection handle already specifies the database. This also explains the apparent contradiction: RODBC's odbcTableExists does not match the qualified name, so sqlDrop reports the table as not found, while the CREATE TABLE that sqlSave then issues does collide with the existing object. With this connection, you cannot access other databases on the specified server.
library('RODBC')
db.handle <- odbcDriverConnect(paste0('driver={SQL Server Native Client 11.0};',
'server=server_name;database=db_name;trusted_connection=yes'))
sql_table <- 'table_name'
sqlDrop(db.handle, sql_table, errors = TRUE)
sqlSave(db.handle, df_small, tablename = sql_table, safer = FALSE,
append = TRUE, rownames = FALSE)
close(db.handle)
By the way, you can simply use append=FALSE, which overwrites the table (first dropping it and then re-creating it) with no need to call sqlDrop:
sqlSave(db.handle, df_small, tablename = sql_table, safer = FALSE,
append = FALSE, rownames = FALSE)
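As an analogy in Python's sqlite3 (a stand-in for the RODBC/SQL Server setup; the table and columns are illustrative), append=FALSE corresponds to dropping and re-creating the table before writing, while append=TRUE just inserts into the existing one:

```python
import sqlite3

con = sqlite3.connect(":memory:")

def save(con, rows, append):
    if not append:
        # Like sqlSave(..., append = FALSE): overwrite by drop + re-create.
        con.execute("DROP TABLE IF EXISTS t")
        con.execute("CREATE TABLE t (state TEXT, coverage TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)

save(con, [("CA", "full")], append=False)
save(con, [("NY", "basic")], append=True)   # keeps the first row
save(con, [("TX", "none")], append=False)   # wipes and starts over
count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
con.close()
```

Because the overwrite path drops the table itself, there is never a separate existence check to go wrong, which is what made the explicit sqlDrop call fragile.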