Character encoding in RMySQL on a Linux machine

I'm trying to fetch data that includes some German words with umlaut characters. Following the structure below, everything works fine on a Windows machine:
Sys.setlocale('LC_ALL','C')
library(RMySQL)
conn <- dbConnect(MySQL(), user = "user", dbname = "database",
host = "host", password = "pass")
sql.query <- paste0("some query")
df <- dbSendQuery(conn, sql.query)
names <- fetch(df, -1)
dbDisconnect(conn)
As an example, I have:
names[1230]
[1] "Strübbel"
What should I change in order to get the same result on Linux (Ubuntu)?
The query runs without problems there, but the result is:
names[1230]
[1] "Str\374bbel"
I have checked this solution, but when I pass 'set character set "utf8"' inside the query call I get the following error:
df <- dbSendQuery(conn, sql.query, 'set character set "utf8"')
names <- fetch(df, -1)
Error in .local(conn, statement, ...) :
unused argument ("set character set \"utf8\"")
I should mention that the encoding of the result is unknown:
Encoding(names[1230])
[1] "unknown"
and doing the following:
Encoding(names[1230]) <- "UTF-8"
names[1230]
[1] "Str<fc>bbel"
does not solve the problem!

Instead of:
Sys.setlocale('LC_ALL','C')
you have to use:
Sys.setlocale('LC_ALL','en_US.UTF-8')
and, before sending your actual query, set the character set on the connection with a separate statement:
library(RMySQL)
conn <- dbConnect(MySQL(), user = "user", dbname = "database",
host = "host", password = "pass")
sql.query <- paste0("some query")
dbSendQuery(conn,'set character set "utf8"')
df <- dbSendQuery(conn, sql.query)
names <- fetch(df, -1)
dbDisconnect(conn)
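Equivalently (a sketch using the same placeholder credentials and query as above, not your exact code), the character-set statement can be issued once on the connection and the result fetched with dbGetQuery(), which sends the query, fetches all rows and clears the result in one call:
Sys.setlocale('LC_ALL', 'en_US.UTF-8')
library(RMySQL)
conn <- dbConnect(MySQL(), user = "user", dbname = "database",
                  host = "host", password = "pass")
dbSendQuery(conn, 'set character set "utf8"')  # ask the server to return UTF-8
res <- dbGetQuery(conn, "some query")          # send + fetch + clear in one step
dbDisconnect(conn)
# the umlauts should now display as "Strübbel" rather than "Str\374bbel"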

I'm not sure if this solution will help you, but you could try an approach like this:
con <- dbConnect(MySQL(), user = "user", dbname = "database",
host = "host", password = "pass", encoding = "ISO-8859-1")
If this encoding doesn't work, then try "brute force" with different variants, as sketched below.
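As a sketch of that "brute force" (assuming the strings came back in some single-byte encoding such as latin1), you can loop over a few candidate encodings with iconv() and see which one renders the umlauts correctly:
x <- "Str\374bbel"  # the mangled value from the question
candidates <- c("latin1", "ISO-8859-1", "ISO-8859-15", "windows-1252")
for (enc in candidates) {
  cat(enc, ":", iconv(x, from = enc, to = "UTF-8"), "\n")
}
# whichever candidate prints "Strübbel" is the encoding the server is actually sending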

Related

Loading data from a SQL table to a Spark dataframe in Azure Databricks

I'm new to Databricks and I'm learning it. I am trying to load a SQL table into a dataframe, following the official documentation from Microsoft.
But I am getting this error:
java.lang.ClassNotFoundException:
My notebook block:
connectionProperties = {
    "Driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
server_name = "jdbc:sqlserver://removed.database.windows.net"
database_name = "demo"
url = server_name + ";" + "databaseName=" + database_name + ";"
table_name = "Production.Data"
username = "removed"
password = "dummy"
try:
    Dataframe = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("user", username) \
        .option("password", password).load()
except ValueError as error:
    print("Connector write failed", error)
Error:
java.lang.ClassNotFoundException
This error mainly happens because a required package is missing; please install the connector package described in Loading Data from SQL table to spark dataframe.
Please follow the repro below; it has detailed information about how to connect Azure SQL to Databricks from PySpark.
Updated code:
jdbcHostname = "xxxx.database.windows.net"
jdbcDatabase = "Databasename"
jdbcPort = "1433"
username = "sql Username"
password = "xxxxx"
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
conProp = {
    "user": username,
    "password": password,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
def1 = "(Select * from Table_name where Table_ID = 1) Table_ID "
df = spark.read.jdbc(url=jdbcUrl, table=def1, properties=conProp)
display(df)
Reference:
https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases
https://www.sqlshack.com/load-data-into-azure-sql-database-from-azure-databricks/

Why am I getting an error in a Databricks connection with a SQL database?

I am trying to connect to a SQL Server database, but somehow I'm getting the error below when trying to connect to the DB from Databricks using Python:
Error:java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbd.SQLServerDriver
My connection code is the following:
jdbcHostname = "hostname"
jdbcDatabase = "databasename"
jdbcPort = port
username = 'username'
password = 'password'
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
    "user": username,
    "password": password,
    "driver": "com.microsoft.sqlserver.jdbd.SQLServerDriver"
}
The code above works, but when I try to execute a query I get the mentioned error, raised by the second line of the next block:
pushdown_query = "select * from table"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
display(df)
I have tried to install different connectors, but I have not been lucky with it. Could you help me?

shinyapps.io does not work when my Shiny app uses RODBC to link to a SQL database

On my local computer, I use Shiny to design a web page that shows the analysis results. The data is extracted from the company's SQL database using RODBC to link the database to R. The code is like this:
library(shiny)
library(shinydashboard)
library(DT)
library(RODBC)
library(stringr)
library(dplyr)
DNS <- '***'
uid <- '***'
pwd <- '***'
convertMenuItem <- function(mi, tabName) {
  mi$children[[1]]$attribs['data-toggle'] = "tab"
  mi$children[[1]]$attribs['data-value'] = tabName
  mi
}
sidebar <- dashboardSidebar(
  sidebarMenu(
    convertMenuItem(menuItem("Query1", tabName = "Query1", icon = icon("table"),
                             dateRangeInput('Date1', 'Date Range', start = Sys.Date()-1, end = Sys.Date()-1,
                                            separator = " - ", format = "dd/mm/yy"),
                             textInput('Office1', 'Office ID', '1980'),
                             submitButton("Submit")), tabName = "Query1"),
    convertMenuItem(menuItem("Query2", tabName = "Query2", icon = icon("table"),
                             dateRangeInput('Date2', 'Date Range', start = Sys.Date()-1, end = Sys.Date()-1,
                                            separator = " - ", format = "dd/mm/yy"),
                             textInput('Office2', 'Office ID', '1980'),
                             submitButton("Submit")), tabName = "Query2")
  )
)
body <- dashboardBody(
  tabItems(
    tabItem(tabName = "Query1",
            helpText('********************************'),
            fluidRow(
              column(12, DT::dataTableOutput('table1'))
            )
    ),
    tabItem(tabName = "Query2", h2("Widgets tab content"))
  )
)
dashboardheader <- dashboardHeader(
  title = 'LOSS PREVENTION'
)
ui <- dashboardPage(
  skin = 'purple',
  dashboardheader,
  sidebar,
  body
)
server <- function(input, output) {
  output$table1 <- DT::renderDataTable({
    ch <- odbcConnect(DNS, uid = uid, pwd = pwd)
    a <- sqlQuery(ch, paste(' ***'))
    odbcClose(ch)
    DT::datatable(a, options = list(scrollX = TRUE))
  })
}
shinyApp(ui, server)
Then, with my account on shinyapps.io, I use rsconnect to deploy this program, and the deployment is successful.
But when I use https://myAccount.shinyapps.io/myshiny/ to access my app, I get the following error:
2018-05-10T00:57:38.473259+00:00 shinyapps[340325]: Warning in RODBC::odbcDriverConnect("DSN=****;UID=****;PWD=****") :
2018-05-10T00:57:38.473262+00:00 shinyapps[340325]: [RODBC] ERROR: state IM002, code 0, message [unixODBC][Driver Manager]Data source name not found, and no default driver specified
But if there is no RODBC or SQL database involved in my code, the code works fine.
So the problem is that shinyapps.io cannot access my company's SQL database. How can I deal with it?
The app works on your computer because the Data Source Name (DSN) has been configured there; it is not configured on shinyapps.io. According to this help article, you can use, for example:
odbcDriverConnect('Driver=FreeTDS;TDS_Version=7.0;Server=<server>;Port=<port>;Database=<db>;Uid=<uid>;Pwd=<pw>;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;')
A more complete treatment can be found in the documentation.
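For instance, a sketch of how the server function from the question could be adapted to use such a DSN-less connection string on shinyapps.io (driver name, server, port, database and credentials are placeholders to substitute):
library(RODBC)
server <- function(input, output) {
  output$table1 <- DT::renderDataTable({
    ch <- odbcDriverConnect(paste0(
      "Driver=FreeTDS;TDS_Version=7.0;",
      "Server=<server>;Port=<port>;Database=<db>;",
      "Uid=<uid>;Pwd=<pw>;Encrypt=yes;TrustServerCertificate=no;",
      "Connection Timeout=30;"))
    on.exit(odbcClose(ch))            # close the connection even if the query errors
    a <- sqlQuery(ch, paste(' ***'))  # same placeholder query as before
    DT::datatable(a, options = list(scrollX = TRUE))
  })
}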

Create Shiny DataTable based on selected input

The following code (within my Shiny app) is giving me this error:
"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1"
shinyServer(function(input, output, session) {
  dataTable <- reactive({
    data <- input$dataset
    con <- dbConnect(
      drv = dbDriver("MySQL"),
      dbname = "Database",
      host = 'remote',
      port = 3306,
      user = "user",
      password = "password")
    on.exit(dbDisconnect(con))
    dbGetQuery(con, paste("select * from ", data, ";"))
  })
  output$myTable <- renderDataTable({
    datatable(dataTable(),
              rownames = FALSE,
              filter = "top",
              extensions = 'Buttons',
              options = list(dom = 'Bfrtip', buttons = I('colvis')))
  })
})
shinyUI(fluidPage(
  titlePanel("Data Search"),
  # SidePanel -------------------------------------------
  # -The Input/Dropdown Menu that Controls the Output
  sidebarLayout(
    sidebarPanel(
      selectInput(
        inputId = "dataset",
        label = "Select Dataset",
        choices = c("", "Schools", "GradRates"),
        selected = "",
        multiple = FALSE),
      width = 3
    ),
    # MainPanel -------------------------------------------
    # -The Output/Table Displayed Based on Input
    mainPanel(
      dataTableOutput(outputId = "myTable"),
      width = 9
    )
  )
))
You most likely have a problem with this line:
dbGetQuery(con, paste("select * from ", data, ";"))
It appears that the variable data doesn't contain a table name as expected. With the initial selected = "" choice, data is an empty string until the user picks a dataset, which produces the invalid statement "select * from ;". Check where you insert the table name into data; one way to guard against this is sketched below.
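A minimal sketch (reusing the reactive from the question) of one way to guard against that empty initial selection, so the query only runs once a real table name has been chosen:
dataTable <- reactive({
  req(input$dataset)  # do nothing while the selection is still ""
  data <- input$dataset
  con <- dbConnect(
    drv = dbDriver("MySQL"),
    dbname = "Database",
    host = 'remote',
    port = 3306,
    user = "user",
    password = "password")
  on.exit(dbDisconnect(con))
  dbGetQuery(con, paste("select * from ", data, ";"))
})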

Query SQL Server from R with ETLUtils for big tables

Normally, to query a SQL Server database from R, I'd use:
library(RODBC)
con <- odbcConnect(dsn = "ESTUDIOS", uid = "estudios", pwd = "yyyy")
sql_trx <- "SELECT [Fecha], [IDServicio]
FROM [ESTUDIOS].[dbo].[TRX] where MONTH(Fecha) = MONTH('2016-08-01') and YEAR(Fecha) = YEAR('2016-08-01');"
trx.server <- sqlQuery(channel = con, sql_trx)
odbcClose(con)
But when the table in the database is too big, I can use the libraries ff and ETLUtils.
So the natural thing to do would seem to be:
library(RODBC)
library(ff)
library(ETLUtils)
sql2_trx <- read.odbc.ffdf(query = sql_trx, odbcConnect.args = list(con))
But this doesn't give me the desired result; instead it returns the following error:
1: In RODBC::odbcDriverConnect("DSN=11") :
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
2: In RODBC::odbcDriverConnect("DSN=11") : ODBC connection failed
Can you point out what is wrong with this use of read.odbc.ffdf?
Currently you are passing what seems to be the previous RODBC connection object, con, into read.odbc.ffdf(), but remember that the function itself attempts to create an ODBC connection and run the query. The R docs describe the proper use of odbcConnect.args:
odbcConnect.args: a list of arguments to pass to ODBC's odbcConnect (like dsn, uid, pwd)
Consider passing your original DSN and credentials as you did in the regular RODBC connection:
sql2_trx <- read.odbc.ffdf(query = sql_trx, odbcConnect.args = list(dsn = "ESTUDIOS", uid = "estudios", pwd = "yyyy"))
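If the table really is too big for memory, read.odbc.ffdf also accepts chunking arguments (check ?read.odbc.ffdf for the exact signature; the chunk sizes below are only illustrative):
library(ff)
library(ETLUtils)
sql2_trx <- read.odbc.ffdf(
  query = sql_trx,
  odbcConnect.args = list(dsn = "ESTUDIOS", uid = "estudios", pwd = "yyyy"),
  first.rows = 10000,  # rows fetched in the first chunk
  next.rows = 50000,   # rows fetched in each subsequent chunk
  VERBOSE = TRUE)      # print progress while chunks are appended to the ffdf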