I'm trying to use RMySQL to perform search on imdb, but does not seem to get the tables right - rmysql

I'm new to RMySQL. I'm trying to do an assignment based on the code in this page: https://beanumber.github.io/sds192/lab-sql.html and got stuck after I ran the first few lines:
library(mdsr)
library(RMySQL)
db <- dbConnect_scidb(dbname = "imdb")
class(db)
db %>%
dbGetQuery("SELECT * FROM kind_type;")
Below is my output:
[1] "MySQLConnection"
attr(,"package")
[1] "RMySQL"
> db %>%
+ dbGetQuery("SELECT * FROM kind_type;")
Error in .local(conn, statement, ...) :
could not run statement: Table 'imdb.kind_type' doesn't exist
I also tried to list the tables in db and below is the output:
> dbListTables(db)
character(0)
I greatly appreciate help on this issue.

Related

Retrieve df from spark.sql : [PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT'

I'm using a databricks notebook and I'd like to retrieve a dataframe from an SQL execution in Spark. I have:
statement = f""" USER {db}; SELECT * FROM {table}
"""
df = spark.sql(statement)
display(df)
However, unlike when I fire off the same statement in an SQL cell in the notebook, I get the following error:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT': extra input 'SELECT'(line 1...
Where am I going wrong?
I tried to reproduce the same in my environment and got below results:
This my sample demo table Persons.
Create dataframe by using this code as shown in the below image.
df = sqlContext.sql("select * from Persons")
display(df)

I've performed a JOIN using bigrquery and the dbGetQuery function. Now I'd like to query the temporary table I've created but can't connect

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries so my example code is for a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
bigrquery::bigquery(),
project = 'myproject',
dataset = 'dataset',
billing = 'myproject'
)
Then performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
"SELECT
`myproject.dataset.table_1x`.Pokemon,
coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
`myproject.dataset.table_1`.Total,
`myproject.dataset.table_1`.HP,
`myproject.dataset.table_1`.Attack,
`myproject.dataset.table_1`.Special_Attack,
`myproject.dataset.table_1`.Defense,
`myproject.dataset.table_1`.Special_Defense,
`myproject.dataset.table_1`.Speed,
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended and now I'd like to query that table but like...where is it? How do I connect? Can I save it locally so that I can start working my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn#quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn#quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care as there is no check for the size of the results, so it is easy to run out of memory in when doing this.
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to be writing R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:
con <- dbConnect(...) # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")
new_remote_tbl = remote_tbl1 %>%
left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
mutate(Type_1 = coalesce(Type_1, Type_1.y),
Type_2 = coalesce(Type_2, Type_2.y)) %>%
select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disc - but you can query it and interact with it as if it were and the database will produce it for you on demand).
There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = remote_df %>%
collect()

R equivalent of SQL update statement

I use the below statement to update to the postgreSQL db using the following statement
update users
set col1='setup',
col2= 232
where username='rod';
Can anyone guide how to do similar to using R ?I am not good in R
Thanks in advance for the help
Since you didn't provide any data, I've created some here.
users <- data.frame(username = c('rod','stewart','happy'), col1 = c(NA_character_,'do','run'), col2 = c(111,23,145), stringsAsFactors = FALSE)
To update using base R:
users[users$username == 'rod', c('col1','col2')] <- c('setup', 232)
If you prefer the more explicit syntax provided by the data.table package, you would execute:
library(data.table)
setDT(users)
users[username == 'rod', `:=`(col1 = 'setup', col2 = 232)]
To update your database through RPostgreSQL, you will first need to create Database Connection, and then simply store your query in a string, e.g.
con <- dbConnect('PostgreSQL', dbname = <your database name>, user=<user>, password= <password>)
statement <- "update <schema>.users set col1='setup', col2= 232 where username='rod';"
dbGetQuery(con, statement)
dbDisconnect()
Note depending upon your PostgreSQL configs, you may need to also set your search path dbGetQuery(con, 'set search_path = <schema>;')
I'm more familiar with RPostgres, so you may want to double check the syntax and vignettes of the PostgreSQL package.
EDIT: Seems like RPostgreSQL prefers dbGetQuery to send updates and commands rather than dbSendQuery

Putting output from sql query into another query using R environment

I am wondering what approach should have been selected to perform action from title. I am using ODBC connection and what I get from first sql query are like 40-50 rows in one column. What I want is to put this output as a values in to search for.
How should i treat this? Like a array or separated variables? I still do not know R well so just need to know where to search for.
Regards
------more explanation below----
I have list of 40-50 numbers of 10 digits each, organized in a column.
I am trying to do this:
list <- c(my_input)
sql_in <- paste0(list, collapse="")
and characters are organized like this after this operations:
'c(1234567890, , 1234567890, 1234567890)'
and almost all looks fine and fit into my query besides additional c character at the beginning and missing apostrophes.I try to use gsub function but did not work in way I want.
You may likely do this in one SQL call using a subquery. Notice in the call below that the result of
SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4)
Is passed to the WHERE clause of the primary query. This is perfectly valid and will allow your query to execute entirely in SQL without having to do any intermediate steps in R.
(I use sqldf for simplicity of illustration, but this should work through just about any ODBC connection)
library(sqldf)
Gear <- data.frame(n_gear = 1:5)
sqldf(
"SELECT mpg, qsec, gear, wt
FROM mtcars
WHERE gear IN (SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4))"
)
Try something like this:
list<-c("try","this") #The output from your first query
sql_in<-paste0(list, collapse="','")
The Output
paste("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
[1] "select * from table where table.var in ('try','this')"
If yuo have space as first or last element of the string you can use this code:
`list<-c(" first element is a space","try","this","last element is a space ")` #The output from your first query
Find space at first or last character
first_space<-substr(list, start = 1, stop = 1)==" "
last_space<-substr(list, start = nchar(list), stop = nchar(list))==" "
Remove spaces
list[first_space]<-substr(list[first_space], start = 2, stop = nchar(list[first_space]))
list[last_space]<-substr(list[last_space], start = 1, stop = nchar(list[last_space])-1)
sql_in<-paste0(list, collapse="','")
Your output
paste0("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
"select * from table where table.var in ('first element is a space','try','this','last element is a space')"
I think You are expecting some thing like shown below code,
data <- dbGetQuery(con, "select column from yourfirsttable")
list <- paste(data$column, collapse="','")
result <- dbGetQuery(con, statement = sprintf("select * from yourresulttable where inv in ('%s')",list))
It's not entirely clear exactly what you're wanting to achieve here. For example, one use case just means you can do it all with a join. But I have cases where I don't know the values for the test without doing some computation. Then I do a separate query having created a query string thus:
> id <- 1:5
> paste0("SELECT * FROM table WHERE ID IN (", paste0(id, collapse = ","), ")")
[1] "SELECT * FROM table WHERE ID IN (1,2,3,4,5)"

ROracle Errors When Trying to Use Bound Parameters

I'm using ROracle on a Win7 machine running the following R version:
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 1.1
year 2014
month 07
day 10
svn rev 66115
language R
version.string R version 3.1.1 (2014-07-10)
nickname Sock it to Me
Eventually, I'm going to move the script to a *nix machine, cron it, and run it with RScript.
I want to do something similar to:
select * from tablename where 'thingy' in ('string1','string2')
This would return two rows with all columns in SQLDeveloper (or Toad, etc).
(Ultimately, I want to pull results from one DB into a single column in a data.frame then use those results to loop through
and pull results from a second db, but I also need to be able to do just this function as well.)
I'm following the documentation for RORacle from here.
I've also looked at this (which didn't get an answer):
Bound parameters in ROracle SELECT statements
When I attempt the query from ROracle, I get two different errors, depending on whether I try a dbGetQuery() or dbSendQuery().
As background, here are the versions, queries and data I'm using:
Driver name: Oracle (OCI)
Driver version: 1.1-11
Client version: 11.2.0.3.0
The connection information is standard:
library(ROracle)
ora <- dbDriver("Oracle")
dbcon <- dbConnect(ora, username = "username", password = "password", dbname = "dbnamefromTNS")
These two queries return the expected results:
rs_send <- dbSendQuery(dbcon, "select * from tablename where columname_A = 'thingy' and rownum <= 1000")
rs_get <- dbGetQuery(dbcon, "select * from tablename where columname_A = 'thingy' and rownum <= 1000")
That is to say, 1000 rows from tablename where 'thingy' exists in columnname_A.
I have a data.frame of one column, with two rows.
my.data = data.frame(RANDOM_STRING = as.character(c('string1', 'string2')))
and str(my.data) returns this:
str(my.data)
'data.frame': 2 obs. of 1 variable:
$ RANDOM_STRING: chr "string1" "string2"
my attempted queries are:
nope <- dbSendQuery(dbcon, "select * from tablename where column_A = 'thingy' and widget_name =:1", data = data.frame(widget_name =my.data$RANDOM_STRING))
which gives me an error of:
Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch, :
bind data does not match bind specification
and
not_this_either <- dbGetQuery(dbcon, "select * from tablename where column_A = 'thingy' and widget_name =:1", data = data.frame(widget_name =my.data$RANDOM_STRING))
which gives me an error of:
Error in .oci.GetQuery(conn, statement, data = data, prefetch = prefetch, :
bind data has too many rows
I'm guessing that my problem is in the data=(widget_name=my.data$RANDOM_STRING) part of the queries, but haven't been able to rubber duck my way through it.
Also, I'm very curious as to why I get two separate and different errors depending on whether the queries use the send (and fetch later) format or the get format.
If you like the tidyverse there's a slightly more compact way to achieve the above using purrr
library(ROracle)
library(purrr)
ora <- dbDriver("Oracle")
con <- dbConnect(ora, username = "username", password = "password", dbname = "yourdbnamefromTNSlist")
yourdatalist <- c(12345, 23456, 34567)
output <- map_df(yourdatalist, ~ dbGetQuery(con, "select * from YourTableNameHere where YOURCOLUMNNAME = :d", .x))
Figured it out.
It wasn't a problem with Oracle or ROracle (I'd suspected this) but with my R code.
I stumbled over the answer trying to solve another problem.
This answer about "dynamic strings" was the thing that got me moving towards a solution.
It doesn't fit exactly, but close enough to rubberduck my way to an answer from there.
The trick is to wrap the whole thing in a function and run an ldply on it:
library(ROracle)
ora <- dbDriver("Oracle")
con <- dbConnect(ora, username = "username", password = "password", dbname = "yourdbnamefromTNSlist")
yourdatalist <- c(12345, 23456, 34567)
thisfinallyworks <- function(x) {
dbGetQuery(con, "select * from YourTableNameHere where YOURCOLUMNNAME = :d", data = x)
}
ldply(yourdatalist, thisfinallyworks)
row1 of results where datapoint in YOURCOLUMNNAME = 12345
row2 of results where datapoint in YOURCOLUMNNAME = 23456
row3 of results where datapoint in YOURCOLUMNNAME = 34567
etc