With a data frame, I can create a new column containing observation numbers like this:
mtcars %>% mutate(i=row_number())
But row_number() does not work with SQL tables.
mydb <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(mydb, "mt", mtcars)
mt.sql=tbl(mydb, "mt")
mt.sql %>% mutate(i=row_number())
Error:
Window function row_number() is not supported by this database
Is there a way around this problem?
You could work around it by using SQLite syntax like this:
DBI::dbExecute(mydb, "ALTER TABLE mt ADD COLUMN i INTEGER")
DBI::dbExecute(mydb, "UPDATE mt SET i = ROWID")
Then you can go on using dplyr syntax after re-assigning mt from your db connection to the mt.sql object.
mt.sql=tbl(mydb, "mt")
mt.sql %>% select(mpg, i) # e.g.
Older SQLite versions (before 3.25) don't support the row_number() window function.
Did you try mtcars %>% mutate(i=row_number(desc(disp)))?
It works for me even in SQL.
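Another option that avoids both the window function and the ALTER TABLE step is to select SQLite's implicit rowid in the query itself. This is a minimal sketch, assuming the table was written with dbWriteTable (so it is an ordinary rowid table) and using an in-memory database for illustration:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mt", mtcars)

# rowid is SQLite's implicit integer key for ordinary tables;
# selecting it adds an observation number without altering the table
res <- dbGetQuery(con, "SELECT rowid AS i, * FROM mt")
head(res[, c("i", "mpg")])

dbDisconnect(con)
```

If dplyr is in use, the same query text could probably be wrapped as tbl(mydb, sql("SELECT rowid AS i, * FROM mt")) to keep working lazily, though that variant is untested here.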
Related
Using R's DBI, I need to:
1. run a parametrized query with different parameters (i.e. a vector of parameters);
2. get the result sets concatenated (i.e. rbinded in R terminology, or unioned in SQL terminology);
3. and get the resulting table in the database for further manipulation.
dbBind/dbGetQuery fulfils requirements 1 and 2, but I then need to write the resulting data frame to the database using dbWriteTable, which is inefficient:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)
res <- dbGetQuery(con,
"select * from iris where Species = ?",
params = list(c("setosa", "versicolor")))
dbWriteTable(con, "mytable", res)
Conversely, dbExecute fulfils requirement 3, but I don't think it has the "rbind feature". Of course, this throws an error because the table would get overwritten:
dbExecute(con,
"create table mytable as select * from iris where Species = ?",
params = list(c("setosa", "versicolor")))
What is the most efficient/recommended way of doing so?
Notes:
I am not the DBA and can only access the database through R.
My example is too trivial and could be achieved in a single query. My use case really requires a parametrized query to be run multiple times with different parameters.
I have to use Oracle, but I am interested in a solution even if it doesn't work with Oracle.
1) Create the table with the first parameter and then insert each of the others into it.
library(RSQLite)
con <- dbConnect(SQLite())
dbWriteTable(con, "iris", iris)
parms <- c("setosa", "versicolor")
dbExecute(con, "create table mytable as
select * from iris where Species = ?",
params = parms[1])
for (p in parms[-1]) {
dbExecute(con, "insert into mytable
select * from iris where Species = ?",
params = p)
}
# check
res <- dbGetQuery(con, "select * from mytable")
str(res)
2) Alternately generate the text of an SQL statement to do it all. sqldf pulls in RSQLite and gsubfn which supplies fn$ that enables the text substitution.
library(sqldf)
con <- dbConnect(SQLite())
dbWriteTable(con, "iris", iris)
parms <- c("setosa", "versicolor")
parmString <- toString(sprintf("'%s'", parms))
fn$dbExecute(con, "create table mytable as
select * from iris where Species in ($parmString)")
# check
res <- dbGetQuery(con, "select * from mytable")
str(res)
3) A variation of (2) is to insert the appropriate number of question marks.
library(sqldf)
con <- dbConnect(SQLite())
dbWriteTable(con, "iris", iris)
params <- list("setosa", "versicolor")
quesString <- toString(rep("?", length(params)))
fn$dbExecute(con, "create table mytable as
select * from iris where Species in ($quesString)", params = params)
# check
res <- dbGetQuery(con, "select * from mytable")
str(res)
Based on @r2evans's comment and @G.Grothendieck's answer, instead of query/download/combine/upload, I used a parameterized query that inserts directly into a table.
First, I created the table with the appropriate columns to collect the results:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
create_table <-
"CREATE TABLE warpbreaks2 (
breaks real,
wool text,
tension text
)"
dbExecute(con, create_table)
Then I executed an INSERT INTO step:
dbWriteTable(con, "warpbreaks", warpbreaks)
insert_into <-
"INSERT INTO warpbreaks2
SELECT warpbreaks.breaks,
warpbreaks.wool,
warpbreaks.tension
FROM warpbreaks
WHERE tension = ?"
dbExecute(con, insert_into, params = list(c("L", "M")))
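To confirm that the statement ran once per bound value, the target table can be counted per tension level. The following is a self-contained sketch that repeats the setup above on an in-memory SQLite database; the names n_inserted and counts are only illustrative:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "warpbreaks", warpbreaks)
dbExecute(con, "CREATE TABLE warpbreaks2 (breaks real, wool text, tension text)")

# One parameterized INSERT, executed once for each element of the bound vector
n_inserted <- dbExecute(
  con,
  "INSERT INTO warpbreaks2
   SELECT breaks, wool, tension FROM warpbreaks WHERE tension = ?",
  params = list(c("L", "M"))
)

# Count the rows that landed in the target table, per tension level
counts <- dbGetQuery(
  con,
  "SELECT tension, COUNT(*) AS n FROM warpbreaks2
   GROUP BY tension ORDER BY tension"
)
print(counts)

dbDisconnect(con)
```

Both "L" and "M" should appear in counts, and "H" should not, since it was never bound.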
This is a dummy example for illustration purposes. It could be achieved more directly with e.g.:
direct_query <-
"CREATE TABLE warpbreaks3 AS
SELECT *
FROM warpbreaks
WHERE tension IN ('L', 'M')"
dbExecute(con, direct_query )
I'm trying to catalog the structure of a MSSQL 2008 R2 database using R/RODBC. I have set up a DSN, connected via R and used the sqlTables() command but this is only getting the 'system databases' info.
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlTables(conn1)
However if I do this:
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlQuery(conn1, 'USE my_db_1')
sqlTables(conn1)
I get the tables associated with the my_db_1 database. Is there a way to see all of the databases and tables without manually typing in a separate USE statement for each?
There may or may not be a more idiomatic way to do this directly in SQL, but we can piece together a data set of all tables from all databases (a bit more programmatically than repeated USE xyz; statements) by getting a list of databases from master..sysdatabases and passing these as the catalog argument to sqlTables, e.g.
library(RODBC)
library(DBI)
##
tcon <- RODBC::odbcConnect(
dsn = "my_dsn",
uid = "my_uid",
pwd = "my_pwd"
)
##
db_list <- RODBC::sqlQuery(
channel = tcon,
query = "SELECT name FROM master..sysdatabases")
##
RODBC::sqlTables(
channel = tcon,
catalog = db_list[14, 1]
)
(I can't show any of the output for confidentiality reasons, but it produces the correct results.) Of course, in your case you probably want to do something like
all_metadata <- lapply(db_list$name, function(DB) {
RODBC::sqlTables(
channel = tcon,
catalog = DB
)
})
# or some more efficient variant of data.table::rbindlist...
meta_df <- do.call("rbind", all_metadata)
How should I create a table in SQL from the structure of an R data frame without writing complex code? Is there any function in R to accomplish this?
If you are using MySQL, you can use dbWriteTable from the RMySQL package:
library(RMySQL)
con <- dbConnect(MySQL(),
user="USER_NAME",
host="localhost",
password = "PASS",
db = "NAME_DATA_BASE")
dbWriteTable(conn = con, name = 'test', value = iris)
In this example, I put the iris data frame into a table named test.
In sqlite:
library(RSQLite)
#set working directory to where your sqlite database resides
setwd("C:/sqlite/Data")
#connect
sqlite<-dbDriver("SQLite")
my_conn<-dbConnect(sqlite,"my_db.db")
#write out the dataframe in the database
dbWriteTable(my_conn, "new_table_in_db", my_dataframe, row.names=FALSE)
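If only the structure is wanted, without copying any rows, DBI also offers sqlCreateTable (which returns the CREATE TABLE statement it derives from the data frame) and dbCreateTable (which executes it). A minimal sketch, assuming an in-memory SQLite database and a hypothetical table name iris_empty:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Show the CREATE TABLE statement DBI derives from the data frame's columns
print(sqlCreateTable(con, "iris_empty", iris))

# Create the empty table; column names and types come from the data frame
dbCreateTable(con, "iris_empty", iris)

fields <- dbListFields(con, "iris_empty")
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM iris_empty")$n

dbDisconnect(con)
```

The column types chosen depend on the backend's dbDataType mapping, so the generated statement will likely differ between SQLite, MySQL and other databases.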
I've got a simple dataframe with three columns. One of the columns contains a database name. I first need to check if data exists, and if not, insert it. Otherwise do nothing.
Sample data frame:
clientid;region;database
135;Europe;europedb
2567;Asia;asiadb
23;America;americadb
So I created a function to apply to dataframe this way:
library(RMySQL)
check_if_exist <- function(df){
con <- dbConnect(MySQL(),
user="myuser", password="mypass",
dbname=df$database, host="myhost")
query <- paste0("select count(*) from table where client_id='", df$clientid,"' and region='", df$region ,"'")
rs <- dbSendQuery(con, query)
rs
}
Function call:
df$new_column <- lapply(df, check_if_exist)
But this doesn't work.
This is a working example of what you are asking, if I understood correctly. I don't have your database, so we just print the query for verification and return a random number as the result.
Note that by doing lapply(df,...), you are looping over the columns of the data frame, and not over the rows as you want.
df = read.table(text="clientid;region;database
135;Europe;europedb
2567;Asia;asiadb
23;America;americadb",header=T,sep=";")
check_if_exist <- function(df){
query = paste0("select count(*) from table where client_id='", df$clientid,"' and region='", df$region ,"'")
print(query)
rs <- runif(1,0,1)
return(rs)
}
df$new_column <- sapply(split(df,seq(1,nrow(df))),check_if_exist)
Hope this helps.
I've already looked at 'Pass R variable to RODBC's sqlQuery with multiple entries?' but can't seem to get it to work. I'm trying to do an sqlQuery() from R on a SQL Server 2008 R2 db. I'm trying to get a sample from a large db based on row numbers. First I created a list of random numbers:
sampRowNum <- sample(seq(1,100000,1), 5000)
Then I try to use those numbers in a query using:
query1 <- sqlQuery(channel, paste("select *
FROM db where row_id in (", sampRowNum,")", sep=""))
I get just the results from the db where the row_id is equal to the first number in sampRowNum. Any suggestions?
You're not pasting your query together correctly.
If you run the paste statement in isolation, you'll see that you get a vector of length 5000, so sqlQuery is only executing the first one of those, corresponding to the first element in sampRowNum.
What you want to do is something more like this:
paste("select * FROM db where row_id in (",
paste(sampRowNum,collapse = ","),")", sep="")
Just as an added note (and since I've had to do stuff like this a lot...) constructing sql queries with an IN clause with strings is a bit more of a nuisance, since you have to tack on all the single quotes:
vec <- letters[1:5]
paste("SELECT * FROM db WHERE col IN ('",
paste(vec,collapse = "','"),"')",sep = "")
[1] "SELECT * FROM db WHERE col IN ('a','b','c','d','e')"
If you do this a lot, you'll end up writing a little function that does that pasting of character vectors for you.
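Such a helper might look like the following sketch. The name in_clause is hypothetical, and the quoting rule (doubling embedded single quotes) is an assumption that matches standard SQL string literals but does not replace proper parameter binding:

```r
# Hypothetical helper: format a vector as the parenthesised body of an IN clause
in_clause <- function(x) {
  if (is.character(x)) {
    # Double any embedded single quotes, then wrap each value in quotes
    x <- paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
  } else {
    # Avoid scientific notation such as 1e+05 for large numeric ids
    x <- format(x, scientific = FALSE, trim = TRUE)
  }
  paste0("(", paste(x, collapse = ","), ")")
}

q_chr <- paste("SELECT * FROM db WHERE col IN", in_clause(letters[1:5]))
q_num <- paste("SELECT * FROM db WHERE row_id IN", in_clause(c(3, 17, 100000)))
print(q_chr)
print(q_num)
```

The numeric branch matters for the sampling case above, because paste() on a double such as 100000 would otherwise produce "1e+05".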
As always, this kind of SQL string manipulation is Not Good if you are dealing with user inputs (e.g. in a web app), due to SQL injection attacks. In my particular situation this isn't much of a concern, but in general people will prefer parametrized queries if you don't have much control over the input values.
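For comparison, here is a sketch of the parametrized form using DBI placeholders instead of pasted values. It assumes a DBI backend that accepts positional "?" placeholders (RSQLite is used here purely for illustration; RODBC's sqlQuery does not take parameters this way):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)

vec <- c("setosa", "versicolor")

# One "?" placeholder per value; the values themselves never enter the SQL text
placeholders <- paste(rep("?", length(vec)), collapse = ", ")
sql <- sprintf("SELECT * FROM iris WHERE Species IN (%s)", placeholders)

# Each list element is bound to the placeholder in the same position
res <- dbGetQuery(con, sql, params = as.list(vec))
print(table(res$Species))

dbDisconnect(con)
```

Only the number of placeholders is derived from the input, so a malicious value cannot change the structure of the statement.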