I am doing database analysis using SQL Server and forecasting using R. I need to get the results from R back into the SQL Server database. One approach is to output the forecast data to a text file using write.table and import using BULK INSERT. Is there a better way?
You can use dbBulkCopy from the rsqlserver package. It is a DBI extension that wraps bcp, the popular Microsoft SQL Server command-line utility, to bulk copy large files into a table quickly.
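# Assumes conn is an open rsqlserver connection and that nrow, ncol and cnames are already defined
dat <- matrix(round(rnorm(nrow * ncol), 2), nrow = nrow, ncol = ncol)
colnames(dat) <- cnames
# Write the data to a CSV file, then bulk copy that file into the table
id.file <- "temp_file.csv"
write.csv(dat, file = id.file, row.names = FALSE)
dbBulkCopy(conn, "NEW_BP_TABLE", value = id.file)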
Thanks for your comments and answers! I went with a solution based on the comment by nrussell. Below is my code. The specific command is the last line; I am providing the preceding lines to provide a little bit of context for anyone trying to use this answer.
library(RODBC)    # sqlQuery, sqlSave
library(forecast) # auto.arima, forecast
data <- sqlQuery(myconn, query) # returns time series with year, month (both numeric), and value
data_ts <- ts(data$value,
              start = c(data$year[1], data$month[1]), # start is first year and month
              end = c(data$year[nrow(data)], data$month[nrow(data)]), # end is last year and month
              frequency = 12)
data_fit <- auto.arima(data_ts)
fct <- forecast(data_fit, h = 12) # forecast 12 months ahead from the fitted model
sqlQuery(myconn, 'truncate table dgtForecast') # Pre-existing table
sqlSave(myconn, data.frame(fct), tablename = 'dgtForecast', rownames = 'MonthYear', append = TRUE)
So I want to create a new data frame adding the values of the Sometimes and Often column and dividing it by the values of the total column and multiplying it by 100 to get percentages (unless there is a function that automatically does this in R). How would I go about doing that?
You have added an "sql" tag to your question. Should you prefer SQL over R for reasons of experience and/or knowledge you might be interested in the fabulous sqldf package which allows you to use SQL syntax within R. You will have to download it first via install.packages("sqldf") and then you can use it as in
expl <- data.frame(sometimes = c(1, 2, 4), often = c(2, 2, 2), total =c(6, 9, 8))
library(sqldf)
sqldf("SELECT 100*(sometimes+often)/total FROM expl")
The more common approach is to add a percent column to the same data.frame instead of creating a new one. That way, all data are kept together and you do not lose the link to, e.g., the week column.
One way to go about that would be the following one-liner:
expl <- data.frame(sometimes = c(1, 2, 4), often = c(2, 2, 2), total =c(6, 9, 8))
print(expl)
expl$percent = 100 * (expl$sometimes + expl$often)/expl$total
print(expl)
First, it looks as though Total, Sometimes, and Often are character because you have commas in them, so you would need to get rid of the commas and convert them to numeric. You can do that as follows (assuming your dataframe is called mydata):
for(i in c("Total","Sometimes","Often")) mydata[[i]] = as.numeric(gsub(",", "", mydata[[i]]))
Then you can use the answer by Bernard:
mydata$percent = 100 * (mydata$Sometimes + mydata$Often)/mydata$Total
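The two steps above can be tried end to end on a small made-up data frame. This is only a sketch: the Week column and all the values are invented for illustration, and it assumes the numbers arrive as character strings with thousands separators.

```r
# Hypothetical data: numbers stored as text with commas, plus a Week label.
mydata <- data.frame(
  Week      = c("W1", "W2"),
  Sometimes = c("1,200", "950"),
  Often     = c("300", "250"),
  Total     = c("3,000", "2,000"),
  stringsAsFactors = FALSE
)

# Strip the commas and convert the three count columns to numeric.
num_cols <- c("Total", "Sometimes", "Often")
mydata[num_cols] <- lapply(mydata[num_cols],
                           function(x) as.numeric(gsub(",", "", x)))

# Add the percentage next to the original data.
mydata$percent <- 100 * (mydata$Sometimes + mydata$Often) / mydata$Total
print(mydata)
```

Keeping percent in the same data frame means each value stays attached to its Week label.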
Another option using the tidyverse:
library(tidyverse)
newdataframe <- olddataframe %>%
mutate(percent = (Sometimes+Often)/Total*100) %>%
select(percent)
But as said before, it is better to keep the percentage column with the other data. In that case, remove the %>% select(percent).
I am attempting to create a small training database for a package that I am writing. I am using the following code to create the database:
library(tidyverse)
library(DBI)
dat <- data.frame(name = rep("Clyde", 100),
                  DOB = sample(x = seq(as.POSIXct('1970/01/01'), as.POSIXct('1995/01/01'), by = "day"),
                               size = 100, replace = TRUE))
# Example using schemas with SQLite
train_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
## create tables in primary db
copy_to(dest = train_con, df = dat, name = "client_list", temporary = FALSE)
The above portion works fine. However, when I attempt to pull data from the database, I see that all dates have been converted to numeric.
train_con %>% tbl("client_list")
Can anybody tell me how to fix this? Thanks!
SQLite does not have a datetime type. In the absence of such a type POSIXct objects are sent to the database as seconds since the UNIX Epoch and SQLite does not know that they are intended to represent date times.
Either convert such columns yourself after you read them back into R, or use a different database. Nearly all databases other than SQLite have a native datetime type.
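Converting the column back after reading could look roughly like the sketch below. The from_db data frame is a made-up stand-in for what collect() or dbReadTable() would return, and displaying the result in UTC is an assumption; pick whatever time zone the original values were meant in.

```r
# Hypothetical result read back from SQLite: DOB is now plain seconds
# since the UNIX epoch instead of a date-time.
from_db <- data.frame(
  name = c("Clyde", "Clyde"),
  DOB  = c(0, 86400)
)

# Turn the seconds back into POSIXct (UTC chosen here as an assumption).
from_db$DOB <- as.POSIXct(from_db$DOB, origin = "1970-01-01", tz = "UTC")
print(from_db)
```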
So in a database Y I have a table X with more than 400 million observations. Then I have a KEY.csv file with IDs, that I want to use for filtering the data (small data set, ca. 50k unique IDs). If I had unlimited memory, I would do something like this:
require(RODBC)
require(dplyr)
db <- odbcConnect('Y',uid = "123",pwd = '123')
df <- sqlQuery(db,'SELECT * from X')
close(db)
keys <- read.csv('KEY.csv')
df_final <- df %>% filter(ID %in% keys$ID)
My issue is that I don't have the rights to upload the KEY.csv file to the database Y and do the filtering there. Would it somehow be possible to do the filtering in the query while referencing the file loaded in R memory, and then write this filtered X table directly to a database I have access to? I think that even after filtering, R might not be able to keep the result in memory.
I could also try to do this in Python, but I don't have much experience in that language.
I don't know how many keys you have, but maybe you can try the build_sql() function to use the keys inside the query.
I don't use RODBC; I think you should use odbc and DBI (https://db.rstudio.com).
library(dbplyr) # dbplyr not dplyr
library(DBI)
library(odbc)
# Get keys first
keys = read.csv('KEY.csv')
db = dbConnect(odbc(), dsn = 'Y', uid = "123", pwd = '123') # odbc connects through DBI::dbConnect
# write your query (dbplyr)
sql_query = build_sql('SELECT * from X
where X.ID IN ', keys$ID, con = db) # pass the ID vector, not the whole data frame
df = dbGetQuery(db, sql_query) # DBI counterpart of RODBC's sqlQuery
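With around 50k keys, a single IN list may get too long for the server, and the full result may not fit in memory anyway. One possible workaround is to split the keys into chunks and build one query per chunk. The sketch below only builds the query strings in base R; the ids vector is a made-up stand-in for keys$ID, the chunk size is arbitrary, and pasting values straight into SQL is only reasonable here because the IDs are assumed to be numeric.

```r
# Hypothetical key vector standing in for keys$ID read from KEY.csv.
ids <- 1:25

# Split into chunks of at most chunk_size IDs (10 here; tune for the server).
chunk_size <- 10
chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))

# One SELECT per chunk. Character keys would need proper quoting,
# for example with DBI::dbQuoteString(), instead of plain paste().
queries <- vapply(
  chunks,
  function(chunk) {
    sprintf("SELECT * FROM X WHERE ID IN (%s)", paste(chunk, collapse = ", "))
  },
  character(1)
)

print(unname(queries))
```

Each string could then be passed to dbGetQuery() and the result appended to a table in the database you do have access to, for example with dbWriteTable(..., append = TRUE), so that only one chunk's worth of rows is held in R at a time.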
Whenever I use read.csv.sql I cannot select from the first column, and any output from the code places an unusual character (A(tilde)-..) at the beginning of the first column's name.
So suppose I create a df.csv file in Excel that looks something like this
df = data.frame(
a = 1,
b = 2,
c = 3,
d = 4)
Then if I use sqldf to query the csv which is in my working directory I get the following error:
> read.csv.sql("df.csv", sql = "select * from file where a == 1")
Error in result_create(conn@ptr, statement) : no such column: a
If I query a different column than the first, I get a result but with the output of the unusual characters as seen below
df <- read.csv.sql("df.csv", sql = "select * from file where b == 2")
View(df)
Any idea how to prevent these characters from being added to the first column name?
The problem is presumably that the file is larger than R can handle, so you want to read only a subset of rows into R. However, the condition that selects those rows refers to the first column, whose name is garbled, so you can't use it.
Here are two alternative approaches. The first one involves a bit more code but has the advantage that it is 100% R. The second one is only one statement and also uses R but additionally makes use of an external utility.
1) skip header. Read the file in, skipping over the header. The columns will then be labelled V1, V2, etc., so use V1 in the condition.
# write out a test file - BOD is a data frame that comes with R
write.csv(BOD, "BOD.csv", row.names = FALSE, quote = FALSE)
# read file skipping over header
DF <- read.csv.sql("BOD.csv", "select * from file where V1 < 3",
skip = 1, header = FALSE)
# read in header, assign it to DF and fix first column
hdr <- read.csv.sql("BOD.csv", "select * from file limit 0")
names(DF) <- names(hdr)
names(DF)[1] <- "TIME" # suppose we want TIME instead of Time
DF
##   TIME demand
## 1    1    8.3
## 2    2   10.3
2) filter Another way to proceed is to use the filter= argument. Here we assume we know that the end of the column name is ime but there are other characters prior to that that we don't know. This assumes that sed is available and on your path. If you are on Windows install Rtools to get sed. The quoting might need to be changed depending on your shell.
When trying this on Windows I noticed that sed from Rtools changed the line endings so below we specified eol= to ensure correct processing. You may not need that.
DF <- read.csv.sql("BOD.csv", "select * from file where TIME < 3",
filter = 'sed -e "1s/.*ime,/TIME,/"' , eol = "\n")
DF
##   TIME demand
## 1    1    8.3
## 2    2   10.3
So I figured it out by reading through the above comments.
I'm on a Windows 10 machine using Excel for Office 365. The special characters went away when I changed how I saved the file, from "CSV UTF-8 (Comma Delimited)" to just "CSV (Comma delimited)".
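The likely reason is that the "CSV UTF-8" format puts a byte order mark (the bytes EF BB BF) at the start of the file, and that mark ends up glued to the first column name. A rough way to check for it, and to read such a file with base R's read.csv rather than read.csv.sql, is sketched below. The file is a made-up temporary one written only for the demonstration.

```r
# Write a small CSV that starts with a UTF-8 byte order mark,
# imitating a file saved from Excel as "CSV UTF-8".
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("a,b,c,d\n1,2,3,4\n"), con)
close(con)

# Detect the mark by looking at the first three bytes of the file.
has_bom <- identical(readBin(path, "raw", n = 3),
                     as.raw(c(0xEF, 0xBB, 0xBF)))

# The "UTF-8-BOM" encoding tells R to drop the mark while reading.
clean <- read.csv(path, fileEncoding = "UTF-8-BOM")
print(has_bom)
print(names(clean))
```

This does not change how read.csv.sql behaves; for that, re-saving the file without the mark or using one of the workarounds above is still needed.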
I would like to get some data from MySQL in OpenERP.
One way I can do it is like this:
#!/usr/bin/python
import MySQLdb
# connect
db = MySQLdb.connect(host="localhost", user="appuser", passwd="",
db="onco")
cursor = db.cursor()
# execute SQL select statement
cursor.execute("SELECT * FROM LOCATION")
# commit your changes
db.commit()
# get the number of rows in the resultset
numrows = int(cursor.rowcount)
# get and display one row at a time.
for x in range(0,numrows):
    row = cursor.fetchone()
    print row[0], "-->", row[1]
from How do I connect to a MySQL Database in Python?
But is there maybe a smarter way to do it, such as using cr like a standard OpenERP object?
Your way is ok, but:
You don't need db.commit() after SELECT. It's necessary only if you change something in database.
Instead of getting the number of rows and for x in range(0, numrows) you can use for x in cursor.fetchall():. To get only n elements you can use cursor.fetchmany(n).