Writing a loop that brings in two different date formats - sql

I need to use R to write and run a query against a database that my R environment is connected to. The structure of the query looks like this:
ALTER TABLE cph.table_id ADD PARTITION (event_date = 'YYYY-MM-DD')
LOCATION 's3://external-dwh-company-com/id-to-idl/YYYYMMDD'
So, for example, today's addition would look like this:
ALTER TABLE cph.table_id ADD PARTITION (event_date = '2018-08-02')
LOCATION 's3://external-dwh-company-com/id-to-idl/20180802'
The issue is that I need to do this for every date going back to 03/01/2018.
So the steps would look like:
initial_query <- paste(#however the above query would be formatted with the dates)
results_query <- dbGetQuery(conn, initial_query)
The biggest hurdles for me are 1) figuring out the paste formatting for that first part and 2) creating a loop that runs the above steps for each date up to the current date.

Consider looping through the range of days since a targeted begin date with lapply and seq, concatenating strings with sprintf and corresponding date format:
# GET DIFFERENCE BETWEEN TODAY AND DESIRED BEGIN DATE
date_diff <- Sys.Date() - as.Date("2018-03-01")
date_diff[[1]]
# 155
sql <- "ALTER TABLE cph.table_id ADD PARTITION (event_date = '%s')
LOCATION 's3://external-dwh-company-com/id-to-idl/%s'"
# RUN QUERY SEQUENTIALLY ADDING TO BEGIN DATE (OFFSET 0 INCLUDES THE BEGIN DATE ITSELF)
output <- lapply(seq(0, date_diff[[1]]), function(i)
  dbGetQuery(conn, sprintf(sql, strftime(as.Date("2018-03-01") + i, "%Y-%m-%d"),
                           strftime(as.Date("2018-03-01") + i, "%Y%m%d"))))
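As a rough sketch of the same idea, the statements can also be built in one vectorised step from a sequence of dates, which makes it easy to inspect them before anything is sent to the database. The fixed end date and the names `days` and `statements` are assumptions for illustration; in practice the end date would be Sys.Date() and each string would be passed to the database call.

```r
# Template with two placeholders: ISO date for the partition, compact date for the S3 path
sql <- "ALTER TABLE cph.table_id ADD PARTITION (event_date = '%s')
LOCATION 's3://external-dwh-company-com/id-to-idl/%s'"

# Every day from the begin date to the end date, inclusive
# (end date fixed here only so the example is reproducible; use Sys.Date() in practice)
days <- seq(as.Date("2018-03-01"), as.Date("2018-03-03"), by = "day")

# sprintf is vectorised, so this yields one statement per day
statements <- sprintf(sql, format(days, "%Y-%m-%d"), format(days, "%Y%m%d"))

cat(statements[1], "\n")
```

Building the strings first keeps the date formatting separate from the database call, so a typo in the template shows up before any partition is added.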

Related

R: Summary of SQL Tables

I am working with the R programming language.
Normally, when I want to get the summary of a table, I can use something like the "str()" function or the "summary()" function:
str(my_table)
summary(my_table)
However, now I am trying to do this with tables on a server.
For instance, I am trying to get the summaries of variable types for a specific table (e.g. "my_table") on a server. I found a very indirect way to do this:
#load libraries
library(odbc)
library(RODBC)
library(DBI)
#establish a connection and name it as "dbhandle"
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
dbColumnInfo(rs)
My Question: Is there a more "direct" way to do this? For example, can I get information about each column (e.g. whether the column is integer, character, date, etc.) in a table without first sending the query and then requesting the information? Can I do this directly?
Thanks!
You could try using fetch() from "RMySQL" to turn your SQL query into an R object (e.g. a data frame):
library(RMySQL)
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
# Get the results from MySQL into R
my_table = fetch(rs, n=-1)
# clear result
dbClearResult(rs)
rm(rs)
Then use the functions you describe.
str(my_table)
summary(my_table)
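Once a sample row is in a data frame, the column types can be summarised in one line. The sketch below uses a small stand-in data frame (with made-up columns `id`, `name` and `created`) in place of the fetched result, so those names are assumptions. Note that it reports the R classes the driver mapped the columns to, not the server's native SQL types.

```r
# Stand-in for the data frame returned by fetch(); the columns are made up for illustration
my_table <- data.frame(
  id = 1L,
  name = "a",
  created = as.Date("2020-01-01"),
  stringsAsFactors = FALSE
)

# First class of each column, returned as a named character vector
col_types <- vapply(my_table, function(x) class(x)[1], character(1))
print(col_types)
```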

How to iteratively run a query on each column in a SQL table with R?

I have a table with multiple columns (colA, colB, colC) and I want to run a query against each of them and store the result so I can use them for comparison purposes later, for example this query to find the ratio of NULL and not NULL values in a column:
SELECT COUNT(*) - COUNT(column), COUNT(column) FROM table;
I have too many columns to do this manually, so I'm looking for a way to cycle through each column and store the result. Using a WHILE loop in T-SQL doesn't seem suitable for this problem, and trying to use a for loop in R doesn't work at all:
tableDataColumnName <- names(tableDataDataframe)
for (i in tableDataColumnName){
nullColumnNumber <- dbGetQuery(con, "SELECT COUNT (*) - COUNT(i), COUNT(i) FROM dbo.table;")
}
Is there a way to execute a query multiple times, once for each column in a table, without doing so manually?
You're trying to use a variable within a string (the i). To do this, you should use either paste or paste0 from base R, or something like the glue package:
## Base
tableDataColumnName <- names(tableDataDataframe)
for (i in tableDataColumnName){
nullColumnNumber <- dbGetQuery(con, paste0("SELECT COUNT (*) - COUNT(", i, "), COUNT(", i, ") FROM dbo.table;"))
}
## Glue
library(glue)
for (i in tableDataColumnName){
nullColumnNumber <- dbGetQuery(con, glue("SELECT COUNT (*) - COUNT({i}), COUNT({i}) FROM dbo.table;"))
}
However, you're also overwriting the result on each iteration of the loop. My solution for the whole problem would be something like the following:
library(glue)
tableDataColumnName <- names(tableDataDataframe)
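# dbGetQuery returns a data frame (one row, two columns here), so store each result in a list
nullColumnNumber <- vector("list", length(tableDataColumnName))
names(nullColumnNumber) <- tableDataColumnName
for (i in seq_along(tableDataColumnName)){
nullColumnNumber[[i]] <- dbGetQuery(con, glue("SELECT COUNT (*) - COUNT({tableDataColumnName[i]}), COUNT({tableDataColumnName[i]}) FROM dbo.table;"))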
}
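A minimal sketch of collecting one result per column and stacking the results into a single data frame. To keep it runnable without a database, the helper `run_query` is a hypothetical stand-in for `dbGetQuery(con, query)` that computes the same two counts on a local data frame; the sample data and the output column names are also made up.

```r
# Hypothetical local copy of the table, used only so the example runs without a database
tableDataDataframe <- data.frame(colA = c(1, NA, 3), colB = c(NA, NA, "x"), colC = 1:3)

# One query string per column; sprintf recycles the column names into both placeholders
cols <- names(tableDataDataframe)
queries <- sprintf("SELECT COUNT(*) - COUNT(%s), COUNT(%s) FROM dbo.table;", cols, cols)

# Stand-in for dbGetQuery(con, query): computes the same two counts locally
run_query <- function(col) {
  x <- tableDataDataframe[[col]]
  data.frame(column = col, n_null = sum(is.na(x)), n_not_null = sum(!is.na(x)))
}

# Collect one row per column and stack them into a single data frame
null_counts <- do.call(rbind, lapply(cols, run_query))
print(null_counts)
```

With a real connection, the body of `run_query` would be replaced by the database call on the matching element of `queries`; the `lapply` plus `rbind` pattern stays the same.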

How to set intervalstyle = iso_8601 and then run a select query in golang

I have a table with an interval column, something like this.
CREATE TABLE validity (
window INTERVAL NOT NULL
);
Assume the value stored is 'P3DT1H' which is in iso_8601 format.
When I try to read the value, it comes in regular postgres format.
3 days 01:00:00
However I want the value in iso_8601 format. How can I achieve it?
so=# CREATE TABLE validity (
w INTERVAL NOT NULL
);
CREATE TABLE
so=# insert into validity values ('3 days 01:00:00');
INSERT 0 1
You are probably looking for intervalstyle:
so=# set intervalstyle to iso_8601;
SET
so=# select w From validity;
w
--------
P3DT1H
(1 row)
It can be set per transaction, session, role, database, or cluster.
You can use SET intervalstyle query and set the style to iso_8601. Then, when you output the results, they will be in ISO 8601 format.
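// Caveat: if s.db is a *sql.DB, it pools connections, so the SET only affects the SELECT when both run on the same connection (for example one obtained with db.Conn, or within a transaction).
_, err := s.db.Exec("SET intervalstyle='iso_8601'")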
res, err := s.db.Query("select interval '1d1m'")
// res contains a row with P1DT1M
If you are looking for a way to change intervalstyle for all sessions on a server level, you can update it in your configuration file:
-- connect to your psql using whatever client, e.g. cli and run
SHOW config_file;
-- in my case: /usr/local/var/postgres/postgresql.conf
Edit this file and add the following line:
intervalstyle = 'iso_8601'
In my case the file already had a commented out line with intervalstyle, and its value was postgres. You should change it and restart the service.
That way you won't have to change the style from golang each time you run a query.

Putting output from sql query into another query using R environment

I am wondering which approach I should take to do what the title describes. I am using an ODBC connection, and the first SQL query returns about 40-50 rows in one column. I want to use this output as the values to search for in a second query.
How should I treat this? As an array or as separate variables? I still do not know R well, so I just need to know where to look.
Regards
------more explanation below----
I have list of 40-50 numbers of 10 digits each, organized in a column.
I am trying to do this:
list <- c(my_input)
sql_in <- paste0(list, collapse="")
and after these operations the characters are organized like this:
'c(1234567890, , 1234567890, 1234567890)'
Almost everything looks fine and fits into my query, except for the extra c character at the beginning and the missing apostrophes. I tried to use the gsub function, but it did not work the way I wanted.
You can likely do this in one SQL call using a subquery. Notice in the call below that the result of
SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4)
is passed to the WHERE clause of the primary query. This is perfectly valid and will allow your query to execute entirely in SQL without having to do any intermediate steps in R.
(I use sqldf for simplicity of illustration, but this should work through just about any ODBC connection)
library(sqldf)
Gear <- data.frame(n_gear = 1:5)
sqldf(
"SELECT mpg, qsec, gear, wt
FROM mtcars
WHERE gear IN (SELECT n_gear
FROM Gear
WHERE n_gear IN (3,4))"
)
Try something like this:
list<-c("try","this") #The output from your first query
sql_in<-paste0(list, collapse="','")
The Output
paste("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
[1] "select * from table where table.var in ('try','this')"
If you have a space as the first or last character of a string, you can use this code:
list<-c(" first element is a space","try","this","last element is a space ") #The output from your first query
Find space at first or last character
first_space<-substr(list, start = 1, stop = 1)==" "
last_space<-substr(list, start = nchar(list), stop = nchar(list))==" "
Remove spaces
list[first_space]<-substr(list[first_space], start = 2, stop = nchar(list[first_space]))
list[last_space]<-substr(list[last_space], start = 1, stop = nchar(list[last_space])-1)
sql_in<-paste0(list, collapse="','")
Your output
paste0("select * from table where table.var in ",paste("('",sql_in,"')",sep=''))
"select * from table where table.var in ('first element is a space','try','this','last element is a space')"
I think you are expecting something like the code shown below:
data <- dbGetQuery(con, "select column from yourfirsttable")
list <- paste(data$column, collapse="','")
result <- dbGetQuery(con, statement = sprintf("select * from yourresulttable where inv in ('%s')",list))
It's not entirely clear exactly what you're wanting to achieve here. For example, one use case just means you can do it all with a join. But I have cases where I don't know the values for the test without doing some computation. Then I do a separate query having created a query string thus:
> id <- 1:5
> paste0("SELECT * FROM table WHERE ID IN (", paste0(id, collapse = ","), ")")
[1] "SELECT * FROM table WHERE ID IN (1,2,3,4,5)"
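The string-building approaches in these answers can be wrapped in a small helper. This is a sketch, not a hardened implementation: the function name `build_in_clause` is hypothetical, trimming surrounding whitespace is an assumption about what the data needs, and doubling embedded single quotes is standard SQL escaping but does not replace parameterised queries where the driver supports them.

```r
# Hypothetical helper: turn a vector of values into a quoted SQL IN list
build_in_clause <- function(values) {
  # Convert to character and strip leading/trailing whitespace (assumed to be unwanted)
  cleaned <- trimws(as.character(values))
  # Double any embedded single quotes (standard SQL escaping)
  escaped <- gsub("'", "''", cleaned, fixed = TRUE)
  paste0("('", paste(escaped, collapse = "','"), "')")
}

ids <- c("1234567890", "0987654321")  # e.g. the single column returned by the first query
query <- paste("select * from yourresulttable where inv in", build_in_clause(ids))
print(query)
```

Keeping the IDs as character strings avoids any risk of long numbers being reformatted when they are pasted into the query.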

update an SQL table via R sqlSave

I have a data frame in R having 3 columns, using sqlSave I can easily create a table in an SQL database:
channel <- odbcConnect("JWPMICOMP")
sqlSave(channel, dbdata, tablename = "ManagerNav", rownames = FALSE, append = TRUE, varTypes = c(DateNav = "datetime"))
odbcClose(channel)
This data frame contains information about Managers (Name, Nav and Date), which is updated every day with new values for the current date; old values may also be updated in case of errors.
How can I accomplish this task in R?
I tried to use sqlUpdate, but it returns the following error:
> sqlUpdate(channel, dbdata, tablename = "ManagerNav")
Error in sqlUpdate(channel, dbdata, tablename = "ManagerNav") :
cannot update ‘ManagerNav’ without unique column
When you create a table "the white shark-way" (see documentation), it does not get a primary index, but is just plain columns, and often of the wrong type. Usually, I use your approach to get the columns names right, but after that you should go into your database and assign a primary index, correct column widths and types.
After that, sqlUpdate() might work; I say might, because I have given up using sqlUpdate(), there are too many caveats, and use sqlQuery(..., paste("Update....))) for the real work.
What I would do for this is the following
Solution 1
sqlUpdate(channel, dbdata,tablename="ManagerNav", index=c("ManagerNav"))
Solution 2
Lcolumns <- list(dbdata[0,])
sqlUpdate(channel, dbdata,tablename="ManagerNav", index=c(Lcolumns))
The index argument names the column(s) used to identify which rows R is going to update.
Hope this helps!
If none of the other solutions work and your data is not that big, I'd suggest using sqlQuery() and loop through your dataframe.
# Build the INSERT statement for row i; every value is wrapped in single quotes
one_row_of_your_df <- function(i) {
  sql_query <-
    paste0("INSERT INTO your_table_name (column_name1, column_name2, column_name3) VALUES",
           "(",
           "'", your_dataframe[i, 1], "'", ",",
           "'", your_dataframe[i, 2], "'", ",",
           "'", your_dataframe[i, 3], "'",
           ")"
    )
  return(sql_query)
}
This function is Exasol specific, it is pretty similar to MySQL, but not identical, so small changes could be necessary.
Then use a simple for loop like this one:
for(i in seq_len(nrow(your_dataframe)))
{
sqlQuery(your_connection, one_row_of_your_df(i))
}
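As a sketch of the same row-by-row idea, the INSERT statements can be generated for all rows at once with sprintf, which keeps the quoting in one place. The data frame contents, the helper name `quote_sql`, and the table and column names are placeholders, and doubling embedded single quotes is an assumption about what the target database expects.

```r
# Placeholder data; in practice this is the data frame to be written
your_dataframe <- data.frame(
  name = c("Manager A", "O'Neil"),
  nav  = c(101.5, 99.2),
  date = c("2018-08-01", "2018-08-02"),
  stringsAsFactors = FALSE
)

# Quote a column as SQL string literals, doubling embedded single quotes
quote_sql <- function(x) paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")

# One INSERT statement per row; sprintf is vectorised over the columns
insert_statements <- sprintf(
  "INSERT INTO your_table_name (column_name1, column_name2, column_name3) VALUES (%s, %s, %s)",
  quote_sql(your_dataframe[[1]]),
  quote_sql(your_dataframe[[2]]),
  quote_sql(your_dataframe[[3]])
)
print(insert_statements)
```

Each element of `insert_statements` can then be passed to sqlQuery() in a loop, exactly as with the row-wise function.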