Using R to Write "CREATE TABLE" Commands

I have this dataset over here in R:
my_table = data.frame(id = c(1,2,3), name = c("sam", "smith", "sean"), height = c(156, 175, 191), address = c("123 first street", "234 second street", "345 third street"))
  id  name height           address
1  1   sam    156  123 first street
2  2 smith    175 234 second street
3  3  sean    191  345 third street
I can get the summary of this table as follows:
st = capture.output(str(my_table))
> st
[1] "'data.frame':\t3 obs. of  4 variables:"
[2] " $ id     : num  1 2 3"
[3] " $ name   : chr  \"sam\" \"smith\" \"sean\""
[4] " $ height : num  156 175 191"
[5] " $ address: chr  \"123 first street\" \"234 second street\" \"345 third street\""
Further manipulation:
types = substr(st[2:5], 13, 15)
> types
[1] "num" "chr" "num" "chr"
type_frame = data.frame(id = 1:4, types)
type_frame$sql = ifelse(type_frame$types == "num", "int", "varchar(255)")
type_frame$name = colnames(my_table)
> type_frame
  id types          sql    name
1  1   num          int      id
2  2   chr varchar(255)    name
3  3   num          int  height
4  4   chr varchar(255) address
Based on this information, I would like to generate the following string output:
CREATE TABLE my_table (
id int,
name varchar(255),
height int,
address varchar(255)
);
I thought of the following approach:
first_part = "CREATE TABLE my_table ( "
second_part = c(type_frame[1,4], type_frame[1,3])
second_part = paste(second_part , collapse = " ")
third_part = c(type_frame[2,4], type_frame[2,3])
third_part = paste(third_part , collapse = " ")
fourth_part = c(type_frame[3,4], type_frame[3,3])
fourth_part = paste(fourth_part , collapse = " ")
fifth_part = c(type_frame[4,4], type_frame[4,3])
fifth_part = paste(fifth_part , collapse = " ")
final = paste0(first_part, second_part, " , ", third_part, " , ", fourth_part, " , ", fifth_part, " );")
The end result looks something like this:
> final
[1] "CREATE TABLE my_table ( id int , name varchar(255) , height int , address varchar(255) );"
In the end, I would like to take the above string and run it in my SQL client.
The code I have written is very inefficient; can someone please show me a faster way to do this?
Thank you!
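A more compact approach, sketched in base R: compute the SQL type directly from each column with vapply() instead of parsing the output of str(), then paste all the column definitions together in one call. The type mapping (numeric to int, everything else to varchar(255)) is simply the one used in the question and is an assumption; sql_type() and create_table_sql() are hypothetical helper names.

```r
my_table <- data.frame(
  id = c(1, 2, 3),
  name = c("sam", "smith", "sean"),
  height = c(156, 175, 191),
  address = c("123 first street", "234 second street", "345 third street")
)

# Assumed mapping, mirroring the question: numeric columns -> int,
# everything else -> varchar(255). Extend this for dates, logicals, etc.
sql_type <- function(x) if (is.numeric(x)) "int" else "varchar(255)"

create_table_sql <- function(df, table_name) {
  # One SQL type per column, computed from the column values themselves
  types <- vapply(df, sql_type, character(1))
  # Column definitions such as "id int", "name varchar(255)", ...
  col_defs <- paste(names(df), types)
  paste0("CREATE TABLE ", table_name, " (", paste(col_defs, collapse = ", "), ");")
}

create_table_sql(my_table, "my_table")
# [1] "CREATE TABLE my_table (id int, name varchar(255), height int, address varchar(255));"
```

This works for any number of columns, so nothing has to be indexed by hand. If you already have a DBI connection, DBI's sqlCreateTable() can build the statement using the backend's own type mapping, which is probably the more robust option.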

Related

Replacing Specific Row with the data from another table that matches specific condition

I wrote a query in Access that produces the following result.
qryCalculateXY
EquipmentID, ZoneNumber, RowNumber, ColumnNumber, XCoordinate, YCoordinate, ComponentID
1, 0, 1, 1, 500, 600, 1
1, 0, 1, 20, 500, 1200, 1
.
.
.
.
This query quickly calculates 1000+ XY coordinates of PrimaryComponent from only a few rows (at most 30) in the table, and I am very happy with it so far.
However, there are a few locations (at most 20) where the ComponentID may need to change. For those, I've created the table below, which lists all the exceptions.
tblException
EquipmentID, ZoneNumber, RowNumber, ColumnNumber, ComponentID
1, 0, 1, 20, 2
I want another query that lists every row of qryCalculateXY unchanged when tblException has no row with the same EquipmentID, ZoneNumber, RowNumber and ColumnNumber, and that takes ComponentID from tblException when all four of these columns match.
The resulting query should look like this:
qryCalculateXYFinal
EquipmentID, ZoneNumber, RowNumber, ColumnNumber, XCoordinate, YCoordinate, ComponentID
1, 0, 1, 1, 500, 600, 1
1, 0, 1, 20, 500, 1200, 2
.
.
.
.
This would save me a lot of time, since I would not have to convert the query into a table and then change specific values by hand.
How can I achieve this?
Thanks,
Nimish
I tried a left join, but to no avail.
Use Nz in a simple query with a left join:
Select
qryCalculateXY.EquipmentID,
qryCalculateXY.ZoneNumber,
qryCalculateXY.RowNumber,
qryCalculateXY.ColumnNumber,
qryCalculateXY.XCoordinate,
qryCalculateXY.YCoordinate,
Val(Nz([tblException].[ComponentID],[qryCalculateXY].[ComponentID])) As ComponentID
From
qryCalculateXY
Left Join
tblException On
(qryCalculateXY.ColumnNumber = tblException.ColumnNumber) AND
(qryCalculateXY.RowNumber = tblException.RowNumber) AND
(qryCalculateXY.ZoneNumber = tblException.ZoneNumber) AND
(qryCalculateXY.EquipmentID = tblException.EquipmentID);
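To illustrate the same left-join-with-fallback logic outside Access, here is a base-R sketch using the two sample rows from the question: merge() with all.x = TRUE plays the role of the LEFT JOIN, and the ifelse() line plays the role of Nz(). The data frames just reuse the Access object names for readability; this is not something Access itself runs.

```r
# The two sample rows shown in the question
qryCalculateXY <- data.frame(
  EquipmentID = c(1, 1), ZoneNumber = c(0, 0), RowNumber = c(1, 1),
  ColumnNumber = c(1, 20), XCoordinate = c(500, 500),
  YCoordinate = c(600, 1200), ComponentID = c(1, 1)
)
tblException <- data.frame(
  EquipmentID = 1, ZoneNumber = 0, RowNumber = 1, ColumnNumber = 20,
  ComponentID = 2
)

keys <- c("EquipmentID", "ZoneNumber", "RowNumber", "ColumnNumber")

# LEFT JOIN on all four key columns; rows without an exception get NA
# in ComponentID.exc
joined <- merge(qryCalculateXY, tblException, by = keys, all.x = TRUE,
                suffixes = c("", ".exc"))

# Same idea as Nz(tblException.ComponentID, qryCalculateXY.ComponentID)
joined$ComponentID <- ifelse(is.na(joined$ComponentID.exc),
                             joined$ComponentID, joined$ComponentID.exc)
joined$ComponentID.exc <- NULL
joined <- joined[order(joined$ColumnNumber), ]
joined
```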
Assuming:
'tableXY
EquipmentID, ZoneNumber, RowNumber, ColumnNumber, XCoordinate, YCoordinate, ComponentID
1, 0, 1, 1, 500, 600, 1
1, 0, 1, 20, 500, 1200, 1
' saved query qryCalculateXY:
SELECT tableXY.*
FROM tableXY;
QryCalculateXYFinal:
'qryCalculateXYFinal
SELECT qryCalculateXY.EquipmentID, qryCalculateXY.ZoneNumber, qryCalculateXY.RowNumber, qryCalculateXY.ColumnNumber,
qryCalculateXY.XCoordinate, qryCalculateXY.YCoordinate,
DLookUp("ComponentID","tblException","EquipmentID = " & [EquipmentID] & " AND ZoneNumber = " & [ZoneNumber] & " AND RowNumber = " & [RowNumber] & " AND ColumnNumber = " & [ColumnNumber]) AS ExceptionComponentID,
IIf([ExceptionComponentID],[ExceptionComponentID],[qryCalculateXY].[ComponentID]) AS ComponentID,
GetReplacementComponentID([ComponentID],[EquipmentID],[ZoneNumber],[RowNumber],[ColumnNumber]) AS C2
FROM qryCalculateXY;
'ExceptionComponentID
ExceptionComponentID: DLookUp("ComponentID","tblException","EquipmentID = " & [EquipmentID] & " AND ZoneNumber = " & [ZoneNumber] & " AND RowNumber = " & [RowNumber] & " AND ColumnNumber = " & [ColumnNumber])
'The fixed ComponentID
ComponentID: IIf([ExceptionComponentID],[ExceptionComponentID],[qryCalculateXY].[ComponentID])
'C2
C2: GetReplacementComponentID([ComponentID],[EquipmentID],[ZoneNumber],[RowNumber],[ColumnNumber])
'GetReplacementComponentID in code module
Public Function GetReplacementComponentID(ComponentID As Long, EquipmentID As Long, Zonenumber As Long, RowNumber As Long, ColumnNumber As Long) As Long
    ' Return the ComponentID from tblException if this location has an exception;
    ' otherwise DLookup returns Null and Nz falls back to the original ComponentID.
    Dim returnvalue As Long
    returnvalue = Nz(DLookup("ComponentID", "tblException", "(EquipmentID = " & EquipmentID & ") AND (ZoneNumber = " & Zonenumber & ") AND (RowNumber = " & RowNumber & ") AND (ColumnNumber = " & ColumnNumber & ")"), ComponentID)
    GetReplacementComponentID = returnvalue
End Function
As shown above, you can look up the appropriate ComponentID directly in the query. ExceptionComponentID is only there to make the query easier to read: you can replace it with its expression everywhere it occurs or, better yet, move more of the calculation into a public function, as in C2. You can cut and paste or import the tables and SQL into your version of Access. You can also replace the VBA functions with a subquery (not shown).

Converting a Table into SQL Statements

I have this dataset over here in R:
my_table = data.frame(id = c(1,2,3), name = c("sam", "smith", "sean"), height = c(156, 175, 191), address = c("123 first street", "234 second street", "345 third street"))
  id  name height           address
1  1   sam    156  123 first street
2  2 smith    175 234 second street
3  3  sean    191  345 third street
Based on this table, I am trying to generate the following string, taking the entries from "my_table" and putting them into this format:
# pretend some table called "new_table" already exists - below is the desired output that I want:
INSERT INTO new_table ( id, name, height, address ) VALUES
( 1, sam, 156, 123 first street), ( 2, smith, 175, 234 second street), ( 3, sean, 191, 345 third street)
I thought of the following way to do this:
first_part = "INSERT INTO new_table ("
second_part = paste(colnames(my_table), collapse = ", ")
third_part = c(my_table[1,1], my_table[1,2], my_table[1,3], my_table[1,4])
third_part = paste(third_part , collapse = ", ")
fourth_part = c(my_table[2,1], my_table[2,2], my_table[2,3], my_table[2,4])
fourth_part = paste( fourth_part, collapse = ", ")
fifth_part = c(my_table[3,1], my_table[3,2], my_table[3,3], my_table[3,4])
fifth_part = paste(fifth_part , collapse = ", ")
final = paste0(first_part, second_part, ")", " VALUES ", "( ", third_part, " ),", " (", fourth_part, " ),", "(", fifth_part, ") ")
The resulting output somewhat matches the desired output:
> final
[1] "INSERT INTO new_table (id, name, height, address) VALUES ( 1, sam, 156, 123 first street ), (2, smith, 175, 234 second street ),(3, sean, 191, 345 third street) "
In the end, I would like to paste the resulting string into my SQL client.
This is a very inefficient way of solving the problem: it's long and time-consuming, and there are plenty of places where mistakes can creep in.
Can someone please show me a "faster" way to accomplish this?
Thank you!
Your final string is not legal SQL: the string values need to be quoted.
# Find the character/factor columns and wrap their values in single quotes
# (sQuote(x, FALSE) uses plain ASCII quotes rather than curly ones)
ischr <- sapply(dat, inherits, c("character", "factor"))
dat[ischr] <- lapply(dat[ischr], sQuote, FALSE)
# Paste each row's values together, wrap each row in "( ... )" and join the rows
paste(
"INSERT INTO new_table (",
paste(colnames(dat), collapse = " , "),
") VALUES",
paste(
paste0("( ", do.call(mapply, c(list(FUN = paste, sep = " , "), dat)), " )"),
collapse = ", "
)
)
# [1] "INSERT INTO new_table ( id , name , height , address ) VALUES ( 1 , 'sam' , 156 , '123 first street' ), ( 2 , 'smith' , 175 , '234 second street' ), ( 3 , 'sean' , 191 , '345 third street' )"
Data
dat <- structure(list(id = 1:3, name = c("sam", "smith", "sean"), height = c(156L, 175L, 191L), address = c("123 first street", "234 second street", "345 third street")), row.names = c("1", "2", "3"), class = "data.frame")
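One caveat with sQuote() is that it does not escape single quotes already inside a value (a name like O'Brien would break the statement). Here is a base-R sketch that doubles embedded single quotes, which is the standard SQL escape; sql_quote() and insert_sql() are hypothetical helper names, not functions from any package.

```r
my_table <- data.frame(
  id = c(1, 2, 3),
  name = c("sam", "smith", "sean"),
  height = c(156, 175, 191),
  address = c("123 first street", "234 second street", "345 third street")
)

# Hypothetical helper: make SQL string literals, doubling embedded single
# quotes (e.g. O'Brien -> 'O''Brien')
sql_quote <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

insert_sql <- function(df, table_name) {
  # Quote character and factor columns; numbers are written as-is
  is_chr <- vapply(df, function(col) is.character(col) || is.factor(col), logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) sql_quote(as.character(col)))
  # Paste the columns together element-wise: one "v1, v2, ..." string per row.
  # unname() keeps column names from clashing with paste()'s own arguments.
  rows <- do.call(paste, c(unname(as.list(df)), sep = ", "))
  paste0("INSERT INTO ", table_name, " (", paste(names(df), collapse = ", "),
         ") VALUES ", paste0("(", rows, ")", collapse = ", "), ";")
}

insert_sql(my_table, "new_table")
# [1] "INSERT INTO new_table (id, name, height, address) VALUES (1, 'sam', 156, '123 first street'), (2, 'smith', 175, '234 second street'), (3, 'sean', 191, '345 third street');"
```

Missing values are not translated to SQL NULL in this sketch. If you are inserting through a live DBI connection, a parameterized route such as DBI::dbAppendTable() avoids hand-quoting altogether.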

Dynamic raw query (select clause) Django

I'm trying to execute a raw query in Django where I dynamically want to pick column names.
For example:
from django.db import connections

def func(all=True):
    if all:
        query_parameters = {
            'col': '*'
        }
    else:
        query_parameters = {
            'col': 'a,b,c'
        }
    with connections["redshift"].cursor() as cursor:
        cursor.execute(
            "select %(col)s from table limit 2 ;",
            query_parameters,
        )
        val = dictfetchall(cursor)  # dictfetchall: helper defined in the Django docs, not a built-in
    return val
Django executes it as
select "*" from table limit 2;
so the output is just the literal value
*
and in the else case it is executed as
select "a,b,c" from table limit 2;
so the output is the literal string a,b,c.
How can I run the command so that Django runs it as
select a , b , c from table limit 2
so that the output is
a b c
1 2 3
4 5 6
I found a hack myself.
Explanation: prepare the query step by step.
Input data (the columns I need):
self.export_col = "a,b,c"

def calc_col(self):
    if self.exp == 'T':
        # Normalise the comma-separated column list, e.g. "a, b,c" -> "a,b,c"
        self.export_col = ",".join(col.strip() for col in self.export_col.split(","))
    else:
        self.export_col = '*'

def prepare_query(self):
    # The column list is spliced into the SQL text instead of being passed as a
    # parameter, so it must only ever contain trusted, whitelisted column names.
    query = "SELECT "
    query += self.export_col
    query += " from table limit 2;"
    return query

Filtering with sqldf in R when fields have quotation marks

I have a large SQL database (7 GB) in which the field values appear to contain quotation marks.
For example:
res <- dbSendQuery(con, "
SELECT *
FROM master")
dbf2 <- fetch(res, n = 3)
dbClearResult(res)
Yields
NPI EntityTypeCode ReplacementNPI EmployerIdentificationNumber.EIN.
1 "1679576722" "1" "" ""
2 "1588667638" "1" "" ""
3 "1497758544" "2" "" "<UNAVAIL>"
ProviderOrganizationName.LegalBusinessName. ProviderLastName.LegalName. ProviderFirstName
1 "" "WIEBE" "DAVID"
2 "" "PILCHER" "WILLIAM"
3 "CUMBERLAND COUNTY HOSPITAL SYSTEM, INC" "" ""
I've been trying to get a smaller table by filtering on, say, EntityTypeCode, but I'm not getting any results. Here are examples of queries that return nothing. Any advice? I think the issue is the double quotes in the fields.
# Filter on State
res <- dbSendQuery(npi2, "
SELECT *
FROM master
WHERE (ProviderBusinessPracticeLocationAddressStateName = 'PA')
")
# Filter on State and type
res <- dbSendQuery(npi2, "
SELECT *
FROM master
WHERE (ProviderBusinessPracticeLocationAddressStateName = 'PA') AND
(EntityTypeCode = '1')
")
Escape the inner double quotes (i.e., the ones stored in the cells) with a backslash inside the R string:
res <- dbSendQuery(npi2, "
SELECT *
FROM master
WHERE (ProviderBusinessPracticeLocationAddressStateName = '\"PA\"') AND
(EntityTypeCode = '\"1\"')
")
This produces the following string:
SELECT *
FROM master
WHERE (ProviderBusinessPracticeLocationAddressStateName = '"PA"') AND
(EntityTypeCode = '"1"')
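If you need to build such filters for several values, a small helper saves hand-escaping each one. This base-R sketch assumes, as the printed output in the question suggests, that every stored value is wrapped in literal double quotes; quoted_literal() is a hypothetical helper name.

```r
# Hypothetical helper: wrap the search value in the literal double quotes
# stored in the cells, then in single quotes for SQL, e.g. PA -> '"PA"'
quoted_literal <- function(x) sprintf("'\"%s\"'", x)

query <- sprintf(
  "SELECT * FROM master WHERE (ProviderBusinessPracticeLocationAddressStateName = %s) AND (EntityTypeCode = %s)",
  quoted_literal("PA"), quoted_literal("1")
)
cat(query, "\n")
```

The resulting query can then be passed to dbSendQuery(npi2, query) as in the question. Alternatively, most SQL engines (including SQLite, MySQL and PostgreSQL) have a replace() function, so a condition like replace(EntityTypeCode, '"', '') = '1' strips the quotes inside the query, although applying a function to the column will generally prevent an ordinary index on it from being used.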

How to vectorize an SQL update query in R

I have the following zoo object, res:
column1 column2 column3
2015-12-30 3.2735 2.3984 1.1250
2015-12-31 2.5778 1.8672 1.1371
2016-01-01 3.3573 2.4999 1.1260
2016-01-04 3.3573 2.4999 1.1463
and I would like to generate one UPDATE query per row, in a vectorized way:
UPDATE table SET column1=3.2735, column2=2.3984, column3=1.1250 WHERE dt = '2015-12-30';
UPDATE table SET column1=2.5778, column2=1.8672, column3=1.1371 WHERE dt = '2015-12-31';
etc.
I previously did something similar for an INSERT query:
sColumns <- paste0("dt, index, ", paste0(colnames(res), collapse=", "))
sValues = apply(data.frame(paste0("'", index(res), "'"), paste0("'", index, "'"), coredata(res)),
1 , paste, collapse = ",")
sql <- paste0("INSERT INTO table (", sColumns, ") VALUES (", sValues, ")")
This was considerably easier because all the column names are grouped together and all the values are grouped together. For an UPDATE query, I have to interleave column names and values.
So far, I have the following:
sColumns <- paste0(colnames(res), "=")
tmp <- paste(c(matrix(c(sColumns, res[1, ]), 2, byrow = T)), collapse = ", ")
tmp <- gsub("=, ", "=", tmp)
Which produces (for one row), output like:
[1] "column1=3.2735, column2=2.3984, column3=1.125"
Can anyone provide guidance as to how I can use something like apply() to do this for all rows of 'res'?
1) Try this:
library(zoo)
sapply(1:nrow(res), function(i) paste0(
"UPDATE table SET ",
toString(paste0(names(res), "=", coredata(res)[i, ])),
" WHERE dt='", time(res)[i], "'"))
giving the following character vector:
[1] "UPDATE table SET column1=3.2735, column2=2.3984, column3=1.125 WHERE dt='2015-12-30'"
[2] "UPDATE table SET column1=2.5778, column2=1.8672, column3=1.1371 WHERE dt='2015-12-31'"
[3] "UPDATE table SET column1=3.3573, column2=2.4999, column3=1.126 WHERE dt='2016-01-01'"
[4] "UPDATE table SET column1=3.3573, column2=2.4999, column3=1.1463 WHERE dt='2016-01-04'"
2) And a variation giving the same result:
sapply(unname(split(res, time(res))), function(z) paste0(
"UPDATE table SET ",
toString(paste0(names(z), "=", z)),
" WHERE dt='", time(z), "'"))
Note 1: If your table is not too large, you could alternatively read it into R, perform the update in R and then write it back.
Note 2: Here is the input shown reproducibly:
Lines <- "Date column1 column2 column3
2015-12-30 3.2735 2.3984 1.1250
2015-12-31 2.5778 1.8672 1.1371
2016-01-01 3.3573 2.4999 1.1260
2016-01-04 3.3573 2.4999 1.1463"
library(zoo)
res <- read.zoo(text = Lines, header = TRUE)
Is this what you want?
foo <- apply(res,1, function(x){
sprintf("%s = %f", names(x),x)
})
lapply(colnames(foo), function(nn) {
sprintf("UPDATE table SET %s WHERE dt = \'%s\'",
paste(foo[,nn], collapse=","),
nn)
})
which gives:
[[1]]
[1] "UPDATE table SET column1 = 3.273500,column2 = 2.398400,column3 = 1.125000 WHERE dt = '2015-12-30'"
[[2]]
[1] "UPDATE table SET column1 = 2.577800,column2 = 1.867200,column3 = 1.137100 WHERE dt = '2015-12-31'"
[[3]]
[1] "UPDATE table SET column1 = 3.357300,column2 = 2.499900,column3 = 1.126000 WHERE dt = '2016-01-01'"
[[4]]
[1] "UPDATE table SET column1 = 3.357300,column2 = 2.499900,column3 = 1.146300 WHERE dt = '2016-01-04'"
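For comparison, here is a base-R sketch without zoo or apply(): build "column=value" for all rows at once, one column at a time, and let paste() glue the columns together row-wise. It assumes the data sit in a plain data frame with a Date column dt (an assumption; with the zoo object you would use index(res) and coredata(res) instead). Values go through as.character(), so 1.1250 comes out as 1.125; use sprintf() or formatC() if fixed decimals matter.

```r
# Same values as the question, as a plain data frame (assumed layout)
res <- data.frame(
  dt = as.Date(c("2015-12-30", "2015-12-31", "2016-01-01", "2016-01-04")),
  column1 = c(3.2735, 2.5778, 3.3573, 3.3573),
  column2 = c(2.3984, 1.8672, 2.4999, 2.4999),
  column3 = c(1.1250, 1.1371, 1.1260, 1.1463)
)

value_cols <- setdiff(names(res), "dt")

# One character vector per column, e.g. "column1=3.2735", "column1=2.5778", ...
assignments <- lapply(value_cols, function(col) paste0(col, "=", res[[col]]))

# paste() is vectorized over rows, so this yields one SET clause per row
set_clause <- do.call(paste, c(assignments, sep = ", "))

sql <- paste0("UPDATE table SET ", set_clause, " WHERE dt = '", format(res$dt), "';")
sql[1]
# [1] "UPDATE table SET column1=3.2735, column2=2.3984, column3=1.125 WHERE dt = '2015-12-30';"
```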