How to prevent errors in format (string variable) when downloading data from Google BigQuery to R?

I have some data (Tweets streamed from Twitter's REST API) stored in Google BigQuery which, in the preview, looks like this:
'I’m up by myself.'
However, when I download it into R, it looks like this:
'I’m up by myself.'
Is there any way to prevent it?
I am using this code to download the data in R:
library(bigrquery)
project_id <- "my_project"
sql_string <- "
SELECT text
FROM my_under_project.my_table
LIMIT 500;
"
test <- query_exec(sql_string, project = project_id, useLegacySql = FALSE, allowLargeResults=TRUE, max_pages = Inf)
str(test)
# 'data.frame': 500 obs. of 1 variable:
#  $ text: chr "tweets" ...
The 'text' column is stored as a STRING in BigQuery.
Any help is appreciated! Thanks in advance!

I downloaded the data with bq_table_download() (instead of query_exec()) from the same package, and that solved the problem!
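For reference, a minimal sketch of that approach, using the same placeholder names as above:
library(bigrquery)

project_id <- "my_project"
sql_string <- "
SELECT text
FROM my_under_project.my_table
LIMIT 500;
"

# Run the query, then fetch the result with bq_table_download(),
# which preserved the special characters correctly in this case.
tbl <- bq_project_query(project_id, sql_string)
test <- bq_table_download(tbl)
str(test)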

Save the output as a CSV file from VS Code programmatically

My requirement: save the SQL query output (the received data) to a local drive in CSV file format.
My OS: Windows 10 64 bit
VS Code 1.67.1:
I have installed the following extensions to connect to the Snowflake data warehouse:
SQL Tools
Snowflake driver for SQL Tools
I have successfully connected to my Snowflake (cloud) data warehouse and received the data in VS Code.
What I want is to save (export) the output (the received data) to a local file (for example, D:\result\result.csv).
How can I achieve that?
Image attached for your reference.
Thank you all.
pmk
If you can run a bit of Python, this does pretty much what you need:
import csv

import snowflake.connector

# Fill in your Snowflake credentials and context.
conn = snowflake.connector.connect(
    user='',
    password='',
    account='',
    warehouse='',
    database='',
    schema='',
    role=''
)

# Run the query and fetch all rows.
results = conn.cursor().execute("""MY QUERY""").fetchall()

# Append the rows to a CSV file.
csvfile = open(r"PATH TO FILE", 'a', newline='')
output = csv.writer(csvfile)
output.writerows(results)
csvfile.close()
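To use this, save the script (for example as export.py, a hypothetical name), install the connector with pip install snowflake-connector-python, fill in the credentials, query, and output path, and run python export.py from a terminal.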

knitr sql chunk not saving data into variable

My RMarkdown notebook with a SQL chunk runs fine when I run all the chunks one by one interactively, but when I try to knit, the SQL chunk does not save the data into the specified variable. When the dataset that was supposed to be generated by the SQL chunk is referenced in later R chunks, the dataset variable is simply empty.
Here's an example:
```{r setup, include=FALSE, warning=FALSE, message=FALSE}
# load necessary libraries
library(bigrquery)
library(knitr)
library(tidyverse)
db <- dbConnect(dbi_driver(), dataset = 'sandbox', project = 'project_id', use_legacy_sql = FALSE)
df <- NULL
```
```{sql, connection=db, output.var=df}
select * from example_dataset
limit 10
```
This returns the dataset when run interactively.
```{r}
head(df)
```
NULL
I've tried the solution here (R: Knitr gives error for SQL-chunk), but it didn't solve my problem.
Just ran into the same problem; it looks like you need to quote the name of the variable you are assigning, because output.var expects a string naming the variable to create:
```{sql, connection=db, output.var="df"}
select * from example_dataset
limit 10
```
Source: http://rmarkdown.rstudio.com/authoring_knitr_engines.html#sql

Passing query as variable to Rmarkdown sql chunk

I'm trying to use the SQL chunk function available in the preview version of RStudio 1.0 to connect to a SQL Server (using the RSQLServer backend for DBI) and I'm having some difficulty passing variables.
If I connect to the server and then put the query in the chunk, it works as expected:
```{r, eval = F}
svr <- dbConnect(RSQLServer::SQLServer(), "Server_name", database = 'Database_name')
query <- 'SELECT TOP 10 * FROM Database_name.dbo.table_name'
```
```{sql, connection = svr, eval = F}
SELECT TOP 10 * FROM Database_name.dbo.table_name
```
But if I try to pass the query as a variable, it throws an error:
```{sql, connection = svr, eval = F}
?query
```
Error: Unable to retrieve JDBC result set for 'SELECT TOP 10 * FROM Database_name.dbo.table_name': Incorrect syntax near 'SELECT TOP 10 * FROM Database_name.dbo.table_name'.
Failed to execute SQL chunk
I think it's related to the way R wraps character vectors in quotes, because I get the same error if I run the following code.
```{sql, connection = svr, eval = F}
'SELECT TOP 10 * FROM Database_name.dbo.table_name'
```
Is there a way I can get around this error?
Currently I can achieve what I want by using inline expressions to print the query, using pygments for highlighting, and running the query in an R chunk with DBI commands, but using SQL chunks would be a bit nicer.
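For reference, a minimal sketch of that R-chunk workaround (assuming the svr connection and query variable defined above):
```{r, eval = F}
# Show the query to the reader, then run it directly through DBI.
cat(query)
result <- DBI::dbGetQuery(svr, query)
head(result)
```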
Looks like "Using R variables in queries" (from the knitr SQL engine documentation) applies some kind of escaping and can therefore only be used in cases like the example from the documentation (SELECT * FROM trials WHERE subjects >= ?subjects), but not to dynamically set up the whole query.
Instead, the code chunk option can be used to achieve the desired behavior:
The example uses the SQLite sample database from sqlitetutorial.net. Unzip it to your working directory before running the code.
```{r}
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = "chinook.db")
query <- "SELECT * FROM tracks"
```
```{sql, connection=db, code = query}
```
I haven't been able to determine a way to print and execute in the same chunk; however, with a few extra lines of code it is possible to achieve my desired output.
Printing is solved by CL.'s answer, and then I can use EXEC to run the query:
```{sql, code = query}
```
```{sql, connection = svr}
EXEC (?query)
```
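Note that EXEC is SQL Server (T-SQL) syntax, so this last step is specific to that backend; the code = query approach above is backend-independent.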

Trouble running SQL queries via RODBC

I have a file called q_cleanup.sql that I am reading into R via readLines(). This file has lots of little queries we wrote to clean up some really ugly data. Once I read them into R and process the text, I run each query in the file.
All of the queries work when run directly through Oracle's SQL Developer and Tora.
Some of the queries fail when run via RODBC.
For example, the file contains the following two queries (cut and pasted out of the file):
update T_HH_TMP
set program_type = 'not able to contact'
where
program_type like '%n0t%'
or program_type like '%not able to%'
;
update T_HH_TMP
set program_type = 'hh substance use'
where program_type like '%hh substance abuse%'
;
The first query runs. The second query errors. Below is the relevant section out of my cleanup.R file. The command odbcStart() is a function I built to simplify opening and closing RODBC connections. It is not the problem.
odbcStart()
qry <- readLines("sql/q_cleanup.sql")
qry <- paste(qry[-grep("--", qry)], collapse = " ")  # drop comment lines
qry <- unlist(strsplit(qry, ";"))
for (i in seq_along(qry)) {
  print("------------------------------------------------------------")
  print(qry[i])
  print(sqlQuery(con, qry[i]))
}
odbcClose(con)
I am stripping out anything and everything that I can think of that might cause a problem; my string is wrapped in double quotes and my query contains ONLY single quotes. Yet, the output looks like this:
[1] "------------------------------------------------------------"
[1] " update T_HH_TMP set program_type = 'not able to contact' where program_type like '%n0t%' or program_type like '%not able to%' "
character(0)
[1] "------------------------------------------------------------"
[1] " update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' "
[1] "[RODBC] ERROR: Could not SQLExecDirect ' update T_HH_TMP set program_type = 'hh substance use' where program_type like '\\%hh substance abuse\\%' '"
I do not feel that the % is the problem because the first query runs just fine.
Any help? I really would prefer to script the running of all these queries in R.
I thought I would share what I know. I have a solution, even though I consider it sub-optimal because it complicates my workflow unnecessarily.
I do not know if the problem is caused by Oracle server, SQL Plus or if it has something to do with R / Emacs on Windows. I am not an Oracle expert and the office I work for is moving to Vertica by the end of the summer, so I am not going to invest much more effort in fixing this.
I am using sqlplus.exe to run SQL syntax that creates either a view or a stored procedure, and I then run the view / SP via R. Thus, the command I have to pass to Oracle via R is SIMPLE, and it can handle it.
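A sketch of that second step, with a hypothetical view name and an already-open RODBC connection con:
library(RODBC)
# The cleanup logic lives on the server in the view / stored procedure,
# so the command passed from R stays trivial.
result <- sqlQuery(con, "SELECT * FROM my_cleaned_view")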
To script sqlplus from R, I am using the following function that I will someday improve. It has no error handling and it basically assumes you are being nice, but it does work.
#' queryFile() runs a longish series of queries in a .sql file.
#' It is very important to understand that the path to sqlplus is hardcoded
#' because Windows has a shitty path system. It may not run on another system
#' without being edited.
#'
#' @param file - The relative path to the .sql file.
#' @return output - Vector containing the results from sqlplus
#'
queryFile <- function(file){
  cmd <- "c:/Oracle/app/product/11.2.0/client_1/sqlplus.exe %user/%password@%db @%file"
  cmd <- gsub("%user", getOption("DataMart")$uid, cmd)
  cmd <- gsub("%password", getOption("DataMart")$pwd, cmd)
  cmd <- gsub("%db", getOption("DataMart")$db, cmd)
  cmd <- gsub("%file", file, cmd)
  print(cmd)
  output <- system(cmd, intern = TRUE)
  return(output)
}
The point of this function is that you pass it the file with the SQL syntax. It uses SQL Plus to run the syntax. To store / access the user name, password, etc., I use a file called ~/passwords.R. It has a series of options() commands that look like this:
## Fake example.
options( DataMart = list(
  uid = "user_name"
  ,pwd = "user_password"
  ,db = "TNS Database"
  ,con_type = "ODBC"
  ,srvr_type = "Oracle"
))
The last two (con_type and srvr_type) are just things that I like to have documented. They are not really needed. I have ~10 of these in my file and I use this to remind me which db server I am writing against. I have to write against SQL Server, Vertica, MySQL and Oracle (different projects / employers) and this helps me.
The function I provided uses options() to access the necessary information and then runs sqlplus.exe. I could have added SQL Plus to my Windows path, but I was trying to make this function semi-independent, and it seems like our IT people are consistent about where SQL Plus lives (of course there are different versions running around, but at least I don't have to explain the idea of a path to someone who is not really a programmer).
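For completeness, a hypothetical call (assuming ~/passwords.R has been sourced so that the DataMart option is set):
source("~/passwords.R")  # sets the DataMart option
output <- queryFile("sql/q_cleanup.sql")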

concatenate text files and import them into a SQLite DB

Let us say I have thousands of comma-separated text files with 1050 columns each (no header). Is there a way to concatenate and import all the text files into one table in one SQLite database? (Ideally I'd use R and sqldf to communicate with SQLite.)
I.e.,
The files are called table1.txt, table2.txt, table3.txt; all have a different number of rows, but the same column types, and different unique IDs in the ID column (the first column of each file).
table1.txt
id1,20.3,1.2,3.4
id10,2.1,5.2,9.3
id21,20.5,1.2,8.4
table2.txt
id2,20.3,1.2,3.4
id92,2.1,5.2,9.3
table3.txt
id3,1.3,2.2,5.4
id30,9.1,4.4,9.3
The real example is pretty much the same but with more columns and more rows. As you can see, the first column in each file corresponds to a unique ID.
Now I'd like my new table, mysupertable, in the database super.db to be (also uniquely indexed):
super.db - name of the DB
mysupertable - name of the table in the DB
myids,v1,v2,v3
id1,20.3,1.2,3.4
id10,2.1,5.2,9.3
id21,20.5,1.2,8.4
id2,20.3,1.2,3.4
id92,2.1,5.2,9.3
id3,1.3,2.2,5.4
id30,9.1,4.4,9.3
For reference, I am using SQLite3, and I am looking for a SQL command that I can run in the background without logging interactively into the sqlite3 interpreter, i.e., IMPORT bla INTO,...
In Unix I could try:
cat *.txt > allmyfiles.txt
and then use a .sql file:
CREATE TABLE test (myids varchar(255), v1 float, v2 float, v3 float);
.separator ,
.import allmyfiles.txt test
But this does not work for me, since I am using the R sqldf library with dbGetQuery(db, sql), and I have no idea how to build such a command string in R without getting an error.
P.S. I asked a similar question about appending tables from a DB, but this time I need to append/import text files, not tables from a DB.
If you are using SQLite database files anyway, you might want to consider working with RSQLite.
install.packages( "RSQLite" ) # will install package "DBI"
library( RSQLite )
db <- dbConnect( dbDriver("SQLite"), dbname = "super.db" )
You can still use the Unix command within R, which should be faster than any loop in R, via the system() command:
system( "cat *.txt > allmyfiles.txt" )
Provided that your allmyfiles.txt has a consistent format, you can import it as a data.frame into R
allMyFiles <- read.table( "allmyfiles.txt", header = FALSE, sep = "," )
and write it to your database, following @Martín Bel's advice, with something like
dbWriteTable( db, "mysupertable", allMyFiles, overwrite = TRUE, append = FALSE )
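A quick sanity check after the import, assuming the table was created as above:
dbGetQuery(db, "SELECT COUNT(*) FROM mysupertable")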
EDIT:
Or, if you don't want to route your data through R, you can again resort to using the system() command. This may get you started:
You have a file with the data you want to get into SQLite called allmyfiles.txt. Create a file called table.sql with this content (obviously the structure must match):
CREATE TABLE mysupertable (myids varchar(255), v1 float, v2 float, v3 float);
.separator ,
.import allmyfiles.txt mysupertable
and call it from R with
system( "sqlite3 super.db < table.sql" )
That should avoid routing the data through R but still do all the work from within R.
Take a look at termsql:
https://gitorious.org/termsql/pages/Home
cat *.txt | termsql -d ',' -t mysupertable -c 'myids,v1,v2,v3' -o mynew.db
This should do the job.