sparklyr: sql TEMPORARY error: argument is not interpretable as logical

Hi, I'm new to sparklyr, and I'm essentially running a query to create a temporary object in Spark.
The code is something like:
ts_data <- tbl(sc, "db.table") %>% filter(condition) %>% compute("ts_data")
where sc is my Spark connection.
I have run the same code before and it worked, but now I get the following error:
Error in if (temporary) sql("TEMPORARY ") : argument is not interpretable as logical
I have tried changing filters, and tried it with new tables, R versions, and snapshots, yet it still gives the exact same error. I am positive there are no syntax errors.
Can someone help me understand how to fix this?

I ran into the same problem. Changing compute("x") to compute(name = "x") fixed it for me.
This was a bug in sparklyr that was fixed in version 1.7.0. So either pass the argument by name (name = "x") or update your sparklyr version.
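For illustration, a minimal sketch of both workarounds, reusing the sc connection and table from the question:
library(sparklyr)
library(dplyr)
# Pass the table name as a named argument, so it cannot be matched
# positionally against compute()'s logical `temporary` argument
ts_data <- tbl(sc, "db.table") %>%
    filter(condition) %>%
    compute(name = "ts_data")
# Alternatively, upgrade to sparklyr >= 1.7.0, where the bug is fixed:
# install.packages("sparklyr")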

Related

Pandas diff-function: NotImplementedError

When I use the diff function in my snippet:
for customer_id, cus in tqdm(df.groupby(['customer_ID'])):
    # Get differences
    diff_df1 = cus[num_features].diff(1, axis=0).iloc[[-1]].values.astype(np.float32)
I get:
NotImplementedError
The exact same code ran without any error before (on Colab), whereas now I'm using an Azure DSVM via JupyterHub and I get this error.
I already found this
pandas pd.DataFrame.diff(axis=1) NotImplementationError
but the solution doesn't work for me, as I don't have any Date types. I also upgraded pandas, but it didn't change anything.
EDIT:
I have found that the error occurs when the datatype is 'int16' or 'int8'. Converting the dtypes to 'int64' solves it.
However, I'll leave the question open in case someone can explain it or show a solution that works with int8/int16.
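For reference, a minimal sketch of that workaround, assuming num_features holds the affected integer columns as in the snippet above:
import numpy as np
# Widen the narrow integer columns before diffing; diff() raised
# NotImplementedError here when the dtype was int8 or int16
widened = cus[num_features].astype('int64')
diff_df1 = widened.diff(1, axis=0).iloc[[-1]].values.astype(np.float32)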

Some clarification around SQL verbiage within R, specifically with dbListFields

I'm using R to connect to an Oracle Database. I'm able to do so with this code:
library(rJava)
library(RJDBC)
jdbcDriver <- JDBC(driverClass="oracle.jdbc.driver.OracleDriver", classPath="jar files/ojdbc8.jar")
jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=blablablablabla.com)(PORT=1234))(CONNECT_DATA=(SERVICE_NAME=pdblhs)))", "myusername", "mypassword")
DataFrame <- dbGetQuery(jdbcConnection, "select protocol_id, disease_site_code, disease_site
From abc_place_prod.Rv_sip_PCL_disease_site
order by protocol_id")
# Close connection
dbDisconnect(jdbcConnection)
This snippet works: it downloads the data I want into a neat data frame, as planned.
But I'm trying to learn to navigate R/SQL better, and I wanted to use dbListFields to explore all the possible column names in the table I'm pulling from. Per the documentation here, you could type something like this:
dbListFields(con, "mtcars")
and it would work. So I tried:
dbListFields(jdbcConnection, "abc_place_prod.Rv_sip_PCL_disease_site")
and it returned this error:
Error in dbSendQuery(conn, paste("SELECT * FROM ", dbQuoteIdentifier(conn, :
Unable to retrieve JDBC result set
JDBC ERROR: ORA-00933: SQL command not properly ended
It seems like someone else experienced something similar here, and I'm not sure it ever got answered. I've tried it with a semicolon at the end (that "ended" something else properly for me, and it's mentioned in the comments of that question), but no luck.
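One workaround, not from this thread but a common trick, is to run a query that returns no rows and read the column names off the empty result:
# WHERE 1 = 0 returns zero rows, so only the column metadata comes back
empty <- dbGetQuery(jdbcConnection,
    "SELECT * FROM abc_place_prod.Rv_sip_PCL_disease_site WHERE 1 = 0")
names(empty)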

Unable to query using file in Data Proc Hive Operator

I am unable to query with a .sql file in DataProcHiveOperator, even though the documentation says we can query using a file (documentation linked here).
It works fine when I give the query directly.
Here is my sample code, which works fine when the query is written directly:
HiveInsertingTable = DataProcHiveOperator(task_id='HiveInsertingTable',
    gcp_conn_id='google_cloud_default',
    query='CREATE TABLE TABLE_NAME(NAME STRING);',
    cluster_name='cluster-name',
    region='us-central1',
    dag=dag)
Querying with a file:
HiveInsertingTable = DataProcHiveOperator(task_id='HiveInsertingTable',
    gcp_conn_id='google_cloud_default',
    query='gs://us-central1-bucket/data/sample_hql.sql',
    query_uri="gs://us-central1-bucket/data/sample_hql.sql",
    cluster_name='cluster-name',
    region='us-central1',
    dag=dag)
There is no error in the sample_hql.sql script itself.
The operator reads the file location as a query and throws this error:
Query: 'gs://bucketpath/filename.q'
Error occurring - cannot recognize input near 'gs' ':' '/'
A similar issue has also been raised here.
The issue is that you have passed query='gs://us-central1-bucket/data/sample_hql.sql' as well.
You should pass exactly one of query or query_uri.
The code in your question has both of them, so remove query or use the following code:
HiveInsertingTable = DataProcHiveOperator(task_id='HiveInsertingTable',
    gcp_conn_id='google_cloud_default',
    query_uri="gs://us-central1-bucket/data/sample_hql.sql",
    cluster_name='cluster-name',
    region='us-central1',
    dag=dag)

JVM is not ready after 10 seconds

I configured SparkR normally following the tutorials, and everything was working. I was able to read the database with read.df, but suddenly nothing else works, and the following error appears:
Error in sparkR.init(master = "local") : JVM is not ready after 10 seconds
Why does it suddenly appear now? I've read of other users with the same problem, but the solutions given did not work. Below is my code:
Sys.setenv(SPARK_HOME= "C:/Spark")
Sys.setenv(HADOOP_HOME = "C:/Hadoop")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
# initialize the SparkR environment
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.2.0" "sparkr-shell"')
Sys.setenv(SPARK_MEM="4g")
# Create a Spark context and a SQL context
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)
Try a few things:
Check if C:/Windows/System32/ is on the PATH.
Check if spark-submit.cmd has proper execute permissions.
If both of the above are true and it still gives the same error, delete the Spark directory and create a fresh one by unzipping the Spark download again. (The first two checks can be run from R, as sketched below.)
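A sketch of those first two checks from R, assuming the SPARK_HOME setup shown in the question:
# Is System32 on the PATH?
grepl("System32", Sys.getenv("PATH"), ignore.case = TRUE)
# Does spark-submit.cmd exist under SPARK_HOME, and is it executable?
# (file.access returns 0 on success, -1 on failure)
cmd <- file.path(Sys.getenv("SPARK_HOME"), "bin", "spark-submit.cmd")
file.exists(cmd)
file.access(cmd, mode = 1)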
I'm a beginner with R, and I solved the same "JVM is not ready after 10 seconds" problem by installing the JDK (version 7+) before installing SparkR on my Mac. It works well now. Hope this helps you with your problem.

R-SQL Invalid value from generic function ‘fetch’, class “try-error”, expected “data.frame”

I am having a problem fetching some data from a database using ROracle. Everything works perfectly (I am getting the data from different tables without any problem), but one of the tables throws an error:
library(ROracle)
con <- dbConnect(dbDriver("Oracle"), "xxx/x", username = "user", password = "pwd")
spalten <- dbListFields(con, name = "xyz", schema = "x") # I still get the column names for this table
rs <- dbSendQuery(con, "Select * From x.xyz") # no error
data <- fetch(rs) # this line throws an error
dbDisconnect(con)
Error in .valueClassTest(ans, "data.frame", "fetch") : invalid value
from generic function ‘fetch’, class “try-error”, expected “data.frame”
I followed this question on Stack Overflow and selected the columns explicitly:
rs <- dbSendQuery(con, "Select a From x.xyz")
but that didn't work either and gave me the same error.
Any ideas what I am doing wrong?
P.S. I have checked the SQL query in Oracle SQL Developer, and I do get the data there.
Update:
If anyone can help me to locate/query my Oracle error log, then perhaps I can find out what is actually happening on the database server with my troublesome query.
This is for debugging purposes only. Try running your code in the following tryCatch construct; it will display all warnings and errors that occur.
result <- tryCatch({
    con <- dbConnect(dbDriver("Oracle"), "xxx/x", username = "user", password = "pwd")
    spalten <- dbListFields(con, name = "xyz", schema = "x")
    rs <- dbSendQuery(con, "Select * From x.xyz") # no error
    data <- fetch(rs) # this line throws an error
    dbDisconnect(con)
}, warning = function(war) {
    print(paste("warning: ", war))
}, error = function(err) {
    print(paste("error: ", err))
})
print(paste("result =", result))
I know I'm late to the game on this question, but I had a similar issue and discovered the problem: my query also ran fine in SQL Developer, but SQL Developer only fetches 50 rows at a time, whereas ROracle fetches the whole data set. So an issue that appears later in the data set won't show up immediately in SQL Developer. Once I paged through the results in SQL Developer, it threw an error at a certain point, because there was a problem with the actual value stored in the table. I'm not sure how an invalid value got in there, but fixing it fixed the problem in ROracle.
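Building on that observation, one way to narrow down where the bad value sits is to fetch in chunks rather than all at once. This is a sketch, not from the thread, reusing the connection and query from the question:
# Fetch 1000 rows at a time and report roughly where fetching fails
rs <- dbSendQuery(con, "Select * From x.xyz")
rows_ok <- 0
repeat {
    chunk <- tryCatch(fetch(rs, n = 1000), error = function(e) e)
    if (inherits(chunk, "error")) {
        print(paste("fetch failed somewhere after row", rows_ok))
        break
    }
    if (nrow(chunk) == 0) break # all rows fetched cleanly
    rows_ok <- rows_ok + nrow(chunk)
}
dbClearResult(rs)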