JVM is not ready after 10 seconds - jvm

I configured sparkr normally from the tutorials, and everything was working. I was able to read the database with read.df, but suddenly nothing else works, and the following error appears:
Error in sparkR.init(master = "local") : JVM is not ready after 10 seconds
Why does it appear now suddenly? I've read other users with the same problem, but the solutions given did not work. Below is my code:
Sys.setenv(SPARK_HOME= "C:/Spark")
Sys.setenv(HADOOP_HOME = "C:/Hadoop")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
#initialeze SparkR environment
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.2.0" "sparkr-shell"')
Sys.setenv(SPARK_MEM="4g")
#Create a spark context and a SQL context
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

Try to do few things below:
Check if c:/Windows/System32/ is there in the PATH.
Check if spark-submit.cmd has proper execute permissions.
If both the above things are true and even if it is giving the same error, then delete spark directory and again create a fresh one by unzipping spark gzip file.

I'm a beginner of R, and I have solved the same problem "JVM is not ready after 10 seconds" by installing JDK(version 7+) before installing sparkr in my mac. And it works well now. Hope this can help you with your problem.

Related

Camelot ignoring backend request

I recently got into camelot and but for some reason need to use my own backend to remove text during the imaging process.
Here is what I tried based on the documentation:
class ConversionBackend(object):
def convert(pdf_path, png_path):
arg2 = '-sOutputFile=' + png_path
p = subprocess.Popen(['/usr/bin/gs', '-sDEVICE=png16m', '-dNOPAUSE', '-dBATCH', '-dQUIET', "-dFILTERIMAGE", "-dFILTERTEXT", "-r300",str(arg2), pdf_path], stdout = subprocess.PIPE)
pass`
Aside from the fact that my ghostscript use is not optimal, all of this runs does work when I run it step by step in the python console. However when calling camelot using
tables = camelot.read_pdf(path, line_scale=100, split_text=True, flag_size=True, layout_kwargs={'detect_vertical': False, 'char_margin': 2.0}, pages='all', backend=ConversionBackend())
Camelot still executes my command but with complete disregard for the backend=ConversionBackend()
Any ideas on how to fix this?
Sh4yce

AsterixDB ERROR: Code: 1 "HYR0010: Node asterix_nc2 does not exist" M1 Mac

I'm trying to set up a sample cluster with asterixDB on my M1 mac. I have my environment up and running and I am able to successfully make SQL queries with the following code:
drop dataverse csv if exists;
create dataverse csv;
use csv;
create type csv_type as {
lat: int32,
long: int32
};
create dataset csv_set (csv_type)
primary key lat;
However, when I try to load the dataset with a CSV file it seems to brick my sample cluster and throws the error: Error Code: 1 "HYR0010: Node asterix_nc2 does not exist". The code which causes this is below.
use csv;
load dataset csv_set using localfs
(("path"="127.0.0.1:///Users/nicholassantini/Downloads/test.csv"),
("format"="delimited-text"));
Thus far I have tried both java's newest release of version 18 and 17.0.3 as well as a variety of ports for the queries. I'm not sure what else to try. Some logs that I think are relevant say that it is failing to connect to the node. Not sure if that's an issue with the port or the node itself. Here is a snippet of those logs.
image.png
Also in case it matters, my CSV is a simple 2 column 2 row file with all single-digit integer values.
I appreciate any and all help.
After consulting the developer help email thread, I was able to find that the issue stems from the release of asterixDB that I was using (0.9.7.1). Upgrading to the newest release(0.9.8) fixed this issue.
The link can be found here:
https://ci-builds.apache.org/job/AsterixDB/job/asterixdb-snapshot-integration/lastSuccessfulBuild/artifact/asterixdb/asterix-server/target/asterix-server-0.9.8-SNAPSHOT-binary-assembly.zip

Pylint: same pylint and pandas version on 2 machines, 1 fails

I have 2 places running the same linting job:
Machine 1: Ubuntu over SSH
pandas==1.2.3
pylint==2.7.4
python 3.8.10
Machine 2: Gitlab CI Docker image, python:3.8.12-buster
pandas==1.2.3
pylint==2.7.4
Python 3.8.12
The Ubuntu machine is able to lint all the code fine, and it has for many months. Same for the CI job, except it had been running Python 3.7.8. Now that I upgraded the Docker image to Python 3.8.12, it throws several no-member linting errors on some Pandas objects. I've tried clearing CI caches etc.
I wish I could provide something more reproducible. But, to check my understanding of what a linter is doing, is it theoretically possible that a small version difference in python messes up pylint like this? For something like a no-member error on Pandas objects, I would think the dominant factor is the pandas version, but those are equal, so I'm confused!
Update:
I've looked at the Pandas code for pd.read_sql_query, which is what's causing the no-member error. It says:
def read_sql_query(
sql,
con,
index_col=None,
coerce_float=True,
params=None,
parse_dates=None,
chunksize: Optional[int] = None,
) -> Union[DataFrame, Iterator[DataFrame]]:
In Docker, I get E1101: Generator 'generator' has no 'query' member (no-member) (because I'm running .query on the returned dataframe). So it seems Pylint thinks that this function returns a generator. But it does not make this assumption in my other setup. (I've also verified the SHA sum of pandas/io/sql.py matches). This seems similar to this issue, but I am still baffled by the discrepancy in environments.
A fix that worked was to bump a limit like:
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"
in my .pylintrc file, as explained here.
I'm unsure why/if this is connected to my change in Python version, but I'm happy to use this and move on for now. It's probably complex.
(Another hack was to write a function that returns the passed arg if the passed arg is a dataframe, and returns 1 dataframe if the passed arg is an iterable of dataframes. So the ambiguous-type object could be passed through this wrapper to clarify things for Pylint. While this was more intrusive on our codebase, we had dozens of calls to pd.read_csv and pd.real_sql_query, and only about 3 calls caused confusion for Pylint, so we almost used this solution)

Sparklyr : sql temporary error : argument is not interpretable as logical

Hi I'm new to sparklyr and I'm essentially running a query to create a temporary object in spark.
The code is something like
ts_data<-tbl(sc,"db.table") %>% filter(condition) %>% compute("ts_data")
sc is my spark connection.
I have run the same code before and it works but now I get the following error.
Error in if (temporary) sql("TEMPORARY ") : argument is not
interpretable as logical
I have tried changing filters, tried it with new tables, R versions and snapshots. Yet it still gives the same exact error. I am positive there are no syntax errors
Can someone help me understand how to fix this?
I ran into the same problem. Changing compute("x") to compute(name = "x") fixed it for me.
This was a bug of sparklyr, and is fixed with version 1.7.0. So either use matching by argument (name = x) or update your sparklyr version.

Weblogic Exception after deploy: java.rmi.UnexpectedException

Just encountered a similar issue as described in the below article:
Question: Article with similar error description
java.rmi.UnmarshalException: cannot unmarshaling return; nested exception is:
java.rmi.UnexpectedException: Failed to parse descriptor file; nested exception is:
java.rmi.server.ExportException: Failed to export class
I found that the issue described is totally unrelated to any Java update and is rather an issue with the Weblogic bean-cache. It seems to use old compiled versions of classes when updating a deployment. I was hunting a similar issue in a related question (Question: Interface-Implementation-mismatch).
How can I fix this properly to allow proper automatic deployment (with WLST)?
After some feedback from the Oracle community it now works like this:
1) Shutdown the remote Managed Server
2) Delete directory "domains/#MyDomain#/servers/#MyManagedServer#/cache/EJBCompilerCache"
3) Redeploy EAR/application
In WLST (which one would need to automate this) this is quite tricky:
import shutil
servers=cmo.getServers()
domainPath = get('RootDirectory')
for thisServer in servers:
pathToManagedServer = domainPath + "\\servers\\" + thisServer.getName()
print ">Found managed server:" + pathToManagedServer
pathToCacheDir = pathToManagedServer + "\\" + "cache\\EJBCompilerCache"
if(os.path.exists(pathToCacheDir) and os.path.isdir(pathToCacheDir) ):
print ">Found a cache directory that will be deleted:" + pathToCacheDir
# shutil.rmtree(pathToCacheDir)
Note: Be careful when testing this, the path that is returned by "pathToCacheDir" depends on the MBean-context that is currently set. See samples for WLST command "cd()". You should first test the path output with "print domainPath" and later add the "rmtree" python command! (I uncommented the delete command in my sample, so that nobody accidentially deletes an entire domain!)