DataStax DSE: dse spark-submit ignores Java properties passed

I have set up a simple project to experiment with Spark. I use DSE 4.6.2 (3).
When I submit the jar with the following command (in standalone mode, not YARN):
dse spark-submit --driver-java-options "-Dloggging.path=/home/$USER/log/" --class com.SimpleSparkApp my.jar
The property logging.path appears in spark.driver.extraJavaOptions, so I can do sys.props.get("spark.driver.extraJavaOptions") and parse the string, which is obviously not the best thing to do. On the other hand, the property does not exist when I search for it with sys.props.get("loggging.path")...
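A minimal sketch of that string-parsing workaround (assuming the value really does land under spark.driver.extraJavaOptions, and using the property name exactly as passed on the command line above):
// Rough workaround only: pull the -D option back out of extraJavaOptions.
// The naive whitespace split will not handle quoted values containing spaces.
val loggingPath: Option[String] =
  sys.props.get("spark.driver.extraJavaOptions")
    .flatMap(_.split("\\s+").find(_.startsWith("-Dloggging.path=")))
    .map(_.stripPrefix("-Dloggging.path="))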
Shouldn't the properties be applied to the driver's JVM?
Any ideas?

Related

Unable to get image details : Environment version Autosave_(date)T(time)Z_******** provided in request doesn't match environ

On an AzureML batch endpoint, I've recently been hitting the following error:
Unable to get image details : Environment version Autosave_(date)T(time)Z_******** provided in request doesn't match environ.
when I set up the batch endpoint with a YAML config:
environment: azureml:env-name:env-version
So AzureML creates and builds the environment with the version I specify, env-version, which is just a number (in my case, 3).
Then, for some weird reason, AzureML creates an extra environment version called Autosave_(date)T(time)Z_********, which is not built but is based on the one just created, and that becomes the latest version of the environment.
In summary, instead of looking for the version I specified, env-name:3, AzureML seems to be looking for env-name:Autosave_(date)T(time)Z_******** and then throws the error message mentioned above.
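For context, my batch deployment YAML references the environment roughly like this (all names below are placeholders, not my real config):
name: my-batch-deployment
endpoint_name: my-batch-endpoint
model: azureml:my-model:1
environment: azureml:env-name:3
compute: azureml:cpu-cluster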
I found that the problem was that, when creating the environment from a YAML specification file, one of my conda dependencies was cmake, which I needed to allow installation of another Python module. The Docker image was exactly the same as that of a previously created environment.
Removing the cmake dependency from the YAML file eliminated the issue, so the workaround is to install it using a Dockerfile.
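For illustration, the change amounts to something like this (the file names, base image and package list below are just examples, not my actual setup). The conda specification before the fix:
name: project-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - cmake            # removing this line made the error go away
  - pip:
    - some-module    # hypothetical package that needed cmake to build
and the workaround of installing cmake through a Dockerfile instead:
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
RUN apt-get update && apt-get install -y cmake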
The error message was very misleading to start with, but I got there in the end after understanding that AzureML reuses a cached image, based on the hash value computed from the environment definition, according to this.
For that reason, the automatically created Autosave Docker image references that same build, which only happens once, when the first job is submitted.

Increase memory allocated to application deployed to payara micro

I am running my application from a Payara Micro uber JAR and would like to increase the memory allocated to the application. How can I do this at the point of creating the uber JAR?
There are a couple of ways you can do this. The first way I'll mention is the preferred way:
1. Use asadmin commands
The latest edition of Payara Micro introduces an option called --postbootcommandfile which allows you to run asadmin commands against Payara Micro. Your file should include something like this:
delete-jvm-options -Xmx512m
create-jvm-options -Xmx1g
create-jvm-options -Xms1g
You will need to make sure you delete the existing options before applying new ones.
You can then use the file in a command similar to this:
java -jar payara-micro.jar --postbootcommandfile myCommands.txt --deploy myApp.war --outputuberjar myPayaraMicroApp.jar
Your settings should now persist in the resulting Uber JAR.
2. Supply a custom domain.xml
The alternative would be to modify a domain.xml of your own and override the built-in domain.xml with it.
You can use the --rootdir option to get Payara Micro to output its configuration to a directory so you can make changes there. This process is outlined in this blog:
http://blog.payara.fish/working-with-external-configuration-files-in-payara-micro
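In the extracted configuration, the heap settings live as jvm-options entries under the java-config element; a trimmed sketch of the relevant fragment (not a complete file) would look roughly like this:
<java-config>
  <jvm-options>-Xmx1g</jvm-options>
  <jvm-options>-Xms1g</jvm-options>
</java-config>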
If you already have a custom domain.xml to hand, you can use the --domainconfig property to supply it, as follows:
java -jar payara-micro.jar --domainconfig myCustomDomain.xml --deploy myApp.war --outputuberjar myPayaraMicroApp.jar
After following either of these methods, you can simply start the resulting JAR and all the settings and configuration will be applied:
java -jar myPayaraMicroApp.jar
The Payara Micro uber JAR is a plain JAR and doesn't start a new JVM the way Payara Server does. Therefore there's no way to modify JVM memory settings from within the JAR, as the JVM is already started. Although it's possible to add the JVM settings to the Payara Micro configuration, they are ignored and not applied; those configuration values are only used by Payara Server.
With Payara Micro uber JAR, you need to specify the JVM options on the command line, like this:
java -Xmx1g -Xms1g -jar myPayaraMicroApp.jar
If you need to specify JVM arguments in the uber JAR, you need to use a solution like capsule.io to wrap the JAR into a launcher JAR that would spawn a separate JVM for Payara Micro and pass the arguments to it.

ClassNotFoundException with Scalding on Zeppelin managed on YARN

I'm trying to get Scalding working on Zeppelin while using YARN. I followed the steps in the docs here to build the interpreter and set up the classpath override. When I run in local mode, code executes properly. However, when I run on my cluster via YARN, my jobs fail with:
Error: java.lang.ClassNotFoundException: cascading.CascadingException
or
Error: java.lang.ClassNotFoundException: cascading.tuple.TupleException
What is even stranger to me is that I can go into Zeppelin and execute:
import cascading.tuple.TupleException
import cascading.CascadingException
And both appear to have no problem finding those classes. It is only when I try to actually use Scalding (on YARN), like loading data into a typed pipe and dumping it, that I get the ClassNotFoundException. Any ideas on how to debug this or what to fix?
It looks like the cascading jars are not distributed to the YARN cluster. Please add "zeppelin/interpreter/scalding/*" to the args.string property of the scalding interpreter.
Here's the args.string we use:
-libjars /home/zeppelin-user/zeppelin/interpreter/scalding/*,/home/zeppelin-user/deploy-bundle-201608111417/libs/* -Dscalding.reducer.estimator.classes=com.twitter.scalding.reducer_estimation.InputSizeReducerEstimator -Delephantbird.use.combine.input.format=true -Delephantbird.combine.split.size=134217728 --hdfs --repl
tmpjars contains jars that are distributed to the YARN cluster. You can see its contents with the command below:
%scalding
mode.asInstanceOf[Hdfs].conf.get("tmpjars").split(",").foreach(println)

Merging Solr 3.4.0 indexes using lucene Merge Tool

I have three Solr 3.4.0 indexes that I want to merge. After searching, I've found that there are two ways to do it:
Using Lucene Merge tool.
Merging through core admin
I am using Lucene 3.4.0 and running the following command:
java -cp d:/lucene/lucene-core-3.4.0.jar:./contrib/misc/lucene-misc-3.4.0.jar org/apache/lucene/misc/IndexMergeTool ./newindex ./app1/solr/data/index ./app2/solr/data/index
but unfortunately it gives me the following exception:
Exception in thread "Main Thread" java.lang.NoClassDefFoundError:
org/apache/lucene/misc/IndexMergeTool
Can anybody help me with this?
A couple of things:
./contrib/misc/lucene-misc-3.4.0.jar
Are you running it from the correct directory for it to find that jar? Why not use the full path?
You are using : (colon) as the classpath separator; on Windows it should be ; (semicolon).
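With those two points fixed, the invocation would presumably look something like this on Windows (same jar locations and index paths as in your question):
java -cp d:/lucene/lucene-core-3.4.0.jar;./contrib/misc/lucene-misc-3.4.0.jar org.apache.lucene.misc.IndexMergeTool ./newindex ./app1/solr/data/index ./app2/solr/data/index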
Also -
If you already have Solr running with the indexes ready, I would recommend using the second option: merging through the core admin.
This is easier to use, via a direct HTTP interface, without any additional overhead, and it works out of the box.
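For reference, the core admin merge can be triggered with an HTTP call along these lines (the target core name and index paths below are placeholders):
http://localhost:8983/solr/admin/cores?action=mergeindexes&core=targetCore&indexDir=/app1/solr/data/index&indexDir=/app2/solr/data/index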
I solved this particular problem by creating a new Java application in NetBeans 7.1 and adding both jar files as libraries. Inside my new application's main method I called
IndexMergeTool.main
and passed all command-line arguments through to that method.
Regards
Ahsan

How to deactivate JLine for Jython interactive interpreter session?

Jython 2.5 comes with JLine by default.
I would prefer to use the interactive interpreter with rlwrap. It seems that rlwrap is not working if JLine is active.
In Scala I would use rlwrap scala -Xnojline.
Is there a similar option for Jython to deactivate JLine?
You can set the Jython property python.console to org.python.util.InteractiveConsole. This was the default in Jython 2.2 and is a simple history-less console. You can set this property via the command line like this:
jython -Dpython.console=org.python.util.InteractiveConsole
or change the property in your local registry. See
http://wiki.python.org/jython/UserGuide#the-jython-registry
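Combined with rlwrap, as in the question, that would presumably be:
rlwrap jython -Dpython.console=org.python.util.InteractiveConsole
The registry equivalent is a line such as python.console=org.python.util.InteractiveConsole in your registry file.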