Spark application development on a local cluster with IntelliJ - intellij-idea

I have tried many things to run the application against a local cluster, but it does not work.
I am using CDH 5.7, and the Spark version is 1.6.
I am trying to create a DataFrame from Hive on CDH 5.7.
If I use spark-shell, all of the code works fine. However, I have no idea how to set up my IntelliJ configuration for an efficient development environment.
Here is my code:
import org.apache.spark.{SparkConf, SparkContext}

object DataFrame {
  def main(args: Array[String]): Unit = {
    println("Hello DataFrame")
    val conf = new SparkConf() // skip loading external settings
      .setMaster("local") // could be "local[4]" for 4 threads
      .setAppName("DataFrame-Example")
      .set("spark.logConf", "true")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    println(s"Running Spark Version ${sc.version}")
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    sqlContext.sql("From src select key, value").collect().foreach(println)
  }
}
When I run this program in IntelliJ, the error messages are as follows:
Hello DataFrame
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/29 11:30:57 INFO Slf4jLogger: Slf4jLogger started
Running Spark Version 1.6.0
16/05/29 11:31:02 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
16/05/29 11:31:02 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:329)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:239)
at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:459)
at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:459)
at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:458)
at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:475)
at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:475)
at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:474)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at org.corus.spark.example.DataFrame$.main(DataFrame.scala:25)
at org.corus.spark.example.DataFrame.main(DataFrame.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:539)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
... 24 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:624)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:573)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:517)
... 25 more
Process finished with exit code 1
Does anyone know a solution?
Thanks.
I found several resources about this problem, but none of them worked:
https://www.linkedin.com/pulse/develop-apache-spark-apps-intellij-idea-windows-os-samuel-yee
https://blog.cloudera.com/blog/2014/06/how-to-create-an-intellij-idea-project-for-apache-hadoop/

Thanks all. I solved the problem by myself.
The problem was that the local Spark (the Maven dependency) did not know about the Hive configuration on our cluster.
The solution is very simple.
Just add the following lines to your source code:
conf.set("spark.sql.hive.thriftServer.singleSession", "true")
System.setProperty("hive.metastore.uris","thrift://hostname:serviceport")
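For context, here is a minimal sketch of how these two lines might slot into the program above. The metastore host and port are placeholders for your cluster; the important part is setting the property before the HiveContext is created.

import org.apache.spark.{SparkConf, SparkContext}

object DataFrame {
  def main(args: Array[String]): Unit = {
    // Placeholder URI: replace hostname:serviceport with your Hive metastore's thrift address.
    System.setProperty("hive.metastore.uris", "thrift://hostname:serviceport")

    val conf = new SparkConf()
      .setMaster("local") // or "local[4]" for 4 threads
      .setAppName("DataFrame-Example")
      .set("spark.sql.hive.thriftServer.singleSession", "true")

    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    sqlContext.sql("From src select key, value").collect().foreach(println)
  }
}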
It works!
Let's play with Spark.

Related

Flink 1.10 not connecting to minio (s3)

I'm running Flink and MinIO locally with Docker Compose.
When I try to connect to MinIO, I always get the following error:
caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
It seems that the plugin isn't loaded correctly.
My Flink config (flink-conf.yaml):
state.backend: filesystem
s3.endpoint: http://minio:9000
s3.path.style.access: true
s3.access-key: minio
s3.secret-key: minio123
presto.s3.access-key: minio
presto.s3.secret-key: minio123
presto.s3.endpoint: http://minio:9000
presto.s3.path-style-access: true
I've copied the required plugin as follows:
mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto
Any suggestions?
Stack trace:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7ae6657256719d8c32d76ba113fb35f0)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:326)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 21 more
Caused by: java.io.IOException: Error opening the Input Split s3://test/test.txt [0,3243]: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:995)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:446)
... 2 more

SparkSQL error: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

I have installed Spark on a Windows machine (standalone) and am trying to connect to the HDP 2.6 Hive metastore, which runs in a VM, from a Spark application.
I used NAT as the network adapter for the HDP 2.6 VM.
While trying to connect to the Hive metastore (HDP 2.6 VM) from the Spark application (local mode on the Windows machine), I get the error message below.
17/08/12 17:00:16 INFO metastore: Waiting 1 seconds before next connection attempt.
17/08/12 17:00:17 INFO metastore: Trying to connect to metastore with URI thrift://172.0.0.1:9083
17/08/12 17:00:38 WARN metastore: Failed to connect to the MetaStore Server...
17/08/12 17:00:38 INFO metastore: Waiting 1 seconds before next connection attempt.
17/08/12 17:00:39 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:193)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1050)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:129)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:126)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
at com.psl.spark.RemoteHiveConnSpark1_6$.main(RemoteHiveConnSpark1_6.scala:29)
at com.psl.spark.RemoteHiveConnSpark1_6.main(RemoteHiveConnSpark1_6.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
Spark application:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * @author prasanta_sahoo
 */
object RemoteHiveConnSpark1_6 {
  def main(arg: Array[String]) {
    // Create the conf object
    val conf = new SparkConf()
      .setAppName("RemoteHiveConnSpark1_6")
      .setMaster("local") // local mode
      .set("spark.storage.memoryFraction", "1")
    System.setProperty("hive.metastore.uris", "thrift://172.0.0.1:9083")

    // Create the Spark context object
    val sc = new SparkContext(conf)
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    //hiveContext.setConf("hive.metastore.uris", "thrift://172.0.0.1:9083")
    // Disable case sensitivity of SQL
    //hiveContext.sql("set spark.sql.caseSensitive=false")
    hiveContext.sql("FROM default.sample_07 SELECT code, description, total_emp, salary").collect().foreach(println)
  }
}
Can anyone please help me solve this issue?
To connect to the Hive metastore, the following configurations are required:
spark.yarn.dist.files ///apps/spark/hive-site.xml,///apps/spark/datanucleus-rdbms-4.1.7.jar,///apps/spark/datanucleus-core-4.1.6.jar,///apps/spark/datanucleus-api-jdo-4.2.1.jar
spark.sql.hive.metastore.version
Please confirm those configurations are in place.
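If it helps, here is a rough sketch (not from the original answer) of supplying those settings programmatically instead of via spark-defaults.conf. The metastore version and the file paths shown are placeholders to adapt to your environment.

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder values: substitute your Hive metastore version and the real paths
// to hive-site.xml and the datanucleus jars listed above.
val conf = new SparkConf()
  .setAppName("RemoteHiveConnSpark1_6")
  .setMaster("local[*]")
  .set("spark.sql.hive.metastore.version", "1.2.1")
  .set("spark.yarn.dist.files",
    "/apps/spark/hive-site.xml,/apps/spark/datanucleus-rdbms-4.1.7.jar," +
    "/apps/spark/datanucleus-core-4.1.6.jar,/apps/spark/datanucleus-api-jdo-4.2.1.jar")
val sc = new SparkContext(conf)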

Create Dataframe issue in Pyspark from Windows 10

I am unable to execute the command below from PySpark on Windows:
schemaPeople = spark.createDataFrame(people)
I have set HADOOP_HOME to the winutils directory.
I have given 777 permissions to C:/tmp/hive.
Still I am getting the error below:
Py4JJavaError: An error occurred while calling o23.applySchemaToPythonRDD.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
I have gone through a lot of similar questions before posting this; I'd appreciate any help here.
I got this error a lot when trying to set up Spark on Windows using the winutils file. I had to set up Spark differently to get around it.
I ended up downloading the Hadoop binaries for my version of Spark and going from there. I documented the whole thing with a walkthrough if you are interested: Spark on Windows.
The gist is that the official Hadoop release from Apache does not include a Windows binary, and compiling from source can be tedious, so really helpful people have made compiled distributions available. If you want to use Spark 2.0.2, download the binaries from Steve Loughran's GitHub; for 2.1.0 you can download from here. From there you should be able to set it up as expected.
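As a hedged sketch of that last step (the path is a placeholder, and the same idea applies whether you launch Spark from Scala or from PySpark), point Spark at the unpacked Windows Hadoop distribution before creating the session:

// Placeholder path: wherever you extracted the compiled Hadoop distribution
// (it should contain bin\winutils.exe).
System.setProperty("hadoop.home.dir", "C:\\hadoop")

val spark = org.apache.spark.sql.SparkSession.builder()
  .master("local[*]")
  .appName("windows-hive-test")
  .enableHiveSupport() // requires the spark-hive dependency on the classpath
  .getOrCreate()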

Install Spark on existing Hadoop cluster (ISSUE with HIVE)

I am trying to get a Spark/Shark cluster up but keep running into the same problem. I have followed the instructions on https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and addressed Hive as stated.
Here are the details, any help would be great.
I have already installed the following packages:
Spark/Shark 1.0.0
Apache Hadoop 2.4.0
Apache Hive 0.13
Scala 2.9.3
Java 7
I configured ~/spark/conf/spark-env.sh as:
export HADOOP_HOME=/path/to/hadoop/
export HIVE_HOME=/path/to/hive/
export MASTER=spark://xxx.xxx.xxx.xxx:7077
export SPARK_HOME=/path/to/spark
export SPARK_MEM=4g
export HIVE_CONF_DIR=/path/to/hive/conf/
source $SPARK_HOME/conf/spark-env.sh
When I start Spark with "./spark-withinfo", I get the following errors:
-hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
14/07/07 16:26:57 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
14/07/07 16:26:57 [main]: WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO SessionState:
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:344)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)
... 2 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 7 more
Caused by: java.lang.NoSuchFieldError: METASTOREINTERVAL
at org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:78)
at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:60)
at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:71)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:413)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:401)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:439)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:325)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:285)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4102)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
... 12 more
I guess Spark cannot find some libraries needed to connect to the Hive metastore, but I have been stuck here for a couple of days and don't know how to solve it. BTW, I use MySQL for the Hive metadata, and everything works well in Hive.
Any help is appreciated. Thanks in advance.
You may need to add the MySQL connector jar file before you start Spark.
In my case, I added the MySQL connector jar like below:
$SPARK_HOME/bin/compute-classpath.sh
CLASSPATH=$CLASSPATH:/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar
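As an alternative sketch (not part of the original answer), the same jar can usually be put on the driver and executor classpaths through Spark configuration instead of editing compute-classpath.sh; the jar path below is the one from the snippet above:

import org.apache.spark.SparkConf

val mysqlJar = "/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar"
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", mysqlJar)
  .set("spark.executor.extraClassPath", mysqlJar)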

Glassfish startup failure

I have set up a GlassFish server to learn about it. After setting it up and configuring it according to the quickstart guide, I was able to run the server and domain1 without any problems. After some time, it started to log the lines below:
[#|2013-01-11T15:43:45.246+0800|WARNING|glassfish3.1.2|java.util.prefs|_ThreadID=105;_ThreadName=Thread-2;|Could not lock User prefs. Unix error code 5.|#]
[#|2013-01-11T15:43:45.246+0800|WARNING|glassfish3.1.2|java.util.prefs|_ThreadID=105;_ThreadName=Thread-2;|Couldn't flush user prefs: java.util.prefs.BackingStoreException: Couldn't get file lock.|#]
I did a little googling about this, found this link, and applied the option that was recommended there. After restarting GlassFish, although the server log says it started, I am seeing this on the command line:
./asadmin start-domain domain1
Waiting for domain1 to start .............Error starting domain domain1.
The server exited prematurely with exit code 1.
Before it died, it produced the following output:
Launching GlassFish on Felix platform
ERROR: Error creating bundle cache. (java.lang.Exception: Unable to lock bundle cache: java.io.IOException: Input/output error)
java.lang.Exception: Unable to lock bundle cache: java.io.IOException: Input/output error
at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
at org.apache.felix.framework.Felix.init(Felix.java:629)
at com.sun.enterprise.glassfish.bootstrap.osgi.OSGiFrameworkLauncher$1.run(OSGiFrameworkLauncher.java:88)
Exception in thread "Thread-1" java.lang.RuntimeException: org.osgi.framework.BundleException: Error creating bundle cache.
at com.sun.enterprise.glassfish.bootstrap.osgi.OSGiFrameworkLauncher$1.run(OSGiFrameworkLauncher.java:90)
Caused by: org.osgi.framework.BundleException: Error creating bundle cache.
at org.apache.felix.framework.Felix.init(Felix.java:634)
at com.sun.enterprise.glassfish.bootstrap.osgi.OSGiFrameworkLauncher$1.run(OSGiFrameworkLauncher.java:88)
Caused by: java.lang.Exception: Unable to lock bundle cache: java.io.IOException: Input/output error
at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
at org.apache.felix.framework.Felix.init(Felix.java:629)
... 1 more
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.sun.enterprise.glassfish.bootstrap.GlassFishMain.main(GlassFishMain.java:97)
at com.sun.enterprise.glassfish.bootstrap.ASMain.main(ASMain.java:55)
Caused by: org.glassfish.embeddable.GlassFishException: java.lang.NullPointerException
at com.sun.enterprise.glassfish.bootstrap.osgi.OSGiGlassFishRuntimeBuilder.build(OSGiGlassFishRuntimeBuilder.java:164)
at org.glassfish.embeddable.GlassFishRuntime._bootstrap(GlassFishRuntime.java:157)
at org.glassfish.embeddable.GlassFishRuntime.bootstrap(GlassFishRuntime.java:110)
at com.sun.enterprise.glassfish.bootstrap.GlassFishMain$Launcher.launch(GlassFishMain.java:112)
... 6 more
Caused by: java.lang.NullPointerException
at com.sun.enterprise.glassfish.bootstrap.osgi.OSGiGlassFishRuntimeBuilder.newFramework(OSGiGlassFishRuntimeBuilder.java:230)
at com.sun.enterprise.glassfish.bootstrap.osgi.OSGiGlassFishRuntimeBuilder.build(OSGiGlassFishRuntimeBuilder.java:133)
... 9 more
Error stopping framework: java.lang.NullPointerException
java.lang.NullPointerException
at com.sun.enterprise.glassfish.bootstrap.GlassFishMain$Launcher$1.run(GlassFishMain.java:203)
Command start-domain failed.
I have tried to find a solution, removing the cache folder in the domain directory and changing access permissions, but the problem keeps occurring and I can't start my domain.
Any ideas how to fix this problem?
I had the same I/O error as in that stack trace after installing GlassFish and found out the following:
GlassFish 3.1.2 uses the Felix library for its OSGi support, and Felix wants to lock files using the core Java method java.nio.channels.FileChannel.tryLock(). This appears not to work when the file to be locked is on a filesystem residing on certain kinds of NAS, and it leads to an I/O error after a long timeout.
Make sure to install critical parts of GlassFish, or all of it, on local disks and this error will disappear.
The error can easily be reproduced running the following Java class:
import java.io.File;
import java.io.FileOutputStream;
import java.nio.channels.FileChannel;

public class TryLock {
    /**
     * @param args the name of the file to lock is the only parameter
     */
    public static void main(String[] args) {
        File lockFile = new File(args[0]);
        FileChannel fc = null;
        FileOutputStream fos = null;
        try {
            fos = new FileOutputStream(lockFile);
            fc = fos.getChannel();
            // This is the code that fails on some NAS (low-level operation?):
            fc.tryLock();
            System.out.println("Success");
        } catch (Throwable th) {
            th.printStackTrace();
        }
    }
}