PySpark DataFrame: Unable to Save As Hive Table

I am trying to use 'saveAsTable' on a DataFrame. Our Hive metastore is in an external RDS and I am trying to store the data in S3, but it fails with the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o87.saveAsTable.
: java.net.NoRouteToHostException: No Route to Host from ip-10-174-20-142/10.174.20.142 to ip-10-174-26-239.ec2.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
Here is the complete code and error:
[hbohra#ip-10-174-20-142 ~]$ pyspark --files /etc/hive/conf/hive-site.xml
Python 2.7.12 (default, Sep 1 2016, 22:14:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/01/30 21:05:39 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Python version 2.7.12 (default, Sep 1 2016 22:14:00)
SparkSession available as 'spark'.
>>> from pyspark.sql.functions import lit
>>> df = sqlContext.read.json('s3://dl-rawdata-dev/reporting/impressions/platform/module/context/2017-01-27T00-00')
>>> asset_df = df.withColumn('cust_id', df['key']['cust_id']).withColumn('platform', lit('platform')).withColumn('context', lit('context')).withColumn('module', lit('context')).withColumn('impressions',df['metric']['impressions']).withColumn('orders', df['metric']['orders']).withColumn('subtotal', df['metric']['subtotal']).withColumn('adfee', df['metric']['adfee']).withColumn('clicks', df['metric']['clicks']).withColumn('views', df['metric']['views']).withColumn('report_date', lit('2017-01-27')).drop('key').drop('metric')
>>> asset_df.write.format('orc').partitionBy('report_date', 'platform').saveAsTable('hbohra.reporting', path='s3://dl-data-assets-dev/hbohra.db/reporting/', mode='overwrite')
17/01/30 21:06:07 WARN command.CreateDataSourceTableUtils: Persisting partitioned data source relation `hbohra`.`reporting` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s):
s3://dl-data-assets-dev/hbohra.db/reporting
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 585, in saveAsTable
self._jwrite.saveAsTable(name)
File "/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o87.saveAsTable.
: java.net.NoRouteToHostException: No Route to Host from ip-10-174-20-142/10.174.20.142 to ip-10-174-26-239.ec2.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.delete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:703)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply$mcV$sp(HiveExternalCatalog.scala:185)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:152)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:152)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:152)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:226)
at org.apache.spark.sql.execution.command.CreateDataSourceTableUtils$.createDataSourceTable(createDataSourceTables.scala:504)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:259)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:378)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 48 more

I think the issue here is that the table in the Hive metastore has been created using a different cluster.
We had the same issue, and the host mentioned in the stack trace was nowhere to be found in our EC2 dashboard.
This link explains that managed (not external) tables keep a reference to the filesystem URI (which contains the hostname that does not exist anymore).
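A quick way to confirm that, from the same PySpark shell, is to look at the location the metastore has recorded for the table. The sketch below is hedged (DESCRIBE FORMATTED and ALTER TABLE ... SET LOCATION support can vary by Spark version) and reuses the table and S3 path from the question:

# Show what the metastore has recorded for the table; a dead hostname usually
# shows up in the "Location" row of the output.
spark.sql("DESCRIBE FORMATTED hbohra.reporting").show(100, truncate=False)

# If the location points at the old cluster's HDFS, one option is to repoint
# the table at the intended S3 path (or simply drop and recreate the table):
spark.sql("ALTER TABLE hbohra.reporting "
          "SET LOCATION 's3://dl-data-assets-dev/hbohra.db/reporting/'")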

Did you follow the wiki entry? Did it help?
Update to anyone voting this down:
The Hadoop network stack error messages were reworked a few years back so that every low-level network failure is caught and wrapped to add the bit Sun left out: source and destination hostnames and ports. At the same time, we put in a wiki link, so that people who aren't familiar with things like NoRouteToHost have a nice list of common causes. People like myself add new ones when we find new ways to get the stack trace.
As a co-author of that patch, I really despair when people file bug reports etc. about something not working without actually following the wiki link. They are skipping a core piece of the self-help diagnostics and troubleshooting we've added to the OSS project.
As the wiki entries note, "your network, your problem". You are going to have to work out what's up; the (hostname, port) pairs are a good cue for working out which services aren't reachable, so it's time to use low-level tools like telnet and ping.
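If you don't have telnet handy, a minimal Python stand-in (a sketch, using the NameNode host and port taken from the trace above) looks like this:

import socket

def can_connect(host, port, timeout=5):
    # Returns True if a TCP connection to (host, port) succeeds within timeout.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except socket.error as e:
        print("connection failed: %s" % e)
        return False
    finally:
        s.close()

print(can_connect("ip-10-174-26-239.ec2.internal", 8020))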
To close, then: the exception includes diagnostic info and a link to a Hadoop wiki entry on the failure. Follow the link.

Related

Flink 1.10 not connecting to MinIO (S3)

I'm running a Docker Compose setup locally with Flink and MinIO.
When I try to connect to MinIO, I always get the following error:
caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
It seems that the plugin isn't loaded correctly.
My Flink config (flink-conf.yaml):
state.backend: filesystem
s3.endpoint: http://minio:9000
s3.path.style.access: true
s3.access-key: minio
s3.secret-key: minio123
presto.s3.access-key: minio
presto.s3.secret-key: minio123
presto.s3.endpoint: http://minio:9000
presto.s3.path-style-access: true
I've copied the required plugin as follows:
mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto
Any suggestions?
Stack trace:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7ae6657256719d8c32d76ba113fb35f0)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:326)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 21 more
Caused by: java.io.IOException: Error opening the Input Split s3://test/test.txt [0,3243]: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:995)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:446)
... 2 more

Apache Tomcat unable to create a JDBC connection to Apache Drill

I am using Mondrian OLAP for analytics. We've been using MySQL but have decided to migrate to Apache Drill.
Now I have configured Apache Drill and I am able to connect to it from external clients like SQuirreL Client, but when I try to connect via Saiku, I get the following stack trace in catalina.out:
INFO: Deploying web application directory saiku
name:Lists
driver:mondrian.olap4j.MondrianOlap4jDriver
url:jdbc:mondrian:Jdbc=jdbc:drill:drillbit=localhost:31010;schema=Saiku.root;Catalog=../webapps/saiku/WEB-INF/classes/imi_India/INDIA.xml;JdbcDrivers=org.apache.drill.jdbc.Driver;
23:59:21,878 WARN [RolapUtil] Mondrian: Warning: JDBC driver sun.jdbc.odbc.JdbcOdbcDriver not found
23:59:21,879 WARN [RolapUtil] Mondrian: Warning: JDBC driver oracle.jdbc.OracleDriver not found
23:59:22,162 WARN [DrillMetrics] Removing old metric since name matched newly registered metric. Metric name: drill.allocator.root.used
23:59:22,162 WARN [DrillMetrics] Removing old metric since name matched newly registered metric. Metric name: drill.allocator.root.peak
23:59:25,246 WARN [ThreadLocalRandom] Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
mondrian.olap.MondrianException: Mondrian Error:Internal error: Error while creating SQL connection: Jdbc=jdbc:drill:drillbit=localhost:31010; JdbcUser=admin; JdbcPassword=admin
at mondrian.resource.MondrianResource$_Def0.ex(MondrianResource.java:972)
at mondrian.olap.Util.newInternal(Util.java:2403)
at mondrian.olap.Util.newError(Util.java:2419)
at mondrian.rolap.RolapConnection.<init>(RolapConnection.java:246)
at mondrian.rolap.RolapSchema.<init>(RolapSchema.java:188)
at mondrian.rolap.RolapSchema.<init>(RolapSchema.java:216)
at mondrian.rolap.RolapSchemaPool.get(RolapSchemaPool.java:214)
at mondrian.rolap.RolapSchemaPool.get(RolapSchemaPool.java:66)
at mondrian.rolap.RolapConnection.<init>(RolapConnection.java:160)
at mondrian.rolap.RolapConnection.<init>(RolapConnection.java:90)
at mondrian.olap.DriverManager.getConnection(DriverManager.java:112)
at mondrian.olap.DriverManager.getConnection(DriverManager.java:68)
at mondrian.olap4j.MondrianOlap4jConnection.<init>(MondrianOlap4jConnection.java:153)
at mondrian.olap4j.FactoryJdbc4Plus$AbstractConnection.<init>(FactoryJdbc4Plus.java:323)
at mondrian.olap4j.FactoryJdbc41Impl$MondrianOlap4jConnectionJdbc41.<init>(FactoryJdbc41Impl.java:118)
at mondrian.olap4j.FactoryJdbc41Impl.newConnection(FactoryJdbc41Impl.java:32)
at mondrian.olap4j.MondrianOlap4jDriver.connect(MondrianOlap4jDriver.java:134)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.saiku.datasources.connection.SaikuOlapConnection.connect(SaikuOlapConnection.java:75)
at org.saiku.datasources.connection.SaikuOlapConnection.connect(SaikuOlapConnection.java:46)
at org.saiku.datasources.connection.SaikuConnectionFactory.getConnection(SaikuConnectionFactory.java:29)
at org.saiku.web.impl.SecurityAwareConnectionManager.connect(SecurityAwareConnectionManager.java:284)
at org.saiku.web.impl.SecurityAwareConnectionManager.getInternalConnection(SecurityAwareConnectionManager.java:99)
at org.saiku.datasources.connection.AbstractConnectionManager.getConnection(AbstractConnectionManager.java:110)
at org.saiku.datasources.connection.AbstractConnectionManager.getAllConnections(AbstractConnectionManager.java:136)
at org.saiku.web.impl.SecurityAwareConnectionManager.init(SecurityAwareConnectionManager.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:291)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:288)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:190)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:574)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
at org.springframework.web.context.ContextLoader.createWebApplicationContext(ContextLoader.java:276)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:197)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:47)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3972)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4467)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1041)
at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:964)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:516)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Could not create a validated object, cause: null
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:104)
at mondrian.rolap.RolapConnection.<init>(RolapConnection.java:226)
... 66 more
Caused by: java.util.NoSuchElementException: Could not create a validated object, cause: null
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1008)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:96)
... 67 more
Now, I have added the dependency in the application's pom.xml:
<dependency>
<groupId>org.apache.drill.exec</groupId>
<artifactId>drill-jdbc</artifactId>
<version>1.1.0</version>
</dependency>
and used the following connection string:
jdbc:drill:drillbit=localhost:31010;schema=Saiku.root
I also have a bunch of Parquet files in the HDFS directory /dashboard/saiku3//.parquet.
There are 17 such tables. I have created a storage engine called Saiku with root storage in /saiku3 and created views for all 17 Hadoop directories. Those directories have been imported from MySQL to HDFS via Sqoop as Parquet.
I have been stuck here for hours and couldn't find a solution. Am I doing something wrong here?
Thanks in advance.
"Not enough entropy"? What version of Java is this? You seem to be using /dev/random instead of /dev/urandom. Anyway, you can change this with -Djava.security.egd if I'm not mistaken.
Check if you have drill-jdbc-all-1.8.0.jar in tomcat/lib

Create DataFrame issue in PySpark on Windows 10

I am unable to execute the below command from PySpark on Windows:
schemaPeople = spark.createDataFrame(people)
I have set HADOOP_HOME to point at winutils.
I have given 777 permissions to C:/tmp/hive.
Still I am getting the below error:
Py4JJavaError: An error occurred while calling o23.applySchemaToPythonRDD.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
I have gone through a lot of similar questions before posting this; I'd appreciate any help here.
I got this error a bunch when trying to set up Spark on Windows using the winutils file. I had to set up Spark differently to get around this.
I ended up downloading the Hadoop binary for my version of Spark and going from there. I documented the whole thing with a walkthrough if you are interested: Spark on Windows.
The gist is that the official Hadoop release from Apache does not include a Windows binary, and compiling from source can be tedious, so some really helpful people have made compiled distributions available. If you want to use Spark 2.0.2, download the binaries from Steve Loughran's GitHub; for 2.1.0 you can download from here. From there you should be able to set it up as expected.
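For reference, here is a minimal sketch of what the Python side looks like once the Windows binaries are in place. The C:\hadoop path is just an example (use wherever you unpacked the distribution), and you would typically also run "winutils.exe chmod 777 \tmp\hive" from a command prompt first, as the question already attempts:

import os

# Point Spark at the Hadoop/winutils binaries before the JVM gets launched.
os.environ["HADOOP_HOME"] = r"C:\hadoop"                 # example path
os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("windows-test").getOrCreate()

# The call that failed in the question should now work:
people = [("Alice", 30), ("Bob", 25)]
schemaPeople = spark.createDataFrame(people, ["name", "age"])
schemaPeople.show()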

Install Spark on existing Hadoop cluster (ISSUE with HIVE)

I am trying to get a Spark/Shark cluster up but keep running into the same problem. I have followed the instructions on https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and addressed Hive as stated.
Here are the details; any help would be great.
I have already installed the following packages:
Spark/Shark 1.0.0
Apache Hadoop 2.4.0
Apache Hive 0.13
Scala 2.9.3
Java 7
I configured ~/spark/conf/spark-env.sh as:
export HADOOP_HOME=/path/to/hadoop/
export HIVE_HOME=/path/to/hive/
export MASTER=spark://xxx.xxx.xxx.xxx:7077
export SPARK_HOME=/path/to/spark
export SPARK_MEM=4g
export HIVE_CONF_DIR=/path/to/hive/conf/
source $SPARK_HOME/conf/spark-env.sh
When I start Spark with "./spark-withinfo", I get the following errors:
-hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
14/07/07 16:26:57 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
14/07/07 16:26:57 [main]: WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO SessionState:
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:344)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)
... 2 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 7 more
Caused by: java.lang.NoSuchFieldError: METASTOREINTERVAL
at org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:78)
at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:60)
at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:71)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:413)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:401)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:439)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:325)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:285)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4102)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
... 12 more
I guess Spark cannot find some libs to connect to the metastore in Hive, but I have been stuck here for a couple of days and don't know how to solve it. BTW, I use MySQL for the Hive metadata, and everything works well in Hive.
Any help is appreciated. Thanks in advance.
You may need to add the MySQL connector JAR file before you start Spark.
In my case, I added the MySQL connector JAR like below:
$SPARK_HOME/bin/compute-classpath.sh
CLASSPATH=$CLASSPATH:/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar

Impossible to generate metamodel in SpagoBI Studio with Hive (CDH5/CDH4)

While creating a JDBC connection between SpagoBI Studio and Hive (CDH5/CDH4), this is my log:
eclipse.buildId=unknown
java.version=1.7.0_45
java.vendor=Oracle Corporation
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=en_US
Command-line arguments: -os linux -ws gtk -arch x86_64 -consoleLog
This is a continuation of log file /home/cloudera/SpagoBIStudio_4.2.0_linux64/workspace/.metadata/.bak_0.log
Created Time: 2014-06-30 01:38:31.422
Error
Mon Jun 30 02:51:46 PDT 2014
Impossible to generate metamodel
java.lang.RuntimeException: Impossible to initialize the model
at it.eng.spagobi.meta.editor.multi.wizards.SpagoBIModelEditorWizard.createModel(SpagoBIModelEditorWizard.java:241)
at it.eng.spagobi.meta.editor.multi.wizards.SpagoBIModelEditorWizard.performFinish(SpagoBIModelEditorWizard.java:217)
at org.eclipse.jface.wizard.WizardDialog.finishPressed(WizardDialog.java:811)
at org.eclipse.jface.wizard.WizardDialog.buttonPressed(WizardDialog.java:430)
at org.eclipse.jface.dialogs.Dialog$2.widgetSelected(Dialog.java:624)
at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:234)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3540)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3161)
at org.eclipse.jface.window.Window.runEventLoop(Window.java:825)
at org.eclipse.jface.window.Window.open(Window.java:801)
at it.eng.spagobi.studio.core.views.actionProvider.ResourceNavigatorActionProvider$13.run(ResourceNavigatorActionProvider.java:411)
at org.eclipse.jface.action.Action.runWithEvent(Action.java:498)
at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection(ActionContributionItem.java:584)
at org.eclipse.jface.action.ActionContributionItem.access$2(ActionContributionItem.java:501)
at org.eclipse.jface.action.ActionContributionItem$5.handleEvent(ActionContributionItem.java:411)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3540)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3161)
at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2640)
at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2604)
at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2438)
at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:671)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:664)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:115)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:369)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:620)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:575)
at org.eclipse.equinox.launcher.Main.run(Main.java:1408)
at org.eclipse.equinox.launcher.Main.main(Main.java:1384)
Caused by: java.lang.RuntimeException: Impossible to initialize the physical model
at it.eng.spagobi.meta.editor.multi.wizards.SpagoBIModelEditorWizard.createPhysicalModel(SpagoBIModelEditorWizard.java:340)
at it.eng.spagobi.meta.editor.multi.wizards.SpagoBIModelEditorWizard.createModel(SpagoBIModelEditorWizard.java:237)
... 41 more
Caused by: java.lang.reflect.InvocationTargetException
at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:421)
at org.eclipse.jface.dialogs.ProgressMonitorDialog.run(ProgressMonitorDialog.java:507)
at it.eng.spagobi.meta.editor.multi.wizards.SpagoBIModelEditorWizard.createPhysicalModel(SpagoBIModelEditorWizard.java:295)
... 42 more
Caused by: java.lang.RuntimeException: Impossible to initialize physical model
at it.eng.spagobi.meta.initializer.PhysicalModelInitializer.initialize(PhysicalModelInitializer.java:121)
at it.eng.spagobi.meta.editor.multi.wizards.SpagoBIModelEditorWizard$1.run(SpagoBIModelEditorWizard.java:321)
at org.eclipse.jface.operation.ModalContext$ModalContextThread.run(ModalContext.java:121)
Caused by: java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveDatabaseMetaData.getIdentifierQuoteString(HiveDatabaseMetaData.java:342)
at it.eng.spagobi.meta.initializer.PhysicalModelInitializer.initialize(PhysicalModelInitializer.java:112)
... 2 more
Some of the related questions, such as "hive method not supported" and "java.sql.SQLException: Method not supported", say:
Your original error results from using Cloudera's Hive driver which does not implement many JDBC API methods that PDI needs to function properly. That's why we have our own version of the hive driver in the cdh4 folder (called hive-0.7.0-pentaho-1.0.2 or something like that). Simply put, there should be no JARs copied from your cluster to your PDI client, the cdh4 folder already contains the correct versions of all necessary JARs.
But I didn't find any SpagoBI Hive driver for CDH5/CDH4. I am able to connect to Hive, but while accessing a table I get the above error in Studio; I am able to access the table on the SpagoBI server. Any help? Thanks.
You have to use Cloudera's .jar files in order to be able to connect to CDH. Have a look here and put all of the .jar files in the CLASSPATH. If running SpagoBI, put them in the /lib folder of the SpagoBI (Tomcat) server.