Has anyone used Saiku for data analysis on a big-data platform (Hadoop)? My recent work requires integrating some legacy BI tools with Hadoop to support common OLAP queries on HDFS/HBase.
I found a solution implemented with Phoenix & HBase here, which bridges Saiku and HBase via Phoenix's SQL dialect, and it worked. However, this approach can only reach data inside HBase through the HBase API; it cannot launch any MapReduce-style jobs when building the data cube. I would prefer a more Hadoop-native alternative, such as going through Apache Hive.
Saiku is based on Mondrian. My version of Saiku uses Mondrian-4.0.0.0-SNAPSHOT.jar, which I found already works well with Hive, and there are several Hive 0.13 jars in Saiku's lib directory. So I assumed a simple hive2 datasource configuration would work. I started a hiveserver2 on the namenode of my HDFS cluster and added the following datasource to Saiku (a bare JDBC sanity check against the same endpoint follows the listing):
Name: hive2
Connection Type: Mondrian
URL: jdbc:hive2://localhost:10000/default
Schema: /datasources/movie.xml
Jdbc Driver: org.apache.hive.jdbc.HiveDriver
Username: ubuntu
Password: XXXX
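Before pointing at Saiku itself, it is worth confirming that the hiveserver2 endpoint accepts plain JDBC connections. A minimal sketch, reusing the URL and credentials above and assuming the Hive 0.13 JDBC jars from Saiku's lib directory are on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Hive2SanityCheck {
    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver shipped in Saiku's lib directory (Hive 0.13 here).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "ubuntu", "XXXX");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

If this lists the tables in the default database, the endpoint and credentials are fine and the problem sits on the Mondrian/Saiku side.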
Saiku did indeed connect to the hiveserver2 successfully, but it failed to load the datasource. I found the following error in the Saiku log:
name:hive2
driver:mondrian.olap4j.MondrianOlap4jDriver
url:jdbc:mondrian:Jdbc=jdbc:hive2://localhost:10000/default;Catalog=mondrian:///datasources/movie.xml;JdbcDrivers=org.apache.hive.jdbc.HiveDriver
12:41:48,110 WARN [RolapSchema] Model is in legacy format
12:41:50,464 ERROR [SecurityAwareConnectionManager] Error connecting: hive2
mondrian.olap.MondrianException: Mondrian Error:Internal error: while quoting identifier
at mondrian.resource.MondrianResource$_Def0.ex(MondrianResource.java:992)
at mondrian.olap.Util.newInternal(Util.java:2543)
at mondrian.spi.impl.JdbcDialectImpl.deduceIdentifierQuoteString(JdbcDialectImpl.java:245)
at mondrian.spi.impl.JdbcDialectImpl.<init>(JdbcDialectImpl.java:146)
at mondrian.spi.DialectManager$DialectManagerImpl$1.createDialect(DialectManager.java:210)
...
Caused by: java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveDatabaseMetaData.getIdentifierQuoteString(HiveDatabaseMetaData.java:342)
at org.apache.commons.dbcp.DelegatingDatabaseMetaData.getIdentifierQuoteString(DelegatingDatabaseMetaData.java:306)
at mondrian.spi.impl.JdbcDialectImpl.deduceIdentifierQuoteString(JdbcDialectImpl.java:238)
... 99 more
I looked into the Hive 0.13 source and found that getIdentifierQuoteString isn't implemented yet; it simply throws an exception:
public String getIdentifierQuoteString() throws SQLException {
    throw new SQLException("Method not supported");
}
So far I'm puzzled. Is it practical to use Saiku with Hive at all? It ships Hive 0.13 jars in its lib directory yet cannot load a simple Hive datasource. Should I simply modify the Hive source? In the newly released Hive 1.0, this method is implemented by simply returning an empty string.
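For comparison, a minimal patch in the spirit of what Hive 1.0 does would look something like the following (the JDBC javadoc suggests returning a single space when identifier quoting is not supported; an empty string is what Hive 1.0 reportedly returns):

// Hypothetical patch to HiveDatabaseMetaData: report "no quoting supported"
// instead of throwing, so callers such as Mondrian's JdbcDialectImpl can proceed.
public String getIdentifierQuoteString() throws SQLException {
    // A single space is the JDBC convention for "quoting not supported";
    // Hive 1.0 reportedly returns an empty string instead.
    return " ";
}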
Does anyone have a good idea? Thanks!
Related
I initialized Hive and it worked; later, when I ran the SHOW DATABASES command, I got the error below.
I am using MySQL for the metastore.
adminn#master:~$ hive
Hive Session ID = e9e9145a-0c38-4007-a9af-ded86a4226ea
Logging initialized using configuration in jar:file:/home/adminn/apache-hive-3.1.1-bin/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
I added the property below to the hive-site.xml file, and this resolved the issue (a standalone check of the metastore connection follows the snippet).
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
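Independently of Hive, the same driver class and metastore credentials can be verified with a bare JDBC connection. A minimal sketch; the URL, user, and password are placeholders for whatever javax.jdo.option.ConnectionURL, ConnectionUserName, and ConnectionPassword point at:

import java.sql.Connection;
import java.sql.DriverManager;

public class MetastoreCheck {
    public static void main(String[] args) throws Exception {
        // Same driver class that hive-site.xml now declares.
        Class.forName("com.mysql.jdbc.Driver");
        // Placeholder connection details; substitute the values from hive-site.xml.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/metastore", "hiveuser", "hivepassword")) {
            System.out.println("Connected to metastore database: " + conn.getCatalog());
        }
    }
}

If this fails, either the MySQL connector jar is missing from the classpath or the credentials are wrong, which would also explain the SessionHiveMetaStoreClient instantiation failure.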
When I start hiveserver2 with the following command:
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console
I receive the following error before the program exits:
2022-09-12T14:46:53,713 ERROR [Thrift Server] transport.TServerSocket: Could not set socket timeout.
java.net.SocketException: Socket is closed
at java.net.ServerSocket.setSoTimeout(ServerSocket.java:666) ~[?:1.8.0_292]
at org.apache.thrift.transport.TServerSocket.listen(TServerSocket.java:117) ~[hive-exec-3.1.3.jar:3.1.3]
at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:146) ~[hive-exec-3.1.3.jar:3.1.3]
at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:169) ~[hive-service-3.1.3.jar:3.1.3]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Hive Session ID = 56c28481-2b0c-4712-808d-ff7ccf31b543
Hive Session ID = 9771e219-095c-4524-b34a-b8e05c335fc0
2022-09-12T14:48:03,871 ERROR [Thrift Server] thrift.ThriftCLIService: Exception caught by ThriftBinaryCLIService. Exiting.
java.lang.NullPointerException: null
at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:169) ~[hive-service-3.1.3.jar:3.1.3]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Here is a brief explanation of my setup:
I am using Vagrant and VirtualBox to create a "virtual" cluster.
This is based, very loosely, on this repository (it hasn't been updated in a while, so I have had to make many changes to get it to work): https://github.com/njvijay/vagrant-jilla-hadoop
I have created 5 nodes (1 namenode and 4 datanodes). The namenode also runs YARN, Hive, Pig, Spark, MySQL, Python, etc.
I am using Ubuntu 14.04.6, Hadoop 2.10.1, Hive 3.1.3, Spark 3.3.0 and Pig 0.15.
It seems that there may be a compatibility issue between Hadoop 2 and Spark 3. I was able to resolve the error after updating Hadoop, Hive, and Spark to the latest versions.
I am trying to read a table that is on Azure Blob Storage via PySpark, and the exception below is raised even though I have added the following jars via pyspark --jars:
azure-storage-2.0.0.jar
hadoop-azure-2.7.0.jar
Exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o38.showString.
: java.lang.NoClassDefFoundError: com/microsoft/azure/storage/blob/BlobListingDetails
Caused by: java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails
Any idea which specific jar needs to be added to resolve the issue and read Azure tables in Spark?
My suggestion is as below.
Please download the jar files for the newest versions of the Azure Storage Java client and Hadoop Azure support instead of their old versions.
Check whether the paths of these jars have been added to the SPARK_CLASSPATH environment variable in the conf/spark-env file, or add the jar paths programmatically via SparkContext.addJar("path to jar built with Maven [hint: mvn package]"), as in the sketch below.
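A rough Java sketch of the programmatic route (jar paths and versions, the storage account name, the access key, and the container/table path are all placeholders for your environment):

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AzureBlobRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("azure-blob-read")
                // Placeholder paths; point these at the newer azure-storage
                // and hadoop-azure jars mentioned above.
                .config("spark.jars",
                        "/opt/jars/hadoop-azure-2.7.7.jar,/opt/jars/azure-storage-5.4.0.jar")
                .getOrCreate();

        // Placeholder storage account name and access key for the WASB connector.
        Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
        hadoopConf.set("fs.azure.account.key.<account>.blob.core.windows.net", "<storage-key>");

        // Placeholder container, account, and path to the table data.
        Dataset<Row> df = spark.read().parquet(
                "wasbs://<container>@<account>.blob.core.windows.net/path/to/table");
        df.show();
    }
}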
Hope it helps.
I have configured icCube, via icCubeRepository.xml, to use Postgres as the JCR repository. When starting icCube I get the following error:
java.sql.SQLException: Amazon Invalid operation: relation "fs_fsentry" already exists;
It looks as if the driver used by the JCR is the Redshift one instead of the expected Postgres driver.
This was happening because the Postgres driver was not added to the classpath in the bin/iccube.sh file. Adding the Postgres JDBC driver, which is already available in icCube's lib folder, to that classpath fixes it.
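As a quick diagnostic, you can ask the JDBC DriverManager which registered driver claims a jdbc:postgresql URL on a given classpath; a minimal sketch (the URL is a placeholder). If a Redshift driver is on the classpath and answers for jdbc:postgresql URLs, as the error above suggests, it will show up here instead of org.postgresql.Driver:

import java.sql.Driver;
import java.sql.DriverManager;

public class CheckJcrDriver {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; use the one configured in icCubeRepository.xml.
        Driver d = DriverManager.getDriver("jdbc:postgresql://localhost:5432/iccube");
        System.out.println("Driver answering jdbc:postgresql URLs: " + d.getClass().getName());
    }
}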
I am using Hadoop 1.2.1, HBase 0.94.14, and Hive 1.0.0. There are three datanodes in my cluster and three regionservers as well. I have to import some data from HBase into Hive. I configured Hive successfully, but when I ran a command to count the number of rows in a Hive table, it gave the following:
ERROR [main]: exec.Task (SessionState.java:printError(833)) - Job Submission failed with exception 'java.lang.RuntimeException(java.io.IOException: Merging of credentials not supported in this version of hadoop)'
java.lang.RuntimeException: java.io.IOException: Merging of credentials not supported in this version of hadoop
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureJobConf(HBaseStorageHandler.java:485)
at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobConf(PlanUtils.java:856)
at org.apache.hadoop.hive.ql.plan.MapWork.configureJobConf(MapWork.java:540)
I changed the Hive version to 0.14, but I get the same error.
What is the solution?
Note: I cannot upgrade Hadoop.
Although your version of Hive is current, that is not the source of your error. You need to upgrade Hadoop to version 2.4.0 or above.
The error originates here: https://github.com/apache/hive/blob/3b6825b5b61e943e8e41743f5cbf6d640e0ebdf5/shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java#L579
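For reference, the Hadoop 0.20S shim linked above (the one Hive selects for Hadoop 1.x clusters) rejects credential merging outright, roughly as follows (paraphrased from the linked source); the Hadoop 2.x shim implements it, which is why upgrading Hadoop makes the error go away:

// Paraphrased from Hadoop20SShims: the shim used on Hadoop 1.x simply refuses
// to merge delegation-token credentials, producing the exception seen in the
// HBaseStorageHandler stack trace above.
@Override
public void mergeCredentials(JobConf dest, JobConf src) throws IOException {
    throw new IOException(
        "Merging of credentials not supported in this version of hadoop");
}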