Apache Drill - Hive integration: Drill not listing Hive tables

I have been trying to integrate Apache Drill with Hive using the Hive storage plugin. I configured the plugin with all the necessary properties. On the Drill shell, I can view the Hive databases using:
Show Databases;
But when I try to list the tables using:
Show Tables;
I get no results (no list of tables).
Below are the steps I followed, based on the Apache Drill documentation and other sources:
I created a Drill distributed cluster by updating drill-override.conf with the same cluster ID on all nodes, along with the ZooKeeper IP and port, and then invoked drillbit.sh on each node.
I started the Drill shell using drill-conf and made sure the Hive metastore service is active as well.
Below is the configuration of the Hive storage plugin for Drill (made from its web UI):
{
  "type": "hive",
  "configProps": {
    "hive.metastore.uris": "thrift://node02.cluster.com:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://node02.cluster.com/hive",
    "hive.metastore.warehouse.dir": "/apps/hive/warehouse",
    "fs.default.name": "hdfs://node01.cluster.com:8020",
    "hive.metastore.sasl.enabled": "false"
  },
  "enabled": true
}
All the properties were set after consulting hive-site.xml.
As far as I can tell, this is what others have done to integrate Drill with Hive. Am I missing something here?
Versions:
Drill 1.14, Hive 1.2 (Hive metastore backed by MySQL)
We also have HiveServer2 on the same nodes; could that be causing any issues?
I just want to integrate Drill with Hive 1.2; am I doing it right?
Any pointers would be helpful; I have spent nearly two days trying to get this right.
Thanks for your time.

Starting from version 1.13, Drill uses the Hive 2.3.2 client.
It is recommended to use Hive 2.3 to avoid unpredictable issues.
Regarding your setup, please remove all configProps except hive.metastore.uris.
The other properties can be left at their defaults (defined in HiveConf.java) or specified in your hive-site.xml.
Also, if Show Tables; still returns an empty result even after executing use hive, check Drill's log files for errors. If you find an error there, you can create a Jira ticket to improve Drill's output so that it reflects the issue.
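For reference, with everything except the metastore URI removed, the plugin definition from the question would reduce to roughly the following (same URI as above; adjust it to your metastore host):
{
  "type": "hive",
  "configProps": {
    "hive.metastore.uris": "thrift://node02.cluster.com:9083"
  },
  "enabled": true
}
After saving the plugin, run use hive; and then show tables; from the Drill shell.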

Related

How to see a table definition or download a table script in Apache Ignite 2.8.1

I am using Ignite 2.8.1 and trying to see a table definition from the Ignite Web Console using a command like desc table_name, but it does not work. I did a detailed search but did not find any command or approach that lets me download the table creation script or see the table definition.
Please let me know if there is any approach by which I can download the table script or see the table definition in Ignite (preferably from the Ignite Web Console).
I'm not sure whether the Web Console can do the trick; that product is no longer supported. But you can achieve it using any DB manager with JDBC support.
For example, DBeaver has an embedded template for the Apache Ignite JDBC driver. Check this example on how to set it up (it could be outdated, though).
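If you prefer to script it rather than use a GUI, the same information is reachable over Ignite's JDBC thin endpoint with plain JDBC metadata calls. A minimal sketch (host, port, schema and table name are placeholders; assumes the ignite-core jar is on the classpath):
import java.sql.DriverManager

object ShowIgniteTable {
  def main(args: Array[String]): Unit = {
    // Ignite's JDBC thin driver listens on port 10800 by default.
    Class.forName("org.apache.ignite.IgniteJdbcThinDriver")
    val conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1:10800/")
    try {
      // Standard JDBC metadata call: column name and SQL type for one table.
      val cols = conn.getMetaData.getColumns(null, "PUBLIC", "MY_TABLE", null)
      while (cols.next()) {
        println(cols.getString("COLUMN_NAME") + " " + cols.getString("TYPE_NAME"))
      }
    } finally conn.close()
  }
}
This does not reproduce the original CREATE TABLE statement verbatim, but it gives the column-level definition that tools like DBeaver display.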

How to set up Cartridge application replication between several Tarantool routers?

How do I set up replication for Cartridge applications between several Tarantool routers on different physical/virtual servers?
The same as for any other node: press Join existing replicaset instead of Create new replicaset when configuring a new router in the admin UI.
If you're experiencing any particular issues, please specify them too.
By the way, the screenshot you posted seems to contain the answer already.

PDI connect to MongoDB Atlas

I am using Pentaho Data Integration 9 Community Edition and trying to connect to MongoDB Atlas, but without success.
I tried the URL MongoDB provides:
mongodb+srv://<username>:<password>#something.XYZ.mongodb.net/<dbname>?retryWrites=true&w=majority
Which gives me the following error:
org.pentaho.mongo.MongoDbException: Malformed host spec: mongodb+srv://<username>:<password>#something.XYZ.mongodb.net/<dbname>?retryWrites=true&w=majority
I saw a tip to change to the old-style connection string, something similar to the following:
mongodb://user:password#cluster0-shard-00-00-wuhae.mongodb.net:27017,cluster0-shard-00-01-wuhae.mongodb.net:27017,cluster0-shard-00-02-wuhae.mongodb.net:27017/shop?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin&retryWrites=true
but that also did not work.
Any ideas?
You need to specify the replica set hosts instead, since it does not seem to support the mongodb+srv syntax.
So in my case I had to add the following hosts:
test-shard-00-01.XYZ.mongodb.net,test-shard-00-00.XYZ.mongodb.net,test-shard-00-02.XYZ.mongodb.net
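Outside PDI, you can sanity-check a seed-list (non-SRV) connection string of this shape with the plain MongoDB Java driver before wiring it into the step. A rough sketch (hosts, credentials, database and replica set name are placeholders following the strings above):
import com.mongodb.client.MongoClients

object AtlasCheck {
  def main(args: Array[String]): Unit = {
    // Seed-list form: every shard host is listed explicitly instead of using mongodb+srv.
    val uri = "mongodb://user:password@test-shard-00-00.XYZ.mongodb.net:27017," +
      "test-shard-00-01.XYZ.mongodb.net:27017,test-shard-00-02.XYZ.mongodb.net:27017/" +
      "dbname?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin&retryWrites=true"
    val client = MongoClients.create(uri)
    try {
      // Listing database names forces a round trip, so bad hosts or credentials fail here.
      val names = client.listDatabaseNames().iterator()
      while (names.hasNext) println(names.next())
    } finally client.close()
  }
}
If this works but PDI still fails, the problem is in how the step parses the URL rather than in the cluster itself.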

Superset with Apache Spark on Hive

I have Apache Superset installed via Docker on my local machine. I have a separate production 20-node Spark cluster with Hive as the metastore. I want Superset to be able to connect to Hive and run queries via Spark SQL.
To connect to Hive, I tried the following:
Add Database --> SQLAlchemy URI
hive://hive#<hostname>:10000/default
but it gives an error when I test the connection. I believe I have to do some tunneling, but I am not sure how.
I have the Hive Thrift server running as well.
Please let me know how to proceed.
What is the error you are receiving? Although the docs do not mention this, the best way to provide the connection URL is in the following format:
hive://<url>/default?auth=NONE (when there is no security)
hive://<url>/default?auth=KERBEROS
hive://<url>/default?auth=LDAP
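If the test connection keeps failing, it can also help to confirm, independently of Superset, that something is actually answering on the Thrift port. A minimal sketch using the standard Hive JDBC driver (hostname is a placeholder; assumes the hive-jdbc jar and its dependencies are on the classpath and no authentication, i.e. the auth=NONE case):
import java.sql.DriverManager

object ThriftProbe {
  def main(args: Array[String]): Unit = {
    // HiveServer2 and the Spark Thrift Server both speak the hive2 JDBC protocol.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://<hostname>:10000/default", "hive", "")
    try {
      val rs = conn.createStatement().executeQuery("SHOW TABLES")
      while (rs.next()) println(rs.getString(1))
    } finally conn.close()
  }
}
If this probe works from your host but the connection test fails from inside the Superset container, the issue is container networking rather than the URI format, which is what the Docker network steps below address.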
First, you should connect the two containers to the same Docker network.
Let's say you have container_superset running Superset and container_spark running Spark.
Run: docker network ls   # list the available networks
Select the name of the Superset network (it should be something like superset_default).
Run: docker run --network="superset_default" --name=NameTheContainerHere --publish port1:port2 imageName
where port1:port2 is the port mapping and imageName is the Spark image.

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters: one in a local virtual machine and another in a remote cloud. Both clusters are in standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
I want to do the following:
Write code (just a simple word count) locally in IntelliJ IDEA (on my laptop), set the Spark master URL to spark://local1:7077 or spark://remote1:7077, and then run the code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit a job.
But I ran into a problem:
When I use the local cluster, everything goes well. Both running the code from IntelliJ IDEA and using spark-submit can submit the job to the cluster and complete it.
But when I use the remote cluster, I get a warning in the log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
It says sufficient resources, not sufficient memory!
This log line keeps printing, with no further progress. Both spark-submit and running the code from IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit code from IntelliJ IDEA to the remote cluster?
If so, what configuration does it need?
What are the possible causes of my problem?
How can I handle this problem?
Thanks a lot!
Update
There is a similar question here, but I think my situation is different. When I run my code in IntelliJ IDEA with the Spark master set to the local virtual machine cluster, it works. Against the remote cluster, however, I get the Initial job has not accepted any resources;... warning instead.
I want to know whether a security policy or firewall could be causing this.
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the very least there is a variety of environment settings and considerations, handled by the spark-submit script, that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads in the Spark developer community on the topic.
My answer here addresses one portion of your post, specifically:
TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have
sufficient resources
The reason is typically a mismatch between the memory and/or number of cores requested by your job and what is available on the cluster. Possibly, when submitting from IntelliJ, the settings in
$SPARK_HOME/conf/spark-defaults.conf
did not match the parameters required for your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark UI on port 8080 to verify that the resources you requested are actually available on the cluster.
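Since the goal is to launch directly from IntelliJ rather than through spark-submit, the same resource settings can also be supplied in code when the SparkContext is created. A rough sketch of the word-count case (master URL, memory and core limits, and the jar path are placeholders; the sizes must fit within what the cluster UI reports as available):
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("word-count")
      .setMaster("spark://remote1:7077")
      // Executor settings must not exceed what the remote workers offer.
      .set("spark.executor.memory", "8g")
      .set("spark.cores.max", "8")
      // The built application jar has to be shipped to the workers (placeholder path).
      .setJars(Seq("target/scala-2.10/word-count.jar"))
    val sc = new SparkContext(conf)
    val counts = sc.parallelize(Seq("to be or not to be"))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}
Note that spark.driver.memory has no effect when set this way, because the driver is the already-running IDE JVM; set the driver heap through the run configuration's VM options instead. On the firewall question from the post: executors on the remote workers also need to connect back to the driver on your laptop, and when they cannot, the application can keep showing the same "has not accepted any resources" message.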