What is the difference between Apache Impala and Cloudera Impala?

I'm a little bit confused by Apache Impala and Cloudera Impala.
Is there any big difference between them?
Thanks,
Xianyi Ye

Cloudera donated Impala to the Apache Software Foundation in November 2015. They are just different names for the same project. If you decide to use Impala, you should try the Apache one, since that is the one being actively maintained.

Related

Connecting to Hive to execute queries with Kerberos

I am trying to connect to Hive databases with a client. I have tried using DBeaver and downloaded the Hive driver, but then I noticed that there is a Kerberos instance in the middle, and it seems that the DBeaver driver doesn't support Kerberos.
Is there a Windows client suitable for querying Hive databases that is easy to plug in, considering the Kerberos instance?
Thanks in advance.

Configure Hive Metastore for Presto and query data from S3 and Apache Kudu

I am pretty new to Presto and Hive. In one of our applications we want to use Presto to query data from Apache Kudu and AWS S3. As far as I know, Presto has its own catalog (metadata) service, but we want to configure a Hive metastore (without Hadoop and Hive) so that in the future other applications (e.g. Spark) can use the Hive metastore to query data from Kudu and S3. I have been using the latest versions of Presto and Kudu.
Could someone help me to configure this system?
Thanks and regards

Hive - How does it work without a metastore?

I installed Hive 1.2.1 and configured it to work with Hadoop 2.7.
But I didn't set up a metastore for Hive with Derby or MySQL.
I also don't have a copy of hive-site.xml under $HIVE_HOME/conf.
My question is: how am I still able to create databases and tables in Hive? Where is all this metadata stored?
Appreciate your insight.
Thanks in advance.
By default, Hive uses Derby and starts the metastore (backed by Derby) in embedded mode. The metastore and HiveServer run in the same process. I believe Hive initializes the metastore for you in embedded mode.
http://www.cloudera.com/documentation/archive/cdh/4-x/4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
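To make the embedded-mode answer concrete: when no hive-site.xml overrides them, Hive falls back to the defaults shipped in hive-default.xml.template, where the metastore connection URL is jdbc:derby:;databaseName=metastore_db;create=true. Because that databaseName is relative, Derby creates a metastore_db directory in whatever directory Hive was launched from, and that is where your database and table definitions end up. The minimal Python sketch below (standard library only) renders those defaults as a hive-site.xml and resolves where the Derby directory lands. The helper names build_hive_site and derby_db_location are hypothetical, and the resolution rule assumes derby.system.home has not been set.

```python
import os
import xml.etree.ElementTree as ET

# Defaults Hive uses for embedded Derby mode when no hive-site.xml overrides
# them (values as listed in hive-default.xml.template).
EMBEDDED_DERBY_DEFAULTS = {
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=metastore_db;create=true",
    "javax.jdo.option.ConnectionDriverName": "org.apache.derby.jdbc.EmbeddedDriver",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
}


def build_hive_site(props: dict) -> str:
    """Render a minimal hive-site.xml <configuration> document (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def derby_db_location(connection_url: str, launch_dir: str) -> str:
    """Resolve where embedded Derby keeps the metastore (hypothetical helper).

    Assumes derby.system.home is unset, so a relative databaseName is
    resolved against the directory Hive was started from.
    """
    # Parse "jdbc:derby:;key=value;key=value" into a dict of attributes.
    attrs = {}
    for part in connection_url.split(";")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    db_name = attrs["databaseName"]
    return db_name if os.path.isabs(db_name) else os.path.join(launch_dir, db_name)


if __name__ == "__main__":
    print(build_hive_site(EMBEDDED_DERBY_DEFAULTS))
    url = EMBEDDED_DERBY_DEFAULTS["javax.jdo.option.ConnectionURL"]
    # prints /home/hadoop/metastore_db on Linux
    print(derby_db_location(url, "/home/hadoop"))
```

Pointing databaseName at an absolute path in your own hive-site.xml is one way to avoid a fresh, empty metastore_db appearing each time Hive is started from a different directory. Keep in mind that only one process can use the embedded Derby metastore at a time, which is why a standalone database such as MySQL is the usual choice for shared setups.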

SQL Server Connected to Hadoop - Thoughts and Challenges of Implementation

I wanted to broach the issue of SQL Server's Hadoop distribution called HDInsight.
Given that there is a connection provided to Hadoop, does anyone have experience with HDInsight, and particularly a comparison between the Hadoop / SQL Server connector and HDInsight / SQL Server, from a real-life DTP scenario or a personal single-node installation?
http://sqlmag.com/blog/use-ssis-etl-hadoop
http://www.microsoft.com/en-us/download/details.aspx?id=27584
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx
HDInsight is the distribution of Hadoop that Microsoft maintains for use in Azure. You could roughly compare this to Amazon Elastic MapReduce. They both serve the purpose of being a hosted Hadoop service that has almost no management overhead.
The Hortonworks Data Platform for Windows contains the open source changes that Hortonworks and Microsoft have collaborated on to make Hadoop run well on Windows. HDP isn't HDInsight.
In short - you don't need to use HDInsight if you want to run Hadoop in a Windows environment.
While I can't speak directly to using HDInsight and moving data back and forth with SQL Server, I have implemented a data processing solution using SQL Server, Hadoop, and Elastic MapReduce. Barring some data quality issues and BULK INSERT weirdness, the process was painless.
Finally, you ask, "Do we really want to run Hadoop-size datasets on Windows servers?" Windows performs well and has solid tooling around it. I've been somewhat skeptical about running Hadoop and other Java platform software on Windows because of legacy Java I/O issues and a lack of community support, not because of any performance issues.
The largest issue that Windows companies will face when moving to Hadoop is limited support in community forums and channels once a problem turns out to be a Hadoop + Windows issue. It's very easy for people to throw their hands up and say "Nope, not helping out, don't have Windows." With time and adoption, this problem goes away. Besides, nothing says you have to finish on the same platform you start with. You could easily deploy with HDP on Windows and move to HDP on Linux at a later date.
I have put together some SQL Server and Hadoop basics for DBAs that should be helpful.

Accessing Hive through a web browser using Thrift and PHP

I have Hive installed on my Ubuntu machine. I installed PHP5 and the Apache2 server as well.
I started the Thrift server using hive --service hiveserver.
Querying Hive tables from a PHP file on the command line (CLI) gives me the expected results,
but from the web browser (http://localhost:10000/) I'm not able to invoke Hive.
I tried googling the problem but couldn't find a solution. Please help.
The Hive Thrift server only provides a Thrift service for Hive queries, not a web service.
I think what you need is HWI (Hive Web Interface). I recommend this project; we use it in our production environment.
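The key point of the answer is that port 10000 speaks the Thrift binary protocol, not HTTP, so pointing a browser at http://localhost:10000/ cannot work. The browser has to talk to something that serves HTTP (your PHP page under Apache, or HWI), and that layer makes the Thrift call. The sketch below illustrates that split with Python's standard http.server. It is not how HWI is implemented, and run_query, make_handler, start_web_front_end, and the default port 8080 are hypothetical placeholders; run_query would wrap whatever Thrift client code already works for you from the CLI.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def make_handler(run_query):
    """Build an HTTP handler that forwards the ?q=... parameter to run_query.

    run_query is a hypothetical callable standing in for the Thrift client
    code that actually talks to the Hive server on port 10000.
    """

    class HiveWebHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
            if not query:
                self.send_error(400, "missing q parameter")
                return
            rows = run_query(query)  # the Thrift round trip would happen here
            body = json.dumps({"query": query, "rows": rows}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging in this sketch

    return HiveWebHandler


def start_web_front_end(run_query, port=8080):
    """Serve HTTP on 127.0.0.1:port from a background thread; return the server."""
    server = HTTPServer(("127.0.0.1", port), make_handler(run_query))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Hypothetical stand-in for a real Hive query over Thrift.
    def fake_hive(query):
        return [["1", "example"]]

    server = start_web_front_end(fake_hive, port=0)  # port 0 picks a free port
    url = f"http://127.0.0.1:{server.server_address[1]}/?q=SELECT%201"
    with urlopen(url) as resp:
        print(json.loads(resp.read()))
    server.shutdown()
    server.server_close()
```

In the asker's setup the equivalent is loading the PHP script through Apache (for example a hypothetical http://localhost/hive_query.php) instead of hitting the Thrift port directly. The sketch has no authentication or query filtering, so it should not be exposed beyond localhost as written.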