How to set up Spark for use with Logi Analytics? - apache-spark-sql

I am trying to configure Spark for use with Logi Analytics, so that I can query data with Spark SQL and visualize it in Logi Analytics. Any suggestions on connecting Apache Spark to Logi Analytics would be helpful.

Spark SQL is not specifically supported by Logi Info, but you may be able to use an ODBC driver and a Connection element to connect to the Spark Thrift Server.
First, download an ODBC driver for Spark, such as the Simba Spark or CData driver, and configure a DSN for it.
Then, under the Connections element in Logi, add an ODBC connection element and set its connection string attribute to the name of your DSN (e.g. 'DSN=NameOfYourDSN;').
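Before pointing Logi at the DSN, it can help to check the DSN on its own. The following is a minimal sketch, not something taken from the Logi documentation: it builds the same connection string as above and, optionally, runs a query through it. It assumes the third-party pyodbc package is installed; the helper names build_dsn_connection_string and run_query are made up for illustration, and autocommit=True is an assumption based on Spark ODBC drivers generally not supporting transactions.

```python
def build_dsn_connection_string(dsn_name, **extra):
    """Return an ODBC connection string such as 'DSN=NameOfYourDSN;'.

    Extra keyword arguments (for example UID or PWD) are appended as
    additional key=value; pairs. Which keys are accepted depends on the driver.
    """
    parts = [("DSN", dsn_name)] + list(extra.items())
    return "".join(f"{key}={value};" for key, value in parts)


def run_query(connection_string, sql):
    """Run one query through the DSN and return all rows.

    Assumes pyodbc is installed and the DSN points at a running
    Spark Thrift Server (both are assumptions, not part of the answer).
    """
    import pyodbc  # imported lazily so the helper above works without it

    # Assumption: the driver has no transaction support, so use autocommit.
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()


# The same string that goes into the Logi connection string attribute.
CONNECTION_STRING = build_dsn_connection_string("NameOfYourDSN")
print(CONNECTION_STRING)

# With a live DSN, a quick smoke test could look like this:
# print(run_query(CONNECTION_STRING, "SHOW TABLES"))
```

If the query works here but not in Logi, the problem is more likely in the Logi connection element than in the driver or DSN setup.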

Related

Connecting to Hive to execute queries with Kerberos

I am trying to connect to Hive databases with a client. I tried DBeaver and downloaded the Hive driver, but then noticed that there is a Kerberos instance in between, and it seems that the DBeaver driver doesn't support Kerberos.
Is there a Windows client for querying Hive databases that is easy to set up and works with Kerberos?
Thanks in advance.

How to configure hive-site.xml for ODBC connection

I have configured an ODBC connection in the Windows ODBC Data Source Administrator and tested it successfully.
I want to connect to a Hive database (the ODBC connection) using Spark. As I understand it, the connection should be configured in hive-site.xml, which goes in the /config folder. But how do I configure hive-site.xml to use an ODBC connection? Can anyone provide an example hive-site.xml?
I am using the Cloudera ODBC driver.
Environment: Windows, Spark 2.3.3

How to connect to Presto using Logstash to fetch data from a Cassandra DB

I am trying to connect to Presto using Logstash to pull data from a Cassandra DB. I am looking for Presto-Cassandra connectivity using Logstash.
I tried a few JDBC drivers, but none of them could establish a connection.
Could you please let us know whether a connection to Presto is possible? If so, could you please point me to the correct JDBC driver and any online resources?

Cannot connect to Spark Thrift Server using JDBC, keeps using Hive

I am using Azure HDInsight and want to connect to the Thrift Server using JDBC in a similar way as described here: Thrift JDBC/ODBC Server.
However, it always connects to Hive and not the Spark Thrift Server. While they both look similar and I can query data, I want to use the Spark execution engine, as I mainly use Spark2 and sometimes need a JDBC connection. The Spark engine is also probably faster than Hive/Tez.
The connection string looks like this:
jdbc:hive2://hdinsight-name.azurehdinsight.net:443/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2
Drivers tried:
1. maven:/org.spark-project.hive:hive-jdbc:1.2.1.spark2
2. maven:/org.apache.hive:hive-jdbc
Update: It looks like the Spark Thrift Server is not exposed publicly: Ports used in HDInsight
I was able to connect to the Spark Thrift Server from a JDBC client with the following workaround.
The Spark Thrift Server runs on port 10002, which is not publicly accessible, as documented here in the Azure HDInsight docs. So here is an alternative way to connect to Spark SQL from a local JDBC client.
Background:
I connected to the cluster head node via SSH.
ssh user@cluster-name-ssh.azurehdinsight.net
From there, I was able to connect to the Spark Thrift Server using the Beeline client.
beeline -u 'jdbc:hive2://localhost:10002/;transportMode=http'
With Beeline, I can run SQL queries using the Spark engine.
Solution:
So I set up SSH port forwarding on my local machine (forwarding local port 10002 to the cluster head node):
ssh -L 10002:localhost:10002 user@cluster-name-ssh.azurehdinsight.net
Now I can use this port in a JDBC client to connect to Spark SQL.
jdbc:hive2://localhost:10002/;transportMode=http
With that, you can use Spark SQL from your local JDBC client.
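If the JDBC client fails to connect, the first thing worth ruling out is the tunnel itself. Here is a small sketch of that check, using only the Python standard library: it rebuilds the JDBC URL from the steps above and tests whether the forwarded local port accepts TCP connections. The function names spark_jdbc_url and port_is_open are hypothetical helpers, and an open port only shows that something is listening, not that it is the Spark Thrift Server.

```python
import socket


def spark_jdbc_url(host="localhost", port=10002):
    """Build the JDBC URL used above for the forwarded Spark Thrift Server port."""
    return f"jdbc:hive2://{host}:{port}/;transportMode=http"


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open("localhost", 10002):
        print("Tunnel looks up, use:", spark_jdbc_url())
    else:
        print("Nothing is listening on localhost:10002; is the ssh -L session still running?")
```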

How to load SQL data into Hortonworks?

I have installed the Hortonworks Sandbox on my PC. I tried it with a CSV file, and it loads into a structured table fine (Hive + Hadoop). Now I want to migrate my current SQL database (MS SQL 2008 R2) into the Sandbox. How do I do this? I also want to connect to it from my project (VS 2010, C#).
Is it possible to connect through ODBC?
I heard Sqoop is used for transferring data from SQL to Hadoop, so how can I do this migration with Sqoop?
You could write your own job to migrate the data, but Sqoop would be more convenient. To use it, download Sqoop and the appropriate connector, which in your case is the Microsoft SQL Server Connector for Apache Hadoop. You can download it from here. Please go through the Sqoop user guide; it covers all of this in detail.
Hive does support ODBC. You can find more on this at this page.
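To give a rough idea of what the Sqoop side might look like, here is a sketch that only assembles a sqoop import command line for one SQL Server table and prints it, without running anything. The host, database, table, user and password file below are placeholders, not values from the question, and the function name sqoop_import_command is made up. It assumes the SQL Server JDBC driver is already available to Sqoop; older Sqoop releases may not have --password-file, in which case -P (prompt for the password) is the usual alternative.

```python
import shlex


def sqoop_import_command(host, database, table, username, password_file,
                         port=1433, num_mappers=1):
    """Return the argv list for importing one SQL Server table into Hive."""
    jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # file holding the password, not the password itself
        "--table", table,
        "--hive-import",                   # load the imported data into a Hive table
        "--num-mappers", str(num_mappers), # 1 avoids needing a split column
    ]


# Placeholder values for illustration only.
command = sqoop_import_command(
    host="sqlserver.example.com",
    database="SalesDb",
    table="Orders",
    username="sqoop_user",
    password_file="/user/sandbox/sqlserver.password",
)

# shlex.join quotes the JDBC URL, whose ';' would otherwise end the shell command.
print(shlex.join(command))
```

Building the command as a list keeps the JDBC URL in one argument, so the same list can be passed to subprocess.run on the Sandbox if you prefer to launch the import from a script.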
I wrote down the steps you need to go through in the Hortonworks Sandbox to install the JDBC driver and get it to work: http://hortonworks.com/community/forums/topic/import-microsoft-sql-data-into-sandbox/
To connect to Hadoop from your C# project, you can use the Hortonworks Hive ODBC driver from http://hortonworks.com/thankyou-hdp13/#addon-table. Read the PDF (which is also on that page) to see how it works (I used Hive Server Type 2 with user name sandbox).