Connect Tableau to plain HBase / Hive

Is there any way to connect Tableau Desktop to plain Apache HBase or plain Hive?
I could only find Tableau drivers for Hortonworks, MapR, Cloudera, etc.

Install the drivers on the machine where Tableau Desktop is installed.
You can't connect Tableau directly to an HBase table; you need to connect to a Hive table that is internally mapped to the HBase table (see the sketch after the links below).
Follow these links:
http://thinkonhadoop.blogspot.in/2014/01/access-hbase-table-with-tableau-desktop.html
http://grokbase.com/t/cloudera/cdh-user/141px9aqg5/hbase-connectivity-with-tableau
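For instance, the Hive-to-HBase mapping is usually done with Hive's HBase storage handler. A rough sketch, assuming an existing HBase table customer with a column family cf (all table and column names here are made up):
-- Hypothetical example: expose the HBase table 'customer' (column family 'cf') to Hive,
-- so that Tableau can query it through the Hive connection.
CREATE EXTERNAL TABLE hbase_customer (
  rowkey STRING,
  name   STRING,
  city   STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name,cf:city")
TBLPROPERTIES ("hbase.table.name" = "customer");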

Our ODBC Driver for HBase will allow you to connect to your HBase data from Tableau. The driver is currently in Beta, so you can download it for free from here.
You can read about setting up the connection in our Knowledge Base, but in short, you'll need to:
Create/configure a DSN from the ODBC Driver (set the server address and port)
Click through the Connect to Data options to find Other Database (ODBC) and select the DSN you configured
Select CData as the database
Enter a Table name (or leave the Table field blank and click search to see a list of Tables).
Once you have access to the tables, you can work with them exactly as you would any other table in Tableau (drag the table to the join area, manipulate Measures and Dimensions to view your data, etc.). If you have any questions, I or our Support Team will be happy to help.

Tableau internally uses SQL to fetch raw data, so in theory it can support any data source that comes with a SQL interface, such as Hive.
Plain HBase does not provide a SQL interface, so you must add an intermediate layer that translates SQL queries into HBase requests. That layer could be an ODBC driver or an open source project such as Apache Drill.
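For instance, with Drill's HBase storage plugin enabled, a SQL client can read HBase directly with a query along these lines (table and column-family names are made up; HBase stores raw bytes, hence the conversions):
-- Hypothetical Drill query against an HBase table 'customer' with column family 'cf'.
SELECT CONVERT_FROM(row_key, 'UTF8')      AS id,
       CONVERT_FROM(t.cf.`name`, 'UTF8')  AS name
FROM hbase.customer t
LIMIT 10;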

Related

Connecting to Cloud SQL server instance from BigQuery

There is an option to connect a Cloud SQL MySQL instance from BigQuery. I just wanted to know how we can connect a Cloud SQL Server instance to BigQuery.
SQL Server:
There are a bunch of third-party extensions/tools that provide this service. One of them is SSIS Data Flow Source & Destination for Google BigQuery, a Visual Studio extension that connects SQL Server with Google BigQuery data through SSIS workflows:
https://www.cdata.com/drivers/bigquery/ssis/
https://marketplace.visualstudio.com/items?itemName=CDATASOFTWARE.SSISDataFlowSourceDestinationforGoogleBigQuery
In regards to using SQL Server Integration Services to load the data from the on-premises SQL Server to BigQuery, you can take a look at this site. You can also perform ETL from a relational database into BigQuery using Cloud Dataflow; the official documentation details how it can be done, and you might need to use Cloud Storage as an intermediate data sink.
Cloud SQL:
BigQuery allows you to query data from Cloud SQL by using federated queries (see the sketch after the steps below). The connection must be created within the same project where your Cloud SQL instance is located. If you want to query the data stored in your Cloud SQL instance from BigQuery in another project, please follow the steps listed below:
Enable the BigQuery API and the BigQuery connection API within your project.
Create a connection to your Cloud SQL instance within the project by following this documentation.
Once you have created the connection, please locate and select it within BigQuery.
Click on the SHARE CONNECTION button and grant permissions to the users that will use that connection. Please note that the BigQuery Connection User role is the only role needed to use a shared connection.
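Once the connection exists and is shared, a federated query looks roughly like this; the project, location, connection, and table names below are placeholders:
-- Hypothetical federated query: the inner statement runs on the Cloud SQL instance,
-- and its result can be used like any other table in BigQuery.
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  'SELECT id, order_date, total FROM orders;'
);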
Additionally, please note that the "Cloud SQL federated queries" feature is in a Beta stage and might change or have limited support (it is not available in certain regions, in which case it is required to use one of the supported options mentioned here). Please remember that to use Cloud SQL federated queries in BigQuery, the instances need to have a public IP.
If you are limited, e.g. by region, one good option might be exporting the data from Cloud SQL to Cloud Storage as a CSV and then loading it into BigQuery. If you need to, it is possible to automate this process using Cloud Composer; refer to this article.
Another approach is to extract information from Cloud SQL (with exports) and import it into BigQuery through load jobs or streaming inserts.
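As a rough sketch of the load side, assuming the Cloud SQL export has already produced a CSV in Cloud Storage (the bucket, path, and table names are made up), newer BigQuery releases also accept the load as a plain SQL statement; a bq load job or the console works just as well:
-- Hypothetical load job expressed in SQL: ingest the exported CSV into a native table.
LOAD DATA INTO mydataset.orders
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/cloudsql-export/orders.csv'],
  skip_leading_rows = 1
);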
I hope you find the above pieces of information useful.
It is possible, but be warned the feature is currently Beta
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries

Accessing Spark Tables (Parquet on ADLS) using Presto from a Local Linux Machine

We would like to know if we can access Spark external tables (with MS SQL as the metastore and the external files on Azure Data Lake) using the Hive metastore service (Presto) from a Linux machine.
We are trying to access Spark Delta tables backed by Parquet files on ADLS through Presto. Below is the scenario; I would like to know if there is a possible way to achieve this. We are doing this as a POC only, and we believe knowing the answer will take us to the next step.
Our central data repository consists of Spark Delta tables created by many pipelines. The data is stored in Parquet format. MS SQL is the external metastore. Data in these Spark tables is used by other teams/applications, and they would like to access it through Presto.
We learnt that Presto uses the Hive metastore service to access the Hive table details. We tried accessing the tables from Hive first (thinking that if this works, Presto also works), but we ran into problems with the different file systems. We have set up Hadoop and Hive on one single Linux machine, versions 3.1.2 and 3.1.1. The Hive service connects to the SQL metastore and shows the results of a few basic commands. However, when it comes to accessing the actual data stored as Parquet on an ADLS path, it fails with a file system exception. I understand that this problem comes from the interaction of many file systems (ADLS, HDFS, Linux), but I am not finding any blogs that guide us. Kindly help.
Hive Show Database command:
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> SHOW DATABASES;
OK
7nowrtpocqa
c360
default
digital
Hive Listing tables:
hive> SHOW TABLES;
OK
amzn_order_details
amzn_order_items
amzn_product_details
Query data from Orders table:
hive> select * from dlvry_orders limit 3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "dbfs"
Time taken: 3.37 seconds
How can I make my setup access the Data Lake files and bring in the data?
I believe my metastore should have the exact full path of the ADLS location where the files are stored. If it does, how will my Hive/Hadoop on Linux understand that path?
If it can recognize the path, in which configuration file should I give the credentials for accessing the data lake (in any .XML)?
How can the different file systems interact?
Kindly help. Thanks for all the inputs.

How to connect a GCP SQL instance by using BigQuery?

I need help with something; I'm new to BigQuery.
I want to see my SQL tables over a BigQuery connection in Data Studio. However, I couldn't figure out how to connect a GCP MySQL instance from BigQuery.
I tried changing the region/location of my SQL instance (I think it helped a little) and ran this query in the new BigQuery web UI, but I receive the error below.
SELECT *
FROM
myinstanceid.my_database_01.TABLE_YS ys
Error: Not found: Project myinstanceid
You cannot directly connect MySQL databases; the only external data sources supported are GCS files, Cloud Bigtable, and Google Drive (even Google spreadsheets directly).
To run analytics on the data inside your MySQL DB, you will need to export the data into a supported external data source or, even better, into a BigQuery native table.
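For example, once the MySQL data has been exported to a CSV in GCS, it can be exposed to BigQuery as an external table over the file (or loaded into a native table); a rough sketch with made-up dataset, table, and bucket names:
-- Hypothetical external table over a CSV export in Cloud Storage; BigQuery reads the
-- file at query time instead of storing the data natively.
CREATE EXTERNAL TABLE mydataset.mysql_orders (
  id INT64,
  order_date DATE,
  total NUMERIC
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/mysql-export/orders.csv'],
  skip_leading_rows = 1
);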

Connect an MSSQL-created database with PostgreSQL

I use a Mac, and PostgreSQL is my choice for DB management. I cannot install MSSQL, but there is a DB that is created and managed by MSSQL.
I must not copy the entire data set to my database via a script (because it is real-time data).
My only option is connecting that MSSQL-created DB to PostgreSQL.
Is it possible? If yes, how? Will there be any relevant limitations for my queries?
Thanks.
Additional details:
I'll make selections, calculations, statistically.
I won't modify the existing data.
The feature that allows you to connect to a different database from within PostgreSQL itself is called a foreign data wrapper.
Here there is a list of available foreign data wrappers, but MSSQL is not included. ODBC is, though, so (in theory) if you install odbc_fdw you can access foreign MSSQL tables in your PostgreSQL instance.
There is also tds_fdw for SQL Server, which uses FreeTDS for the connection rather than ODBC.
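A minimal sketch of the tds_fdw route, assuming the extension is already installed and using placeholder host, database, credential, and table names:
-- Hypothetical setup: expose a SQL Server table to PostgreSQL through tds_fdw.
CREATE EXTENSION tds_fdw;

CREATE SERVER mssql_server
  FOREIGN DATA WRAPPER tds_fdw
  OPTIONS (servername 'mssql.example.com', port '1433', database 'SalesDb');

CREATE USER MAPPING FOR current_user
  SERVER mssql_server
  OPTIONS (username 'report_reader', password 'secret');

-- Map one table; SELECTs against it are sent to SQL Server at query time.
CREATE FOREIGN TABLE orders (
  id         integer,
  order_date date,
  total      numeric
)
  SERVER mssql_server
  OPTIONS (table_name 'dbo.orders');
Queries against the foreign table then behave like ordinary PostgreSQL queries, though not every predicate is pushed down to SQL Server, so large scans can be slow.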

Access remote Oracle from SQL Server in query only? (Crystal Reports command)

We have a peculiar challenge with overly-strict use restrictions, and I'm trying to find a way to accomplish it.
We have data in two locations, on different platforms. We are extracting data from application tables, and we aren't allowed to create our own views/procs/etc.
Is there a way to run a query into a remote Oracle DB from within an SQL Server query?
To further complicate issues, we have to make it run through a Crystal Reports database command.
We have ODBC connections defined at the BOXI platform (using Oracle ODBC for the Oracle connection).
I am hoping to use the SQL WITH clause to build in-memory views (for lack of a better term) to:
Initially extract some circuit IDs from the local SQL Server system,
Extract ticket numbers based on those circuit IDs, from the remote Oracle system,
Extract the core of our data from the SQL Server system, joined with the ticket data, and return that to Crystal as the result dataset (roughly as in the sketch after this list).
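Structurally, the command I have in mind would look something like this; all object names are placeholders, and the Oracle step is shown with OPENQUERY over a linked server, which is an assumption on my part and may be exactly the kind of server-side object we are not allowed to create:
-- Rough sketch of the intended Crystal command text (placeholder names throughout).
WITH circuit_ids AS (
    SELECT c.circuit_id
    FROM dbo.circuits c
    WHERE c.region = 'NE'
),
oracle_tickets AS (
    -- Assumes a linked server ORA_LINK; the usage restrictions may rule this out.
    SELECT t.circuit_id, t.ticket_no
    FROM OPENQUERY(ORA_LINK,
        'SELECT circuit_id, ticket_no FROM tickets WHERE status = ''OPEN''') t
)
SELECT d.circuit_id, d.outage_start, o.ticket_no
FROM dbo.outage_details d
JOIN circuit_ids ci ON ci.circuit_id = d.circuit_id
LEFT JOIN oracle_tickets o ON o.circuit_id = d.circuit_id;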
If we had our own space, this would be trivial.
BOXI doesn't let us do multiple-server universes.
You would need some way to write and store connection strings, but it doesn't sound like you're able to do this.
If you can't make ANY changes to either source system, you might try creating an MS Access DB and using linked tables to bring in all the information you need and have your Crystal Report run from that. You would then only need to make sure that the machine you're running this on has the ODBC drivers to be able to connect, which are simple enough to configure.