Can we access Data from ADLA tables creating ODATA source using REST API? - azure-data-lake

Is there a way if ODATA can be established on top of azure data lake analytics table via REST API's?
It seems there are REST API's to get ADLA job informations, account information etc.,
Is there any such existing API's to get data or is it possible to create API to access data via ODATA concept?

If you want to access data in ADLS files, there are REST APIs to get data from the lake. ADLS supports WebHDFS APIs with OAuth.
If you want to send queries and see their results or get U-SQL table data, you would have to write your own shim that translates the query you submit via your API into a U-SQL Script that outputs a file and then transparently download the file and returns it as the result.
Note that so-called interactive support is on the roadmap and being worked on. Once that is available, you can access the data using standard query APIs (such as ODBC, JDBC etc).

Related

Using Dataverse Synapse Link as reference data for Azure Stream Analytics

We are trying to use our Dataverse data as reference data for our Azure Stream Analytics. The idea is to couple customer acitivities with their CRM profile to create meaningful actions for them. We are currently moving from DES to the Dataverse Synapse Link, and have created the data lake where data gets dumped and can see it in Synapse Studio. However, Stream Analytics does not take CDM format out-of-the-box. It seems it can only handle CSV (with headers) and Json formats.
What is the best approach to get our Dataverse data in as reference for Stream Analytics (and in real time as possible)? Should we create a custom deserializer, or use ADF or something else?

How can I export data from azure storage table to .csv file in .Net core C#

is there an azure API to import/export an existing collection from Azure Table Storage in .csv?
The Table Storage REST API does not provide a response as CSV directly, so it's always necessary to transform the data accordingly, as for example the Azure Storage Explorer does using an older version of the azcopy v7.3.
I've built a little C# library that basically does the same. It currently caches all rows in memory though to create the CSV headers so that's something to be aware of.

API Key use w/ Tableau

We would like to use the Podio API Key to directly connect to Tableau and have the data refreshed at a cadence set in Tableau. Is this possible?
Yes, connecting to data via an API is possible and there are a couple of ways to do it:
Option #1: Web Data Connector
A WDC is a hosted web application built with JavaScript that connects to an API, converts the data to a JSON format, and passes the data to Tableau. You'll require a webserver to host your WDC and JavaScript skills to write it. Once set up, anyone in your org can just grab the link and use it in Desktop. With WDCs, since the data connection is made when the end-user is requesting the WDC, you can build in customizations for your users. (Ex: users can add filter parameters or authenticate with their own user/pass to only get what they have access to). WDC connections are extracts and can be refreshed on Tableau Server and Online. If you're using Tableau Online you'll need to use Bridge to auto-refresh.
Option #2: Hyper API
The Hyper API allows you to create, modify, and update extract (.hyper) files that you can then publish to Tableau Server/Online on a regular cadence. It's available for Python, Java, .NET, and C++ so you will need skills in one of those languages. I suggest Python as we have the most samples for it. You'll also need a server where you can run the extract refresh and publish scripts on a schedule. With the Hyper API you are creating a single extract for everyone to connect to. Once published to Tableau Server/Online, end users can just connect to this data source directly without having to do any input but this also means the connection can't be customized per user or use case.
Option #3: Use a 3rd-party connector
If building your own connector doesn't appeal to you there are also plenty of services out there you can pay for that can bring your data into Tableau. Ex: tray.io, dataddo, and skyvia are a couple I found after a quick google search.

Connecting to Cloud SQL server instance from BigQuery

There is an option to connect a Cloud mySQL instance from BigQuery. I just wanted to know how we can connect a Cloud SQL Server instance to BigQuery.
SQL Server:
There are a bunch of third-party extensions/tools that provide this service. One of them is SSIS Data Flow Source & Destination for Google BigQuery, which is Visual Studio extension that connects SQL Server with Google BigQuery data through SSIS Workflows.:
https://www.cdata.com/drivers/bigquery/ssis/
https://marketplace.visualstudio.com/items?itemName=CDATASOFTWARE.SSISDataFlowSourceDestinationforGoogleBigQuery
In regards to using SQL Server Integration Services to load the data from the on-premises SQL Server to BigQuery, you can take a look for this site. You can also perform ETL from a relational database into BigQuery using Cloud Dataflow, the official documentation details how it can be done, you might need to use Cloud Storage as an intermediate data sink.
Cloud SQL:
BigQuery allows to query data from Cloud SQL by using federated query. The connection must be created within the same project where your Cloud SQL instance is located. If you want to query your data stored in your Cloud SQL instance from BigQuery located in another project, please follow the steps listed below:
Enable the BigQuery API and the BigQuery connection API within your project.
Create a connection to your Cloud SQL instance within the project by following this documentation.
Once you have created the connection, please locate and select it within BigQuery.
Click on the SHARE CONNECTION button and grant permissions to the users that will be use that connection. Please note that the BigQuery Connection User role is the only needed to use a shared connection.
Additionally, please notice that the "Cloud SQL federated queries" feature is in a Beta stage and might change or have limited support (is no available for certain regions, in which case, it is required to use one the supported options mentioned here). Please remember, that to use Cloud SQL Federated queries in BigQuery, the intances need to have a public IP.
If you are limited e.g. by region, one good option might be exporting the data from CloudSQL to Storage as a CSV, and then load it into BigQuery. If you need, it is possible to automate this process using Cloud Composer, refer to this article.
Other approach is to extract information from Cloud SQL (with exports) and import it into BigQuery through load jobs, or streaming inserts.
I hope you find the above pieces of information useful.
It is possible, but be warned the feature is currently Beta
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries

How do I create a BigQuery dataset out of another BigQuery dataset?

I need to understand the below:
1.) How does one BigQuery connect to another BigQuery and apply some logic and create another BigQuery. For e.g if i have a ETL tool like Data Stage and we have some data been uploaded for us to consume in form of a BigQuery. So in DataStage or using any other technology how do i design the job so that the source is one BQ and the Target is another BQ.
2.) I want to achieve like my input will be a VIEW (BigQuery) and then need to run some logic on the BigQuery View and then load into another BigQuery view.
3.) What is the technology used to connected one BigQuery to another BigQuery is it https or any other technology.
Thanks
If you have a large amount of data to process (many GB), you should do the transformation of the data directly in the Big Query database. It would be very slow to extract all the data, run it through something locally, and send it back. You don't need any outside technology to make one view depend on another view, besides access to the relevant data.
The ideal job design will be an SQL query that Big Query can process. If you are trying to link tables/views across different projects then the source BQ table must be listed in fully-specified form projectName.datasetName.tableName in the FROM clauses of the SQL query. Project names are globally unique in Google Cloud.
Permissions to access the data must be set up correctly. BQ provides fine-grained control over who can access, and it is in the BQ documentation. You can also enable public access to all BQ users if that is appropriate.
Once you have that SQL query, you can create a new view by sending your SQL to Google BigQuery either through the command line (the bq tool), the web console, or an API.
1) You can use BigQuery Connector in DataStage to read and write to bigquery.
2) Bigquery use namespaces in the format project.dataset.table to access tables across projects. This allows you to manipulate your data in GCP as it were in the same database.
To manipulate your data you can use DML or standard SQL.
To execute your queries you can use the GCP Web console or client libraries such as python or java.
3) BigQuery is a RESTful web service and use HTTPS