How to convert a Hive SQL script to a Spark SQL Synapse Notebook - azure-synapse

I have a large number of Hive SQL scripts that I want to import into Azure Synapse Analytics to run as Spark SQL notebooks.
Is there any easy way to do this?
Synapse notebooks are JSON files with a lot of extra information.
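
For context on what that JSON involves: Synapse Studio can import standard Jupyter .ipynb files, so rather than hand-building the full Synapse notebook format you could generate a plain .ipynb whose cells use the %%sql magic (which runs the cell as Spark SQL in a Synapse PySpark notebook) and import that. A minimal sketch, assuming a naive split on semicolons and placeholder file names:

    # Hedged sketch, not an official converter: wrap each statement of a Hive SQL
    # script in a %%sql cell of a Jupyter-style notebook that Synapse Studio can import.
    import json
    from pathlib import Path

    def sql_script_to_notebook(sql_path: str, ipynb_path: str) -> None:
        # Naive split on ";" -- refine if your scripts contain semicolons in strings or comments.
        statements = [s.strip() for s in Path(sql_path).read_text().split(";") if s.strip()]
        cells = [
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                # %%sql makes the cell run as Spark SQL in a PySpark notebook
                "source": ["%%sql\n", stmt + "\n"],
            }
            for stmt in statements
        ]
        notebook = {
            "nbformat": 4,
            "nbformat_minor": 2,
            "metadata": {"language_info": {"name": "python"}},
            "cells": cells,
        }
        Path(ipynb_path).write_text(json.dumps(notebook, indent=1))

    sql_script_to_notebook("my_hive_job.hql", "my_hive_job.ipynb")  # placeholder file names

One caveat worth checking per script: Hive-specific syntax (e.g. certain SerDe or TBLPROPERTIES clauses) may need adjusting for Spark SQL.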

Related

Metastore in Azure Synapse

Is there a recommended solution for storing metadata of table/file schemas while using Azure Synapse Analytics Spark pools? Data will be written in either Parquet or Delta table format.
Thank you
Please follow the reference below; it has a detailed explanation of how to store table metadata using an Azure Synapse Spark pool and configure Spark to use an external Hive Metastore.
Reference:
External Hive Metastore for Synapse Spark Pool by Microsoft
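
For orientation, the linked-service based setup in that doc ultimately maps onto the standard Hive Metastore settings in Spark. A hedged sketch of the generic configuration (connection values are placeholders, and in Synapse these would normally go into the Spark pool/session configuration rather than a builder):

    from pyspark.sql import SparkSession

    # Hedged sketch: generic Spark settings for an external Hive Metastore held in an
    # Azure SQL Database. Server/database/credential values are placeholders; the
    # Microsoft doc above shows the Synapse linked-service equivalent, and you may also
    # need spark.sql.hive.metastore.jars depending on the metastore version.
    spark = (
        SparkSession.builder
        .config("spark.sql.hive.metastore.version", "2.3.9")
        .config("spark.hadoop.javax.jdo.option.ConnectionURL",
                "jdbc:sqlserver://<server>.database.windows.net:1433;database=<metastore_db>")
        .config("spark.hadoop.javax.jdo.option.ConnectionDriverName",
                "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "<user>")
        .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "<password>")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Parquet/Delta data can then be registered as tables in the shared metastore, e.g.:
    # df.write.format("delta").saveAsTable("gold.customers")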

Export data table from Databricks dbfs to azure sql database

I am quite new to Databricks and looking for a smart way to export a data table from the Databricks gold schema to an Azure SQL database.
I am using Databricks as part of an Azure resource group; however, I do not find data from Databricks in any of the storage accounts within the same resource group. Does that mean it is physically stored in an implicit Databricks storage account/data lake?
Thanks in advance :-)
The tables you see in Databricks could have their data stored within the Databricks workspace file system (DBFS) or somewhere external (e.g. a Data Lake, which could be in a different Azure resource group) - see here: Databricks databases and tables
For writing data from Databricks to Azure SQL, I would suggest the Apache Spark connector for SQL.
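
Once the connector is installed on the cluster, the write itself is only a few lines. A hedged sketch with placeholder table, server, and credential names (the built-in .format("jdbc") is an alternative if the connector is not installed):

    # Hedged sketch: write a Databricks table out to Azure SQL using the
    # Apache Spark connector for SQL Server and Azure SQL.
    # `spark` is the SparkSession Databricks notebooks provide; names below are placeholders.
    df = spark.table("gold.my_table")

    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")
       .mode("overwrite")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
       .option("dbtable", "dbo.my_table")
       .option("user", "<user>")
       .option("password", "<password>")
       .save())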

AWS SageMaker Spark SQL

I know that for example, with Qubole's Hive offering which uses Zeppelin notebooks, that I can use Spark SQL to execute native SQL commands to interact with Hive tables. I can read from external tables and create internal tables, or just run ad-hoc queries.
I am working on a project in AWS. I have data in S3, with external tables created in Athena. I have found articles, and followed them to setup some Jupyter notebooks, but I don't see how I can have notebooks running Spark SQL. Is this possible?
If not, what is the best mechanism in the AWS ecosystem for encapsulating logic to create internal tables from external tables for secondary data processing?
You have two options:
1) run Jupyter notebooks on EMR: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks.html
2) run Jupyter notebooks on SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html
Both support PySpark, so you should be able to run SQL queries on whatever backend your data lives in.
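
For the second part of the question: once the notebook's Spark backend uses the Glue Data Catalog as its metastore (so the external tables you defined for Athena are visible), creating internal tables from them is plain Spark SQL. A rough sketch with placeholder database and table names:

    from pyspark.sql import SparkSession

    # Hedged sketch: assumes the EMR/SageMaker Spark backend is configured to use the
    # Glue Data Catalog, so Athena's external tables appear in Spark SQL's catalog.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Ad-hoc query against an external, S3-backed table.
    spark.sql("SELECT event_type, COUNT(*) AS n FROM raw_db.events GROUP BY event_type").show()

    # Materialise an "internal" table from the external one for secondary processing.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS curated_db.events_daily
        USING PARQUET
        AS SELECT event_type, CAST(event_ts AS DATE) AS event_date, COUNT(*) AS n
           FROM raw_db.events
           GROUP BY event_type, CAST(event_ts AS DATE)
    """)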

U SQL: direct output to SQL DB

Is there a way to output U-SQL results directly to a SQL DB such as Azure SQL DB? Couldn't find much about that.
Thanks!
U-SQL currently only outputs to files or internal tables (i.e. tables within ADLA databases), but you have a couple of options. Azure SQL Database has recently gained the ability to load files from Azure Blob Storage using either BULK INSERT or OPENROWSET, so you could try that. This article shows the syntax and gives a reminder that:
Azure Blob storage containers with public blobs or public containers access permissions are not currently supported.
wasb://<BlobContainerName>@<StorageAccountName>.blob.core.windows.net/yourFolder/yourFile.txt
BULK INSERT and OPENROWSET with Azure Blob Storage is shown here:
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/23/loading-files-from-azure-blob-storage-into-azure-sql-database/
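
To give a feel for the shape of that approach: after creating a database scoped credential and an external data source of TYPE = BLOB_STORAGE (shown in the linked article), the load is a single BULK INSERT statement, which you could run from Python with pyodbc, for example. Everything named below (data source, table, file, connection details) is a placeholder:

    import pyodbc

    # Hedged sketch: assumes an external data source called AzureBlobStore already exists
    # in the Azure SQL Database (TYPE = BLOB_STORAGE plus a database scoped credential),
    # as per the linked article. Connection string and names are placeholders.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<server>.database.windows.net;DATABASE=<db>;UID=<user>;PWD=<password>"
    )
    conn.execute("""
        BULK INSERT dbo.StagingTable
        FROM 'yourFolder/yourFile.txt'
        WITH (DATA_SOURCE = 'AzureBlobStore', FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')
    """)
    conn.commit()
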
You could also use Azure Data Factory (ADF). Its Copy Activity could load the data from Azure Data Lake Storage (ADLS) to an Azure SQL Database in two steps:
1) execute a U-SQL script which creates output files in ADLS (internal tables are not currently supported as a source in ADF)
2) move the data from ADLS to Azure SQL Database
As a final option, if your data is likely to get into larger volumes (i.e. terabytes (TB)), then you could use Azure SQL Data Warehouse, which supports PolyBase. PolyBase now supports both Azure Blob Storage and ADLS as a source.
Perhaps if you can tell us a bit more about your process we can refine which of these options is most suitable for you.

How to schedule a U-SQL Query in Azure Data Lake?

I want to execute a query in Azure Data Lake daily. Can we schedule a U-SQL query in Azure Data Lake?
Currently, there is no built-in way inside Data Lake Analytics to schedule a U-SQL job. Instead, you can use other services or tools to perform the scheduling. A popular one for Azure customers is Azure Data Factory.
Simple scheduling of U-SQL jobs inside Data Lake Analytics is something we are considering adding as a native capability.
There are two ways to execute a U-SQL query in Azure Data Lake daily:
1) Use ADF: store the U-SQL script in Blob Storage and reference it via a Blob Storage linked service.
2) Create an SSIS package using Visual Studio, then import the package into a SQL Server Agent job. See Schedule U-SQL jobs.