How to specify a database when submitting a Data Lake job from PowerShell? - azure-data-lake

The cmdlet Submit-AzureRmDataLakeAnalyticsJob doesn't have a parameter to specify the database. Do we have to hard-code the database in the script? I want to run my script against different databases.

Yes, today when using Azure PowerShell, the CLI, an SDK, or the REST API, you have to hard-code the database you'd like to use in the script if it is different from the default database.
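One workaround is to parameterize the script text itself before submitting it. A minimal sketch, assuming hypothetical placeholder values for the account, database, and script path:
# Prepend a USE DATABASE statement to the U-SQL script, then submit the
# combined text. All names below are placeholders, not real resources.
$databaseName = "MyDatabase"
$accountName = "myadlaaccount"
$scriptPath = "C:\scripts\myscript.usql"
$script = "USE DATABASE $databaseName;`r`n" + (Get-Content -Path $scriptPath -Raw)
Submit-AzureRmDataLakeAnalyticsJob `
    -Name "ParameterizedJob" `
    -AccountName $accountName `
    -Script $script `
    -DegreeOfParallelism 1
This keeps a single script file and lets the caller choose the target database at submission time.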

Related

Migrate from H2 to PostgreSQL

I need to replace H2 with PostgreSQL in WSO2 API Manager. Since there is currently data saved in H2, I need to move it to PostgreSQL.
I found the command
SCRIPT TO 'dump.sql'
to export the data to .sql files, but I could not use it because I was not given the credentials to access the database, so I had to retrieve the data from the .mv.db files that H2 generates. On those files the data is not encrypted, but the password obviously is. To export the data to .sql files I used the command
java -cp h2-*.jar org.h2.tools.Recover -dir file_path -db file_name
The .sql files are generated correctly, but when I try to import them into PostgreSQL with the command
psql -U db_user db_name < dump_name.sql
numerous syntax errors come up, probably due to the incompatibility of the H2 and PostgreSQL dialects. Is there a way to export the data so that it can then be imported into PostgreSQL? Alternatively, is there another way to migrate the data?
This is changing the database vendor, and we don't support such use cases. There are different scripts in the /[PRODUCT_HOME]/dbscripts folder, and you need to set up the target database (in your case PostgreSQL) using the correct scripts. This is due to the differences between database vendors: the datatypes and schema differ from one vendor to another.
The correct approach is to go through the migration. You can set up a new environment with PostgreSQL and use a third-party tool or a tool provided by the database vendor to migrate data from H2 to PostgreSQL. There is no straightforward method to change the database from H2 to PostgreSQL.
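For the schema setup step, a rough sketch of creating the PostgreSQL schema from the bundled scripts, run from PowerShell (the exact script file name under dbscripts is an assumption; check your product version):
# Hypothetical example: create the schema in the new PostgreSQL database
# from the product's bundled scripts before migrating any data.
$productHome = "C:\wso2am"   # placeholder path
psql -U db_user -d apim_db -f "$productHome\dbscripts\postgresql.sql"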
For more information on the product migration - https://apim.docs.wso2.com/en/latest/install-and-setup/upgrading-wso2-api-manager/upgrading-guidelines/
WSO2 does not have any scripts or tools for cross-DB migrations. However, you can use the API Controller [1] to migrate APIs and Applications from the previous environment with the H2 DB to a new one with PostgreSQL.
[1] - https://apim.docs.wso2.com/en/latest/install-and-setup/setup/api-controller/getting-started-with-wso2-api-controller/

Export SQL query result as a txt file to Azure blob storage automatically

I am trying to export the result of a query to a container in Azure Blob Storage. I have done a lot of research, and it seems there are services that can do this, but they are paid services; is there any way to automate this without any paid service at all? I can already push the files from my computer to the storage automatically, but it would be great if I could find a way to do this directly. Essentially, I want to extract some data on a daily basis to the storage and make it possible to download it simply, using a browser or from within Excel.
Fictitious example:
SELECT name, salary FROM dbo.Employees
Export to https://mystorage.blob.core.windows.net/mycontainer/myresults.txt
If you are using on-premises SQL Server and want to automatically run a script that saves a query result to Blob Storage, SSIS is a good solution. It's free and very effective.
You can refer to this tutorial, Azure Blob Storage Data Upload with SSIS: it shows how to run a SQL query and upload the result to Blob Storage.
Then you can schedule the SSIS package to run with a SQL Server Agent job. See this document, How to Execute SSIS Packages from SQL Server Agent:
SSIS is indeed a good choice for implementing ETL processes. The typical process is scheduled to run on a periodic basis. SQL Server Agent is a good tool for executing SSIS packages as well as scheduling jobs to run at the appropriate times.
You can combine these two documents and achieve your purpose.
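If you'd rather script it than build an SSIS package, here is a minimal PowerShell sketch of the same idea, assuming the SqlServer and Azure.Storage modules are installed and all server, key, and path values are placeholders:
# Run the query and save the result set to a local text file.
$results = Invoke-Sqlcmd -ServerInstance "myserver" -Database "mydb" `
    -Query "SELECT name, salary FROM dbo.Employees"
$results | Export-Csv -Path "C:\exports\myresults.txt" -NoTypeInformation
# Upload the file to the blob container.
$ctx = New-AzureStorageContext -StorageAccountName "mystorage" -StorageAccountKey $storageKey
Set-AzureStorageBlobContent -File "C:\exports\myresults.txt" `
    -Container "mycontainer" -Blob "myresults.txt" -Context $ctx
Scheduled with Windows Task Scheduler or a SQL Server Agent job step, this covers the daily export without any paid service.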

How to run a T-SQL query daily on an Azure database

I am trying to migrate a database from a SQL Server into Azure. This database has two rather simple T-SQL scripts that insert data. Since SQL Server Agent does not exist on Azure SQL Database, I am trying to find an alternative.
I see the Automation offering, but it seems really complex for something as simple as running SQL scripts. Is there any better, or at least easier, way to do this?
I was under the impression that there was a scheduler for this, but I can't find it.
Thanks
There are several ways to run a scheduled task/job against an Azure SQL database for your use case:
If you are comfortable using your existing on-premises SQL Server Agent, you can connect to your Azure SQL DB (using linked servers) and execute jobs the same way as on an on-premises SQL Server.
Use Automation Accounts/Runbooks to create SQL jobs. In the marketplace you can find several examples for Azure SQL DB (backup, restore, indexing jobs, etc.). I guess you already tried this and it did not seem a feasible solution to you.
Another, less well-known way is to use WebJobs (under App Service web apps) to schedule tasks (you can use PowerShell scripts here). The disadvantage is that you cannot change anything once you create a WebJob.
As @jayendran suggested, Azure Functions is definitely an option to achieve this use case.
If none of these give you a way to work with the SQL directly, there is also the "Scheduler Job Collection" available in Azure to schedule invocation of HTTP endpoints, and the SQL operation could be abstracted/implemented in that endpoint. This is only useful for lightweight SQL operations; if the operation takes longer, chances are it will time out.
You can use Azure Functions to run the T-SQL queries on a schedule using a timer trigger.
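A minimal sketch of such a function, assuming a PowerShell function app with the SqlServer module available and a daily CRON schedule defined in function.json; the server, database, credential, and procedure names are placeholders:
# run.ps1 of a timer-triggered PowerShell Azure Function.
param($Timer)
# Execute the daily insert script against the Azure SQL database.
Invoke-Sqlcmd -ServerInstance "myserver.database.windows.net" `
    -Database "mydb" `
    -Username "dbuser" -Password $env:SQL_PASSWORD `
    -Query "EXEC dbo.usp_DailyInsert;"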
You can use Microsoft Flow (https://flow.microsoft.com) to create a scheduled flow with the SQL Server connector. In the connector you set the Azure SQL server, database name, username, and password.
There are many options, but the connector actions you can use to run a T-SQL query daily are these:
Execute a SQL Query
Execute stored procedure
You can also edit your connection info in the Data --> Connections menu.

Deploying a U-SQL project

I am new to Data Lake Analytics and U-SQL.
I am currently setting up a Data Factory pipeline to replace an existing SSIS workflow. The pipeline would essentially:
Extract data from a transactional database into ADLS
Transform raw entities using U-SQL
Load the data into SSAS using a custom activity
Question
I have a U-SQL project set up and wondered if there is a standard way of deploying the scripts to ADLA other than just uploading them to a folder in the store.
Great question!
I'm not sure about a standard way, or even a way that might be considered best practice yet. But I use all of the tools you mention to perform very similar tasks.
To try and answer your question: what I do is create the U-SQL scripts as stored procedures within the logical ADLA database. In the VS U-SQL project I have one script per stored proc. The ADF activities then call the proc by name. This gives you the right level of decoupling between services and also means you don't need additional blob storage for U-SQL files.
In my VS solution I often also have a PowerShell project to help manage things; specifically, one that takes all my 'usp_' U-SQL scripts and combines them into one big DDL-style script that can be deployed to the logical ADLA database.
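A hypothetical sketch of that combining step (the project path and the usp_ naming convention are placeholders):
# Gather every usp_*.usql file in the project and join them into one
# deployable script string.
$usqlFiles = Get-ChildItem -Path "C:\src\MyUsqlProject" -Filter "usp_*.usql"
$USQLProcDeployAll = ($usqlFiles | Get-Content -Raw) -join "`r`n"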
The PowerShell then does the deployment for me using the submit job cmdlet. Example below.
# Deploy the combined script by submitting it as a single ADLA job.
Submit-AzureRmDataLakeAnalyticsJob `
    -Name $JobName `
    -AccountName $DLAnalytics `
    -Script $USQLProcDeployAll `
    -DegreeOfParallelism $DLAnalyticsDoP
Hope this gives you a steer. I also accept that these tools are still fairly new. So open to other suggestions.
Cheers

Azure SQL DB - data file export (.csv) from Azure SQL

I am new to Azure SQL.
We have a client DB which is in Azure SQL. We need to set up process automation which extracts query results to .csv files and loads them into our server (on-premises SQL Server 2008 R2).
What is the best method to generate CSV files from Azure SQL and make them accessible to the on-premises server?
Honestly, the most professional approach is to use Azure Data Factory with an Integration Runtime installed on premises.
You can of course use BCP, but it will be cumbersome in the long run: a lot of scripts, tables, and maintenance; no logging, no metrics, no alerts... Honestly, don't do it.
SSIS is another option, but in my opinion it takes more effort than the ADF solution.
Azure Data Factory will allow you to do this in a professional way, using a user interface with no coding. It can also be parameterized, so you just change the table-name parameter and suddenly you are exporting 20, 50, or 100 tables with ease.
Here is a video example and intro to Data Factory if you want a quick overview. It also includes a demo that imports a CSV into Azure SQL; you can change it a little to do Azure SQL -> CSV and CSV -> SQL Server, or just Azure SQL -> SQL Server directly.
https://youtu.be/EpDkxTHAhOs
It really is straightforward.
Consider using simple bcp from the on-prem environment: save the results to CSV and then load the CSV into the on-prem server.
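A rough sketch of that round trip, assuming a recent bcp utility on the PATH and placeholder server, credential, and path values:
# Export the query result from Azure SQL to a local CSV file.
bcp "SELECT name, salary FROM dbo.Employees" queryout "C:\exports\employees.csv" `
    -S "myserver.database.windows.net" -d "mydb" -U "dbuser" -P "password" -c -t ","
# Bulk-load the CSV into the on-prem server (here using Windows authentication).
bcp dbo.Employees in "C:\exports\employees.csv" -S "onpremserver" -d "mydb" -T -c -t ","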
You can also use SSIS to implement an automated task.
Though I would like to know why you need the intermediate CSV file; you can simply copy data between databases (cloud -> on-prem) with a scheduled SSIS package.
If you have on-prem SQL access, then a simple SSIS package is probably the quickest and easiest way to go. If your source is Azure SQL and the ultimate destination is on-prem SQL, you could use SSIS and skip the CSV altogether.
If you want to stick to an Azure PaaS solution, you could consider using Azure Data Factory. You can set up a gateway to access the on-prem SQL Server directly, or if you really want to stick to a CSV, then look into using a Logic App.
Azure Data Factory is surely an option.
A simple solution would be the pyodbc driver with a little bit of Python. https://learn.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017
You can also try sqlcmd with a bit of PowerShell or Bash on top.
https://learn.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-2017
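For instance, a quick sketch of the sqlcmd route from PowerShell (connection details and paths are placeholders):
# Run the query against Azure SQL and write the result as comma-separated text.
sqlcmd -S "myserver.database.windows.net" -d "mydb" -U "dbuser" -P "password" `
    -Q "SET NOCOUNT ON; SELECT name, salary FROM dbo.Employees" `
    -s "," -W -o "C:\exports\employees.csv"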