Data migration from different sources and schemas to MS SQL Server

We are a government organisation and receive data from different stakeholders. The data is usually not in a single format: some arrives as CSV, some as Excel, some as database views. The schema of the source data is also not always the same. Some data arrives as a continuous stream of CSV files in FTP folders. Is there any software that will automate all of this work?

This is where ETL tools come into the picture. Microsoft has SSIS (SQL Server Integration Services), which provides connectors to sources such as another SQL Server, CSV files, Excel files, or a shared location where data files are dropped. You will have to design the ETL pipeline to get all the data into your SQL Server database.
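Whatever tool runs the pipeline, the core of handling differing source schemas is a per-source column mapping onto one target schema. The following is a minimal Python sketch of that idea, not an SSIS package; the source names, column headers, and target columns are all hypothetical.

```python
import csv
import io

# Hypothetical per-source mappings: source header -> target column name.
SOURCE_MAPPINGS = {
    "agency_a": {"Cust ID": "customer_id", "Full Name": "name", "Amt": "amount"},
    "agency_b": {"id": "customer_id", "customer": "name", "total": "amount"},
}
# Hypothetical target schema shared by all sources.
TARGET_COLUMNS = ["customer_id", "name", "amount"]


def normalize_rows(source, text):
    """Yield dicts keyed by TARGET_COLUMNS from one source's CSV text."""
    mapping = SOURCE_MAPPINGS[source]
    for row in csv.DictReader(io.StringIO(text)):
        # Rename known columns and drop any the mapping does not mention.
        renamed = {
            mapping[key]: (value or "").strip()
            for key, value in row.items()
            if key in mapping
        }
        # Columns the source lacks become None so every row has one shape.
        yield {col: renamed.get(col) for col in TARGET_COLUMNS}
```

Rows normalized this way can then be loaded into a single staging table regardless of which stakeholder sent the file.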

Related

Excel into Azure Data Factory into SQL

I read a few threads on this but noticed most are outdated, with Excel becoming an integration in 2020.
I have a few Excel files stored in Dropbox. I would like to automate the extraction of that data into Azure Data Factory, perform some ETL functions with data coming from other sources, and finally push the final, complete table to Azure SQL.
I would like to ask what is the most efficient way of doing so?
Would it be on the basis of automating a logic app to extract the xlsx files into Azure Blob, use data factory for ETL, join with other SQL tables, and finally push the final table to Azure SQL?
Appreciate it!
Before using a Logic App to extract the Excel files, review the known issues and limitations of the Excel connectors.
If you are importing large files with a Logic App, then depending on the file sizes, also consider this thread: Logic Apps vs Azure Functions for large files.
To summarize the approach:
Step 1: Use an Azure Logic App to upload the Excel files from Dropbox to Blob Storage.
Step 2: Create a Data Factory pipeline with a Copy Data activity.
Step 3: Use the Blob Storage service as the source dataset.
Step 4: Create the SQL database with the required schema.
Step 5: Do the schema mapping.
Step 6: Finally, use the SQL database table as the sink.
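For the schema mapping step, the copy activity can take an explicit column mapping. As far as I know it is expressed as a translator object of type TabularTranslator with a list of source/sink name pairs; treat the exact shape as something to confirm against the current Data Factory documentation. Here is a small Python sketch that generates such a fragment, with hypothetical Excel headers and SQL column names.

```python
import json


def build_translator(column_map):
    """Build a copy-activity translator fragment from {source: sink} names."""
    return {
        "type": "TabularTranslator",
        "mappings": [
            {"source": {"name": src}, "sink": {"name": dst}}
            for src, dst in column_map.items()
        ],
    }


# Hypothetical Excel headers mapped to hypothetical SQL column names.
translator = build_translator({"Order No": "order_id", "Order Date": "order_date"})
print(json.dumps(translator, indent=2))
```

Generating the fragment from a plain dictionary keeps the mapping in one reviewable place when several Excel files share a destination table.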

General question about ETL solutions for Azure for a small operation

The way we use data is either retrieving survey data from other organizations, or creating survey instruments ourselves and soliciting organizations under our organization for data.
We have a database where our largest table is perhaps 10 million records. We extract and upload most of our data on an annual basis, with occasionally needing to ETL over large numbers of tables from organizations such as the Census, American Community Survey, etc. Our database is all on Azure and currently the way that I get databases from Census flat files/.csv files is by re-saving them as Excel and using the Excel import wizard.
All of the 'T' in ETL is happening within programmed procedures within my staging database before moving those tables (using Visual Studio) to our reporting database.
Is there a more sophisticated technology I should be using, and if so, what is it? All of my education in this matter comes from perusing Google and watching YouTube, so my grasp on all of the different terminology is lacking and searching on the internet for ETL is making it difficult to get to what I believe should be a simple answer.
For a while I thought we wanted to eventually graduate to using SSIS, but I learned that SSIS was something that was used primarily if you had a database on prem. I've tried looking at dynamic SQL using BULK INSERT to find that BULK INSERT doesn't work with Azure DBs. Etc.
Recently I've been learning about Azure Data Factory and something called the Bulk Copy Program (bcp), run from Windows PowerShell.
Does anybody have any suggestions as to what technology I should look at for a small-scale BI reporting solution?
I suggest using Data Factory; it performs well for large data transfers.
Reference: Copy performance and scalability achievable using ADF
The Copy activity lets you use a table, a query, or a stored procedure to filter data at the source.
The sink lets you select a destination table, a stored procedure, or auto-create table (bulk insert) to receive the data.
Data Factory Mapping Data Flow provides more features for data transformation.
Ref: Copy and transform data in Azure SQL Database by using Azure Data Factory.
Hope this helps.

Access Azure Data Lake Analytics Tables from SQL Server Polybase

I need to export a multi-terabyte dataset processed via Azure Data Lake Analytics (ADLA) onto a SQL Server database.
Based on my research so far, I know that I can write the ADLA output to a Data Lake store or WASB using built-in outputters, and then read the output data from SQL Server using Polybase.
However, creating the result of ADLA processing as an ADLA table seems pretty enticing to us. It is a clean solution (no files to manage), multiple readers, built-in partitioning, distribution keys and the potential for allowing other processes to access the tables.
If we use ADLA tables, can I access ADLA tables via SQL Polybase? If not, is there any way to access the files underlying the ADLA tables directly from Polybase?
I know that I can probably do this using ADF, but at this point I want to avoid ADF to the extent possible - to minimize costs, and to keep the process simple.
Unfortunately, Polybase support for ADLA Tables is still on the roadmap and not yet available. Please file a feature request through the SQL Data Warehouse User voice page.
The suggested workaround is to produce the data as CSV in ADLA, then create the partitioned and distributed table in SQL DW and use Polybase to read the data and fill the SQL DW managed table.

Azure SQL DB - data file export (.csv) from azure sql

I am new to Azure SQL.
We have a client database in Azure SQL. We need to set up an automated process that extracts query results to .CSV files and loads them into our server (on-premises SQL Server 2008 R2).
What is the best method to generate csv files from Azure sql and make it accessible for the on premise server?
Honestly, the most professional approach is to use Azure Data Factory and install an Integration Runtime on premises.
You can of course use BCP, but it will be cumbersome in the long run: a lot of scripts, tables, and maintenance, with no logging, no metrics, no alerts. Honestly, don't do it.
SSIS is another option, but in my opinion it takes more effort than the ADF solution.
Azure Data Factory lets you do this professionally through a user interface with no coding. It can also be parameterized, so you just change the table name parameter and suddenly you are exporting 20, 50 or 100 tables with ease.
Here is a video example and intro to Data Factory if you want a quick overview. The overview also includes a demo that imports CSV to Azure SQL; you can change it a little to do Azure SQL -> CSV and CSV -> SQL Server, or just Azure SQL -> SQL Server directly.
https://youtu.be/EpDkxTHAhOs
It really is straightforward.
Consider simply running bcp from the on-prem environment: save the results to CSV and then load the CSV into the on-prem server.
You can also use SSIS to implement an automated task.
Though I would like to know why you need the intermediate CSV file. You can simply copy data between the databases (cloud -> on prem) with a scheduled SSIS package.
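For the bcp route suggested in this answer, the export could be scripted roughly as follows. This is a sketch that only builds the command line; the server, database, credentials, query, and output path are placeholders. Note that bcp in character mode writes neither a header row nor quoted fields, so values containing commas need a different delimiter or a different tool.

```python
import subprocess


def bcp_queryout_args(query, out_path, server, database, user, password):
    """Return the argument list for a bcp query export in character mode."""
    return [
        "bcp", query, "queryout", out_path,
        "-S", server,      # server name
        "-d", database,    # database name
        "-U", user,        # SQL login
        "-P", password,    # password (prefer a secret store in real use)
        "-c",              # character data rather than native format
        "-t,",             # comma as the field terminator
    ]


# Placeholder values for illustration only.
args = bcp_queryout_args(
    "SELECT id, name FROM dbo.Customers",
    "customers.csv",
    "myserver.database.windows.net",
    "ClientDb",
    "export_user",
    "secret",
)
# To actually run it (requires the bcp utility to be installed):
# subprocess.run(args, check=True)
```

A scheduled task on the on-prem machine can call this and then load the resulting file, which keeps the whole job on one side of the firewall.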
If you have on-prem SQL access then a simple SSIS package is probably the quickest and easiest way to go. If your source is Azure SQL and the ultimate destination is On-Prem SQL, you could use SSIS and skip the CSV all together.
If you want to stick to an Azure PaaS solution, you could consider using Azure Data Factory. You can set up a gateway to access the on-prem SQL Server directly, or if you really want to stick to a CSV, look into using a Logic App.
Azure Data Factory is certainly an option.
A simple solution would be the pyodbc driver with a little bit of Python. https://learn.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017
You can also try sqlcmd with a bit of PowerShell or bash on top.
https://learn.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-2017
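A minimal sketch of the Python approach: the function below works with any DB-API connection, so the demo uses an in-memory sqlite3 database as a stand-in to stay runnable without a server. With pyodbc you would pass a connection from pyodbc.connect(...) instead. The table, query, and file name are made up for illustration.

```python
import csv
import os
import sqlite3  # stand-in for pyodbc so the sketch runs anywhere
import tempfile


def export_query_to_csv(conn, query, out_path, batch_size=1000):
    """Run query on a DB-API connection and write header plus rows to CSV."""
    cur = conn.cursor()
    cur.execute(query)
    headers = [col[0] for col in cur.description]
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        while True:
            rows = cur.fetchmany(batch_size)  # stream in batches
            if not rows:
                break
            writer.writerows(rows)
            count += len(rows)
    return count


# Demo with a throwaway in-memory database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER, name TEXT)")
demo.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
csv_path = os.path.join(tempfile.mkdtemp(), "t.csv")
exported = export_query_to_csv(demo, "SELECT id, name FROM t ORDER BY id", csv_path)
```

Fetching in batches keeps memory flat for large result sets, and the csv module handles quoting of values that contain commas.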

Is it possible to transfer data to SQL Server from an unconventional database

My company has a SQL Server database which they would like to populate with data from a hierarchical database (as opposed to relational). I have already written a .net application to map its schema to a relational database and have successfully done that. However my problem is that the tech being used here is so old that I see no obvious way of data transfer.
However, I have some ideas about how I can do this. One involves writing file scans in my unconventional database, dumping the files out as CSV, and then doing a bulk upload into SQL Server. I do not like this approach, because invalid data terminates the bulk upload quite often.
I was hoping to explore options around Service Broker. Could I dump out live transactions whenever a record changes in my database, and have them picked up somehow?
Secondly, if I dump live or changed records to a file (I can format the file however is needed), is there something that can pull it into SQL Server?
Any help would be greatly appreciated.
Regards,
Waqar
Service Broker is a very powerful queue/messaging management system. I am not sure why you want to use it for this.
You can set up an SSIS job that keeps checking a folder for CSV files; when it detects a new one, it reads it into SQL Server and then zips it and archives it somewhere else. This is very common. SSIS can then either process the data (it's a wonderful ETL tool) or invoke procedures in SQL Server to process the data. SSIS is very fast and is rarely overwhelmed, so why would you use Service Broker?
If it's IMS (mainframe) type data, you have to convert it to flat tables and then to CSV-type text tables for SQL Server to read.
SQL Server is very good at processing XML and, as of 2016, JSON-shaped data, so if that is your data type you can import it directly into SQL Server.
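The check-folder, load, zip, archive pattern described here is not specific to SSIS. As an illustration only, this is a Python sketch of the same loop body; the folder paths and the load_rows callback (which would insert into SQL Server) are hypothetical.

```python
import csv
import zipfile
from pathlib import Path


def process_new_files(inbox, archive, load_rows):
    """Load each CSV in inbox via load_rows, then zip it into archive."""
    inbox, archive = Path(inbox), Path(archive)
    archive.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(inbox.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            load_rows(list(csv.DictReader(f)))  # hand rows to the loader
        zip_path = archive / (path.name + ".zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as z:
            z.write(path, arcname=path.name)
        path.unlink()  # remove the original only after it is archived
        handled.append(path.name)
    return handled
```

Calling this from a scheduler or a loop with a sleep gives the polling behaviour; because a file is removed from the inbox only after it has been loaded and archived, a failure leaves it in place to be retried.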
Skip bulk insert. The SQL Server xml data type lends itself to doing what you're doing. If you can output data from your other system into an XML format, you can push that XML directly into an argument for a stored procedure.
After that, you can use the functions for the XML type to iterate through the data as needed.
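On the exporting side, producing that XML could look something like the sketch below. The element names and the sample record are hypothetical, and the dictionary keys must be valid XML element names.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="records", row_tag="record"):
    """Serialize a list of dicts as one XML string, one element per field."""
    root = ET.Element(root_tag)
    for rec in records:
        row = ET.SubElement(root, row_tag)
        for key, value in rec.items():
            # ElementTree escapes special characters such as & when writing.
            ET.SubElement(row, key).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


payload = records_to_xml([{"id": 1, "name": "A & B"}])
print(payload)
```

The resulting string can be passed as the xml argument of the stored procedure, where the xml type's nodes() and value() methods shred it into rows. A bad record can then be rejected individually instead of aborting a whole bulk load.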