I am currently migrating a SQL Server database to Redshift and have a downstream ETL process that uses variable-driven SQL scripts in SSIS to extract data from various schemas and load it back into a SQL Server instance.
I am trying to refactor the SSIS component to also use variables, but this does not seem to be an option for ODBC connections.
Does anyone know of a way I can achieve this?
I understand this may not be too precise, so feel free to delete.
I am currently using Azure Data Factory as an ETL pipeline to transform JSON files that come in daily with the same schema and load them into Azure SQL (ADF Dataflow, ADF Pipeline Output to SQL table).
This process is working exactly as I want, but we would like to move everything on premises.
I have a PowerShell script that moves files from SFTP to local storage; I'd then like to use a stored procedure to transform them into their respective tables.
However, I have very limited experience with SQL, and creating such a procedure to parse these rather large files seems intimidating.
My current ADF process is entirely automated: files are copied from the SFTP server to blob storage, run through a parameterized dataflow pipeline for transformations on a trigger, and loaded into SQL.
I would like to maintain this same functionality, i.e. no manual input to run the process.
I'm aware of OPENJSON + CROSS APPLY and made an attempt; if anyone can point me in the right direction (if what I'm asking is possible using T-SQL), that would help. My code is moot, I'm just using it as an example.
tl;dr: Can I replicate my ADF pipeline + dataflow by using a SQL stored procedure to parse complex JSON files?
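Something along these lines is what I have in mind; the file path, JSON shape and target table below are made-up placeholders, not my real schema:

    -- Placeholder sketch only: the file path, JSON structure and table names are examples.
    -- OPENJSON requires SQL Server 2016+ (database compatibility level 130).
    -- Load the raw file into a variable, then shred it with OPENJSON + CROSS APPLY.
    DECLARE @json NVARCHAR(MAX);

    SELECT @json = BulkColumn
    FROM OPENROWSET(BULK 'C:\Import\daily_file.json', SINGLE_CLOB) AS src;

    INSERT INTO dbo.Orders (OrderId, CustomerName, ItemSku, ItemQty)
    SELECT o.OrderId, o.CustomerName, i.Sku, i.Qty
    FROM OPENJSON(@json, '$.orders')
         WITH (
             OrderId      INT           '$.id',
             CustomerName NVARCHAR(100) '$.customer.name',
             Items        NVARCHAR(MAX) '$.items' AS JSON
         ) AS o
    CROSS APPLY OPENJSON(o.Items)
         WITH (
             Sku NVARCHAR(50) '$.sku',
             Qty INT          '$.qty'
         ) AS i;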
I need to create a recurring job that maps two SQL Server databases which are on two different servers. I need to check for data mismatches between the tables at regular intervals, because new data keeps being added every second.
I am thinking of using an ETL tool like Pentaho Kettle, which will actually do the data mapping. Do we have any better option to handle this scenario?
This looks like an ETL job; as long as you're using SQL Server I would recommend SSIS, which is the Microsoft ETL tool. Of course you can use Pentaho, and I think it will work very well too.
Another approach would be to use linked servers and a job, writing the comparison as a stored procedure, but in my opinion this is not the recommended way to address the problem (SSIS or any ETL tool is much more versatile).
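If you do go the linked-server route, a minimal sketch of the comparison could look like this (the linked server, database, table and column names are placeholders):

    -- Placeholder names: [RemoteServer], RemoteDb, dbo.Orders and the columns are examples.
    -- Rows that exist locally but are missing or different on the remote server:
    SELECT OrderId, CustomerName, Amount
    FROM dbo.Orders
    EXCEPT
    SELECT OrderId, CustomerName, Amount
    FROM [RemoteServer].RemoteDb.dbo.Orders;

    -- And the reverse direction, for rows only present on the remote side:
    SELECT OrderId, CustomerName, Amount
    FROM [RemoteServer].RemoteDb.dbo.Orders
    EXCEPT
    SELECT OrderId, CustomerName, Amount
    FROM dbo.Orders;

You would wrap both queries in a stored procedure and schedule it from a SQL Server Agent job at whatever interval you need.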
I am trying to migrate a database from a SQL Server into Azure. This database has two rather simple T-SQL scripts that insert data. Since SQL Server Agent does not exist on Azure SQL Database, I am trying to find an alternative.
I see the Automation offering, but it seems really complex for something as simple as running SQL scripts. Is there any better, or at least easier, way to do this?
I was under the impression that there was a scheduler for that, but I can't find it.
Thanks
There are several ways to run a scheduled task/job against an Azure SQL database for your use case:
If you are comfortable using your existing on-premises SQL Server Agent, you can connect to your Azure SQL DB (using a linked server) and execute jobs the same way you would on an on-premises SQL Server; a sketch of this approach is at the end of this answer.
Use an Automation Account/Runbooks to create SQL jobs. If you look in the marketplace you can find several examples for Azure SQL DB (backup, restore, indexing jobs, etc.). I guess you already tried this and it does not seem like a feasible solution to you.
Another, less well-known, way could be to use WebJobs (under an App Service web app) to schedule tasks (you can use PowerShell scripts here). The disadvantage of this is that you cannot change anything once you create a WebJob.
As @jayendran suggested, Azure Functions is definitely an option to achieve this use case.
If none of these lets you work with the SQL directly, there is also the "Scheduler Job Collection" available in Azure to schedule invocations of HTTP endpoints, and the SQL operation could be abstracted/implemented behind that endpoint. This is only useful for lighter SQL operations; if the operation takes longer, chances are it might time out.
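For the first option, a minimal sketch of what the on-premises Agent job step could execute (the linked server name [AzureSqlDb] and the insert script are placeholder examples):

    -- Placeholder sketch: [AzureSqlDb] is a linked server pointing at the Azure SQL database,
    -- with RPC Out enabled so the statement can be executed remotely.
    -- Put this in a SQL Server Agent job step and schedule it like any other job.
    EXEC ('
        INSERT INTO dbo.DailyLoad (LoadDate, RowsLoaded)
        SELECT CAST(GETDATE() AS date), COUNT(*)
        FROM dbo.StagingTable;
    ') AT [AzureSqlDb];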
You can use Azure Functions to run the T-SQL queries; for scheduling, use a timer trigger.
You can use Microsoft Flow (https://flow.microsoft.com) to create a scheduled flow with the SQL Server connector. In the connector you set the Azure SQL server, database name, username and password.
SQL Server connector
There are many options, but the ones you can use to run a T-SQL query daily are these:
SQL Connector options
Execute a SQL Query
Execute stored procedure
You can also edit your connection info in the Data --> Connections menu.
I am new to Azure SQL.
We have a client database in Azure SQL. We need to set up process automation that extracts query results to .CSV files and loads them into our server (on-premises SQL Server 2008 R2).
What is the best method to generate CSV files from Azure SQL and make them accessible to the on-premises server?
Honestly, the most professional approach is to use Azure Data Factory with a self-hosted Integration Runtime installed on premises.
You can of course use BCP, but it will be cumbersome in the long run: a lot of scripts, tables and maintenance, with no logging, no metrics, no alerts... Honestly, don't do it.
SSIS is another option, but in my opinion it takes more effort than the ADF solution.
Azure Data Factory will allow you to do this in a professional way through a user interface with no coding. It can also be parameterized, so you just change the table name parameter and suddenly you are exporting 20, 50 or 100 tables with ease.
Here is a video example and intro to Data Factory if you want a quick overview. It also includes a demo that imports CSV into Azure SQL; you can change it a little to go Azure SQL -> CSV and CSV -> SQL Server, or just directly Azure SQL -> SQL Server.
https://youtu.be/EpDkxTHAhOs
It really is straightforward.
Consider using simple bcp from the on-prem environment to save the results to CSV, and then load the CSV into the on-prem server; a rough sketch is below.
You can also use SSIS to implement an automated task.
Though I would like to know why you need the intermediate CSV file: you can simply copy data between databases (cloud -> on-prem) with a scheduled SSIS package.
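A minimal sketch of the bcp-plus-CSV flow, where the server, database, table and file names are all placeholders:

    -- Placeholder sketch; server, database, table and file names are examples only.
    -- Step 1 (command line on the on-prem box): export the query results from Azure SQL to CSV, e.g.:
    --   bcp "SELECT OrderId, CustomerName, Amount FROM dbo.SalesExport" queryout "C:\Export\SalesExport.csv"
    --       -S yourserver.database.windows.net -d ClientDb -U exportuser -P *** -c -t ","
    -- Step 2 (T-SQL on the on-prem SQL Server 2008 R2): load the CSV into a staging table.
    BULK INSERT dbo.SalesExport_Staging
    FROM 'C:\Export\SalesExport.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        TABLOCK
    );

Both steps can be wrapped in a SQL Server Agent job or a scheduled task so no manual input is needed.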
If you have on-prem SQL access, then a simple SSIS package is probably the quickest and easiest way to go. If your source is Azure SQL and the ultimate destination is on-prem SQL, you could use SSIS and skip the CSV altogether.
If you want to stick to an Azure PaaS solution, you could consider using Azure Data Factory. You can set up a gateway to access the on-prem SQL Server directly, or if you really want to stick with a CSV, then look into using a Logic App.
Azure Data Factory is surely an option.
A simple solution would be the pyodbc driver with a little bit of Python: https://learn.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017
You can also try sqlcmd with a bit of PowerShell or Bash on top.
https://learn.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-2017
I have several large SQL queries that I need to run against a Postgres data source. I am using SSIS on SQL Server 2008 R2 to move the data. Because of the way our system is set up, I have to use a tunnel via PuTTY and set up local port redirection.
In the SSIS package, I am using an ADO.NET source and destination. I have PostgreSQL drivers installed, and we were able to get the 32-bit version working. My package runs and I am getting the data, but the data transformation tasks run painfully slowly... about 2,000 records per second.
Does anyone have experience making a trip to Postgres with static queries and dumping the results into a SQL Server? Any tips / best practices?
You should try to pull the data and store it in an SSIS raw file.
Then do your transformations and whatever else you need on the raw file data.
After that, send it back to the database.
In general, try not to make many calls to the database.