Problem importing data from azure sql db using azure data factory into Dynamics 365 createdon and modifiedon fields - azure-sql-database

I want to copy account records from an azure sql db to D365. I am using Azure data factory to achieve this.
I am having problems with a couple of D365 fields, createdon and modifiedon. When I try to populate these with the values from my db they are ignored and D365 instead uses the datetime when I actually insert them into D365.
I've done some reading up and it seems these are locked fields. There are a number of ways to get round this from other tools but I haven't seen a way to get around this issue when importing records from azure data factory.
I've tried making the source field in the db that I want to use, nVarChar, Date and DateTime but all are ignored.
Does anyone know if this is possible and if so how I can achieve it? Can I do it through ADF alone or do I need to configure D365 to allow these fields to be created with a custom value?
Thanks
Macca

Related

Azure Data Factory - Rerun Failed Pipeline Against Azure SQL Table With Differential Date Filter

I am using ADF to keep an Azure SQL DB in sync with an on-prem DB. The on-prem DB is read only and the direction is one-way, from the Azure SQL DB to the on-prem DB.
My source table in the Azure SQL Cloud DB is quite large (10's of millions of rows) so I have the pipeline set to use an UPSERT (merge, trying to create a differential merge). I am using a filter on the Source table and the and the Filter Query has a WHERE condition that looks like this:
[HistoryDate] >= '#{formatDateTime(pipeline().parameters.windowStart, 'yyyy-MM-dd HH:mm' )}'
AND [HistoryDate] < '#{formatDateTime(pipeline().parameters.windowEnd, 'yyyy-MM-dd HH:mm' )}'
The HistoryDate column is auto-maintained in the source table with a getUTCDate() type approach. New records will always get a higher value and be included in the WHERE condition.
This works well, but here is my question: I am testing on my local machine before deploying to the client. When I am not working, my laptop hibernates and the pipeline rightfully fails because my local SQL Instance is "offline" during that run. When I move this to production this should not be an issue (computer hibernating), but what happens if the clients connection is temporarily lost (i.e, the client loses internet for a time)? Because my pipeline has a WHERE condition on the source to reduce the table size upsert to a practical number, any failure would result in a loss of any data created during that 5 minute window.
A failed pipeline can be rerun, but the run time would be different at that moment in time and I would effectively miss the block of records that would have been picked up if the pipeline had been run on time. pipeline().parameters.windowStart and pipeline().parameters.windowEnd will now be different.
As an FYI, I have this running every 5 minutes to keep the local copy in sync as close to real-time as possible.
Am I approaching this correctly? I'm sure others have this scenario and it's likely I am missing something obvious. :-)
Thanks...
Sorry to answer my own question, but to potentially help others in the future, it seems there was a better way to deal with this.
ADF offers a "Metadata-driven Copy Task" utility/wizard on the home screen that creates a pipeline. When I used it, it offers a "Delta Load" option for tables which takes a "Watermark". The watermark is a column for an incrementing IDENTITY column, increasing date or timestamp, etc. At the end of the wizard, it allows you to download a script that builds a table and corresponding stored procedure that maintains the values of each parameters after each run. For example, if I wanted my delta load to be based on an IDENTITY column, it stores the value of the max value of a particular pipeline run. The next time a run happens (trigger), it uses this as the MIN value (minus 1) and the current MAX value of the IDENTITY column to get the added records since the last run.
I was going to approach things this way, but it seems like ADF already does this heavy lifting for us. :-)

Updating rows in a SQL table with Azure Data Factory

I have taken over a project with minimal knowledge on how to use Azure Data Factory so need some help. The data factory is copying data from one postgres sql server over to my azure sql server. It is running 3 times a day and inserts new rows perfectly. But when data has changed in postgres it does not update the row as needed in the sink database. Can anyone point me in the right direction?
Since the source are on-premise, you can't use data flow. It means that the tutorial #Mark kromer provided for you doesn't works.
Per my experience in Copy active, we only can copy(insert) the data to sink table, won't update it. I'm afraid to say we can't update rows with copy active.

How to sync/update a database connection from MS Access to SQL Server

Problem:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible. The data sets update daily into the same CSV files on my local hard drive. Currently using MS Access 2010 (v14.0) as a middleman to aggregate the CSV files into linked tables.
Using the solutions below, the data transfers perfectly into SQL Server and does exactly what I want. But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
Solutions:
Upsizing Wizard in MS Access - This works best in transferring all the tables perfectly to SQL Server databases. I cannot figure out how to update the tables though without deleting and repeating the same steps each day. None of the solutions or links that I have tried have panned out.
SQL Server Import/Export Wizard - This works fine also in getting the data over to SSMS one time. But I also cannot figure out how to update/sync this data with the new tables. Another issue is that choosing Microsoft Access as the data source through this method requires a .mdb file. The latest MS Access file formats are .accdb files so I have to save the database in an older .mdb version in order to export it to SQL Server.
Constraints:
I have no loyalty towards MS Access. I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them. From all I have read, MS Access seems like the best way to do that.
I also have limited coding knowledge so more advanced VBA/C++ solutions will probably go over my head.
TLDR:
Trying to get several different daily updating local CSV files into a program where I can run SQL queries on them without having to do a full delete and re-import each day. Currently using MS Access 2010 to SQL Server Express (SSMS v17.6) which fulfills my needs, but does not update daily with the new data without re-importing everything.
Thank you!
You can use a staging table strategy to solve this problem.
When it's time to perform the daily update, import all of the data into one or more staging tables. Execute SQL statement to insert rows that exist in the imported data but not in the base data into the base data; similarly, delete rows from the base data that don't exist in the imported data; similarly, update base data rows that have changed values in the imported data.
Use your data dependencies to determine in which order tables should be modified.
I would run all deletes first, then inserts, and finally all updates.
This should be a fun challenge!
EDIT
You said:
I need to get data sets from CSV files into SQL Server Express (SSMS
v17.6) as efficiently as possible.
The most efficient way to put data into SQL Server tables is using SQL Bulk Copy. This can be implemented from the command line, an SSIS job, or through ADO.Net via any .Net language.
You state:
But I cannot figure out how to refresh/update/sync the data at the end
of each day with the newly added CSV data without having to re-import
the entire data set each time.
It seems you have two choices:
Toss the old data and replace it with the new data
Modify the old data so that it comes into alignment with the new data
In order to do number 1 above, you'd simply replace all the existing data with the new data, which you've already said you don't want to do, or at least you don't think you can do this efficiently. In order to do number 2 above, you have to compare the old data with the new data. In order to compare two sets of data, both sets of data have to be accessible wherever the comparison is to take place. So, you could perform the comparison in SQL Server, but the new data will need to be loaded into the database for comparison purposes. You can then purge the staging table after the process completes.
In thinking further about your issue, it seems the underlying issue is:
I really am just looking for the most efficient way to get these CSV
files consistently into a format where I can perform SQL queries on
them.
There exist applications built specifically to allow you to query this type of data.
You may want to have a look at Log Parser Lizard or Splunk. These are great tools for querying and digging into data hidden inside flat data files.
An Append Query is able to incrementally add additional new records to an existing table. However the question is whether your starting point data set (CSV) is just new records or whether that data set includes records already in the table.
This is a classic dilemma that needs to be managed in the Append Query set up.
If the CSV includes prior records - then you have to establish the 'new records' data sub set inside the CSV and append just those. For instance if you have a sequencing field then you can use a > logic from the existing table max. If that is not there then one would need to do a NOT compare of the table data with the csv data to identify which csv records are not already in the table.
You state you seek something 'more efficient' - but in truth there is nothing more efficient than a wholesale delete of all records and write of all records. Most of the time one can't do that - but if you can I would just stick with it.

Create an Azure Data Factory pipeline to copy new records from DocumentDB to Azure SQL

I am trying to find the best way to copy yesterday's data from DocumentDB to Azure SQL.
I have a working DocumentDB database that is recording data gathered via a web service. I would like to routinely (daily) copy all new records from the DocumentDB to an Azure SQL DB table. In order to do so I have created and successfully executed an Azure Data Factory Pipeline that copies records with a datetime > '2018-01-01', but I've only ever been able to get it to work with an arbitrary date - never getting the date from a variable.
My research on DocumentDB SQL querying shows that it has Mathematical, Type checking, String, Array, and Geospatial functions but no date-time functions equivalent to SQL Server's getdate() function.
I understand that Data Factory Pipelines have some system variables that are accessible, including utcnow(). I cannot figure out, though, how to actually use those by editing the JSON successfully. If I try just including utcnow() within the query I get an error from DocumentDB that "'utcnow' is not a recognized built-in function name".
"query": "SELECT * FROM c where c.StartTimestamp > utcnow()",
If I try instead to build the string within the JSON using utcnow() I can't even save it because of a syntax error:
"query": "SELECT * FROM c where c.StartTimestamp > " + utcnow(),
I am willing to try a different technology than a Data Factory Pipeline, but I have a lot of data in our DocumentDB so I'm not interested in abandoning that, and I have much greater familiarity with SQL programming and need to move the data there for joining and other analysis.
What is the easiest and best way to copy those new entries over every day into the staging table in Azure SQL?
Are you using ADF V2 or V1?
For ADF V2.
I think that you can follow the incremental approach that they recommend, for example you could have a watermark table (it could be in your target Azure SQL database) and two lookups activities, one of the lookups will obtain the previous run watermark value (it could be date, integer, whatever your audit value is) and another lookup activity to obtain the MAX (watermark_value, i.e. date) of your source document and have a CopyActivity that gets all the values where the c.StartTimeStamp<=MaxWatermarkValueFromSource AND c.StartTimeStamp>LastWaterMarkValue.
I followed this example using the Python SDK and worked for me.
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-powershell

Azure Machine Learning Write output to Azure SQL Database

I am using Azure Machine Learning to clustering data.
The input data is from an Azure SQL Database, and it works fine.
At the end of everything I want to write the output to a table in the same Azure SQL Database, but I get this error:
Error: Error 1000: AFx Library library exception:
Sql encountered an error: Login failed for user
Anyone any idea?
Thank you very much!
Please follow the instructions and examine the examples provided here to properly use the Export Data module to save the data of ML to Azure SQL Database.
How to Export Data to an Azure SQL Database
Add the Export Data module to your experiment. You can find this module in the Data Input and Output group in the experiment items list in Azure Machine Learning Studio.
Connect it to the module that produces the data that you want to export to Azure SQL DB.
For Data destination, select Azure SQL Database. This option supports Azure SQL Data Warehouse as well.
Set the following options specific to Azure SQL Database or Azure SQL Data Warehouse.
Database server name
Type the server name that is generated by Azure. Typically it has the form <generated_identifier>.database.windows.net.
Database name
Type the name of a database on the server you just specified.The database must already exist; the Export Data cannot create it.
Server user account name
Type the user name of an account that has access permissions for the database.
Server user account password
Provide the password for the specified user account.
Comma-separated list of columns to be saved
Type the names of the columns in the experiment that you want to write to the database.
Data table name
Type the name of the table where data will be stored.
For Azure SQL Database, if the table does not exist, it will be created. For Azure SQL Data Warehouse, the table must already exist and have the correct schema, so be sure to create it in advance.
Comma-separated list of datatable columns
Type the names of the columns as you wish them to appear in the destination table. The columns should correspond in order with the column names that you list in Comma-separated list of columns to be saved.
if you are writing to Azure SQL Data Warehouse, the columns names must match those already in the destination table schema.
Number of rows written per SQL Azure operation
Indicate how many rows should be written to the destination table in each batch. By default, the value is set to 50, which is the default batch size for Azure SQL Database. However, you should increase this value if you have a large number of rows to write.
TIP:
For Azure SQL Data Warehouse, we recommend that you set this value to 1. If you use a larger batch size, the size of the command string that is sent to Azure SQL Data Warehouse can exceed the allowed string length, causing an error.
If you don't want to write new results each time you run the experiment, select the Use cached results option. If there are no other changes to module parameters, the experiment will write the data the first time the module is run, and thereafter not perform writes.
However, a write will always be performed if any parameters have been changed in Export Data that would change the results.
Run the experiment.
Find the issue!
I needed to create an specific user with this SQL code:
CREATE USER AMLApplicationUser WITH PASSWORD = '************';
and then add the user to these roles on the database I want to write.
ALTER ROLE db_datareader ADD MEMBER AMLApplicationUser;
ALTER ROLE db_datawriter ADD MEMBER AMLApplicationUser;
I guess only the datawriter role is enough, but I needed datareader too.
So in conclusion, seems that database admin role can be used to read data, but not to write data from AML.
Thank you for your help!