Does HCL OneTest Data generate the data for auditable fields when a user imports a schema from a database?

When I import a schema or sample data set from a database, does HCL OneTest Data generate the data for auditable fields (such as INSERTED TIME and UPDATE TIME) in the table?

Currently, HCL OneTest Data does not support inserting data into fields with the DateTime data type.

Related

Azure Data Factory Incremental Load data by using Copy Activity

I would like to load incremental data from a data lake into an on-premises SQL Server, so I created a data flow to do the necessary data transformation and cleaning.
After that, I copied the final data to a sink in a staging data lake, stored in CSV format.
I am facing two kinds of issues here.
First, whenever I trigger or debug the pipeline to load my dataset (the full data flow activity), the data is loaded into the CSV the first time. If I run the same pipeline a second time, the CSV file in the target data lake is loaded empty: the column headers are written, but I cannot see any values inside the file.
Second, the copy activity is connected to the on-premises SQL Server. When I trigger this pipeline again and again, duplicate data is loaded. I want to load only incremental data, or updated data when it arrives in the data lake CSV file. How do we handle this?
Kindly suggest.
When we want to incrementally load data into a database table, we need to use the Upsert option in the Copy data tool.
Upsert helps you incrementally load the source data based on a key column (or columns). If the key value is already present in the target table, it updates the rest of the column values; otherwise it inserts a new row with the new key and the other values.
Look at the following demonstration to understand how upsert works. I used Azure SQL Database as an example.
My initial table data:
create table player(id int, gname varchar(20), team varchar(10))
My source csv data (data I want to incrementally load):
I have taken an id which already exists in target table (id=1) and another which is new (id=4).
My copy data sink configuration:
Create or select a dataset for the target table. Select Upsert as the write behavior and choose the key column on which the upsert should be based.
Table after upsert using Copy data:
Now, after the upsert using Copy data, the id=1 row should be updated and the id=4 row should be inserted. The following is the final output, which is in line with the expected output.
You can use the primary key of your target table (which is also present in your source CSV) as the key column in the Copy data sink configuration. Any other configuration (such as filtering the source by last modified time) should not affect the process.
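The update-or-insert behaviour described above can be sketched outside Data Factory. This is only an illustration of the semantics, assuming an in-memory SQLite database as a stand-in for Azure SQL; the helper `upsert_player` and all row values are hypothetical, since the original table contents are not shown in the text.

```python
import sqlite3

def upsert_player(conn, row):
    """Update the row whose id matches; insert it if no such row exists."""
    cur = conn.execute(
        "UPDATE player SET gname = ?, team = ? WHERE id = ?",
        (row["gname"], row["team"], row["id"]),
    )
    if cur.rowcount == 0:  # no existing row had this key, so insert a new one
        conn.execute(
            "INSERT INTO player (id, gname, team) VALUES (?, ?, ?)",
            (row["id"], row["gname"], row["team"]),
        )

conn = sqlite3.connect(":memory:")
conn.execute("create table player(id int, gname varchar(20), team varchar(10))")

# Hypothetical initial target rows.
conn.executemany(
    "INSERT INTO player VALUES (?, ?, ?)",
    [(1, "Ana", "Red"), (2, "Ben", "Blue"), (3, "Cal", "Green")],
)

# Hypothetical source rows: id=1 already exists, id=4 is new.
source_rows = [
    {"id": 1, "gname": "Ana", "team": "Gold"},
    {"id": 4, "gname": "Dee", "team": "Red"},
]
for row in source_rows:
    upsert_player(conn, row)
conn.commit()

result = conn.execute("SELECT id, gname, team FROM player ORDER BY id").fetchall()
print(result)
```

Because the key column decides between update and insert, running the same load again leaves the table unchanged instead of duplicating rows.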

How to deal with data structure changes when performing a full historical data load?

I'm dealing with a SQL Server database which contains a column "defined data" with JSON data in it (and some other simple columns). The data builds up over time, right now we have about 8 million rows.
The data from this db is periodically read by an ETL system, which then reads the JSON data in the "defined data" column and maps the data to a new SQL Server table based on the column names contained in the JSON data.
This SQL Server table is prone to changes, meaning that about every 4 months additional columns are needed or column names change. Whenever this SQL Server table changes its data structure, a new version is introduced, which also forces the JSON data structure to change.
However, the ETL system should still be able to load all historical (JSON) data from the SQL Server database, regardless of the changing version throughout time. How can I make this work, taking into consideration version changes of the SQL Server tables and the JSON data?
(image: example)
So in this example my question is:
How can I ensure that I can load both client 20 and 21 into one SQL Server table without getting errors because the JSON data structure is not reflecting version 2 in the case of historical data?
Given the size of the SQL Server database, it doesn't seem like an option to update all historical JSON data according to the latest version (in this example that would mean adding "AssetType" for the 01-01-2021 data and filling it in with NULL).
Many, many thanks in advance!
First I would check whether the JSON fields exist in the table as column names by looking them up in the information schema. If a field does not exist as a column, add it with ALTER TABLE ... ADD.
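A minimal sketch of that check, assuming SQLite in place of SQL Server: `PRAGMA table_info` stands in for the information schema lookup, and new columns are simply typed TEXT. The helper `ensure_columns`, the `clients` table, and every field name except AssetType are hypothetical.

```python
import json
import sqlite3

def ensure_columns(conn, table, field_names):
    """Add a TEXT column for every JSON field the table does not have yet.

    The table and field names are interpolated into SQL, so they must come
    from a trusted source in this sketch.
    """
    existing = {info[1] for info in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name in field_names:
        if name not in existing:
            conn.execute(f'ALTER TABLE {table} ADD COLUMN "{name}" TEXT')
            added.append(name)
    return added

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (ClientId TEXT, Name TEXT)")

# Hypothetical version 2 document that carries a new AssetType field.
doc = json.loads('{"ClientId": "21", "Name": "Acme", "AssetType": "Bond"}')
added = ensure_columns(conn, "clients", doc.keys())
print(added)
```

Calling the helper again with the same document adds nothing, so it can run before every load.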
"How can I ensure that I can load both client 20 and 21 into one SQL Server table without getting errors because the JSON data structure is not reflecting version 2 in the case of historical data?"
You maintain 2 separate tables. A Raw/Staging/Bronze table that has the same schema as the source, and a Cleansed/Warehouse/Silver table that has the desired schema for reporting. If you have multiple separate sources, you may have separate Raw tables.
Periodically you enhance the schema of the Cleansed table to add new data that has appeared in the Raw table.
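As a rough sketch of the Raw-to-Cleansed step, the mapping can read each raw JSON document against the current Cleansed column list and leave any field missing from an older version as NULL, so the historical JSON never has to be rewritten. The column list, the helper `to_cleansed_row`, and the sample documents are assumptions for illustration only.

```python
import json

# Hypothetical Cleansed schema after version 2 introduced AssetType.
CLEANSED_COLUMNS = ["ClientId", "Date", "AssetType"]

def to_cleansed_row(raw_json):
    """Map one raw JSON document to a Cleansed row; missing fields become None."""
    doc = json.loads(raw_json)
    return tuple(doc.get(col) for col in CLEANSED_COLUMNS)

raw_rows = [
    '{"ClientId": 20, "Date": "01-01-2021"}',                       # version 1
    '{"ClientId": 21, "Date": "01-05-2021", "AssetType": "Bond"}',  # version 2
]
cleansed = [to_cleansed_row(r) for r in raw_rows]
print(cleansed)
```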

Best way to merge JSON blob files to SQL table using Azure Data Factory

I have a bunch of JSON files coming into Azure Data Lake Gen 2. The JSON files contain new data as well as updates.
The data needs to be merged into a SQL table so I can start to do some reporting. The way I solved the problem was to create an Azure Data Factory pipeline that looks like this:
Create and copy to temp table:
First I use Copy data to take the JSON, create a table from its schema, and dump the content into the table.
Create delivery table:
Creates a table with the right schema if it doesn't already exist.
Merge temp with delivery:
Here I use a merge clause to cast and merge the data from the table that was created at step 1 with the table from step 2.
Delete temp data:
Deletes the table from step 1
This data factory gets triggered each time there's a new file in the data lake.
The pipeline solves my problem but I feel like there's a lot of unnecessary overhead by creating and dropping a new table each time I process a file.
Is there a way to optimize this flow, maybe by merging the JSON directly to the "Delivery" table?
Thanks in advance

Copy Data from Blob to SQL via Azure data factory

I have two sample files in Blob Storage, sample1.csv and sample2.csv, as shown below.
data sample
The SQL table is named sample2, with columns Name, id, last name, amount.
I created an ADF data flow without a schema; the result is shown below.
preview data
Source settings: Allow schema drift is checked.
Sink settings: Auto mapping is turned on, Allow insert is checked, and Table action is set to None.
I have also tried defining a schema in the dataset; the result is the same.
Any help here?
My expected outcome is that the data in sample1 is inserted with null in the column "last name".
If I understand correctly, you said: "my expected outcome would be data in sample1 will inserted null into the column last name". You only need to add a derived column for your sample1.csv file.
You could follow my steps:
I created a sample1.csv file in Blob Storage and a sample2 table in my SQL database:
Using DerivedColumn to create new column last name with null value:
expression: toString(null())
Sink settings:
Run the pipeline and check the data in table:
Hope this helps.
You cannot mix schemas in the same source in the same data flow execution.
Schema Drift will handle changes to the schema on an execution-per-execution basis.
But if you are reading multiple different schemas from a folder, you will get non-deterministic results.
Instead, if you loop through those files in a pipeline ForEach one-by-one, data flow will be able to handle the evolving schema.
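The one-file-at-a-time idea can be sketched in plain Python, outside Data Factory: each file is read with its own header and aligned to the target table's columns, with any column the file lacks filled with None (the role the derived column plays above). The file contents and the helper `rows_for_sink` are hypothetical; only the column names come from the question.

```python
import csv
import io

# Columns of the target SQL table sample2, as listed in the question.
TARGET_COLUMNS = ["Name", "id", "last name", "amount"]

def rows_for_sink(csv_text):
    """Read one CSV with its own header and align each record to the target columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [tuple(rec.get(col) for col in TARGET_COLUMNS) for rec in reader]

# Hypothetical file contents: sample1.csv has no "last name" column.
files = {
    "sample1.csv": "Name,id,amount\nAna,1,10\n",
    "sample2.csv": "Name,id,last name,amount\nBen,2,Lee,20\n",
}

loaded = []
for name, text in files.items():  # one file per iteration, like a ForEach loop
    loaded.extend(rows_for_sink(text))
print(loaded)
```

Each iteration sees a single, consistent schema, which avoids the non-deterministic results that come from reading mixed schemas in one execution.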

Problem saving data in a database column with enum data type

I have created a Laravel schema in which a column has the enum data type. By default, it should save Agent in the column, which works fine. When a user inserts some data, I am trying to change the value to Agency, but when I check the DB it still remains Agent:
Kindly assist?
Database schema
$table->enum('agent_or_agency', array('Agent','Agency'))->default('Agent');
Saving data in the above column from the logic
$data = new Agents();
$data->agent_or_agency = 'Agency';
$data->save();