Data Lake Analytics: Custom Outputter to write to different files? - azure-data-lake

I am trying to write a custom outputter for U-SQL that writes rows to individual files based on the data in one column.
For example, if the column contains the date "2016-01-01", that row is written to a file with that name, and the next row is written to a file named after the value in the same column of that row.
I am aiming to do this by using the Data Lake Store SDK within the outputter, creating a client and using the SDK functions to write to the individual files.
Is this a viable and possible solution?
I have seen that the function to be overridden for outputters is
public override void Output (IRow row, IUnstructuredWriter output)
in which the IUnstructuredWriter is cast to a StreamWriter (I saw one such example), so I assume the IUnstructuredWriter is passed to this function by the U-SQL runtime. That leaves me no control over what is passed here, and it also stays the same for all rows, so it cannot change.
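For reference, a bare-bones outputter (class and column names here are only illustrative) looks roughly like this; it shows that the writer wraps a single stream handed in by the runtime rather than anything the UDO opens itself:

using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;

// Illustrative skeleton only; "date" stands in for the column from the example above.
[SqlUserDefinedOutputter(AtomicFileProcessing = true)]
public class DateOutputter : IOutputter
{
    private StreamWriter writer;

    public override void Output(IRow row, IUnstructuredWriter output)
    {
        // The output stream is supplied by the runtime for the single target file;
        // the outputter cannot swap it for a different file per row.
        if (writer == null)
        {
            writer = new StreamWriter(output.BaseStream, Encoding.UTF8);
        }

        var date = row.Get<string>("date");
        writer.WriteLine(date);
    }

    public override void Close()
    {
        writer?.Flush();
    }
}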

This is currently not possible but we are working on this functionality in reply to this frequent customer request. For now, please add your vote to the request here: https://feedback.azure.com/forums/327234-data-lake/suggestions/10550388-support-dynamic-output-file-names-in-adla
UPDATE (Spring 2018): This feature is now in private preview. Please contact us via email (usql at microsoft dot com) if you want to try it out.

Related

How to update insert new record with updated value from staging table in Azure Data Explorer

I have a requirement where data is ingested from Azure IoT Hub. Sample incoming data:
{
  "message": {
    "deviceId": "abc-123",
    "timestamp": "2022-05-08T00:00:00+00:00",
    "kWh": 234.2
  }
}
I have the same column mapping in the Azure Data Explorer table. kWh always arrives as a cumulative value, not as the delta between two timestamps. Now I need another table that holds the difference between the last inserted kWh value and the current kWh.
It would be a great help if anyone has a suggestion or solution here.
I'm able to calculate the difference on the fly using prev(), but I need to update the table while inserting the data into it.
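The on-the-fly calculation I have is roughly along these lines (the table name is a placeholder):

EnergyReadings
| sort by timestamp asc
| extend kWh_delta = kWh - prev(kWh, 1)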
As far as I know, there is no way to manipulate the data on the fly while ingesting Azure IoT data into Azure Data Explorer through JSON mapping. However, I found a couple of approaches you can take to get the calculations you need. Both approaches involve creating a secondary table to store the calculated data.
Approach 1
This is the closest approach I found to on-the-fly data manipulation. For this to work you would need to create a function that calculates the difference in the kWh field for the latest entry. Once the function is created, you can bind it to the secondary (target) table with an update policy so that it triggers for every new entry in your source table.
Refer to the following resource, Ingest JSON records, which explains with an example how to create a function and bind it to the target table.
Note that you would have to create your own custom function that calculates the difference in kWh.
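As an illustration only, the binding could look something like the following KQL. The table names, function body, and columns are assumptions, and the target table is assumed to already exist with the columns the function produces; also note that an update-policy query only sees the rows of the current ingestion batch, so a real function would need to handle the first row of each batch (for example, by looking up the last value already stored in the target table).

.create-or-alter function CalcKwhDelta() {
    RawReadings
    | sort by timestamp asc
    | extend kWh_delta = kWh - prev(kWh, 1)
    | project deviceId, timestamp, kWh, kWh_delta
}

.alter table KwhDeltas policy update @'[{"IsEnabled": true, "Source": "RawReadings", "Query": "CalcKwhDelta()", "IsTransactional": false}]'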
Approach 2
If you do not need real-time data manipulation and your business can tolerate a 1-minute delay, you can create a query similar to the one below, which calculates the temperature difference from the source table (jsondata in my scenario) and writes it to the target table (jsondiffdata):
.set-or-append jsondiffdata <|
    jsondata
    | serialize
    | extend temperature = temperature - prev(temperature, 1)
    | project temperature, humidity, timesent
Refer to the following resource for more information on how to Ingest from query. You can use Microsoft Power Automate to schedule this query to run every minute.
Be cautious if you decide to go with the second approach, as it relies on serialization, which can prevent query parallelism in many scenarios. Review this resource on window functions and identify a query approach that is better optimized for your business needs.

How to import updated records from XML files into SQL Database?

For the last few weeks I have been trying to design an SSIS package that reads some XML files I have and moves their data into the multiple tables I want.
These files contain different nodes such as Individual (the parent node) and Address, Alias, Articles (all child nodes of Individual), etc.
Data in those files looks like this:
<Individuals>
  <Individual>
    <UniqueID>1001</UniqueID>
    <Name>Ben</Name>
    <Soft_Delete>N</Soft_Delete>
    <Soft_Delete_Date>NULL</Soft_Delete_Date>
  </Individual>
  <Addresses>
    <Address>
      <Address_Line_1>House no 280</Address_Line_1>
      <Address_Line_2>NY</Address_Line_2>
      <Country>US</Country>
      <Soft_Delete>N</Soft_Delete>
      <Soft_Delete_Date>NULL</Soft_Delete_Date>
    </Address>
    <Address>
      <Address_Line_1>street 100</Address_Line_1>
      <Address_Line_2>California</Address_Line_2>
      <Country>US</Country>
      <Soft_Delete>N</Soft_Delete>
      <Soft_Delete_Date>NULL</Soft_Delete_Date>
    </Address>
  </Addresses>
</Individuals>
I was successful in designing it and now I have a different task.
The files I had were named like this: Individual_1.xml, Individual_2.xml, Individual_3.xml, etc.
Now I have received some new files which are named like this:
Individual_UPDATE_20220716.xml, Individual_UPDATE_20220717.xml, Individual_UPDATE_20220718.xml, Individual_UPDATE_20220720.xml, etc.
Basically, these files contain either updated information for previously inserted records or entirely new records.
For example, a record or a particular piece of information, such as the Address of an Individual, was soft deleted.
Now I am wondering how I would design or modify my current SSIS package to update the data from these new files into my database?
Any guidance would be appreciated....
Thank you...
It looks like you have no problem reading the XML, so I won't really talk about that. @Yitzack's comment on his prior answer is one way to do it; however, his answer assumes you can create staging tables. To do this entirely inside SSIS, proceed as follows.
I would treat all the files the same (as long as they have the same data structure, which seems to be the case).
Read the XML as a source.
Proceed to a lookup.
Set the lookup to ignore errors (this is handled in the next step)
Point the lookup at the destination table, look up UniqueID, and add it to the data flow. Since you chose to ignore errors, SSIS will insert a null in that field when the lookup fails to find a match.
Add a Conditional Split based on destination.UniqueID == null; call that output inserts and rename the default output updates.
Add a SQL statement (typically an OLE DB Command) to update the existing record and map your row's columns to it; a sample statement is shown after this list. This is somewhat slow, which is why a merge through staging tables is better with large data sets.
Connect the updates output of the Conditional Split to the update SQL statement and map appropriately.
Add an insert destination, connect the inserts output from the Conditional Split, and map.
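A sketch of the update statement for that step is below; the table name is assumed, the columns come from the question's XML, and each ? placeholder is mapped to a data flow column in the transformation's editor.

-- Hypothetical destination table; map the ? parameters to data flow columns.
UPDATE dbo.Individual
SET    Name             = ?,
       Soft_Delete      = ?,
       Soft_Delete_Date = ?
WHERE  UniqueID = ?;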
Note: It looks like you are processing from a file system, and it is likely that order is important. You may have to order your foreach loop. I will provide a simple example that you can modify.
Create a filesInOrder variable of type object.
Create a script task.
Add filesInOrder as a read/write variable
Enter script...
var diFiles = new System.IO.DirectoryInfo(@"path to your folder").GetFiles("*.xml");
var files = diFiles.OrderBy(o => o.CreationTime).Select(s => s.FullName);
Dts.Variables["filesInOrder"].Value = files.ToArray();
Make sure you add using System.Linq; to your code.
Finally, use filesInOrder as the collection for your foreach loop, using the Foreach From Variable enumerator (the variable holds an array rather than an ADO recordset).

How to store and serve coupons with Google tools and javascript

I'll get a list of coupons by mail. They need to be stored somewhere (BigQuery?) from which I can fetch a coupon and send it to the user. Each user should only get one unique code that has not been used before.
I need the ability to get a code and record that it has been used, so that the next request gets the next code.
I know it is a completely vague question, but I'm not sure how to implement this. Does anyone have any ideas?
Thanks in advance.
There can be multiple solutions for the same requirement; one of them is given below:
Step 1. Get the coupons into a file (CSV, JSON, etc.) as per your preference/requirement.
Step 2. Load the source file into GCS (Cloud Storage).
Step 3. Write a Dataflow job that reads the data from the GCS file and loads it into a separate BigQuery table (tentative name: New_data); a sketch of this step is shown after this list. Sample code.
Step 4. Create a Dataflow job that reads the data from the BigQuery table New_data, compares it with History_data to identify new coupons, and writes the result to a file on GCS or to a BigQuery table. Sample code.
Step 5. Schedule the entire process with an orchestrator/Cloud Scheduler/cron job.
Step 6. Once you have the data, you can send it to consumers through any communication channel.
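A minimal sketch of Step 3, assuming the source is a one-column file in GCS with one coupon code per line and the destination is a BigQuery table New_data with a single code column; the project, bucket, region, and dataset names below are placeholders.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Placeholder project/bucket/region values; replace with your own.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadCoupons" >> beam.io.ReadFromText("gs://my-bucket/coupons.csv")
            | "ToRow" >> beam.Map(lambda line: {"code": line.strip()})
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:coupons.New_data",
                schema="code:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()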

Migration from Oracle to CDS using ADF

I am trying to migrate data from Oracle to an entity in Common Data Service (CDS) through an Azure Data Factory Copy activity. As CDS uses a GUID as its primary key and Oracle doesn't have a primary key, my pipeline always fails.
I tried to create an additional column in the source dataset with the value @guid(); however, it throws an error saying the column must be of type GUID.
I also tried
select REGEXP_REPLACE(SYS_GUID(), '(.{8})(.{4})(.{4})(.{4})(.{12})', '\1-\2-\3-\4-\5') MSSQL_GUID, c.* from table_name c;
but the GUID comes through as a string in the mapping.
How do we automatically generate a GUID in this scenario?
Could you please try updating your additional column (@guid()) data type from "type": "String" to "type": "Guid" by editing the JSON payload of your pipeline (look for the {} symbol at the top right corner of the pipeline editor)?
Update:
After further analysis in collaboration with the product team, type conversion was identified as an unsupported feature for the Dynamics sink: the UX disables type conversion for the Dynamics sink and has not supported it since the type conversion feature was released.
The product team has opened a work item for a feature improvement to support type conversion with the Dynamics sink. The ETA for this feature is mid-September (note: this is a tentative date), but the product team is actively working on it. I will closely monitor the work item and update this post as soon as I have additional information.
As a workaround, please try splitting the pipeline into two copy activities: Oracle -> CSV and CSV -> Dynamics. In the first copy, add an additional column to write an empty GUID column into the CSV file. In the second copy, change the type of the GUID column in the CSV to Guid and perform the copy.
Please let us know how it goes.

Web Test Conditional Flow

I have created a web test that is a series of web service requests. My data source contains a list of mobile numbers, and these mobile numbers can be of two types, A and B. The problem is that the data source contains a mix of A and B. When the test runs, it loads one mobile number from the data source (an XML file). I want to determine, while the test is running, the type of the mobile number (A or B), because depending on that I will send the appropriate message to the web server.
It is, however, possible for me to create a text file containing key-value pairs (mobile number, type) before running the tests. But adding a plugin that reads the whole file and then finds the mobile number's type would be too slow. Is it possible to have these mappings stored in memory for the entire duration of the test, so that I can just query them?
Thanks
Amare
Instead of using the XML file as the data source, use your new text file as the data source.
For example, if your data source is DataSource1, your file is numbers.csv, and you have columns mobile number and type, then in your test you can refer to the following context parameters:
DataSource1.numbers#csv.mobile#number
DataSource1.numbers#csv.type
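As an illustration, numbers.csv might contain rows like these (the values are made up):

mobile number,type
07700900001,A
07700900002,B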
Use a pair of String Comparison Conditional Rules to decide which request to execute depending on the value of DataSource1.numbers#csv.type.