Is it possible to rename a file with MD5 hash encoding in Azure Data Factory? - azure-data-factory-2

I have a scenario where I have a blob container with multiple binary files, and I would like to copy these files into another blob storage account using Azure Data Factory.
The issue is that my destination file names should be MD5 encoded. I would like to know whether it is possible in ADF to change the file name to its MD5 hash at the destination, and if yes, can someone outline the steps on how to do it?
I can copy the files, but I can't rename them to their MD5-encoded names.
Thanks
zzz

As far as I know about Data Factory, no, we can't.
The file name doesn't support an MD5 expression or function; the expression language only supports guid(). One idea is that you could try doing the renaming at the code level instead.
Ref: Expressions and functions in Azure Data Factory

Related

Trouble loading data into Snowflake using Azure Data Factory

I am trying to import a small table of data from Azure SQL into Snowflake using Azure Data Factory.
Normally I do not have any issues using this approach:
https://learn.microsoft.com/en-us/azure/data-factory/connector-snowflake?tabs=data-factory#staged-copy-to-snowflake
But now I have an issue with a source table that looks like this:
There are two columns, SLA_Processing_start_time and SLA_Processing_end_time, that have the data type TIME.
Somehow, while writing the data to the staging area, the data is changed to something like 0:08:00:00.0000000 / 0:17:00:00.0000000, and that causes an error like:
Time '0:08:00:00.0000000' is not recognized File
The mapping looks like this:
I have tried adding a TIME_FORMAT property like 'HH24:MI:SS.FF', but that did not help.
Any ideas why 08:00:00 becomes 0:08:00:00.0000000 and how to avoid it?
Finally, I was able to recreate your case in my environment.
I have the same error; a leading zero appears ahead of the time (0:08:00:00.0000000).
I even grabbed the files it creates on BlobStorage, and the zeros are already there.
This activity creates CSV text files without any error handling (double quotes, escape characters, etc.).
And on the Snowflake side, it creates a temporary Stage and loads these files.
Unfortunately, it does not clean up after itself and leaves empty directories on BlobStorage. Additionally, you can't use ADLS Gen2. :(
This connector in ADF is not very good; I even had problems using it with an AWS environment and had to set up a Snowflake account in Azure.
I've tried a few workarounds, and it seems you have two options:
Simple solution:
Change the data type on both sides to DateTime and then transform this attribute on the Snowflake side. If you cannot change the type on the source side, you can just use the "query" option and write a SELECT using the CAST / CONVERT function.
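For example, a minimal sketch of what that source-side query could look like; the table name dbo.SourceTable is an assumption, only the two TIME columns are taken from the question:
SELECT
    CAST(SLA_Processing_start_time AS datetime) AS SLA_Processing_start_time,  -- TIME becomes 1900-01-01 hh:mm:ss
    CAST(SLA_Processing_end_time AS datetime) AS SLA_Processing_end_time
FROM dbo.SourceTable;  -- add the remaining columns of your table as needed
On the Snowflake side the attribute can then be converted back to a TIME after it has been loaded.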
Recommended solution:
Use the Copy data activity to land your data on BlobStorage / ADLS (the direct copy did that anyway), preferably in the Parquet file format and with a self-designed folder structure (Best practices for using Azure Data Lake Storage).
Create a permanent Snowflake Stage for your BlobStorage / ADLS.
Add a Lookup activity and load the data from those files into a table there; you can use a regular query or write a stored procedure and call it (a Snowflake-side sketch follows after this list).
Thanks to this, you will have more control over what is happening, and you will be building a data lake solution for your organization.
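A minimal Snowflake-side sketch of the stage and the load step, assuming hypothetical names (adls_stage, my_target_table) and that the Copy data activity writes Parquet files to the given container path; account, container, path and SAS token are placeholders:
-- One-time setup: a permanent stage pointing at the location written by the Copy data activity
CREATE OR REPLACE STAGE adls_stage
  URL = 'azure://<storage-account>.blob.core.windows.net/<container>/<path>/'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
  FILE_FORMAT = (TYPE = PARQUET);
-- Load step issued from the Lookup activity (or wrapped in a stored procedure and called)
COPY INTO my_target_table
  FROM @adls_stage
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
  PURGE = TRUE;  -- remove the staged files after a successful load
MATCH_BY_COLUMN_NAME maps the Parquet columns to the table columns by name, and PURGE cleans up the files that the built-in connector would otherwise leave behind.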
My own solution is pretty close to the accepted answer, but I still believe that there is a bug in the built-in direct-to-Snowflake copy feature.
Since I could not figure out how to control the intermediate blob file that is created on a direct-to-Snowflake copy, I ended up writing a plain file into the blob storage and reading it again to load into Snowflake.
So instead of having it all in one step, I manually split it up into two actions:
One action takes the data from Azure SQL and saves it as a plain text file on the blob storage.
The second action then reads the file and loads it into Snowflake.
This works and is supposed to be basically the same thing the direct copy to Snowflake does, hence the bug assumption.

How to store files on a MySQL server

I need to send files like an image or PDF to a MySQL database from a VB.NET form. How can I do this? Is there a specific column type? What type of SQL query should I write to send the file?
You can upload and save the file in a folder on the server, and store the filename or path of the file in the database along with a unique identifier.
If you really want to save the file in the database (which is generally not recommended), you can use the BLOB data type. However, it is mostly recommended to save the file's path in a database column as a string while you store the file itself on local storage; when you need to display the file, you just read the path saved in your database column, and that gives you the file.
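A minimal MySQL sketch of the path-in-a-column approach; the table and column names are just examples:
-- Store only a reference to the file; the file itself lives in a folder on the server
CREATE TABLE uploaded_files (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    file_name   VARCHAR(255) NOT NULL,
    file_path   VARCHAR(500) NOT NULL,
    mime_type   VARCHAR(100),
    uploaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO uploaded_files (file_name, file_path, mime_type)
VALUES ('invoice.pdf', '/var/uploads/invoice.pdf', 'application/pdf');
-- Only if you really must keep the bytes in the database:
-- ALTER TABLE uploaded_files ADD COLUMN file_data LONGBLOB;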

How to read data stored in blobs

My application's database is PostgreSQL, and from the documentation I understand that it stores some data in a blob, and from the table I can only get its OID.
Is there any possibility to read the content from these blobs? If yes, could someone share the know-how?
From the OID, a file with the contents of the large object can be exported.
Either client-side (psql):
\lo_export oid-to-export /path/to/a/file
Or server-side in SQL (creates the file on the server, beware that postgres must have the permission to write into the destination directory).
SELECT lo_export(oid-to-export, '/path/to/a/file');
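If you only want to read the content from SQL rather than export it to a file, a sketch assuming PostgreSQL 9.4 or later (where lo_get() is available):
-- Returns the large object's content as bytea
SELECT lo_get(oid-to-export);
-- If the blob actually holds text, decode it into a string
SELECT convert_from(lo_get(oid-to-export), 'UTF8');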

Copy blob data into an on-premises SQL table

My problem statement is that I have a CSV blob and I need to import that blob into a SQL table. Is there a utility to do that?
I was thinking of one approach: first copy the blob to the on-premises SQL Server using the AzCopy utility, and then import that file into a SQL table using the bcp utility. Is this the right approach? I am also looking for a one-step solution to copy a blob into a SQL table.
Regarding your question about the availability of a utility which will import data from blob storage to a SQL Server, AFAIK there's none. You would need to write one.
Your approach seems OK to me, though you may want to write a batch file or something like that to automate the whole process. In this batch file, you would first download the file to your computer and then run the BCP utility to import the CSV into SQL Server. Other alternatives to writing a batch file are:
Do this thing completely in PowerShell.
Write some C# code which makes use of the storage client library to download the blob and, once the blob is downloaded, start the BCP process in your code.
To pull a blob file into an Azure SQL Server, you can use this example syntax (this actually works, I use it):
BULK INSERT MyTable
FROM 'container/folder/folder/file'
WITH (DATA_SOURCE = 'ds_blob', BATCHSIZE = 10000, FIRSTROW = 2);
MyTable has to have identical columns (or it can be a view against a table that yields identical columns)
In this example, ds_blob is an external data source which needs to be created beforehand (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql)
The external data source needs to use a database scoped credential, which uses an SAS key that you need to generate beforehand from blob storage (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql).
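A hedged sketch of that one-time setup; ds_blob is the data source name from the example above, while the credential name, storage account and SAS token are placeholders (the LOCATION deliberately stops at the account so the BULK INSERT path can start with the container name):
-- A database master key must exist before a database scoped credential can be created
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
-- SAS token generated on the storage account (note: without the leading '?')
CREATE DATABASE SCOPED CREDENTIAL cred_blob
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas-token>';
CREATE EXTERNAL DATA SOURCE ds_blob
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://<storage-account>.blob.core.windows.net',
      CREDENTIAL = cred_blob);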
The only downside to this method is that you have to know the filename beforehand - there's no way to enumerate the blobs from inside SQL Server.
I get around this by running PowerShell inside Azure Automation that enumerates the blobs and writes them into a queue table beforehand.

How to determine the content type of binary data in the image field of SQL Server 2008?

I need to determine the file type (i.e., MIME type) of data stored in SQL Server 2008.
Is there any way, if possible using a SQL query, to identify the content type or MIME type of the binary data stored in the image column?
I think that, if you need that information, it would probably be better to store it in a separate column. Once the data is in the DB, your only options really are guessing the type from the file name (if you happen to store that) or detecting the signature from the first few bytes of the data.
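A hedged T-SQL sketch of the signature-sniffing idea, assuming a hypothetical table dbo.Documents with an Id key and an image column named Content:
SELECT Id,
       CASE
           WHEN SUBSTRING(Content, 1, 3) = 0xFFD8FF           THEN 'image/jpeg'
           WHEN SUBSTRING(Content, 1, 8) = 0x89504E470D0A1A0A THEN 'image/png'
           WHEN SUBSTRING(Content, 1, 3) = 0x474946           THEN 'image/gif'        -- "GIF"
           WHEN SUBSTRING(Content, 1, 4) = 0x25504446         THEN 'application/pdf'  -- "%PDF"
           ELSE 'application/octet-stream'
       END AS GuessedMimeType
FROM dbo.Documents;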
There is no direct way in SQL Server to do that - there's no metadata on binary columns stored inside SQL Server, unless you've done it yourself.
For SQL Server, a blob is a blob is a blob - it's just a bunch of bytes, and SQL Server knows nothing about it, really. You need to have that information available from other sources, e.g. by storing a file name, file extension, mime type or something else in a separate column.
Marc