How to update file data in Azure Data Lake - azure-data-lake

I uploaded a file to Azure Data Lake. I want to upload the same file again with updated data and replace the existing file. What is the process?

Azure Data Lake Store is not designed for updating existing files in place (i.e. random writes).
Note: If you want to use the same filename and location, you can replace the file by overwriting it, or delete the existing file and then upload it again.
When uploading the file to Azure Data Lake Store again, make sure to select "Allow overwrite existing files".
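If you would rather overwrite the file programmatically than through the portal, here is a minimal sketch assuming an ADLS Gen2 account and the azure-storage-file-datalake Python package; the account name, file system, and file path are placeholders, not anything from the original question.

    import os
    from azure.storage.filedatalake import DataLakeServiceClient

    # Placeholders: replace with your own account, file system (container) and path.
    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential=os.environ["ADLS_ACCOUNT_KEY"],
    )
    file_client = service.get_file_system_client("myfilesystem").get_file_client("data/myfile.csv")

    # Re-upload the updated file; overwrite=True replaces the existing file.
    with open("myfile.csv", "rb") as updated:
        file_client.upload_data(updated, overwrite=True)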

Related

How to write to Blob Storage from Azure SQL Server using T-SQL?

I'm creating a stored procedure which gets executed when a CSV is uploaded to Blob Storage. This file is then processed using T-SQL, and I want to write the result to a file.
I have been able to read a file and process it using DATA_SOURCE, a database scoped credential, and an external data source. I'm however stuck on writing the output back to a different blob container. How would I do this?
If it were me, I'd use Azure Data Factory: you can create a pipeline that's triggered when a file is added to a blob container, have it import that file, run the stored procedure, and export the results to a blob.
Alternatively, this could be an Azure Function that is triggered on changes to a blob container.
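For the Azure Function route, a minimal sketch might look like this, assuming the Python programming model with a blob trigger input and a blob output binding configured in function.json; the stored procedure name, connection-string setting, and binding names are placeholders.

    import csv
    import io
    import os

    import azure.functions as func
    import pyodbc

    def main(inputblob: func.InputStream, outputblob: func.Out[str]) -> None:
        # Run the stored procedure that processes the newly uploaded CSV.
        conn = pyodbc.connect(os.environ["SQL_CONNECTION_STRING"])
        cursor = conn.cursor()
        cursor.execute("EXEC dbo.usp_ProcessCsv @BlobName = ?", inputblob.name)
        rows = cursor.fetchall()
        columns = [col[0] for col in cursor.description]

        # Serialise the result set to CSV; the output binding writes it to the
        # other blob container configured in function.json.
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(columns)
        writer.writerows(rows)
        outputblob.set(buffer.getvalue())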

Azure Function to convert a CSV to an Excel file

I have a requirement to read data from Azure SQL Server and write an Excel blob using Data Factory. I created a CSV file from Azure SQL Server using a Data Factory copy activity, but I have no idea how to convert the CSV to Excel, or how to produce an Excel file directly from Azure SQL Server using Data Factory. I searched the internet and found Azure Functions as an option.
Any suggestions on saving CSV to XLSX via Azure Functions?
Excel format is supported for the following connectors: Amazon S3, Azure Blob, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure File Storage, File System, FTP, Google Cloud Storage, HDFS, HTTP, and SFTP. It is supported as source but not sink.
As the documentation says, the Excel format is currently not supported as a sink, so you can't directly convert a CSV file to an Excel file using the Copy activity.
In an Azure Function, you can create a Python function and use pandas to read the CSV file, then convert it to an Excel file, as Marco Massetti comments.
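A minimal sketch of such a function, assuming the Python programming model with a blob trigger and a blob output binding, plus pandas and openpyxl installed; the binding names are placeholders.

    import io

    import azure.functions as func
    import pandas as pd

    def main(csvblob: func.InputStream, excelblob: func.Out[bytes]) -> None:
        # Read the triggering CSV blob into a DataFrame.
        df = pd.read_csv(io.BytesIO(csvblob.read()))

        # Write it back out as an .xlsx workbook via the output blob binding.
        buffer = io.BytesIO()
        df.to_excel(buffer, index=False, engine="openpyxl")
        excelblob.set(buffer.getvalue())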

Add metadata to a data lake file using ADF

Azure Data Factory v2 has a Get Metadata activity which can read metadata on the files stored in ADLS, and it can preserve that metadata when it moves or copies the files.
But is there a way to add or modify metadata on the lake files using ADF?
Yes, there's a way.
You can make use of the Azure Blob Storage API:
the Set Blob Metadata operation for Blob Storage.
Data Lake Storage is just an extension of the underlying Blob Storage engine.
So, you can hook up a Web activity in your pipeline and call the REST API against your blob, and it will set the metadata for you.
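The same Set Blob Metadata operation can also be called from the Python SDK instead of a Web activity; a minimal sketch, with the connection string, container, and blob path as placeholders:

    import os
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        os.environ["STORAGE_CONNECTION_STRING"],
        container_name="myfilesystem",
        blob_name="data/myfile.csv",
    )

    # Fetch the existing metadata first, since Set Blob Metadata replaces it wholesale.
    metadata = blob.get_blob_properties().metadata
    metadata["processedBy"] = "adf-pipeline"
    blob.set_blob_metadata(metadata=metadata)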
The metadata is created by Data Lake (Storage) once the files are uploaded.
These properties cannot be changed unless you delete the files and re-upload them to Data Lake (or Storage). Others have asked the same question on Stack Overflow about how to change this metadata; you can easily find those threads by searching.
However, if you modify the content of a file in Data Lake, such as adding or deleting columns, then the size, columnCount, and structure can change.
So for the question "is there a way to add or modify metadata on the lake files using ADF?", the answer is no, there isn't.
HTH.

Quickest way to import a large (50 GB) CSV file into an Azure database

I've just consolidated 100 CSV files into a single monster file with a total size of about 50 GB.
I now need to load this into my Azure database. Given that I have already created the table in the database, what would be the quickest method to get this single file into the table?
The methods I've read about include Import Flat File, Blob Storage/Data Factory, and BCP.
What is the quickest method someone can recommend, please?
Azure Data Factory should be a good fit for this scenario, as it is built to process and transform data without you having to worry about scale.
Assuming the large CSV file is stored somewhere on local disk and you do not want to move it to any external storage (to save time and cost), it would be better to simply create a self-hosted integration runtime pointing to the machine hosting your CSV file and a linked service in ADF to read the file. Once that is done, simply ingest the file and point it at the sink, which is your Azure SQL database.
https://learn.microsoft.com/en-us/azure/data-factory/connector-file-system
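If you want a quick scripted baseline to compare against before setting up the ADF pipeline, a chunked load from Python is one alternative. This is not the ADF method described above; the connection string, table name, and chunk size are placeholders, and BCP or the ADF copy activity will generally be faster for 50 GB.

    import pandas as pd
    import sqlalchemy

    # fast_executemany batches the parameterised INSERTs sent by pyodbc.
    engine = sqlalchemy.create_engine(
        "mssql+pyodbc://user:password@myserver.database.windows.net/mydb"
        "?driver=ODBC+Driver+18+for+SQL+Server",
        fast_executemany=True,
    )

    # Stream the 50 GB file in chunks so it never has to fit in memory.
    for chunk in pd.read_csv("monster.csv", chunksize=100_000):
        chunk.to_sql("MyTable", engine, if_exists="append", index=False)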

BigQuery - load a data source in Google BigQuery

I have a MySQL DB in AWS; can I use that database as a data source in BigQuery?
At the moment I'm uploading a CSV to a Google Cloud Storage bucket and loading it into BigQuery from there.
I would like to keep it synchronized by pointing BigQuery directly at the data source itself rather than loading it every time.
You can create a permanent external table in BigQuery that is connected to Cloud Storage. Then BQ is just the interface while the data resides in GCS. It can be connected to a single CSV file, and you are free to update/overwrite that file, but I'm not sure whether you can link BQ to a directory full of CSV files or even a tree of directories.
Anyway, have a look here: https://cloud.google.com/bigquery/external-data-cloud-storage
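A minimal sketch of creating such a permanent external table with the google-cloud-bigquery Python client; the bucket, dataset, and table names are placeholders.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Point the external table definition at the CSV object in Cloud Storage.
    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = ["gs://my-bucket/exports/mysql_dump.csv"]
    external_config.autodetect = True  # infer the schema from the file
    external_config.options.skip_leading_rows = 1  # skip the header row

    table = bigquery.Table("my-project.my_dataset.mysql_external")
    table.external_data_configuration = external_config
    client.create_table(table)  # queries now read the live file in GCS

Because the table only references the object in GCS, overwriting that CSV updates what queries see, which matches the update/overwrite behaviour described above.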