Azure Storage Calculation of Blob Container Size - azure-storage

Below I have a function, with an external helper, which calculates the size of a blob container:
function getBlobContainerSize(container, callback) {
    // Note: listBlobsSegmented returns a single segment (at most 5,000 blobs);
    // for larger containers the continuation token would also need to be followed.
    storageClient.listBlobsSegmented(container, null, null, function (error, result) {
        var length = 0,
            counter = result.entries.length;
        if (counter === 0) return callback(0); // empty container
        for (var i = 0; i < result.entries.length; i++) {
            getBlobSize(container, result.entries[i].name, function (size) {
                length += size;
                if (--counter === 0) callback(length); // all blob sizes collected
            });
        }
    });
}

function getBlobSize(container, blob, callback) {
    storageClient.getBlobProperties(container, blob, function (err, result, response) {
        if (result == null) callback(0);
        else callback(parseInt(result.contentLength, 10));
    });
}
For a specific container the result is 20907510 (bytes), while Azure Storage Explorer shows 84.03 MB. Correct me if I am wrong, but the two differ by roughly 64 MB; Azure Storage Explorer reports far more than my function does.

Considering you're using standard storage, the size your method calculates may not be the size Azure Storage bills you for. For page and block blobs in standard storage, Azure Storage bills for the data stored. More detailed information can be found here:
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/07/09/understanding-windows-azure-storage-billing-bandwidth-transactions-and-capacity.aspx
Your method gives you the sum of the content lengths of all the blobs in your container; Azure Storage might still be billing you for less than that.

Related

How to update an existing Blob in Azure Storage in .NET 6 or in ASP.NET Core

I have prepared some C# code to create a container in Azure Storage and then upload a file into that container. The code is below:
var connectionString = _settings.appConfig.StorageConnectionString;
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient blobContainer = blobServiceClient.GetBlobContainerClient("nasir-container");
await blobContainer.CreateIfNotExistsAsync(); // Create the container if it does not exist.
string fileName = "D:/Workspace/Adappt/MyWordFile.docx";
BlobClient blobClient = blobContainer.GetBlobClient(fileName); // Creating the blob client
FileStream uploadFileStream = System.IO.File.OpenRead(fileName);
blobClient.Upload(uploadFileStream);
uploadFileStream.Close();
Now I have updated my MyWordFile.docx with more content. Now I would like to upload this updated file to the same blob storage. How can I do this? I also want to create versioning too so that I can get the file content based on the version.
Now I have updated my MyWordFile.docx with more content. Now I would like to upload this updated file to the same blob storage. How can I do this?
To update a blob, you simply upload the same file again (basically the same code you wrote to upload the file in the first place). With the Azure.Storage.Blobs SDK used in your question, pass overwrite: true to Upload/UploadAsync so the upload replaces the existing blob instead of failing with a BlobAlreadyExists error.
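For example, here is a minimal sketch of re-uploading the updated file, reusing the connection string and container name from your code; the blob name used here is an assumption:
using System.IO;
using Azure.Storage.Blobs;

// Re-upload the updated local file over the existing blob.
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient blobContainer = blobServiceClient.GetBlobContainerClient("nasir-container");
BlobClient blobClient = blobContainer.GetBlobClient("MyWordFile.docx"); // assumed blob name

using (FileStream stream = File.OpenRead(@"D:\Workspace\Adappt\MyWordFile.docx"))
{
    // overwrite: true replaces the current content; with versioning enabled
    // on the account, the previous content is retained as an older version.
    await blobClient.UploadAsync(stream, overwrite: true);
}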
I also want to create versioning too so that I can get the file content based on the version.
There are two ways you can implement versioning for blobs:
Automatic versioning: If you want the Azure Blob Storage service to maintain versions of your blobs, all you need to do is enable versioning on the storage account. Once you do, a new version of a blob is created automatically by the service every time the blob is modified. Please see this link to learn more about blob versioning: https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview.
Manual versioning: Automatic versioning is great, but there could be many reasons to opt for manual versioning instead (e.g. you only want to version a few blobs and not all of them, or you are not using a V2 storage account). In that case you can create a version of a blob yourself by taking a snapshot of it before you update it. A snapshot is a read-only copy of the blob as it existed at the moment the snapshot was taken (see the sketch below). Please see this link to learn more about blob snapshots: https://learn.microsoft.com/en-us/azure/storage/blobs/snapshots-overview.
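A minimal sketch of the manual approach, assuming the same blobClient and local file as in your upload code; CreateSnapshotAsync and WithSnapshot are the relevant client methods:
using System.IO;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Take a snapshot (a read-only, point-in-time copy) before overwriting the blob.
Response<BlobSnapshotInfo> snapshotResponse = await blobClient.CreateSnapshotAsync();
string snapshotId = snapshotResponse.Value.Snapshot;

// Overwrite the base blob with the updated file.
using (FileStream stream = File.OpenRead(@"D:\Workspace\Adappt\MyWordFile.docx"))
{
    await blobClient.UploadAsync(stream, overwrite: true);
}

// Later, read the old content back from the snapshot.
BlobClient snapshotClient = blobClient.WithSnapshot(snapshotId);
Response<BlobDownloadInfo> oldContent = await snapshotClient.DownloadAsync();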
First you need to enable versioning for blob storage through the portal, in the storage account.
Just click Disabled; it will take you to a different page where you select Enable versioning and click Save.
After that, once a blob has been uploaded, updating it will automatically trigger the creation of new versions.
public static async Task UpdateVersionedBlobMetadata(BlobContainerClient blobContainerClient,
                                                     string blobName)
{
    try
    {
        // Create the container.
        await blobContainerClient.CreateIfNotExistsAsync();

        // Upload a block blob.
        BlockBlobClient blockBlobClient = blobContainerClient.GetBlockBlobClient(blobName);

        string blobContents = string.Format("Block blob created at {0}.", DateTime.Now);
        byte[] byteArray = Encoding.ASCII.GetBytes(blobContents);

        string initialVersionId;
        using (MemoryStream stream = new MemoryStream(byteArray))
        {
            Response<BlobContentInfo> uploadResponse =
                await blockBlobClient.UploadAsync(stream, null, default);

            // Get the version ID for the current version.
            initialVersionId = uploadResponse.Value.VersionId;
        }

        // Update the blob's metadata to trigger the creation of a new version.
        Dictionary<string, string> metadata = new Dictionary<string, string>
        {
            { "key", "value" },
            { "key1", "value1" }
        };

        Response<BlobInfo> metadataResponse =
            await blockBlobClient.SetMetadataAsync(metadata);

        // Get the version ID for the new current version.
        string newVersionId = metadataResponse.Value.VersionId;

        // Request metadata on the previous version.
        BlockBlobClient initialVersionBlob = blockBlobClient.WithVersion(initialVersionId);
        Response<BlobProperties> propertiesResponse = await initialVersionBlob.GetPropertiesAsync();
        PrintMetadata(propertiesResponse);

        // Request metadata on the current version.
        BlockBlobClient newVersionBlob = blockBlobClient.WithVersion(newVersionId);
        Response<BlobProperties> newPropertiesResponse = await newVersionBlob.GetPropertiesAsync();
        PrintMetadata(newPropertiesResponse);
    }
    catch (RequestFailedException e)
    {
        Console.WriteLine(e.Message);
        Console.ReadLine();
        throw;
    }
}

static void PrintMetadata(Response<BlobProperties> propertiesResponse)
{
    if (propertiesResponse.Value.Metadata.Count > 0)
    {
        Console.WriteLine("Metadata values for version {0}:", propertiesResponse.Value.VersionId);
        foreach (var item in propertiesResponse.Value.Metadata)
        {
            Console.WriteLine("Key:{0} Value:{1}", item.Key, item.Value);
        }
    }
    else
    {
        Console.WriteLine("Version {0} has no metadata.", propertiesResponse.Value.VersionId);
    }
}
The above code is from the Microsoft documentation on blob versioning.

Upload multipart blob with storage class RRS

Right now jclouds provides the ability to upload a blob to AWS S3 with one of two storage classes: STANDARD and RRS. As far as I can see, when a multipart upload is performed and RRS is selected as the storage class, the blob is uploaded with the default storage class, i.e. STANDARD.
For example:
blobStore.putBlob(null,blob,storageClass(ObjectMetadata.StorageClass.REDUCED_REDUNDANCY).multipart())
This will be uploaded with the STANDARD storage class.
Is there a reason the selected storage class is ignored when a multipart upload is performed?
Why is everything other than RRS treated as STANDARD? If I select STANDARD_IA, the storage class used is again STANDARD.
EDIT:
This is the code that is executed when a blob is uploaded. As you can see, using multipart bypasses the RRS storage class.
@Override
public String putBlob(String container, Blob blob, PutOptions options) {
    if (options.isMultipart()) {
        return putMultipartBlob(container, blob, options);
    } else if ((options instanceof AWSS3PutOptions) &&
            (((AWSS3PutOptions) options).getStorageClass() == REDUCED_REDUNDANCY)) {
        return putBlobWithReducedRedundancy(container, blob);
    } else {
        return super.putBlob(container, blob, options);
    }
}
jclouds 2.1.0 will improve support for storage classes in S3 and add support to the portable abstraction. Can you test against 2.1.0-SNAPSHOT and report any issues against JCLOUDS-1337?

How to limit blob storage file size in ASA output

I'm working with an Azure solution where ASA (Azure Stream Analytics) outputs to blob storage. I'm getting output files in a folder tree structure like this: yyyy/mm/dd/hh (e.g. 2017/10/26/07). Sometimes files keep being written to an hour folder after that hour has passed and, as a result, the files can become very big. Is there a way to limit the size of those files from ASA?
There is no way to limit the size today; the only limit is the blob's own size limit. However, ASA will create a new folder for every hour if your path is yyyy/mm/dd/hh. Please note that this is based on the System.Timestamp column, not wall-clock time.
Yes, you can limit the file size and create a new file once the existing file reaches the limit by using the Length property below.
namespace Microsoft.Azure.Management.DataLake.Store.Models {
    ...
    // Summary:
    //     Gets the number of bytes in a file.
    [JsonProperty(PropertyName = "length")]
    public long? Length { get; }
    ...
}
Below is an example scenario: if the file size exceeds 256 MB (268435456 bytes), create a new file; otherwise use the existing file.
Create a function to determine the file path; below is a sample code snippet for that function.
Code Snippet:
public static async Task<string> GetFilePath(DataLakeStoreClient client, string path) {
    var createNewFile = false;
    ......
    if (await client.GetFileSize(returnValue) >= 256 * 1024 * 1024)
    {
        returnValue = GetFilePath(path);
        createNewFile = true;
    }
    ......
}

public async Task<long?> GetFileSize(string filepath) {
    return (await this._client.FileSystem.GetFileStatusAsync(_connectionString.AccountName, filepath)).FileStatus.Length;
}
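For illustration, here is a minimal, self-contained sketch of the rollover check described in the scenario above. The getCurrentFileSize delegate and the numeric-suffix naming scheme are assumptions for the example, not part of Stream Analytics or the Data Lake SDK:
using System;
using System.Threading.Tasks;

public static class FileRollover
{
    private const long MaxFileSizeInBytes = 256L * 1024 * 1024; // 256 MB

    // Returns the path to write to: the current file if it is still under the
    // limit, otherwise a new file with an incremented numeric suffix.
    public static async Task<string> GetWritableFilePath(
        string basePath,
        int currentIndex,
        Func<string, Task<long?>> getCurrentFileSize) // e.g. wraps the GetFileSize helper above
    {
        string candidate = currentIndex == 0 ? basePath : $"{basePath}.{currentIndex}";
        long? size = await getCurrentFileSize(candidate);

        // A missing file (null size) or a file under the limit can still be used.
        if (size == null || size.Value < MaxFileSizeInBytes)
            return candidate;

        // The current file is full: move on to the next suffix.
        return await GetWritableFilePath(basePath, currentIndex + 1, getCurrentFileSize);
    }
}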

Search and move files in S3 bucket according to metadata

I currently have a setup where audio files are uploaded to a bucket with user-defined metadata. My next goal is to filter on that metadata and move the files to a different folder. Currently I have a Lambda function that converts the audio to MP3. So I need help adjusting the code so that the metadata persists through the encoding and is also stored in a database, and help creating another function that searches for a particular metadata value and moves the corresponding files to another bucket.
'use strict';

console.log('Loading function');

const aws = require('aws-sdk');
const s3 = new aws.S3({ apiVersion: '2006-03-01' });
const elastictranscoder = new aws.ElasticTranscoder();

// return basename without extension
function basename(path) {
    return path.split('/').reverse()[0].split('.')[0];
}

// return output file name with timestamp and extension
function outputKey(name, ext) {
    return name + '-' + Date.now().toString() + '.' + ext;
}

exports.handler = (event, context, callback) => {
    const bucket = event.Records[0].s3.bucket.name;
    const key = event.Records[0].s3.object.key;

    var params = {
        Input: {
            Key: key
        },
        PipelineId: '1477166492757-jq7i0s',
        Outputs: [
            {
                Key: basename(key) + '.mp3',
                PresetId: '1351620000001-300040', // mp3-128
            }
        ]
    };

    elastictranscoder.createJob(params, function(err, data) {
        if (err) {
            console.log(err, err.stack); // an error occurred
            context.fail();
            return;
        }
        context.succeed();
    });
};
I have also done some research and know that the metadata should be retrievable with:
s3.head_object(Bucket=bucket, Key=key)
S3 does not provide a mechanism for searching metadata.
The only way to do what you're contemplating using only native S3 capabilities is to iterate through the list of objects and send a HEAD request for each one. Of course, this does not scale well for large buckets, and each of those requests comes with a (small) charge.
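As a rough sketch of that approach (here in C# with the AWS SDK for .NET, since the idea is the same in any SDK; the bucket name and the metadata key being read are placeholders):
using System;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static class MetadataScanner
{
    // Lists every object in the bucket and issues one HEAD request per object
    // to read its user-defined metadata. This is O(n) HEAD requests.
    public static async Task ScanAsync(IAmazonS3 s3, string bucketName)
    {
        var listRequest = new ListObjectsV2Request { BucketName = bucketName };
        ListObjectsV2Response listResponse;

        do
        {
            listResponse = await s3.ListObjectsV2Async(listRequest);

            foreach (S3Object obj in listResponse.S3Objects)
            {
                GetObjectMetadataResponse head =
                    await s3.GetObjectMetadataAsync(bucketName, obj.Key);

                // User-defined metadata is exposed with the "x-amz-meta-" prefix.
                string genre = head.Metadata["x-amz-meta-genre"]; // placeholder key
                Console.WriteLine($"{obj.Key}: genre={genre}");
            }

            listRequest.ContinuationToken = listResponse.NextContinuationToken;
        } while (!string.IsNullOrEmpty(listResponse.NextContinuationToken));
    }
}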
There is now also the S3 inventory tool, which allows you to extract information about S3 objects, including metadata; that information can then be queried, for instance with Athena.
Details can be found here.

Windows Azure Storage Blobs to zip file with Express

I am trying to use this plugin (express-zip). On the Azure Storage side we have getBlobToStream, which writes the file into a given stream. What I do now is fetch the image from the blob, save it on the server, and then res.zip it. Is it somehow possible to create a write stream which will write into a read stream?
Edit: The question has been edited to ask about doing this in express from Node.js. I'm leaving the original answer below in case anyone was interested in a C# solution.
For Node, you could use a strategy similar to what express-zip uses, but instead of passing a file read stream in this line, pass in a blob read stream obtained using createReadStream.
Solution using C#:
If you don't mind caching everything locally while you build the zip, the way you are doing it is fine. You can use a tool such as AzCopy to rapidly download an entire container from storage.
To avoid caching locally, you could use the ZipArchive class, such as the following C# code:
internal static void ArchiveBlobs(CloudBlockBlob destinationBlob, IEnumerable<CloudBlob> sourceBlobs)
{
    using (Stream blobWriteStream = destinationBlob.OpenWrite())
    {
        using (ZipArchive archive = new ZipArchive(blobWriteStream, ZipArchiveMode.Create))
        {
            foreach (CloudBlob sourceBlob in sourceBlobs)
            {
                ZipArchiveEntry archiveEntry = archive.CreateEntry(sourceBlob.Name);

                using (Stream archiveWriteStream = archiveEntry.Open())
                {
                    sourceBlob.DownloadToStream(archiveWriteStream);
                }
            }
        }
    }
}
This creates a zip archive in Azure storage that contains multiple blobs without writing anything to disk locally.
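For example, one possible way to call it (a sketch assuming the classic WindowsAzure.Storage client used above; the connection string, container names, and zip blob name are placeholders):
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Connect and pick the source container and the destination zip blob.
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudBlobContainer sourceContainer = blobClient.GetContainerReference("images");
CloudBlobContainer archiveContainer = blobClient.GetContainerReference("archives");
CloudBlockBlob zipBlob = archiveContainer.GetBlockBlobReference("images.zip");

// Zip every blob in the source container directly into the destination blob.
var sourceBlobs = sourceContainer.ListBlobs(useFlatBlobListing: true).OfType<CloudBlob>();
ArchiveBlobs(zipBlob, sourceBlobs);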
I'm the author of express-zip. What you are trying to do should be possible. If you look under the covers, you'll see I am in fact adding streams into the zip:
https://github.com/thrackle/express-zip/blob/master/lib/express-zip.js#L55
So something like this should work for you (prior to me adding support for this in the interface of the package itself):
var zip = zipstream(exports.options);
zip.pipe(res); // res is a writable stream

var addFile = function(file, cb) {
    // Pass a readable blob stream (createReadStream) rather than getBlobToStream,
    // which expects a destination stream.
    zip.entry(blobService.createReadStream(container, file.name), { name: file.name }, cb);
};

async.forEachSeries(files, addFile, function(err) {
    if (err) return cb(err);
    zip.finalize(function(bytesZipped) {
        cb(null, bytesZipped);
    });
});
Apologies if I've made horrible errors above; I haven't worked on this in a while.