Replication between two storage accounts in different regions, must be read/writeable and zone redundant - azure-storage

We are setting up an active/active configuration using either Front Door or Traffic Manager as our front end. Our services are located in the paired regions Central US and East US 2, with an AKS cluster in each region. Each AKS cluster writes data to a storage account in its own region, but the files in the two storage accounts must be identical. The storage accounts must be zone redundant and readable/writable in each region at all times, so none of the Microsoft replication strategies work for us. The replication must be automatic; we can't have any manual process do the copy. I looked at Data Factory, but it seems to be regional, so I don't think that would work, although it might be a possibility. Does anyone have any suggestions on the best way to accomplish this task?

I have tested this in my environment.
Replication between two storage accounts can be implemented using a Logic App.
In the Logic App, we can create two workflows: one for replicating data from storage account 1 to storage account 2, and the other for replicating data from storage account 2 to storage account 1.
I have tried to replicate blob data between storage accounts in different regions.
The workflow is:
When a blob is added or modified in storage account 1, the blob is copied to storage account 2.
Trigger: When a blob is added or modified (properties only) (V2) (use the connection settings of storage account 1)
Action: Copy blob (V2) (use the connection settings of storage account 2)
In a similar way, we can create another workflow for replicating data from storage account 2 to storage account 1.
Now the data will be replicated between the two storage accounts.
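If you would rather keep the copy logic in your own code (for example, in an Azure Function or a small service running next to each AKS cluster) instead of a Logic App, the same copy-on-write idea can be sketched with the Azure.Storage.Blobs SDK. This is only an illustrative sketch; the connection strings, container and blob names, and the SAS lifetime below are assumptions, not part of the answer above.

using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

// Placeholder connection strings for the two regional storage accounts.
var sourceService = new BlobServiceClient("<storage-account-1-connection-string>");
var targetService = new BlobServiceClient("<storage-account-2-connection-string>");

// Example: replicate a blob named "example.txt" from the "data" container (placeholder names).
await ReplicateBlobAsync("data", "example.txt");

// Copy a single blob that was just added or modified in account 1 over to account 2.
async Task ReplicateBlobAsync(string containerName, string blobName)
{
    var sourceBlob = sourceService.GetBlobContainerClient(containerName).GetBlobClient(blobName);
    var targetBlob = targetService.GetBlobContainerClient(containerName).GetBlobClient(blobName);

    // The target account needs read access to the source blob, e.g. via a short-lived SAS.
    Uri sourceUri = sourceBlob.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddMinutes(15));

    // Server-side asynchronous copy; the blob data does not flow through this process.
    await targetBlob.StartCopyFromUriAsync(sourceUri);
}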


Azure Sentinel referencing large sets of data

I've been trying to find the most effective (elegant) solution to achieve what I'm trying to do. I'd like to hear from the community, thank you.
Situation:
Need to geo-enrich IP Address records on Sentinel. Example: Successful SigninLogs, since MSFT enrichment sometimes generates "Unknown" results in the IP enrichment maps.
External reference files (subnet, country_code, country_name) are available publicly; however, the size and number of records are rather large (~12 MB, 200K+ records).
Issue:
Tried using a storage account blob to host the "reference table", but apparently hit the limit on the maximum blob size in the storage account.
It looks like Workbooks can read a maximum of about 30,000 records from external sources using the 'externaldata' command, so only partial reference data can be read and referred to.
Options considered:
Ingest the reference table into the log analytics workspace, do a join/lookup to this custom reference table for enrichment
Export the IP addresses from the SigninLogs table to blob storage, enrich the IP addresses using Logic Apps, and then put them back into a 'reference' blob storage; then read the 'reference' blob storage using the 'externaldata' syntax.
Limitation Observed:
Came to the realization that Sentinel can't perform API calls for enrichment from external data (correct me if I'm wrong). I've done similar things with Splunk, where we could enrich the data on the fly by making multiple API calls to an outside database.
Ingest the Data - As you've mentioned, ingest the data and join the tables. You would need to ingest this regularly, though, to ensure you can look up the data within the desired time range (e.g. an Analytics Rule only looks up data for a 14-day period).
Use a Playbook - If you want the Geo-IP lookup post-incident, you can perform this with a Logic App.
Use Jupyter Notebooks - These have the flexibility to perform API calls against external locations and join the data to what is hosted in Sentinel. An example notebook is the IP Explorer Notebook; see "Use Jupyter notebooks to hunt for security threats".
Threat Intelligence - Microsoft enriches all imported threat intelligence indicators with GeoLocation and WhoIs data, which is displayed together with other indicator details.
Since March 2022, you can upload large CSV files into a Sentinel Watchlist. This way, you can upload a complete GeoIP database and perform ipv4_lookup enrichment. This blog post explains how to do this: https://cryptsus.com/blog/enrich-geolocation-sentinel-siem.html
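As a rough illustration of the watchlist approach, the enrichment can be expressed as a KQL query that matches SigninLogs against the uploaded GeoIP watchlist with the ipv4_lookup plugin, and run from code if needed. The watchlist alias ('GeoIPData'), its column names ('network', 'country_name'), and the use of the Azure.Monitor.Query client are assumptions made only for this sketch.

using System;
using Azure.Identity;
using Azure.Monitor.Query;

var client = new LogsQueryClient(new DefaultAzureCredential());

// KQL: enrich successful sign-ins with country data from the watchlist.
string query = @"
let geo = _GetWatchlist('GeoIPData')
    | project network = tostring(network), country_name = tostring(country_name);
SigninLogs
| where ResultType == '0'                        // successful sign-ins
| evaluate ipv4_lookup(geo, IPAddress, network)  // match IPAddress against the CIDR 'network' column
| project TimeGenerated, UserPrincipalName, IPAddress, country_name";

var result = await client.QueryWorkspaceAsync(
    "<log-analytics-workspace-id>",
    query,
    new QueryTimeRange(TimeSpan.FromDays(1)));

foreach (var row in result.Value.Table.Rows)
    Console.WriteLine(string.Join(" | ", row));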

S3 Integration with Snowflake: Best way to implement multi-tenancy?

My team is planning on building a data processing pipeline that will involve S3 integration with Snowflake. This article from Snowflake shows that an AWS IAM role must be created in order for Snowflake to access S3's data.
However, in our pipeline, we need to ensure multi-tenancy and data isolation between users. For example, let's assume that Alice and Bob have files in S3 under "s3://bucket-alice/file_a.csv" and "s3://bucket-bob/file_b.csv" respectively. Then we want to make sure that, when staging Alice's data onto Snowflake, Alice can only access "s3://bucket-alice" and nothing under "s3://bucket-bob". This means that individual AWS IAM roles must be created for each user.
I do realize that Snowflake has its own access control system, but my team wants to make sure that data isolation is fully achieved from the S3-to-Snowflake stage of the pipeline, rather than relying only on Snowflake's access control.
We are worried that this will not be scalable, as AWS sets a limit of 5000 IAM users, and that will not be enough as we scale our product. Is this the only way of ensuring data multi-tenancy, and does anyone have a real-world application example of something like this?
Have you explored leveraging Snowflake's Internal Stage, instead? By default, every user gets their own internal stage that only they have permissions to from within Snowflake and NO access outside of Snowflake. Snowflake offers the ability to move data in and out of that Internal Stage using just about every driver/connector that Snowflake has available. This said, any pipeline/workflow that is being leveraged by 5000+ users would be able to use these connectors to load data to Snowflake Internal Stage (S3) without the need for any additional AWS IAM Users. Would that be a sufficient solution for your situation?
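As a rough sketch of the internal-stage idea, each tenant's process could load its file through its own Snowflake user, whose user stage (@~) is only visible to that user, so no per-tenant AWS IAM role is needed. The connection parameters, file path, and table name below are placeholders, and the sketch assumes a Snowflake .NET driver version that supports the PUT command (older versions do not).

using System;
using Snowflake.Data.Client;

// Each tenant authenticates as its own Snowflake user, so @~ is isolated per tenant.
using var conn = new SnowflakeDbConnection();
conn.ConnectionString = "account=<account>;user=<alice>;password=<secret>;db=<db>;schema=<schema>;warehouse=<wh>";
conn.Open();

using var cmd = conn.CreateCommand();

// Upload the local file into Alice's user stage (assumes PUT support in the driver).
cmd.CommandText = "PUT file://C:/data/file_a.csv @~ AUTO_COMPRESS=TRUE";
cmd.ExecuteNonQuery();

// Load from the user stage into a tenant-scoped table; no AWS IAM role is involved.
cmd.CommandText = "COPY INTO alice_raw FROM @~/file_a.csv.gz FILE_FORMAT=(TYPE=CSV SKIP_HEADER=1)";
cmd.ExecuteNonQuery();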

Many 4 character storage containers being created in my storage account

I have an Azure storage account.
For a while now, something has been creating 4-character empty containers as shown here; there are hundreds of them.
This storage account is used by:
Function Apps
Document Db (Cosmos)
Terraform State
Container Registry for Docker images
It's not a big deal but I don't want millions of empty containers being created by an unknown process.
Note 1: I have looked for a way to find more statistics/history for these containers, but I can't find any.
Note 2: We don't have any custom code that creates storage containers in our release pipelines (i.e., PowerShell or CLI).
thanks
Russ
It seems the containers are used to store logs for Azure Functions. I have a storage account used only by an Azure Function and a web app, and I can see that it has containers like yours via Storage Explorer.
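If you want to audit these containers yourself, here is a minimal sketch (assuming the Azure.Storage.Blobs SDK and a placeholder connection string) that lists every container and flags the empty ones:

using System;
using System.Linq;
using Azure.Storage.Blobs;

var service = new BlobServiceClient("<storage-account-connection-string>");

// Enumerate every container and report whether it contains any blobs.
foreach (var container in service.GetBlobContainers())
{
    var client = service.GetBlobContainerClient(container.Name);
    bool isEmpty = !client.GetBlobs().Any();
    Console.WriteLine($"{container.Name} (last modified {container.Properties.LastModified:u}) empty={isEmpty}");
}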

Windows Azure File storage performance

Is there a way to get the information below for an Azure File storage account using the Windows Azure Storage Client Library:
Azure Storage Account Capacity
Azure Storage Free and used Space
Azure Storage Account state (Active, Disabled, Enabled, …)
Client file transfers (MB, GB, …) per month, per day, …
Azure Storage Account Performance
...
Thanks
As far as I know, an Azure standard storage account contains multiple services: blob, table, queue, and file.
If you want information about the file service, you can use the Windows Azure Storage Client Library. If you want information about the storage account itself, I suggest you use the Azure management library.
Azure Storage Account Capacity
As far as I know, the Azure storage account capacity is 500 TB.
The max size of a file share is 5 TB.
The max size of a file is 1 TB.
We can create multiple file shares in one storage account; the only limit is the 500 TB storage account capacity.
For more details, you can refer to this article.
Azure Storage Free and used Space
As far as I know, we can only get the quota and usage of a file share by using the Windows Azure Storage Client Library.
We can use the CloudFileShare.Properties.Quota property to get the quota of the file share and the CloudFileShare.GetStats method to get its usage.
For more details, refer to the code below:
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.File;

CloudStorageAccount storageAccount = CloudStorageAccount.Parse("connectionstring");
CloudFileClient fileClient = storageAccount.CreateCloudFileClient();
CloudFileShare share = fileClient.GetShareReference("fileshare");
share.FetchAttributes();
// Get the quota (in GB) configured on the share.
int? quota = share.Properties.Quota;
// Get the current usage of the share.
ShareStats stats = share.GetStats();
Console.WriteLine(quota);
Console.WriteLine(stats.Usage);
Azure Storage Account State (Active, Disable, Enable ….)
As far as I know, we can't get the storage account state by using the storage SDK. If you want this value, I suggest you use the Azure management library, which you can install as a NuGet package. You can get StorageAccount.Properties.Status from the StorageAccounts class.
For more details about how to use the Azure management library to access the storage account, you can refer to this article.
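A minimal sketch of reading the account status, using the newer Microsoft.Azure.Management.Storage (ARM) library rather than the classic management library the answer refers to; the service principal credentials, resource group, and account names are placeholders.

using System;
using Microsoft.Azure.Management.Storage;
using Microsoft.Rest.Azure.Authentication;

// Authenticate with a service principal (placeholder values).
var credentials = await ApplicationTokenProvider.LoginSilentAsync(
    "<tenant-id>", "<client-id>", "<client-secret>");

var storageClient = new StorageManagementClient(credentials)
{
    SubscriptionId = "<subscription-id>"
};

// Read the account's provisioning state and primary-location status.
var account = storageClient.StorageAccounts.GetProperties("<resource-group>", "<account-name>");
Console.WriteLine(account.ProvisioningState);   // e.g. Succeeded
Console.WriteLine(account.StatusOfPrimary);     // e.g. Available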
Client file transfers (MB, GB, …) per month, per day, …
As far as I know, the Windows Azure Storage Client Library doesn't contain a method to get the client file transfers (MB, GB, …) per month or per day.
Here is a workaround: you can write code in your application to count the transferred files and store this number in Azure Table storage per day (when uploading a file to Azure File storage, first get the number from the table, add one, then write the number back to the table), as sketched below.
If you want to get the number of transferred files later, you can use the Azure Table storage SDK to read it back.
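A minimal sketch of that daily counter, using the same Windows Azure Storage Client Library as above; the table name and keys are placeholders.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

CloudStorageAccount storageAccount = CloudStorageAccount.Parse("connectionstring");
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("filetransfers");
table.CreateIfNotExists();

// One row per day: PartitionKey = "transfers", RowKey = the date.
string rowKey = DateTime.UtcNow.ToString("yyyy-MM-dd");
var result = table.Execute(TableOperation.Retrieve<DynamicTableEntity>("transfers", rowKey));
var entity = (DynamicTableEntity)result.Result ?? new DynamicTableEntity("transfers", rowKey);

// Increment the counter each time a file is uploaded.
int current = entity.Properties.ContainsKey("Count") ? entity.Properties["Count"].Int32Value ?? 0 : 0;
entity.Properties["Count"] = EntityProperty.GeneratePropertyForInt(current + 1);
table.Execute(TableOperation.InsertOrReplace(entity));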
Azure Storage Account Performance
As far as I know, if we want to check our Azure storage account performance, we should first enable diagnostics to log how the storage works. Then we can check the storage performance by using the service's metrics.
For more details about how to access metrics data by using the Windows Azure Storage Client Library, I suggest you refer to this article.
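As a modern alternative to the classic metrics tables described above (not what the original answer uses), account-level throughput can also be read from Azure Monitor. A sketch using the Azure.Monitor.Query package, with a placeholder resource ID:

using System;
using Azure.Identity;
using Azure.Monitor.Query;

var metricsClient = new MetricsQueryClient(new DefaultAzureCredential());

// Full ARM resource ID of the storage account (placeholder).
string resourceId = "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>";

// Query standard storage metrics such as Transactions, Ingress, and Egress.
var response = await metricsClient.QueryResourceAsync(resourceId, new[] { "Transactions", "Ingress", "Egress" });

foreach (var metric in response.Value.Metrics)
    foreach (var series in metric.TimeSeries)
        foreach (var point in series.Values)
            Console.WriteLine($"{metric.Name} {point.TimeStamp:u} total={point.Total}");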

What is default storage account in HDInsight

For a given HDInsight cluster I have seen that there is a 'Default Storage Account' and a 'Linked Storage Account'. What does that mean? What is special about an account being the default storage account for a given HDInsight cluster, and how is it different from any arbitrary storage account with respect to that cluster? Is it perhaps that whenever we access that storage account from the cluster it won't ask for keys?
And how is that different from a 'Linked Storage Account' for a given HDInsight cluster? I have seen that there is generally one default storage account for an HDInsight cluster but several linked storage accounts.
The default storage account is like a system drive: log files are stored there, and each cluster must have one. Sharing a default storage account between two clusters is not supported, and there are also some issues with reusing a default storage account many times.
You can have many linked storage accounts; people usually store business data in them. In the past, you could only link a storage account at cluster creation time; now you can use Ambari to add linked storage accounts to a live Linux-based cluster.
see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage/