Auditing Azure File Storage Service - azure-storage

It is documented that Storage Analytics logging currently does not work for the File storage service:
Storage Analytics metrics are available for the Blob, Queue, Table, and File services.
Storage Analytics logging is available for the Blob, Queue, and Table services.
https://learn.microsoft.com/en-us/rest/api/storageservices/enabling-and-configuring-storage-analytics
Knowing this, I was hoping I could identify File service usage via the metrics; however, I wasn't able to isolate anything I could conclusively attribute to file usage. The capacity didn't seem to go up, and I couldn't isolate ingress/egress as being just for files.
What is the best way to audit File service usage?

There is a workaround for getting metrics/analytics on storage services, specifically Azure Files; it is not part of Storage Analytics yet.
The .NET SDK has an option that lets you view the different metrics, but you have to use the resource ID for the service; this is done via Azure Storage metrics in Azure Monitor:
If you want to list the metric definitions for blob, table, file, or queue, you must specify different resource IDs for each service with the API.
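For example, the per-service resource IDs follow this pattern (this is the standard Azure Monitor resource ID format for storage sub-services; replace the placeholders with your own values), with the file service entry being the one relevant to auditing File usage:
/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/blobServices/default
/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/fileServices/default
/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/queueServices/default
/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/tableServices/default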
Code Sample:
public static async Task ListStorageMetricDefinition()
{
    // Resource ID for the storage account
    var resourceId = "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}";
    var subscriptionId = "{SubscriptionID}";
    // How to identify Tenant ID, Application ID and Access Key: https://azure.microsoft.com/documentation/articles/resource-group-create-service-principal-portal/
    var tenantId = "{TenantID}";
    var applicationId = "{ApplicationID}";
    var accessKey = "{AccessKey}";
    // Using metrics in Azure Monitor is currently free. However, if you use additional solutions that ingest metrics data, you may be billed by those solutions. For example, you are billed by Azure Storage if you archive metrics data to an Azure Storage account, or by Operations Management Suite (OMS) if you stream metrics data to OMS for advanced analysis.
    MonitorClient readOnlyClient = await AuthenticateWithReadOnlyClient(tenantId, applicationId, accessKey, subscriptionId);
    IEnumerable<MetricDefinition> metricDefinitions = await readOnlyClient.MetricDefinitions.ListAsync(resourceUri: resourceId, cancellationToken: new CancellationToken());

    foreach (var metricDefinition in metricDefinitions)
    {
        // Enumerate each metric definition:
        //   Id
        //   ResourceId
        //   Name
        //   Unit
        //   MetricAvailabilities
        //   PrimaryAggregationType
        //   Dimensions
        //   IsDimensionRequired
    }
}
Source: Azure Storage metrics in Azure Monitor
You can also view these metrics in the Azure portal: open the storage account's Metrics blade under Monitoring and select the File metric namespace.

Related

Integrate BigQuery, PubSub and Cloud Functions

I'm working on a project where we need to use BigQuery, PubSub, Logs Explorer and Cloud Functions.
The project:
Every time a certain event occurs (like a user accepting cookies), a system runs a new insert query in BigQuery with a lot of columns (params) like utm_source, utm_medium, consent_cookies, etc...
Once this new data is in my table, I need to read the columns and get the values to use in a Cloud Function.
In the Cloud Function I want to use those values to make API calls.
What I have managed to do so far:
I created a log routing sink that filters the new entries and sends the log to my PubSub topic.
Where I'm stuck:
I want to create a Cloud Function that triggers every time a new log comes in, and in that function I want to access the information contained in the log, such as utm_source, utm_medium, consent_cookies, etc..., and use those values to make API calls.
Can anyone help me? Many MANY thanks in advance!
I made a project to illustrate the flow:
1. Insert into the table:
2. From this insertion, create a sink in Logging (filtering):
Now every time I run a new insert query, it goes to PubSub and I get the log of the query.
What I want to do is trigger a function on this topic and use the values I have in the query to do operations like calling APIs, etc...
So far I was able to write this code:
"use strict";
function main() {
// Import the Google Cloud client library
const { BigQuery } = require("#google-cloud/bigquery");
async function queryDb() {
const bigqueryClient = new BigQuery();
const sqlQuery = `SELECT * FROM \`mydatatable\``;
const options = {
query: sqlQuery,
location: "europe-west3",
};
// Run the query
const [rows] = await bigqueryClient.query(options);
rows.forEach((row) => {
const username = row.user_name;
});
}
queryDb();
}
main();
Now I'm stuck again; I don't know how to get the correct query from the sink I created and use its values to make my calls...
You have 2 options for calling your Cloud Function from a PubSub message:
HTTP functions: you can set up an HTTP call. Create your Cloud Function with an HTTP trigger, and create a push subscription on your PubSub topic that calls the Cloud Function. Don't forget to add security (make your function private and enable authentication on the push subscription), because otherwise your function is publicly accessible.
Background functions: you can bind your Cloud Function directly to the PubSub topic. A subscription is automatically created and linked to the Cloud Function, and the security is built in.
Because there are 2 types of functions, there are 2 different function signatures. Both are shown below; the processing is (almost) the same.
function extractQuery(pubSubMessage) {
  // Decode the base64-encoded PubSub message
  let logData = Buffer.from(pubSubMessage, 'base64').toString();
  // Parse it as JSON
  let logMessage = JSON.parse(logData);
  // Extract the query from the log entry
  let query = logMessage.protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.query.query;
  console.log(query);
  return query;
}

// For HTTP functions
exports.bigqueryQueryInLog = (req, res) => {
  console.log(req.body);
  const query = extractQuery(req.body.message.data);
  res.status(200).send(query);
};

// For Background functions
exports.bigqueryQueryInLogTopic = (message, context) => {
  extractQuery(message.data);
};
The logged query is the insert into... statement that appears in your log entry. You then have to parse the SQL to extract the values that you want.

Azure blob Storage: Copy blobs with access tier ARCHIVE within the same Azure storage account is not working

I'm using the startCopy API from the azure-storage Java SDK version 8.6.5 to copy blobs between containers within the same storage account. As per the docs, it will copy a block blob's contents, properties, and metadata to a new block blob. Does this also mean the source and destination access tiers will match?
String copyJobId = cloudBlockBlob.startCopy(sourceBlob);
If the source blob's access tier is ARCHIVE, I get the following exception:
com.microsoft.azure.storage.StorageException: This operation is not permitted on an archived blob.
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87) ~[azure-storage-8.6.5.jar:?]
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:305) ~[azure-storage-8.6.5.jar:?]
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:196) ~[azure-storage-8.6.5.jar:?]
at com.microsoft.azure.storage.blob.CloudBlob.startCopy(CloudBlob.java:791) ~[azure-storage-8.6.5.jar:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.startCopy(CloudBlockBlob.java:302) ~[azure-storage-8.6.5.jar:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.startCopy(CloudBlockBlob.java:180) ~[azure-storage-8.6.5.jar:?]
I used the startCopy API to copy all blobs in container02 (source) to container03 (destination). The blob with access tier ARCHIVE failed, and the test1.txt blob's access tier in the destination is not the same as in the source.
I just want to confirm whether this is expected, or whether I'm not using the right API and need to set these properties explicitly if I want the source and destination to look the same.
Thanks in advance!!!
1. Blob with access tier ARCHIVE failed
You cannot execute the startCopy operation when the access tier is ARCHIVE.
Please refer to this official documentation:
While a blob is in archive storage, the blob data is offline and can't be read, overwritten, or modified. To read or download a blob in archive, you must first rehydrate it to an online tier. You can't take snapshots of a blob in archive storage. However, the blob metadata remains online and available, allowing you to list the blob, its properties, metadata, and blob index tags. Setting or modifying the blob metadata while in archive is not allowed; however you may set and modify the blob index tags. For blobs in archive, the only valid operations are GetBlobProperties, GetBlobMetadata, SetBlobTags, GetBlobTags, FindBlobsByTags, ListBlobs, SetBlobTier, CopyBlob, and DeleteBlob.
2. test1.txt blob's access tier is not the same as in the source
Since this startCopy overload does not specify a tier, the access tier of the copied blob is probably determined by the account's default access tier rather than by the source blob.
Solution:
You may need to move the files from archive storage to the hot or cool access tier. Or you can use this API and specify standardBlobTier and rehydratePriority:
public final String startCopy(final CloudBlockBlob sourceBlob, String contentMd5, boolean syncCopy, final StandardBlobTier standardBlobTier, RehydratePriority rehydratePriority, final AccessCondition sourceAccessCondition, final AccessCondition destinationAccessCondition, BlobRequestOptions options, OperationContext opContext)
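For example, a minimal sketch of calling that overload (assuming the same azure-storage 8.6.5 client, and that container02/container03 are the CloudBlobContainer references from your setup; the blob name and tier choice are placeholders), asking the service to place the copy in an online tier with a standard rehydrate priority:
import com.microsoft.azure.storage.blob.CloudBlockBlob;
import com.microsoft.azure.storage.blob.RehydratePriority;
import com.microsoft.azure.storage.blob.StandardBlobTier;

// Exception handling (StorageException, URISyntaxException) omitted for brevity.
CloudBlockBlob archivedSource = container02.getBlockBlobReference("archived-blob.txt");
CloudBlockBlob destination = container03.getBlockBlobReference("archived-blob.txt");

String copyJobId = destination.startCopy(
        archivedSource,
        null,                        // contentMd5: no MD5 check
        false,                       // syncCopy
        StandardBlobTier.HOT,        // tier requested for the destination blob (HOT or COOL)
        RehydratePriority.STANDARD,  // or RehydratePriority.HIGH
        null,                        // sourceAccessCondition
        null,                        // destinationAccessCondition
        null,                        // options
        null);                       // opContext
The null arguments simply take the defaults; only the destination tier and the rehydrate priority are set explicitly here.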

Streaming data to BigQuery using App Engine

I'm collecting data (derived from cookies installed on some websites) in BigQuery, using a streaming approach with Python code on App Engine.
The function I use to save the data is the following:
import httplib2
from googleapiclient import discovery
from oauth2client.contrib import appengine

def stream_data(data):
    PROJECT_ID = "project_id"
    DATASET_ID = "dataset_id"
    _SCOPE = 'https://www.googleapis.com/auth/bigquery'
    credentials = appengine.AppAssertionCredentials(scope=_SCOPE)
    http = credentials.authorize(httplib2.Http())
    table = "table_name"
    body = {
        "ignoreUnknownValues": True,
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {
                "json": data,
            },
        ]
    }
    bigquery = discovery.build('bigquery', 'v2', http=http)
    bigquery.tabledata().insertAll(projectId=PROJECT_ID, datasetId=DATASET_ID, tableId=table, body=body).execute()
I have deployed the solution on two different App Engine instances and I get different results. My question is: how is that possible?
On the other hand, comparing the results with Google Analytics metrics, I also notice that not all the data is stored in BigQuery. Do you have any idea about this problem?
In your code there is no exception handling around the insertAll operation, so if BigQuery can't write the data, you never see the error.
Replace your last line with this code (adding import logging at the top of the module if it's not already there):
bQreturn = bigquery.tabledata().insertAll(projectId=PROJECT_ID, datasetId=DATASET_ID, tableId=table, body=body).execute()
logging.debug(bQreturn)
In this way, in the Google Cloud Platform logs, you can easily find a possible error in the insertAll operation (the response includes an insertErrors field when individual rows fail).
When using insertAll() method you have to keep this in mind:
Data is streamed temporarily in the streaming buffer, which has different availability characteristics than managed storage. Certain operations in BigQuery do not interact with the streaming buffer, such as table copy jobs and API methods like tabledata.list {1}
If you are using the table preview, streaming buffered entries may not be visible.
Doing SELECT COUNT(*) from your table should return your total number of entries.
{1}: https://cloud.google.com/bigquery/troubleshooting-errors#missingunavailable-data

Akka HTTP Source Streaming vs regular request handling

What is the advantage of using Source Streaming vs the regular way of handling requests? My understanding is that in both cases:
The TCP connection will be reused
Back-pressure will be applied between the client and the server
The only advantage of Source Streaming I can see is if there is a very large response and the client prefers to consume it in smaller chunks.
My use case is that I have a very long list of users (millions), and I need to call a service that performs some filtering on the users, and returns a subset.
Currently, on the server side I expose a batch API, and on the client, I just split the users into chunks of 1000, and make X batch calls in parallel using Akka HTTP Host API.
I am considering switching to HTTP streaming, but cannot quite figure out what the value would be.
You are missing one other huge benefit: memory efficiency. By having a streamed pipeline, client/server/client, all parties safely process data without running the risk of blowing up the memory allocation. This is particularly useful on the server side, where you always have to assume the clients may do something malicious...
Client Request Creation
Suppose the ultimate source of your millions of users is a file. You can create a stream source from this file:
val userFilePath : java.nio.file.Path = ???
val userFileSource = akka.stream.scaladsl.FileIO.fromPath(userFilePath)
This source can then be used to build your HTTP request, which will stream the users to the service:
import akka.http.scaladsl.model.HttpEntity.{Chunked, ChunkStreamPart}
import akka.http.scaladsl.model.{RequestEntity, ContentTypes, HttpRequest}

val httpRequest : HttpRequest =
  HttpRequest(uri = "http://filterService.io",
              entity = Chunked.fromData(ContentTypes.`text/plain(UTF-8)`, userFileSource))
This request will now stream the users to the service without loading the entire file into memory. Only chunks of data are buffered at a time; therefore, you can send a request with a potentially unbounded number of users and your client will be fine.
Server Request Processing
Similarly, your server can be designed to accept a request with an entity that can potentially be of infinite length.
Your question says the service will filter the users, so assume we have a filtering function:
val isValidUser : (String) => Boolean = ???
This can be used to filter the incoming request entity and create a response entity which will feed the response:
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.model.HttpResponse
import akka.http.scaladsl.model.HttpEntity.Chunked
import akka.http.scaladsl.model.ContentTypes
import akka.stream.scaladsl.Source
import akka.util.ByteString

val route = extractDataBytes { userSource =>
  val responseSource : Source[ByteString, _] =
    userSource
      .map(_.utf8String)
      .filter(isValidUser)
      .map(ByteString.apply)

  complete(HttpResponse(entity = Chunked.fromData(ContentTypes.`text/plain(UTF-8)`, responseSource)))
}
Client Response Processing
The client can similarly process the filtered users without reading them all into memory. We can, for example, dispatch the request and send all of the valid users to the console:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http

implicit val system: ActorSystem = ActorSystem()
import system.dispatcher // ExecutionContext for mapping the response Future

Http()
  .singleRequest(httpRequest)
  .map { response =>
    response
      .entity
      .dataBytes
      .map(_.utf8String)
      .runForeach(System.out.println) // Source has no foreach; runForeach materializes the stream (Materializer comes from the ActorSystem on Akka 2.6+, or an ActorMaterializer on older versions)
  }

MobileFirst 8 : Unexpected error encountered while storing data

We are using UserAuthenticationSecurityCheck to authenticate the user.
If verification is successful, the MFP server will store the user attributes.
public class AuthSecurityCheck extends UserAuthenticationSecurityCheck {

    static Logger logger = Logger.getLogger(AuthSecurityCheck.class.getName());

    private String userId, displayName;
    private JSONObject attrObject;
    private String errorMessage;

    @Override
    protected AuthenticatedUser createUser() {
        Map<String, Object> userAttrsMap = new HashMap<String, Object>();
        userAttrsMap.put("attributes", attrObject);
        return new AuthenticatedUser(userId, displayName, this.getName(), userAttrsMap);
    }
    ...
}
But if we store larger data (when userAttrsMap is large enough), we get a 500 error:
errorMsg: Unexpected error encountered while storing data
Full source is on Github: https://github.com/DannyYang/PMR_CreateUserStoredLargeData
MFP version:
cordova-plugin-mfp 8.0.2017102115
MFP DevelopKit : 8.0.0.00-20171024-064640
The issue happens owing to the size of the data you are holding within the AuthenticatedUser object, and thereby in the security check's state.
The MFP runtime saves the state of the security check, along with all the attributes, to the attribute store. This involves serializing the security check state and persisting it to the database. With a large object (the custom map you have), this persistence operation fails and ends in a transaction rollback, because the data you are trying to persist is too big and exceeds the allocated size.
A SecurityCheck is designed to perform a security check (validation) and to create an identity object. Within your security check implementation, you have the following:
//Here the large data is assigned to the variable.
attrObject = JSONObject.parse(largeJSONString);
//This data is set into the AuthenticatedUser object.
Map<String, Object> userAttrsMap = new HashMap<String, Object>();
userAttrsMap.put("attributes",attrObject);
return new AuthenticatedUser(userId, displayName, this.getName(), userAttrsMap);
In this scenario this large data becomes part of the security check itself and will be serialized and persisted to the attribute store. When this data does not fit in the column, the transaction is rolled back and the error condition is propagated to the end user. Hence the error message you see: "Unexpected error encountered while storing data". Enabling detailed trace will show the actual cause of the issue in the server trace logs.
Either way, this approach is not recommended at all in production systems because:
a) Every request from the client that reaches the server goes through security introspection, which requires the server to load, check and update the security check's state. On systems under heavy load (production) this can and will have performance costs. The process involves serializing the data and deserializing it later. In a distributed topology (cluster or farms) the request may end up on any of the nodes, and those nodes will have to load and later save the security check's state to the store. All of this will impact the performance of your system.
b) At the end of a successful authentication, the AuthenticatedUser object is propagated to the client application to indicate completion of the login flow. Even if the security check state were stored successfully in the attribute store (with the large data), transmitting large payloads over the network just to indicate a successful login is counterproductive. To the end user it may appear as if nothing has happened since they entered their credentials, while the data indicating success is still being downloaded.
c) Under heavy load, the server will be strained by both a) and b) above.
You should consider cutting down the data that is propagated to the client within the AuthenticatedUser object. Keep the data minimal within the AuthenticatedUser object; instead, offload retrieval of the large data to resource adapters, which can be accessed after a successful login.
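For illustration, a minimal sketch of the createUser() from your security check trimmed down along these lines (the attribute name "profileRef" and the idea of fetching the full profile from a resource adapter are assumptions for the example, not MFP requirements):
@Override
protected AuthenticatedUser createUser() {
    // Keep the persisted security-check state small: store a reference, not the large JSON itself.
    Map<String, Object> userAttrsMap = new HashMap<String, Object>();
    userAttrsMap.put("profileRef", userId); // hypothetical key the client can use against a resource adapter after login
    return new AuthenticatedUser(userId, displayName, this.getName(), userAttrsMap);
}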