Azure Storage / File System abstraction library - file-io

As part of the web application that I'm developing at the moment, I have a requirement to write files out to storage. It will initially be hosted on Azure Websites, but in the future I would like the ability to host it on non-Azure servers.
So - I'm looking for a library (and I hope that one exists) that would make it easy to switch between writing files to Azure Blob Storage or to a local file system. Ideally it would have a common API and would allow you to switch the storage location by changing config files only.
I'm having some issues finding libraries that would have this sort of functionality and I hope someone can point me in the right direction.

I'm not sure such a library exists. If an abstraction library did exist, I would have thought it would need to provide implementations for Azure, S3, the local file system, Rackspace, etc.
Anyway, it's fairly straightforward to implement yourself. Our project had two different versions, cloud-based and on-premise, with the main real difference being the blob storage. What we did was build the core upload/download logic against an interface, write two different implementations of it (one for Azure Blob Storage and one for local file storage), and use StructureMap to get a reference to the concrete implementation based on a config value.
We obviously did not replicate each and every Blob Storage API in the interface, only the minimum required by our system.
Some example code:
Interface (BlobBase is our custom class holding info such as container name, file name, content type, etc., and BlobStorageProviderStatus is a custom enum providing some status info, but you get the idea):
public interface IBlobStorageProvider
{
    void CreateContainer(string containerName);
    BlobStorageProviderStatus UploadFile(BlobBase file, bool uploadAsNewVersion, Stream data, int chunk, int totalChunks, out string version);
    BlobStorageProviderStatus DownloadToStream(BlobBase file, Stream target, int chunkSize, IClientConnection clientConnection);
    void Delete(BlobBase blobBase);
    void DeleteDirectory(string directoryPath, string blobContainer);
    BlobStorageProviderStatus UploadFile(string container, string folder, string fileName, Stream data, string contentType);
    void DownloadToStream(string container, string filePath, Stream target);
}
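For illustration, the local file-system implementation could look roughly like the sketch below (only three of the interface methods are shown; the "FileStorageRoot" app setting and the Succeeded status value are assumptions for the sketch, not our actual code):

using System.Configuration;
using System.IO;

public class FileSystemStorageProvider : IBlobStorageProvider
{
    // Root folder under which each "container" becomes a directory ("FileStorageRoot" is an assumed app setting).
    private readonly string _rootPath = ConfigurationManager.AppSettings["FileStorageRoot"];

    public void CreateContainer(string containerName)
    {
        Directory.CreateDirectory(Path.Combine(_rootPath, containerName));
    }

    public BlobStorageProviderStatus UploadFile(string container, string folder, string fileName, Stream data, string contentType)
    {
        var directory = Path.Combine(_rootPath, container, folder ?? string.Empty);
        Directory.CreateDirectory(directory);

        using (var target = File.Create(Path.Combine(directory, fileName)))
        {
            data.CopyTo(target);
        }

        return BlobStorageProviderStatus.Succeeded; // assumed enum member
    }

    public void DownloadToStream(string container, string filePath, Stream target)
    {
        using (var source = File.OpenRead(Path.Combine(_rootPath, container, filePath)))
        {
            source.CopyTo(target);
        }
    }

    // Remaining IBlobStorageProvider members (chunked upload/download, Delete, DeleteDirectory) omitted for brevity.
}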
Web.config has
<add key="AzureBlobStorage" value="true" />
and a simplified version of the StructureMap registration:
x.For<IBlobStorageProvider>()
    .Use(() => bool.Parse(ConfigurationManager.AppSettings["AzureBlobStorage"])
        ? (IBlobStorageProvider) new AzureBlobStorageProvider()
        : new FileSystemStorageProvider());
and the actual usage sample:
IBlobStorageProvider blobStorage = ObjectFactory.GetInstance<IBlobStorageProvider>();
blobStorage.CreateContainer("image");


Testing Azure Blob Storage SDK

I found this great Getting Started Guide for the Azure Blob Storage SDK and how to connect to my storage account.
A quick prototype showed that it already works, but I want to ensure this and the logic behind it via tests (either unit or integration tests).
I found this resource on an Azure Testing Library that can record HTTP requests of a pipeline and was wondering whether this is applicable to the Blob Storage SDK as well?
Are there any other options to properly test my application code's interaction with the Blob Storage SDK?
My idea would, for example, be:
Call a method on my client with a parameter
Take the blob name from the passed parameter and make a call to the blob storage container
Validate that the call was made to the correct container and blob via a test case
I too just tried to follow the documentation links for the various tasks to be performed against Azure Blob Storage using the .NET v12 SDK, and the calls were successful.
Also, if you want to call a method on your client/application with a parameter against Blob Storage, you can do so by using the BlobServiceClient class. To learn more about how to use it, please refer to the documentation link below:
https://azure.github.io/azure-sdk-korean/dotnet_introduction.html
It describes how the service client class and its methods are structured, and which parameters they take for performing various tasks, as shown in the sample code from that document below:
namespace Azure.<group>.<service_name> {

    // main service client class
    public class <service_name>Client {

        // simple constructors; don't use default parameters
        public <service_name>Client(<simple_binding_parameters>);
        public <service_name>Client(<simple_binding_parameters>, <service_name>ClientOptions options);

        // 0 or more advanced constructors
        public <service_name>Client(<advanced_binding_parameters>, <service_name>ClientOptions options = default);

        // mocking constructor
        protected <service_name>Client();

        // service methods (synchronous and asynchronous)
        public virtual Task<Response<<model>>> <service_operation>Async(<parameters>, CancellationToken cancellationToken = default);
        public virtual Response<<model>> <service_operation>(<parameters>, CancellationToken cancellationToken = default);

        // other members
    }

    // options for configuring the client
    public class <service_name>ClientOptions : ClientOptions {
    }
}
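The protected "mocking constructor" above is what makes plain unit tests possible: the client methods are virtual, so a mocking framework can intercept them. A rough sketch of your idea (verifying that the call reaches the right container/blob), assuming Moq and xUnit, with a made-up wrapper class and blob name:

using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Moq;
using Xunit;

// Hypothetical class under test: a thin wrapper around BlobContainerClient.
public class ReportUploader
{
    private readonly BlobContainerClient _container;

    public ReportUploader(BlobContainerClient container) => _container = container;

    public Task UploadAsync(string blobName, Stream data, CancellationToken ct = default)
        => _container.GetBlobClient(blobName).UploadAsync(data, overwrite: true, ct);
}

public class ReportUploaderTests
{
    [Fact]
    public async Task Upload_TargetsExpectedBlob()
    {
        // Arrange: the SDK clients expose protected constructors and virtual methods for mocking.
        var blob = new Mock<BlobClient>();
        blob.Setup(b => b.UploadAsync(It.IsAny<Stream>(), It.IsAny<bool>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(Mock.Of<Response<BlobContentInfo>>());

        var container = new Mock<BlobContainerClient>();
        container.Setup(c => c.GetBlobClient("report.pdf")).Returns(blob.Object);

        var uploader = new ReportUploader(container.Object);

        // Act
        await uploader.UploadAsync("report.pdf", Stream.Null);

        // Assert: the call was made against the expected blob name, exactly once.
        container.Verify(c => c.GetBlobClient("report.pdf"), Times.Once);
        blob.Verify(b => b.UploadAsync(Stream.Null, true, It.IsAny<CancellationToken>()), Times.Once);
    }
}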
Also, I would suggest referring to this community thread:
Call .NET Web API method whenever a new file is added to Azure Blob Storage

How to generate a workflow at runtime with Elsa workflow

With the Elsa workflow designer it is possible to define a workflow and publish it; you can also create a workflow programmatically by implementing the IWorkflow interface.
I need to create a workflow programmatically at runtime, save it to the database, and run it from time to time.
In my ASP.NET Core project's controller I resolve IWorkflowBuilder as a dependency, build a workflow with the WorkflowBuilder, and get back a WorkflowBlueprint object, but I don't know how to store it or how to run it.
I also have Elsa dashboard on my project and I use EntityFramework Persistence for it.
Is there a way to convert a WorkflowBlueprint to a WorkflowDefinition, or to generate a WorkflowDefinition from scratch programmatically?
Does anyone have any ideas?
Although it might theoretically be possible to store an IWorkflow implementation in the database, there are some caveats that make this tricky to say the least. Here is why:
A workflow definition created by the designer consists purely of a list of activities and connections between them. Because of that, everything is easily serialized into JSON and stored in the database.
However, when you write a C# class, you can do fancier things, such as configuring activities using C# lambda expressions and implementing "inline" activity code. When you try to serialize this to JSON, these C# expressions will be serialized using just their type names.
Although there might be ways to somehow store a programmatic workflow into the database, perhaps even by storing a compiled assembly in the DB, I don't think it's worth the trouble because there are better ways.
You said that you need a programmatic workflow that you only run sometimes.
To achieve that, you do not need to store a workflow in the database.
The way Elsa works is that all workflow sources are converted into a thing called a Workflow Blueprint.
A workflow blueprint represents an executable workflow, carrying all the details the workflow invoker needs to execute it.
There are different "sources" from which these workflow blueprints are established, by means of classes that implement IWorkflowProvider, of which there are three:
Programmatic Workflow Provider
Database Workflow Provider
Blob Storage Workflow Provider
The programmatic provider is what turns IWorkflow implementations into workflow blueprints, while the database provider turns workflow definitions into blueprints. The blob storage provider is similar, except it turns JSON files into blueprints.
The bottom line is that the origin of a workflow blueprint doesn't matter for the workflow engine.
All workflow blueprints are accessed through a service called the workflow registry, which you can use to load & execute a given workflow.
For example, if you have a programmatic workflow called MyWorkflow, you can execute it whenever you want like this:
public class MyWorkflow : IWorkflow
{
    public void Build(IWorkflowBuilder builder)
    {
        builder.WriteLine("Hello World!");
    }
}

[ApiController]
[Route("my-workflow")]
public class MyWorkflowController : Controller
{
    private readonly IWorkflowRegistry _workflowRegistry;
    private readonly IStartsWorkflow _workflowStarter;

    public MyWorkflowController(IWorkflowRegistry workflowRegistry, IStartsWorkflow workflowStarter)
    {
        _workflowRegistry = workflowRegistry;
        _workflowStarter = workflowStarter;
    }

    [HttpGet("run")]
    public async Task<IActionResult> RunMyWorkflow(CancellationToken cancellationToken)
    {
        // 1. Get my workflow blueprint.
        var myWorkflowBlueprint = (await _workflowRegistry.GetWorkflowAsync<MyWorkflow>(cancellationToken))!;

        // 2. Run the workflow.
        await _workflowStarter.StartWorkflowAsync(myWorkflowBlueprint, cancellationToken: cancellationToken);

        return Ok();
    }
}
Invoking this controller will execute MyWorkflow.
As you can see, there is no need to store the workflow in the database in order to execute it on demand. Even if you did store the workflow in the database, the code would be the same, provided the name of the workflow remains "MyWorkflow". Under the covers, GetWorkflowAsync<TWorkflow> is simply an extension method that uses the type name to find the workflow by name. If you wanted to load a workflow by name for which there is no workflow class defined, you would simply use FindByNameAsync, or FindAsync if all you have is a workflow definition ID.
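For completeness, loading by name could look roughly like this (a sketch against Elsa 2; the exact FindByNameAsync signature may differ between versions):

// Look up a published workflow blueprint by name instead of by CLR type.
var blueprint = await _workflowRegistry.FindByNameAsync(
    "MyWorkflow",
    VersionOptions.Published,
    cancellationToken: cancellationToken);

if (blueprint != null)
    await _workflowStarter.StartWorkflowAsync(blueprint, cancellationToken: cancellationToken);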

SpringData GemFire inserting fake data on Dev env

I am developing an app using GemFire and it would be great to be able to provide some fake data while in the Dev environment.
So instead of doing it in the code like I do today, I was thinking about using the Spring application-context.xml to pre-load some dummy data in the region I am currently working on. Something close to what DBUnit does, but for the DEV scope rather than the Test scope.
Later I could just switch environments in Spring and that data would not be loaded.
Is it possible to add data to a local data grid using Spring Data GemFire?
Thanks!
There is no direct support in Spring Data GemFire to load data into a GemFire cluster. However, there are several options afforded to a SDG/GemFire developer to load data.
The most common approach is to define a GemFire CacheLoader attached to the Region. However, this approach is "lazy" and only loads data from a (potentially) external data source on a cache miss. Of course, you could program the logic in the CacheLoader to "prefetch" a number of entries in a somewhat "predictive" manner based on data access patterns. See GemFire's User Guide for more details.
Still, we can do better than this since it is more likely that you want to "preload" a particular data set for development purposes.
Another, more effective technique is to use a Spring BeanPostProcessor registered in your Spring ApplicationContext that post-processes your "Region" bean after initialization. For instance...
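A bean definition along these lines would do (the target Region bean name and the sample entries here are purely illustrative):

<bean class="example.RegionPutAllBeanPostProcessor">
    <property name="targetRegionBeanName" value="Example"/>
    <property name="regionData">
        <map>
            <entry key="keyOne" value="valueOne"/>
            <entry key="keyTwo" value="valueTwo"/>
        </map>
    </property>
</bean>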
Where the RegionPutAllBeanPostProcessor is implemented as...
package example;

import java.util.Collections;
import java.util.Map;

import com.gemstone.gemfire.cache.Region; // or org.apache.geode.cache.Region on newer GemFire/Geode

import org.springframework.beans.BeansException;
import org.springframework.beans.factory.config.BeanPostProcessor;
import org.springframework.util.Assert;
import org.springframework.util.StringUtils;

public class RegionPutAllBeanPostProcessor implements BeanPostProcessor {

    private Map regionData;

    private String targetRegionBeanName;

    protected Map getRegionData() {
        return (regionData != null ? regionData : Collections.emptyMap());
    }

    public void setRegionData(final Map regionData) {
        this.regionData = regionData;
    }

    protected String getTargetRegionBeanName() {
        Assert.state(StringUtils.hasText(targetRegionBeanName), "The target Region bean name was not properly specified!");
        return targetRegionBeanName;
    }

    public void setTargetRegionBeanName(final String targetRegionBeanName) {
        Assert.hasText(targetRegionBeanName, "The target Region bean name must be specified!");
        this.targetRegionBeanName = targetRegionBeanName;
    }

    @Override
    public Object postProcessBeforeInitialization(final Object bean, final String beanName) throws BeansException {
        return bean;
    }

    @Override
    @SuppressWarnings("unchecked")
    public Object postProcessAfterInitialization(final Object bean, final String beanName) throws BeansException {
        if (beanName.equals(getTargetRegionBeanName()) && bean instanceof Region) {
            ((Region) bean).putAll(getRegionData());
        }

        return bean;
    }
}
It is not too difficult to imagine that you could inject a DataSource of some type to pre-populate the Region. The RegionPutAllBeanPostProcessor was designed to accept a specific Region (based on the Region bean's ID) to populate, so you could define multiple instances, each taking a different Region and (perhaps) a different DataSource to populate the Region(s) of choice. This BeanPostProcessor just takes a Map as the data source but, of course, it could be any Spring-managed bean.
Finally, it is a simple matter to ensure that this, or multiple instances of the RegionPutAllBeanPostProcessor, are only used in your DEV environment by taking advantage of Spring bean profiles...
<beans>
    ...
    <beans profile="DEV">
        <bean class="example.RegionPutAllBeanPostProcessor">
            ...
        </bean>
        ...
    </beans>
</beans>
Usually, loading pre-defined data sets is very application-specific in terms of the "source" of the pre-defined data. As my example illustrates, the source could be as simple as another Map. However, it could be a JDBC DataSource, or perhaps a Properties file, or, well, anything for that matter. It is usually up to the developer's preference.
Though, one thing that might be useful to add to Spring Data GemFire would be the ability to load data from a GemFire Cache Region snapshot, i.e. data that may have been dumped from a QA or UAT environment, or perhaps even scrubbed from PROD for testing purposes. See the GemFire Snapshot Service for more details.
Also see the JIRA ticket (SGF-408) I just filed to add this support.
Hopefully this gives you enough information and/or ideas to get going. Later, I will add first-class support into SDG's XML namespace for preloading data sets.
Regards,
John

Using Redis as cache storage for multiple applications on the same server

I want to use Redis as a cache storage for multiple applications on the same physical machine.
I know at least two ways of doing it:
by running several Redis instances on different ports;
by using different Redis databases for different applications.
But I don't know which one is better for me.
What are advantages and disadvantages of these methods?
Is there any better way of doing it?
Generally, you should prefer the 1st approach, i.e. dedicated Redis servers. Shared databases are managed by the same Redis process and can therefore block each other. Additionally, shared databases share the same configuration (although in your case this may not be an issue since all databases are intended for caching). Lastly, shared databases are not supported by Redis Cluster.
For more information refer to this blog post: https://redislabs.com/blog/benchmark-shared-vs-dedicated-redis-instances
We solved this problem by namespacing the keys. Initially we tried using databases, where each database ID would be used for a specific application. However, that idea was not scalable, since there is a limited number of databases; plus, in Premium offerings (like Azure Cache for Redis Premium instances with sharding enabled), the concept of a database is not used.
The solution we used is attaching a unique prefix to all keys. Each application is annotated with a unique moniker which is prefixed in front of each of its keys.
To reduce churn, we have built a framework (URP). If you are using StackExchange.Redis then you will be able to use the URP SDK directly. If it helps, I have added some of the references.
Source Code and Documentation - https://github.com/microsoft/UnifiedRedisPlatform.Core/wiki/Management-Console
Blog Post (idea) - https://www.devcompost.com/post/__urp
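To illustrate the key-prefixing idea itself, without any framework, a minimal StackExchange.Redis sketch looks like this (the connection string, moniker and keys are made up):

using StackExchange.Redis;

// One shared Redis server; each application writes under its own unique prefix (moniker).
var redis = ConnectionMultiplexer.Connect("localhost:6379");
var db = redis.GetDatabase();

const string appPrefix = "billing";   // this application's moniker
db.StringSet($"{appPrefix}:customer:42", "cached-payload");
string cached = db.StringGet($"{appPrefix}:customer:42");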
You can also use a different cache manager for each application; that works the same way. I am using something like:
import java.lang.reflect.Method;

import org.springframework.cache.CacheManager;
import org.springframework.cache.interceptor.KeyGenerator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.core.RedisTemplate;

@Configuration
public class CacheConfig {

    @Bean(name = "myCacheManager")
    public CacheManager cacheManager(RedisTemplate<String, Object> redisTemplate) {
        // Note: the RedisTemplate-based constructor is from older Spring Data Redis versions;
        // in 2.x the manager is built from a RedisConnectionFactory instead.
        RedisCacheManager cacheManager = new RedisCacheManager(redisTemplate);
        return cacheManager;
    }

    @Bean(name = "customKeyGenerator")
    public KeyGenerator keyGenerator() {
        return new KeyGenerator() {
            @Override
            public Object generate(Object o, Method method, Object... objects) {
                // This will generate a unique key of the class name, the method name,
                // and all method parameters appended.
                StringBuilder sb = new StringBuilder();
                sb.append(o.getClass().getName());
                sb.append(method.getName());
                for (Object obj : objects) {
                    sb.append(obj.toString());
                }
                return sb.toString();
            }
        };
    }
}
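Usage then just points at the application-specific beans, for example (a sketch; the service class and cache name are made up):

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    // Referencing this application's own cacheManager and keyGenerator beans keeps
    // its cached entries independent of other applications sharing the same Redis server.
    @Cacheable(value = "products", cacheManager = "myCacheManager", keyGenerator = "customKeyGenerator")
    public String findProduct(String id) {
        // stand-in for an expensive lookup; the result is cached in Redis
        return "product-" + id;
    }
}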

Best practice for WCF services used both locally and remotely that process large files on the filesystem?

I'm creating a WCF service that may be used either locally or remotely, and that processes files, sometimes using third-party components/applications which unfortunately require as input a path to an actual file on the filesystem, not a .NET Stream or anything like that. Is there a standard approach for this situation, in terms of what the parameters to contract operations should be, etc.? Although I suppose this can't be vital, since it ultimately has to perform acceptably in both the local and remote cases, I'd prefer if, in the local case, it didn't have to read the whole file from the filesystem, include the contents in the message, and rematerialize it again on the filesystem, but for remote use this is necessary. Is there a way to do this, e.g. by having an FSRefDoc type which serializes differently depending on whether it's used locally or remotely?
edit: To clarify: The problem is that I want to send different pieces of information entirely in the two cases. If I'm controlling a local service, I can just send a path to the file on the local filesystem, but if it's a remote service, I have to send the file contents themselves. Of course I can send the contents in both cases, but that means I lose performance in the local case. Maybe I shouldn't be worried about this.
OK,
Following your update, I would consider the following.
1) Create a method that takes a path. Expose this via a named pipe binding and use this locally.
2) Create a method that takes a file (stream/byte array, etc.). Expose this using an appropriate binding (on a different endpoint) for non-local computers (in a LAN scenario, TCP is usually your best bet).
Then all you need to do is make sure you don't duplicate the same business logic. So, in a nutshell: create 2 different service interfaces, 2 different endpoints and 2 different bindings.
Well, you really touch on two separate issues:
local vs. remote service availability
"normal" vs. streamed service (for large files)
In general, if your service works behind a corporate firewall on a LAN, you should use the NetTcpBinding since it's the fastest and most efficient. It's fast and efficient because it uses binary message encoding (vs. text message encoding over the internet).
If you must provide a service for the "outside" world, you should try to use a binding that's as interoperable as possible, and here your choices are basicHttpBinding (totally interoperable - "old" SOAP 1.1 protocols) which cannot be secured too much, and wsHttpBinding which offers a lot more flexibility and options, but is less widely supported.
Since you can easily create a single service with three endpoints, you can really create your service and then define these three endpoints: one for local clients using NetTcpBinding, one with the widest availability using basicHttpBinding, and optionally another one with wsHttpBinding.
That's one side of the story.
The other is: for your "normal" service calls, exchanging a few items of information (up to a few KB in size), you should use the normal default behavior of "buffered transfer" - the message is prepared completely in a buffer and sent as a whole.
However, for handling large files, you're better off using a streaming transfer mode - either "StreamedResponse" if you want clients to be able to download files from your server, or "StreamedRequest" if you want clients to be able to upload files, or just plain "Streamed" if you send files both ways.
So besides the three "regular" endpoints, you should have at least another endpoint for each binding that handles streaming exchange of data, i.e. upload/download of files.
This may seem like a lot of different endpoints - but that's really not a problem; your clients can connect to whatever endpoint(s) are appropriate for them - regular vs. streamed and internal/local (netTcpBinding) vs. external (basicHttpBinding) as they need - and in the end, you write the code only once!
Ah, the beauty of WCF! :-)
Marc
UPDATE:
OK, after your comment, this is what I would do:
create an ILocalService service contract with a single method GetFile that returns a path and file name
create an implementation for the service contract
host that service on an endpoint with netTcpBinding (since it's internal, local)
[ServiceContract]
interface ILocalService
{
    [OperationContract]
    string GetFile(......(whatever parameters you need here).....);
}

class LocalService : ILocalService
{
    public string GetFile(......(whatever parameters you need here).....)
    {
        // do stuff.....
        return fileName;
    }
}
and secondly:
create a second service contract IRemoteService with a single method GetFile which doesn't return a file name as string, but instead returns a stream
create an implementation for the service contract
host that service on an endpoint with basicHttpBinding for internet use
make sure to have transferMode="StreamedResponse" in your binding configuration, to enable streaming back the file (a sketch of that configuration follows the sample code below)
[ServiceContract]
interface IRemoteService
{
    [OperationContract]
    Stream GetFile(......(whatever parameters you need here).....);
}

class RemoteService : IRemoteService
{
    public Stream GetFile(......(whatever parameters you need here).....)
    {
        // do stuff.....
        FileStream stream = new FileStream(....);
        return stream;
    }
}
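The binding configuration for the streamed endpoint could then look roughly like this (a sketch; the namespace, binding name and size limit are placeholders):

<system.serviceModel>
    <bindings>
        <basicHttpBinding>
            <!-- stream the response body instead of buffering the whole file -->
            <binding name="streamedHttp"
                     transferMode="StreamedResponse"
                     maxReceivedMessageSize="2147483647" />
        </basicHttpBinding>
    </bindings>
    <services>
        <service name="MyNamespace.RemoteService">
            <endpoint address=""
                      binding="basicHttpBinding"
                      bindingConfiguration="streamedHttp"
                      contract="MyNamespace.IRemoteService" />
        </service>
    </services>
</system.serviceModel>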