Splitting Sensenet content repository into multiple databases - sensenet

Is there a way of splitting a Content repository into multiple databases? There is a good chance I'll have TBs of data, maybe even tens of TBs. Maintaining a database bigger than 1 TB becomes an issue, so I can't imagine dealing with a bigger one. I've considered using Filestream, but having multiple databases would be a much more viable solution.
If not, is there at least a way of having several repositories contained in a single web site?

Currently (as of version 7.2) sensenet requires a central database to connect to, you cannot split that into multiple parts.
There is the blob storage feature however that lets you store binaries outside of the main metadata database. You choose a blob storage implementation (e.g. the MongoDb blob provider), install it and you can start uploading files to sensenet. Binaries above a certain (configured) size will go to the external provider.
You'll have to take care of the backup of the blob storage though, because that is different for every db provider. At least the size of the metadata db will be significantly lower.
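To illustrate the idea of routing binaries by size, here is a minimal sketch in Python. It is not sensenet's API: the `Repository` class, the `MIN_EXTERNAL_SIZE` threshold and the two dictionaries are all hypothetical stand-ins for the metadata database and the external blob provider.

```python
from dataclasses import dataclass, field

# Hypothetical threshold in bytes; the real setting name and default are not shown here.
MIN_EXTERNAL_SIZE = 500 * 1024


@dataclass
class Repository:
    """Toy model: a metadata store plus an external blob store."""
    metadata_db: dict = field(default_factory=dict)     # stands in for the central database
    external_blobs: dict = field(default_factory=dict)  # stands in for an external blob provider

    def save(self, name: str, data: bytes) -> str:
        if len(data) > MIN_EXTERNAL_SIZE:
            # Large binary: the metadata database keeps only a reference.
            self.external_blobs[name] = data
            self.metadata_db[name] = {"size": len(data), "location": "external"}
        else:
            # Small binary: stored inline next to its metadata.
            self.metadata_db[name] = {"size": len(data), "location": "inline", "data": data}
        return self.metadata_db[name]["location"]

    def load(self, name: str) -> bytes:
        record = self.metadata_db[name]
        if record["location"] == "external":
            return self.external_blobs[name]
        return record["data"]
```

The point of the split is that the metadata record for a large file stays a few bytes long, which is why the main database shrinks, and also why the external store needs its own backup.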

Related

What are some use cases for object storage?

What are some use cases for object storage, as opposed to file systems or block storage (database) systems?
From what I understand, object storage is mostly used for persistent storage for applications running on cloud systems. It seems to have a lot of overlap with file systems, except that the details of how the objects are stored is abstracted away so that apps can access them with simple web queries.
However, I'd love if someone could give examples of applications where this is actually used instead of or alongside the other two storage systems.
Some example use cases for object storage:
Off-site backups
Storing and serving user content (e.g. profile pictures)
Storing artifacts (e.g. JAR files, startup scripts) to be deployed to VMs
Distributing static content (e.g. video content for your users)
Caching intermediate data (e.g. individual frames from a render farm before assembly into output video)
Accepting input or providing output to a web service (as accepting data by POST can be difficult/inefficient for large input files).
Archiving data for regulatory purposes
All these cases might be accompanied by a database to store metadata (i.e. to find the objects). Actually storing the data in the database would, however, exceed size limits or significantly harm database performance.
These use cases can be achieved with a file system, so long as your total usage can be handled by a single machine. If you have more traffic than that, you will need replicated storage, load balancing etc., at which point you are effectively implementing an object storage system yourself.
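The "database for metadata, object store for the bytes" pattern mentioned above could look roughly like this. It is only a sketch: the `object_store` dictionary stands in for a real bucket, and the table layout and function names are made up for illustration.

```python
import sqlite3
import uuid

# Stands in for a real object storage bucket (key -> bytes).
object_store = {}

# The database holds only small, searchable metadata rows.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE objects (key TEXT PRIMARY KEY, owner TEXT, content_type TEXT, size INTEGER)"
)


def put_object(owner: str, content_type: str, data: bytes) -> str:
    """Store the bytes in the object store and a metadata row in the database."""
    key = uuid.uuid4().hex
    object_store[key] = data
    db.execute(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        (key, owner, content_type, len(data)),
    )
    db.commit()
    return key


def find_keys(owner: str) -> list:
    """Look up objects by metadata; the payloads themselves are never scanned."""
    rows = db.execute("SELECT key FROM objects WHERE owner = ?", (owner,))
    return [row[0] for row in rows]
```

Queries only ever touch the small metadata table, and the application fetches the payload by key once it knows which object it wants.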

Azure Tables vs Blob

I am moving my web application to Azure.
My application has many small image files (up to 1 MB, approx. 100 KB per file). Each file has a unique name and is looked up by that name. Right now there is a simple folder on the web host with thousands of files. Which is more effective to use: Azure Tables or Blob?
Blob service would be the best choice for this scenario. A couple of things to consider:
1. Since the partition key is down to the blob name, access to different blobs can be load balanced across as many servers as needed in order to scale out access to them.
2. If you need to, you could make your container(s) public, allowing unauthenticated access to the blobs.
3. If #2 is not desired, you could still use Shared Access Signatures to make your blobs downloadable with a browser or another HTTP client that is not aware of Azure Storage.
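To give a feel for how a signed, time-limited download link works in general, here is a generic sketch using an HMAC over the path and an expiry time. This is not Azure's actual SAS token format or API; the parameter names `expires` and `sig`, the secret and the URL are all assumptions made up for the example.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret; a real storage key would never be hard-coded.
SECRET_KEY = b"demo-secret"


def _signature(path: str, expires: int) -> str:
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return a URL that carries its own expiry time and signature."""
    start = time.time() if now is None else now
    expires = int(start + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base_url}{path}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Server-side check: not expired, and the signature matches the path."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires))
```

Any plain HTTP client can follow such a link because everything the server needs to authorize the request is in the URL itself.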

Storing Uploaded Files in Azure Web Sites: File System or Azure Storage

When using Azure Web Sites (WAWS), the general opinion seems to be that uploaded content such as photos or files should be stored in Azure Storage Blobs and not in the WAWS file system.
Clearly, using Azure Storage is a great idea if you have a lot of data and need scale and redundancy. For small or simple sites, however, it seems to add another layer of complexity, and it also means you can't easily use things like ImageResizer without purchasing the Azure-compatible licence etc.
So, given that products like WordPress from the Azure Gallery use "/site/wwwroot/wp-content/uploads/" to store all uploaded files on WAWS, is there anything wrong with using the WAWS file system for storage, or are there other considerations to take into account when using Azure WAWS?
The major drawback to using the WAWS storage is that your data is now intermingled with the application. By saving all of your plugins/images/blobs externally in a database or blob storage, you retain the flexibility to redeploy your application to a new region/datacenter by just pushing your code to the new website and changing connection strings.
If your plugins/images are stored on disk in WAWS, then you need to make sure that you are backing it up appropriately. If anything happens, you need to restore the site along with all of the data that had been uploaded.
Azure Web Sites uses Azure Storage as its file storage, so essentially the level of complexity you're talking about is abstracted away.
Another great benefit of this approach is that if you scale your web site to multiple instances, all of them will work with the exact same file content.
Of course, if you want to use pure Azure Storage features like snapshots or sharing specific content with specific users, those are not available as is. But for web site purposes it is quite good.
Hope that helps

BLOB's in SQL that stores a Video file

I am hoping someone can explain how to use BLOBs. I see that BLOBs can be used to store video files. My question is why would a person store a video file in a BLOB in a SQL database? What are the advantages and disadvantages compared to storing pointers to the location of the video file?
A few different reasons.
If you store a pointer to a file on disk (presumably using the BFILE data type), you have to ensure that your database is updated whenever files are moved, renamed, or deleted on disk. It's relatively common when you store data like this that over time your database gets out of sync with the file system and you end up with broken links and orphaned content.
If you store a pointer to a file on disk, you cannot use transactional semantics when you're dealing with multimedia. Since you can't do something like issue a rollback against a file system, you either have to deal with the fact that you're going to have situations where the data on the file system doesn't match the data in the database (i.e. someone uploaded a video to the file system but the transaction that created the author and title in the database failed, or vice versa) or you have to add additional steps to the file upload to simulate transactional semantics (i.e. upload a second <>_done.txt file that just contains the number of bytes in the actual file that was uploaded). That's cumbersome and error-prone and may create usability issues.
For many applications, having the database serve up data is the easiest way to provide it to a user. If you want to avoid giving a user a direct FTP URL to your files because they could use that to bypass some application-level security, the easiest option is to have a database-backed application where to retrieve the data, the database reads it from the file system and then returns it to the middle tier which then sends the data to the client. If you're going to have to read the data into the database every time the data is retrieved, it often makes more sense to just store the data directly in the database and to let the database read it from its data files when the user asks for it.
Finally, databases like Oracle provide additional utilities for working with multimedia data in the database. Oracle interMedia, for example, provides a rich set of objects to interact with video data stored in the database-- you can easily tag where scenes begin or end, tag where various subjects are discussed, when the video was recorded, who recorded it, etc. And you can integrate that search functionality with searches against all your relational data. Of course, you could write an application on top of the database that did all those things as well but then you're either writing a lot of code or using another framework in your app. It's often much easier to leverage the database functionality.
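The "done file" workaround from the second point could be sketched like this in Python. The naming scheme (the uploaded file's name plus `_done.txt`) and the function names are assumptions for illustration only.

```python
import os
import tempfile


def upload_with_marker(directory: str, name: str, data: bytes) -> None:
    """Write the payload first, then a marker file holding its byte count."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    # The marker is written last, so its presence means the upload finished.
    with open(path + "_done.txt", "w") as f:
        f.write(str(len(data)))


def is_complete(directory: str, name: str) -> bool:
    """A file counts as uploaded only if the marker exists and the sizes agree."""
    path = os.path.join(directory, name)
    marker = path + "_done.txt"
    if not (os.path.exists(path) and os.path.exists(marker)):
        return False
    with open(marker) as f:
        text = f.read().strip()
    return text.isdigit() and int(text) == os.path.getsize(path)


# Small demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
upload_with_marker(demo_dir, "video.mp4", b"fake video bytes")
with open(os.path.join(demo_dir, "partial.mp4"), "wb") as f:
    f.write(b"interrupted")  # no marker: looks like a failed upload
```

Every reader of the directory has to call something like `is_complete` before trusting a file, which is exactly the extra bookkeeping a database transaction would otherwise do for you.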
Take a read of this: http://www.oracle.com/us/products/database/options/spatial/039950.pdf
(Obviously a biased view, but it does list a few cons, which have now been fixed by the advent of 11g.)

Database of images and text

background:
I'm in the design phase of building an app.
I want the app to display text and images. The problem is that I will have A LOT of them: hundreds to thousands.
This is my largest app so far, and I am unsure on how to handle all the data.
The question:
What would be the best way to store and access these images and text?
Would I use a formal database approach like SQL?
Or would it be better to navigate files/folders e.g. dropping all the files in res/drawable?
potentially useful facts:
The database will be stored and accessed natively so it can be accessed off-line.
The user will not be adding to the database in any way, only accessing the data.
The database will be updated every 6 months.
The application 'page' will display 1-5 images along with several blocks of text.
Concept:
The app will be like a recipe app: the user will pick some parameters (e.g. ingredients, type, diet) and then select a recipe. Several images and blocks of text will then be displayed, showing and detailing the steps of that recipe.
I apologize if this is repeated but I didn't see a specific answer for my purposes.
The "Best" approach will depend on the functionality of the database server in question.
Generally, you should store the images "In" the database until that becomes a performance issue. Once you start storing images "Outside" of the database you will have to handle all the issues that are normally taken care of by the database: disk space management, orphan records, file name conflicts, folder file limits, to name just a few. Depending on your situation these may be big issues or they may be nothing to worry about.
I've seen several applications where images (or attachments) were kept "Outside" the database, and in each case it was done poorly. There are just so many issues to handle, and most developers don't even think of half of them. In many cases the performance of storing the images "In" the database was acceptable, but the developers decided against it because they just knew it would not perform well.
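As an example of the kind of housekeeping this implies, here is a rough sketch of a consistency check between a database and an image folder. The `images` table, its `file_name` column and the function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def find_mismatches(db: sqlite3.Connection, directory: str):
    """Compare file names referenced in the database with files on disk."""
    referenced = {row[0] for row in db.execute("SELECT file_name FROM images")}
    on_disk = set(os.listdir(directory))
    orphaned_files = sorted(on_disk - referenced)  # on disk, but no row points at them
    broken_links = sorted(referenced - on_disk)    # rows whose file has gone missing
    return orphaned_files, broken_links


# Small demonstration with a throwaway folder and an in-memory database.
demo_dir = tempfile.mkdtemp()
for name in ("a.png", "stray.png"):
    with open(os.path.join(demo_dir, name), "wb") as f:
        f.write(b"fake image bytes")

demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE images (file_name TEXT PRIMARY KEY)")
demo_db.executemany("INSERT INTO images VALUES (?)", [("a.png",), ("missing.png",)])

orphaned, broken = find_mismatches(demo_db, demo_dir)
```

A job like this has to be written, scheduled and monitored by you once the images live outside the database.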
If you're using SQL Server 2008, FILESTREAM storage is ideal for your case. It stores the binary files outside of the database files but behaves like a normal field. You are also able to read/write the files using a stream instead of getting/setting the whole file as a byte array (as you would with a plain varbinary(max) column).
If you don't have this functionality in your database, I would recommend storing the images outside of the DB
It's probably a better idea to use a file-based approach for deployed static resources.
At the very least because taking a dependency on the file system is typically easier to manage than taking a dependency on a DB.
Also, this line indicates some sort of non-web client:
"The database will be stored and accessed natively so it can be accessed off-line."
This means if you go with the DB approach you'll have a couple of other interesting problems
Deployment
Depending on your target platform, deploying a DB can be a real bear. What happens if the user already has the engine installed, but in a different version?
Resources
Is your DB going to be client/server based (like MySQL, SQL Server etc.)? If so, your app now has to manage the current state of that server process. If not, you'll be using a file-based DB such as SQLite or MS Access, at which point I would question why using a static DB is worth doing at all.
One final note. There's nothing stopping your Content Production environment from using a DB. It's quite common for content producers to maintain a database for their content that you will later use to produce the files for publishing/deployment.