What is the best way to store a large number of files received every day? - sql

In my application, I receive at least 1000 files per day from different emails.
With Pentaho I store them in a folder, but what is the best way to save these files:
storing them in a folder on my hard disk, or saving them in a (SQL) table?
Thank you

Unless you plan to take advantage of the additional features offered by a SQL database, I would advise you to keep storing them on the hard drive, and DO NOT forget the importance of backing up your files.

Related

Database model to manage documents

I need to build tables to manage documents such as jpg, doc, msg and pdf files using SQL Server 2008.
As far as I know, SQL Server supports .jpg images, so my question is whether it's possible to upload other kinds of files into a database.
This is an example of the table (it could be redefined if needed).
Document : document_id int(10)
name varchar(10)
type image (I don't know how this would work)
Those are the initial columns for the table, but I don't know how to make it work for any file type.
PS: do I need to assign a directory on the server to save these documents?
You can store almost any file type in a SQL Server table... but if you do, you will almost certainly regret it.
Store metadata and a pointer to the file in your database instead, and store the files themselves directly on disk, where they belong.
Otherwise your database size, and thus the hardware required to run it, will grow very rapidly, so you will incur large costs that you do not need to incur.
Use Filestream
https://learn.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server
I know that a link-only answer is not an answer but I can't believe no one has mentioned it yet
The proper database design pattern is not to save files in the DBMS. You should develop a kind of file manager subsystem to manage the files for all of your projects.
File Manager Subsystem
This subsystem should be reusable, extendable, secure, etc. Every project that needs to save files can use this subsystem.
Files can be saved anywhere: a local hard disk, a network drive, external drives, the cloud, etc. So this subsystem should be designed to support all kinds of requests.
(You can improve the subsystem by adding many features to it, for example checking for duplicate files.)
After a file has been uploaded and saved, the subsystem should generate a unique key for it.
Now you can save this unique key in the database (instead of the file). Whenever you want the file, you get the unique key from the database and request the file from the subsystem by that key.
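As a rough illustration of the idea above, here is a minimal Python sketch, assuming a single local folder as the store and SQLite as the database. The class name FileManager, the documents table and its columns are hypothetical names chosen for the example, and a UUID stands in for the unique key.

```python
import shutil
import sqlite3
import uuid
from pathlib import Path


class FileManager:
    """Hypothetical file manager: files live on disk, only keys live in the database."""

    def __init__(self, root: Path, db: sqlite3.Connection) -> None:
        self.root = root
        self.db = db
        self.root.mkdir(parents=True, exist_ok=True)
        # Assumed schema: one row per stored file.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            "file_key TEXT PRIMARY KEY, original_name TEXT NOT NULL)"
        )

    def save(self, source: Path) -> str:
        """Copy the file into the store and return its generated unique key."""
        key = uuid.uuid4().hex
        shutil.copyfile(source, self.root / key)
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO documents (file_key, original_name) VALUES (?, ?)",
                (key, source.name),
            )
        return key

    def path_for(self, key: str) -> Path:
        """Return the on-disk path for a key, or raise KeyError if it is unknown."""
        row = self.db.execute(
            "SELECT file_key FROM documents WHERE file_key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return self.root / row[0]
```

Other tables would then reference file_key instead of holding the file itself. Swapping the local folder for a network drive or cloud storage would only change save and path_for.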

What's the best approach to create a filestore?

This is an open-ended question. I have a noob's understanding of databases but am willing to learn whatever is required, though I believe my problem could be solved without learning a lot.
So, here goes the question:
I have a large number of files getting generated in my projects (depending on the builds), and I need to archive them and also reproduce them by buildNumber if users request it. I don't expect a lot of these requests, maybe 1-2 a day.
For example: 16GB of data per build every week. Most of the files in the weekly builds are duplicates, and I don't want to archive them again and again; I'd prefer to store them only once. One caveat is that a file's relative location can change even though its content hasn't.
My approach is as follows: create a hash of each file, store the file as a key-value pair of fileHash to actual file, and record this information in some kind of manifest file for each build. That way I should be able to recreate the builds with the correct files, paths, etc.
Can it ever happen that 2 different files have the same hash? Can some database help to do this efficiently? I am currently thinking of dumping all files in one folder.
Thanks
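The approach described in the question can be sketched in a few lines of Python. This is only an illustration under some assumptions: SHA-256 is used as the hash (no collisions are known for it, and an accidental one is astronomically unlikely), the manifest is a JSON file mapping relative path to hash, and the function names are made up for the example. The store folder is assumed to live outside the build folder.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_build(build_dir: Path, store: Path, manifest_path: Path) -> dict:
    """Copy each not-yet-stored file into the store and write a path -> hash manifest."""
    store.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(p for p in build_dir.rglob("*") if p.is_file()):
        digest = file_digest(path)
        blob = store / digest
        if not blob.exists():  # identical content is stored only once
            shutil.copyfile(path, blob)
        manifest[path.relative_to(build_dir).as_posix()] = digest
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest


def restore_build(manifest_path: Path, store: Path, target_dir: Path) -> None:
    """Recreate a build tree from its manifest and the shared store."""
    manifest = json.loads(manifest_path.read_text())
    for relative, digest in manifest.items():
        destination = target_dir / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(store / digest, destination)
```

Because the manifest keeps the path separately from the content hash, a file that moves between builds without changing is not stored again. With many files, the single store folder could be split into subfolders by hash prefix.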

Asset Management: which is the better way to organise user generated files on a web server?

We are in the process of building a system which allows users to upload multiple images and videos to our servers.
The team I'm working with have decided to save all the assets belonging to a user in a folder named using the user's unique identifier. This folder in turn will be a sub-folder of our main assets folder on the file server.
The file structure they have proposed is as follows:
[asset_root]/userid1/assets1
[asset_root]/userid1/assets2
[asset_root]/userid2/assets1
[asset_root]/userid2/assets2
etc.
We are expecting to have thousands or possibly a million+ users over the lifetime of this system.
I always thought that it wasn't a good idea to have many sub-folders in a single location and suggested a year/month/day approach as follows:
[asset_root]/2010/11/04/userid1/assets1
[asset_root]/2010/11/04/userid1/assets2
[asset_root]/2010/11/04/userid2/assets1
[asset_root]/2010/11/04/userid2/assets2
etc.
Does anyone know which of the above approaches would be better suited for this many assets? Is there a better method to organize images/videos on a server?
The system in question will be a Windows server running IIS 7.5 with a SAN.
Many thanks in advance.
In general you are correct, in that many file systems impose a limit on the number of files and folders which may be in one folder. If you hit that limit with the number of users you have, you're in trouble.
In general, I would simply use a UUID for each image, with some dimension of partitioning, e.g. a hash of ABCDEFGH would end up as [asset_root]/ABC/DEFGH. Using a hash gives you a greater degree of assurance about the number of files which will end up in each folder, and saves you from having to worry about, for example, not knowing which month an image you need was stored in.
I'm presuming your file system is NTFS? If so, you've got a limit of 4,294,967,295 files on the disk, and the limit on files in a folder is the same. If you have on the order of millions of users you should be fine, though you might want to consider having only one folder per user instead of several as your example indicates.
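A minimal sketch of that partitioning in Python, assuming a SHA-256 hex digest of the asset id as the hash and a three-character prefix as the folder name (both are choices made for the example, as are the function names):

```python
import hashlib


def partition_key(key: str, prefix_len: int = 3) -> str:
    """Split a key into '<prefix>/<rest>' so files spread across prefix folders."""
    return f"{key[:prefix_len]}/{key[prefix_len:]}"


def asset_path(asset_root: str, asset_id: str) -> str:
    """Build the storage path for an asset from the hash of its id."""
    digest = hashlib.sha256(asset_id.encode("utf-8")).hexdigest()
    return f"{asset_root}/{partition_key(digest)}"


print(partition_key("ABCDEFGH"))  # prints ABC/DEFGH
```

With hex digests, a three-character prefix gives at most 16^3 = 4096 top-level folders, and the same asset id always maps to the same path.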

Is storing image files in the database a good idea for a desktop application running on a network?

I recently came across a problem with image file storage on a network.
I have developed a desktop application that runs on a network and has a central database. Users log in from their own computers on the network and do their jobs.
Until now the database operations have worked without problems; users share data from the same database server.
Now I am being asked to save the user's [operator's] photo too. I am unsure whether to save it in the database like the other data or to store it on a separate file server.
I would like to know which is better: storing images in the database or on a file server?
EDIT:
The main purpose is to store the account holder's photo and signature and show them later during a transaction, so that the teller can verify that the person and signature are correct.
See these:
Storing images in database: Yea or nay?
Should I store my images in the database or folders?
Would you store binary data in database or folders?
Store pictures as files or in the database for a web app?
Storing a small number of images: blob or fs?
User Images: Database or filesystem storage?
Since this is a desktop application it's a bit different.
It really depends on how much data we are talking about here. If you've only got 100 or so users, and it's only profile pictures, I would store it in the DB for a few practical reasons:
No need to manage or worry about a separate file store
You don't need to give shared folder access to each user
No permissions issues
No chance of people messing up your image store
It will be included in your standard DB backup
It will be nicely linked to your data (no absolute vs. relative path issues)
Of course, if you're going to be storing tons of images for thousands of users, I would go with the file system storage.
I think you have to define what you mean by better.
If it means faster, my guess is you don't want to use a database. You probably just want the files stored plainly on a file server.
If you want something like a mini-Facebook, where you need a much more dynamic environment, perhaps you are better off storing them in a database.
This is more a question than an answer: what do you want to do with the pictures?

What is a manageable way to store e-mails for extended periods of time?

If you have a site which sends out emails to the customer, and you want to save a copy of the mail, what is an effective strategy?
If you save it to a table in your database (e.g. create a table called Mail), it gets very large very quickly.
Some strategies I've seen are:
Save it to the file system
Run a scheduled task to clear old entries from the database - but then you wind up not having a copy;
Create a separate table for each time frame (one each year, or one each month)
What strategies have you used?
I don't agree that gmail is an effective backup for business data.
Why trust your business information to a provider who makes no guarantees of service, and over whom you have no control whatsoever?
Makes no sense to me.
Depending on how frequently you need to access this information, I'd say go with the filesystem or database archive. At least that way, you have control over your own data.
Data you want to save is saved in a database. The only exception that is justified is large binary data (images, videos). Who cares how large the table gets? If the mails are automated and template-based, you just have to save the variable parts anyway. The size will be about the same wherever you save it, but you probably already have a mechanism to backup your database, so you won't have to invent one to handle millions of files.
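For the template-based case, storing only the variable parts could look roughly like this in Python. It is a sketch under assumptions: SQLite as the database, a hypothetical mail_log table, and a made-up order_shipped template kept in code.

```python
import json
import sqlite3
from string import Template

# Hypothetical templates; in practice these would be versioned alongside the application.
TEMPLATES = {
    "order_shipped": Template("Hello $name, your order $order_id has shipped."),
}


def create_mail_log(db: sqlite3.Connection) -> None:
    """Create the (assumed) table that holds only the variable parts of each mail."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS mail_log ("
        "id INTEGER PRIMARY KEY, template_id TEXT NOT NULL, variables TEXT NOT NULL)"
    )


def log_mail(db: sqlite3.Connection, template_id: str, variables: dict) -> int:
    """Record which template was sent and with which values; return the row id."""
    with db:
        cursor = db.execute(
            "INSERT INTO mail_log (template_id, variables) VALUES (?, ?)",
            (template_id, json.dumps(variables)),
        )
    return cursor.lastrowid


def render_mail(db: sqlite3.Connection, mail_id: int) -> str:
    """Rebuild the mail body from the stored template id and variables."""
    template_id, variables = db.execute(
        "SELECT template_id, variables FROM mail_log WHERE id = ?", (mail_id,)
    ).fetchone()
    return TEMPLATES[template_id].substitute(json.loads(variables))
```

This only reproduces the original mail faithfully if old versions of a template are kept whenever its text changes.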
Lots of assumptions:
1. You're running windows / would like an archive in windows
2. The ability to search in the mails is important.
Since you are sending mails to your customers, there isn't any reason you can't bcc a mail account of your own. Assuming you have a suitable account on your own server, I'd look at using MailStore (home) to pull the mails out of your account and put them into its own compressed database.
Another option (depending on the email content) is to not save the email, but make sure you can recreate the email by archiving the original content that went into generating the email.
It depends on the content of your email. If it contains large images, I would plump for the file system. Otherwise, if your Mail table is getting very large very quickly, I would go for the separate table, archiving off dead customers.
We save the email to a database table. It really doesn't get that big that quickly. We've a table with 32,000 emails in it (they're biggish emails too, at 50 KB per email) and with compression, the file only uses 16MB.
If you're sending a shed load of email, then know that GMail(free) currently only allows 7GB of data. I'd be happy holding that on a disk.
I'd think about putting in place some sort of general archiving functionality. How you implement that depends on your specific retrieval needs.
For example, if you just wish to retrieve the emails sent to a particular customer in a certain month, then storing them in an appropriate hierarchy on the file system (zip them up if necessary) should be simple to do. You might want to record a list of sent emails in a database table with a pointer to the appropriate directory, but a naming convention for your directories and files might be sufficient.
You might need to access very old emails only very infrequently, so you could archive these to DVD, for example, if online storage is a problem.
If you want to search the actual content of the emails often, then you're going to have to put the content in a DB table or use an indexer like Lucene to examine the files stored on disk.
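One possible naming convention for such a hierarchy, sketched in Python. The layout customer/year/month, the .eml extension and the function names are assumptions made for the example, not a fixed standard.

```python
from datetime import datetime
from pathlib import Path


def archive_email(root: Path, customer_id: str, sent_at: datetime,
                  message_id: str, raw_message: bytes) -> Path:
    """Write one email under <root>/<customer>/<year>/<month>/ and return its path."""
    directory = root / customer_id / f"{sent_at:%Y}" / f"{sent_at:%m}"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{sent_at:%Y%m%dT%H%M%S}_{message_id}.eml"
    path.write_bytes(raw_message)
    return path


def emails_for_month(root: Path, customer_id: str, year: int, month: int) -> list:
    """List the archived emails sent to one customer in one month, oldest first."""
    directory = root / customer_id / f"{year:04d}" / f"{month:02d}"
    return sorted(directory.glob("*.eml")) if directory.is_dir() else []
```

Since the timestamp leads the file name, sorting by name also sorts by send time, and a whole month's folder can be zipped or moved to offline storage as one unit.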