I'm using ImageResizer in an Azure web app on a service plan with 50 GB of file storage. My settings for DiskCache are:
<diskcache dir="~/imagecache" autoclean="true" hashModifiedDate="true" subfolders="1024" asyncWrites="true" asyncBufferSize="10485760" cacheAccessTimeout="15000" logging="true"/>
But that doesn't seem to stop the imagecache folder from reaching the 50 GB limit quite quickly. I have around 100 GB of images in blob storage (original size). Not all of them will be used on the same day, but the same image could be cached multiple times with different parameters. The cached images are around 200 KB on average.
Is there a way to stop the storage from filling up so quickly? Is there a better way of using DiskCache, or should I use something else? The Premium plans with 250 GB and decent CPU/RAM are far too expensive to justify for this.
Thanks
You can't limit the cache by file size, only by a (very) rough file count. Deleting the cache and setting subfolders="256" should keep you under 50 GB, assuming that the 200 KB average holds true.
... However, if your cache fills up "quickly" (as in 1-3 days), then you're probably going to experience serious cache churn and poor performance as your disk write queue skyrockets.
You might consider using a CDN if you can't get storage space for, say, 10 days worth of cached files.
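To get a feel for the numbers in this answer, here is a rough capacity estimate in Python. It uses only the figures quoted in this thread (50 GB quota, 200 KB average file) and assumes decimal units. It does not model how DiskCache's autoclean actually decides what to delete; the helper names are made up for illustration.

```python
# Rough capacity estimate from the figures in this thread.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 KB = 10**3 bytes).

def files_that_fit(quota_bytes: int, avg_file_bytes: int) -> int:
    """Number of average-sized files that fit in the quota."""
    return quota_bytes // avg_file_bytes


def files_per_subfolder(total_files: int, subfolders: int) -> float:
    """Average files per subfolder if files spread evenly."""
    return total_files / subfolders


quota = 50 * 10**9   # 50 GB of file storage
avg = 200 * 10**3    # 200 KB average cached image

total = files_that_fit(quota, avg)
print("files that fit in the quota:", total)

for subfolders in (256, 1024):
    per_folder = files_per_subfolder(total, subfolders)
    print(subfolders, "subfolders ->", round(per_folder), "files per folder at the quota")
```

With 256 subfolders the quota corresponds to just under a thousand files per folder, while with 1024 subfolders it is only a few hundred per folder, which fits the advice that a count-based limit needs fewer subfolders to stay under 50 GB.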
We receive around 10 million images per day ranging in size from 3kb to 200kb. At peak times it is around 400 images per second. It is an average of around 30kb per image.
At the moment all these images come into a single server with a 1TB NVMe SSD for storage.
At night we move the day's images to an archive server.
At peak times users want to read the latest images as they are being written, but there are delays because the server appears to be attempting to read and write at the same time.
What is the best way to be able to succeed in read / write to the same volume and be able to scale easily going forward?
I've started looking into distributed file systems like SeaweedFS. Is this the right way to go?
Are there better options than SeaweedFS?
Thank you
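For scale, the figures in the question work out as follows. This is a back-of-the-envelope sketch in Python, assuming decimal units and that the 30 KB average holds across the whole day; the variable names are illustrative only.

```python
# Back-of-the-envelope load figures from the numbers in the question.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 GB = 10**9 bytes).

images_per_day = 10_000_000
avg_image_bytes = 30 * 10**3      # ~30 KB average
peak_images_per_sec = 400
seconds_per_day = 24 * 60 * 60

daily_gb = images_per_day * avg_image_bytes / 10**9
peak_write_mb_per_sec = peak_images_per_sec * avg_image_bytes / 10**6
avg_images_per_sec = images_per_day / seconds_per_day

print(f"daily volume:      {daily_gb:.0f} GB")
print(f"peak write rate:   {peak_write_mb_per_sec:.0f} MB/s")
print(f"average file rate: {avg_images_per_sec:.0f} images/s")
```

So a day of images takes roughly a third of the 1 TB drive, and the peak write rate in bytes is modest. That may suggest the delays come from the number of small-file operations rather than raw bandwidth, though only measuring on the actual server would confirm it.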
How can I limit maximum size on disk when using Ignite Persistence? For example, my data set in a database is 5TB. I want to cache maximum of 50GB of data in memory with no more than 500GB on disk. Any reasonable eviction policy like LRU for on-disk data will work for me. Parameter maxSize controls in-memory size and I will set it to 50GB. What should I do to limit my disk storage to 500GB then? Looking for something like maxPersistentSize and not finding it there.
Thank you
There is no direct parameter to limit the total disk space occupied by the data itself. As you mentioned in the question, you can control the in-memory region allocation, but when a data region is full, data pages are flushed to disk and loaded back on demand. This process is called page replacement.
On the other hand, page eviction works only for non-persistent cluster preventing it from running OOM. Personally, I can't see how and why that eviction might be implemented for the data stored on disk. I'm almost sure that other "normal" DBs like Postgres or MySQL do not have this option either.
I suppose you might check the following options:
You can limit WAL and WAL archive max sizes. Though these items are rather utility ones, they still might occupy a lot of space [1]
Check if it's possible to use expiry policies on your data items, in this case, data will be cleared from disk as well [2]
Use monitoring metrics and configure alerting to be aware if you are close to the disk limits.
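As an illustration of the last option, here is a minimal Python sketch of a disk-usage check that could feed an alert. It looks at the volume as a whole using the standard library and does not read any Ignite metrics; the path and the 80% threshold are assumptions to adapt.

```python
import shutil


def disk_usage_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the volume containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def over_limit(path: str, threshold: float = 0.8) -> bool:
    """True when the volume holding `path` is at or above the threshold."""
    return disk_usage_fraction(path) >= threshold


# Hypothetical path: point this at the volume that holds the persistence files.
storage_path = "."

if over_limit(storage_path, threshold=0.8):
    print("WARNING: disk usage above 80%, consider expiry policies or more space")
else:
    print(f"disk usage OK: {disk_usage_fraction(storage_path):.0%}")
```

Run periodically (from cron or a scheduler), a check like this gives early warning before the persistent store fills the disk.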
I have a Synology with 2 TB of disk space, and it is backed up every day by Hyper Backup (with Smart Recycle).
But there is a file, #img_bkp_cache, that keeps growing and takes up almost 1/5th of the total disk capacity:
368G /volume2/#img_bkp_cache
1.3T /volume2/Samba
Is it safe to remove that cache file? If so, how do I do it? Otherwise, what can I do to shrink it?
Thank you for your help.
Here is Synology support's answer (translated):
The cache image contains your remote backups index. This index is
compared to the remote index to figure out which elements have
changed. If you have several remote backups, then #img_bkp_cache will
get bigger and bigger.
The index takes roughly 5% of the total size of a backup.
It is not really safe to remove #img_bkp_cache. If you do so, the
remote backup will not be affected, but it will be impossible to manage
incremental backups.
In a nutshell, this file is important and cannot be deleted without consequences.
Note: Finally, I switched from RAID 1 to RAID 5 and doubled my storage capacity (I had a fifth volume that was unused), which "solved" the problem.
I need to read log files (.CSV) using fastercsv and save their contents in a db (each cell value is a record). There are around 20-25 log files that have to be read daily, and they are really large (each CSV file is more than 7 MB). I forked the reading process so that the user does not have to wait a long time, but reading 20-25 files of that size still takes a long time (more than 2 hours). Now I want to fork the reading of each file, i.e. around 20-25 child processes will be created. Can I do that? If so, will it affect performance, and is fastercsv able to handle this?
ex:
for report in @reports
  pid = fork {
    # ...
  }
  # Detach so the parent does not have to wait for the child
  Process.detach(pid)
end
PS: I'm using Rails 3.0.7, and this is going to run on a server on an Amazon large instance (7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of local instance storage, 64-bit platform).
If the storage is all local (and I'm not sure you can really say that if you're in the cloud), then forking isn't likely to provide a speedup because the slowest part of the operation is going to be disc I/O (unless you're doing serious computation on your data). Hitting the disc via several processes isn't going to speed that up at once, though I suppose if the disc had a big cache it might help a bit.
Also, 7MB of CSV data isn't really that much - you might get a better speedup if you found a quicker way to insert the data. Some databases provide a bulk load function, where you can load in formatted data directly, or you could turn each row into an INSERT and file that straight into the database. I don't know how you're doing it at the moment so these are just guesses.
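To illustrate the "quicker insert" idea, here is a minimal sketch of batching rows into a single transaction. It is written in Python with the standard csv and sqlite3 modules purely as an illustration, not in Rails, and the table layout (one record per cell) and the batch size are assumptions based on the question. The same pattern, many rows per statement and one commit at the end, applies to whichever database and language you actually use.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_file, batch_size=1000):
    """Insert every CSV cell as one record, in batches, inside one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells (row_no INTEGER, col_no INTEGER, value TEXT)"
    )
    insert_sql = "INSERT INTO cells (row_no, col_no, value) VALUES (?, ?, ?)"
    total = 0
    batch = []
    with conn:  # commits once on success, rolls back on error
        for row_no, row in enumerate(csv.reader(csv_file)):
            for col_no, value in enumerate(row):
                batch.append((row_no, col_no, value))
            if len(batch) >= batch_size:
                conn.executemany(insert_sql, batch)
                total += len(batch)
                batch.clear()
        if batch:  # flush whatever is left over
            conn.executemany(insert_sql, batch)
            total += len(batch)
    return total


# Tiny in-memory demonstration with made-up data.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("a,b,c\n1,2,3\n")
inserted = load_csv(conn, sample)
print("cells inserted:", inserted)
```

Committing once per file rather than once per cell is usually where most of the time is saved, but as noted, measuring on your own data is the only way to be sure.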
Of course, having said all that, the only way to be sure is to try it!
The reason I ask is that we have a dedicated RAID10 array with ~150GB for the tempdb (the "t" drive). It is only used for storing tempdb. The t drive isn't used by SQL Server or any other process for anything else.
Our DBA has tempdb set up with a 15GB initial size and autogrow in 20% increments. Every time the server starts it is resized to 15GB, and then over the course of the day it grows to ~80GB (on average). Now IT is looking into making the initial size larger, say 30 or 40GB, but given that the drive is ONLY used for tempdb, my thinking is: why not "max it" right away?
Is there any negative effect to simply create 4 data files in the primary group for tempdb, give them each an initial size of 30GB (120GB total), turn autogrow off and be done with it?
Are there any limits on SQL Server's ability to span multiple tempdb data files in one query? I.e., will it cause problems if tempdb has, say, 70GB free in total but the file used by one process is full (30 of 30GB used)?
I would size them to about 100GB and leave autogrow on; that way you don't have to wait for it to grow every time. I would also add multiple files.
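To put a number on the waiting that pre-sizing avoids, here is a rough Python sketch counting how many 20% growth steps it takes to get from the 15GB initial size to the ~80GB mentioned in the question. It assumes a single file where each growth is 20% of the current size; the function name is made up for illustration.

```python
def growth_events(initial_gb, target_gb, growth_pct):
    """Count percentage-based autogrow steps needed to reach at least target_gb.

    Assumption: one file, each step grows it by growth_pct of its current size.
    Returns (number_of_steps, final_size_gb).
    """
    size = initial_gb
    events = 0
    while size < target_gb:
        size *= 1 + growth_pct / 100
        events += 1
    return events, size


events, final_size = growth_events(15, 80, 20)
print(f"{events} growth steps, final size about {final_size:.1f} GB")

# Starting at or above the working size means no growth steps at all.
print(growth_events(100, 80, 20))
```

Under these assumptions the file has to grow about ten times after every restart, and each later step is larger than the one before it, which is the wait that a bigger initial size removes.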
Is there any negative effect to simply
create 4 data files in the primary
group for tempdb give them each an
initial size of 30GB, turn autogrow
off and be done with it?
Sounds like a good plan to me. However, I would leave autogrow on, just in case someone decides to do a sort operation on a big table that doesn't have an index on the sorted column.
See also here: http://technet.microsoft.com/en-us/library/cc966534.aspx
It is recommended to have .25 to 1
data files (per filegroup) for each
CPU on the host server.
This is especially true for TEMPDB
where the recommendation is 1 data
file per CPU.
Dual core counts as 2 CPUs; logical
procs (hyperthreading) do not.
We have found it very useful to create large TempDB data and log files. Any actions that limit server OS activities such as resizing TempDB increase server efficiencies. We have a 16 processor machine with 113 GB dedicated to TempDB data space. This machine is dedicated to large SSIS ETL processes, thus resulting in mass data operations.
The bulk of our ETL operations spawn up to 4 SQL threads. After initially configuring a TempDB file for each processor (16), we quickly realized via performance monitoring that our configuration was forcing SQL\windows to unnecessarily span the multiple TempDB files. We settled on 5 larger TempDB data files and realized performance improvements. We have since moved on to a 24 processor box and are using 8 TempDB files.
Please note that this is a large data migration server; I’m sure transaction-oriented systems would still benefit from the recommended 1-1 processor to TempDB file configuration. It should also be noted that having a large increase % on a TempDB file may force a critical transaction to take the windows operation hit and thus may not be appropriate for your specific application.