How to limit DiskCache plugin to a specific storage size? - imageresizer

I am using the DiskCache plugin. Looking at the documentation, I see the AutoClean attribute says:
If true, items from the cache folder will be automatically 'garbage collected' if the cache size limits are exceeded. Defaults to false.
Where are the cache size limits defined? I keep running out of my 10 GB of space on Azure Websites. How can I limit the cache to 8 GB?

Reduce the subfolders setting to restrict the number of files that are permitted. It is a stretchy limit based on the number of active files, since strictly enforcing a byte limit would cause very bad I/O churn. The default limit is 400 * subfolders files; the maximum is 1000 * subfolders.
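As a rough sizing aid (not from the plugin's documentation, just applying the formula above), you can estimate a subfolders value for an 8 GB budget; the average cached-image size is an assumption you should measure against your own cache folder:

# Rough estimate of the DiskCache subfolders setting for an 8 GB budget.
avg_file_bytes = 100 * 1024        # assumption: ~100 KB per cached image
budget_bytes = 8 * 1024 ** 3       # 8 GB target

max_files = budget_bytes // avg_file_bytes
# Size against the hard cap of ~1000 files per subfolder
# (the default target is ~400 files per subfolder).
subfolders = max_files // 1000
print(subfolders)                  # 83 with these assumptions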

Related

Limit Infinispan File Store Size

I would like to cache very large amounts of data in an Infinispan 13 cache that uses passivation to the disk. I've accomplished this with the following configuration:
<persistence passivation="true">
    <file-store purge="true"/>
</persistence>
<memory storage="OFF_HEAP" max-size="1GB" when-full="REMOVE"/>
However, now I would like to set the maximum size of the file store to, for example, 50 GB and have the cache delete overflowing entries completely.
Is there a way to do this? I could not find any option to limit the size of a file-store in the documentation.
Thank you!
There is no way to specifically limit the total size of the files stored. Depending upon your use case, setting the compaction-ratio lower should help free some space; see https://docs.jboss.org/infinispan/13.0/configdocs/infinispan-config-13.0.html under file-store.
You can, however, use expiration to remove entries after a given period of time: https://infinispan.org/docs/stable/titles/configuring/configuring.html#expiration_configuring-memory-usage This removes those entries from the cache, which in turn hits that compaction-ratio sooner and cleans up old files.

paginating an unlimited number of S3 objects with filters

My S3 bucket contains an unlimited number of objects whose keys are the epoch value at which they were created (e.g. "1503379525").
The Lambda function should concatenate the contents of all files within a specific time range (the last 15 minutes).
My solution is to use:
Pagination to get a list of objects
page_iterator.search("Contents[?Key > epoch-for-last-15-min][]")
I need to stay within Lambda's memory limit and make sure the paginator works correctly on a bucket with an unlimited number of files.
Is this a feasible solution?
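For what it's worth, here is a minimal sketch of that approach, assuming boto3, a hypothetical bucket name, and that all keys are plain epoch strings of equal length (so the lexicographic comparison in the JMESPath filter matches numeric order):

import time
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Objects created in the last 15 minutes.
cutoff = str(int(time.time()) - 15 * 60)

page_iterator = paginator.paginate(Bucket="my-bucket")  # hypothetical bucket
parts = []
for obj in page_iterator.search("Contents[?Key > '%s'][]" % cutoff):
    if not obj:
        continue
    # Each object is read fully into memory, so the combined size
    # must stay within the Lambda memory limit.
    body = s3.get_object(Bucket="my-bucket", Key=obj["Key"])["Body"].read()
    parts.append(body)

combined = b"".join(parts)

The paginator only lists objects; each matching object still has to be fetched individually, so memory rather than listing is the practical limit here.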

how to limit the size of the files exported from bigquery to gcs?

I used Python code to export data from BigQuery to GCS, and then gsutil to copy the files to S3. But after exporting to GCS, I noticed that some files are larger than 5 GB, which gsutil cannot handle. So I want to know how to limit the file size.
So, after going through the issue tracker, the correct way to take this is:
Single URI ['gs://[YOUR_BUCKET]/file-name.json']
Use a single URI if you want BigQuery to export your data to a single
file. The maximum exported data with this method is 1 GB.
Please note that the data size can be up to a maximum of 1 GB, and that the 1 GB limit applies to the exported data, not to the size of the resulting file.
Single wildcard URI ['gs://[YOUR_BUCKET]/file-name-*.json']
Use a single wildcard URI if you think your exported data set will be
larger than 1 GB. BigQuery shards your data into multiple files based
on the provided pattern. Exported file sizes may vary, and files won't
be equal in size.
So again, you need to use this method when your data size is above 1 GB. The resulting file sizes may vary and may go beyond 1 GB, so the 5 GB and 160 MB pair you mentioned can happen with this method.
Multiple wildcard URIs
['gs://my-bucket/file-name-1-*.json',
'gs://my-bucket/file-name-2-*.json',
'gs://my-bucket/file-name-3-*.json']
Use multiple wildcard URIs if you want to partition the export output.
You would use this option if you're running a parallel processing job
with a service like Hadoop on Google Cloud Platform. Determine how
many workers are available to process the job, and create one URI per
worker. BigQuery treats each URI location as a partition, and uses
parallel processing to shard your data into multiple files in each
location.
The same applies here as well: exported file sizes may vary and can go beyond 1 GB.
Try using a single wildcard URI (a Python sketch follows after the examples below).
See documentation for Exporting data into one or more files
Use a single wildcard URI if you think your exported data will be
larger than BigQuery's 1 GB per file maximum value. BigQuery shards
your data into multiple files based on the provided pattern. If you
use a wildcard in a URI component other than the file name, be sure
the path component does not exist before exporting your data.
Property definition:
['gs://[YOUR_BUCKET]/file-name-*.json']
Creates:
gs://my-bucket/file-name-000000000000.json
gs://my-bucket/file-name-000000000001.json
gs://my-bucket/file-name-000000000002.json ...
Property definition:
['gs://[YOUR_BUCKET]/path-component-*/file-name.json']
Creates:
gs://my-bucket/path-component-000000000000/file-name.json
gs://my-bucket/path-component-000000000001/file-name.json
gs://my-bucket/path-component-000000000002/file-name.json
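For reference, a minimal sketch of the wildcard export from Python, assuming the google-cloud-bigquery client library and hypothetical project, dataset, table and bucket names:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical identifiers -- replace with your own.
table_id = "my-project.my_dataset.my_table"
destination_uri = "gs://my-bucket/file-name-*.json"

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON

# BigQuery shards the output into file-name-000000000000.json, -000000000001.json, ...
# Individual shard sizes are chosen by BigQuery and cannot be pinned exactly.
extract_job = client.extract_table(table_id, destination_uri, job_config=job_config)
extract_job.result()  # wait for the export to complete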

how to use the total-max-memory property

While creating a region in Geode you can specify --total-max-memory, which should limit the amount of memory used by the region entries.
ref: https://geode.apache.org/docs/guide/tools_modules/gfsh/command-pages/create.html#topic_54B0985FEC5241CA9D26B0CE0A5EA863
I created a region of type PARTITION_OVERFLOW with total-max-memory set. I can see that this attribute is present in the partition attributes for the region on the server, but when the amount of data crossed the total-max-memory limit it did not start overflowing old entries to disk. After some time (with memory usage almost 10x greater than total-max-memory), the heap LRU (which is based on total JVM heap) kicks in and starts evicting entries.
Is there any additional setting that has to be configured to trigger eviction when the total-max-memory limit is reached for a region?
The total-max-memory option is not used in Geode. The reference JIRA is https://issues.apache.org/jira/browse/GEODE-2719.

setting soft limit for 4store client

While running SPARQL queries I get the warning
# hit complexity limit 20 times, increasing soft limit may give more results
This is not specific to any particular query; it happens for all of them. I wanted to know how I can increase the soft limit, given that I am not using the HTTP interface but rather the 4store client master.
By default 4store gives a soft limit of about 1000. You have a couple of options to tweak/disable search breadth:
If using 4s-query you can override the setting with --soft-limit e.g. 4s-query --soft-limit 5000 demo 'SELECT ... etc.'. Setting it to -1 completely disables it.
If using a 3rd party client you can tweak the soft limit with a CGI parameter, e.g. sparql-query 'http://example.org/sparql/?soft-limit=-1' (see the sketch after this list).
You can set the soft limit globally when starting the sparql server with the -s parameter. See 4s-httpd --help.
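If you do end up going through the HTTP endpoint, here is a minimal sketch of option 2 in Python, assuming the requests library and the example endpoint URL from the answer above:

import requests

endpoint = "http://example.org/sparql/"   # example endpoint from above

resp = requests.get(
    endpoint,
    params={
        "query": "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
        "soft-limit": "-1",               # -1 disables the soft limit
    },
    headers={"Accept": "application/sparql-results+json"},
)
print(resp.json())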