Cannot find stored elements in Apache JCS cache

I'm using a JCS cache to store a large number of objects that my application uses (more than 10,000,000).
I wrote a quick test to check the configuration, and although the elements seem to be stored in the cache, when I try to retrieve them, most of them aren't there.
I use a cache region and an auxiliary disk cache, as you can see from my configuration file:
jcs.region.testCache1=DC
jcs.region.testCache1.cacheattributes=org.apache.jcs.engine.CompositeCacheAttributes
jcs.region.testCache1.cacheattributes.MaxObjects=1000
jcs.region.testCache1.cacheattributes.MemoryCacheName=org.apache.jcs.engine.memory.lru.LRUMemoryCache
jcs.region.testCache1.cacheattributes.UseMemoryShrinker=true
jcs.region.testCache1.cacheattributes.MaxMemoryIdleTimeSeconds=3600
jcs.region.testCache1.cacheattributes.ShrinkerIntervalSeconds=60
jcs.region.testCache1.cacheattributes.MaxSpoolPerRun=500
jcs.region.testCache1.elementattributes=org.apache.jcs.engine.ElementAttributes
jcs.region.testCache1.elementattributes.IsEternal=true
jcs.auxiliary.DC=org.apache.jcs.auxiliary.disk.indexed.IndexedDiskCacheFactory
jcs.auxiliary.DC.attributes=org.apache.jcs.auxiliary.disk.indexed.IndexedDiskCacheAttributes
jcs.auxiliary.DC.attributes.DiskPath=${user.dir}/jcs_swap
jcs.auxiliary.DC.attributes.MaxPurgatorySize=10000
jcs.auxiliary.DC.attributes.MaxKeySize=-1
I set the IsEternal attribute to 'true' so that elements are never expired and removed, a memory shrinker that periodically spools elements to the disk cache, and a disk cache whose MaxKeySize is set to -1, indicating that it can hold any number of elements. Do you see any misconfiguration?
When I use this configuration with a moderate number of elements (~10,000), everything works fine. When I use it with more than 1,000,000, I cannot retrieve most of the elements.

After some testing, I found a solution on my own. I was inserting elements into the cache with the following snippet:
for (int i = 0; i < 2000000; i++) {
    TestElement element = new TestElement();
    element.setId(i);
    element.setValue("element" + i);
    cache.add(i, element);
}
This caused trouble because the cache didn't have time to spool elements to the disk cache. However, if I sleep for a couple of milliseconds before adding new elements (which is closer to how a real-time environment behaves), everything works as expected.
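For reference, a throttled version of the loop might look like this (a sketch assuming the plain org.apache.jcs.JCS API - the cache.add(...) call above looks like a local wrapper around JCS.put(...) - and assuming TestElement is Serializable, which the indexed disk cache requires):
import org.apache.jcs.JCS;
import org.apache.jcs.access.exception.CacheException;

public class ThrottledCacheLoad {
    public static void main(String[] args) throws CacheException, InterruptedException {
        JCS cache = JCS.getInstance("testCache1");
        for (int i = 0; i < 2000000; i++) {
            TestElement element = new TestElement();
            element.setId(i);
            element.setValue("element" + i);
            cache.put(i, element);
            // Pause now and then so the background thread can spool evicted
            // elements from purgatory into the indexed disk cache.
            if (i % 10000 == 0) {
                Thread.sleep(50);
            }
        }
    }
}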

Migrating from Microsoft.Azure.Storage.Blob to Azure.Storage.Blobs - directory concepts missing

These are great guides for migrating between the different versions of the NuGet package:
https://github.com/Azure/azure-sdk-for-net/blob/Azure.Storage.Blobs_12.6.0/sdk/storage/Azure.Storage.Blobs/README.md
https://elcamino.cloud/articles/2020-03-30-azure-storage-blobs-net-sdk-v12-upgrade-guide-and-tips.html
However I am struggling to migrate the following concepts in my code:
// Return if a directory exists:
container.GetDirectoryReference(path).ListBlobs().Any();
where GetDirectoryReference is not understood and there appears to be no direct translation.
Also, the concept of a CloudBlobDirectory does not appear to have made it into Azure.Storage.Blobs e.g.
private static long GetDirectorySize(CloudBlobDirectory directoryBlob) {
    long size = 0;
    foreach (var blobItem in directoryBlob.ListBlobs()) {
        if (blobItem is CloudBlob)
            size += ((CloudBlob) blobItem).Properties.Length;
        if (blobItem is CloudBlobDirectory)
            size += GetDirectorySize((CloudBlobDirectory) blobItem);
    }
    return size;
}
where CloudBlobDirectory does not appear anywhere in the API.
There's no such thing as physical directories or folders in Azure Blob Storage. The directories you sometimes see are part of the blob name (e.g. folder1/folder2/file1.txt). The List Blobs request allows you to add a prefix and a delimiter, which are used by the Azure Portal and Azure Data Explorer to create a visualization of folders. For example, prefix folder1/ and delimiter / would show the content as if folder1 had been opened.
That's exactly what happens in your code: GetDirectoryReference() adds a prefix, ListBlobs() fires a request, and Any() checks whether any items are returned.
For v12, the method that lets you do the same is GetBlobsByHierarchy (and its async version). In your particular case, where you only want to know whether any blobs exist under the directory prefix, GetBlobs with a prefix would also suffice.
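A rough sketch of both helpers against v12 might look like this (assuming container is a BlobContainerClient and path is the old directory prefix, e.g. "folder1/folder2/"):
using System.Linq;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

static class BlobDirectoryHelpers
{
    // v12 equivalent of container.GetDirectoryReference(path).ListBlobs().Any()
    public static bool DirectoryExists(BlobContainerClient container, string path) =>
        container.GetBlobs(prefix: path).Any();

    // GetBlobs with a prefix returns every blob under the virtual "directory",
    // nested ones included, so the recursion over CloudBlobDirectory is no longer needed.
    public static long GetDirectorySize(BlobContainerClient container, string path)
    {
        long size = 0;
        foreach (BlobItem blob in container.GetBlobs(prefix: path))
            size += blob.Properties.ContentLength ?? 0;
        return size;
    }
}
Use GetBlobsByHierarchy(prefix: path, delimiter: "/") instead if you also need the one-level "folder" view that the portal shows.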

OutOfMemory on custom extractor

I have stitched a lot of small XML files into one file, and then made a custom extractor to return rows with one byte array that corresponds to each file.
Run on remote/master:
Running it for one file (gzipped, 11 MB) works fine.
Running it for more than one file, I get a System.OutOfMemoryException.
Run on local/master:
Running it for one or more files (gzipped, 500+ MB) works fine.
Extractor looks like this:
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
    using (var stream = new StreamReader(input.BaseStream))
    {
        var xml = stream.ReadToEnd();
        // Clean stitched XML
        xml = UtilsXml.CleanXml(xml);
        // Get nodes - one for each stitched file
        var d = new XmlDocument();
        d.LoadXml(xml);
        var root = d.FirstChild;
        for (int i = 0; i < root.ChildNodes.Count; i++)
        {
            output.Set<object>(1, Encoding.ASCII.GetBytes(root.ChildNodes[i].OuterXml));
            yield return output.AsReadOnly();
        }
        yield break;
    }
}
and error message looks like this:
==== Caught exception System.OutOfMemoryException
at System.Xml.XmlDocument.CreateTextNode(String text)
at System.Xml.XmlLoader.LoadAttributeNode()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at Microsoft.Analytics.Tools.Formats.Text.XmlByteArrayRowExtractor.<Extract>d__0.MoveNext()
at ScopeEngine.SqlIpExtractor<ScopeEngine::GZipInput,Extract_0_Data0>.GetNextRow(SqlIpExtractor<ScopeEngine::GZipInput\,Extract_0_Data0>* , Extract_0_Data0* output) in d:\data\ccs\jobs\bc367467-ef86-43d2-a937-46ba2d4cc524_v0\sqlmanaged.h:line 1924
So what am I doing wrong? And how do I debug this on remote?
Thanks!
Unfortunately the local run does not enforce memory allocations, so you would have to check memory in the local vertex debug yourself.
Looking at your code above, I see that you are loading the XML documents into a DOM. Please note that an XML DOM can explode the data size from the string representation by a factor of 10 or more (I have seen 2 to 12 in my time as the resident SQL XML guru).
Each UDO today only gets 1/2 GB of RAM to play with, so I assume that your XML DOM document(s) start going beyond that.
The normal recommendation is to use the XmlReader interface (there is a reader extractor in the samples on http://usql.io as well) and scan through the document(s) to find the information you are looking for.
If your documents are always small enough (e.g., <20MB), you may want to make sure that you release the memory of the other documents and operate one document at a time.
We do have plans to allow you to annotate your UDO with memory needs, but that is still a bit out.
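A rough sketch of such a streaming extractor might look like this (assuming the stitched file is already well-formed with a single root element, so the CleanXml step is not needed, and writing to the same column index as the extractor above):
using System.Collections.Generic;
using System.Text;
using System.Xml;
using Microsoft.Analytics.Interfaces;

[SqlUserDefinedExtractor(AtomicFileProcessing = true)]   // the gzipped stitched file cannot be split
public class StreamingXmlExtractor : IExtractor
{
    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(input.BaseStream, settings))
        {
            reader.MoveToContent();      // position on the root element
            reader.ReadStartElement();   // step inside the root
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                // ReadOuterXml materializes only one stitched document at a time
                // and advances the reader to its next sibling.
                string doc = reader.ReadOuterXml();
                output.Set<object>(1, Encoding.ASCII.GetBytes(doc));
                yield return output.AsReadOnly();
            }
        }
    }
}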

Cannot read second page scanned via ADF

I have a Brother multifunction networked printer/scanner/fax (model MFC-9140CDN). I am trying to use the following code with WIA to retrieve items scanned in with the document feeder:
const int FEEDER = 1;
var manager = new DeviceManager();
var deviceInfo = manager.DeviceInfos.Cast<DeviceInfo>().First();
var device = deviceInfo.Connect();
device.Properties["Pages"].set_Value(1);
device.Properties["Document Handling Select"].set_Value(1);
var morePages = true;
var counter = 0;
while (morePages) {
    counter++;
    var item = device.Items[1];
    item.Properties["Bits Per Pixel"].set_Value(1);
    item.Properties["Horizontal Resolution"].set_Value(300);
    item.Properties["Vertical Resolution"].set_Value(300);
    var img = (WIA.ImageFile)item.Transfer();
    var path = String.Format(@"C:\Users\user1\Documents\test_{0}.tiff", counter);
    img.SaveFile(path);
    var status = (int)device.Properties["Document Handling Status"].get_Value();
    morePages = (status & FEEDER) > 0;
}
When the Transfer method is reached for the first time, all the pages go through the document feeder. The first page gets saved with img.SaveFile to the passed-in path, but all the subsequent pages are not available - device.Items.Count is 1, and trying device.Items[2] raises an exception.
In the next iteration, calling Transfer raises an exception -- understandably, because there are now no pages in the feeder.
How can I get the subsequent images that have been scanned into the feeder?
(N.B. Iterating through all the device properties, there is an additional unnamed property with the id of 38922. I haven't been able to find any reference to this property.)
Update
I couldn't find a property on the device corresponding to WIA_IPS_SCAN_AHEAD or WIA_DPS_SCAN_AHEAD_PAGES, but that makes sense because this property is optional according to the documentation.
I tried using TWAIN (via the NTwain library, which I highly recommend) with the same problem.
I have recently experienced a similar error with an HP MFC.
It seems that a property was being changed by the driver. The previous developer of the software I'm working on just kept reinitialising the driver each time in the for loop.
In my case the property was 'Media Type' being set to FLATBED (0x02) even though I was doing a multi-page scan and needed it to be NEXT_PAGE (0x80).
The way I found this was by storing every property before I scanned (both device and item properties) and again after scanning the first page. I then had my application print out any properties that had changed, and was able to identify my problem.
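A minimal sketch of that before/after comparison (assuming the same WIA interop types as the question's code, i.e. Properties collections of Property objects exposing Name, PropertyID and get_Value()):
using System;
using System.Collections.Generic;
using WIA;

static class WiaPropertyDiff
{
    // Snapshot device.Properties and item.Properties before the first Transfer() ...
    public static Dictionary<int, object> Snapshot(Properties props)
    {
        var snapshot = new Dictionary<int, object>();
        foreach (Property p in props)
            snapshot[p.PropertyID] = p.get_Value();
        return snapshot;
    }

    // ... and print whatever the driver changed after the first page was scanned.
    public static void PrintChanges(Properties props, Dictionary<int, object> before)
    {
        foreach (Property p in props)
        {
            object old;
            if (before.TryGetValue(p.PropertyID, out old) && !Equals(old, p.get_Value()))
                Console.WriteLine("{0} ({1}): {2} -> {3}", p.Name, p.PropertyID, old, p.get_Value());
        }
    }
}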
This is a networked scanner, and I was using the WSD driver.
Once I installed the manufacturer's driver, the behavior is as expected -- one page goes through the ADF, after which control is returned to the program.
(Even now, when I use WIA's CommonDialog.ShowSelectDevice method, the scanner is available twice, once using the Windows driver and once using the Brother driver; when I choose the WSD driver, I still see the issue.)
This bug did cost me hours...
So thanks a lot Zev.
I also had two scanners shown in the dialog for what is physically one machine. One driver scans only the first page and then empties the feeder without any chance to intercept. The other one works as expected.
BTW: it is not necessary to initialize the scanner for each page. I call my initialization routines prior to the Transfer() loop, and it works just fine.
Another hiccup I ran into was initializing the page sizes first and then the feeder. So if you do not get it to work, try switching the sequence in which you change the properties of your WIA driver. As mentioned in the MSDN, some properties also influence others, potentially resetting your changes.
So praise to ZEV SPITZ for the answer on Aug. 09, 2015.
You should instantiate and set up the device inside the 'while' loop. See:
const int FEEDER = 1;
var morePages = true;
var counter = 0;
while (morePages) {
    counter++;
    var manager = new DeviceManager();
    var deviceInfo = manager.DeviceInfos.Cast<DeviceInfo>().First();
    var device = deviceInfo.Connect();
    //device.Properties["Pages"].set_Value(1);
    device.Properties["Document Handling Select"].set_Value(1);
    var item = device.Items[1];
    item.Properties["Bits Per Pixel"].set_Value(1);
    item.Properties["Horizontal Resolution"].set_Value(300);
    item.Properties["Vertical Resolution"].set_Value(300);
    var img = (WIA.ImageFile)item.Transfer();
    var path = String.Format(@"C:\Users\user1\Documents\test_{0}.tiff", counter);
    img.SaveFile(path);
    var status = (int)device.Properties["Document Handling Status"].get_Value();
    morePages = (status & FEEDER) > 0;
}
I got this from looking into this free project, which I believe can help you too: adfwia.codeplex.com

Symfony2 performance tweaking

Symfony2 looked so promising, powerful and flexible, so we were going to use Symfony2 + MongoDB for one of our projects. But it turned out to be too slow (Apache/2.2.25 + PHP/5.4.20). Currently the app is pretty simple, but I have noticed that httpd.exe loads the CPU up to 28% when some simple page is requested. The page is quite light - just user profile info and the list of the user's posts. I can't even imagine how hundreds of users could be served (not even talking about numbers like 100k users) if performance doesn't get much better.
For comparison, the CPU load is 2% when opening the heavy 'products' page of an ActivationCloud account (which fetches a good amount of data) (PHP + Smarty + SQL).
After taking a look at the Xdebug output, I found that a great deal of time (20%) is spent in ClassLoader->loadClass(...) - 265 calls.
After performing the following steps:
* generated a class map:
php composer.phar dump-autoload --optimize
* installed and enabled APC:
[APC]
extension=php_apc.dll
apc.enabled=1
apc.shm_segments=1
;32M per WordPress install
apc.shm_size=128M
;Relative to the number of cached files (you may need to watch your stats for a day or two to find out a good number)
apc.num_files_hint=7000
;Relative to the size of WordPress
apc.user_entries_hint=4096
;The number of seconds a cache entry is allowed to idle in a slot before APC dumps the cache
apc.ttl=7200
apc.user_ttl=7200
apc.gc_ttl=3600
;Setting this to 0 will give you the best performance, as APC will
;not have to check the IO for changes. However, you must clear
;the APC cache to recompile already cached files. If you are still
;developing, updating your site daily in WP-ADMIN, and running W3TC
;set this to 1
apc.stat=1
;This MUST be 0, WP can have errors otherwise!
apc.include_once_override=0
;Only set to 1 while debugging
apc.enable_cli=0
;Allow 2 seconds after a file is created before it is cached to prevent users from seeing half-written/weird pages
apc.file_update_protection=2
;Leave at 2M or lower. WordPress doesn't have any file sizes close to 2M
apc.max_file_size=2M
;Ignore files
apc.filters = "/var/www/apc.php"
apc.cache_by_default=1
apc.use_request_time=1
apc.slam_defense=0
apc.mmap_file_mask=/var/www/temp/apc.XXXXXX
apc.stat_ctime=0
apc.canonicalize=1
apc.write_lock=1
apc.report_autofilter=0
apc.rfc1867=0
apc.rfc1867_prefix =upload_
apc.rfc1867_name=APC_UPLOAD_PROGRESS
apc.rfc1867_freq=0
apc.rfc1867_ttl=3600
apc.lazy_classes=0
apc.lazy_functions=0
I expected a miracle after this, but it did not happen.
* enabled the APC class loader - in Symfony\web\app.php I uncommented
$loader = new ApcClassLoader('sf2', $loader);
$loader->register(true);
ClassLoader->loadClass(...) got better: 'Self' is 11 instead of 21.
Frankly speaking, I was shocked by what I saw in Xdebug :( - a lot of repetitive calls like Container->get(...) - 317 calls, and DocumentManager->getClassMetadata(...) - 301 calls. In total, more than 2k function calls. Hard to believe.
These bundles are installed:
class AppKernel extends Kernel
{
    public function registerBundles()
    {
        $bundles = array(
            new Symfony\Bundle\FrameworkBundle\FrameworkBundle(),
            new Symfony\Bundle\SecurityBundle\SecurityBundle(),
            new Symfony\Bundle\TwigBundle\TwigBundle(),
            new Symfony\Bundle\MonologBundle\MonologBundle(),
            new Symfony\Bundle\SwiftmailerBundle\SwiftmailerBundle(),
            new Symfony\Bundle\AsseticBundle\AsseticBundle(),
            new Doctrine\Bundle\DoctrineBundle\DoctrineBundle(),
            new Doctrine\Bundle\MongoDBBundle\DoctrineMongoDBBundle(),
            new Sensio\Bundle\FrameworkExtraBundle\SensioFrameworkExtraBundle(),
            new HWI\Bundle\OAuthBundle\HWIOAuthBundle(),
            new Knp\Bundle\MenuBundle\KnpMenuBundle(),
            // ... our bundles ...
        );
        if (in_array($this->getEnvironment(), array('dev', 'test'))) {
            $bundles[] = new Symfony\Bundle\WebProfilerBundle\WebProfilerBundle();
            $bundles[] = new Sensio\Bundle\DistributionBundle\SensioDistributionBundle();
            $bundles[] = new Sensio\Bundle\GeneratorBundle\SensioGeneratorBundle();
        }
        return $bundles;
    }
}
It was sad to find that Symfony2 got one of the worst benchmark results among PHP frameworks: http://www.techempower.com/benchmarks/#section=data-r8&hw=i7&test=json&l=sg
At the same time, Francois Zaninotto said in his blog post http://symfony.com/blog/who-really-uses-symfony that Yahoo uses Symfony2 for its bookmarks service. I tried some apps from the list at http://trac.symfony-project.org/wiki/ApplicationsDevelopedWithSymfony and they don't look slow. Also, on Quora (http://www.quora.com/Who-is-using-Symfony2-in-production) it's said that Dailymotion is using it as well.
How to make the performance acceptable?
Got Symfony working 10x faster after adding
realpath_cache_size = 4096k
to php.ini
First, you should use Linux (you mentioned httpd.exe, so I think you are using Windows). Then you should use nginx instead of Apache, PHP 5.5 with FPM instead of mod_php, and OPcache instead of APC (by the way, apc.stat should be turned off). Doctrine caches should be turned on, and then you should use HTTP caching wherever you can. (You can look at Packagist's code for some hints.)
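As a rough starting point, the relevant php.ini settings for that setup might look like this (PHP 5.5+ with OPcache; the values are illustrative, not tuned numbers):
; OPcache replaces APC as the opcode cache
opcache.enable=1
opcache.memory_consumption=128
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=10000
; like apc.stat=0: do not stat files on every request in production,
; but remember to reset the cache on deploy
opcache.validate_timestamps=0
; cache resolved file paths, which Symfony's many includes benefit from
realpath_cache_size=4096k
realpath_cache_ttl=600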

How to remember variables with Greasemonkey script when a page reloads

I've got a problem on a mobile site that I'm running directly in my PC's Firefox browser. Every time a button is clicked, the page reloads, thus resetting my variables. I've got this script:
// ==UserScript==
// @name          trada.net autoclick 55_1min_mobile
// @namespace     airtimeauction auto click
// @include       http://www.trada.net/Mobile/
// @version       0.1
// @description   Automatically click
// ==/UserScript==
var interval = 57000;
var bidClickTimer = setInterval(function() { BidClick(); }, interval);
var numBidClicks = 0;
function BidClick ()
{
    var bidBtn1 = document.getElementById("ctl00_mainContentPlaceholder_AirtimeAuctionItem7_btn_bidNow");
    numBidClicks++;
    if (numBidClicks > 500)
    {
        clearInterval(bidClickTimer);
        bidClickTimer = "";
    }
    else
    {
        bidBtn1.click(1);
    }
};
BidClick();
It should click the button every 57 seconds, but the moment it clicks the button, the page reloads, thus resetting the variables. How can I get Greasemonkey to "remember" or carry over the variables to the next page/script when it reloads? Will it have something to do with GM_setValue? It will only be these few variables. The second question is: will the few seconds it takes the page to reload be subtracted from the 57 seconds? How do I compensate for that?
In addition to GM_setValue...
you can also use the newer JavaScript localStorage object, or a SQL JavaScript API.
The advantage of the SQL approach is that it is very meager in its resource consumption in a script (think about it: rather than concatenating a humongous string of results, you can tuck away each result and recall it when needed with a precise query). The downside is that you have to set up a SQL server, but with something like SQLite that's not a big deal these days. Even Postgres or MySQL can be quickly spun up on a laptop...
Yes, I think you have to use GM_setValue/GM_getValue.
And if you have to do something exactly every 57 seconds, then calculate the time when the next action should take place after the reload, and store it using GM_setValue.
When your script starts, first check whether the time of the next action is stored; if it is, use it to schedule the next action and calculate the time for the action after that, and so on...
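A rough sketch of that idea (using the classic GM_getValue/GM_setValue API and the button id from the question; the numBidClicks counter can be persisted the same way):
var INTERVAL = 57000;

// When is the next click due? Fall back to "now + interval" on the first run.
var nextClick = parseInt(GM_getValue("nextClick", "0"), 10);
var now = Date.now();
if (!nextClick || nextClick < now) {
    nextClick = now + INTERVAL;
}

// Store the time of the click after this one before the page reloads
// (stored as a string because old GM_setValue chokes on large integers).
GM_setValue("nextClick", String(nextClick + INTERVAL));

setTimeout(function () {
    var bidBtn = document.getElementById("ctl00_mainContentPlaceholder_AirtimeAuctionItem7_btn_bidNow");
    if (bidBtn) {
        bidBtn.click();   // triggers the reload; the script runs again on the new page
    }
}, nextClick - now);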
GM.setValue will set a value indefinitely and is scoped to the script, but will work if your script runs across multiple domains.
window.localStorage will set a value indefinitely and is scoped to the domain of the page, so will not work across domains, but will work if you need several GreaseMonkey scripts to access the same value.
window.sessionStorage will set a value only while the window or tab is open and is scoped to only that window or tab for that domain.
document.cookie can set a value indefinitely or only while the browser is open, and can be scoped across subdomains, or a single domain, or a path, or a single page.
Those are the main client-side mechanisms for storing values across page loads that are intended for this purpose. However, there is another method which is sometimes possible (if the page itself is not using it) and can also be quite useful: window.name.
window.name is scoped to the window or tab, but will work across domains too. If you need to store several values, then they can be put into an object and you can store the object's JSON string. E.g. window.name = JSON.stringify(obj)
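A small sketch of that round trip (the state fields are just illustrative, matching the variables from the question):
var state = { numBidClicks: 0, nextClick: Date.now() + 57000 };
// Restore whatever a previous page load stored there.
try {
    if (window.name) state = JSON.parse(window.name);
} catch (e) { /* window.name held something else; keep the defaults */ }

state.numBidClicks++;
window.name = JSON.stringify(state);   // survives the reload triggered by the click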