Do vars.putObject and vars.getObject consume extra memory in JMeter?

I have a test plan that looks like this:

Test Plan
  JSR223 Sampler
  {
      def lst = [100 elements];
      vars.putObject("lst", lst);
  }
  Loop Controller (100 times)
  {
      HTTP Request
      PreProcessor
      {
          lst = vars.getObject("lst");
      }
  }
Q1) Does the lst in the PreProcessor use the same memory as the lst in the JSR223 Sampler, or does it allocate new memory of its own?
Q2) Does the memory for lst in the PreProcessor get cleared on every iteration, or is new memory allocated for each iteration?

In your setup you always refer to the same object, which lives in the JMeterVariables class instance; it neither allocates a new portion of memory nor frees it during new iterations.
However, be aware that each JMeter thread (virtual user) will have this object in its local storage, so for 1 thread you will have 1 instance, for 2 threads - 2 instances.
So if you have > 1 thread and want to share the same object across all threads, it's better to use props instead of vars, as per the documentation:
Properties are not the same as variables. Variables are local to a thread; properties are common to all threads
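For example, a minimal Groovy sketch (the property name 'lst' is just illustrative; props is the standard JSR223 binding for JMeter properties):

// Publish once, e.g. from a setUp Thread Group or a once-only JSR223 Sampler
props.put('lst', lst)

// Read the shared instance from any thread's JSR223 element
def lst = props.get('lst')

Note that props is shared by all threads, so mutating the object concurrently requires your own synchronization.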
If you want to clear the object manually, use the vars.remove() function where needed, like:
vars.remove('lst')
To reduce memory consumption, you might consider putting your objects into a CSV file and going for the CSV Data Set Config, which doesn't load the full file into memory and has flexible options for sharing the values across threads.

Python multiprocessing, logging to different files

I would like to run code in n processes and have the logs from each process in a separate file.
I tried, naively, something like this:
from multiprocessing import Process
import logging

class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        self.logger = logging.getLogger(logger_name)
        self.log_file = log_file
        self.logger.addHandler(logging.FileHandler(log_file))
        print("from init", self.logger, self.logger.handlers)

    def run(self) -> None:
        print("from run", self.logger, self.logger.handlers)

if __name__ == '__main__':
    p1 = Worker("l1", "log1")
    p1.start()
(tried in Python 3.9 and 3.11)
but for some reason, the handler is gone. This is the output:
from init <Logger l1 (WARNING)> [<FileHandler log1 (NOTSET)>]
from run <Logger l1 (WARNING)> []
Why is the FileHandler gone? Should I call addHandler within the run method instead -- is that the correct way?
I was trying to use this answer but couldn't make it really work.
For the moment, I solved it by defining the handlers in run, but it seems like a dirty hack to me...
UPDATE: This happens on my MacBook Python installations. On a Linux server, I couldn't reproduce this. Very confusing.
In either case, the question is probably:
"Is this the correct way to log to files, with several copies of one
process?"
I found the reason for the observed behavior. It has to do with pickling of objects when they are transferred between Processes.
In the standard library's implementation of Logger, a __reduce__ method is defined. This method is used in cases where an object cannot be reliably pickled. Instead of trying to pickle the object itself, the pickle protocol uses the value returned by __reduce__. In the case of Logger, __reduce__ returns a function name (getLogger) and a string (the name of the Logger being pickled) to be used as its argument. During unpickling, the protocol makes a function call (logging.getLogger(name)); the result of that call becomes the unpickled Logger instance.
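You can see the __reduce__ mechanism for yourself (a quick check, not part of the original example):

import logging

logger = logging.getLogger("l1")
print(logger.__reduce__())
# (<function getLogger at 0x...>, ('l1',)) -- only the name survives pickling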
The original Logger and the unpickled Logger will have the same name, but perhaps not much else in common. The unpickled Logger will have the default configuration, whereas the original Logger will have any customization you may have performed.
In Python, Process objects do not share an address space (at least not with the "spawn" start method, which is the default on Windows and macOS). When a new Process is launched, its instance variables must somehow be "transferred" from one Process to another. This is done by pickling/unpickling. In the example code, the instance variables declared in the Worker.__init__ function do indeed appear in the new Process, as you can verify by printing them in Worker.run. But under the hood Python has actually pickled and unpickled all of the instance variables, to make it look like they have magically migrated to the new Process. In the vast majority of cases this works just fine. But not necessarily if one of those instance variables defines a __reduce__ method.
A logging.FileHandler cannot, I suspect, be pickled, since it uses operating system resources (a file). This is probably the reason (or at least one of the reasons) why Logger objects can't be pickled as-is.
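One way around this, as the question already hints at, is to pass only picklable data to the child and build the logger and its handler inside run(), so the FileHandler is created in the child process and never pickled. A minimal sketch:

from multiprocessing import Process
import logging

class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        # Plain strings pickle fine across the process boundary.
        self.logger_name = logger_name
        self.log_file = log_file

    def run(self) -> None:
        # Construct the logger and handler in the child process.
        logger = logging.getLogger(self.logger_name)
        logger.addHandler(logging.FileHandler(self.log_file))
        logger.warning("hello from %s", self.logger_name)

if __name__ == '__main__':
    p1 = Worker("l1", "log1")
    p1.start()
    p1.join()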

Lucene index files changing constantly even when no add, update, or delete operations are performed

I have noticed that my Lucene index segment files (file names) are constantly changing, even when I am not performing any add, update, or delete operations. The only operations I am performing are reading and searching. So, my question is: do Lucene index segment files get updated internally somehow just from reading and searching operations?
I am using Lucene.Net v4.8 beta, if that matters. Thanks!
Here is an example of how I found this issue (I wanted to get the index size). Assuming a Lucene index already exists, I used the following code to get the index size:
Example:
private long GetIndexSize()
{
    var reader = GetDirectoryReader("validPath");
    long size = 0;
    foreach (var fileName in reader.Directory.ListAll())
    {
        size += reader.Directory.FileLength(fileName);
    }
    return size;
}

private DirectoryReader GetDirectoryReader(string path)
{
    var directory = FSDirectory.Open(path);
    var reader = DirectoryReader.Open(directory);
    return reader;
}
The above method is called every 5 minutes. It works fine ~98% of the time. However, the other 2% of the time, I get a file-not-found error in the foreach loop, and after debugging, I saw that the number of files in reader.Directory was changing. The index is updated at certain times by another service, but I can assure you that no updates were made to the index anywhere near the times when this error occurs.
Since you have multiple processes writing/reading the same set of files, it is difficult to isolate what is happening. Lucene.NET does locking and exception handling to ensure operations can be synced up between processes, but if you read the files in the directory directly without doing any locking, you need to be prepared to deal with IOExceptions.
The solution depends on how up to date you need the index size to be:
If it is okay to be a bit out of date, I would suggest using DirectoryInfo.EnumerateFiles on the directory itself. This may be a bit more up to date than Directory.ListAll() because that method stores the file names in an array, which may go stale before the loop is done. But, you still need to catch FileNotFoundException and ignore it and possibly deal with other IOExceptions.
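A minimal sketch of that approach (the method name is illustrative; assumes a using System.IO; directive):

private long GetIndexSizeApproximate(string path)
{
    long size = 0;
    foreach (var file in new DirectoryInfo(path).EnumerateFiles())
    {
        try
        {
            // If the FileInfo's state is stale or re-read, a concurrently
            // deleted file can surface as FileNotFoundException.
            size += file.Length;
        }
        catch (FileNotFoundException)
        {
            // The file was removed by a concurrent merge; skip it.
        }
        catch (IOException)
        {
            // Other transient IO failures: log, retry, or ignore as appropriate.
        }
    }
    return size;
}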
If you need the size to be absolutely up to date and plan to do an operation that requires the index to be that size, you need to open a write lock to prevent the files from changing while you get the value.
private long GetIndexSize()
{
    // DirectoryReader is superfluous for this example. Also,
    // using an MMapDirectory (which DirectoryReader.Open() may return)
    // will use more RAM than simply using SimpleFSDirectory.
    var directory = new SimpleFSDirectory("validPath");
    long size = 0;

    // NOTE: The lock will stay active until this is disposed,
    // so if you have any follow-on actions to perform, the lock
    // should be obtained before calling this method and disposed
    // after you have completed all of your operations.
    using Lock writeLock = directory.MakeLock(IndexWriter.WRITE_LOCK_NAME);

    // Obtain exclusive write access to the directory
    if (!writeLock.Obtain(/* optional timeout */))
    {
        // timeout failed, either throw an exception or retry...
    }

    foreach (var fileName in directory.ListAll())
    {
        size += directory.FileLength(fileName);
    }
    return size;
}
Of course, if you go that route, your IndexWriter may throw a LockObtainFailedException, and you should be prepared to handle it during the write process.
However you deal with it, you need to be catching and handling exceptions because IO by its nature has many things that can go wrong. But exactly how you deal with it depends on what your priorities are.
Original Answer
If you have an IndexWriter instance open, Lucene.NET will run a background process to merge segments based on the MergePolicy being used. The default settings can be used with most applications.
However, the settings are configurable through the IndexWriterConfig.MergePolicy property. By default, it uses the TieredMergePolicy.
var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    MergePolicy = new TieredMergePolicy()
};
There are several properties on TieredMergePolicy that can be used to change the thresholds that it uses to merge.
Or, it can be changed to a different MergePolicy implementation. Lucene.NET comes with:
LogByteSizeMergePolicy
LogDocMergePolicy
NoMergePolicy
TieredMergePolicy
UpgradeIndexMergePolicy
SortingMergePolicy
The NoMergePolicy class can be used to disable merging entirely.
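For example (an assumption on my part: that Lucene.NET mirrors the Java 4.8 API here, where NoMergePolicy has no public constructor and is used through its static instances):

var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    // COMPOUND_FILES or NO_COMPOUND_FILES, to match your index format
    MergePolicy = NoMergePolicy.NO_COMPOUND_FILES
};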
If your application never needs to add documents to the index (for example, if the index is built as part of the application deployment), it is also possible to use an IndexReader from a Directory instance directly, which does not do any background segment merges.
The merge scheduler can also be swapped and/or configured using the IndexWriterConfig.MergeScheduler property. By default, it uses the ConcurrentMergeScheduler.
var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    MergePolicy = new TieredMergePolicy(),
    MergeScheduler = new ConcurrentMergeScheduler()
};
The merge schedulers that are included with Lucene.NET 4.8.0 are:
ConcurrentMergeScheduler
NoMergeScheduler
SerialMergeScheduler
The NoMergeScheduler class can be used to disable merging entirely. This has the same effect as using NoMergePolicy, but also prevents any scheduling code from being executed.

JavaCPP Leptonica : How to clear memory of pixClone handles

Until now, I've always used pixDestroy to clean up PIX objects in my JavaCPP/Leptonica application. However, I recently noticed a weird memory leak that I tracked down to a Leptonica function internally returning a pixClone result. I managed to reproduce the issue with the following simple test:
@Test
public void test() throws InterruptedException {
    String pathImg = "...";
    for (int i = 0; i < 100; i++) {
        PIX img = pixRead(pathImg);
        PIX clone = pixClone(img);
        pixDestroy(clone);
        pixDestroy(img);
    }
    Thread.sleep(10000);
}
When the Thread.sleep is reached, the RAM usage shown in the Windows Task Manager (not the heap size) has increased to about 1 GB, and it is not released until the sleep ends and the test finishes.
Looking at the docs of pixClone, we see it actually creates a handle to the existing PIX:
Notes:
A "clone" is simply a handle (ptr) to an existing pix. It is implemented because (a) images can be large and hence expensive to copy, and (b) extra handles to a data structure need to be made with a simple policy to avoid both double frees and memory leaks. Pix are reference counted. The side effect of pixClone() is an increase by 1 in the ref count.
The protocol to be used is: (a) Whenever you want a new handle to an existing image, call pixClone(), which just bumps a ref count. (b) Always call pixDestroy() on all handles. This decrements the ref count, nulls the handle, and only destroys the pix when pixDestroy() has been called on all handles.
If I understand this correctly, I am indeed calling pixDestroy on all handles, so the ref count should reach zero and thus the PIX should have been destroyed. Clearly, this is not the case though. Can someone tell me what I'm doing wrong? Thanks in advance!
As an optimization for the common case where a function returns a pointer it received as an argument, JavaCPP returns the same object to the JVM. This is what is happening with pixClone(): it simply returns the pointer that the user passed as an argument, and thus both img and clone end up referencing the same object in Java.
Now, when pixDestroy() gets called on the first reference, img, Leptonica helpfully resets its address to 0, but we have now lost the address, and the second call to pixDestroy() receives that null pointer, resulting in a no-op and a memory leak.
One easy way to avoid this issue is to explicitly create a new PIX reference after each call to pixClone(), for example, in this case:
PIX clone = new PIX(pixClone(img));
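Applied to the test above, the loop then follows the Leptonica protocol as intended (a sketch of the corrected test body):

for (int i = 0; i < 100; i++) {
    PIX img = pixRead(pathImg);
    PIX clone = new PIX(pixClone(img)); // a separate JVM object with its own address
    pixDestroy(clone); // decrements the ref count to 1
    pixDestroy(img);   // decrements the ref count to 0 and frees the pixels
}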

What does it mean to "finalize" in Julia?

I am currently working with the CUDArt package. The GitHub documentation includes the following snippet of code when loading a ptx module containing a custom CUDA C kernel:
md = CuModule("mycudamodule.ptx", false) # false means it will not be automatically finalized
(comment in original)
I am trying to understand what exactly this false option for finalizing means and when I would or would not want to use it. I came across this post on SO (What is the right way to write a module finalize method in Julia?), which quotes the Julia documentation:
finalizer(x, function)
Register a function f(x) to be called when there are no program-accessible references to x. The behavior of this function is unpredictable if x is of a bits type.
I don't really understand what this means, though, or even whether the finalizing here is the same as that referred to in the CUDArt example. For example, it doesn't make sense to me to call a function on an argument x when that argument isn't accessible to the program -- how could this even be possible? Thus, I would appreciate any help in clarifying:
What it means to "finalize" in Julia and
When I would/would not want to use it in the context of importing .ptx modules with CUDArt
I can't speak for CUDArt, but here is what finalize means in Julia: when the garbage collector detects that the program can no longer access the object, then it will run the finalizer, and then collect (free) the object. Note that the garbage collector can still access the object, even though the program cannot.
Here is an example:
julia> type X
           a
       end

julia> j = X(1)               # create a new X(1) object, accessible as j

julia> finalizer(j, println)  # print the object when it's finalized

julia> gc()                   # suggest garbage collection; nothing happens

julia> j = 0                  # the original object is no longer accessible

julia> gc()                   # suggest garbage collection
X(1)                          # object was collected... and finalizer was run
This is useful so that external resources (such as file handles or malloced memory) are freed if an object is collected.
I cannot comment, but I would like to add the following from the docs (note that newer versions of Julia reverse the argument order):
finalizer(f, x)
f must not cause a task switch, which excludes most I/O operations such as println. Using the @async macro (to defer context switching to outside of the finalizer) or ccall to directly invoke IO functions in C may be helpful for debugging purposes.
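For illustration, a minimal sketch of the newer API (Julia 0.7+ syntax; the struct is just an example), using @async so the finalizer itself performs no I/O:

mutable struct X
    a
end

j = X(1)
finalizer(x -> @async(println("finalized ", x.a)), j)  # note: f comes first
j = nothing
GC.gc()  # the finalizer runs; the println is deferred via @async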

Execute command block in primitive in NetLogo extension

I'm writing a primitive that takes in two agentsets and a command block. It needs to call a few functions, execute the command block in the current context, and then call another function. Here's what I have so far:
class WithContext(pushGraphContext: GraphContext => Unit,
                  popGraphContext: api.World => GraphContext)
    extends api.DefaultCommand {

  override def getSyntax = commandSyntax(
    Array(AgentsetType, AgentsetType, CommandBlockType))

  def perform(args: Array[Argument], context: Context) {
    val turtleSet = args(0).getAgentSet.requireTurtleSet
    val linkSet = args(1).getAgentSet.requireLinkSet
    val world = linkSet.world
    val gc = new GraphContext(world, turtleSet, linkSet)
    val extContext = context.asInstanceOf[ExtensionContext]
    val nvmContext = extContext.nvmContext
    pushGraphContext(gc)
    // execute command block here
    popGraphContext(world)
  }
}
I looked at some examples that used nvmContext.runExclusively, but that looked like it's specifically for having a given agentset run the command block. I want the current agent (possibly the observer) to run it. Should I wrap nvm.agent in an agentset and pass that to nvmContext.runExclusively? If so, what's the easiest way to wrap an agent in an agentset? If not, what should I do?
Method #1
The quicker-but-arguably-dirtier method is to use runExclusiveJob, as demonstrated in (e.g.) the create-red-turtles command in https://github.com/NetLogo/Sample-Scala-Extension/blob/master/src/SampleScalaExtension.scala .
To wrap the current agent in an agentset, you can use agent.AgentSetBuilder. (You could also pass an Array[Agent] of length 1 to one of the ArrayAgentSet constructors, but I'd recommend AgentSetBuilder since it's less reliant on internal implementation details which are likely to change.)
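Roughly, method #1 might look like this inside perform (a sketch only: the AgentSetBuilder constructor and the job-running call are assumptions to be checked against the NetLogo version you target and the sample extension linked above):

// Hypothetical: build a one-agent agentset around the current agent...
val builder = new agent.AgentSetBuilder(nvmContext.agent.kind, 1)
builder.add(nvmContext.agent)
val currentAgentSet = builder.build()
// ...then run the command block for that agentset with runExclusiveJob,
// as the create-red-turtles example in the sample extension does.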
Method #2
The disadvantage of method #1 is the slight constant overhead associated with creating and setting up the extra AgentSet, Job, and Context objects and directing execution through them.
Creating and running a separate job isn't actually how built-in commands like if and while work. Instead of making a new job, they remain in the current job and cause commands in a command block to run (or not run) by manipulating the instruction pointer (nvm.Context.ip) to jump into them or skip over them.
I believe an extension command could do the same. I'm not certain if it has been tried before, but I can't see any reason it wouldn't work.
Doing it this way would involve understanding more about NetLogo engine internals, as documented at https://github.com/NetLogo/NetLogo/wiki/Engine-architecture . You'd model your primitive after e.g. https://github.com/NetLogo/NetLogo/blob/5.0.x/src/main/org/nlogo/prim/etc/_if.java , including altering your implementation of nvm.CustomAssembled. (Note that prim._extern, which runs extension commands, delegates its assemble method to the wrapped command's own assemble method, so this should work.) In your assemble method, instead of calling done() at the end to terminate the job, you'd just allow execution to fall through.
I could try to construct an example that works this way, but it'd take me a couple hours; it's probably not worth me doing unless there's a real need.