I use Gurobi to solve an LP problem. I chose the barrier method and disabled crossover, since I am satisfied with the sub-optimal barrier solution and crossover takes forever.
However, after the sub-optimal termination the console keeps running (with huge memory use). The solve itself takes 0.5 hours, but then I have to wait for hours until I can continue.
I used the Python command line and Spyder.
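For reference, this is roughly the setup, sketched here with Gurobi's Java API (I actually used the Python interface, where the equivalent is model.Params.Method = 2 and model.Params.Crossover = 0; the model file name below is just a placeholder and the package name varies by Gurobi version):

import gurobi.*;   // package name varies by Gurobi version

public class BarrierNoCrossover {
    public static void main(String[] args) throws GRBException {
        GRBEnv env = new GRBEnv();
        GRBModel model = new GRBModel(env, "my_lp_model.lp");  // placeholder file name

        // Use the barrier algorithm (Method = 2) and disable crossover (Crossover = 0),
        // accepting the interior-point (possibly sub-optimal) solution as-is.
        model.set(GRB.IntParam.Method, 2);
        model.set(GRB.IntParam.Crossover, 0);

        model.optimize();
        System.out.println("Objective: " + model.get(GRB.DoubleAttr.ObjVal));

        model.dispose();
        env.dispose();
    }
}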
Here is a summary of the (modified) log:
Barrier performed 500 iterations in 2000 seconds
Sub-optimal termination - objective XX
(here it takes hours)
warning: a sub-optimal solution is available
(here it takes hours)
The complete log is here: https://drive.google.com/open?id=1q2Z6QJNmXTWSRsJqxtEpUlCwhXnLIhFf
I expected the console to return the results immediately after termination. The solving is fast, but something else takes hours.
Is there anything wrong? What can I do to make it faster?
Recently I started using PITest for mutation testing. After building my project with Maven, when I run the command mvn org.pitest:pitest-maven:mutationCoverage I get this error a bunch of times:
-stderr : objc[2787]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be ustderr : sed. Which one is undefined.
Sometimes the error is followed by
PIT >> WARNING : Slave exited abnormally due to MEMORY_ERROR
or PIT >> WARNING : Slave exited abnormally due to TIMED_OUT
I use OS X version 10.10.4 and Java 8 (jdk1.8.0_74).
Any fix/workaround for this?
Don't worry about this:
-stderr : objc[2787]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be ustderr : sed. Which one is undefined.
This is just informational: there are two implementations of JavaLaunchHelper, and the message (garbled here by the stderr prefix) tells you that one of the two will be used, but which one is undefined. It is a known issue; see also this question.
The other two are a result of what PIT is doing: it modifies the byte code, and that may not just affect the output of an operation (which a test would detect) but also the runtime behavior - for example, if the boundaries of a loop are changed such that the loop runs endlessly. PIT is capable of detecting this and prints an error. Mutations detected through a memory error or a timeout error can be considered "killed", but you should check each of them individually, as they could also be false positives.
PIT >> WARNING : Slave exited abnormally due to MEMORY_ERROR
means the mutated code produces more or larger objects, so the forked JVM runs out of memory. Imagine a loop like this:
// assumes java.util.List / java.util.ArrayList
List<Object> list = new ArrayList<>();
int a = 0, b = 10;
while (a < b) {
    list.add(new Object());
    a++;
}
Now suppose the a++ gets mutated to a--. The loop may eventually end (once a wraps around), but it is much more likely that you run out of memory before that.
From the documentation
A memory error might occur as a result of a mutation that increases the amount of memory used by the system, or may be the result of the additional memory overhead required to repeatedly run your tests in the presence of mutations. If you see a large number of memory errors consider configuring more heap and permgen space for the tests.
The timeout issue is similar: the reason could be either that the mutated code really runs an infinite loop, or that the system merely thinks it runs an infinite loop, i.e. when the system is too slow to execute the altered code in time. If you experience a lot of timeouts you should consider increasing the timeout value, but be careful, as this may impact the overall execution time.
From the FAQ
Timeouts when running mutation tests are caused by one of two things:
1. A mutation that causes an infinite loop
2. PIT thinking an infinite loop has occurred but being wrong
In order to detect infinite loops PIT measures the normal execution time of each test without any mutations present. When the test is run in the presence of a mutation PIT checks that the test doesn’t run for any longer than
normal time * x + y
Unfortunately the real world is more complex than this.
Test times can vary due to the order in which the tests are run. The first test in a class may have an execution time much higher than the others, as the JVM will need to load the classes required for that test. This can be particularly pronounced in code that uses XML binding frameworks such as JAXB, where classloading may take several seconds.
When PIT runs the tests against a mutation, the order of the tests will be different. Tests that previously took milliseconds may now take seconds as they carry the overhead of classloading. PIT may therefore incorrectly flag the mutation as causing an infinite loop.
A fix for this issue may be developed in a future version of PIT. In the meantime, if you encounter a large number of timeouts, try increasing y in the equation above to a large value with --timeoutConst (timeoutConstant in Maven).
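For the Maven plugin, both knobs mentioned above can be set in the plugin configuration. A minimal sketch, assuming the pitest-maven plugin (the version and values are arbitrary examples; x in the formula above corresponds to timeoutFactor and y to timeoutConstant):

<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>1.1.10</version>
    <configuration>
        <!-- y in the formula above: extra milliseconds allowed per test -->
        <timeoutConstant>10000</timeoutConstant>
        <!-- more heap for the forked JVMs that run the mutated tests -->
        <jvmArgs>
            <value>-Xmx2048m</value>
        </jvmArgs>
    </configuration>
</plugin>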
I'm doing some work on profiling the behavior of programs. One thing I would like to do is get the amount of time that a process has run on the CPU. I am accomplishing this by reading the sum_exec_runtime field in the Linux kernel's sched_entity data structure.
After testing this with some fairly simple programs which simply execute a loop and then exit, I am running into a peculiar issue: the program does not finish with the same runtime each time it is executed. Since sum_exec_runtime is a value in nanoseconds, I would expect the value to differ by a few microseconds, but I am seeing variations of several milliseconds.
My initial reaction was that this could be due to I/O waiting times; however, it is my understanding that the process should give up the CPU while waiting for I/O. Furthermore, my test programs simply execute loops, so there should be little to no I/O.
I am seeking any advice on the following:
Is sum_exec_runtime not the actual time that a process has had control of the CPU?
Does the process not actually give up the CPU while waiting for I/O?
Are there other factors that could affect the actual runtime of a process (besides I/O)?
Keep in mind, I am only trying to find the actual time that the process spent executing on the CPU. I do not care about the total execution time including sleeping or waiting to run.
Edit: I also want to make clear that there are no branches in my test program aside from the loop, which simply loops for a constant number of iterations.
Thanks.
Your question is really broad, but you can incur context switches for various reasons. Calling most system calls involves at least one context switch. Page faults cause context switches. Exceeding your time slice causes a context switch.
sum_exec_runtime is equal to utime + stime from /proc/$PID/stat, except that sum_exec_runtime is measured in nanoseconds while utime and stime are in clock ticks. It sounds like you only care about utime, which is the time your process has been scheduled in user mode. See proc(5) for more details.
You can also look at nr_switches, both voluntary and involuntary, which are reported alongside the sched_entity statistics. That will probably account for most of the variation, but I would not expect successive runs to be identical: the exact time you get for each run will be affected by all of the other processes running on the system.
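As a concrete illustration, here is a minimal sketch (Java 11+) that reads those counters for a given PID. The /proc field positions follow proc(5); the clock-tick rate of 100 is an assumption (check getconf CLK_TCK), and /proc/<pid>/sched is only present on kernels with scheduler statistics/debugging enabled:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ProcCpuTime {
    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";

        // /proc/<pid>/stat: utime and stime are fields 14 and 15 (see proc(5)),
        // both in clock ticks. Split after the last ')' because the command
        // name (field 2) may itself contain spaces or parentheses.
        String stat = Files.readString(Path.of("/proc/" + pid + "/stat"));
        String[] f = stat.substring(stat.lastIndexOf(')') + 2).split("\\s+");
        long utime = Long.parseLong(f[11]);   // field 14: user-mode CPU time
        long stime = Long.parseLong(f[12]);   // field 15: kernel-mode CPU time
        long clkTck = 100;                    // assumed USER_HZ; verify with `getconf CLK_TCK`
        System.out.printf("user %.2fs, system %.2fs%n",
                (double) utime / clkTck, (double) stime / clkTck);

        // /proc/<pid>/sched exposes se.sum_exec_runtime plus the
        // context-switch counters mentioned above.
        for (String line : Files.readAllLines(Path.of("/proc/" + pid + "/sched"))) {
            if (line.startsWith("se.sum_exec_runtime")
                    || line.startsWith("nr_switches")
                    || line.startsWith("nr_voluntary_switches")
                    || line.startsWith("nr_involuntary_switches")) {
                System.out.println(line.trim());
            }
        }
    }
}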
You'll also be affected by the amount of file system cache used on your system and how many file system cache hits you get in successive runs if you are doing any IO at all.
To give a very concrete and obvious example of how other processes can affect the run time of the current process, think about what happens if you exceed your physical RAM constraints. If your program asks for more RAM, the kernel will spend more time swapping. That time spent swapping is accounted for in stime, but will vary depending on how much RAM you need and how much RAM is available. There are lots of other ways that other processes can affect your process's run time; this is just one example.
To answer your 3 points:
sum_exec_runtime is the actual time the scheduler ran the process, including system time.
If you count switching into the kernel as the process giving up the CPU, then yes; but it does not necessarily mean a different user process gets the CPU - your process may get it back once the kernel is done.
I think I've already answered this above: there are lots of factors.
I have written a simple SUDOKU solver. To roughly test the performance I'm using simple System.currentTimeMillis calls.
I have prepared a set of initial Sudoku configurations in a text file. The program reads the file and solves each configuration. When running the tests I noticed that the first 3-4 solve runs are much slower than the rest - and by slower I mean by an order of magnitude.
Here is a simplified snippet:
// requires java.io.BufferedReader, java.io.FileReader, java.io.IOException
public static void main(String[] args) throws IOException {
    try (BufferedReader file = new BufferedReader(new FileReader("configurations.txt"))) { // file name illustrative
        String configuration;
        while ((configuration = file.readLine()) != null) {
            Solver s = new Solver(configuration);
            long now1 = System.currentTimeMillis();
            s.solve();
            long now2 = System.currentTimeMillis();
            System.out.println(now2 - now1);
        }
    }
}
I measure only the solve() method, so I/O is not a problem; I even hardcoded some data into the program, and the first few runs were still slower. The difficulty of the puzzles is not an issue either: I have tried different permutations and difficulties of the configurations, and it is always the same - the first few are slower.
My question is - why is that and is there a way to prevent it?
This is supposed to happen. The JIT compiler optimizes code that gets called more often as your program runs for longer.
This only reflects the general fact that the technique you're using to test performance simply isn't reliable in Java.
In practice, methods are not JIT-compiled the first time the JVM calls them. For each method, the JVM maintains a call count, which is incremented every time the method is called. The JVM interprets a method until its call count exceeds the JIT compilation threshold; therefore, frequently used methods are compiled soon after the JVM has started, while less-used methods are compiled much later, or not at all. This JIT compilation threshold helps the JVM start quickly.
So the busiest methods of a Java program are always optimized most aggressively, which increases their execution speed each time they are called.
Here is the source for the above information.
In performance-testing engagements we would always run the system under test for a while to let it reach a steady state, and only then start collecting performance metrics. You might try the same: run the solve() method a number of times before capturing your metrics.
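For example, a warm-up pass before the timed runs could look like this (a minimal sketch: Solver and the configuration strings are the ones from the question, configurations is assumed to hold the lines read from the file, and the warm-up count is an arbitrary choice):

// Warm up the JIT: run solve() repeatedly on a throwaway configuration so the
// hot methods get compiled before any timing is recorded.
String warmup = configurations.get(0);
for (int i = 0; i < 1000; i++) {
    new Solver(warmup).solve();
}

// Now measure: these runs should no longer pay interpretation or compilation costs.
for (String configuration : configurations) {
    Solver s = new Solver(configuration);
    long start = System.currentTimeMillis();
    s.solve();
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(elapsed);
}

For short intervals, System.nanoTime() is also a better fit than System.currentTimeMillis(), since it is meant for measuring elapsed time rather than wall-clock time.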
I have some CUDA code running FFTs and other math operations, which works on blocks of size 2^n as requested by the user. The code works well when first run, but after running long enough it starts to fail. Eventually it gets to the point where, if I run any block size larger than 2^11, I get no data back (all zeros). I've done some testing by modifying the kernel code, and from what I can tell the kernel is not executing. I'm trying to figure out why my code stops producing data after multiple iterations on large block sizes.
The issue looks at first glance like a memory leak. I know I have to run multiple iterations of the processing to cause an error. At first only large block sizes stop working, but as I run more iterations smaller block sizes start to fail as well. The reason I'm not certain the issue is memory is that my code works for block sizes below 2^11 regardless of how many iterations I run. If this were a simple memory leak, I would expect the symptoms to get progressively worse until I couldn't access any memory on the card.
I've also noticed that larger block sizes (roughly equivalent to the amount of memory each thread uses) tend to cause the program to fail sooner. Increasing the number of blocks processed (i.e. the number of CUDA threads) doesn't seem to have an effect on when the code starts to fail.
As far as I can tell, no error code is being returned; the kernel simply doesn't appear to execute at all.
Can anyone suggest what may be causing this issue? I would settle for any insight into how to debug code running on the GPU or how to monitor GPU memory availability.
If you need more computation done, bump up your grid size rather than your thread block size. To quote the CUDA Programming Guide 3.0, p. 8: "On current GPUs, a thread block may contain up to 512 threads."
This means that blockDim.x * blockDim.y * blockDim.z <= 512 at all times. If you maintain that invariant, do things work?
We are facing an issue with VB.NET listeners that use high CPU (50% to 70%) on the server machine where they run. The listeners use multiple threads, and we also use the FileSystemWatcher class to monitor file renames in one common location. Both are console applications running as scheduled jobs all day long.
How can I control the CPU utilization with this FileSystemWatcher class?
This could all depend on the code you are running.
For instance, if you have a timer with an interval of 10 ms but only do real work every two minutes, and on each timer tick you do a lot of checking, you will use a lot of CPU doing nothing.
If you are using multiple threads and one is looping while waiting for another to release a lock (Monitor.TryEnter()), then again this may be using extra CPU. You can avoid this by putting the waiting thread into Monitor.Wait() and having the busy thread call Monitor.Pulse() when it is finished.
Apart from the very general advice above, if you post the key parts of your code or your profiling results, we may be able to help more.
If you are looking for a profiler, we use Red Gate's ANTS Profiler (paid, but with a free trial) and it gives good results. I haven't used any others to compare (and I am in no way affiliated with Red Gate), so others may be better.