JVM C1 and C2 Compile Time Benchmark

I have a simple hello-world program and run it with -XX:TieredStopAtLevel set to each value from 0 to 4. I understand the basic difference: depending on the level, the JVM uses only the interpreter, C1, C2, or both C1 and C2 to compile code.
I'd like to see real benchmark figures for compile time, and any other details, at the different levels.

To benchmark JIT compilers, use -Xcomp to force compilation of all executed methods, and check CompilationMXBean.getTotalCompilationTime to find the total time spent in JIT compilation.
Example
import java.lang.management.ManagementFactory;
public class CompilationTime {
public static void main(String[] args) throws Exception {
System.out.println(ManagementFactory.getCompilationMXBean().getTotalCompilationTime());
}
}
C1
$ java -Xcomp -XX:TieredStopAtLevel=1 CompilationTime
162 // milliseconds
C2
$ java -Xcomp -XX:-TieredCompilation CompilationTime
1129 // milliseconds
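To see how much JIT time a particular piece of code costs under each flag setting, one option is to read the same MXBean before and after running it. This is only a minimal sketch: the class name CompilationTimeProbe and the workload method are made up for illustration and are not from the answer above. It uses only CompilationMXBean calls from java.lang.management, and returns -1 when the JVM has no JIT (for example under -Xint) or cannot report compilation time.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class CompilationTimeProbe {

    // Hypothetical workload: it exists only to give the JIT something hot to compile.
    static long workload(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += Integer.bitCount(i) * 31L + (i % 7);
        }
        return sum;
    }

    // Returns the JIT time (ms) accumulated while the workload ran,
    // or -1 if the JVM has no JIT or cannot report compilation time.
    static long measureCompilationMillis(int iterations) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        long before = jit.getTotalCompilationTime();
        long result = workload(iterations);
        long after = jit.getTotalCompilationTime();
        // Print the result so the workload cannot be optimized away entirely.
        System.out.println("workload result: " + result);
        return after - before;
    }

    public static void main(String[] args) {
        long millis = measureCompilationMillis(50_000_000);
        System.out.println("JIT time during workload: " + millis + " ms");
    }
}
```

Run it with the same flags as above (for example -XX:TieredStopAtLevel=1 versus -XX:-TieredCompilation) and compare the numbers. Without -Xcomp, compilation normally happens on background threads, so the difference is approximate and will vary by machine and JDK version.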

Related

Is there a way to forward declare tables to remove cyclic dependency in flatbuffer schema?

I don't understand how to forward_declare tables in flat buffer schema.
//in c1.fbs
include "c2.fbs"
table C1
{
c2 : C2;
}
//in c2.fbs
include "c1.fbs"
table C2
{
c1: C1;
}
Problems:
Above schema compiles fine with flatc (1.8.0), but causes a cyclic dependency in generated cpp headers! Shouldn't flatc complain too?
How to forward declare C2 in c1.fbs, and remove the call: include "c2.fbs"
PS:
More specifically, I stumbled on this issue while trying to mimic the following class structure in an fbs schema.
union Cs {C2, C3};
class C1
{
Cs x;
}
class C2 : public C1
{
List <C1> y;
}
class C3 : public C1
{
}
Please help.

There are no forward declaration statements in the FlatBuffers schema language; FlatBuffers supports cycles automatically. What you're seeing must be a bug in the C++ generator specifically (not sure why, as it does pre-declare everything), so please submit an issue on the FlatBuffers GitHub site.
A workaround may be to put both tables in a single file for now.

How do I invoke a Java method from perl6

use java::util::zip::CRC32:from<java>;
my $crc = CRC32.new();
for 'Hello, Java'.encode('utf-8') {
$crc.'method/update/(B)V'($_);
}
say $crc.getValue();
Sadly, this does not work:
Method 'method/update/(B)V' not found for invocant of class 'java.util.zip.CRC32'
This code is available at the following links. It is the only example I've been able to find
Rakudo Perl 6 on the JVM (slides)
Perl 6 Advent Calendar: Day 03 – Rakudo Perl 6 on the JVM

Final answer
Combining the code cleanups explained in the "Your answer cleaned up" section below with Pepe Schwarz's improvements mentioned in the "Expectation alert" section below, we get:
use java::util::zip::CRC32:from<Java>;
my $crc = CRC32.new();
for 'Hello, Java'.encode('utf-8').list {
$crc.update($_);
}
say $crc.getValue();
Your answer cleaned up
use v6;
use java::util::zip::CRC32:from<Java>;
my $crc = CRC32.new();
for 'Hello, Java'.encode('utf-8').list { # Appended `.list`
$crc.'method/update/(I)V'($_);
}
say $crc.getValue();
One important changed bit is the appended .list.
The 'Hello, Java'.encode('utf-8') fragment returns an object, a utf8. That object returns just one value (itself) to the for statement. So the for iterates just once, passing the object to the code block with the update line in it.
Iterating just once could make sense if the update line was .'method/update/([B)V', which maps to a Java method that expects a buffer of 8 bit ints, which is essentially what a Perl 6 utf8 is. However, that would require some supporting Perl 6 code (presumably in the core compiler) to marshal (automagically convert) the Perl 6 utf8 into a Java byte[]. If that code ever existed or worked, it sure isn't working when I test with the latest Rakudo.
But if one appends a judicious .list as shown above and changes the code block to match, things work out.
First, the .list results in the for statement iterating over a series of integers.
Second, like you, I called the Integer arg version of the Java method (.'method/update/(I)V') instead of the original buffer arg version and the code then worked correctly. (This means that the binary representation of the unsigned 8 bit integers returned from the Perl 6 utf8 object is either already exactly what the Java method expects or is automagically marshaled for you.)
Another required change is that the from<java> needs to be from<Java> per your comment below -- thanks.
Expectation alert
As of Jan 2015:
Merely using the JVM backend for Rakudo/NQP (i.e. running pure P6 code on a JVM) still needs more hardening before it can be officially declared ready for production use. (This is in addition to the all round hardening that the entire P6 ecosystem is expected to undergo this year.) The JVM backend will hopefully get there in 2015 -- it will hopefully be part of the initial official launch of Perl 6 being ready for production use this year -- but that's going to largely depend on demand and on there being more devs using it and contributing patches.
P6 code calling Java code is an additional project. Pepe Schwarz has made great progress in the last couple months in getting up to speed, learning the codebase and landing commits. He has already implemented the obviously nicer shortname calling shown at the start of this answer and completed a lot more of the marshaling logic for converting between P6 and Java types and is actively soliciting feedback and requests for specific improvements.

The code which is responsible for this area of Java interop is found in the class org.perl6.nqp.runtime.BootJavaInterop. It suggests that the overloaded methods are identified by the string method/<name>/<descriptor>. The descriptor is computed in function org.objectweb.asm.Type#getMethodDescriptor. That jar is available through maven from http://mvnrepository.com/artifact/asm/asm.
import java.util.zip.CRC32
import org.objectweb.asm.Type
object MethodSignatures {
def printSignature(cls: Class[_], method: String, params: Class[_]): Unit = {
val m = cls.getMethod(method, params)
val d = Type.getMethodDescriptor(m)
println(m)
println(s"\t$d")
}
def main(args: Array[String]) {
val cls = classOf[CRC32]
// see https://docs.oracle.com/javase/8/docs/api/java/util/zip/CRC32.html
val ab = classOf[Array[Byte]]
val i = classOf[Int]
printSignature(cls, "update", ab)
printSignature(cls, "update", i)
}
}
This prints
public void java.util.zip.CRC32.update(byte[])
([B)V
public void java.util.zip.CRC32.update(int)
(I)V
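If pulling in the ASM jar just to look up a descriptor is inconvenient, the JDK's own java.lang.invoke.MethodType can produce the same strings. A small sketch in plain Java (the class name DescriptorDemo and its helper methods are made up for illustration):

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.zip.CRC32;

public class DescriptorDemo {

    // Builds a JVM method descriptor from a return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    // Same, but taking the types from a reflected method.
    static String descriptorOf(Method m) {
        return descriptor(m.getReturnType(), m.getParameterTypes());
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method bytes = CRC32.class.getMethod("update", byte[].class);
        Method single = CRC32.class.getMethod("update", int.class);
        System.out.println(descriptorOf(bytes));  // ([B)V
        System.out.println(descriptorOf(single)); // (I)V
    }
}
```

The two printed descriptors match the ASM output above, so either approach can be used to find the string to put after method/update/.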
Since I want to call the update(int) variant of this overloaded method, the correct method invocation (on line 5 of the example program) is
$crc.'method/update/(I)V'($_);
This crashes with
This representation can not unbox to a native int
Finally, for some reason I do not understand, changing the same line to
$crc.'method/update/(I)V'($_.Int);
fixes that and the example runs fine.
The final version of the code is
use v6;
use java::util::zip::CRC32:from<java>;
my $crc = CRC32.new();
for 'Hello, Java'.encode('utf-8') {
$crc.'method/update/(I)V'($_.Int);
}
say $crc.getValue();

I got this to work on Perl 6.c with the following modification (Jan 4, 2018):
use v6;
use java::util::zip::CRC32:from<JavaRuntime>;
my $crc = CRC32.new();
for 'Hello, Java'.encode('utf-8').list {
$crc.update($_);
}
say $crc.getValue();
Resulting in:
% perl6-j --version
This is Rakudo version 2017.12-79-g6f36b02 built on JVM
implementing Perl 6.c.
% perl6-j crcjava.p6
1072431491

Compile-time information in CUDA

I'm optimizing a very time-critical CUDA kernel. My application accepts a wide range of switches that affect the behavior (for instance, whether to use 3rd or 5th order derivative). Consider as an approximation a set of 50 switches, where every switch is an integer variable (a bool sometimes, or a float, but this case is not so relevant for this question).
All these switches are constant during the execution of the application. Most of these switches are run-time and I store them in constant memory, so as to exploit the caching mechanism. Some other switches can be compile-time, and the customer is fine with having to re-compile the application to change the value of such a switch. A very simple example could be:
__global__ void mykernel(const float* in, float *out)
{
for ( /* many many times */ )
if (compile_time_switch)
do_this(in, out);
else
do_that(in, out);
}
Assume that do_this and do_that are compute-bound and very cheap, that I optimize the for loop so that its overhead is negligible, that I have to place the if inside the iteration. If the compiler recognizes that compile_time_switch is static information it can optimize out the call to the "wrong" function and create code that is just as optimized as if the if weren't there. Now the real question:
In which ways can I provide the compiler with the static value of this switch? I see two such ways, listed below, but neither of them works for me. What other possibilities remain?
Template parameters
Providing a template parameter enables this static optimization.
template<int compile_time_switch>
__global__ void mykernel(const float* in, float *out)
{
for ( /* many many times */ )
if (compile_time_switch)
do_this(in, out);
else
do_that(in, out);
}
This simple solution does not work for me, since I don't have direct access to the code that calls the kernel.
Static members
Consider the following struct:
struct GlobalParameters
{
static const bool compile_time_switch = true;
};
Now GlobalParameters::compile_time_switch contains the static information as I want it, and the compiler would be able to optimize the kernel. Unfortunately, CUDA does not support such static members.
EDIT: the last statement is apparently wrong. The definition of the struct is of course legitimate and you are able to use the static member GlobalParameters::compile_time_switch in device code. The compiler inlines the variable, so that the final code directly contains the value, not a run-time variable access, which is the behavior you would expect from an optimizing compiler. So the second option is actually suitable.
I consider my problem solved thanks both to this fact and to kronos' answer. However, I'm still looking for alternative methods to provide compile-time information to the compiler.

Your third option is preprocessor definitions:
#define compile_time_switch 1
__global__ void mykernel(const float* in, float *out)
{
for ( /* many many times */ )
if (compile_time_switch)
do_this(in, out);
else
do_that(in, out);
}
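The preprocessor replaces compile_time_switch with the literal 1 before compilation, so the compiler sees if (1) and its dead code elimination pass can drop the else branch. If you want the preprocessor itself to discard the unused branch, so that the compiler never sees it, use #if compile_time_switch / #else / #endif in place of the plain if.
Furthermore, you can specify the definition with the -D command line switch, and (I think) any compiler supported by NVIDIA will accept -D (MSVC may use a different switch).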

overload and override which happens when : compile or runtime

Overload and Override: which one happens at compile time and which one at runtime?

It depends on which language you're using, and how you're using it.
For example, in Java overload resolution is always performed at compile time, while override resolution is performed at execution time.
In C# that's still normally true - but if you're using C# 4's dynamic typing feature, overload resolution is performed at execution time too:
static void Foo(int y) {}
static void Foo(string y) {}
...
dynamic x = 10;
Foo(x); // Calls Foo(int)
x = "hello";
Foo(x); // Calls Foo(string)
There are plenty of other languages which behave dynamically too. So you really need to learn the behaviour of the language you're using at the time.

Overload -> Compile time
Override -> Runtime
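For the Java case, a small sketch can make the difference visible. The class and method names here (DispatchDemo, Animal, Dog, describe, sound) are made up for illustration. The variable has static type Animal but holds a Dog at run time, so the overload is picked from the static type and the override from the runtime type.

```java
public class DispatchDemo {

    static class Animal {
        String sound() { return "generic"; }
    }

    static class Dog extends Animal {
        @Override
        String sound() { return "woof"; }
    }

    // Two overloads: the compiler picks one from the static type of the argument.
    static String describe(Animal a) { return "describe(Animal)"; }
    static String describe(Dog d) { return "describe(Dog)"; }

    // Overload resolution: the static type of 'pet' is Animal.
    static String overloadChosen() {
        Animal pet = new Dog();
        return describe(pet);
    }

    // Override resolution: the runtime type of 'pet' is Dog.
    static String overrideChosen() {
        Animal pet = new Dog();
        return pet.sound();
    }

    public static void main(String[] args) {
        System.out.println(overloadChosen()); // describe(Animal)
        System.out.println(overrideChosen()); // woof
    }
}
```

Changing the declaration to Dog pet = new Dog() would make the first call select describe(Dog), because the compiler then sees a different static type.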

How can I grab single key hit in D Programming Language + Tango?

I read this article and tried to do the exercises in the D programming language, but ran into a problem with the first one.
(1) Display series of numbers
(1,2,3,4, 5....etc) in an infinite
loop. The program should quit if
someone hits a specific key (Say
ESCAPE key).
Of course the infinite loop is not a big problem, but the rest is. How can I grab a key hit in D/Tango? The Tango FAQ says to use the C function kbhit() or get(), but as far as I know these are not in the C standard library, and they do not exist in the glibc that comes with the Linux machine I program on.
I know I could use a third-party library like ncurses, but it has the same problem as kbhit() or get(): it is not part of the standard library in C or D, and it is not pre-installed on Windows. What I hope is that I can do this exercise using just D/Tango and run it on both Linux and Windows machines.
How could I do it?

Here's how you do it in the D programming language:
import std.c.stdio;
import std.c.linux.termios;
void main()
{
termios ostate; /* saved tty state */
termios nstate; /* values for raw mode */
// Put the terminal in raw mode
tcgetattr(1, &ostate); /* save old state */
tcgetattr(1, &nstate); /* get base of new state */
cfmakeraw(&nstate);
tcsetattr(1, TCSADRAIN, &nstate); /* set mode */
// Read one character in raw mode
int c = fgetc(stdin);
// Close
tcsetattr(1, TCSADRAIN, &ostate); // return to original mode
}

kbhit is indeed not part of any standard C interface, but can be found in conio.h.
However, you should be able to use getc/getchar from tango.stdc.stdio - I changed the FAQ you mention to reflect this.

D generally has all the C stdlib available (Tango or Phobos) so answers to this question for GNU C should work in D as well.
If tango doesn't have the needed function, generating the bindings is easy. (Take a look at CPP to cut through any macro junk.)

Thanks for both of your replies.
Unfortunately, my main development environment is Linux + GDC + Tango, so I don't have conio.h, since I don't use DMC as my C compiler.
I also found that both getc() and getchar() are line buffered in my development environment, so they cannot do what I want.
In the end, I did this exercise using the GNU ncurses library. Since D can interface with C libraries directly, it did not take much effort: I just declared the prototypes of the functions I use, called them, and linked my program against the ncurses library.
It works perfectly on my Linux machine, but I still have not figured out how to do this without any third-party library in a way that runs on both Linux and Windows.
import tango.io.Stdout;
import tango.core.Thread;
// Prototype for used ncurses library function.
extern(C)
{
void * initscr();
int cbreak ();
int getch();
int endwin();
int noecho();
}
// A keyboard handler that quits the program when the user hits the ESC key.
void keyboardHandler ()
{
initscr();
cbreak();
noecho();
while (getch() != 27) {
}
endwin();
}
// Main Program
int main ()
{
Thread handler = new Thread (&keyboardHandler);
handler.start();
for (int i = 0; ; i++) {
Stdout.format ("{}\r\n", i).flush;
// If keyboardHandler is not running, it means the user hit the
// ESC key, so we break the infinite loop.
if (handler.isRunning == false) {
break;
}
}
return 0;
}

As Lars pointed out, you can use _kbhit and _getch defined in conio.h and implemented in (I believe) msvcrt for Windows. Here's an article with C++ code for using _kbhit and _getch.
