How to read a NumPy array in ND4J

I have two components that deal with n-dimensional arrays. One component is written in Python; it processes the data and saves the processed ndarray with tobytes(). The other component is written in Java and needs to read the serialized ndarray produced by the first component.
I am curious whether there are any existing Java libraries that can read a serialized NumPy array, or whether there is a better way to pass an ndarray between Java and Python.
Any advice is appreciated!
Thank you!

ND4J supports reading from and writing to NumPy arrays. Look in the ND4J javadocs for the methods whose names follow the pattern xxxNpyYYYArray.
It can read and write from/to files, byte arrays and even raw pointers to a numpy array.
The pointer methods allow for using the arrays without copying or serialization. We use the pointer methods inside jumpy (which runs Java via pyjnius) and when using javacpp's cpython/numpy preset to run a cpython interpreter inside a Java process.
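One thing to watch for: ndarray.tobytes() writes only the raw element bytes, with no dtype or shape header, so the Java side has to know both in advance (as far as I can tell, the Npy methods above expect the .npy format that numpy.save writes instead). If you stay with tobytes(), a minimal plain-Java sketch could look like the following. It assumes the array is float64, little-endian, C-ordered and two-dimensional; the class name RawNdarrayReader and the method decode2d are made up for illustration and are not part of ND4J.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of ND4J or any other library.
class RawNdarrayReader {

    /**
     * Decodes bytes produced by ndarray.tobytes() into a rows x cols matrix.
     * Assumption: dtype float64, little-endian, C (row-major) order.
     */
    static double[][] decode2d(byte[] raw, int rows, int cols) {
        long expected = (long) rows * cols * Double.BYTES;
        if (raw.length != expected) {
            throw new IllegalArgumentException(
                    "expected " + expected + " bytes, got " + raw.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Sequential reads follow the row-major layout of the bytes.
                out[r][c] = buf.getDouble();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for bytes received from Python for [[1.0, 2.0], [3.0, 4.0]].
        byte[] raw = ByteBuffer.allocate(4 * Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putDouble(1.0).putDouble(2.0).putDouble(3.0).putDouble(4.0)
                .array();
        double[][] m = decode2d(raw, 2, 2);
        System.out.println(m[1][0]); // prints 3.0
    }
}
```

The shape and dtype have to travel separately (for example in a small header or a side channel), which is the main argument for using a self-describing format rather than raw bytes.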

I have used Apache Arrow to solve this.
First, the pyarrow package has an API for NumPy ndarrays that serializes the array into bytes. Basically, the ndarray becomes an Arrow record batch encoded as a byte sequence.
Then the Java API provides a VectorSchemaRoot to read it back from the bytes, and you can get the values out of the Arrow array. You can use this array to create an ND4J array (if you need one), or operate on the array directly.
For detailed operations, refer to the Apache Arrow documentation; if you run into any obstacles, we can discuss them here.
Also, Arrow uses native memory to store the buffer, so the data lives off the Java heap. This may become an issue at some point.
If you know of other solutions, please share them. :)

Related

Iterating through Kotlin map from C native export

We have a Kotlin package that we native build and export to C. We have the header file with all the nested struct and pinned-style pointers.
In the Kotlin code, there is a Map which we want to access. We can get a hold of the Kotlin package enum (the key of the Map), but what's the C code for actually indexing into the "kref kotlin Map object" to get to the value in the map?
Basically, we'd like to know how to manipulate Map, List and Array from C code. Actual steps and/or reference doc would be appreciated.
The Kotlin/Native compiler does not export any of the collection functions to the native library API. This decision was made some time ago, with the idea of minimizing the verbosity of the library header. However, it leads to the problem you are facing. Right now, the recommended approach is to write wrapper functions in your Kotlin code. For an example of this approach, please see this ticket at the Kotlin issue tracker. I also recommend subscribing to it to get updates on the state of the problem as soon as possible. Posting the example here in case the ticket becomes unavailable:
// Wrapper function that returns the list element at the given index
fun getListElement(list: List<Any?>, index: Int) = list.get(index)

Pass native array as managed array from C++/CLI

I have a third-party library that controls a camera.
It is written in C++ and I want to consume it from C#. The library grabs an image as a byte*.
There are two ways I know of to pass this to C#:
1. Pass the byte* directly to C# in an unsafe block. In this case, the C# code can't change the image size, for example with a crop operation.
2. Use Marshal.Copy to create a managed byte array and pass that. This gives the C# side flexibility, but it adds copying overhead and extra memory usage.
So, my question is:
Is there a way to convert a byte* to a managed byte[] without a copy? Since a managed byte array is "blittable" and can be pinned with the fixed statement, I think there could be a method for the "reverse direction".
Any help is appreciated, thank you.

How to receive bytes in Managed C++ project from COM plus project

I have a module A in Managed C++ that depends on module B, which is native C++ wrapped as a COM+ component.
In module B, I read bytes from a file. Now I am trying to call that file-reading functionality from A, but it fails.
Dependency detail: I used tlbimp.exe to generate the interop assembly for module B, and A references that interop assembly.
I tried to pass an "array^", but only one char was received. That is understandable, because the marshaler doesn't know the array length and so could NOT handle the whole array.
I found some recommendations to use a safe array, but could NOT get it to work in my projects.
Could somebody help me on this?
Thanks a lot.
If you are going to be talking to your native object via COM, you're going to have to pass the array the COM way.
SAFEARRAY would definitely work, but you don't have to use it. It is a fair amount of work to set up anyway. If neither component is a scripting language or VB6, there is little value in using a SAFEARRAY.
COM can marshal the array just fine, you just have to tell it how big it is. The two most common mechanisms in COM to pass (native) arrays are "fixed-sized arrays" and "conformant arrays".
Fixed-size array:
If you know at compile time the size of the array, this is the way to go. Declare your COM method as follows in your IDL:
...
const long ARRAY_SIZE = 1024;
...
HRESULT MethodAbc(MyClass array[ARRAY_SIZE]);
Marshalling will take care of passing the whole array.
Conformant Arrays:
You declare them as follows in IDL:
HRESULT MethodAbc([size_is(arraySize)] MyClass array[], long arraySize);
This tells COM that the arraySize parameter holds the count of elements.
My experience with CLI is minimal, but I don't think you can just pass a CLI handle. Among other things, I believe you need to pin the pointer so that GC doesn't move the array during the COM call. Others please correct me here if I'm wrong.

Qt Webkit Bridge: C++ access to JavaScript

I am using a QWebView and call setHtml() to display some HTML/JavaScript pages. I am passing data updates using QWebFrame::evaluateJavaScript by passing it a QString containing a call to a JavaScript function (with arguments). After reading about the Qt WebKit Bridge I feel like there should be a better way to do this. I see discussion of exposing Qt objects / functions in C++ to the JavaScript, but not the other direction. (I do not want the JavaScript to poll the C++ side for updates.) Is there a way to connect a JavaScript function as a slot to a Qt/C++ signal? (Or a similar pattern.) I feel like I have looked through a lot of docs and posted questions (even the 'Similar Questions' as I type this), but have not seen examples of this. I'd appreciate any info, links or nudges in the right direction.
Use addToJavaScriptWindowObject to expose your C++ objects to the JavaScript tier:
page()->mainFrame()->addToJavaScriptWindowObject(QString("myObject"), myObject);
Check documentation here:
http://qt-project.org/doc/qt-4.8/qwebframe.html#addToJavaScriptWindowObject
That will expose your C++ myObject as "myObject".
Then you can do exactly what you are looking for in your post. Let's suppose myObject declares a signal in the following way:
signals:
    void mySignal(QString aParameter);
You can simply connect that signal to a slot on your web side as follows:
myObject.mySignal.connect(this, this.mySignalSlot);
In this case, mySignalSlot should have the same signature as the mySignal declaration on the C++ class side (well, "same signature" means the same number of parameters; since JavaScript is not statically typed, you won't need to declare a type for each of them).
What's really useful here is passing JSON objects. If you need to pass a large amount of data, you can use base64 encoding. For images, QtWebKit natively supports the QImage and QPixmap classes; these two are natively encoded as complex JavaScript objects by the QtWebKit Bridge engine.

How to use other clustering methods for clustergram in Matlab's bioinformatics toolbox

EDIT: I figured it out. Just did not understand notation.
Hello,
Hopefully someone out there is familiar with the clustergram in the bioinformatics toolbox. I am interested in the graphical aspects of the function (the dendrogram/heat map), but am currently handicapped as it requires me to use Matlab's cluster() function. I would prefer to use my personal algorithm to cluster, and then allow Matlab to visualize this for me.
I have searched the code, but am woefully ignorant about object-oriented programming in general, and Matlab's version in particular. Thus all I know is that the function calls the line 'obj = obj.getclusters', but I have no idea how to edit it so that I use my own clustering algorithm instead of Matlab's.
Any help is appreciated!
EDIT: I am specifically working on a new algorithm, hence why I have no need for pdist or linkage. The dendrograms are calculated outside the clustergram function. All I am using to create the dendrogram/heatmap is the clustergram function. My Bioinformatics toolbox is version 3.3
Really, all I am looking for here is what the hell does 'obj = obj.getclusters;' do? I am not a programmer and really am not familiar with OO. To me, that looks like we magically have clusters, as there is no function call. This is at line 304 of clustergram()
First, I have later versions of the Bioinformatics Toolbox (3.4 and later), and in those versions the clustergram.m file does not have the line obj = obj.getclusters;
Remember that CLUSTERGRAM is a class (not a function, as it was in older versions). When you run clustergram(data,...) you actually run the constructor method of this class to create a clustergram object. This object is the obj variable. So when you run obj = obj.getclusters; you actually run the getclusters method of the clustergram class, which updates the object obj.
To get more details on what the getclusters method is doing, look for the following line in the methods block:
function obj = getclusters(obj)
In the latest versions there is a method computeClusters defined as
function computeClusters(obj)
This method computes the dendrograms for both rows and columns and updates the object. You can directly alter this function, of course, but I wouldn't recommend it. It's much better to develop separate functions for the distance metric and linkage and use those functions to construct the clustergram object.
If your algorithm does not use distance and linkage, please explain how it's supposed to build dendrograms. Does it create a linkage matrix like the output of the LINKAGE function? Without such a matrix I don't think you can use clustergram, even for visualization only. Do you have an example of how your clustergram should look? Maybe you can use the HeatMap class or other simpler functions like IMAGE or IMAGESC.