Given an Octave dataframe object created as
c = cell(m,n);
%populate c...
pkg load dataframe
df = dataframe(c);
(see https://octave.sourceforge.io/dataframe/overview.html),
Is it possible to access the underlying cell array?
Is it there a conversion mechanism back to cell array?
Is it possible to save df to CSV?
Yes. A dataframe object, like any object, can be converted back into a struct.
Once you have the resulting struct, look for the fields x_name to get the column names, and x_data to get the data in the form of a cell array, i.e.
struct(df).x_data
As for conversion to csv, the dataframe package does not seem to provide any relevant methods as far as I can tell (in particular the package does not provide an overloaded #dataframe/csvwrite method). Therefore, I'd just extract the information as above, and go about writing it into a csv file from there.
If you're not dealing with strictly numerical data, you might want to have a look at the cell2csv / csv2cell methods from the io package (since the built-in csvwrite function is strictly for numerical data).
And if that doesn't do exactly what you want, I'd probably just go for creating a csv file manually via custom fprintf statements.
PS. You can generally see what methods a package provides via pkg describe -verbose dataframe, or the methods for a particular class via methods(dataframe) (or even methods(df)). Also, if you ever wanted to access the documentation for an overloaded method, e.g. say the summary method, then this is the syntax for doing so: help #dataframe/summary
Related
I have a numpy array that is created by the third party library, namely proj_data object. Now I am trying to debug my code and I would like to get the "order" parameter of proj_data array.
However there is no such attribute that would tell me which order specifier was used when the array was created.
I need to know that, because the array is in turn used to transfer data to GPU and different order probably produces an error. Ideally I would get solution not by code analysis but really by some reflection of the proj_data object.
I am trying to use the Accord.net library to build test method of several of the machine learning algorithms that library supports.
One of the issues I have run into is that when I am trying to codify my string data, the Codification class does not seem capable of dealing with any datatable columns that are not strings, despite the documentation saying otherwise.
Codification codebook = new Codification(fulldata, AllAttributeNames);
I call that line where fulldata is a datatable, and I have tried including columns of both Int32 type and Double type, and the Codification class has thrown an error saying it is unable to convert them to type String.
"System.InvalidCastException: 'Unable to cast object of type 'System.Double' to type 'System.String'.'"
EDIT: It turns out this error is because the Codification system can only handle alternate data types if it is encoding the entire table. I suppose I can see the logic here, although I would prefer a better error, or that the method was a little smarter.
I now have another issue that has cropped up related to this. After changing my code to this:
Codification codebook = new Codification(fulldata);
I then learning.Learn(inputs, outputs) my algorithm and want to use the newly trained algorithm. So the next step would be to take a bunch of test data, make sure it matches the codebooks encoding, and send it through the algorithm. Unfortunately, when I try and use the
int[][] testinput = codebook.Transform(testData, inputColumnNameArray);
It blows up claiming it could not find a mapping to transform. It does this in reference to an Integer column that the codebook correctly did not map to new values. So now it seems this Transform method is not capable of handling non-string columns, and I have not found an overload of it that can, even though the documentation indicates it should be able to handle this.
Does anyone know how to get around this issue without manually building the entire int[][] testinput array one value at a time?
Turns out I was able to answer my own question eventually.
The Codification class has two methods of using it as near as I can tell. The constructor that takes a list of column names, as well as the Transform methods both lack intelligence in dealing with non-string data types, perhaps these methods are going away in the future.
The constructor that just takes a datatable by itself, as well as the Apply method, are both capable of handling data types other than strings. Once I switched to using these two methods my errors went away.
Codification codebook = new Codification(fulldata);
int[][] testinput = codebook.Apply(testData, inputColumnNameArray);
The confusion for me lay in all the example code seemingly randomly using these two methods, but using the Apply method only when processing the training data, and using the Transform method when encoding test data.
I am not sure why they chose to do this in the documentation example code, but it definitely took me a long time to figure out what was going on enough to stop having this particular issue.
Is there a way in Java to have text file with listed a=10.35 b=20.57 c=30.79 and get program to only read the variable decimal values and assign them to declared variables a, b, c in the program.
Searched youtube found nothing.
Do not know if it is possible.
Do not know.
Got it working.
You can certainly read in the contents of the text file and parse it down to the chars and doubles.
If you are referring to declaring named variables based on the file, there is no way to do this directly at runtime. You can, however, use a data structure like a dictionary or map to store the data and access it using the name as a key.
If you could provide more details about what you are trying to do, that would make it easier to answer your question more specifically.
Is it possible to write dask/pandas DataFrame to parquet and than return bytes string? I know that is not possible with to_parquet() function which accepts file path. Maybe, you have some other ways to do it. If there is no possibility to do something like this, is it makes sense to add such functionality? Ideally, it should be like this:
parquet_bytes = df.to_parquet() # bytes string is returned
Thanks!
There has been work undertaken to allow such a thing, but it's not currently a one-line thing like you suggest.
Firstly, if you have data which can fit in memory, you can use fastparquet's write() method, and supply an open= argument. This must be a function that creates a file-like object in binary-write mode, in your case a BytesIO() would do.
To make this work directly with dask, you could make use of the MemoryFileSystem from the filesystem_spec project. You would need to add the class to Dask and write as following:
dask.bytes.core._filesystems['memory'] = fsspec.implementations.memory.MemoryFileSystem
df.to_parquet('memory://name.parquet')
When done, MemoryFileSystem.store, which is a class attribute, will contain keys that are like filenames, and values which are BytesIO objects containing data.
I would like to save my objects as XML so that other applications can read from and write to the data files -- something that is very difficult with Matlab's binary mat files.
The underlying problem I'm running into is that Matlab's equivalent of reflection (which I've used to do similar things in .NET) is not very functional with respect to private properties. Matlab's struct(object) function offers a hack in terms of writing XML from an object because while I can't do
x = myInstance.myPrivateProperty;
...I can do
props = struct(myInstance);
x = props.myPrivateProperty;
So, I can create a pure (contains no objects) struct from any object using the code below, and then it's trivial to write an XML file using a pure struct.
But, is there any way to reverse the process? That is, to create an object instance using the data saved by the code below (that data contains a list of all non-Dependent, non-Constant, non-Transient properties of the class instance, and the name of the class)? I was thinking about having all my objects inherit from a class called XmlSerializable which would accept a struct as a single argument in the constructor and then assign all the values contained in the struct to the correspondingly-named properties. But, this doesn't work because if MyClass inherits XmlSerializable, the code in XmlSerializable isn't allowed to set the private properties of MyClass (related to How can I write generalized functions to manipulate private properties?). This would be no problem in .NET (See Is it possible to set private property via reflection?), but I'm having trouble figuring it out in Matlab.
This code creates a struct that contains all of the state information for the object(s) passed in, yet contains no object instances. The resulting struct can be trivially written to XML:
function s = toPureStruct(thing)
if isstruct(thing)
s = collapseObjects(thing);
s.classname = 'struct';
elseif isobject(thing)
s.classname = class(thing);
warning off MATLAB:structOnObject;
allprops = struct(thing);
warning on MATLAB:structOnObject
mc = metaclass(thing);
for i=1:length(mc.PropertyList)
p = mc.PropertyList(i);
if strcmp(p.Name, 'classname')
error('toStruct:PropertyNameCollision', 'Objects used in toStruct may not have a property named ''classname''');
end
if ~(p.Dependent || p.Constant || p.Transient)
if isobject(allprops.(p.Name))
s.(p.Name) = toPureStruct(allprops.(p.Name));
elseif isstruct(allprops.(p.Name))
s.(p.Name) = collapseObjects(allprops.(p.Name));
else
s.(p.Name) = allprops.(p.Name);
end
end
end
else
error(['Conversion to pure struct from ' class(thing) ' is not possible.']);
end
end
function s = collapseObjects(s)
fnames = fields(s);
for i=1:length(fnames)
f = s.(fnames{i});
if isobject(f)
s.(fnames{i}) = toPureStruct(f);
elseif isstruct(f)
s.(fnames{i}) = collapseObjects(f);
end
end
end
EDIT: One of the other "applications" I would like to read the saved files is a version control system (to track changes in parameters in configurations defined by Matlab objects), so any viable solution must be capable of producing human-intelligible text. The toPureStruct method above does this when the struct is converted to XML.
You might be able to sidestep this issue by using the new v7.3 MAT file format for your saved objects. Unlike the older MAT file formats, v7.3 is a variant of the HDF5, and there's HDF5 support and libraries in other languages. This could be a lot less work, and you'd probably get better performance too, since HDF5 is going to have a more efficient representation of numeric arrays than naive XML will.
It's not the default format; you can enable it using the -v7.3 switch to the save function.
To the best of my knowledge, what I want to do is impossible in Matlab 2011b. It might be possible, per #Andrew Janke's answer, to do something similar using Matlab's load command on binary HDF5 files that can be read and modified by other programs. But, this adds an enormous amount of complexity as Matlab's HDF5 representation of even the simplest class is extremely opaque. For instance, if I create a SimpleClass classdef in Matlab with two standard properties (prop1 and prop2), the HDF5 binary generated with the -v7.3 switch is 7k, and the expanded XML is 21k, and the text "prop1" and "prop2" do not appear anywhere. What I really want to create from that SimpleClass is this:
<root>
<classname>SimpleClass</classname>
<prop1>123</prop1>
<prop2>456</prop2>
</root>
I do not think it is possible to produce the above text from class properties in a generalized fashion in Matlab, even though it would be possible, for instance, in .NET or Java.