PyCaret `data_func` usage with large data

What is the proper setup to use data_func?
I have attempted to use a list comprehension, which is not allowed, since lists are not callable.
I have also attempted to pass a pandas generator, but generator objects are not picklable.
So what is the correct way to set up the data_func parameter?
https://pycaret.readthedocs.io/en/latest/api/classification.html#pycaret.classification.setup
I expected the data_func parameter to accept either a dataframe generator object or a list of dataframes. Is either acceptable, and if not, what is the proper usage?
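Judging by the linked setup signature, data_func appears to be typed as a zero-argument callable that returns the dataframe, not a generator or a list. A minimal sketch under that assumption (train.csv and the "label" target column are hypothetical placeholders):

import pandas as pd
from pycaret.classification import setup

def load_data():
    # A plain function is picklable (unlike a generator object), so it can
    # be shipped to workers and called there to produce the DataFrame.
    return pd.read_csv("train.csv")  # hypothetical file

s = setup(data_func=load_data, target="label")  # "label" is a placeholder column name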

Pandas `replace` * in signature

The following is the signature of the replace method in the pandas API (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html):
DataFrame.replace(to_replace=None, value=_NoDefault.no_default, *,
inplace=False, limit=None, regex=False, method=_NoDefault.no_default)
While I understand the API method, I do not understand what the * refers to in the argument list. My guess is that it is somehow related to *args, and that *-like notation can be used after the second argument, but I wanted some clarity on this.
The * makes all arguments after it keyword-only. Unlike *args, it does not collect anything itself; it simply rejects any third or later positional argument.
Thus, in this method, only to_replace and value can be passed positionally:
df.replace('a', 'b')
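For contrast, a short sketch of the calls the * allows and forbids (exact defaults vary by pandas version):

import pandas as pd

df = pd.DataFrame({"col": ["a", "c"]})
df.replace("a", "b")                # OK: to_replace and value precede the *
df.replace("a", "b", inplace=True)  # OK: inplace passed by keyword
# df.replace("a", "b", True)        # TypeError: inplace is keyword-only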
pandas has been transitioning from classic positional parameters to keyword-only parameters for many methods lately. This improves usability and readability in many cases. For instance, pivot and pivot_table have largely similar parameters but in a different order, which makes them difficult to remember. Enforcing keyword-only arguments will make such calls unambiguous in the future.

Get the order (column/row-major) of a given numpy ndarray

I have a numpy array, proj_data, that is created by a third-party library. Now I am trying to debug my code, and I would like to get the "order" parameter of the proj_data array.
However, there is no attribute that tells me which order specifier was used when the array was created.
I need to know this because the array is in turn used to transfer data to the GPU, and the wrong order would probably produce an error. Ideally, I would find this out not through code analysis but through some kind of reflection on the proj_data object itself.
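For reference, NumPy does not record the order= keyword itself, but the resulting memory layout can be inspected through the flags attribute; a minimal sketch:

import numpy as np

a = np.ones((3, 4), order="F")
# flags reflects the actual memory layout, not the creation keyword;
# note that 1-D and other degenerate arrays are both C- and F-contiguous
print(a.flags["C_CONTIGUOUS"])  # False
print(a.flags["F_CONTIGUOUS"])  # True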

Dynamically getting TensorFlow type by string?

I'm dynamically casting my tensor into different types based on what the input type string is.
How do I get tf.float64 from the string 'float64'? I tried tf.getattr('float64'), but tf is a module and has no getattr method.
I'm hacking it by creating a lookup for now, but I'm sure there's a cleaner way.
Solved it myself:
tf.dtypes.as_dtype('float32')
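For completeness, a short sketch of both routes; note that the built-in getattr also works on modules, which is likely what the attempted tf.getattr was reaching for:

import tensorflow as tf

dtype = tf.dtypes.as_dtype("float64")  # canonical string-to-dtype conversion
same = getattr(tf, "float64")          # built-in getattr on the module works too
x = tf.constant([1.0, 2.0, 3.0])
x = tf.cast(x, dtype)                  # cast using the dynamically looked-up dtype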

Octave: converting dataframe to cell array

Given an Octave dataframe object created as
c = cell(m,n);
%populate c...
pkg load dataframe
df = dataframe(c);
(see https://octave.sourceforge.io/dataframe/overview.html),
Is it possible to access the underlying cell array?
Is there a conversion mechanism back to a cell array?
Is it possible to save df to CSV?
Yes. A dataframe object, like any object, can be converted back into a struct.
Once you have the resulting struct, look for the fields x_name to get the column names, and x_data to get the data in the form of a cell array, i.e.
struct(df).x_data
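A slightly fuller sketch of that round trip (the data here is hypothetical, and listing the fields first guards against the x_ names changing between package versions):

pkg load dataframe
c = {1, "a"; 2, "b"};    % small hypothetical cell array
df = dataframe(c);
s = struct(df);          % convert the object back to a struct
disp(fieldnames(s))      % confirm x_name and x_data are present
data = s.x_data;         % the underlying data as a cell array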
As for conversion to csv, the dataframe package does not seem to provide any relevant methods as far as I can tell (in particular, the package does not provide an overloaded @dataframe/csvwrite method). Therefore, I'd just extract the information as above and go about writing it into a csv file from there.
If you're not dealing with strictly numerical data, you might want to have a look at the cell2csv / csv2cell methods from the io package (since the built-in csvwrite function is strictly for numerical data).
And if that doesn't do exactly what you want, I'd probably just go for creating a csv file manually via custom fprintf statements.
PS. You can generally see what methods a package provides via pkg describe -verbose dataframe, or the methods for a particular class via methods(dataframe) (or even methods(df)). Also, if you ever want to access the documentation for an overloaded method, e.g. the summary method, this is the syntax for doing so: help @dataframe/summary

Why is ones_like listed as a ufunc?

I was surprised to see numpy.ones_like listed in the list of ufuncs here. Is this just an oversight, or is there a specific use case?
That is an oversight in the documentation. ones_like is not a ufunc. It is implemented in numpy/core/numeric.py, along with zeros_like and similar functions. It uses the shape and data type of the argument, but it does not perform an elementwise operation.
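A quick check confirms the distinction; isinstance against np.ufunc is the reliable test:

import numpy as np

print(isinstance(np.add, np.ufunc))        # True: add is a genuine ufunc
print(isinstance(np.ones_like, np.ufunc))  # False: ones_like is an ordinary function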