I'm currently working on some image/video recognition systems that I'd like to build on NuPIC. But I cannot find a multi-dimensional spatial pooler, which is very important in the vision domain. Should I implement a multi-dimensional SP myself?
With this recent change to the SP, it now supports any number of dimensions. Just set inputDimensions and columnDimensions to any number of dimensions you want (make sure they have the same number of dimensions, though), and the SP will respect the topology of the input and column space.
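For example, a minimal sketch (the module path and parameter values here are assumptions; older NuPIC releases expose the SP under nupic.research.spatial_pooler instead):

import numpy as np
from nupic.algorithms.spatial_pooler import SpatialPooler

# A 2-D input space (e.g. a 32x32 binary image) mapped onto a 2-D column space.
sp = SpatialPooler(
    inputDimensions=(32, 32),        # must have the same number of dimensions as columnDimensions
    columnDimensions=(32, 32),
    potentialRadius=4,               # local receptive fields follow the 2-D topology
    globalInhibition=False,
    numActiveColumnsPerInhArea=10,
)

input_vector = np.random.randint(2, size=32 * 32).astype(np.uint32)   # flattened binary input
active_columns = np.zeros(32 * 32, dtype=np.uint32)                   # output buffer
sp.compute(input_vector, True, active_columns)                        # learn=True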
I want to apply a clustering algorithm to a dataframe that contains a lot of features (32 columns).
Some of the features are encoded using a one-hot encoder.
I want to use PCA (principal component analysis) to reduce the dimensionality and make the machine learning process easier.
Is it possible to apply PCA to just some columns of the dataframe, keep the other columns as they are, and then use a machine learning model?
Or is it obligatory to apply PCA to the whole dataframe before clustering?
I guess there should be no issue with doing what you describe.
What this does, effectively, is merge some of the objects' features into fewer ones, and then use the other, non-merged features alongside the merged ones. I don't know what effect that would have on the outcome; it might be good to run a correlation to see whether the unmerged features add anything to the PCA-merged ones. You might find that they basically duplicate what is already there.
Since clustering is an exploratory method, you can basically do whatever you want. It is of course advisable to have a reason for doing so, as it otherwise ends up as simply trial-and-error, and if you find a result, you won't be able to describe why you got there. It is possible (or even likely for some data sets) that there are multiple ways to cluster them, so you should make decisions based on what you know about the data already, so they can be justified in those terms.
Running random trial-and-error clustering until you find a structure makes it a bit difficult to come up with a good explanation why that structure is valid.
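If it helps, here is a minimal sketch of applying PCA to only a subset of columns with scikit-learn's ColumnTransformer; the column names and parameter choices below are made up:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

onehot_cols = ["cat_{}".format(i) for i in range(20)]   # hypothetical one-hot-encoded columns
numeric_cols = ["age", "income", "score"]               # hypothetical columns kept as-is

preprocess = ColumnTransformer([
    ("pca", PCA(n_components=5), onehot_cols),          # reduce only these columns
    ("keep", "passthrough", numeric_cols),              # leave the rest untouched
])

model = Pipeline([
    ("prep", preprocess),
    ("scale", StandardScaler()),                        # put PCA scores and raw features on a comparable scale
    ("cluster", KMeans(n_clusters=4, random_state=0)),
])

# labels = model.fit_predict(df)                        # df is your original dataframe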
I am trying to compute some mesh features for 3D models that I created using numpy-stl. I would like to compute all of the features provided by pyradiomics, but I am not sure how to use them on just the meshes, without all of the extra binary image and matrix information. Or is there a better program to use for shape feature extraction? Also, the documentation says that some features require C extensions to be enabled. How can you do that in your Python script?
C extensions are enabled by default. As of PyRadiomics 2.0, the Python equivalents of those functions have been removed (their performance was horrible).
As to your meshes: PyRadiomics is built to extract features from images and binary labelmaps, so to use meshes you would first have to convert them.
What features do you want to extract? PyRadiomics does build a mesh on the fly to calculate surface area and volume, which are also used in the calculation of several other shape features.
If you want to take a look at how volume and SA are calculated, the source code for that is in C (radiomics/src/cshape.c). The calculation of the derived features (e.g. sphericity) is in shape.py
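Assuming you have already converted a mesh to an image plus binary labelmap (e.g. by voxelizing it) and saved them in a format SimpleITK can read, a rough sketch of extracting only the shape features would look like this (the file names are placeholders):

from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName('shape')      # 3D shape features: volume, surface area, sphericity, ...

# 'image.nrrd' and 'mask.nrrd' are placeholder file names for the converted mesh.
result = extractor.execute('image.nrrd', 'mask.nrrd')
for key, value in result.items():
    if key.startswith('original_shape_'):
        print(key, value)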
I am looking to utilize Autodesk Fusion 360 to generate a huge number of shapes (tens of thousands) in 3D model form, so I need a way to operate it non-interactively.
I am aware of the Fusion 360 API documentation here https://autodeskfusion360.github.io/
But I was wondering if there is a known way to do this.
Essentially, the main steps are:
Create the base shape (e.g. sphere, cube, trapezoid) and save it
Define and name each dimension on the shape as either dependent on another (e.g. height/2) or an input (e.g. height), until your model is fully constrained and controlled by parameters
Write a program (Python is friendly) that feeds values to the named inputs (a rough sketch follows this list)
Connect to a database within the program (using ODBC or similar) that will iterate through the shape characteristics
Feed the necessary characteristics from the database into the model and save each unique shape from your program
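As a rough sketch of the scripting part (the parameter names, CSV layout, and file paths below are assumptions, and a CSV stands in for the ODBC connection, which you could swap for pyodbc):

import csv
import adsk.core
import adsk.fusion

def run(context):
    app = adsk.core.Application.get()
    design = adsk.fusion.Design.cast(app.activeProduct)
    params = design.userParameters
    export_mgr = design.exportManager
    root = design.rootComponent

    # Each CSV row holds one shape's driving dimensions, e.g. columns "height" and "width".
    with open('C:/temp/shapes.csv') as f:
        for i, row in enumerate(csv.DictReader(f)):
            params.itemByName('height').expression = '{} mm'.format(row['height'])
            params.itemByName('width').expression = '{} mm'.format(row['width'])
            adsk.doEvents()  # let the model recompute with the new parameter values

            # Export each variant as a separate STL file.
            stl_options = export_mgr.createSTLExportOptions(root, 'C:/temp/shape_{}.stl'.format(i))
            export_mgr.execute(stl_options)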
You will have to be more specific in the question to clarify the answer. Please include as detailed an example as you can and I'll edit my answer.
TensorFlow uses reverse-mode automatic differentiation (reverse-mode AD), as discussed in https://github.com/tensorflow/tensorflow/issues/675.
Reverse-mode AD needs a data structure called a Wengert list; see https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation.
However, searching through the TensorFlow repository with the keyword "Wengert List", I get nothing.
Do they use a different name, or did they get rid of the Wengert list? If so, how?
AD terminology is very old. It was invented when there was no Python and things were complicated. Nowadays you could just use a regular Python list for that purpose.
The implementation of reverse-mode AD is in the gradients function of gradients_impl.py, here.
The data structure used to store the tape is initialized on line 532, and it's a Python deque used as a queue:
# Initialize queue with to_ops.
queue = collections.deque()
However, searching through the TensorFlow repository with the keyword "Wengert List", I get nothing.
This is because TensorFlow is not a tape-based AD system; it is a graph-based AD system.
A Wengert list would be the tape describing the order in which operations were originally executed.
There is also source-code-transformation-based AD; a nice example of such a system is Tangent.
Nowadays almost no one uses a tape (Wengert list) any more. Check, for instance, what PyTorch does (page 2).
TensorFlow 2 uses a Wengert list (a tape), as do JAX and Autograd. This is because these tools keep track of the operations on variables with some sort of gradient tape.
TensorFlow 1 did not use a Wengert list to keep track of what computation was being performed; instead, it used a static graph. This had certain performance benefits, but limited what TensorFlow 1 was capable of doing.
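For example, a minimal TensorFlow 2 snippet that records operations on a tape and then runs reverse-mode AD over it:

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x + 2.0 * x          # operations on x are recorded on the tape as they execute

dy_dx = tape.gradient(y, x)       # reverse-mode AD over the recorded tape
print(dy_dx.numpy())              # 2*x + 2 = 8.0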
I was wondering when and why I should prefer a panel(nd) over a dataframe with hierarchical index, and vice versa. In my very brief experience, I would say that the former is more convenient for slicing, while the latter for mathematical operations. My particular need would be to interactively manipulate 3-5 dimensional panels with convenient slicing and element-wise operations.
Generally, stick with a multi-indexed frame, as it is more fully supported.
A panelnd is like a generalized n-dim Panel, good mainly for single-dtyped data. It does work like a Panel, but has some quirks and missing features (that's why it's experimental).
There are ways to apply some operations to multiple slabs of an n-dim object (especially via the new apply in 0.13.1; see here).
Once I get to more than 3 dimensions, I mainly 'hold' the data and slice it to work in 2 dimensions, then reassemble it if needed. Storage can also be convenient for these higher-dim objects (e.g. via HDFStore), and was the reason they were created in the first place.
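As a small sketch of that "hold, slice, reassemble" workflow with a multi-indexed frame (the index names and shapes here are made up):

import numpy as np
import pandas as pd

# Three conceptual dimensions held in the row index, plus ordinary columns.
index = pd.MultiIndex.from_product(
    [['item1', 'item2'], pd.date_range('2014-01-01', periods=4), ['x', 'y', 'z']],
    names=['item', 'date', 'axis'],
)
df = pd.DataFrame(np.random.randn(len(index), 2), index=index, columns=['a', 'b'])

slab = df.xs('item1', level='item')   # drop one level to get a 2-D slab to work on
result = df * 2 + df ** 2             # element-wise operations work directly on the whole frame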