Is it possible to have dynamic build configs for a custom dataset in tfds?

I'm trying to create a custom dataset for TensorFlow using this guide. The data is audio, and I want to preprocess it so that each recording is split into smaller pieces (and the label varies with the split length). I found that such parameters can be passed with tfds.core.BuilderConfig, but those configs have to be declared beforehand. Is it possible to have dynamic build configs supplied by the user (e.g. the user specifies the length of the audio split), or should I enumerate all the possible configs up front?
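One possible approach, as a minimal sketch: besides declaring configs up front in BUILDER_CONFIGS, tfds accepts a BuilderConfig instance passed directly to the builder's constructor, so a user can create one with an arbitrary split length at call time. The AudioSplitConfig class, the split_seconds parameter, and the dataset name below are hypothetical, not part of tfds:

```python
import tensorflow_datasets as tfds

class AudioSplitConfig(tfds.core.BuilderConfig):
    """Hypothetical config carrying a user-chosen audio split length."""
    def __init__(self, *, split_seconds=5.0, **kwargs):
        super().__init__(**kwargs)
        self.split_seconds = split_seconds

class MyAudioDataset(tfds.core.GeneratorBasedBuilder):
    VERSION = tfds.core.Version("1.0.0")
    # One predeclared default keeps `tfds.load("my_audio_dataset")` working.
    BUILDER_CONFIGS = [AudioSplitConfig(name="default", split_seconds=5.0)]

    def _info(self):
        ...  # derive feature shapes from self.builder_config.split_seconds

    def _generate_examples(self, path):
        ...  # cut each file into self.builder_config.split_seconds chunks

# A dynamically created, user-specified config:
builder = MyAudioDataset(config=AudioSplitConfig(name="split10", split_seconds=10.0))
builder.download_and_prepare()
ds = builder.as_dataset(split="train")
```

One caveat: each distinct config is prepared and cached under its own name, so fully free-form parameters will multiply the on-disk copies of the dataset.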

Related

How to generate a surface mesh from DICOM files and/or a NIfTI file in Python?

I need to generate a surface mesh from a segmented NIfTI file. I can do that easily in 3D Slicer, but I want to do it in Python. Is that possible?
I tried using VTK, but it is not showing the surface-rendered output.
Check out SimpleITK. It has tools that can read image series with full 3D awareness and can also generate various surfaces for you.
It isn't an application that will display things for you, however; there are various examples you can find of how to take SimpleITK's output and use it with various visualisation options.
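As a minimal sketch of that route (the file name is hypothetical; in scikit-image versions before 0.17 the function is called marching_cubes_lewiner), you can read the volume with SimpleITK and extract a triangle mesh with marching cubes:

```python
import SimpleITK as sitk
from skimage import measure

# Read the segmented NIfTI volume (hypothetical file name).
image = sitk.ReadImage("segmentation.nii.gz")
volume = sitk.GetArrayFromImage(image)  # numpy array indexed (z, y, x)

# Extract the isosurface at the label boundary; passing the voxel
# spacing (reversed to match the array order) keeps the mesh in
# physical units rather than voxel units.
verts, faces, normals, values = measure.marching_cubes(
    volume, level=0.5, spacing=image.GetSpacing()[::-1])
```

The verts and faces arrays can then be written out with numpy-stl or handed to VTK for rendering.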
You can do that easily by scripting in 3D Slicer. You don't need to use the graphical user interface; just launch a Python script from the command line. You can put together the script you need from examples in the script repository (for example https://www.slicer.org/wiki/Documentation/Nightly/ScriptRepository#Export_model_nodes_from_segmentation_node).

C5.0 gives back only a single leaf

I'm doing a data-analysis task in SPSS Modeler and have finally arrived at the point in the stream where I'm trying to fit some models to the data.
However, when I run the C5.0 modeling node on my data, it generates a modeling nugget containing only a single leaf, so there are no decision rules in the model. I partitioned the data beforehand into train and test subsets (70-30). I did not use misclassification costs and used properly predefined attribute roles. On the node's Model page I checked the "Use partitioned data", "Build model for each split", "Group symbolics", and "Use global pruning" options; I also tried expert mode, but it fails in simple mode too. I have tried different options, but the output is always the same, without a single split.
How can I make the model return a more complex decision tree? I assume a single leaf is not the expected outcome.
Any suggestions are welcome.
Please check the distribution of your target variable and share it.
If the balance differs greatly from 50%-50%, you may need to balance your inputs first.
Misclassification costs are another technique to push the model toward an output, but again they should be based on your empirical distributions.
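As a quick sketch of that check outside Modeler (the CSV export and the target column name are hypothetical):

```python
import pandas as pd

df = pd.read_csv("training_data.csv")  # hypothetical export of the stream's data

# Relative frequency of each class of the target variable; a split close
# to 0/1 rather than 0.5/0.5 suggests the inputs need balancing first.
print(df["target"].value_counts(normalize=True))
```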

Using Pyradiomics to calculate shape features on meshes instead of matrices?

I am trying to compute some mesh features for 3D models that I created using numpy-stl. I would like to compute all of the shape features available in pyradiomics, but I am not sure how to use them on just the meshes, without all the extra binary-image and matrix information. Or is there a better program to use for shape-feature extraction? Also, the documentation says that some features require C extensions to be enabled. How can you do that in your Python script?
C extensions are enabled by default. As of PyRadiomics 2.0, the pure-Python equivalents of those functions have been removed (they had horrible performance).
As for your meshes: PyRadiomics is built to extract features from images and binary labelmaps. To use meshes, you would first have to convert them.
What features do you want to extract? PyRadiomics does use a sort of on-the-fly mesh to calculate surface area and volume, which are also used in the calculation of several other shape features.
If you want to take a look at how volume and surface area are calculated, the source code for that is in C (radiomics/src/cshape.c). The calculation of the derived features (e.g. sphericity) is in shape.py.
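A hedged sketch of the conversion route (trimesh can read STL files written by numpy-stl; the file name and voxel pitch are hypothetical, and in PyRadiomics versions before 2.2 the extractor class is spelled RadiomicsFeaturesExtractor):

```python
import numpy as np
import SimpleITK as sitk
import trimesh
from radiomics import featureextractor

# Load the STL and voxelize it; `pitch` is the voxel edge length in mesh units.
mesh = trimesh.load("model.stl")
labelmap = mesh.voxelized(pitch=1.0).fill().matrix.astype(np.uint8)

mask = sitk.GetImageFromArray(labelmap)   # 1 inside the mesh, 0 outside
image = sitk.GetImageFromArray(labelmap)  # dummy gray values; shape features ignore them

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("shape")

print(extractor.execute(image, mask, label=1))
```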

How can I create shapes in Fusion 360 via API or command line?

I am looking to use Autodesk Fusion 360 to generate a huge number of shapes (tens of thousands) as 3D models, so I need a way to operate it non-interactively.
I am aware of the Fusion 360 API documentation here: https://autodeskfusion360.github.io/
But I was wondering if there is a known way to do this.
Essentially there are five main steps (a minimal script sketch follows the list):
Create a base shape (e.g. sphere, cube, trapezoid) and save it
Define and name each dimension of the shape as either dependent on another (e.g. height/2) or as an input (e.g. height), until the model is fully constrained and controlled by parameters
Write a program (Python is a friendly choice) that feeds values into the named inputs
Connect to a database from that program (using ODBC or similar) to iterate through shape characteristics
Feed the necessary characteristics from the database into the model and save each unique shape from your program
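A hedged sketch of step 3 using the Fusion 360 Python API (the parameter name height and the export path are hypothetical; verify the calls against the current API reference):

```python
import adsk.core
import adsk.fusion

def run(context):
    app = adsk.core.Application.get()
    design = adsk.fusion.Design.cast(app.activeProduct)

    # Drive a named user parameter that constrains the model;
    # in the full workflow this value would come from the database loop.
    design.userParameters.itemByName("height").expression = "12 cm"

    # Export the updated model as STL (hypothetical path).
    export_mgr = design.exportManager
    stl_options = export_mgr.createSTLExportOptions(
        design.rootComponent, "C:/shapes/shape_0001.stl")
    export_mgr.execute(stl_options)
```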
You will have to be more specific in the question to clarify the answer. Please include as detailed an example as you can and I'll edit my answer.

TensorFlow: one of 20 parameter servers is very slow

I am trying to train a DNN model using TensorFlow. My script has two variables: one dense feature and one sparse feature. Each minibatch pulls the full dense feature and pulls the specified sparse feature using embedding_lookup_sparse, and the feed-forward pass can only begin once the sparse feature is ready. I run my script with 20 parameter servers, and increasing the worker count did not scale out. So I profiled my job using the TensorFlow timeline and found that one of the 20 parameter servers is very slow compared to the other 19. There is no dependency between the different parts of the trainable variables. I am not sure if there is a bug or some limitation, such as TensorFlow only being able to queue 40 fan-out requests. Any idea how to debug it? Thanks in advance.
It sounds like you might have exactly two variables: one stored on PS0 and the other on PS1. The other 18 parameter servers are doing nothing. Please take a look at variable partitioning (https://www.tensorflow.org/versions/master/api_docs/python/state_ops/variable_partitioners_for_sharding), i.e. partition a large variable into small chunks and store them on separate parameter servers.
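A minimal sketch of that fix for the sparse embedding (TF 1.x-era API; the vocabulary size, width, and scope name are hypothetical):

```python
import tensorflow as tf

# Shard one large embedding variable evenly across the 20 parameter servers.
partitioner = tf.fixed_size_partitioner(num_shards=20)

with tf.variable_scope("embeddings", partitioner=partitioner):
    embedding = tf.get_variable(
        "sparse_embedding",
        shape=[10000000, 64],  # hypothetical vocabulary size and width
        initializer=tf.truncated_normal_initializer(stddev=0.01))

# embedding_lookup_sparse accepts the partitioned variable directly and
# issues one lookup per shard, spreading the load over all the servers.
sp_ids = tf.sparse_placeholder(tf.int64)
output = tf.nn.embedding_lookup_sparse(embedding, sp_ids, None, combiner="mean")
```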
This is a somewhat hacky way to log Send/Recv timings from the Timeline object for each iteration, but it works pretty well for analyzing the dumped JSON data (compared to visualizing it in chrome://trace).
The steps you have to perform are:
download the TensorFlow source and check out the correct branch (r0.12, for example)
modify the only place that calls the SetTimelineLabel method inside executor.cc: instead of recording only non-transferable nodes, record the Send/Recv nodes as well
be careful to call SetTimelineLabel only once inside NodeDone, as it sets the text string of a node, which a Python script will parse later
build TensorFlow from the modified source
modify the model code (for example, inception_distributed_train.py) to use Timeline and the graph metadata correctly (a sketch is given below)
Then you can run the training and retrieve a JSON file for each iteration! :)
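For the last step, a minimal sketch of dumping one trace per iteration with tf.RunOptions and the Timeline class (train_op and num_steps stand in for the model's own training op and loop bound):

```python
import tensorflow as tf
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        sess.run(train_op, options=run_options, run_metadata=run_metadata)
        # One Chrome-trace JSON file per iteration.
        trace = timeline.Timeline(run_metadata.step_stats)
        with open("timeline_step_%d.json" % step, "w") as f:
            f.write(trace.generate_chrome_trace_format())
```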
Some suggestions that were too big for a comment:
You can't see data transfers in the timeline because tracing of Send/Recv is currently turned off; there is some discussion here -- https://github.com/tensorflow/tensorflow/issues/4809
In the latest version (a nightly build that is 5 days old or newer) you can turn on verbose logging with export TF_CPP_MIN_VLOG_LEVEL=1, which shows second-level timestamps (see here about higher granularity).
So with vlog you can perhaps use the messages generated by this line to see the times at which Send ops are issued.