Machine Learning Algorithm for multiple output features - tensorflow

I am looking for a machine learning algorithm where I have multiple output variables. The output is something like a vector [A, ..., X], each element of which can take the value 0 or 1. I have data to train the model with the required input features.
Which algorithm should I use for such a case? With my limited knowledge I know that multi-class classification can handle the case where a single output variable takes one of several values, like color. But this case is multiple output variables, each taking 0 or 1. Please let me know.

It is difficult to say which algorithm is best without more information.
A neural network with an output layer of multiple binary (threshold-function) neurons, one per output variable, could be a good candidate; the single-layer version of this is a perceptron.
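As a sketch of this idea (assuming TensorFlow/Keras; the layer sizes and feature counts here are made up), a network with one sigmoid unit per output variable and a binary cross-entropy loss handles a vector of independent 0/1 outputs:

```python
import numpy as np
import tensorflow as tf

n_features = 10   # number of input features (placeholder value)
n_outputs = 5     # length of the 0/1 output vector [A, ..., X]

# One sigmoid unit per output variable: each output is an
# independent probability of that variable being 1.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(n_outputs, activation="sigmoid"),
])

# binary_crossentropy treats each output independently,
# which is exactly the multi-label setting described above.
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(8, n_features).astype("float32")
probs = model.predict(x, verbose=0)    # shape (8, n_outputs), values in [0, 1]
labels = (probs > 0.5).astype(int)     # threshold each output to 0/1
```

After training on real data, thresholding each sigmoid output at 0.5 yields the desired 0/1 vector.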

Related

Predict a nonlinear array based on 2 features with scalar values using XGBoost or equivalent

So I have been looking at XGBoost as a starting point for this; however, I am not sure of the best way to accomplish what I want.
My data is set up something like this:
Every value, whether input or output, is numerical. The issue I'm facing is that I only have 3 input data points for every several output data points.
I have seen that XGBoost has a multi-output regression method, but I have only really seen it used to predict around 2 outputs per input, whereas my data may have upwards of 50 outputs to predict from only a handful of scalar input features.
I'd appreciate any ideas you may have.
For reference, I've been looking mainly at these two demos (they illustrate the same idea; one uses scikit-learn and the other XGBoost):
https://machinelearningmastery.com/multi-output-regression-models-with-python/
https://xgboost.readthedocs.io/en/stable/python/examples/multioutput_regression.html
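The wrapper approach from those demos is not limited to a couple of outputs: scikit-learn's MultiOutputRegressor fits one boosted model per target column, however many there are. A minimal sketch on synthetic data (using scikit-learn's own GradientBoostingRegressor here so it is self-contained; an xgboost.XGBRegressor can be dropped in the same way):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 3))                                      # 3 scalar input features
Y = np.stack([X @ rng.random(3) for _ in range(50)], axis=1)  # 50 output targets

# MultiOutputRegressor fits one independent boosted model per target column.
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50))
model.fit(X, Y)

pred = model.predict(X[:5])    # shape (5, 50): all 50 outputs per input row
```

With only 3 input features driving 50 outputs, each per-target model sees the same 3 features, so correlated outputs are not exploited; that is a known limitation of the per-target wrapper.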

How to pass a list of numbers as a single feature to a neural network?

I am trying to cluster sentences by clustering their sentence embeddings, taken from a fastText model. Each sentence embedding has 300 dimensions, and I want to reduce them to 50 (say). I have tried t-SNE, PCA, and UMAP, and I wanted to see how an autoencoder works on my data.
Now, does it make sense to pass those 300 numbers per sentence as separate features to the NN, or should they be passed as a single entity? If the latter, is there a way to pass a list as one feature to a NN?
I tried passing the 300 numbers as individual features and clustering the output. I could get only a few meaningful clusters; the rest were either noise or clusters grouping dissimilar sentences (with other techniques like UMAP I could get far more meaningful clusters, and more of them). Any leads would be helpful. Thanks in advance :)
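For what it's worth, in the usual autoencoder setup each embedding is passed as one 300-dimensional input vector (i.e. a layer with 300 input units), not as 300 unrelated features in any special container. A minimal Keras sketch under that assumption (layer sizes are illustrative, and random data stands in for the fastText embeddings) that compresses 300 → 50 and exposes the bottleneck for clustering:

```python
import numpy as np
import tensorflow as tf

inp = tf.keras.Input(shape=(300,))                        # one 300-dim embedding per row
code = tf.keras.layers.Dense(50, activation="relu")(inp)  # 50-dim bottleneck
out = tf.keras.layers.Dense(300)(code)                    # reconstruction of the input

autoencoder = tf.keras.Model(inp, out)
encoder = tf.keras.Model(inp, code)        # shares weights with the autoencoder

autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(100, 300).astype("float32")   # stand-in for fastText embeddings
autoencoder.fit(X, X, epochs=2, verbose=0)       # train to reconstruct the input

reduced = encoder.predict(X, verbose=0)          # shape (100, 50); feed to clustering
```

The `reduced` array plays the same role as the output of PCA or UMAP and can be handed to the same clustering code.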

Should my seq2seq RNN idea work?

I want to predict stock price.
Normally, people would feed the input as a sequence of stock prices.
Then they would feed the target output as the same sequence shifted one step to the left.
When testing, they would feed the prediction back in as the input for the next timestep.
I have another idea, which is to fix the sequence length, for example 50 timesteps.
The input and output are exactly the same sequence.
When training, I replace the last 3 elements of the input with zeros to let the model know that I have no input for those timesteps.
When testing, I would feed the model a sequence of 50 elements, the last 3 of which are zeros. The predictions I care about are the last 3 elements of the output.
Would this work or is there a flaw in this idea?
The main flaw of this idea is that it adds nothing to the model's learning while reducing its capacity: you force the model to learn an identity mapping for the first 47 steps (50 − 3). Note that providing 0 as input is equivalent to providing no input to an RNN: a zero input, after multiplication by the weight matrix, is still zero, so the only sources of information are the bias and the output from the previous timestep, both of which are already present in the original formulation. As for the second addition, requiring outputs for the first 47 steps: there is nothing to be gained by learning the identity mapping, yet the network has to "pay the price" for it, spending weights to encode this mapping in order not to be penalised.
So in short: yes, your idea will work, but it is very unlikely to beat the original approach. You provide no new information and do not really modify the learning dynamics, yet you limit capacity by requiring an identity mapping to be learned per step. This mapping is also extremely easy to learn, so gradient descent will discover it first, before even trying to "model the future".
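The point about zero inputs can be checked directly on a vanilla RNN step h_t = tanh(W_x x_t + W_h h_{t-1} + b): when x_t = 0 the input term vanishes, so the update is driven only by the bias and the previous state. A quick numpy check with made-up weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.standard_normal((4, 3))   # input-to-hidden weights
W_h = rng.standard_normal((4, 4))   # hidden-to-hidden weights
b = rng.standard_normal(4)
h_prev = rng.standard_normal(4)

def rnn_step(x, h):
    # vanilla RNN update: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x + W_h @ h + b)

# Zero input: the W_x @ x term is exactly zero...
h_zero_input = rnn_step(np.zeros(3), h_prev)
# ...so the step is identical to one driven only by bias and previous state.
h_no_input = np.tanh(W_h @ h_prev + b)
```

The two hidden states come out identical, which is exactly why feeding zeros conveys no information beyond what the recurrence already carries.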

Inference on several inputs in order to calculate the loss function

I am modeling a perceptual process in TensorFlow. In the setup I am interested in, the modeled agent plays a resource game: it has to choose 1 out of n resources, relying only on the label that a classifier gives to each resource. Each resource is an ordered pair of two reals. The classifier only sees the first real, but payoffs depend on the second. There is a function mapping the first to the second.
Anyway, ideally I'd like to train the classifier in the following way:
In each run, the classifier gives labels to the n resources.
The agent then gets the payoff of the resource corresponding to the highest label in some predetermined ranking (say, A > B > C > D), and randomly in case of draw.
The loss is taken to be the normalized absolute difference between the payoff thus obtained and the maximum payoff in the set of resources, i.e. (Payoff_max − Payoff) / Payoff_max.
For this to work, one needs to run inference n times, once for each resource, before calculating the loss. Is there a way to do this in tensorflow? If I am tackling the problem in the wrong way feel free to say so, too.
I don't have much knowledge of the ML aspects of this, but from a programming point of view I can see two ways of doing it. One is to copy your model n times, with all copies sharing the same variables. The outputs of all these copies would go into some function that determines the highest label. As long as this function is differentiable, the variables are shared, and n is not too large, it should work. You would need to feed all n inputs together. Note that backprop will run through each copy and update your weights n times. This is generally not a problem, but if it is, I have heard about some fancy tricks one can do with partial_run.
Another way is to use tf.while_loop. It is pretty clever: it stores the activations from each run of the loop and can do backprop through them. The only tricky part should be accumulating the inference results before feeding them to your loss; take a look at TensorArray for this. This question can be helpful: Using TensorArrays in the context of a while_loop to accumulate values
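In current TensorFlow the first option is simpler than it sounds: calling one Keras model on a batch of n resources shares all variables automatically, and backprop runs through every row. The delicate part is making the selection differentiable, e.g. by replacing the hard argmax over labels with a softmax over ranking scores. A sketch under these assumptions (the toy architecture, ranking scores, and random payoffs are all made up for illustration):

```python
import numpy as np
import tensorflow as tf

n = 5                                # number of resources per run
model = tf.keras.Sequential([        # the shared classifier (toy architecture)
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(4),        # logits over 4 labels, ranked A > B > C > D
])

first_reals = tf.constant(np.random.rand(n, 1).astype("float32"))
payoffs = tf.constant(np.random.rand(n).astype("float32"))  # from the second reals

with tf.GradientTape() as tape:
    logits = model(first_reals)              # one inference per resource, one batch,
                                             # all rows sharing the same weights
    rank = tf.constant([3.0, 2.0, 1.0, 0.0]) # numeric encoding of A > B > C > D
    scores = tf.reduce_sum(tf.nn.softmax(logits) * rank, axis=1)
    # Soft (differentiable) stand-in for "pick the highest-ranked resource":
    weights = tf.nn.softmax(scores * 10.0)   # temperature sharpens the soft choice
    payoff = tf.reduce_sum(weights * payoffs)
    loss = (tf.reduce_max(payoffs) - payoff) / tf.reduce_max(payoffs)

grads = tape.gradient(loss, model.trainable_variables)
```

Because the soft selection is a weighted average of payoffs, the loss stays non-negative and gradients flow back into the classifier through all n inferences at once.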

Evolutionary algorithm: What is the purpose of hidden/intermediate nodes

I saw this video online; it shows a "neural network" with three inputs and three outputs. Although the inputs are not changing, I believe there is enough similarity between this network and those of other evolutionary algorithms to make the question valid.
My question is: since all three input nodes shown in the video can "exert influence" on the output nodes through controlled weights, why are the four intermediate nodes necessary? Why not connect the input nodes directly to the outputs?
An artificial neural network consisting only of inputs and outputs is a (single-layer) perceptron. The realization that these networks cannot solve many problems set back the use of artificial neural networks for over a decade!
For simplicity, imagine only one output neuron (many outputs can be treated as many similar problems in parallel). Furthermore, let's consider for the moment only one input. The neurons use an activation function, which determines the activity (output) of a neuron depending on the input it receives. For the activation functions used in practice*, the higher the input, the higher the output (or the same over some ranges, but let's set that aside). And chaining two of these also results in "the higher the input, the higher the final output".
With one output neuron you interpret the result as "if the output is over a threshold, then A, otherwise B" (where "A" and "B" can mean different things). Because both our neurons produce more signal the more input they receive, our network can only answer easy linear problems of the type "if the input signal is over a threshold, then A, otherwise B".
Using two inputs is very similar: we combine the outputs of two input neurons. Now we are in the situation "if the inputs to input neurons 1 and 2 are, together, high enough that our final output is over a threshold, then A, otherwise B". Graphically this means we can decide A or B by drawing a line (allowing curvature) on the input 1–input 2 plane.
But there are problems that cannot be solved this way! Consider the XOR problem. Our goal is to produce this:

input 1 | input 2 | target
   0    |    0    |   B
   0    |    1    |   A
   1    |    0    |   A
   1    |    1    |   B
As you can see, it is impossible to draw a line that puts all the A's on one side and all the B's on the other. And such lines represent all possible one-layer perceptrons! We say that the XOR problem is not linearly separable (and this is why XOR is a traditional test for neural networks).
Introducing at least one hidden layer makes it possible to solve this problem. In practice it is like combining the results of two one-layer perceptrons.
Adding more neurons to the hidden layer means being able to solve more and more complex problems; in fact, with enough hidden neurons such a network can represent any function f(A,B).
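To make the hidden-layer argument concrete, here is a hand-weighted two-layer network solving XOR (the weights are chosen by hand, not learned, so the hidden units compute OR and AND and the output computes "OR and not AND"):

```python
import numpy as np

def step(z):
    # threshold activation: fire (1) iff the weighted input exceeds 0
    return (z > 0).astype(int)

def xor_net(x1, x2):
    x = np.array([x1, x2])
    h_or = step(x @ np.array([1, 1]) - 0.5)    # hidden unit 1: OR of the inputs
    h_and = step(x @ np.array([1, 1]) - 1.5)   # hidden unit 2: AND of the inputs
    return step(h_or - h_and - 0.5)            # output: OR and not AND = XOR

outputs = [int(xor_net(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

No single-layer perceptron can produce this 0, 1, 1, 0 pattern, but two hidden threshold units combined by one output unit do it exactly.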
However, as you may know, other networks use more layers (see deep learning), but in that case the motivation is not a theoretical limitation but rather the search for networks that perform better.
*Using weird hand-crafted activation functions would not make things better. You might be able to solve one specific problem, but still not all of them, and you would need to know how to design such an activation function.