Variable number of instances for Multiple Instance Learning - tensorflow

I am trying to do Multiple Instance Learning for a binary classification problem, where each bag of instances has an associated label 0/1. However, different bags have different numbers of instances. One solution is to truncate every bag to the size of the smallest bag. For example:
Bag1 - 20 instances, Bag2 - 5 instances, Bag3 - 10 instances, etc.
Here I take the minimum, i.e. 5 instances, from every bag. However, this discards all the remaining instances in the larger bags, which might contribute to the training.
Is there any workaround/algorithm for MIL where variable instance numbers for bags could be handled?

You can try using RaggedTensors for this. They're mostly used in NLP work, since sentences have variable numbers of words (and paragraphs have variable numbers of sentences, etc.). But there's nothing special about ragged tensors that limits them to this domain.
See the TensorFlow docs for more information. Not all layers or operations will work with ragged inputs, but you can use things like Dense layers to build a Sequential, Functional, or even custom model, if that suits your problem.
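As a rough sketch of what that could look like, the snippet below stores bags of different sizes in one ragged tensor and reduces each bag to a fixed-size vector. The mean pooling step is my own assumption (max pooling or attention pooling are other common choices for MIL), and the numbers are made up.

```python
import tensorflow as tf

# Three bags with 3, 1 and 2 instances; every instance has 2 features.
# ragged_rank=1 makes only the "instances per bag" axis ragged.
bags = tf.ragged.constant(
    [
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        [[7.0, 8.0]],
        [[9.0, 10.0], [11.0, 12.0]],
    ],
    ragged_rank=1,
)

# Pool over the ragged instance axis so each bag becomes one dense vector.
# Mean pooling is an illustrative choice, not something the answer prescribes.
bag_vectors = tf.reduce_mean(bags, axis=1)  # dense tensor, one row per bag
```

The pooled `bag_vectors` tensor has one row per bag, so it can be fed to ordinary Dense layers and trained against the per-bag 0/1 labels without discarding any instances.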

Related

Is there any difference between initializing a Self-Organizing Map with a random distribution and initializing it with the first input to the network?

When I initialize the SOM with a random distribution, the network may converge correctly, but not when I initialize it with the input. Why?
Poor weight initialization can lead to entanglement in the neurons, and becomes a problem when it comes to topological mapping. The problem is when two neurons that should be far apart have ended up representing the same cluster of input data.
Using your first input as the initialization could be leading to this problem. However, a relatively easy check is to use Sammon mapping to reduce the clustering nodes to a two-dimensional representation of the distances between them. This can be visualized as the nodes with lines connecting adjacent pairs. An unstable learning process can then be discerned from a Sammon map that folds over on itself.
[Figure: Sammon map]
[Figure: Sammon map with folds]
This doesn't mean that initializing the weights with input data is a bad idea. However, I would recommend initializing from randomly chosen input data, with something like numpy.random.seed() to make the choice reproducible, since starting from input data could speed up the learning process.
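One way to read that recommendation is to draw the initial node weights from randomly selected input samples with a fixed seed. The sketch below assumes that reading; `init_som_weights` is a hypothetical helper, and it uses NumPy's `default_rng(seed)` rather than the global `numpy.random.seed()`.

```python
import numpy as np

def init_som_weights(data, n_nodes, seed=0):
    """Pick n_nodes distinct input rows at random as initial SOM weights."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    idx = rng.choice(len(data), size=n_nodes, replace=False)
    return data[idx].copy()

# Toy data: 10 samples with 2 features each (made up for illustration).
data = np.arange(20, dtype=float).reshape(10, 2)
weights = init_som_weights(data, n_nodes=4, seed=42)
```

Sampling without replacement avoids starting two nodes on exactly the same input, which is one way the "two neurons representing the same cluster" problem can arise.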

How to pass a list of numbers as a single feature to a neural network?

I am trying to cluster sentences by clustering their sentence embeddings, taken from a fastText model. Each sentence embedding has 300 dimensions, and I want to reduce them to, say, 50. I tried t-SNE, PCA, and UMAP. I wanted to see how an autoencoder works for my data.
Does it make sense to pass those 300 numbers for each sentence as separate features to the NN, or should they be passed as a single entity? If the latter, is there a way to pass a list as a feature to a NN?
I tried passing the 300 numbers as individual features and then clustering on the output. I got very few meaningful clusters; the rest were either noise or clusters grouping sentences that are not similar. (With other techniques such as UMAP I got many more meaningful clusters.) Any leads would be helpful. Thanks in advance :)

How to embed discrete IDs in Tensorflow?

There are many discrete IDs and I want to embed them to feed into a neural network. tf.nn.embedding_lookup only supports a fixed range of IDs, i.e. IDs from 0 to N. How can I embed discrete IDs in the range 0 to 2^62?
Just to clarify how I understand your question, you want to do something like word embeddings, but instead of words you want to use discrete IDs (not indices). Your IDs can be very large (2^62). But the number of distinct IDs is much less.
If we were to process words, we would build a dictionary of the words and feed the indices within the dictionary to the neural network (into the embedding layer). That is basically what you need to do with your discrete IDs too. Usually you'd also reserve one number (such as 0) for values not seen before. You could also later trim the dictionary to include only the most frequent values and put all the others into the same unknown bucket (exactly the same options you would have with word embeddings or other NLP tasks).
e.g.:
unknown -> 0
84588271 -> 1
92238356 -> 2
78723958 -> 3
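A minimal sketch of that dictionary, in plain Python: `build_id_vocab` and `to_index` are hypothetical helper names, and index 0 is reserved for unknown IDs as in the mapping above. The resulting small indices are what you would pass to the embedding lookup.

```python
def build_id_vocab(raw_ids):
    """Map each distinct raw ID to a compact index, starting at 1."""
    vocab = {}
    for raw in raw_ids:
        if raw not in vocab:
            vocab[raw] = len(vocab) + 1  # index 0 is reserved for "unknown"
    return vocab

def to_index(vocab, raw_id):
    """Return the compact index of raw_id, or 0 if it was never seen."""
    return vocab.get(raw_id, 0)

vocab = build_id_vocab([84588271, 92238356, 78723958, 84588271])
indices = [to_index(vocab, r) for r in [92238356, 2**62 - 1, 84588271]]
```

The embedding table then only needs `len(vocab) + 1` rows, regardless of how large the raw IDs are.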

Inference on several inputs in order to calculate the loss function

I am modeling a perceptual process in TensorFlow. In the setup I am interested in, the modeled agent is playing a resource game: it has to choose 1 out of n resources, relying only on the label that a classifier gives to each resource. Each resource is an ordered pair of two reals. The classifier only sees the first real, but payoffs depend on the second. There is a function mapping the first to the second.
Anyway, ideally I'd like to train the classifier in the following way:
In each run, the classifier gives labels to n resources.
The agent then gets the payoff of the resource whose label is highest in some predetermined ranking (say, A > B > C > D), choosing randomly in case of a tie.
The loss is taken to be the normalized absolute difference between the payoff thus obtained and the maximum payoff in the set of resources. I.e., (Payoff_max - Payoff) / Payoff_max
For this to work, one needs to run inference n times, once for each resource, before calculating the loss. Is there a way to do this in tensorflow? If I am tackling the problem in the wrong way feel free to say so, too.
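To make the intended loss concrete, here is the formula written out as a plain Python function. It is only a sketch of the definition, not a trainable TensorFlow loss: `RANKING` and `relative_regret` are names made up for illustration, payoffs are assumed to be positive, and the hard selection step is not differentiable as written.

```python
import random

RANKING = ["A", "B", "C", "D"]  # best label first, as in A > B > C > D

def relative_regret(labels, payoffs, rng=random):
    """(Payoff_max - Payoff) / Payoff_max for the resource the agent picks."""
    ranks = [RANKING.index(label) for label in labels]
    best = min(ranks)  # lowest position in RANKING means highest label
    candidates = [i for i, r in enumerate(ranks) if r == best]
    chosen = rng.choice(candidates)  # random tie-break among equal labels
    payoff_max = max(payoffs)
    return (payoff_max - payoffs[chosen]) / payoff_max

# The second resource gets the best label but is not the best payoff.
loss = relative_regret(["B", "A", "C"], [4.0, 3.0, 1.0])
```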
I don't have much knowledge of the ML aspects of this, but from a programming point of view I can see two ways of doing it. One is to copy your model n times, with all the copies sharing the same variables. The outputs of all these copies would go into some function that determines the highest label. As long as this function is differentiable, the variables are shared, and n is not too large, it should work. You would need to feed all n inputs together. Note that backprop will run through each copy and update your weights n times. This is generally not a problem, but if it is, I have heard about some fancy tricks one can do using partial_run.
Another way is to use tf.while_loop. It is pretty clever - it stores activations from each run of the loop and can do backprop through them. The only tricky part should be to accumulate the inference results before feeding them to your loss. Take a look at TensorArray for this. This question can be helpful: Using TensorArrays in the context of a while_loop to accumulate values
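A minimal sketch of the tf.while_loop approach, assuming TensorFlow 2.x: `run_inference` and `model_fn` are hypothetical names, and the lambda stands in for the real classifier. Each loop iteration writes one inference result into a TensorArray, which is stacked into a single tensor before it would go to the loss.

```python
import tensorflow as tf

def run_inference(model_fn, resources):
    """Apply model_fn to each resource in turn and collect the outputs."""
    n = tf.shape(resources)[0]
    outputs = tf.TensorArray(dtype=tf.float32, size=n)

    def cond(i, outputs):
        return i < n

    def body(i, outputs):
        # write() returns the updated TensorArray, which must be carried on.
        return i + 1, outputs.write(i, model_fn(resources[i]))

    _, outputs = tf.while_loop(cond, body, [tf.constant(0), outputs])
    return outputs.stack()  # one tensor holding all n inference results

# Stand-in "classifier" that just doubles its input.
scores = run_inference(lambda x: 2.0 * x, tf.constant([1.0, 2.0, 3.0]))
```

If the model accepts batched input, feeding all n resources as one batch is usually simpler than looping; the loop is mainly useful when each step depends on state carried between iterations.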

How do I have to train a HMM with Baum-Welch and multiple observations?

I am having some problems understanding how the Baum-Welch algorithm exactly works. I read that it adjusts the parameters of the HMM (the transition and the emission probabilities) in order to maximize the probability that my observation sequence may be seen by the given model.
However, what happens if I have multiple observation sequences? I want to train my HMM against a large number of observations (and I think this is what is usually done).
ghmm for example can take both a single observation sequence and a full set of observations for the baumWelch method.
Does it work the same in both situations? Or does the algorithm have to know all observations at the same time?
In Rabiner's paper, the parameters of GMMs (weights, means and covariances) are re-estimated in the Baum-Welch algorithm using these equations:
These are just for the single observation sequence case. In the multiple case, the numerators and denominators are just summed over all observation sequences, and then divided to get the parameters. (this can be done since they simply represent occupation counts, see pg. 273 of the paper)
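As a sketch of what "summed over all observation sequences" means, here are the pooled updates for the mixture weights and means. The notation is my own assumption rather than a quote from the paper: $\gamma_t^{(r)}(j,k)$ is the posterior probability of being in state $j$ with mixture component $k$ at time $t$ of sequence $r$, computed by the forward-backward pass on that sequence alone, $o_t^{(r)}$ is the corresponding observation, $R$ is the number of sequences, $T_r$ is the length of sequence $r$, and $M$ is the number of mixture components.

```latex
\bar{c}_{jk} =
  \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \gamma_t^{(r)}(j,k)}
       {\sum_{r=1}^{R} \sum_{t=1}^{T_r} \sum_{m=1}^{M} \gamma_t^{(r)}(j,m)},
\qquad
\bar{\mu}_{jk} =
  \frac{\sum_{r=1}^{R} \sum_{t=1}^{T_r} \gamma_t^{(r)}(j,k)\, o_t^{(r)}}
       {\sum_{r=1}^{R} \sum_{t=1}^{T_r} \gamma_t^{(r)}(j,k)}
```

With $R = 1$ these reduce to the single-sequence case; the covariance update pools its numerator and denominator in the same way.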
So the algorithm does not need to see all observation sequences in a single invocation. As an example, the HERest tool in HTK has a mechanism that allows splitting up the training data among multiple machines. Each machine computes the numerators and denominators for its share and dumps them to a file. In the end, a single machine reads these files, sums up the numerators and denominators, and divides them to get the result. See pg. 129 of the HTK book v3.4.
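The accumulate-then-divide step can be sketched for the transition matrix as follows. This is not HERest's actual code or file format: `combine_transition_stats` is a hypothetical helper and the counts are made up. Each sequence (or machine) is assumed to contribute a matrix of expected transition counts (the numerators) and a vector of expected state occupation counts (the denominators).

```python
import numpy as np

def combine_transition_stats(per_sequence_stats):
    """Sum numerators and denominators over sequences, then divide."""
    num = sum(n for n, _ in per_sequence_stats)  # expected transitions i -> j
    den = sum(d for _, d in per_sequence_stats)  # expected visits to state i
    return num / den[:, None]  # row i is divided by the occupancy of state i

# Made-up accumulators from two observation sequences for a 2-state HMM.
stats = [
    (np.array([[2.0, 2.0], [1.0, 3.0]]), np.array([4.0, 4.0])),
    (np.array([[1.0, 3.0], [3.0, 1.0]]), np.array([4.0, 4.0])),
]
A = combine_transition_stats(stats)
```

Because only the sums matter, the per-sequence accumulators can be computed in any order, or on different machines, and combined at the end.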