How to tell if the t-SNE algorithm produces the same result?

I am studying the t-SNE algorithm and came across a question I could not find an answer to. If I have the same dataset and run the t-SNE algorithm multiple times with the same number of iterations and the same perplexity, will I get the same embeddings?

Yes, provided the perplexity is the same and the run is otherwise deterministic: t-SNE starts from a random initialization, so you also need to fix the random seed (or use a deterministic initialization such as PCA) to get identical embeddings across runs.
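To make this concrete, here is a minimal sketch using scikit-learn's TSNE; the thread does not name an implementation, so that library choice is an assumption (and the n_iter parameter shown here was renamed max_iter in newer scikit-learn releases):

    # Sketch with scikit-learn's TSNE (assumed implementation, not named
    # in the thread). Parameter n_iter is max_iter in newer releases.
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, _ = load_digits(return_X_y=True)

    # Same data, same perplexity, same iteration budget, and the same
    # seed: with random_state fixed, both runs start from the same
    # initialization and should produce the same embedding.
    emb_a = TSNE(n_components=2, perplexity=30, n_iter=1000,
                 random_state=0).fit_transform(X)
    emb_b = TSNE(n_components=2, perplexity=30, n_iter=1000,
                 random_state=0).fit_transform(X)
    print((emb_a == emb_b).all())  # expected: True

With random_state left unset, the two embeddings will generally differ even though every other setting matches.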

Related

Does increasing the number of iterations affect log-lik, AIC, etc.?

Whenever I try to solve a convergence issue in one of my glmer models with the help of a different optimizer, I repeat the entire model optimization procedure with the new optimizer. That is, I re-run all the models I've computed so far with the new optimizer and again conduct comparisons with anova(). I do this because, as far as I know, different optimizers may lead to differences in AICs and log-lik ratios for one and the same model, making comparisons between two models that use different optimizers problematic.
In my most recent analysis, I've increased the number of iterations with optCtrl=list(maxfun=100000) to avoid convergence errors. I'm now wondering whether this can also lead to differences in AIC/log-lik etc. for one and the same model. Is it equally problematic to compare two models that differ with regard to the inclusion of the optCtrl=list(maxfun=100000) argument?
I actually thought that increasing the number of iterations would simply lead to longer computation times (rather than different results), but I was unable to verify this online. Any hint/explanation is appreciated.
As far as I know, you should be fine. As long as the models were fit to the same number of observations, you should be able to compare them using the AIC. Hopefully someone else can comment on the nuances of how the AIC itself is computed, but I just fit a bunch of models with the same formula and dataset and different numbers of max iterations, getting the AIC each time; it didn't change as a function of the iterations. The iteration limit is just how long the fitting process is allowed to take to maximize the likelihood, which for complex models can be tricky. Once a model is fit and has converged on an answer, the number of iterations shouldn't change anything about the model itself.
If you look at this question, the top answer explains the AIC quite well: https://stats.stackexchange.com/questions/232465/how-to-compare-models-on-the-basis-of-aic
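The point that a converged fit is insensitive to the iteration cap can be demonstrated outside lme4 as well. A minimal sketch in Python with statsmodels, a stand-in I chose since the thread itself is about glmer; the data and model here are made up:

    # Same principle as the glmer question, illustrated with a
    # statsmodels GLM on synthetic data (a hypothetical stand-in
    # for the poster's mixed models).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(500, 2)))
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.3, 1.0, -0.5]))))

    # Two fits that differ only in the iteration cap. Once both have
    # converged, the maximized log-likelihood, and hence the AIC, match.
    fit_short = sm.GLM(y, X, family=sm.families.Binomial()).fit(maxiter=25)
    fit_long = sm.GLM(y, X, family=sm.families.Binomial()).fit(maxiter=100000)
    print(fit_short.aic, fit_long.aic)  # identical up to numerical noise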

RandomForest decision confidence?

I'm using accord.net's RandomForestLearning on some data and have it predicting results correctly, but what I'd really like is a way to look at the decision confidence that goes along with the plain classification results. Is there one?
In the end I manually computed the confidence by summing the votes for each label from the component DecisionTrees and then dividing the maximal vote count by the total number of votes. It would be nice if there were an official way, though.
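For illustration, here is the vote-counting idea as a language-agnostic sketch in Python; per_tree_predictions is a hypothetical stand-in for collecting each component tree's decision for one input, since the accord.net types themselves are not shown here:

    # Sketch of the manual confidence computation described above.
    from collections import Counter

    def confidence(per_tree_predictions):
        """Return the winning label and the fraction of trees that voted for it."""
        votes = Counter(per_tree_predictions)
        label, count = votes.most_common(1)[0]
        return label, count / len(per_tree_predictions)

    # Example: 10 trees, 7 of which predict class 1.
    label, conf = confidence([1, 1, 0, 1, 1, 2, 1, 1, 0, 1])
    print(label, conf)  # 1 0.7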

Why are embedding_lookup_sparse and string_to_hash_bucket in TensorFlow slow with a large number of embedding rows?

In TensorFlow, embedding_lookup_sparse looks up rows of the embedding matrix according to sp_ids. I think this is similar to random access. However, when the embedding matrix is large, e.g. 10M rows, inference takes more time than when it has only about 1M rows. As I see it, the lookup phase is similar to random access, and the hash function takes constant time, both of which are fast and not very sensitive to the table size. Is there anything wrong with my reasoning? Is there any way to optimize this so that inference is faster? Thank you!
Are you sure it is caused by the embedding_lookup? In my case I also have millions of rows to look up. It is very fast if I use the GradientDescent optimizer, and very slow if I use Adam or the others. Probably it is not the embedding_lookup op that slows down your training, but other ops that depend on the total number of parameters.
It is true that embedding_lookup works slowly when there are many rows in the table, and you can figure out why by reading its source code:
[Screenshot of the source code: the variable np holds the length of the table.]
[Screenshot of the source code: a loop over np.]
As you can see, there is a loop with a time complexity of O(table length). In fact, embedding_lookup uses dynamic_partition to separate the input ids into several partitions, and then uses this loop to gather the embedding vectors for each partition of ids. In my opinion, this trick fixes the time complexity at O(table length), no matter how big the input is.
So I think the best way for you to increase training speed is to feed more samples in each batch.
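For reference, a minimal sketch of the lookup the question describes, assuming TensorFlow 1.x (the API family used in this thread); the table shape and ids are made up:

    # Sparse embedding lookup, TF 1.x style (assumed version).
    import tensorflow as tf

    embeddings = tf.get_variable("emb", shape=[10_000_000, 64])  # the big table

    # Two sparse "rows" of ids: example 0 has ids {17, 42}, example 1 has {7}.
    sp_ids = tf.SparseTensor(indices=[[0, 0], [0, 1], [1, 0]],
                             values=tf.constant([17, 42, 7], dtype=tf.int64),
                             dense_shape=[2, 2])

    # One embedding per example, averaging the looked-up rows.
    combined = tf.nn.embedding_lookup_sparse(embeddings, sp_ids, None,
                                             combiner="mean")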

Getting each example exactly once

For monitoring my model's performance on my evaluation dataset, I'm using tf.train.string_input_producer for the filename queue over .tfr files, then feeding the parsed examples to the tf.train.batch function, which produces batches of a fixed size.
Assume my evaluation dataset contains exactly 761 examples (a prime number). To read all the examples exactly once, I would need a batch size that divides 761, but there is none, except 1, which would be too slow, and 761, which will not fit on my GPU. Is there a standard way to read each example exactly once?
Actually, my dataset size is not really 761, but there is no number in the reasonable range of 50-300 that divides it exactly. Also, I'm working with many different datasets, and finding a number that approximately divides the number of examples in each dataset can be a hassle.
Note that using the num_epochs parameter to tf.train.string_input_producer does not solve the issue.
Thanks!
You can use reader.read_up_to, as in this example. Your last batch will be smaller, so you need to make sure your network doesn't hard-wire the batch size anywhere.
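A sketch of that suggestion, assuming the TensorFlow 1.x queue pipeline the question describes; the file name is hypothetical:

    # Read up to a fixed number of records per call; the final call of
    # the epoch simply returns fewer.
    import tensorflow as tf

    filename_queue = tf.train.string_input_producer(["eval.tfr"], num_epochs=1)
    reader = tf.TFRecordReader()

    # Every call but the last returns 128 serialized examples; the last
    # returns however many remain, so downstream ops must not assume a
    # fixed batch size.
    keys, serialized_examples = reader.read_up_to(filename_queue, 128)

Note that tf.train.batch also accepts allow_smaller_final_batch=True, which pairs naturally with this pattern.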

How can I get the number of iterations in AMPL?

I can get the number of variables using _nvars. Then I tried _niters and _niterations, but they don't work.
I have also searched for it in the manual, unsuccessfully.
Is there a simple way to get the number of iterations, other than extracting it from solve_message (e.g. with regular expressions)?
To the best of my knowledge, there is no built-in parameter representing the number of iterations in AMPL. In fact, this is very solver-specific: different solvers, and even different algorithms within a single solver, may report multiple different iteration counts, such as MIP iterations, simplex iterations, master iterations when using decomposition, etc. Your best bet is probably to parse the solver message.
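Along the lines of the question's own suggestion, here is a sketch of the regular-expression approach in Python; the message text below is a made-up example, and real solver messages vary by solver and algorithm, so the pattern will need adjusting:

    # Pull an iteration count out of a solver message with a regex.
    # The message format here is hypothetical and solver-specific.
    import re

    solve_message = ("CPLEX 12.8.0.0: optimal solution; objective 42\n"
                     "1234 dual simplex iterations (0 in phase I)")

    m = re.search(r"(\d+)\s+(?:\w+\s+)*iterations", solve_message)
    if m:
        print(int(m.group(1)))  # 1234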