Are there benefits to having Actor and Critic use significantly different models? - tensorflow

In Actor-Critic methods the Actor and Critic are assigned two complimentary, but different goals. I'm trying to understand whether the differences between these goals (updating a policy and updating a value function) are large enough to warrant different models for the Actor and Critic, or if they are of similar enough complexity that the same model should be reused for simplicity. I realize that this could be very situational, but not in what way. For example, does the balance shift as the model complexity grows?
Please let me know if there are any rules of thumb for this, or if you know of a specific publication that addresses the issue.

The empirical results suggest the exact opposite - that it is important to have the same network doing both (up to some final layer/head). The main reason for this is that learning value network (critis) provides signal for shaping represntation of the policy (actor) that otherwise would be nearly impossible to get.
In fact if you think about these, these are extremely similar goals, since for optimal deterministic policy
pi(s) = arg max_a Q(s, a) = arg max_a V(T(s, a))
where T is the transition dynamics.

Related

Neural network hyperparameter tuning - is setting random seed a good idea? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Improve this question
I am trying to tune a basic neural network as practice. (Based on an example from a coursera course: Neural Networks and Deep Learning - DeepLearning.AI)
I face the issue of the random weight initialization. Lets say I try to tune the number of layers in the network.
I have two options:
1.: set the random seed to a fixed value
2.: run my experiments more times without setting the seed
Both version has pros and cons.
My biggest concern is that if I use a random seed (e.g.: tf.random.set_seed(1)) then the determined values can be "over-fitted" to the seed and may not work well without the seed or if the value is changed (e.g.: tf.random.set_seed(1) -> tf.random.set_seed(2). On the other hand, if I run my experiments more times without random seed then I can inspect less option (due to limited computing capacity) and still only inspect a subset of possible random weight initialization.
In both cases I feel that luck is a strong factor in the process.
Is there a best practice how to handle this topic?
Has TensorFlow built in tools for this purpose? I appreciate any source of descriptions or tutorials. Thanks in advance!
Tuning hyperparameters in deep learning (generally in machine learning) is a common issue. Setting the random seed to a fixed number ensures reproducibility and fair comparison. Repeating the same experiment will lead to the same outcomes. As you probably know, best practice to avoid over-fitting is to do a train-test split of your data and then use k-fold cross-validation to select optimal hyperparameters. If you test multiple values for a hyperparameter, you want to make sure other circumstances that might influence the performance of your model (e.g. train-test-split or weight initialization) are the same for each hyperparameter in order to have a fair comparison of the performance. Therefore I would always recommend to fix the seed.
Now, the problem with this is, as you already pointed out, the performance for each model will still depend on the random seed, like the particular data split or weight initialization in your case. To avoid this, one can do repeated k-fold-cross validation. That means you repeat the k-fold cross-validation multiple times, each time with a different seed, select best parameters of that run, test on test data and average the final results to get a good estimate of performance + variance and therefore eliminate the influence the seed has in the validation process.
Alternatively you can perform k-fold cross validation a single time and train each split n-times with a different random seed (eliminating the effect of weight initialization, but still having the effect of the train-test-split).
Finally TensorFlow has no build-in tool for this purpose. You as practitioner have to take care of this.
There is no an absolute right or wrong answer to your question. You are almost answered your own question already. In what follows, however, I will try to expand more, via the following points:
The purpose of random initialization is to break the symmetry that makes neural networks fail to learn:
... the only property known with complete certainty is that the
initial parameters need to “break symmetry” between different units.
If two hidden units with the same activation function are connected to
the same inputs, then these units must have different initial
parameters. If they have the same initial parameters, then a
deterministic learning algorithm applied to a deterministic cost and
model will constantly update both of these units in the same way...
Deep Learning (Adaptive Computation and Machine Learning series)
Hence, we need the neural network components (especially weights) to be initialized by different values. There are some rules of thumb of how to choose those values, such as the Xavier initialization, which samples from normal distribution with mean of 0 and special variance based on the number of the network layer. This is a very interesting article to read.
Having said so, the initial values are important but not extremely critical "if" proper rules are followed, as per mentioned in point 2. They are important because large or improper ones may lead to vanishing or exploding gradient problems. On the other hand, different "proper" weights shall not hugely change the final results, unless they are making the aforementioned problems, or getting the neural network stuck at some local maxima. Please note, however, the the latter depends also on many other aspects, such as the learning rate, the activation functions used (some explode/vanish more than others: this is a great comparison), the architecture of the neural network (e.g. fully connected, convolutional ..etc: this is a cool paper) and the optimizer.
In addition to point 2, bringing a good learning optimizer into the bargain, other than the standard stochastic one, shall in theory not let a huge influence of the initial values to affect the final results quality, noticeably. A good example is Adam, which provides a very adaptive learning technique.
If you still get a noticeably-different results, with different "proper" initialized weights, there are some ways that "might help" to make neural network more stable, for example: use a Train-Test split, use a GridSearchCV for best parameters, and use k-fold cross validation...etc.
At the end, obviously the best scenario is to train the same network with different random initial weights many times then get the average results and variance, for more specific judgement on the overall performance. How many times? Well, if can do it hundreds of times, it will be better, yet that clearly is almost impractical (unless you have some Googlish hardware capability and capacity). As a result, we come to the same conclusion that you had in your question: There should be a tradeoff between time & space complexity and reliability on using a seed, taking into considerations some of the rules of thumb mentioned in previous points. Personally, I am okay to use the seed because I believe that, "It’s not who has the best algorithm that wins. It’s who has the most data". (Banko and Brill, 2001). Hence, using a seed with enough (define enough: it is subjective, but the more the better) data samples, shall not cause any concerns.

In a neural network, why is the number of hidden layer nodes frequently 2^n?

In textbooks, I've noticed the number of nodes in the hidden layer is usually a power of 2. Is there any statistical reason for this? Or is this just an arbitrary convention?
There is not really a reason why a power of 2 number of layers is optimal. It seems that this is a rule of thumb or a heuristic that is broader accepted by the practitioners and coders. Some people claim that since memory hardware is managed in powers of two that might effect the computations but there exists no real proof for that at least to the best of my knowledge.

Definitions of Phenotype and Genotype

Can someone help me understand the definitions of phenotype and genotype in relation to evolutionary algorithms?
Am I right in thinking that the genotype is a representation of the solution. And the phenotype is the solution itself?
Thanks
Summary: For simple systems, yes, you are completely right. As you get into more complex systems, things get messier.
That is probably all most people reading this question need to know. However, for those who care, there are some weird subtleties:
People who study evolutionary computation use the words "genotype" and "phenotype" frustratingly inconsistently. The only rule that holds true across all systems is that the genotype is a lower-level (i.e. less abstracted) encoding than the phenotype. A consequence of this rule is that there can generally be multiple genotypes that map to the same phenotype, but not the other way around. In some systems, there are really only the two levels of abstraction that you mention: the representation of a solution and the solution itself. In these cases, you are entirely correct that the former is the genotype and the latter is the phenotype.
This holds true for:
Simple genetic algorithms where the solution is encoded as a bitstring.
Simple evolutionary strategies problems, where a real-value vector is evolved and the numbers are plugged directly into a function which is being optimized
A variety of other systems where there is a direct mapping between solution encodings and solutions.
But as we get to more complex algorithms, this starts to break down. Consider a simple genetic program, in which we are evolving a mathematical expression tree. The number that the tree evaluates to depends on the input that it receives. So, while the genotype is clear (it's the series of nodes in the tree), the phenotype can only be defined with respect to specific inputs. That isn't really a big problem - we just select a set of inputs and define phenotype based on the set of corresponding outputs. But it gets worse.
As we continue to look at more complex algorithms, we reach cases where there are no longer just two levels of abstraction. Evolutionary algorithms are often used to evolve simple "brains" for autonomous agents. For instance, say we are evolving a neural network with NEAT. NEAT very clearly defines what the genotype is: a series of rules for constructing the neural network. And this makes sense - that it the lowest-level encoding of an individual in this system. Stanley, the creator of NEAT, goes on to define the phenotype as the neural network encoded by the genotype. Fair enough - that is indeed a more abstract representation. However, there are others who study evolved brain models that classify the neural network as the genotype and the behavior as the phenotype. That is also completely reasonable - the behavior is perhaps even a better phenotype, because it's the thing selection is actually based on.
Finally, we arrive at the systems with the least definable genotypes and phenotypes: open-ended artificial life systems. The goal of these systems is basically to create a rich world that will foster interesting evolutionary dynamics. Usually the genotype in these systems is fairly easy to define - it's the lowest level at which members of the population are defined. Perhaps it's a ring of assembly code, as in Avida, or a neural network, or some set of rules as in geb. Intuitively, the phenotype should capture something about what a member of the population does over its lifetime. But each member of the population does a lot of different things. So ultimately, in these systems, phenotypes tend to be defined differently based on what is being studied in a given experiment. While this may seem questionable at first, it is essentially how phenotypes are discussed in evolutionary biology as well. At some point, a system is complex enough that you just need to focus on the part you care about.

Optimizing assignments based on many variables

I was recently talking with someone in Resource Management and we discussed the problem of assigning developers to projects when there are many variables to consider (of possibly different weights), e.g.:
The developer's skills & the technology/domain of the project
The developer's travel preferences & the location of the project
The developer's interests and the nature of the project
The basic problem the RM person had to deal with on a regular basis was this: given X developers where each developers has a unique set of attributes/preferences, assign them to Y projects where each project has its own set of unique attributes/requirements.
It seems clear to me that this is a very mathematical problem; it reminds me of old optimization problems from algebra and/or calculus (I don't remember which) back in high school: you know, find the optimal dimensions for a container to hold the maximum volume given this amount of material—that sort of thing.
My question isn't about the math, but rather whether there are any software projects/libraries out there designed to address this kind of problem. Does anyone know of any?
My question isn't about the math, but rather whether there are any software projects/libraries out there designed to address this kind of problem. Does anyone know of any?
In my humble opinion, I think that this is putting the cart before the horse. You first need to figure out what problem you want to solve. Then, you can look for solutions.
For example, if you formulate the problem by assigning some kind of numerical compatibility score to every developer/project pair with the goal of maximizing the total sum of compatibility scores, then you have a maximum-weight matching problem which can be solved with the Hungarian algorithm. Conveniently, this algorithm is implemented as part of Google's or-tools library.
On the other hand, let's say that you find that computing compatibility scores to be infeasible or unreasonable. Instead, let's say that each developer ranks all the projects from best to worst (e.g.: in terms of preference) and, similarly, each project ranks each developer from best to worst (e.g.: in terms of suitability to the project). In this case, you have an instance of the Stable Marriage problem, which is solved by the Gale-Shapley algorithm. I don't have a pointer to an established library for G-S, but it's simple enough that it seems that lots of people just code their own.
Yes, there are mathematical methods for solving a type of problem which this problem can be shoehorned into. It is the natural consequence of thinking of developers as "resources", like machine parts, largely interchangeable, their individuality easily reduced to simple numerical parameters. You can make up rules such as
The fitness value is equal to the subject skill parameter multiplied by the square root of the reliability index.
and never worry about them again. The same rules can be applied to different developers, different subjects, different scales of projects (with a SLOC scaling factor of, say, 1.5). No insight or real leadership is needed, the equations make everything precise and "assured". The best thing about this approach is that when the resources fail to perform the way your equations say they should, you can just reduce their performance scores to make them fit. And if someone has already written the tool, then you don't even have to worry about the math.
(It is interesting to note that Resource Management people always seem to impose such metrics on others in an organization -- thereby making their own jobs easier-- and never on themselves...)

What model best suits optimizing for a real-time strategy game?

An article has been making the rounds lately discussing the use of genetic algorithms to optimize "build orders" in StarCraft II.
http://lbrandy.com/blog/2010/11/using-genetic-algorithms-to-find-starcraft-2-build-orders/
The initial state of a StarCraft match is pre-determined and constant. And like chess, decisions made in this early stage of the match have long-standing consequences to a player's ability to perform in the mid and late game. So the various opening possibilities or "build orders" are under heavy study and scrutiny. Until the circulation of the above article, computer-assisted build order creation probably wasn't as popularity as it has been recently.
My question is... Is a genetic algorithm really the best way to model optimizing build orders?
A build order is a sequence of actions. Some actions have prerequisites like, "You need building B before you can create building C, but you can have building A at any time." So a chromosome may look like AABAC.
I'm wondering if a genetic algorithm really is the best way to tackle this problem. Although I'm not too familiar with the field, I'm having a difficult time shoe-horning the concept of genes into a data structure that is a sequence of actions. These aren't independent choices that can be mixed and matched like a head and a foot. So what value is there to things like reproduction and crossing?
I'm thinking whatever chess AIs use would be more appropriate since the array of choices at any given time could be viewed as tree-like in a way.
Although I'm not too familiar with the field, I'm having a difficult time shoe-horning the concept of genes into a data structure that is a sequence of actions. These aren't independent choices that can be mixed and matched like a head and a foot. So what value is there to things like reproduction and crossing?
Hmm, that's a very good question. Perhaps the first few moves in Starcraft can indeed be performed in pretty much any order, since contact with the enemy is not as immediate as it can be in Chess, and therefore it is not as important to remember the order of the first few moves as it is to know which of the many moves are included in those first few. But the link seems to imply otherwise, which means the 'genes' are indeed not all that amenable to being swapped around, unless there's something cunning in the encoding that I'm missing.
On the whole, and looking at the link you supplied, I'd say that genetic algorithms are a poor choice for this situation, which could be accurately mathematically modelled in some parts and the search tree expanded out in others. They may well be better than an exhaustive search of the possibility space, but may not be - especially given that there are multiple populations and poorer ones are just wasting processing time.
However, what I mean by "a poor choice" is that it is inefficient relative to a more appropriate approach; that's not to say that it couldn't still produce 98% optimal results in under a second or whatever. In situations such as this where the brute force of the computer is useful, it is usually more important that you have modelled the search space correctly than to have used the most effective algorithm.
As TaslemGuy pointed out, Genetic Algorithms aren't guaranteed to be optimal, even though they usually give good results.
To get optimal results you would have to search through every possible combination of actions until you find the optimal path through the tree-like representation. However, doing this for StarCraft is difficult, since there are so many different paths to reach a goal. In chess you move a pawn from e2 to e4 and then the opponent moves. In StarCraft you can move a unit at instant x or x+1 or x+10 or ...
A chess engine can look at many different aspects of the board (e.g. how many pieces does it have and how many does the opponent have), to guide it's search. It can ignore most of the actions available if it knows that they are strictly worse than others.
For a build-order creator only time really matters. Is it better to build another drone to get minerals faster, or is it faster to start that spawning pool right away? Not as straightforward as with chess.
These kinds of decisions happen pretty early on, so you will have to search each alternative to conclusion before you can decide on the better one, which will take a long time.
If I were to write a build-order optimizer myself, I would probably try to formulate a heuristic that estimates how good (close the to the goal state) the current state is, just as chess engines do:
Score = a*(Buildings_and_units_done/Buildings_and_units_required) - b*Time_elapsed - c*Minerals - d*Gas + e*Drone_count - f*Supply_left
This tries to keep the score tied to the completion percentage as well as StarCraft common knowledge (keep your ressources low, build drones, don't build more supply than you need). The variables a to f would need tweaking, of course.
After you've got a heuristic that can somewhat estimate the worth of a situation, I would use Best-first search or maybe IDDFS to search through the tree of possibilities.
Edit:
I recently found a paper that actually describes build order optimization in StarCraft, in real time even. The authors use depth-first search with branch and bound and heuristics that estimate the minimum amount of effort required to reach the goal based on the tech tree (e.g. zerglings need a spawning pool) and the time needed to gather the required minerals.
Genetic Algorithm can be, or can sometimes not be, the optimal or non-optimal solution. Based on the complexity of the Genetic Algorithm, how much mutation there is, the forms of combinations, and how the chromosomes of the genetic algorithm is interpreted.
So, depending on how your AI is implemented, Genetic Algorithms can be the best.
You are looking at a SINGLE way to implement genetic algorithms, while forgetting about genetic programming, the use of math, higher-order functions, etc. Genetic algorithms can be EXTREMELY sophisticated, and by using clever combining systems for crossbreeding, extremely intelligent.
For instance, neural networks are optimized by genetic algorithms quite often.
Look up "Genetic Programming." It's similar, but uses tree-structures instead of lines of characters, which allows for more complex interactions that breed better. For more complex stuff, they typically work out better.
There's been some research done using hierarchical reinforcement learning to build a layered ordering of actions that efficiently maximizes a reward. I haven't found much code implementing the idea, but there are a few papers describing MAXQ-based algorithms that have been used to explicitly tackle real-time strategy game domains, such as this and this.
This Genetic algorithm only optimizes the strategy for one very specific part of the game: The order of the first few build actions of the game. And it has a very specific goal as well: To have as many roaches as quickly as possible.
The only aspects influencing this system seem to be (I'm no starcraft player):
build time of the various units and
buildings
allowed units and buildings given the available units and buildings
Larva regeneration rate.
This is a relatively limited, relatively well defined problem with a large search space. As such it is very well suited for genetic algorithms (and quite a few other optimization algorithm at that). A full gene is a specific set of build orders that ends in the 7th roach. From what I understand you can just "play" this specific gene to see how fast it finishes, so you have a very clear fitness test.
You also have a few nice constraints on the build order, so you can combine different genes slightly smarter than just randomly.
A genetic algorithm used in this way is a very good tool to find a more optimal build order for the first stage of a game of starcraft. Due to its random nature it is also good at finding a surprising strategy, which might have been an additional goal of the author.
To use a genetic algorithm as the algorithm in an RTS game you'd have to find a way to encode reactions to situations rather than just plain old build orders. This also involves correctly identifying situations which can be a difficult task in itself. Then you'd have to let these genes play thousands of games of starcraft, against each other and (possibly) against humans, selecting and combining winners (or longer-lasting losers). This is also a good application of genetic algorithms, but it involves solving quite a few very hard problems before you even get to the genetic algorithm part.