Conditional GANs to Causal GANs? - conditional-statements

Can we use conditional GANs to show causality in our data?
I tried a conditional GAN and I want to know how I can convert it into a causal one.

Finding causal relationships is very difficult and depends on both model and data
Generally speaking, there is no quick fix that can just make any complex ML model into a causal one (this applies to GANs as much as to anything else). It all depends on what data you have and what causal relationships you hope to find or estimate.
For example, if you have data with a lot of interventions (e.g. data collected through many controlled experiments), you may be able to leverage the difference in outcomes between the experiments to estimate causal effects. If you have only an observational dataset, as is the standard for many vanilla machine learning tasks, finding causal relationships is extremely difficult.
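To make the contrast between interventional and observational data concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration (the toy data-generating process, the true effect of 2.0, the single binary confounder, the function names); it has nothing to do with GANs specifically. It only shows why a simple difference in outcomes is trustworthy when treatment is assigned by experiment and misleading when it is not.

```python
import random

def simulate(n, randomized, seed=0):
    """Return (treated_outcomes, control_outcomes) from a toy process.

    Assumed toy model: outcome = 2.0 * treatment + 3.0 * confounder + noise.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        confounder = rng.random() < 0.5
        if randomized:
            # Controlled experiment: treatment is a coin flip,
            # independent of the confounder.
            treatment = rng.random() < 0.5
        else:
            # Observational data: the confounder drives who gets treated.
            treatment = rng.random() < (0.9 if confounder else 0.1)
        outcome = 2.0 * treatment + 3.0 * confounder + rng.gauss(0, 1)
        (treated if treatment else control).append(outcome)
    return treated, control

def difference_in_means(treated, control):
    """Naive effect estimate: mean treated outcome minus mean control outcome."""
    return sum(treated) / len(treated) - sum(control) / len(control)

experimental_estimate = difference_in_means(*simulate(20000, randomized=True))
observational_estimate = difference_in_means(*simulate(20000, randomized=False))

print(f"experimental:  {experimental_estimate:.2f}")   # close to the true effect
print(f"observational: {observational_estimate:.2f}")  # inflated by the confounder
```

In this toy setup the experimental estimate lands near the true effect, while the observational one absorbs the confounder's contribution. Correcting for that in real data requires knowing (or assuming) the confounding structure, which is exactly the hard part.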

Related

Is there a standard way to break down huge optimization models to ensure the submodules are working correctly?

Apologies as this may be a general question for optimization:
For truly large-scale optimization models, it feels as if the model becomes quite complex and cumbersome before it is even testable. For small-scale optimization problems, with up to even 10-20 constraints, it's feasible to code the entire program and just stare at it and debug.
However, for large scale models with potentially 10s-100 of constraint equations it feels as if there should be a way to test subsections of the optimization model before putting the entire thing together.
Imagine you are writing an optimization model for a rocket that needs to land on the moon: the model tells the rocket how to fly itself and land safely. There might be one piece of the model that captures the gravitational effects of the earth and moon and their orbits, which influence how the rocket should position itself; another module that dictates how the thrusters should fire to maneuver the rocket into the correct positions; and perhaps a final module that dictates how to use the various fuel sources optimally.
Is there a good practice for ensuring that one small section (e.g. the gravitational module) works well independently of the whole model, then iteratively testing the rocket thruster piece, then the optimal fuel use, and so on? Once you put all the pieces together and the model doesn't solve (perhaps due to missing constraints or variables), it quickly becomes a nightmare to debug.
What are the best practices, if any, for iteratively building and testing large-scale optimization models?
I regularly work on models with millions of variables and equations ("10s-100 of constraint equations" is considered small-scale). Luckily, they all have fewer than, say, 50 blocks of similar equations (indexed equations). Obviously, just eyeballing solutions is impossible, so we add a lot of checks (also on the data, which can contain errors). For debugging, it is a very good idea to have a very small data set around. Finally, it helps to have good tools, such as modeling systems with type/domain checking, automatic differentiation, etc.
Often we cannot really check equations in isolation because we are dealing with simultaneous equations. The model only makes sense when all equations are present. So "iterative building and testing" is usually not possible for me. Sometimes we keep small stylized models around for documentation and education.
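As a rough sketch of what "checks on the data" plus "a very small data set" might look like in practice, here is a plain-Python example for a toy transportation model. The problem, the function names and the tiny data set are all hypothetical; a real project would typically express these checks inside its modeling system. No solver is involved here: the point is only that the input data and a candidate solution can be validated block by block, even when the equations themselves cannot be solved in isolation.

```python
def check_transport_data(supply, demand, cost):
    """Return a list of human-readable problems found in the input data."""
    problems = []
    for name, qty in list(supply.items()) + list(demand.items()):
        if qty < 0:
            problems.append(f"negative quantity for {name}: {qty}")
    if sum(supply.values()) < sum(demand.values()):
        problems.append("total supply is less than total demand")
    for s in supply:
        for d in demand:
            if (s, d) not in cost:
                problems.append(f"missing cost for route {s}->{d}")
    return problems

def check_solution(flow, supply, demand, tol=1e-6):
    """Check a candidate solution against each block of constraints."""
    violations = []
    for s, cap in supply.items():
        shipped = sum(q for (src, _), q in flow.items() if src == s)
        if shipped > cap + tol:
            violations.append(f"supply exceeded at {s}")
    for d, need in demand.items():
        received = sum(q for (_, dst), q in flow.items() if dst == d)
        if received < need - tol:
            violations.append(f"demand unmet at {d}")
    if any(q < -tol for q in flow.values()):
        violations.append("negative flow")
    return violations

# A very small data set kept around purely for debugging.
supply = {"plant_a": 30, "plant_b": 20}
demand = {"city_x": 25, "city_y": 20}
cost = {("plant_a", "city_x"): 4, ("plant_a", "city_y"): 6,
        ("plant_b", "city_x"): 5, ("plant_b", "city_y"): 3}
flow = {("plant_a", "city_x"): 25, ("plant_b", "city_y"): 20}

data_problems = check_transport_data(supply, demand, cost)
solution_violations = check_solution(flow, supply, demand)
print(data_problems, solution_violations)
```

Because each check names the constraint block it belongs to, a failure points at a specific part of the model rather than at the model as a whole.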

How to find good observations for reinforcement learning?

I am just starting to study RL and was wondering how one should approach observation features that are not able to represent the (hidden) state.
Is there some systematic approach, or some guidelines, on what the feature vector should look like? Discreteness, dimension, Markov properties, embedding quality...?
I would like to process machine operation data streams, and I have a lot of direct measurements as well as many high-dimensional feature vectors (also streamed).
Thank you very much for your input.

Encoding invariance for deep neural network

I have a data set of 2D matrices (similar to grayscale images) and use a CNN as the classifier.
I would like to know whether there is any study or experience on how accuracy is affected if we change the encoding from the traditional one.
I suppose it is affected; the question is rather which transformations of the encoding leave the accuracy invariant and which ones degrade it.
To clarify, this mainly concerns the quantization of the raw data into input data.
EDIT:
Quantizing the raw data into input data is already a pre-processing step that adds or removes some features (even if minor). The impact of this quantization on accuracy in actual DNN computation does not seem very clear.
Maybe some research is available.
I'm not aware of any research specifically dealing with quantization of input data, but you may want to check out some related work on quantization of CNN parameters: http://arxiv.org/pdf/1512.06473v2.pdf. Depending on what your end goal is, the "Q-CNN" approach may be useful for you.
My own experience with using various quantizations of the input data for CNNs has been that there's a heavy dependency between the degree of quantization and the model itself. For example, I've played around with using various interpolation methods to reduce image sizes and reducing the color palette size, and in the end, I discovered that each variant required a different tuning of hyper-parameters to achieve optimal results. Generally, I found that minor quantization of data had a negligible impact, but there was a knee in the curve where throwing away additional information dramatically impacted the achievable accuracy. Unfortunately, I'm not aware of any way to determine what degree of quantization will be optimal without experimentation, and even deciding what's optimal involves a trade-off between efficiency and accuracy which doesn't necessarily have a one-size-fits-all answer.
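For anyone who wants to run this kind of experiment, here is a minimal sketch of the input-side step only: reducing an 8-bit grayscale image to fewer gray levels and measuring how much information is thrown away. The function names and the toy gradient image are assumptions for illustration. The reconstruction error printed here is only a proxy; the effect on classifier accuracy still has to be measured by retraining (and retuning) the CNN at each level.

```python
def quantize(image, levels):
    """Map 8-bit pixel values (0-255) onto `levels` evenly spaced gray values."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 255 / (levels - 1)
    return [[round(round(p / step) * step) for p in row] for row in image]

def mean_squared_error(a, b):
    """Average squared pixel difference between two equally sized images."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Toy 16x16 image containing every gray value 0..255 exactly once.
image = [[16 * r + c for c in range(16)] for r in range(16)]

errors = {
    levels: mean_squared_error(image, quantize(image, levels))
    for levels in (256, 64, 16, 4, 2)
}
for levels, err in errors.items():
    print(f"{levels:3d} levels: MSE = {err:.1f}")
```

Sweeping the number of levels like this, and training one model per setting, is one way to locate the point where accuracy starts to drop sharply for a particular model and data set.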
On a theoretical note, keep in mind that CNNs need to be able to find useful, spatially-local features, so it's probably reasonable to assume that any encoding that disrupts the basic "structure" of the input would have a significantly detrimental effect on the accuracy achievable.
In usual practice -- a discrete classification task in a classic implementation -- it will have no effect. However, the critical point is in the initial computations for back-propagation. The classic definition depends only on strict equality of the predicted and "ground truth" classes: a simple right/wrong evaluation. Changing the class coding has no effect on whether or not a prediction is equal to the training class.
However, this function can be altered. If you change the code to have something other than a right/wrong scoring, something that depends on the encoding choice, then encoding changes can most definitely have an effect. For instance, if you're rating movies on a 1-5 scale, you likely want 1 vs 5 to contribute a higher loss than 4 vs 5.
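A small sketch of that distinction, using the movie-rating example. The two loss functions and the arbitrary relabeling are illustrative assumptions, not anyone's production loss: a right/wrong score is unchanged by any one-to-one recoding of the classes, while a distance-based score depends on the codes chosen.

```python
def zero_one_loss(predicted, actual):
    """Right/wrong scoring: depends only on equality of the two labels."""
    return 0 if predicted == actual else 1

def absolute_loss(predicted, actual):
    """Distance-based scoring: depends on the numeric codes themselves."""
    return abs(predicted - actual)

# An arbitrary one-to-one recoding of the ratings 1..5 (hypothetical).
recode = {1: 40, 2: 10, 3: 50, 4: 20, 5: 30}

for predicted, actual in [(1, 5), (4, 5), (5, 5)]:
    print(
        (predicted, actual),
        zero_one_loss(predicted, actual),
        zero_one_loss(recode[predicted], recode[actual]),  # same as above
        absolute_loss(predicted, actual),
        absolute_loss(recode[predicted], recode[actual]),  # changes with the coding
    )
```

With the original 1-5 codes, predicting 1 for a true 5 costs more than predicting 4; after the arbitrary recoding both mistakes cost the same, so the ordering information carried by the encoding is lost.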
Does this reasonably deal with your concerns?
I see now. My answer above is useful ... but not for what you're asking. I had my eye on the classification encoding; you're wondering about the input.
Please note that asking for off-site resources is a classic off-topic question category. I am unaware of any such research -- for what little that is worth.
Obviously, there should be some effect, as you're altering the input data. The effect would be dependent on the particular quantization transformation, as well as the individual application.
I do have some limited-scope observations from general big-data analytics.
In our typical environment, where the data were scattered with some inherent organization within their natural space (F dimensions, where F is the number of features), we often use two simple quantization steps: (1) Scale all feature values to a convenient integer range, such as 0-100; (2) Identify natural micro-clusters, and represent all clustered values (typically no more than 1% of the input) by the cluster's centroid.
This speeds up analytic processing somewhat. Given the fine-grained clustering, it has little effect on the classification output. In fact, it sometimes improves the accuracy minutely, as the clustering provides wider gaps among the data points.
Take with a grain of salt, as this is not the main thrust of our efforts.
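A minimal sketch of those two steps, under loud assumptions: the greedy radius-based grouping below is a simple stand-in chosen for brevity, not the clustering method actually used in that environment, and the `radius` value and function names are made up for illustration.

```python
import math

def scale_to_int_range(points, high=100):
    """Step 1: rescale each feature independently to integers in 0..high."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]  # guard constant features
    return [
        tuple(round(high * (v - lo) / span) for v, lo, span in zip(p, lows, spans))
        for p in points
    ]

def replace_micro_clusters(points, radius):
    """Step 2 (greedy stand-in): group points lying within `radius` of a
    cluster's first member, then replace grouped points by their centroid."""
    clusters = []  # each entry is a list of point indices; the first is the seed
    for i, p in enumerate(points):
        for members in clusters:
            if math.dist(p, points[members[0]]) <= radius:
                members.append(i)
                break
        else:
            clusters.append([i])
    result = list(points)
    n_features = len(points[0])
    for members in clusters:
        if len(members) > 1:
            centroid = tuple(
                sum(points[i][k] for i in members) / len(members)
                for k in range(n_features)
            )
            for i in members:
                result[i] = centroid
    return result

raw = [(0.0, 10.0), (0.1, 10.2), (5.0, 20.0), (10.0, 30.0)]
scaled = scale_to_int_range(raw)
clustered = replace_micro_clusters(scaled, radius=2)
print(scaled)
print(clustered)
```

Isolated points pass through unchanged; only points that sit very close together are merged, which matches the description of micro-clusters covering a small share of the input.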

Definitions of Phenotype and Genotype

Can someone help me understand the definitions of phenotype and genotype in relation to evolutionary algorithms?
Am I right in thinking that the genotype is a representation of the solution, and the phenotype is the solution itself?
Thanks
Summary: For simple systems, yes, you are completely right. As you get into more complex systems, things get messier.
That is probably all most people reading this question need to know. However, for those who care, there are some weird subtleties:
People who study evolutionary computation use the words "genotype" and "phenotype" frustratingly inconsistently. The only rule that holds true across all systems is that the genotype is a lower-level (i.e. less abstracted) encoding than the phenotype. A consequence of this rule is that there can generally be multiple genotypes that map to the same phenotype, but not the other way around. In some systems, there are really only the two levels of abstraction that you mention: the representation of a solution and the solution itself. In these cases, you are entirely correct that the former is the genotype and the latter is the phenotype.
This holds true for:
- Simple genetic algorithms where the solution is encoded as a bitstring.
- Simple evolution strategies problems, where a real-valued vector is evolved and the numbers are plugged directly into a function which is being optimized.
- A variety of other systems where there is a direct mapping between solution encodings and solutions.
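For the bitstring case, the two levels can be sketched in a few lines. The decoding range, the objective function and the names here are hypothetical choices for illustration: the genotype is the list of bits that mutation and crossover operate on, and the phenotype is the real number that actually gets evaluated.

```python
def decode(genotype, low=-5.0, high=5.0):
    """Map a bitstring genotype to a real-valued phenotype in [low, high]."""
    as_int = int("".join(str(bit) for bit in genotype), 2)
    return low + (high - low) * as_int / (2 ** len(genotype) - 1)

def objective(x):
    """Toy function being minimized; it only ever sees the phenotype."""
    return (x - 1.0) ** 2

genotype = [1, 0, 0, 1]          # what the genetic operators manipulate
phenotype = decode(genotype)     # the candidate solution itself
print(genotype, phenotype, objective(phenotype))
```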
But as we get to more complex algorithms, this starts to break down. Consider a simple genetic program, in which we are evolving a mathematical expression tree. The number that the tree evaluates to depends on the input that it receives. So, while the genotype is clear (it's the series of nodes in the tree), the phenotype can only be defined with respect to specific inputs. That isn't really a big problem - we just select a set of inputs and define phenotype based on the set of corresponding outputs. But it gets worse.
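The expression-tree case can be sketched the same way. The nested-tuple tree format and the chosen input set are assumptions for illustration: the phenotype is defined as the tuple of outputs on a fixed set of inputs, and two different genotypes can then share one phenotype.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, x):
    """Evaluate an expression tree given as nested tuples, e.g. ("+", "x", 1)."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def phenotype(tree, inputs):
    """Phenotype relative to a chosen input set: the tuple of outputs."""
    return tuple(evaluate(tree, x) for x in inputs)

tree_a = ("*", ("+", "x", 1), 2)   # (x + 1) * 2
tree_b = ("+", ("*", 2, "x"), 2)   # 2 * x + 2
inputs = (0, 1, 2, 3)

print(phenotype(tree_a, inputs))
print(phenotype(tree_b, inputs))
```

Both trees produce the same outputs on every input here, so they are distinct genotypes with an identical phenotype under this definition.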
As we continue to look at more complex algorithms, we reach cases where there are no longer just two levels of abstraction. Evolutionary algorithms are often used to evolve simple "brains" for autonomous agents. For instance, say we are evolving a neural network with NEAT. NEAT very clearly defines what the genotype is: a series of rules for constructing the neural network. And this makes sense - that is the lowest-level encoding of an individual in this system. Stanley, the creator of NEAT, goes on to define the phenotype as the neural network encoded by the genotype. Fair enough - that is indeed a more abstract representation. However, there are others who study evolved brain models that classify the neural network as the genotype and the behavior as the phenotype. That is also completely reasonable - the behavior is perhaps even a better phenotype, because it's the thing selection is actually based on.
Finally, we arrive at the systems with the least definable genotypes and phenotypes: open-ended artificial life systems. The goal of these systems is basically to create a rich world that will foster interesting evolutionary dynamics. Usually the genotype in these systems is fairly easy to define - it's the lowest level at which members of the population are defined. Perhaps it's a ring of assembly code, as in Avida, or a neural network, or some set of rules as in geb. Intuitively, the phenotype should capture something about what a member of the population does over its lifetime. But each member of the population does a lot of different things. So ultimately, in these systems, phenotypes tend to be defined differently based on what is being studied in a given experiment. While this may seem questionable at first, it is essentially how phenotypes are discussed in evolutionary biology as well. At some point, a system is complex enough that you just need to focus on the part you care about.

Dealing with a black-box predictive model in data science

I have a question about the following situation.
Suppose I have a black box which contains only the code for one specific model, such as a Support Vector Machine, with no other information in the box.
How should I test whether the model is still effective to use?
Thanks.
I would:
- First, figure out whether it works and how to train it and generate predictions.
- Then pick a couple of datasets and divide each into training and test data.
- Train and test the black-box model and compare the results with a couple of known models.
The point to stress here is to make sure you don't train your model(s) with your testing data, because performance on unseen data is the true test of how the model will generalize. If you're new to modelling, this is the most important thing.
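The steps above can be sketched as follows. Everything named here is hypothetical: `StandInBlackBox` is just a placeholder for whatever the real box exposes (a `fit`/`predict` interface is assumed), `MajorityBaseline` plays the role of a "known model", and the synthetic data is made up. The part that matters is that each model is fitted on the training rows only and scored on rows it has never seen.

```python
import random

class MajorityBaseline:
    """Known reference model: always predicts the most common training label."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label for _ in X]

class StandInBlackBox:
    """Hypothetical placeholder for the opaque model (1-D nearest class mean)."""
    def fit(self, X, y):
        self.means = {
            c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)
        }
        return self
    def predict(self, X):
        return [min(self.means, key=lambda c: abs(x - self.means[c])) for x in X]

def holdout_accuracy(model, X, y, test_fraction=0.3, seed=0):
    """Train on one part of the data and score on the untouched remainder."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    model.fit([X[i] for i in train], [y[i] for i in train])
    predictions = model.predict([X[i] for i in test])
    return sum(p == y[i] for p, i in zip(predictions, test)) / len(test)

# Synthetic two-class data set (an assumption for the example).
rng = random.Random(1)
X = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(4, 1) for _ in range(200)]
y = [0] * 200 + [1] * 200

baseline_accuracy = holdout_accuracy(MajorityBaseline(), X, y)
blackbox_accuracy = holdout_accuracy(StandInBlackBox(), X, y)
print(f"baseline: {baseline_accuracy:.2f}  black box: {blackbox_accuracy:.2f}")
```

Using the same seed for both calls gives both models the same split, so the comparison is like for like. Repeating this over several datasets shows where the black box beats the known models and where it does not.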
It is common for certain models to do well on some types of data and not on others, so that's the trick here: finding where the black box can be effective.
If your goal is to try to figure out which model is in the box, then select datasets known to favour certain models; if it does well on them, you can make an educated guess. But it is tricky to say for sure.
Not knowing the type of model is not ideal, because it can be a time-waster if you are running a bunch of different algorithms on some data: you don't want to duplicate your efforts, and it's nice to know how the model can be regularized (unless it does that for you).