How to write a custom policy in tf_agents - tensorflow

I want to use the contextual bandit agents (the Linear Thompson Sampling agent) in TF-Agents.
I am using a custom environment and my rewards are delayed by 3 days. Hence, for training, the observations are generated from saved historical tables (predictions generated 3 days ago) together with their corresponding rewards (also in the table).
Given this, how do I make the policy output, for a given observation, the action recorded in the historical tables during training only? During evaluation I want the policy to behave the usual way, generating actions from the policy it has learned.
It looks like I need to write a custom policy that behaves one way during training and behaves like its usual self (the Linear Thompson Sampling policy) during evaluation. Unfortunately, I couldn't find any examples or documentation for this use case. Can someone please explain how to code this? An example would be very useful.
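One possible shape for such a policy, as a minimal, untested sketch rather than official TF-Agents usage guidance: a wrapper that replays logged actions while training and delegates to the learned policy otherwise. The lookup_historical_action helper is hypothetical and stands in for whatever maps an observation to the action stored in your historical table; depending on your TF-Agents version the base class is tf_policy.Base or tf_policy.TFPolicy.

    from tf_agents.policies import tf_policy
    from tf_agents.trajectories import policy_step

    class HistoricalReplayPolicy(tf_policy.TFPolicy):
        """Replays logged actions in training mode, else defers to the wrapped policy."""

        def __init__(self, wrapped_policy, lookup_historical_action, training=True):
            super().__init__(
                time_step_spec=wrapped_policy.time_step_spec,
                action_spec=wrapped_policy.action_spec)
            self._wrapped_policy = wrapped_policy
            # Hypothetical helper: observation -> action recorded 3 days ago.
            self._lookup = lookup_historical_action
            self._training = training

        def _action(self, time_step, policy_state, seed):
            if self._training:
                action = self._lookup(time_step.observation)
                return policy_step.PolicyStep(action, policy_state)
            return self._wrapped_policy.action(time_step, policy_state, seed)

During training you would build the agent's trajectories from this wrapper, so each logged action is paired with its logged reward; at evaluation time you would construct it with training=False, or simply use the agent's own policy directly.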

Related

Boxcox transformation with tree-based models (XGBoost to be specific)

I have a question regarding the Box-Cox transformation (or log transformation). I am working on a dataset in which I have lots of skewed features. When I take the Box-Cox transformation I get quite a nice distribution, but the correlation with the target decreases. If I were working with linear models, I would just use that correlation to decide whether to transform the feature or not. But as I mentioned, I am working with tree-based models, so should I transform the feature to get a more dispersed distribution, or leave the feature as it is to avoid a decrease in correlation?
I added a screenshot of the distribution and its relationship with the target variable, for both the transformed and untransformed versions (the left 2 plots show the original feature and target).
PS: Guessing from the plots, it seems to me that if I transform the feature it will be easier for the tree to find a split on this particular feature.
Thanks a lot,
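For reference (not an answer to the tree question itself), a minimal sketch of how the transformation and the change in skewness can be checked with scipy; the file name and the skewed_feature column are placeholders:

    import pandas as pd
    from scipy import stats

    df = pd.read_csv("data.csv")      # placeholder path
    x = df["skewed_feature"]          # placeholder column

    # Box-Cox requires strictly positive input, so shift if necessary.
    shifted = x - x.min() + 1 if (x <= 0).any() else x
    transformed, lmbda = stats.boxcox(shifted)

    print("fitted lambda:", lmbda)
    print("skew before:", stats.skew(x), "after:", stats.skew(transformed))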

What is the difference between the `policy` and `collect_policy` of a tf-agent?

I am looking at tf-agents to learn about reinforcement learning. I am following this tutorial. It uses a different policy for training, called collect_policy, than for evaluation (policy).
The tutorial states there is a difference, but in my opinion it does not explain why there are two policies, as it does not describe a functional difference.
Agents contain two policies:
agent.policy — The main policy that is used for evaluation and deployment.
agent.collect_policy — A second policy that is used for data collection.
I've looked at the source code of the agent. It says
policy: An instance of tf_policy.Base representing the Agent's current policy.
collect_policy: An instance of tf_policy.Base representing the Agent's current data collection policy (used to set self.step_spec).
But I do not see self.step_spec anywhere in the source file. The next closest thing I can find is time_step_spec. But that is the first constructor argument of the TFAgent class, so it makes no sense to set it via a collect_policy.
So the only thing I could think of was to put it to the test. I used policy instead of collect_policy for training, and the agent reached the max score in the environment nonetheless.
So what is the functional difference between the two policies?
There are some reinforcement learning algorithms, such as Q-learning, that use a policy to behave in (or interact with) the environment to collect experience that is different from the policy they are trying to learn (sometimes known as the target policy). These algorithms are known as off-policy algorithms. An algorithm that is not off-policy is known as on-policy (i.e. the behaviour policy is the same as the target policy). An example of an on-policy algorithm is SARSA. That's why we have both policy and collect_policy in TF-Agents: in general, the behavioural policy can be different from the target policy (though this may not always be the case).
Why should this be the case? Because during learning and interaction with the environment, you need to explore the environment (i.e. take random actions), while, once you have learned the near-optimal policy, you may not need to explore anymore and can just take the near-optimal action. (I say near-optimal rather than optimal because you may not have learned the optimal one.)
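To make the split concrete, here is a rough sketch in the spirit of the TF-Agents DQN tutorial; agent, train_env, eval_env, and replay_buffer are assumed to exist already. collect_policy drives interaction, while policy drives evaluation.

    from tf_agents.drivers import dynamic_step_driver

    # Exploratory behaviour policy (e.g. epsilon-greedy for DQN) collects data.
    collect_driver = dynamic_step_driver.DynamicStepDriver(
        train_env,
        agent.collect_policy,
        observers=[replay_buffer.add_batch],
        num_steps=1)
    collect_driver.run()

    # The (near-)greedy target policy is used for evaluation.
    time_step = eval_env.reset()
    while not time_step.is_last():
        action_step = agent.policy.action(time_step)
        time_step = eval_env.step(action_step.action)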

C5.0 gives back only a single leaf

I'm doing a data analysis task in SPSS Modeler and I have finally arrived at the point in the stream where I'm trying to fit some models to the data.
However, when I ran the C5.0 modeling node on my data, the node generated a modeling nugget containing only a single leaf, so there are no decision rules in the model. I partitioned the data beforehand into train and test subsets (70/30). I did not use misclassification costs and used properly predefined attribute roles. On the model page I checked the Use partitioned data, Build model for each split, Group symbolics, and Use global pruning options; I also tried expert mode, but it fails in simple mode too. I have tried different options, but it gives the same output without a single split.
How can I make the model give back a more complex decision tree? I suppose this is not the expected outcome.
Any suggestions are welcome.
Please check the distribution of your target variable and share it.
If the balance differs greatly from 50%-50%, you may need to balance your inputs first.
Misclassification cost is another technique to get an output, but again it should be based on your empirical distributions.
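SPSS Modeler has a Distribution node for exactly this, but if you export the data from your stream, a quick check of the balance could look like the following sketch; the file name and the target column are placeholders:

    import pandas as pd

    df = pd.read_csv("exported_data.csv")             # placeholder export of your stream
    print(df["target"].value_counts(normalize=True))  # class proportions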

Sampling rule depending on duration

Is it possible in AWS XRay to create the sampling rule somehow that will sample all the calls for some service with duration greater than some value?
The only way right now to find the lagging sub-service is to sample 100% and then filter by service name and duration. This is pretty expensive with 10K+ segments per second.
AWS X-Ray dev here. Biased sampling on longer durations can skew your service graph statistics and cause false negatives. If you are OK with this data skew, then depending on which X-Ray SDK you are using, you might be able to achieve conditional sampling by making explicit sampling decisions on segment close. This would require you to override certain parts of the SDK's behavior.
We know this is a popular feature request and we are working on improving it. Please use our AWS X-Ray public forum https://forums.aws.amazon.com/forum.jspa?forumID=241&start=0 for the latest feature launches or to provide any additional comments.
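As a very rough illustration of what an explicit sampling decision on segment close could look like with the Python SDK (aws_xray_sdk): this is an assumption-laden sketch, not a supported API pattern, and whether the emitter honours a sampled flag flipped this late depends on the SDK version.

    import time
    from aws_xray_sdk.core import xray_recorder

    SLOW_THRESHOLD_SECONDS = 1.0  # assumed threshold for "lagging" calls

    segment = xray_recorder.begin_segment("my-service")  # placeholder name
    start = time.time()
    try:
        do_work()  # placeholder for the traced call
    finally:
        # Keep only traces whose duration exceeds the threshold; note the
        # statistics skew the X-Ray dev warns about above.
        segment.sampled = time.time() - start > SLOW_THRESHOLD_SECONDS
        xray_recorder.end_segment()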

How to make testing data manually for clustering of citation records?

I'm doing research on the author name disambiguation problem and want to run some experiments. I want to perform clustering on citation records. My dataset consists of 2000 XML records. I need testing data, but the dataset I'm using is not popular, so I have to make the testing data manually, and I don't know how. I need instructions on how to make testing data manually. Note: I want to compare the performance of a set of techniques at solving the author name disambiguation problem, so I must perform testing.
Even though it is not really clear what kind of testing you want to perform, the general answer to the issue at hand (trying to artificially create more data from the data you have) is the bootstrap. It is a technique in which you sample with replacement from your dataset as many times as you want: it repeatedly picks random elements from your data until you get a sample of the size you want. The sample you get could be larger than your original dataset but should have statistical properties similar to your original dataset. Bootstrap sampling is available in sklearn.
P.S. Keep in mind that this solution is not optimal; the best solution to this problem is to actually get more real data somehow.
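A minimal sketch with scikit-learn's resample utility, where records stands in for your parsed list of citation records:

    from sklearn.utils import resample

    # Sampling with replacement; n_samples can also exceed len(records).
    bootstrap_sample = resample(records, replace=True,
                                n_samples=len(records), random_state=42)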
Classification vs. Clustering
For author name disambiguation, I don't think you want clustering. What you want is classification.
You have features for each author/publication. Now you give the classifier two of those feature vectors, and it classifies "it is the same author" or "those are different authors".
Training / testing data
With a binary classification problem, the testing suddenly becomes simple: just use one of the measures so often used in the literature (accuracy, precision, recall, confusion matrix).
Getting the data might be a bit more complicated. You wrote that you have an XML file of 2000 records. I guess you can derive features from those records automatically and the authors have an identifier? Then you can simply generate negative examples by pairing different authors, and positive examples by checking whether the identifier is the same, as in the sketch below.
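A hedged sketch of that pair-generation step; the field names (author_id, features) are placeholders for whatever your XML records actually contain:

    import itertools
    import random

    def make_pairs(records, negatives_per_positive=1):
        """Label record pairs: 1 = same author identifier, 0 = different."""
        positives, negatives = [], []
        for a, b in itertools.combinations(records, 2):
            pair = (a["features"], b["features"])
            if a["author_id"] == b["author_id"]:
                positives.append((pair, 1))
            else:
                negatives.append((pair, 0))
        # Subsample negatives so the classes stay roughly balanced.
        random.shuffle(negatives)
        return positives + negatives[:len(positives) * negatives_per_positive]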
Otherwise you can have a look at http://dblp.uni-trier.de/. Although there are likely many publications grouped under the same author that should be separate, they distinguish authors not only by name but also give them identifiers.
Alternatively, you can train a classifier to classify each of the known authors with e.g. > 30 publications. Then remove the softmax layer and use those features to distinguish the authors.