What are the possible feature types in xgboost? - xgboost

On the Python API documentation page for xgboost (https://xgboost.readthedocs.io/en/latest/python/python_api.html) there is a feature_types parameter, but I have no idea what its possible values are.
The documentation is really bad.
What are the possible values for feature_types?

The documentation does seem poor, as you say. Searching through the XGBoost source code on GitHub turns up some tests that show these options:
int
float
q: quantitative
i: indicator
While it is a bit difficult to figure out what these mean, some other sites list some additional information:
i: "i means this feature is binary indicator feature"
q: "means this feature is a quantitative value, such as age, time, can be missing"
int: "means this feature is integer value (when int is hinted, the decision boundary will be integer)"
Link: another StackOverflow post that mentions the q and i types.
In XGBoost's core.py you can also find a comment on the types:
# use quantitative as default
# {'q': quantitative, 'i': indicator}
Judging from the rest of the XGBoost code, the type parsing is handed off to the underlying C++ backend, so it remains a bit of a black box unless you want to go explore it. :)
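To make the codes above a little more concrete, here is a minimal sketch of how a feature_types list might be assembled and sanity-checked before use. The helper build_feature_types, the KNOWN_FEATURE_TYPES table, and the column names are hypothetical and not part of xgboost; the set of codes is only the one quoted above and may be incomplete for your xgboost version.

```python
# Codes quoted above; the descriptions paraphrase the quoted sources.
KNOWN_FEATURE_TYPES = {
    "q": "quantitative",
    "i": "binary indicator",
    "int": "integer-valued",
    "float": "floating point",
}


def build_feature_types(columns):
    """Return (names, types) lists from an ordered {name: code} mapping.

    Hypothetical helper, not part of xgboost.
    """
    names, types = [], []
    for name, code in columns.items():
        if code not in KNOWN_FEATURE_TYPES:
            raise ValueError(f"unknown feature type {code!r} for column {name!r}")
        names.append(name)
        types.append(code)
    return names, types


# Made-up columns for illustration.
names, types = build_feature_types({"age": "q", "is_member": "i", "visits": "int"})

# The two lists could then be handed over along the lines of:
#   xgboost.DMatrix(X, feature_names=names, feature_types=types)
```

Both lists must have one entry per column, in the same order as the columns of the data.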

Related

How can I get test_value in PyMC(PyMC4)?

I am a newbie in Bayesian and probabilistic inference, so sorry for this basic question. I have been following some examples in Bayesian Methods, and they require me to use "tag.test_value". However, I am trying to use PyMC rather than PyMC3, and that statement raises an error. I tried alternatives such as init_value and initial_value, but they do not work either.
Could you kindly let me know an alternative to that statement for checking the initial value in PyMC (what was originally the test value in PyMC3)?
a = pm.Uniform("b", 0, 50)
print(a.tag.test_value)
AttributeError: 'ValidatingScratchpad' object has no attribute 'test_value'
It appears that Aesara does not compute test values by default. You need to set aesara.config.compute_test_value = "warn". Then you can call a.get_test_value(). Hope this helps!

What is the difference between `matplotlib.rc` and `matplotlib.rcParams`? And which one to use?

I have been using matplotlib.rc in my scripts to preprocess my plots. But recently I have realized that using matplotlib.rcParams is much easier before doing a quick plot interactively (e.g. via IPython). This got me thinking about what the difference between the two is.
I searched the matplotlib documentation, but found no clear answer there. Moreover, when I issue type(matplotlib.rc), the interpreter says that it is a function. On the other hand, when I issue type(matplotlib.rcParams), I am told that it is a class object. These two answers are not at all helpful, and hence I would appreciate some help differentiating the two.
Additionally, I would like to know which one to prefer over the other.
Thanks in advance.
P.S. I went through this question: What's the difference between matplotlib.rc and matplotlib.pyplot.rc? but the answers are specific to the difference between the matplotlib instance and the pyplot instance of the two types I am enquiring about and, hence, are also not that helpful.
matplotlib.rc is a function that updates matplotlib.rcParams.
matplotlib.rcParams is a dict subclass that provides a validated key-value map for Matplotlib configuration.
The docs for mpl.rc are at https://matplotlib.org/stable/api/matplotlib_configuration_api.html?highlight=rc#matplotlib.rc and the code is here.
The class definition of RcParams is here and the instance is created here.
If we look at the guts of matplotlib.rc we see:
    for g in group:
        for k, v in kwargs.items():
            name = aliases.get(k) or k
            key = '%s.%s' % (g, name)
            try:
                rcParams[key] = v
            except KeyError as err:
                raise KeyError(('Unrecognized key "%s" for group "%s" and '
                                'name "%s"') % (key, g, name)) from err
where we see that matplotlib.rc does indeed update matplotlib.rcParams (after doing some string formatting).
You should use whichever one is more convenient for you. If you know exactly which key you want to update, then interacting with the dict-like object is better; if you want to set a whole bunch of values in a group, then mpl.rc is likely better!
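The loop quoted above can be mimicked with a plain dict to show how a group plus keyword arguments turns into dotted keys. This is only a toy sketch: rc_params, aliases, and rc below are stand-ins written for illustration, not matplotlib objects, and the alias table is just a small subset.

```python
# Toy stand-ins for illustration only; these are not matplotlib objects.
rc_params = {"lines.linewidth": 1.5, "lines.linestyle": "-", "font.size": 10.0}
aliases = {"lw": "linewidth", "ls": "linestyle"}  # small subset of aliases


def rc(group, **kwargs):
    """Mimic the quoted loop: expand (group, name) pairs into dotted keys."""
    if isinstance(group, str):
        group = (group,)
    for g in group:
        for k, v in kwargs.items():
            name = aliases.get(k) or k
            key = "%s.%s" % (g, name)
            if key not in rc_params:
                raise KeyError('Unrecognized key "%s"' % key)
            rc_params[key] = v


# Setting several values of one group in a single call ...
rc("lines", lw=2, ls="--")

# ... has the same effect as assigning the dotted keys one by one.
assert rc_params["lines.linewidth"] == 2
assert rc_params["lines.linestyle"] == "--"
```

With matplotlib itself, matplotlib.rc('lines', lw=2) and matplotlib.rcParams['lines.linewidth'] = 2 should end up setting the same entry.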

Automatically detect security identifier columns using Visions

I'm interested in using the Visions library to automate the process of identifying certain types of security (stock) identifiers. The documentation mentions that it could be used in such a way for ISBN codes but I'm looking for a more concrete example of how to do it. I think the process would be pretty much identical for the fields I'm thinking of as they all have check digits (ISIN, SEDOL, CUSIP).
My general idea is that I would create custom types for the different identifier types and could use those types to
Take a dataframe where the types are unknown and identify columns matching the types (even if it's not a 100% match)
Validate the types on a dataframe where the intended type is known
Great question and use-case! Unfortunately, the documentation on making new types probably needs a little love right now as there were API breaking changes with the 0.7.0 release. Both the previous link and this post from August, 2020 should cover the conceptual idea of type creation in greater detail. If any of those examples break then mea culpa and our apologies, we switched to a dispatch based implementation to support different backends (pandas, numpy, dask, spark, etc...) for each type. You shouldn't have to worry about that for now but if you're interested you can find the default type definitions here with their backends here.
Building an ISBN Type
We need to make two basic decisions when defining a type:
What defines the type
What other types are our new type related to?
For the ISBN use-case O'Reilly provides a validation regex to match ISBN-10 and ISBN-13 codes. So,
What defines a type?
We want every element in the sequence to be a string which matches a corresponding ISBN-10 or ISBN-13 regex
What other types are our new type related to?
Since ISBNs are themselves strings we can use the default String type provided by visions.
Type Definition
    from typing import Sequence
    import pandas as pd
    from visions.relations import IdentityRelation, TypeRelation
    from visions.types.string import String
    from visions.types.type import VisionsBaseType
    # O'Reilly's ISBN-10 / ISBN-13 pattern. The book prints a literal space as "●",
    # so it is written here as an ordinary space.
    isbn_regex = "^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$"
    class ISBN(VisionsBaseType):
        @staticmethod
        def get_relations() -> Sequence[TypeRelation]:
            # ISBNs are a subset of String, so relate the new type to String.
            relations = [
                IdentityRelation(String),
            ]
            return relations
        @staticmethod
        def contains_op(series: pd.Series, state: dict) -> bool:
            # Every element must match the ISBN pattern.
            return series.str.contains(isbn_regex).all()
Looking at this closely there are three things to take note of.
The new type inherits from VisionsBaseType
We had to define a get_relations method, which is how we relate a new type to others we might want to use in a typeset. In this case, I've used an IdentityRelation to String, which means ISBNs are subsets of String. We can also use InferenceRelations when we want to support relations which change the underlying data (say, converting the string '4.2' to the float 4.2).
A contains_op method, which is our definition of the type. In this case, we are applying a regex to every element in the input and verifying that it matches the regex provided by O'Reilly.
Extensions
In theory ISBNs can be encoded in what looks like a 10 or 13 digit integer as well - to work with those you might want to create an InferenceRelation between Integer and ISBN. A simple implementation would involve coercing Integers to string and applying the above regex.
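Since the question mentions identifiers with check digits, another possible extension is to tighten the type definition with a check-digit test, because the regex only verifies the format. The following is a minimal sketch in plain Python using the standard ISBN-10 and ISBN-13 checksum rules; the helper names isbn10_is_valid and isbn13_is_valid are hypothetical and not part of visions.

```python
DIGITS = "0123456789"


def isbn10_is_valid(code: str) -> bool:
    """ISBN-10 rule: weights 10 down to 1, weighted sum divisible by 11."""
    chars = code.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" stands for 10 and is only legal as the check digit
        elif char in DIGITS:
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn13_is_valid(code: str) -> bool:
    """ISBN-13 rule: alternating weights 1 and 3, weighted sum divisible by 10."""
    chars = code.replace("-", "").replace(" ", "")
    if len(chars) != 13 or any(char not in DIGITS for char in chars):
        return False
    total = sum(int(char) * (3 if index % 2 else 1)
                for index, char in enumerate(chars))
    return total % 10 == 0
```

Inside a type definition, something along the lines of series.map(isbn13_is_valid).all() could be combined with the regex test in contains_op. The same pattern should carry over to ISIN, SEDOL, and CUSIP, each with its own checksum rule.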

TFAgents: how to take into account invalid actions

I'm using TF-Agents library for reinforcement learning,
and I would like to take into account that, for a given state,
some actions are invalid.
How can this be implemented?
Should I define an "observation_and_action_constraint_splitter" function when
creating the DqnAgent?
If yes: do you know any tutorial on this?
Yes, you need to define the function, pass it to the agent, and also change the environment output appropriately so that the function can work with it. I am not aware of any tutorials on this; however, you can look at this repo I have been working on.
Note that it is very messy: a lot of the files in there are not actually used, and the docstrings are terrible and often wrong (I forked this and didn't bother to sort everything out). However, it is definitely working correctly. The parts that are relevant to your question are:
rl_env.py in the HanabiEnv.__init__ where the _observation_spec is defined as a dictionary of ArraySpecs (here). You can ignore game_obs, hand_obs and knowledge_obs which are used to run the environment verbosely, they are not fed to the agent.
rl_env.py in the HanabiEnv._reset at line 110 gives an idea of how the timestep observations are constructed and returned from the environment. legal_moves are passed through np.logical_not since my specific environment marks legal moves with 0 and illegal ones with -inf, whilst TF-Agents expects a 1/True for a legal move. My vector, when cast to bool, would therefore be the exact opposite of what TF-Agents expects.
These observations will then be fed to the observation_and_action_constraint_splitter in utility.py (here), where a tuple containing the observations and the action constraints is returned. Note that game_obs, hand_obs and knowledge_obs are implicitly thrown away (and not fed to the agent, as previously mentioned).
Finally this observation_and_action_constraint_splitter is fed to the agent in utility.py in the create_agent function at line 198 for example.
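The flow described in the points above can be sketched in plain Python. The dictionary keys "state" and "mask" and the helper to_action_mask are hypothetical names chosen for illustration; in a real TF-Agents setup the observation would hold arrays or tensors matching the observation spec rather than lists.

```python
import math


def to_action_mask(legal_moves):
    """Turn a 0 / -inf vector (0 = legal) into a boolean mask (True = legal).

    Mirrors the logical-not step described above: 0 is falsy, -inf is truthy.
    """
    return [not move for move in legal_moves]


def observation_and_action_constraint_splitter(observation):
    """Split a dict observation into (network input, action mask).

    Any other entries in the dict are simply dropped here.
    """
    return observation["state"], observation["mask"]


# A made-up observation as the environment might return it.
observation = {
    "state": [0.1, 0.7, 0.2],
    "mask": to_action_mask([0.0, -math.inf, 0.0]),
    "game_obs": "only used for verbose runs",
}

net_input, mask = observation_and_action_constraint_splitter(observation)
```

The splitter itself is then passed to the agent through the observation_and_action_constraint_splitter argument mentioned in the question.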

wit.ai 'Only if..' not working

I am new to wit.ai and I'm confused by it. I have a few questions:
How do the Actions 'Only if..' and 'Always if...' work?
I simply have 2 entities, 'Hi' and 'Botname', and 2 stories: when I say 'Hi', wit answers 'answer1'; when I say 'Botname', wit answers 'answer2'. That's OK, but when they are combined as 'Hi Botname', I want wit to answer 'answer1', and I can't achieve it without adding a story. I tried to add in Actions -> 'Answer2' - 'Only if..' 'doesn't have' -> 'Hi', but it still answers 'Answer2' and I don't understand why :)
Second question: I sometimes don't get an adequate answer from wit and I don't know how to avoid such cases. For example, I have an entity 'constitution', and in 'understanding', when I write 'station', wit gets 'constitution', although these two words are different. What should I do? Please help with it.
To the first question, I'd suggest that rather than trying to use the keyword and free-text format of entities, you define and assign a trait entity which will not necessarily try to match the exact word, but the feeling of the sentence.
For example
Given the situation above, if you were to train an intent
called "greeting" to recognize all sentences with "Hi" in it as
greetings, then the result of "Hi Botname" will continue to be the
result of Hi. Also, if you're going to be using branching, entities
will have to be defined as trait entities in either case.
To the second (and this will help with the first), you just have to spend some time training the bot to understand. You can't rush the brush. You'll have to feed it some examples before it can understand the difference between the words and start to pick up those differences in future words.
The Wit Bot engine was released only a little while ago, so we're all learning it now, but I hope I could help you with the little knowledge I've gained.