TFAgents: how to take into account invalid actions

TFAgents: how to take into account invalid actions - tensorflow

I'm using TF-Agents library for reinforcement learning,
and I would like to take into account that, for a given state,
some actions are invalid.
How can this be implemented?
Should I define a "observation_and_action_constraint_splitter" function when
creating the DqnAgent?
If yes: do you know any tutorial on this?

Yes you need to define the function, pass it to the agent and also appropriately change the environment output so that the function can work with it. I am not aware on any tutorials on this, however you can look at this repo I have been working on.
Note that it is very messy and a lot of the files in there actually are not being used and the docstrings are terrible and often wrong (I forked this and didn't bother to sort everything out). However it is definetly working correctly. The parts that are relevant to your question are:
rl_env.py in the HanabiEnv.__init__ where the _observation_spec is defined as a dictionary of ArraySpecs (here). You can ignore game_obs, hand_obs and knowledge_obs which are used to run the environment verbosely, they are not fed to the agent.
rl_env.py in the HanabiEnv._reset at line 110 gives an idea of how the timestep observations are constructed and returned from the environment. legal_moves are passed through a np.logical_not since my specific environment marks legal_moves with 0 and illegal ones with -inf; whilst TF-Agents expects a 1/True for a legal move. My vector when cast to bool would therefore result in the exact opposite of what it should be for TF-agents.
These observations will then be fed to the observation_and_action_constraint_splitter in utility.py (here) where a tuple containing the observations and the action constraints is returned. Note that game_obs, hand_obs and knowledge_obs are implicitly thrown away (and not fed to the agent as previosuly mentioned.
Finally this observation_and_action_constraint_splitter is fed to the agent in utility.py in the create_agent function at line 198 for example.

Related

Run-State values within shape script EA

Enterprise Architect 13.5.
I made MDG technology extending Object metatype. I have a shape script for my stereotype working well. I need to print several predefined run-state parameters for element. Is it possible to access to run-state params within Shape ?

As Geert already commented there is no direct way to get the runstate variables from an object. You might send a feature request to Sparx. But I'm pretty sure you can't hold your breath long enough to see it in time (if at all).
So if you really need the runstate in the script the only way is to use an add-in. It's actually not too difficult to create one and Geert has a nice intro how to create it in 10 minutes. In your shape script you can print a string restult returned from an operation like
print("#addin:myAddIn,pFunc1#")
where myAddIn is the name of the registered operation and pFunc1 is a parameter you pass to it. In order to control the script flow you can use
hasproperty('addin:myAddIn,pFunc2','1')
which evaluates the returned string to match or not match the string 1.
I once got that to work with no too much hassle. But until now I never had the real need to use it somewhere in production. Know that the addin is called from the interpreted script for each shaped element on the diagram and might (dramatically) affect rendering times.

#NLConstraint with vectorized constraint JuMP/Julia

I am trying to solve a problem involving the equating of sums of exponentials.
This is how I would do it hardcoded:
#NLconstraint(m, exp(x[25])==exp(x[14])+exp(x[18]))
This works fine with the rest of the code. However, when I try to do it for an arbitrary set of equations like the above I get an error. Here's my code:
#NLconstraint(m,[k=1:length(LHSSum)],sum(exp.(LHSSum[k][i]) for i=1:length(LHSSum[k]))==sum(exp.(RHSSum[k][i]) for i=1:length(RHSSum[k])))
where LHSSum and RHSSum are arrays containing arrays of the elements that need to be exponentiated and then summed over. That is LHSSum[1]=[x[1],x[2],x[3],...,x[n]]. Where x[i] are variables of type JuMP.Variable. Note that length(LHSSum)=length(RHSSum).
The error returned is:
LoadError: exp is not defined for type Variable. Are you trying to build a nonlinear problem? Make sure you use #NLconstraint/#NLobjective.
So a simple solution would be to simply do all the exponentiating and summing outside of the #NLconstraint function, so the input would be a scalar. However, this too presents a problem since exp(x) is not defined since x is of type JuMP.variable, whereas exp expects something of type real. This is strange since I am able to calculate exponentials just fine when the function is called within an #NLconstraint(). I.e. when I code this line#NLconstraint(m,exp(x)==exp(z)+exp(y)) instead of the earlier line, no errors are thrown.
Another thing I thought to do would be a Taylor Series expansion, but this too presents a problem since it goes into #NLconstraint land for powers greater than 2, and then I get stuck with the same vectorization problem.
So I feel stuck, I feel like if JuMP would allow for the vectorized evaluation of #NLconstraint like it does for #constraint, this would not even be an issue. Another fix would be if JuMP implements it's own exp function to allow for the exponentiation of JuMP.Variable type. However, as it is I don't see a way to solve this problem in general using the JuMP framework. Do any of you have any solutions to this problem? Any clever workarounds that I am missing?

I'm confused why i isn't used in the expressions you wrote. Do you mean:
#NLconstraint(m, [k = 1:length(LHSSum)],
sum(exp(LHSSum[k][i]) for i in 1:length(LHSSum[k]))
==
sum(exp(RHSSum[k][i]) for i in 1:length(RHSSum[k])))

Does histogram_summary respect name_scope

I am getting a Duplicate tag error when I try to write out histogram summaries for a multi-layer network that I generate procedurally. I think that the problem might be related to naming. Imagine code like the following:
with tf.name_scope(some_unique_name):
...
_ = tf.histogram_summary('weights', kernel_weights)
I'd naively assumed that 'weights' would be scoped to some_unique_name but I'm suspecting that it is not. Are summary names independent of name_scope?

As Dave points out, the tag argument to tf.histogram_summary(tag, ...) is indeed independent of the current name scope. Part of the reason for this is that the tag may be a string Tensor (i.e. computed by part of your graph), whereas name scopes are a purely client-side construct (i.e. Python-only), so there's no good way to make the scoping work consistently across the two modes of use.
However, if you're using TensorFlow build from source (and should be available in the next release, 0.8.0), you can use the following recipe to scope your tags (using Graph.unique_name(..., mark_as_used=False)):
with tf.name_scope(some_unique_name):
# ...
tf.histogram_summary(
tf.get_default_graph().unique_name('weights', mark_as_used=False),
kernel_weights)
Alternatively, you can do the following in the current version:
with tf.name_scope(some_unique_name) as scope:
# ...
tf.histogram_summary(scope + 'weights', kernel_weights)

They are.
I'm with you in thinking this is a bug, but I haven't run it past the designers of the op yet. Go ahead and open an issue for it on GitHub!
(I've run into this also and found it terribly annoying -- it prevents reuse of the model without deliberately parameterizing the summary op invocations.)

conditional component declaration and a following if equation

I am trying to build a model that will have slightly different equations based on whether or not certain components exist (in my case, fluid ports).
A code like the following will not work:
parameter Boolean use_component=false;
Component component if use_component;
equation
if use_component then
component.x = 0;
end if;
How can I work around this?

If you want to use condition components, there are some restrictions you need to be aware of. Section 4.4.5 of the Modelica 3.3 specification sums it up nicely. It says "If the condition is false, the component, its modifiers, and any connect equations
involving the component, are removed". I'll show you how to use this to solve your problem in just a second, but first I want to explain why your solution doesn't work.
The issue has to do with checking the model. In your case, it is obvious that the equation component.x and the component component either both exist or neither exist. That is because you have tied them to the same Boolean variable. But what if you had don't this:
parameter Real some_number;
Component component if some_number*some_number>4.0;
equation
if some_number>=-2 and some_number<=2 then
component.x = 0;
end if;
We can see that this logically identical to your case. There is no chance for component.x to exist when component is absent. But can we prove such things in general? No.
So, when conditional components were introduced, conservative semantics were implemented which can always trivially ensure that the sets of variables and equations involved never get "out of sync".
Let us to return to what the specification says: "If the condition is false, the component, its modifiers, and any connect equations
involving the component, are removed"
For your case, the solution could potentially be quite simple. Depending on how you declare "x", you could just add a modification to component, i.e.
parameter Boolean use_component=false;
Component component(x=0) if use_component;
The elegance of this is that the modification only applies to component and if component isn't present, neither is the modification (equation). So the variable x and its associated equation are "in sync". But this doesn't work for all cases (IIRC, x has to have an input qualifier for this to work...maybe that is possible in your case?).
There are two remaining alternatives. First, put the equation component.x inside component. The second is to introduce a connector on component that, if connected, will generate the equation you want. As with the modification case (this is not a coincidence), you could associate x with an input connector of some kind and then do this:
parameter Boolean use_component;
Component component if use_component;
Constant zero(k=0);
equation
connect(k.y, component.x);
Now, I could imagine that after considering all three cases (modification, internalize equation and use connect), you come to the conclusion that none of them will work. If this is the case, then I would humbly suggest that you have an issue with how you have designed the component. The reason these restrictions arise is related to the necessity to check components by themselves for correctness. This requires that the component be complete ("balanced" in the terminology of the specification).
If you cannot solve the problem with approaches I mentioned above, then I suspect you really have a balancing issue and that you probably need to redefine the boundaries of your component somehow. If this is the case, I would suggest you open another question here with details of what you are trying to do.

I think that the reason why this will not work is that the parser will look for the declaration of the variable "component.x" that, if the component is not active, does not exist. It does not work even if you insert the "Evaluate=true" in the annotation.
The cleanest solution in my opinion is to work at equation level and enable different sets of equations in the same block. You can create a wrapper model with the correct connectors and paramenters, and then if it is a causal model for example you can use replaceable classes in order to parameterize the models as functions, or else, in case of acausal models, put the equations inside if statements.
Another possible workaround is to place two different models inside one block, so you can use their variables into the equation section, and then build up conditional connections that will enable the usage of the block with the choosen behaviour. In other words you can build up a "wrap model" with two blocks inside, and then place the connection equations to the connectors of the wrap model inside if statements. Remember to build up the model so that there will be a consistent system of quations even for the blocks that are not used.
But this is not the best solution, because if the blocks are big you will have to wait longer time for compilation since everything will be compiled.
I hope this will help,
Marco

You can also make a dummy component that is not visible in the graphical layer:
connector DummyHeatPort
"Dummy heatport to facilitate optional heatport. Use this with a conditional heatport by connecting it to the heatport. Then use the -DummyHeatPort.Q_flow in the thermal energy balance."
Modelica.SIunits.Temperature T "Port temperature";
flow Modelica.SIunits.HeatFlowRate Q_flow
"Heat flow rate (positive if flowing from outside into the component)";
end DummyHeatPort;
Then when this gets used in a two port model
Modelica.Thermal.HeatTransfer.Interfaces.HeatPort_a heatport if use_heat_port;
DummyHeatPort dummy_heatport;
...
equation
flowport_a.H_flow + flowport_b.H_flow - dummy_heatport.Q_flow = storage
"thermal energy balance";
connect(dummy_heatport, heatport);
This way the heatport gets used if present but does not cause an error otherwise.

Linux Kernel Process Management

First, i admit all the things i will ask are about our homework but i assure you i am not asking without struggling at least two hours.
Description: We are supposed to add a field called max_cpu_percent to task_struct data type and manipulate process scheduling algorithm so that processes can not use an higher percentage of the cpu.
for example if i set max_cpu_percent field as 20 for the process firefox, firefox will not be able to use more than 20% of the cpu.
We wrote a system call to set max_cpu_percent field. Now we need to see if the system call works or not but we could not get the value of the max_cpu_percent field from a user-spaced program.
Can we do this? and how?
We tried proc/pid/ etc can we get the value using this util?
By the way, We may add additional questions here if we could not get rid of something else
Thanks All
Solution:
The reason was we did not modify the code block writing the output to the proc queries.
There are some methods in array.c file (fs/proc/array.c) we modified the function so that also print the newly added fields value. kernel is now compiling we'll see the result after about an hour =)
It Worked...

(If you simply extended getrlimit/setrlimit, then you'd be done by now…)
There's already a mechanism where similar parts of task_struct are exposed: /proc/$PID/stat (and /proc/$PID/$TID/stat). Look for functions proc_tgid_stat and proc_tid_stat. You can add new fields to the ends of these files.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas