Reward distribution in reinforcement learning

Problem 1:
We want to go from s to e. In each cell we can move right (R) or down (D). The environment is fully known. The grid has 4*5 = 20 cells. The challenge is that we do not know the reward of each individual cell; we only receive an overall reward once a path has been completed.
Example: one possible solution is RRDDRDR, with an overall reward of 16.
s 3 5 1 5
1 2 4 5 1
7 3 1 2 8
9 2 1 1 e
The target is to find a sequence of actions from Start to End which maximizes the obtained overall reward. How can we distribute the overall reward among the actions?
Problem 2:
This problem is the same as Problem 1, but the rewards of the environment are dynamic: the path taken to reach a cell changes the rewards of the cells that lie ahead of it.
Example: the two move sequences RRD and DRR both reach the same cell, but because they follow different paths, the cells ahead end up with different rewards.
s 3 5 1 5
1 2 4 9 -1
7 3 2 -5 18
9 2 9 7 e
(RRD path; selecting this path changes the rewards of the cells ahead, as shown above)
s 3 5 1 5
1 2 4 3 1
7 3 30 7 -8
9 2 40 11 e
(DRR path; selecting this path changes the rewards of the cells ahead, as shown above)
The target is to find a sequence of actions from Start to End which maximizes the obtained overall reward. How can we distribute the overall reward among the actions, once a path from Start to End has been completed and the overall reward is known?

Can you say more about the research you are doing? (The problem sounds a lot like the sort of thing someone might assign just to get you thinking about temporal credit assignment.)
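One standard, if blunt, way to handle that credit assignment is Monte Carlo averaging: credit every (cell, action) pair visited on a path with the path's total return, then average those credits over many sampled paths. Below is a minimal sketch of that idea for the static grid of Problem 1; it is only an illustration (the rollout/credit names are mine, not part of the problem), and it assumes the agent observes nothing but the total reward of a finished path.

# Monte Carlo credit assignment sketch for the static 4x5 grid of Problem 1.
# The per-cell rewards below belong to the hidden environment; the "agent"
# only ever sees the total reward returned by rollout().
import random
from collections import defaultdict

GRID = [[0, 3, 5, 1, 5],
        [1, 2, 4, 5, 1],
        [7, 3, 1, 2, 8],
        [9, 2, 1, 1, 0]]   # s at (0, 0), e at (3, 4); s and e carry no reward
ROWS, COLS = 4, 5

def rollout():
    """Sample a random R/D path from s to e; return (visited pairs, total reward)."""
    r = c = 0
    visited, total = [], 0
    while (r, c) != (ROWS - 1, COLS - 1):
        moves = []
        if c < COLS - 1:
            moves.append('R')
        if r < ROWS - 1:
            moves.append('D')
        action = random.choice(moves)
        visited.append(((r, c), action))
        r, c = (r, c + 1) if action == 'R' else (r + 1, c)
        total += GRID[r][c]          # hidden from the agent; only `total` is revealed
    return visited, total

# Credit every (cell, action) pair on a path with that path's total reward,
# then average across episodes.
sums, counts = defaultdict(float), defaultdict(int)
for _ in range(20000):
    visited, total = rollout()
    for pair in visited:
        sums[pair] += total
        counts[pair] += 1

credit = {pair: sums[pair] / counts[pair] for pair in sums}
# credit[((0, 0), 'R')] now estimates how valuable "move right from s" is.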

Is it possible to calculate a feature matrix only for test data?

I have more than 100,000 rows of training data with timestamps and would like to calculate a feature matrix for new test data, of which there are only 10 rows. Some of the features in the test data will end up aggregating some of the training data. I need the implementation to be fast since this is one step in a real-time inference pipeline.
I can think of two ways this can be implemented:
Concatenating the train and test entity sets, running DFS, and then keeping only the last 10 rows and throwing away the rest. This is very time-consuming. Is there a way to calculate features for a subset of an entity set while still using data from the entire entity set?
Using the steps outlined in the "Calculating Feature Matrix for New Data" section of the Featuretools Deployment page. However, as demonstrated below, this doesn't seem to work.
Create all/train/test entity sets:
import featuretools as ft

data = ft.demo.load_mock_customer(n_customers=3, n_sessions=15)
df_sessions = data['sessions']

# Create all/train/test entity sets.
all_es = ft.EntitySet(id='sessions')
train_es = ft.EntitySet(id='sessions')
test_es = ft.EntitySet(id='sessions')

all_es = all_es.entity_from_dataframe(
    entity_id='sessions',
    dataframe=df_sessions,            # all sessions
    index='session_id',
    time_index='session_start',
)
train_es = train_es.entity_from_dataframe(
    entity_id='sessions',
    dataframe=df_sessions.iloc[:10],  # first 10 sessions
    index='session_id',
    time_index='session_start',
)
test_es = test_es.entity_from_dataframe(
    entity_id='sessions',
    dataframe=df_sessions.iloc[10:],  # last 5 sessions
    index='session_id',
    time_index='session_start',
)

# Normalise customer entities so we can group by customers.
all_es = all_es.normalize_entity(base_entity_id='sessions',
                                 new_entity_id='customers',
                                 index='customer_id')
train_es = train_es.normalize_entity(base_entity_id='sessions',
                                     new_entity_id='customers',
                                     index='customer_id')
test_es = test_es.normalize_entity(base_entity_id='sessions',
                                   new_entity_id='customers',
                                   index='customer_id')
Set cutoff_time since we are dealing with data with timestamps:
cutoff_time = (df_sessions
               .filter(['session_id', 'session_start'])
               .rename(columns={'session_id': 'instance_id',
                                'session_start': 'time'}))
Calculate feature matrix for all data:
feature_matrix, features_defs = ft.dfs(entityset=all_es,
                                       cutoff_time=cutoff_time,
                                       target_entity='sessions')
display(feature_matrix.filter(['customer_id', 'customers.COUNT(sessions)']))
            customer_id  customers.COUNT(sessions)
session_id
1                     3                          1
2                     3                          2
3                     1                          1
4                     2                          1
5                     2                          2
6                     2                          3
7                     2                          4
8                     1                          2
9                     2                          5
10                    1                          3
11                    1                          4
12                    2                          6
13                    3                          3
14                    1                          5
15                    3                          4
Calculate feature matrix for train data:
feature_matrix, features_defs = ft.dfs(entityset=train_es,
                                       cutoff_time=cutoff_time.iloc[:10],
                                       target_entity='sessions')
display(feature_matrix.filter(['customer_id', 'customers.COUNT(sessions)']))
            customer_id  customers.COUNT(sessions)
session_id
1                     3                          1
2                     3                          2
3                     1                          1
4                     2                          1
5                     2                          2
6                     2                          3
7                     2                          4
8                     1                          2
9                     2                          5
10                    1                          3
Calculate feature matrix for test data (using method shown in "Feature Matrix for New Data" on the Featuretools Deployment page):
feature_matrix = ft.calculate_feature_matrix(features=features_defs,
                                             entityset=test_es,
                                             cutoff_time=cutoff_time.iloc[10:])
display(feature_matrix.filter(['customer_id', 'customers.COUNT(sessions)']))
            customer_id  customers.COUNT(sessions)
session_id
11                    1                          1
12                    2                          1
13                    3                          1
14                    1                          2
15                    3                          2
As you can see, the feature matrix generated from train_es matches the first 10 rows of the feature matrix generated from all_es. However, the feature matrix generated from test_es doesn't match the corresponding rows from the feature matrix generated from all_es.
You can control which instances you want to generate features for with the cutoff_time dataframe (or the instance_ids argument in DFS if the cutoff time is a single datetime). Featuretools will only generate features for instances whose IDs are in the cutoff time dataframe and will ignore all others:
feature_matrix, features_defs = ft.dfs(entityset=all_es,
                                       cutoff_time=cutoff_time[10:],
                                       target_entity='sessions')
display(feature_matrix.filter(['customer_id', 'customers.COUNT(sessions)']))
            customer_id  customers.COUNT(sessions)
session_id
11                    1                          4
12                    2                          6
13                    3                          3
14                    1                          5
15                    3                          4
The method in "Feature Matrix for New Data" is useful when you want to calculate the same features but on entirely new data. All the same features will be created, but data isn't shared between the entitysets. That doesn't work in this case, since the goal is to use all the data but only generate features for certain instances.
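For completeness, the instance_ids route mentioned above applies when the cutoff time is a single datetime rather than a dataframe. A rough sketch with the entity sets from the question (the specific timestamp is made up for illustration, and only sessions that start on or before it are included):

import pandas as pd

# Hypothetical single cutoff datetime (made up for illustration).
single_cutoff = pd.Timestamp('2014-01-01 08:00:00')

# Features are computed only for the listed session IDs, while the
# aggregations can still draw on every row in all_es up to the cutoff.
feature_matrix, features_defs = ft.dfs(entityset=all_es,
                                       target_entity='sessions',
                                       cutoff_time=single_cutoff,
                                       instance_ids=[11, 12, 13, 14, 15])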

Labeling rows in pandas using multiple boolean conditions without chaining

I'm trying to label data in the original dataframe based on multiple boolean conditions. This is easy enough when labeling based on one or two conditions, but as I begin requiring multiple conditions the code becomes difficult to manage. The obvious fix seems to be breaking the expression into intermediate copies, but that triggers chained-assignment (SettingWithCopy) warnings. Here is one example of the issue...
This is a simplified version of what my data looks like:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['ABC', 1, 3, 3, 4], ['std', 0, 0, 2, 4], ['std', 2, 1, 2, 4], ['std', 4, 4, 2, 4], ['std', 2, 6, 2, 6]]),
                  columns=['Note', 'Na', 'Mg', 'Si', 'S'])
df
Note Na Mg Si S
0 ABC 1 3 3 4
1 std 0 0 2 4
2 std 2 1 2 4
3 std 4 4 2 4
4 std 2 6 2 6
A standard (std) is located throughout the dataframe. I would like to create a label when the instrument fails. This occurs in the data when:
String condition met (Note = standard/std)
Na>0 & Mg>0
Doesn't fall outside of a calculated range for more than 2 elements.
For requirement 3 - Here is an example of a range:
maxMin=pd.DataFrame(np.array([['Max',3,3,3,7], ['Min',1,1,2,2]]), columns=['Note', 'Na','Mg','Si','S'])
maxMin
Note Na Mg Si S
0 Max 3 3 3 7
1 Min 1 1 2 2
Calculating out of bound standard:
elements=['Na','Mg','Si','S']
std=df[(df['Note'].str.contains('std|standard'))&(df['Na']>0)&(df['Mg']>0)]
df.loc[(std[elements].lt(maxMin.loc[1, :])|std[elements].gt(maxMin.loc[0, :]).select_dtypes(include=['bool'])).sum(axis=1)>2]
Note Na Mg Si S
3 std 4 4 2 4
Now, I would like to label this datapoint within the original dataframe. Desired result:
Note Na Mg Si S Error
0 ABC 1 3 3 4 False
1 std 0 0 2 4 False
2 std 2 1 2 4 False
3 std 4 4 2 4 True
4 std 2 6 2 6 False
I've tried things like:
df['Error'].loc[std.loc[(std[elements].lt(maxMin.loc[1, :])|std[elements].gt(maxMin.loc[0, :]).select_dtypes(include=['bool'])).sum(axis=1)>5].index.values.copy()]=True
That unfortunately raises a chained-assignment (SettingWithCopy) warning.
How would you accomplish this without triggering that warning? Most books/tutorials revolve around creating one long expression, but as I dive deeper, I feel there might be a simpler solution. Any input would be appreciated.
I figured out a solution that works for me.
The solution was to use .index.values to create an array of the indices that pass the boolean conditions. That array can then be used to edit the original dataframe.
##These two conditions can probably be combined
condition1=df[(df['Note'].str.contains('std|standard'))&(df['Na']>.01)&(df['Mg']>.01)]
##where condition1 is greater/less than the bounds of the known value.
##provides array where condition is true
OutofBounds=condition1.loc[(condition1[elements].lt(maxMin.loc[1, :])|condition1[elements].gt(maxMin.loc[0, :]).select_dtypes(include=['bool'])).sum(axis=1)>5].index.values
OutofBounds
out:array([ 3], dtype=int64)
Now I can pass the array into the original dataframe:
df.loc[OutofBounds, 'Error']=True
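For what it's worth, the same labelling can also be done without any intermediate copies by building one boolean mask and writing through a single .loc call. A rough sketch under the setup above (the mask names are mine; note that with the np.array construction the element columns come out as strings, so they are converted first, and the threshold is an assumption to adjust to your actual criterion):

# Sketch: label failing standards with one boolean mask and a single .loc write,
# which avoids chained assignment entirely.
elements = ['Na', 'Mg', 'Si', 'S']
df[elements] = df[elements].astype(float)            # np.array above made these strings
maxMin[elements] = maxMin[elements].astype(float)

is_std = (df['Note'].str.contains('std|standard')
          & (df['Na'] > 0) & (df['Mg'] > 0))
too_low = df[elements].lt(maxMin.loc[1, elements])   # below the Min row
too_high = df[elements].gt(maxMin.loc[0, elements])  # above the Max row
out_of_bounds = (too_low | too_high).sum(axis=1) >= 2  # assumed threshold: 2+ elements out of range

df['Error'] = False
df.loc[is_std & out_of_bounds, 'Error'] = True       # one write, no SettingWithCopy warning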

SQL table structure for storing a value against a list of combinations

I have a requirement from a client where I need to store a value against a list of combinations.
For example, I have the following LOBs, and against each combination of them I need to store a value.
Auto
WC
Personal
I proposed multiple solutions, but he is not satisfied with any of them.
Solution 1: create a single table and insert a value against every possible combination (stored as a string), something like:
LOB Value
Auto 1
WC 2
Personal 3
Auto,WC 4
Auto, personal 5
WC, Personal 6
Auto, WC, Personal 7
Solution 2: create lkp_lob, lob_group and lob_group_detail tables. Each combination of LOBs is represented as a group.
Lkp_lob
Lob_key Name
1 Auto
2 WC
3 Person
Lob_group (unique constraint on lob_group_key and lob_key)
Lob_group_key Lob_key
1 1
2 2
3 3
4 1
4 2
5 1
5 3
6 2
6 3
7 1
7 2
7 3
Lob_group_detail
Lob_group_key Value
1 1
2 2
3 3
4 4
5 5
6 6
7 7
Any suggestion would be highly appreciated.
First of all, I did not understand the terms you used.
But from a database perspective it is generally better to have separate tables for each module. You will face fewer difficulties when doing CRUD operations, and it will be faster.

Count instances of number in Row

I have a sheet formatted somewhat like this:
Thing      5    6    7    Person 1    Person 2    Person 3
Thing 1         1    2    7           7           6
Thing 2                   5           5
Thing 3                   7           6           6
Thing 4                   6           6           5
I am trying to find a query formula that I can place in the columns labeled 5, 6, and 7 that will count the number of people who have that amount of the Thing in that row. For example, I have filled out the Thing 1 row, showing that 1 person has 6 of Thing 1 and 2 people have 7 of Thing 1.
You can use this function: "COUNTIF".
The formula to write in the cells will look like this:
=COUNTIF(E2:G2;"=5")
For more information regarding this function, check the documentation: https://support.google.com/docs/answer/3093480?hl=en

Excel VBA function to solve impossible round-robin tournament roster with venue constraint

I am really having difficulty generating a round-robin tournament roster with the following conditions:
10 Teams (Teams 1 - 10)
5 Fields (Field A - E)
9 Rounds (Round 1 - 9)
Each team must play every other team exactly once.
Only two teams can play on a field at any one time. (i.e. all 5 fields always in use)
No team is allowed to play on any particular field more than twice. <- This is the problem!
I have been trying on and off for many years to solve this problem on paper without success. So, once and for all, I would like to write a function in Excel VBA to test every combination and prove it is impossible.
I started creating a very messy piece of code that generates an array using nested if/while loops, but I can already see it's just not going to work.
Is there anyone out there with a juicy piece of code that can solve this?
Edit: Thanks to Brian Camire's method below, I've been able to include further desirable constraints and still get a solution:
No team plays the same field twice in a row
A team should play on all the fields once before repeating
The solution is below. I should have asked years ago! Thanks again Brian - you are a genius!
Round 1 2 3 4 5 6 7 8 9
Field A 5v10 1v9 2v4 6v8 3v7 4v10 3v9 7v8 1v2
Field B 1v7 8v10 3v6 2v9 4v5 6v7 1v8 9v10 3v5
Field C 2v6 3v4 1v10 5v7 8v9 1v3 2v5 4v6 7v10
Field D 4v9 2v7 5v8 3v10 1v6 2v8 4v7 1v5 6v9
Field E 3v8 5v6 7v9 1v4 2v10 5v9 6v10 2v3 4v8
I think I've found at least one solution to the problem:
Round Field Team 1 Team 2
1 A 3 10
1 B 7 8
1 C 1 9
1 D 2 4
1 E 5 6
2 A 8 10
2 B 1 5
2 C 2 6
2 D 3 7
2 E 4 9
3 A 1 4
3 B 2 3
3 C 8 9
3 D 5 7
3 E 6 10
4 A 6 7
4 B 4 10
4 C 2 8
4 D 5 9
4 E 1 3
5 A 2 9
5 B 3 8
5 C 4 7
5 D 1 6
5 E 5 10
6 A 3 9
6 B 4 5
6 C 7 10
6 D 6 8
6 E 1 2
7 A 5 8
7 B 6 9
7 C 1 10
7 D 3 4
7 E 2 7
8 A 4 6
8 B 2 10
8 C 3 5
8 D 1 8
8 E 7 9
9 A 2 5
9 B 1 7
9 C 3 6
9 D 9 10
9 E 4 8
I found it using the OpenSolver add-in for Excel (as the problem was too large for the built-in Solver feature). The steps were something like this:
Set up a table with 2025 rows representing the possible matches -- that is, possible combinations of round, field, and pair of teams (with columns like the table above), plus one extra column that will be a binary (0 or 1) decision variable indicating if the match is to be selected.
Set up formulas to use the decision variables to calculate: a) the number of matches at each field in each round, b) the number of matches between each pair of teams, c) the number of matches played by each team in each round, and d) the number of matches played by each team at each field.
Set up a formula to use the decision variables to calculate the total number of matches.
Use OpenSolver to solve a model whose objective is to maximize the result of the formula from Step 3 by changing the decision variables from Step 1, subject to the constraints that the decision variables must be binary, the results of the formulas from Steps 2.a) through c) must equal 1, and the results of the formulas from Step 2.d) must be less than or equal to 2.
The details are as follows...
For Step 1, I set up my table so that columns A, B, C, and D represented the Round, Field, Team 1, and Team 2, respectively, and column E represented the decision variable. Row 1 contained the column headings, and rows 2 through 2026 each represented one possible match.
For Step 2.a), I set up a vertical list of rounds 1 through 9 in cells I2 through I10, a horizontal list of fields A through E in cells J1 through N1, and a series of formulas to calculate the number of matches in each field in each round in cells J2 through N10 by starting with =SUMIFS($E$2:$E$2026,$A$2:$A$2026,$I2,$B$2:$B$2026,J$1) in cell J2 and then copying and pasting.
For Step 2.b), I set up a vertical list of teams 1 through 9 in cells I13 through I21, a horizontal list of opposing teams 2 through 10 in cells J12 through R12, and a series of formulas to calculate the number of matches between each pair of teams in the "upper right triangular half" of cells J13 through R21 (including the diagonal) by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I13,$D$2:$D$2026,J$12) in cell J13 and then copying and pasting.
For Step 2.c), I set up a vertical list of teams 1 through 10 in cells I24 through I33, a horizontal list of rounds 1 through 9 in cells J23 through R23, and a series of formulas to calculate the number of matches played by each team in each round in cells J24 through R33 by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I24,$A$2:$A$2026,J$23)+SUMIFS($E$2:$E$2026,$D$2:$D$2026,$I24,$A$2:$A$2026,J$23) in cell J24 and then copying and pasting.
For Step 2.d), I set up a vertical list of teams 1 through 10 in cells I36 through I45, a horizontal list of fields A through E in cells J35 through N35, and a series of formulas to calculate the number of matches played by each team at each field in cells J36 through N45 by starting with =SUMIFS($E$2:$E$2026,$C$2:$C$2026,$I36,$B$2:$B$2026,J$35)+SUMIFS($E$2:$E$2026,$D$2:$D$2026,$I36,$B$2:$B$2026,J$35) in cell J36 and then copying and pasting.
For Step 3, I set up a formula to calculate the total number of matches in cell G2 as =SUM($E$2:$E$2026).
For Step 4, in the OpenSolver Model dialog (available from Data, OpenSolver, Model) I set the Objective Cell to $G$2, the Variable Cells to $E$2:$E$2026, and added constraints as described above and detailed below (sorry that the constraints are not listed in the order that I described them):
Note that, for the constraints described in Step 2.b), I needed to add the constraints separately for each row, since OpenSolver raised an error message if the constraints included the blank cells in the "lower left triangular half".
After setting up the model, OpenSolver highlighted the objective, variable, and constraint cells as shown below:
I then solved the problem using OpenSolver (via Data, OpenSolver, Solve). The selected matches are the ones with a 1 in column E. You might get a different solution than I did, as there might be many feasible ones.
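For anyone who would rather reproduce the model outside Excel, here is a rough sketch of the same 0/1 formulation in Python using PuLP. It is a port of the steps above under my own variable names, not the spreadsheet itself, and any off-the-shelf MIP solver should handle it:

# Sketch of the round-robin schedule as a 0/1 integer program (PuLP).
from itertools import combinations
import pulp

teams = range(1, 11)
rounds = range(1, 10)
fields = 'ABCDE'
pairs = list(combinations(teams, 2))          # 45 pairings

# Step 1: one binary variable per (round, field, pair) -- 9 * 5 * 45 = 2025.
keys = [(r, f, p) for r in rounds for f in fields for p in pairs]
prob = pulp.LpProblem('round_robin', pulp.LpMaximize)
x = pulp.LpVariable.dicts('match', keys, cat='Binary')

# Step 3: objective = total number of selected matches.
prob += pulp.lpSum(x.values())

# Step 2.a: exactly one match per field per round.
for r in rounds:
    for f in fields:
        prob += pulp.lpSum(x[(r, f, p)] for p in pairs) == 1

# Step 2.b: every pair of teams meets exactly once.
for p in pairs:
    prob += pulp.lpSum(x[(r, f, p)] for r in rounds for f in fields) == 1

# Step 2.c: every team plays exactly once per round.
for t in teams:
    for r in rounds:
        prob += pulp.lpSum(x[(r, f, p)] for f in fields
                           for p in pairs if t in p) == 1

# Step 2.d: no team plays more than twice on the same field.
for t in teams:
    for f in fields:
        prob += pulp.lpSum(x[(r, f, p)] for r in rounds
                           for p in pairs if t in p) <= 2

prob.solve()
schedule = sorted((r, f, p) for (r, f, p), v in x.items() if v.value() == 1)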
come on ... that's an easy one for manual solution ;-)
T1 T2 VE
1 2 A
1 3 A
1 4 B
1 5 B
1 6 C
1 7 C
1 8 D
1 9 D
1 10 E
2 3 A
2 4 B
2 5 B
2 6 C
2 7 C
2 8 D
2 9 D
2 10 E
3 4 C
3 5 C
3 6 D
3 7 D
3 8 E
3 9 E
3 10 B
4 5 C
4 6 D
4 7 D
4 8 E
4 9 E
4 10 A
5 6 E
5 7 E
5 8 A
5 9 A
5 10 D
6 7 E
6 8 A
6 9 A
6 10 B
7 8 B
7 9 B
7 10 A
8 9 B
8 10 C
9 10 C
As far as I have checked, no team plays more than twice on the same venue. Please double check.
Dividing it into rounds should be an easy task.
Edit: this time with only 5 venues :-)
Edit 2: now also with allocated rounds :-)
Edit 3: deleted the round allocation again because it was wrong.