Update a pandas dataset using pandas.eval or df.apply

I have a pandas DataFrame with a lot of rows.
One of the columns, let's say WHERE_CLAUSE, contains SQL-like conditions. So I created a new column that takes these conditions and translates them into Python/pandas statements.
E.g., for a row where column 'WHERE_CLAUSE' has the value 'ACCOUNTING_PERIOD NOT IN (999)', I translate it into a pandas statement in a new column 'EVAL_EXPR' with the value '''base_table.loc[(base_table.ACCOUNTING_PERIOD.isin([999]),"REPORT_TYPE_ID")] = "initial"'''.
This is a perfectly valid statement: if I execute it directly, it does update the dataset.
The problem is that, since the statement is stored as a string, I pass it to pandas.eval, and that fails.
So this is what I tried:
eval_str ='''base_table.loc[(base_table.ACCOUNTING_PERIOD.isin([999]),"REPORT_TYPE_ID")] = "initial"'''
print(eval_str)
pd.eval(eval_str)
and this is the error I receive:
> base_table["REPORT_TYPE_ID"].apply(lambda x: "final" if base_table.ACCOUNTING_PERIOD.isin([999]))
>
> Traceback (most recent call last):
>   File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
>     exec(code_obj, self.user_global_ns, self.user_ns)
>   File "<ipython-input-31-96405158714d>", line 9, in <module>
>     pd.eval(eval_str)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\eval.py", line 332, in eval
>     parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 764, in __init__
>     self.terms = self.parse()
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 781, in parse
>     return self._visitor.visit(self.expr)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 375, in visit
>     return visitor(node, **kwargs)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 381, in visit_Module
>     return self.visit(expr, **kwargs)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 375, in visit
>     return visitor(node, **kwargs)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 583, in visit_Assign
>     raise SyntaxError("left hand side of an assignment must be a single name")
>   File "<string>", line unknown
> SyntaxError: left hand side of an assignment must be a single name
I know that pandas.apply takes expressions, not statements, and a statement is obviously what I am trying to execute. But how do I fix this?
I don't know how to tweak this so that I can use dataframe['column'].apply and avoid this issue.
Any suggestions are much appreciated.

So, to execute a dynamically generated Python statement, I used the built-in exec function. It worked like a charm!
Code tried:
eval_str = '''base_table.loc[(base_table.ACCOUNTING_PERIOD.isin([999]),"REPORT_TYPE_ID")] = "initial"'''
exec(eval_str)
Reference:
This was already answered on Stack Overflow: "Evaluating dynamically generated statements in Python".
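A minimal sketch of two ways to run such generated code, assuming a small hypothetical base_table that only reuses the two column names from the question. pd.eval parses expressions (plus simple assignments to a single name), which is why the .loc[...] = ... statement is rejected; exec runs full statements, and passing an explicit namespace dictionary limits what the generated code can see. The second option avoids executing statements altogether: store only the condition, turn it into a boolean mask with DataFrame.eval (which understands in / not in), and do the assignment in ordinary code. Note that SQL NOT IN (999) corresponds to a negated membership test (not in [999], i.e. ~isin([999])), not to plain isin([999]).

```python
import pandas as pd


def make_table():
    # Hypothetical sample data; only the column names come from the question.
    return pd.DataFrame({
        "ACCOUNTING_PERIOD": [999, 1, 2],
        "REPORT_TYPE_ID": ["final", "final", "final"],
    })


# Option 1: exec() runs a full statement. The dict restricts the generated
# code to the names we hand it (here only base_table).
base_table = make_table()
stmt = 'base_table.loc[base_table.ACCOUNTING_PERIOD.isin([999]), "REPORT_TYPE_ID"] = "initial"'
exec(stmt, {"base_table": base_table})


# Option 2: keep only the condition as an expression string, evaluate it to a
# boolean mask, and perform the assignment in regular Python code.
def apply_condition(df, condition, column, value):
    mask = df.eval(condition)          # e.g. "ACCOUNTING_PERIOD not in [999]"
    df.loc[mask, column] = value
    return df


result = apply_condition(make_table(), "ACCOUNTING_PERIOD not in [999]",
                         "REPORT_TYPE_ID", "initial")

print(base_table["REPORT_TYPE_ID"].tolist())  # ['initial', 'final', 'final']
print(result["REPORT_TYPE_ID"].tolist())      # ['final', 'initial', 'initial']
```

Option 2 keeps the generated strings limited to conditions, so a malformed or malicious WHERE_CLAUSE cannot run arbitrary statements the way it could through exec.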

Related

Pandas: How to generate multiple columns in a loop?

The code below (calculation of a moving average over N days) works well, but I want to use other window lengths (e.g., 5, 10, 20) in place of 50. I'm not sure how to turn the code below into a for loop. Could anybody please help me?
df['ma50pfret']= df['ret']
df.loc[df.adjp >= df.ma50, 'adjp > ma50']= 1
df.loc[df.adjp < df.ma50, 'adjp > ma50']= 0
df.iloc[0, -1]= 1
df['adjp > ma50']= df['adjp > ma50'].astype(int)
df.loc[df['adjp > ma50'].shift(1)== 0, 'ma50pfret']= 1.000079 # 1.02**(1/250)
df['cum_ma50pfret']=df['ma50pfret'].cumprod()
df.head(10)
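One way to loop over window lengths is to wrap the steps in a function that builds the column names from n, sketched below. Assumptions, since the question does not show them: adjp is the adjusted price, ret is the gross daily return (1 + r), and maN, if missing, is computed as the N-day simple moving average of adjp. Unlike the original, rows whose moving average is still NaN get signal 0 instead of making astype(int) fail; the sample data is made up.

```python
import numpy as np
import pandas as pd

# Daily risk-free growth factor, same constant as in the question (~1.000079).
DAILY_RF = 1.02 ** (1 / 250)


def add_ma_strategy(df, n):
    """Add the ma{n}-based strategy columns for one window length n."""
    ma, signal, pfret = f"ma{n}", f"adjp > ma{n}", f"ma{n}pfret"
    if ma not in df.columns:
        # Assumption: ma{n} is the n-day simple moving average of adjp.
        df[ma] = df["adjp"].rolling(n).mean()
    df[pfret] = df["ret"]
    # 1 if price >= its moving average, else 0 (a NaN average counts as 0).
    df[signal] = (df["adjp"] >= df[ma]).astype(int)
    df.loc[df.index[0], signal] = 1  # replaces df.iloc[0, -1] = 1
    # When yesterday's signal was 0, earn the risk-free rate instead of ret.
    df.loc[df[signal].shift(1) == 0, pfret] = DAILY_RF
    df[f"cum_{pfret}"] = df[pfret].cumprod()
    return df


# Hypothetical data: adjp = adjusted price, ret = gross daily return (1 + r).
rng = np.random.default_rng(0)
df = pd.DataFrame({"adjp": 100 * np.cumprod(1 + rng.normal(0, 0.01, 300))})
df["ret"] = 1 + df["adjp"].pct_change().fillna(0)

for n in [5, 10, 20, 50]:
    add_ma_strategy(df, n)

print(df.filter(like="cum_").tail())
```

Using the signal column's name instead of df.iloc[0, -1] also keeps the code correct when the columns are not created in a fixed order.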

ST2ST is throwing a decimal-related error when loading lat/long

It appears that there is an odd non-displayable character in the value coming from VZW. The byte 0xA0 (a non-breaking space) prefixes the latitude value, and this throws off the string-to-decimal conversion in the StGeolocationConverter class.
This is the stack trace for the error, along with a VZW record ID that can be investigated.
10:47:25.872 (12891843489)|USER_DEBUG|[1323]|ERROR|Remote Key: a0C0H00001RnioyUAB
(strk.StSiteTrigger: execution of BeforeInsert
caused by: strk.StException: Error in packaged trigger handler[strk.StSiteExtendibleTriggerHandler]:SObjectType[strk__Site__c]:Event[before_insert]Message[Invalid decimal: 39.98636000]:
Class.strk.StGeolocationConverter.setLatLong: line 97, column 1
Class.strk.StSiteExtendibleTriggerHandler.beforeInsert: line 36, column 1
Class.strk.StTriggerFactory.execute: line 396, column 1
Class.strk.StTriggerFactory.createAndExecuteHandler: line 253, column 1
Class.strk.StTriggerFactory.createAndExecutePackagedHandler: line 221, column 1
Class.strk.StTriggerFactory.executeTrigger: line 168, column 1
Trigger.strk.StSiteTrigger: line 17, column 1
Class.strk.StTriggerFactory.executeTrigger: line 174, column 1
Trigger.strk.StSiteTrigger: line 17, column 1 / CANNOT_INSERT_UPDATE_ACTIVATE_ENTITY)
Link to the zip file containing the logs: https://drive.google.com/open?id=1ddxspysuZq_8LNylUMlfshFBTy_YlYGy
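The actual fix belongs in the package's Apex StGeolocationConverter, which is not shown here, so the following is only a Python illustration of the idea: remove invisible padding such as U+00A0 before converting the string to a decimal, and reject anything that is still not a plain number. The function name parse_coordinate is hypothetical, and stripping the zero-width space and byte-order mark in addition to U+00A0 is an assumption.

```python
import re
from decimal import Decimal

# U+00A0 (non-breaking space) is the "hex a0" from the report; the zero-width
# space and byte-order mark are extra characters stripped as an assumption.
_INVISIBLE = {ord(c): None for c in "\u00a0\u200b\ufeff"}
_PLAIN_DECIMAL = re.compile(r"[+-]?\d+(?:\.\d+)?")


def parse_coordinate(raw: str) -> Decimal:
    """Convert a latitude/longitude string to Decimal after removing padding."""
    cleaned = raw.translate(_INVISIBLE).strip()
    if not _PLAIN_DECIMAL.fullmatch(cleaned):
        raise ValueError(f"invalid coordinate: {raw!r}")
    return Decimal(cleaned)


print(parse_coordinate("\u00a039.98636000"))  # 39.98636000
```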

Running 'examples/sumo/grid.py': FatalFlowError: 'Not enough vehicles have spawned! Bad start?'

I want to simulate a traffic jam using the grid example,
so I tried increasing the number of rows and columns, or increasing num_cars_left/num_cars_right/num_cars_top/num_cars_bot.
For example:
n_rows = 5
n_columns = 5
num_cars_left = 50
num_cars_right = 50
num_cars_top = 50
num_cars_bot = 50
When I then run it from the command line, I get this error:
Loading configuration... done.
Success.
Loading configuration... done.
Traceback (most recent call last):
File "examples/sumo/grid.py", line 237, in <module>
exp.run(1, 1500)
File "/home/dnl/flow/flow/core/experiment.py", line 118, in run
state = self.env.reset()
File "/home/dnl/flow/flow/envs/loop/loop_accel.py", line 167, in reset
obs = super().reset()
File "/home/dnl/flow/flow/envs/base_env.py", line 520, in reset
raise FatalFlowError(msg=msg)
flow.utils.exceptions.FatalFlowError:
Not enough vehicles have spawned! Bad start?
Missing vehicles / initial state:
- human_994: ('human', 'bot4_0', 0, 446, 0)
- human_546: ('human', 'top0_5', 0, 466, 0)
- human_886: ('human', 'bot3_0', 0, 366, 0)
- human_689: ('human', 'bot1_0', 0, 396, 0)
.....
Then I checked flow/flow/envs/base_env.py, which contains this check:
# check to make sure all vehicles have been spawned
if len(self.initial_ids) > len(initial_ids):
    missing_vehicles = list(set(self.initial_ids) - set(initial_ids))
    msg = '\nNot enough vehicles have spawned! Bad start?\n' \
          'Missing vehicles / initial state:\n'
    for veh_id in missing_vehicles:
        msg += '- {}: {}\n'.format(veh_id, self.initial_state[veh_id])
    raise FatalFlowError(msg=msg)
So my question is: is there a limit on the number of rows, columns, and num_cars_left (right/bot/top)? If I want to simulate a traffic jam on the grid, how should I do it?
The grid example examples/sumo/grid.py doesn't use inflows by default; instead, it spawns the vehicles directly on the input edges. So if you increase the number of vehicles, you have to increase the length of the edges they spawn on. I tried your example and this setting works for me:
inner_length = 300
long_length = 500
short_length = 500
n_rows = 5
n_columns = 5
num_cars_left = 50
num_cars_right = 50
num_cars_top = 50
num_cars_bot = 50
The length of the edges the vehicles spawn on is short_length; that is the one to increase if the vehicles don't have enough room to be added.
Also, increasing the number of rows and columns doesn't give the vehicles more room, because 50 vehicles are added to each input edge: in this case you have 20 input edges (5 rows × 2 sides + 5 columns × 2 sides) with 50 vehicles each, 1000 vehicles in total, which will be quite laggy.
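A back-of-the-envelope check of these numbers, as a sketch. The vehicle count follows the answer's reasoning (one input edge on each side of every row and column, num_cars_* vehicles on each). The spacing uses SUMO's default passenger-car length (5 m) and minimum gap (2.5 m) as an assumption; flow's actual initial spacing may be larger, so treat the result only as a lower bound on short_length. The helper names are hypothetical.

```python
# Assumed SUMO default passenger vType: length 5.0 m, minGap 2.5 m.
VEH_LENGTH = 5.0
MIN_GAP = 2.5


def total_vehicles(n_rows, n_columns, left, right, top, bot):
    """Each row has a left and a right input edge; each column a top and a bottom one."""
    return n_rows * (left + right) + n_columns * (top + bot)


def min_short_length(cars_per_edge, veh_length=VEH_LENGTH, min_gap=MIN_GAP):
    """Rough lower bound (in m) on short_length so one edge can hold all its cars."""
    return cars_per_edge * (veh_length + min_gap)


print(total_vehicles(5, 5, 50, 50, 50, 50))  # 1000
print(min_short_length(50))                  # 375.0
```

Under these assumptions, 50 cars need at least 375 m, which the 500 m short_length above satisfies.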
If you want to use continuous inflows instead of one-time spawning, have a look at the use_inflows parameter in the grid_example function in examples/sumo/grid.py, and what this parameter does when it's set to True.

PuLP - COIN-CBC error: How to add constraint with double inequality and relaxation?

I want to add this set of constraints:
$$-M(1 - X_{i,j,k,n}) \le S_{i,j,k,n} - \mathrm{ToD}_{i,j,k,n} \le M(1 - X_{i,j,k,n}) \quad \forall\, i,j,k,n$$
where M is a big number, S is an integer variable that takes values between 0 and 1440, ToD is a 4-dimensional matrix whose values come from an Excel sheet, and X is a binary variable that takes the values 0 or 1.
I tried to implement it in code as follows:
for n in range(L):
    for k in range(M):
        for i in range(N):
            for j in range(N):
                if (i != START_POINT_S and i != END_POINT_T
                        and j != START_POINT_S and j != END_POINT_T):
                    prob += (-BIG_NUMBER*(1-X[i][j][k][n])) <= (S[i][j][k][n] - ToD[i][j][k][n]), ""
and another constraint as follows:
for i in range(N):
    for j in range(N):
        for k in range(M):
            for n in range(L):
                if (i != START_POINT_S and i != END_POINT_T
                        and j != START_POINT_S and j != END_POINT_T):
                    prob += S[i][j][k][n] - ToD[i][j][k][n] <= BIG_NUMBER*(1-X[i][j][k][n]), ""
In my experience, these two constraints in code are exactly equivalent to what we want. The problem is that PuLP and CBC won't accept them. They produce the following errors:
PuLP:
Traceback (most recent call last):
  File "basic_JP.py", line 163, in <module>
    prob.solve()
  File "C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packages\pulp\pulp.py", line 1643, in solve
    status = solver.actualSolve(self, **kwargs)
  File "C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packages\pulp\solvers.py", line 1303, in actualSolve
    return self.solve_CBC(lp, **kwargs)
  File "C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packages\pulp\solvers.py", line 1366, in solve_CBC
    raise PulpSolverError("Pulp: Error while executing "+self.path)
pulp.solvers.PulpSolverError: Pulp: Error while executing C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packages\pulp\solverdir\cbc\win\64\cbc.exe
and CBC:
Welcome to the CBC MILP Solver
Version: 2.9.0
Build Date: Feb 12 2015
command line - C:\Users\dimri\Desktop\Filesystem\Projects\deliverable_B4\lib\site-packages\pulp\solverdir\cbc\win\64\cbc.exe 5284-pulp.mps branch printingOptions all solution 5284-pulp.sol (default strategy 1)
At line 2 NAME MODEL
At line 3 ROWS
At line 2055 COLUMNS
Duplicate row C0000019 at line 10707 < X0001454 C0000019 -1.000000000000e+00 >
Duplicate row C0002049 at line 10708 < X0001454 C0002049 -1.000000000000e+00 >
Duplicate row C0000009 at line 10709 < X0001454 C0000009 1.000000000000e+00 >
Duplicate row C0001005 at line 10710 < X0001454 C0001005 1.000000000000e+00 >
At line 14153 RHS
At line 16204 BOUNDS
Bad image at line 17659 < UP BND X0001454 1.440000000000e+03 >
At line 18231 ENDATA
Problem MODEL has 2050 rows, 2025 columns and 5968 elements
Coin0008I MODEL read with 5 errors
There were 5 errors on input
** Current model not valid
Option for printingOptions changed from normal to all
** Current model not valid
No match for 5284-pulp.sol - ? for list of commands
Total time (CPU seconds): 0.02 (Wallclock seconds): 0.02
I don't know what the problem is; any help? I am new to this, so if the information is not enough, let me know what I should add.
Alright, I had searched for hours, but right after I posted this question I found the answer. These kinds of problems are mainly caused by the names of the variables or the constraints; that is what caused the duplicates. I am really not used to this kind of software, which is why it took me so long to find an answer. Anyway, the problem for me was in how I defined the variables:
# define X[i,j,k,n]
lower_bound_X = 0 # lower bound for variable X
upper_bound_X = 1 # upper bound for variable X
X = LpVariable.dicts(name="X",
indexs=(range(N), range(N), range(M), range(L)),
lowBound=lower_bound_X,
upBound=upper_bound_X,
cat=LpInteger)
and
# define S[i,j,k,n]
lower_bound_S = 0 # lower bound for variable S
upper_bound_S = 1440 # upper bound for variable S
S = LpVariable.dicts(name="X",
indexs=(range(N),
range(N), range(M), range(L)),
lowBound=lower_bound_S,
upBound=upper_bound_S,
cat=LpInteger)
As you can see, in the definition of S I forgot to change the variable name to "S" because I copy-pasted it. The right way to define S is:
# define S[i,j,k,n]
lower_bound_S = 0 # lower bound for variable S
upper_bound_S = 1440 # upper bound for variable S
S = LpVariable.dicts(name="S",
indexs=(range(N), range(N), range(M), range(L)),
lowBound=lower_bound_S,
upBound=upper_bound_S,
cat=LpInteger)
This is how I got my code running.
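A small, self-contained sketch of the double inequality in PuLP, assuming hypothetical sizes and a made-up ToD table in place of the Excel data (K stands in for the question's loop bound M so it is not confused with the big-M constant). Besides giving X and S distinct names, it gives every constraint an explicit unique name and checks the start/end points with a membership test; note that in Python & binds more tightly than !=, so conditions joined with & need parentheses or and. Only model construction is shown; calling prob.solve() would additionally need a working CBC binary.

```python
import itertools

import pulp

# Hypothetical sizes and data standing in for the question's N, M, L and ToD.
N, K, L = 3, 2, 2
START_POINT_S, END_POINT_T = 0, N - 1
# Assumes ToD values lie in [0, 1440] like S, so |S - ToD| <= 1440 and
# BIG_M = 1440 is large enough.
BIG_M = 1440
ToD = {key: 60 * sum(key)
       for key in itertools.product(range(N), range(N), range(K), range(L))}

prob = pulp.LpProblem("big_m_example", pulp.LpMinimize)
idx = (range(N), range(N), range(K), range(L))
# Distinct name prefixes ("X" vs "S") keep the generated column names unique.
X = pulp.LpVariable.dicts("X", idx, cat=pulp.LpBinary)
S = pulp.LpVariable.dicts("S", idx, lowBound=0, upBound=1440, cat=pulp.LpInteger)

# Placeholder objective so the model is complete.
prob += pulp.lpSum(S[i][j][k][n] for i, j, k, n in ToD)

for i, j, k, n in ToD:
    # Skip the start/end points (membership test instead of &-chained !=).
    if i in (START_POINT_S, END_POINT_T) or j in (START_POINT_S, END_POINT_T):
        continue
    diff = S[i][j][k][n] - ToD[i, j, k, n]
    slack = BIG_M * (1 - X[i][j][k][n])
    # -M(1 - X) <= S - ToD <= M(1 - X), split into two uniquely named rows.
    prob += diff >= -slack, f"lo_{i}_{j}_{k}_{n}"
    prob += diff <= slack, f"hi_{i}_{j}_{k}_{n}"

print(len(prob.constraints))  # 8
```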

Indentation of boxes in Format.fprintf

Please consider the function f:
open Format
let rec f i = match i with
  | x when x <= 0 -> ()
  | i ->
    pp_open_hovbox std_formatter 2;
    printf "This is line %d@." i;
    f (i-1);
    printf "This is line %d@." i;
    close_box ();
    ()
It recursively opens hov boxes and prints something, followed by a newline hint (@.). When I call f 3, I obtain the following output:
This is line 3
This is line 2
This is line 1
This is line 1
This is line 2
This is line 3
but I expected:
This is line 3
This is line 2
This is line 1
This is line 1
This is line 2
This is line 3
Can you explain why I obtain the first output and what I need to change to obtain the second one?
@. is not a newline hint: it is equivalent to print_newline, which calls print_flush (closing all open boxes) and then outputs a new line.
If you want to print line by line with Format, you should open a vertical box with open_vbox and use print_cut ("@,") whenever you want to output a new line.
Instead of @., you should use the @\n specifier. The former flushes the formatter and outputs a hard newline, which breaks your pretty-printing. It is intended to be used at the end of a document and, since it is not composable, I would advise against using it at all.
With @\n, you will get output that is much closer to what you're expecting:
This is line 3
This is line 2
This is line 1
This is line 1
This is line 2
This is line 3
By the way, the same output can be obtained by using a vbox and emitting @; (good break hints), which is better.