Odoo 14 service keeps crashing (Dumping stacktrace of limit exceeding threads before reloading) - odoo

I am facing the following error:
2022-05-30 15:00:26,943 1940 WARNING ? odoo.service.server: Server memory limit (4934283264) reached.
2022-05-30 15:00:26,954 1940 INFO ? odoo.service.server: Dumping stacktrace of limit exceeding threads before reloading
2022-05-30 15:00:26,997 1940 INFO ? odoo.tools.misc:
# Thread: <_MainThread(MainThread, started 140592199739200)> (db:n/a) (uid:n/a) (url:n/a)
File: "/opt/odoo/odoo14/odoo-bin", line 8, in <module>
odoo.cli.main()
File: "/opt/odoo/odoo14/odoo/cli/command.py", line 61, in main
o.run(args)
File: "/opt/odoo/odoo14/odoo/cli/server.py", line 178, in run
main(args)
File: "/opt/odoo/odoo14/odoo/cli/server.py", line 172, in main
rc = odoo.service.server.start(preload=preload, stop=stop)
File: "/opt/odoo/odoo14/odoo/service/server.py", line 1298, in start
rc = server.run(preload, stop)
File: "/opt/odoo/odoo14/odoo/service/server.py", line 546, in run
dumpstacks(thread_idents=[thread.ident for thread in self.limits_reached_threads])
File: "/opt/odoo/odoo14/odoo/tools/misc.py", line 957, in dumpstacks
for line in extract_stack(stack):
2022-05-30 15:00:27,007 1940 INFO ? odoo.service.server: Initiating server reload
I have tried several solutions, such as increasing the following limits:
limit_request = 8192
limit_time_cpu = 600
limit_time_real = 1200
max_cron_threads = 1
limit_memory_hard = 536870637100
limit_memory_soft = 483183573400
but I am still facing the same issue shown in the error log. Even after restarting, the server runs for 30 minutes at most before I get the same error again and again.
Best Regards.

Brother, take a look at this link: Configuration suggestions for Odoo server.

If you have a VPS with 4 CPU cores and 16 GB of RAM, the number of workers should be 9 (CPU cores * 2 + 1). The total limit-memory-soft value will be 640 MB x 9 = 5760 MB, and the total limit-memory-hard 768 MB x 9 = 6912 MB,
so Odoo will use a maximum of about 5.6 GB of RAM.
Your server has 4 vCPUs, so try the below in your config file:
limit_memory_hard = 768 MB * 9 = 768 * 9 * 1024 * 1024 = 7247757312
limit_memory_soft = 640 MB * 9 = 640 * 9 * 1024 * 1024 = 6039797760
max_cron_threads = 1
workers = 8
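As a quick sketch of the arithmetic behind those numbers (plain Python, not Odoo configuration; the 640 MB / 768 MB per-worker budgets are the ones quoted from the linked suggestions):
cpu_cores = 4
workers = cpu_cores * 2 + 1                      # 9 processes: 8 HTTP workers + 1 cron
limit_memory_soft = 640 * workers * 1024 * 1024  # 6039797760 bytes (~5.6 GB)
limit_memory_hard = 768 * workers * 1024 * 1024  # 7247757312 bytes (~6.8 GB)
print(workers, limit_memory_soft, limit_memory_hard)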

Related

Unable to load large pandas dataframe to pyspark

I've been trying to join two large pandas dataframes with PySpark using the following code. I'm varying the number of executor cores allocated to the application to measure the scalability of PySpark (strong scaling).
import gc
import math
import time

import pandas as pd
from numpy.random import default_rng
from pyspark.sql import SparkSession

r = 1000000000   # 1Bn rows
it = 10          # number of timed join iterations
w = 256          # total Spark cores (world size)
unique = 0.9     # fraction of unique keys
TOTAL_MEM = 240  # GB of RAM per worker node
TOTAL_NODES = 14

# generate two random integer frames with r rows and 2 columns each
max_val = int(r * unique)  # cast to int so it is a valid bound for rng.integers
rng = default_rng()
frame_data = rng.integers(0, max_val, size=(r, 2))
frame_data1 = rng.integers(0, max_val, size=(r, 2))
print(f"data generated", flush=True)

df_l = pd.DataFrame(frame_data).add_prefix("col")
df_r = pd.DataFrame(frame_data1).add_prefix("col")
print(f"data loaded", flush=True)

procs = int(math.ceil(w / TOTAL_NODES))
mem = int(TOTAL_MEM * 0.9)
print(f"world sz {w} procs per worker {procs} mem {mem} iter {it}", flush=True)

spark = SparkSession\
    .builder\
    .appName(f'join {r} {w}')\
    .master('spark://node:7077')\
    .config('spark.executor.memory', f'{int(mem*0.6)}g')\
    .config('spark.executor.pyspark.memory', f'{int(mem*0.4)}g')\
    .config('spark.cores.max', w)\
    .config('spark.driver.memory', '100g')\
    .config('spark.sql.execution.arrow.pyspark.enabled', 'true')\
    .getOrCreate()

sdf0 = spark.createDataFrame(df_l).repartition(w).cache()
sdf1 = spark.createDataFrame(df_r).repartition(w).cache()
print(f"data loaded to spark", flush=True)

try:
    for i in range(it):
        t1 = time.time()
        out = sdf0.join(sdf1, on='col0', how='inner')
        count = out.count()  # action that actually triggers the join
        t2 = time.time()
        print(f"timings {r} {w} {i} {(t2 - t1) * 1000:.0f} ms, {count}", flush=True)
        del out
        del count
        gc.collect()
finally:
    spark.stop()
Cluster:
I am using a standalone Spark cluster on 15 nodes, each with 48 cores and 240 GB RAM. The master and the driver code run on node1, while the other 14 nodes run workers with the maximum memory allocated.
In the Spark session, I am reserving 90% of the total memory for the executors, split 60% to the JVM and 40% to PySpark.
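Spelled out, that split amounts to (a small sketch of the arithmetic only, reusing the TOTAL_MEM value from the code above):
TOTAL_MEM = 240             # GB of RAM per worker node
mem = int(TOTAL_MEM * 0.9)  # 216 GB reserved for the executor
jvm_mem = int(mem * 0.6)    # 129 GB -> spark.executor.memory
py_mem = int(mem * 0.4)     # 86 GB  -> spark.executor.pyspark.memory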
Issue:
When I run the above program, I can see that the executors are being assigned to the app, but it doesn't move forward, even after 60 minutes. For a smaller row count (10M), this was working without a problem.
Driver output
world sz 256 procs per worker 19 mem 216 iter 8
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/08/26 14:52:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/N/u/d/dnperera/.conda/envs/cylonflow/lib/python3.8/site-packages/pyspark/sql/pandas/conversion.py:425: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:
Negative initial size: -589934400
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
warn(msg)
Any help on this is much appreciated.

ERROR: org.apache.spark.sql.execution.datasources.FileFormatWriter$.write

I am running on the following config:
Cluster type: E64_v3 (1 driver + 3 workers)
Other Spark configs:
spark.shuffle.io.connectionTimeout 1200s
spark.databricks.io.cache.maxMetaDataCache 40g
spark.rpc.askTimeout 1200s
spark.databricks.delta.snapshotPartitions 576
spark.databricks.optimizer.rangeJoin.binSize 256
spark.sql.inMemoryColumnarStorage.batchSize 10000
spark.sql.legacy.parquet.datetimeRebaseModeInWrite CORRECTED
spark.executor.cores 16
spark.executor.memory 54g
spark.rpc.lookupTimeout 1200s
spark.driver.maxResultSize 220g
spark.databricks.io.cache.enabled true
spark.rpc.io.backLog 256
spark.sql.shuffle.partitions 576
spark.network.timeout 1200s
spark.sql.inMemoryColumnarStorage.compressed true
spark.databricks.io.cache.maxDiskUsage 220g
spark.storage.blockManagerSlaveTimeoutMs 1200s
spark.executor.instances 12
spark.sql.windowExec.buffer.in.memory.threshold 524288
spark.executor.heartbeatInterval 100s
spark.default.parallelism 576
spark.core.connection.ack.wait.timeout 1200s
and this is my error stack:
---> 41 df.write.format("delta").mode("overwrite").save(path)
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
825 self._jwrite.save()
826 else:
--> 827 self._jwrite.save(path)
Py4JJavaError: An error occurred while calling o784.save.
: org.apache.spark.SparkException: Job aborted.
at
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:230)
.
.
.
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 13 (execute at DeltaInvariantCheckerExec.scala:88) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Failed to connect to /10.179....
Any idea how to mitigate this?

How to deal with the error when using Gurobi with cvxpy: Unable to retrieve attribute 'BarIterCount'

How to deal with the error when using Gurobi with cvxpy: AttributeError: Unable to retrieve attribute 'BarIterCount'.
I have an integer programming problem, using cvxpy with Gurobi set as the solver.
When the number of variables is small, the result is fine. Once the number of variables reaches a level of about 43*13*6, the error occurs. I suppose it may be caused by the scale of the problem, where the Gurobi solver cannot estimate BarIterCount, which I take to be the maximum number of iterations needed.
Thus, I wonder: is there any way to manually set the BarIterCount attribute of Gurobi through the cvxpy interface? Or is there another way to solve this problem?
Thanks for any suggestions you may provide.
The trace log is as follows.
If my model is small, e.g. I set the number that indicates the scale of the model to 3, then the program runs fine. The trace is:
Using license file D:\software\lib\site-packages\gurobipy\gurobi.lic
Restricted license - for non-production use only - expires 2022-01-13
Parameter OutputFlag unchanged
Value: 1 Min: 0 Max: 1 Default: 1
D:\software\lib\site-packages\cvxpy\reductions\solvers\solving_chain.py:326: DeprecationWarning: Deprecated, use Model.addMConstr() instead
solver_opts, problem._solver_cache)
Changed value of parameter QCPDual to 1
Prev: 0 Min: 0 Max: 1 Default: 0
Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 126 rows, 370 columns and 2689 nonzeros
Model fingerprint: 0x70d49530
Variable types: 0 continuous, 370 integer (369 binary)
Coefficient statistics:
Matrix range [1e+00, 7e+00]
Objective range [1e+00, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 6e+00]
Found heuristic solution: objective 7.0000000
Presolve removed 4 rows and 90 columns
Presolve time: 0.01s
Presolved: 122 rows, 280 columns, 1882 nonzeros
Variable types: 0 continuous, 280 integer (279 binary)
Root relaxation: objective 4.307692e+00, 216 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 4.30769 0 49 7.00000 4.30769 38.5% - 0s
H 0 0 6.0000000 4.30769 28.2% - 0s
0 0 5.00000 0 35 6.00000 5.00000 16.7% - 0s
0 0 5.00000 0 37 6.00000 5.00000 16.7% - 0s
0 0 5.00000 0 7 6.00000 5.00000 16.7% - 0s
Cutting planes:
Gomory: 4
Cover: 9
MIR: 4
StrongCG: 1
GUB cover: 9
Zero half: 1
RLT: 1
Explored 1 nodes (849 simplex iterations) in 0.12 seconds
Thread count was 32 (of 32 available processors)
Solution count 2: 6 7
Optimal solution found (tolerance 1.00e-04)
Best objective 6.000000000000e+00, best bound 6.000000000000e+00, gap 0.0000%
If the number is 6, then the error occurs:
-------------------------------------------------------
Using license file D:\software\lib\site-packages\gurobipy\gurobi.lic
Restricted license - for non-production use only - expires 2022-01-13
Parameter OutputFlag unchanged
Value: 1 Min: 0 Max: 1 Default: 1
D:\software\lib\site-packages\cvxpy\reductions\solvers\solving_chain.py:326: DeprecationWarning: Deprecated, use Model.addMConstr() instead
solver_opts, problem._solver_cache)
Changed value of parameter QCPDual to 1
Prev: 0 Min: 0 Max: 1 Default: 0
Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Traceback (most recent call last):
File "model.py", line 274, in <module>
problem.solve(solver=cp.GUROBI,verbose=True)
File "D:\software\lib\site-packages\cvxpy\problems\problem.py", line 396, in solve
return solve_func(self, *args, **kwargs)
File "D:\software\lib\site-packages\cvxpy\problems\problem.py", line 754, in _solve
self.unpack_results(solution, solving_chain, inverse_data)
File "D:\software\lib\site-packages\cvxpy\problems\problem.py", line 1058, in unpack_results
solution = chain.invert(solution, inverse_data)
File "D:\software\lib\site-packages\cvxpy\reductions\chain.py", line 79, in invert
solution = r.invert(solution, inv)
File "D:\software\lib\site-packages\cvxpy\reductions\solvers\qp_solvers\gurobi_qpif.py", line 59, in invert
s.NUM_ITERS: model.BarIterCount,
File "src\gurobipy\model.pxi", line 343, in gurobipy.gurobipy.Model.__getattr__
File "src\gurobipy\model.pxi", line 1842, in gurobipy.gurobipy.Model.getAttr
File "src\gurobipy\attrutil.pxi", line 100, in gurobipy.gurobipy.__getattr
AttributeError: Unable to retrieve attribute 'BarIterCount'
Hopefully this provides more hints toward a solution.
BarIterCount is the number of barrier iterations performed to solve an LP. This is not a limit on the number of iterations, and it should only be queried after the current optimization process has finished. You cannot set this attribute either, of course.
To actually limit the number of iterations the barrier algorithm is allowed to take, you can use the parameter BarIterLimit.
Please inspect your log file for further information about the solver's behavior.
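For what it's worth, here is a minimal sketch of setting such a parameter from cvxpy (the toy problem is made up for illustration; as far as I know, extra keyword arguments to problem.solve() are forwarded to Gurobi as solver parameters, but double-check this against your cvxpy version):
import cvxpy as cp

# Toy integer program, purely for illustration.
x = cp.Variable(3, boolean=True)
prob = cp.Problem(cp.Maximize(cp.sum(x)), [cp.sum(x) <= 2])

# Extra solver keyword arguments are passed through to Gurobi as parameters,
# so a barrier iteration limit can be requested like this.
prob.solve(solver=cp.GUROBI, verbose=True, BarIterLimit=1000)
print(prob.value)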

On running 'examples/sumo/grid.py': FatalFlowError: 'Not enough vehicles have spawned! Bad start?'

I want to simulate a traffic jam on the grid example,
so I tried to increase the number of rows and columns, or to increase num_cars_left/num_cars_right/num_cars_top/num_cars_bot.
For example:
n_rows = 5
n_columns = 5
num_cars_left = 50
num_cars_right = 50
num_cars_top = 50
num_cars_bot = 50
Then, when I run it from the command line, there is an error:
Loading configuration... done.
Success.
Loading configuration... done.
Traceback (most recent call last):
File "examples/sumo/grid.py", line 237, in <module>
exp.run(1, 1500)
File "/home/dnl/flow/flow/core/experiment.py", line 118, in run
state = self.env.reset()
File "/home/dnl/flow/flow/envs/loop/loop_accel.py", line 167, in reset
obs = super().reset()
File "/home/dnl/flow/flow/envs/base_env.py", line 520, in reset
raise FatalFlowError(msg=msg)
flow.utils.exceptions.FatalFlowError:
Not enough vehicles have spawned! Bad start?
Missing vehicles / initial state:
- human_994: ('human', 'bot4_0', 0, 446, 0)
- human_546: ('human', 'top0_5', 0, 466, 0)
- human_886: ('human', 'bot3_0', 0, 366, 0)
- human_689: ('human', 'bot1_0', 0, 396, 0)
.....
I then checked 'flow/flow/envs/base_env.py'.
It contains the following check:
# check to make sure all vehicles have been spawned
if len(self.initial_ids) > len(initial_ids):
    missing_vehicles = list(set(self.initial_ids) - set(initial_ids))
    msg = '\nNot enough vehicles have spawned! Bad start?\n' \
          'Missing vehicles / initial state:\n'
    for veh_id in missing_vehicles:
        msg += '- {}: {}\n'.format(veh_id, self.initial_state[veh_id])
    raise FatalFlowError(msg=msg)
So my question is: is there a limit on the number of rows, columns, and num_cars_left (right/bot/top)? If I want to simulate a traffic jam on the grid, how should I do it?
The grid example examples/sumo/grid.py doesn't use inflows by default;
instead it spawns the vehicles directly on the input edges. So if you increase the number of vehicles, you have to increase the size of the edges they spawn on. I tried your example and this setting works for me:
inner_length = 300
long_length = 500
short_length = 500
n_rows = 5
n_columns = 5
num_cars_left = 50
num_cars_right = 50
num_cars_top = 50
num_cars_bot = 50
The length of the edges the vehicles spawn on is short_length, it is the one you want to increase if the vehicles don't have enough room to be added.
Also, changing the number of rows and columns doesn't change anything, because 50 vehicles will be added to each of them; in this case you will have 20 input edges with 50 vehicles each, 1000 vehicles total, which will be quite laggy.
If you want to use continuous inflows instead of one-time spawning, have a look at the use_inflows parameter in the grid_example function in examples/sumo/grid.py, and what this parameter does when it's set to True.
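For reference, this is roughly how those values plug into the grid network parameters (a sketch based on how I remember examples/sumo/grid.py being structured; the exact dict keys may differ in your version):
inner_length = 300
long_length = 500
short_length = 500   # spawn-edge length; increase this when vehicles don't fit
n_rows = 5
n_columns = 5

grid_array = {
    "short_length": short_length,
    "inner_length": inner_length,
    "long_length": long_length,
    "row_num": n_rows,
    "col_num": n_columns,
    "cars_left": 50,
    "cars_right": 50,
    "cars_top": 50,
    "cars_bot": 50,
}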

Apache httpd poll() takes 38 ms

I have an Apache (OHS) httpd process (1 out of 8, actually) talking to 2 web entry servers (WES), both on RedHat. Plotting the response times taken from the respective logfiles shows a constant delta of around 50 ms between the two sides. Using strace (strace -o <trace output> -ttT -s 2048 -f -xx -p <Pid>) I found that in 85% of the requests (where encrypted transfer is involved) the httpd process is somehow stuck in poll(), which returns only after 38-something ms. The remaining 10 ms are mainly due to excessive gettimeofday() and other time()-related system calls. The WES, on the other hand, claims it could send the data in under 100 µs, but gets "resource temporarily unavailable" from recvfrom() and then poll()s on its side for some 50 ms before recvfrom() finishes (with the confirmation of the data transfer from Apache, I suppose).
WES:
40685 16:54:57.111496 poll([{fd=39, events=POLLOUT|POLLWRNORM}], 1, 300000) = 1 ([{fd=39, revents=POLLOUT|POLLWRNORM}]) <0.000071>
40685 16:54:57.111666 sendto(39, " <encrypted data> ) ", 1053, 0, NULL, 0) = 1053 <0.000067>
40685 16:54:57.112249 recvfrom(39, 0x7ff8380c0243, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000061>
40685 16:54:57.112405 poll([{fd=39, events=POLLIN}], 1, 300000 <unfinished ...>
40685 16:54:57.165084 <... poll resumed> ) = 1 ([{fd=39, revents=POLLIN}]) <0.052659>
40685 16:54:57.165177 recvfrom(39, " ", 1, MSG_PEEK, NULL, NULL) = 1 <0.000114>
40685 16:54:57.165388 ioctl(39, FIONREAD, [205]) = 0 <0.000078>
Apache (OHS):
63195 16:54:57.145220 <... poll resumed> ) = 1 ([{fd=22, revents=POLLIN}]) <0.038408>
63195 16:54:57.145299 read(22, " <encrypted data> ", 8000) = 1053 <0.000018>
63195 16:54:57.145739 clock_gettime(CLOCK_REALTIME, {1435589697, 145769536}) = 0 <0.000018>
63195 16:54:57.145810 gettimeofday({1435589697, 145826}, NULL) = 0 <0.000025>
63195 16:54:57.145879 clock_gettime(CLOCK_REALTIME, {1435589697, 145902010}) = 0 <0.000021>
63195 16:54:57.145960 clock_gettime(CLOCK_REALTIME, {1435589697, 145986422}) = 0 <0.000017>
I have two questions:
1. Is the information in the trace output sufficient to identify the cause of the excessive 38 ms poll() (and if so, what is it)?
2. If question 1 must be answered with no, what can be done to improve the tracing?