debugger throws error, running script doesn't - pandas boxplot - pandas

Title is pretty self explanatory.
Here is a minimal reproducible example. To run it, create an .xlsx file with the columns id, nb_inf and Grantham, filled with some integer data.
import matplotlib.pyplot as plt
import pandas as pd

def loader() -> pd.DataFrame:
    df = pd.read_excel("your_file.xlsx", "Feuil1")
    df = df.set_index('id')
    return df

if __name__ == "__main__":
    df: pd.DataFrame = loader()
    for column in df.columns:
        if "Grantham" in column:
            print(column)
            df.boxplot(column=column, by='nb_inf', figsize=(5, 6))
            plt.savefig(f"boxplots/{column}.png")
            plt.close()
Running it through the Run command works perfectly well. But running it with the debugger raises the error TypeError: 'NoneType' object is not callable.
I'm using Python 3.10.2 and PyCharm 2022.1.3 (Community Edition)
More details about my PyCharm build:
Build #PC-221.5921.27, built on June 21, 2022
Runtime version: 11.0.15+10-b2043.56 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o.
Windows 11 10.0
GC: G1 Young Generation, G1 Old Generation
Memory: 2030M
Cores: 16
Non-Bundled Plugins:
com.chesterccw.excelreader (2022.1.3)

Works in VSCode, both in debug and standard execution mode.
Have you considered switching to a better IDE than PyCharm such as VSCode?
On a more serious note, check the default debugger settings in PyCharm; one of the options enabled there is likely causing this.

Related

Different behaviour of dataclass default_factory to generate list

I'm quite new to Python, so please excuse me if this question contains some newbie misunderstandings, but I've failed to find the answer by googling:
On my personal laptop running Python 3.9.7 on Windows 11 this code is working without errors.
from dataclasses import dataclass, field

@dataclass
class SomeDataClass:
    somelist: list[str] = field(default_factory=lambda:['foo', 'bar'])

if __name__ == '__main__':
    instance = SomeDataClass()
    print(instance)
But when at work running Python 3.8.5 on Windows 10 I get the following error:
  File "c:\...\test_dataclass.py", line 13, in SomeDataClass
    somelist: list[str] = field(default_factory=lambda:['foo', 'bar'])
TypeError: 'type' object is not subscriptable
I'd like to understand why this behaves differently and what I could do to make it work.
I would expect dataclasses to behave similarly on both computers.
You have already intuited the reason: subscripting built-in types such as list[str] is a new feature in Python 3.9, described in the What's New article for 3.9.
In 3.8 this syntax is not enabled by default, but you can use it in annotations by including the import below, which postpones the evaluation of annotations so that list[str] is never evaluated when the class is defined:
from __future__ import annotations
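As an alternative that works on Python 3.8 without the __future__ import, the annotation can be written with typing.List. The following is only a minimal sketch under that assumption: typing.List is a swapped-in technique not mentioned in the answer above, and the class simply mirrors the one from the question.

```python
from dataclasses import dataclass, field
from typing import List  # assumption: typing.List replaces list[str] for 3.8 compatibility


@dataclass
class SomeDataClass:
    # typing.List[str] can be subscripted on 3.8, unlike the built-in list
    somelist: List[str] = field(default_factory=lambda: ['foo', 'bar'])


instance = SomeDataClass()
print(instance)  # SomeDataClass(somelist=['foo', 'bar'])
```

Because default_factory is called once per instance, every SomeDataClass() gets its own fresh list rather than sharing one.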

Spark: How to debug pandas-UDF in VS Code

I'm looking for a way to debug Spark pandas UDFs in VS Code and PyCharm Community (place a breakpoint and stop inside the UDF). At the moment, when a breakpoint is placed inside the UDF, the debugger doesn't stop.
The reference below describes Local mode and Distributed mode.
I'm trying at least to debug in Local mode. In PyCharm/VS Code there should be a way to debug the local environment via "Attach to Local Process", but I cannot figure out how.
At the moment I cannot find any answer on how to attach the PySpark debugger to a local process inside a UDF in VS Code (my dev IDE).
I found only the PyCharm examples below.
Attach to local process
How can PySpark be called in debug mode?
When I try to attach to the process, I get the message below in PyCharm. In VS Code I get a message that the process cannot be attached.
Attaching to a process with PID=33,692
/home/usr_name/anaconda3/envs/yf/bin/python3.8 /snap/pycharm-community/223/plugins/python-ce/helpers/pydev/pydevd_attach_to_process/attach_pydevd.py --port 40717 --pid 33692
WARNING: The 'kernel.yama.ptrace_scope' parameter value is not 0, attach to process may not work correctly.
Please run 'sudo sysctl kernel.yama.ptrace_scope=0' to change the value temporary
or add the 'kernel.yama.ptrace_scope = 0' line to /etc/sysctl.d/10-ptrace.conf to set it permanently.
Process finished with exit code 0
Server stopped.
pyspark_xray https://github.com/bradyjiang/pyspark_xray
With this package it is possible to debug RDDs running on a worker, but I was not able to adapt the package to debug UDFs.
Example code; the breakpoint doesn't stop inside the UDF pandas_function(url_json):
import pandas as pd
import pyspark
from pyspark.sql import Row
from pyspark.sql import functions as F  # needed for F.monotonically_increasing_id() below
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = pyspark.sql.SparkSession.builder.appName("test") \
    .master('local[*]') \
    .getOrCreate()
sc = spark.sparkContext

# Create initial dataframe respond_sdf
d_list = [('api_1', "{'api': ['api_1', 'api_1', 'api_1'],'A': [1,2,3], 'B': [4,5,6] }"),
          (' api_2', "{'api': ['api_2', 'api_2', 'api_2'],'A': [7,8,9], 'B': [10,11,12] }")]
schema = StructType([
    StructField('url', StringType(), True),
    StructField('content', StringType(), True)
])
jsons = sc.parallelize(d_list)
respond_sdf = spark.createDataFrame(jsons, schema)

# Pandas UDF
def pandas_function(url_json):
    # Here I want to place breakpoint
    df = pd.DataFrame(eval(url_json['content'][0]))
    return df

# Pandas UDF transformation applied to respond_sdf
respond_sdf.groupby(F.monotonically_increasing_id()).applyInPandas(pandas_function, schema=schema).show()
This example demonstrates how to use the excellent pyspark_xray library to step into UDF functions passed to the DataFrame.mapInPandas function:
https://github.com/bradyjiang/pyspark_xray/blob/master/demo_app02/driver.py
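If attaching to the worker process does not work out, one pragmatic fallback is to call the function directly on a plain pandas DataFrame in the driver process, where an ordinary breakpoint is hit. The following is a minimal sketch under stated assumptions, not the pyspark_xray approach: sample_group is a hypothetical hand-built stand-in for one group that Spark would pass to the function, and ast.literal_eval is swapped in for eval as a safer parser. It checks the function's logic only and does not exercise Spark itself.

```python
import ast

import pandas as pd


def pandas_function(url_json: pd.DataFrame) -> pd.DataFrame:
    # A breakpoint placed here is hit, because this runs in the local process
    content = url_json["content"].iloc[0]
    # assumption: literal_eval replaces eval; it only parses Python literals
    return pd.DataFrame(ast.literal_eval(content))


# hypothetical stand-in for one group of respond_sdf
sample_group = pd.DataFrame({
    "url": ["api_1"],
    "content": ["{'api': ['api_1', 'api_1', 'api_1'],'A': [1,2,3], 'B': [4,5,6] }"],
})

result = pandas_function(sample_group)
print(result)
```

Once the function behaves as expected locally, the same function object can be passed to applyInPandas unchanged.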

Using matplotlib in SublimeREPL: Python interpreter stops functioning after "plt.show()"

I'm using Sublime Text 3 with a Python REPL. My version of Python is Python 3.5 (64-bit). It's not the Anaconda distribution, but a standalone version. I have been using pandas and numpy for a while without any trouble.
I am having trouble plotting with Matplotlib: my interpreter stops running after I call the show() method on a figure.
Using this code snippet, from the matplotlib documentation :
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()
I get the following result. The second print("test") is never evaluated, and I have to restart the REPL even if I close the plot's window.
My REPL settings are the following :
{
    "cmd": ["C:\\Documents \\Sublime \\python-3.5.3.amd64\\python.exe", "-i", "$file"],
    "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
    "selector": "source.python",
    "shell": true
}
So far I have tried to:
Switch matplotlib backends
Switch to interactive mode, following the matplotlib documentation, with the ion() method or with matplotlib.interactive(True)
Reinstall matplotlib
Use matplotlib with the block=False parameter
Does anyone know how to keep the interpreter working after plotting? Thanks in advance.

Is there any 2D plotting library compatible with pypy?

I am a heavy user of jupyter notebook and, lately, I am running it using pypy instead of python to get extra speed. It works perfectly but I am missing matplotlib so much. Is there any decent 2D plotting library compatible with pypy and jupyter notebook? I don't need fancy stuff, scatter, line and bar plots would be more than enough.
Bokeh works fairly well with pypy. The only problem I have encountered is linked to the use of numpy.datetime64, which is not yet supported by pypy. Fortunately, it is enough to monkey-patch bokeh/core/properties.py and bokeh/util/serialization.py so that the datetime64 references are skipped when they fail.
I did it in this way:
bokeh/core/properties.py
...
try:
    import numpy as np
    datetime_types += (np.datetime64,)
except:
    pass
...
and
bokeh/util/serialization.py
...
# Check for astype failures (putative Numpy < 1.7)
try:
    dt2001 = np.datetime64('2001')
    legacy_datetime64 = (dt2001.astype('int64') ==
                         dt2001.astype('datetime64[ms]').astype('int64'))
except:
    legacy_datetime64 = False
    pass
...
With these changes I managed to get nice-looking plots in jupyter using pypy.

Matplotlib, IPython and inline plotting

I would like to use inline plotting in the IPython notebook, i.e.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100)
plt.plot(x, x**2)
should show an image.
Yet I only get the following message:
lib/python2.7/site-packages/IPython/core/formatters.py:239: FormatterWarning: Exception in image/png formatter: Could not create write struct
FormatterWarning,
What could be the reason for this?
matplotlib==1.3.1 and ipython==2.1.0
Does it say anything on the terminal (i.e. the server)?
My guess is that this is most probably due to some libpng incompatibility issues. If you are running this on OS X, the following discussion may help:
libpng version incompatibility in fresh installation of IPython
Even if you aren't running OS X, similar situations may occur, if you have several copies of libpng floating around.