seaborn: from distplot to displot, new input parameters

Since Seaborn warns that 'distplot' will be deprecated and recommends 'displot' instead, I'm trying to update my old code. Unfortunately, I'm finding it hard to work out the corresponding parameters for several inputs. As an example, here is the old, working 'distplot' code:
import numpy as np
import seaborn as sns
c = np.random.normal(5, 2, 100)
sns.distplot(c, hist=True, kde=True, color='g', kde_kws={'color': 'b', 'lw': 2, 'label': 'Kde'},
             hist_kws={'color': 'purple', 'alpha': 0.8, 'histtype': 'bar', 'edgecolor': 'k'})
Now I want to produce the same result with 'displot', but I don't know how to set 'alpha' for the histogram, or how to pass the rest of the 'hist_kws' settings. Here is how I started:
sns.displot(data=c, kind='hist', kde=True, facecolor='purple', edgecolor='k', color='b',
            alpha=1, line_kws={'lw': 2})
I have been looking for better documentation, but I haven't had any luck so far.


difference between pandas methods, data frame methods and how to distinguish between them

I have been confused about these for a while, and I would like to know whether there is a practical, fast way to tell them apart.
Assuming df is a pandas DataFrame object, please see below.
While using pandas, this is what I noticed. Some operations have to be called as pd.method(df, *args), while others have to be called as df.method(*args). Interestingly, some work either way.
To clarify with some examples: it makes complete sense to me to use pd.read_csv() rather than df.read_csv(), since no df has been created yet, but I have a hard time making sense of the following:
1- correct: pd.get_dummies(df, *args) --- incorrect: df.get_dummies(*args)
2- correct: df.groupby(*args) --- incorrect: pd.groupby(df,*args)
3- correct: both df.isnull() and pd.isnull(df)
I am sure you can come up with many other examples like these. I find it hard to remember which is which, and over a whole code development/analysis cycle I waste a lot of time guessing whether I should use pd.method(df) or df.method() for different things.
My main question is: how do you handle this? Did you also find this challenging? Is there any way to quickly tell which one to use? Am I missing something here?
Thanks
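One practical check, sketched under the assumption of a reasonably recent pandas: ask Python directly whether a name lives on the pd module, on the DataFrame class, or on both. The helper name where_defined is hypothetical and not part of pandas.

```python
import pandas as pd


def where_defined(name):
    """Report whether `name` is a top-level pandas function, a DataFrame method, or both."""
    places = []
    if hasattr(pd, name):
        places.append("pd." + name)
    if hasattr(pd.DataFrame, name):
        places.append("DataFrame." + name)
    return places


df = pd.DataFrame({"a": [1.0, None], "b": ["x", "y"]})

print(where_defined("get_dummies"))       # top-level function only
print(where_defined("isnull"))            # available both ways
print(pd.isnull(df).equals(df.isnull()))  # both spellings give the same result
```

In an interactive session, tab completion on pd. and df. gives the same information without any helper.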

How to filter by tag in Jaeger

When trying to filter by tag, there is a small popup:
I have been looking around for logfmt documentation, but all I can find is the key=value format.
My questions are:
Is there a way to do something more sophisticated? (starts_with, not equal, contains, etc.)
I am trying to filter by URL using http.url="http://example.com?bla=bla&foo=bar". I am fairly sure the value exists because I am copying and pasting it from my trace, but I get no results. Do I need to escape characters or do something else for this to work?
I did some research on logfmt as well. Based on the documentation of the original implementation and on the Python implementation of the parser (and its tests), I would say that it doesn't support anything more sophisticated (such as starts_with, not equal, contains). This is because the output of the parser is a simple dictionary (with no regex involved in the values).
As for the second question, using the same Python parser, I was able to double-check that your filter looks fine:
from logfmt import parse_line
parse_line('http.url="http://example.com?bla=bla&foo=bar"')
Output:
{'http.url': 'http://example.com?bla=bla&foo=bar'}
This makes me suspect an issue on the Jaeger side, but this is as far as I could go.
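To illustrate why a key=value syntax leaves no room for operators, here is a minimal stdlib-only sketch of logfmt-style parsing. It is not the logfmt package used above and not Jaeger's parser; parse_pairs is a hypothetical helper, and relying on shlex for quote handling is my own simplification.

```python
import shlex


def parse_pairs(line):
    """Split a logfmt-style line into a flat dict of key -> value strings."""
    result = {}
    for token in shlex.split(line):  # honors double quotes around values
        key, sep, value = token.partition("=")
        # A bare key with no '=' is stored as a boolean flag.
        result[key] = value if sep else True
    return result


tags = parse_pairs('http.url="http://example.com?bla=bla&foo=bar" error=true')
print(tags)
```

Each key ends up mapped to one literal string, so a consumer of this structure has only exact values to compare against.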

What is the difference between `matplotlib.rc` and `matplotlib.rcParams`? And which one to use?

I have been using matplotlib.rc in my scripts to preprocess my plots. But recently I realized that using matplotlib.rcParams is much easier before doing a quick plot interactively (e.g. via IPython). This got me thinking about what the difference between the two is.
I searched the matplotlib documentation, but it gave no clear answer on this. Moreover, when I issue type(matplotlib.rc), the interpreter says that it is a function, whereas type(matplotlib.rcParams) tells me that it is a class object. These two answers are not helpful at all, so I would appreciate some help differentiating the two.
Additionally, I would like to know which one to prefer over the other.
Thanks in advance.
P.S. I went through this question: What's the difference between matplotlib.rc and matplotlib.pyplot.rc? However, the answers are specific to the difference between the matplotlib and pyplot instances of the two objects I am enquiring about and, hence, are also not that helpful.
matplotlib.rc is a function that updates matplotlib.rcParams.
matplotlib.rcParams is a dict subclass that provides a validated key-value map for Matplotlib configuration.
The docs for mpl.rc are at https://matplotlib.org/stable/api/matplotlib_configuration_api.html?highlight=rc#matplotlib.rc and the code is here.
The class definition of RcParams is here and the instance is created here.
If we look at the guts of matplotlib.rc we see:
for g in group:
    for k, v in kwargs.items():
        name = aliases.get(k) or k
        key = '%s.%s' % (g, name)
        try:
            rcParams[key] = v
        except KeyError as err:
            raise KeyError(('Unrecognized key "%s" for group "%s" and '
                            'name "%s"') % (key, g, name)) from err
where we see that matplotlib.rc does indeed update matplotlib.rcParams (after doing some string formatting).
You should use whichever one is more convenient for you. If you know exactly which key you want to update, then interacting with the dict-like object is better; if you want to set a whole bunch of values in a group, then mpl.rc is likely better!
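As a small illustration of the two routes, the sketch below sets the same group with mpl.rc and then overrides one key through mpl.rcParams. The chosen keys and values are arbitrary examples, and the variable names are mine.

```python
import matplotlib as mpl

# Group-style update: sets lines.linewidth and lines.linestyle in one call.
mpl.rc("lines", linewidth=2, linestyle="--")
after_rc = (mpl.rcParams["lines.linewidth"], mpl.rcParams["lines.linestyle"])

# Dict-style update of a single, fully qualified key.
mpl.rcParams["lines.linewidth"] = 3
after_dict = mpl.rcParams["lines.linewidth"]

# An unknown name in the group is rejected with a KeyError.
try:
    mpl.rc("lines", no_such_option=1)
    rejected = False
except KeyError:
    rejected = True

print(after_rc, after_dict, rejected)

mpl.rcdefaults()  # restore Matplotlib's defaults so the sketch leaves no side effects
```

Both routes write to the same rcParams instance, so they can be mixed freely in a script.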

Extracting Data from an Area file

I am trying to extract information at a specific location (lat, lon) from different satellite images. These images were given to me in the AREA format, and I cooked up a simple Jython script to extract temperature values.
The script works. Here is a small snippet from it that prints out the data value at a point:
from edu.wisc.ssec.mcidas import AreaFile as af
url="adde://localhost/imagedata?&PORT=8113&COMPRESS=gzip&USER=idv&PROJ=0& VERSION=1&DEBUG=false&TRACE=0&GROUP=FL&DESCRIPTOR=8712C574&BAND=2&LATLON=29.7276 -85.0274 E&PLACE=ULEFT&SIZE=1 1&UNIT=TEMP&MAG=1 1&SPAC=4&NAV=X&AUX=YES&DOC=X&DAY=2012002 2012002&TIME=&POS=0&TRACK=0"
a = af(url)
value = a.getData()
print value
array([[I, [array([I, [array('i', [2826, 2833, 2841, 2853])])])
So what does this mean?
Please excuse me if the question seems trivial; while I am comfortable with Python, I am really new to dealing with scientific data.
Note
Here is a link to the entire script.
After asking around, I found out that the Area object returns data in multiples of four, so the very first value is the one I am looking for.
Grabbing the value is as simple as:
value[0][0][0]
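To make the nesting concrete, here is a plain-Python sketch in which nested lists stand in for the Java array printed above. Reading the three index levels as [band][line][element] is my assumption about AreaFile.getData(), not something confirmed in this thread.

```python
# Nested lists standing in for the array printed by the Jython snippet.
value = [[[2826, 2833, 2841, 2853]]]

band, line, element = 0, 0, 0  # assumed meaning of the three index levels
point_value = value[band][line][element]
print(point_value)  # → 2826
```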

discover degree with pydot?

After a little work with pygraphviz I've returned to pydot. One of the useful methods in pygraphviz is iterdegree(). Can something analogous be done with pydot? That is, can I find the highest-degree node so that I can set it as the root?
jjc
No answer after a year and a half? I don't think there is a way to do this with pydot without writing some code.
But you could use NetworkX: convert to a NetworkX graph object with the networkx.from_pydot() function (networkx.drawing.nx_pydot.from_pydot in more recent releases) and then call the degree() method.
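The "writing some code" route can be sketched with the standard library alone. The function below counts degrees from (source, destination) pairs; highest_degree_node is a hypothetical helper, and the commented line shows how the pairs might be pulled from a pydot graph with get_edges(), get_source() and get_destination(), assuming the node names carry no ports or extra quoting.

```python
from collections import Counter


def highest_degree_node(edges):
    """Return (node, degree) for the node touching the most edge endpoints."""
    degree = Counter()
    for source, destination in edges:
        degree[source] += 1
        degree[destination] += 1
    return degree.most_common(1)[0]


# With pydot, the pairs could come from something like:
#   edges = [(e.get_source(), e.get_destination()) for e in graph.get_edges()]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
root, root_degree = highest_degree_node(edges)
print(root, root_degree)  # → a 3
```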