difference between pandas functions and DataFrame methods, and how to distinguish between them - pandas

I have been confused about this for a while, and I would like to know whether there is a practical, fast way to tell these apart.
Assume df is a pandas DataFrame object; please see below.
While using pandas, this is what I noticed: to perform some operations, you have to call pd.method(df, *args), while for others you need df.method(*args). Interestingly, some methods work either way ...
Let's clarify this with some examples. While it makes total sense to me to use pd.read_csv(), not df.read_csv(), since no df has been created yet, I have a hard time making sense of the following:
1- correct: pd.get_dummies(df,*args) --- incorrect: df.get_dummies(*args)
2- correct: df.groupby(*args) --- incorrect: pd.groupby(df,*args)
3- correct: df.isnull() AND pd.isnull(df)
I am sure you can come up with many more examples like these. I personally find it challenging to remember which one is which, and I have wasted a lot of time in the code development/analysis cycle trying to guess whether to use pd.method(df) or df.method() for a given operation.
My main questions are: how do you handle this? Did you also find it challenging? Is there a quick way to tell which form to use? Am I missing something here?
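One quick, practical check is to introspect pandas itself: hasattr (or tab completion in IPython/Jupyter) tells you immediately whether a name lives on the pd module, on the DataFrame class, or on both. A minimal sketch:

```python
import pandas as pd

# Is get_dummies a top-level function, a DataFrame method, or both?
print(hasattr(pd, "get_dummies"))            # True  - pd.get_dummies(df) works
print(hasattr(pd.DataFrame, "get_dummies"))  # False - df.get_dummies() does not

# groupby is a DataFrame method only (the old pd.groupby was removed)
print(hasattr(pd.DataFrame, "groupby"))      # True

# isnull exists in both places, which is why both spellings work
print(hasattr(pd, "isnull"), hasattr(pd.DataFrame, "isnull"))  # True True
```

In an interactive session, typing `df.<Tab>` or running `help(pd.DataFrame)` lists every DataFrame method, which answers the question without guessing.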
Thanks

Related

easiest way to figure out what waiver() is doing?

If one looks at (e.g.) ggplot2::scale_y_continuous, the default value of many of the arguments is set to waiver(), e.g. for breaks:
‘waiver()’ for the default breaks computed by the
transformation object
How does one figure out/look at how these defaults are computed? Let's say I want to find the breaks for scale_y_log10(). ?scales::log10_trans doesn't say anything about computation of breakpoints.
I think log10_trans()$breaks might do it, which is the same as ?log_breaks. I'm not sure how to figure this out in general, though ...

seaborn from distplot to displot new input parameters

Since Seaborn warns that the deprecated distplot should be replaced with displot, I'm trying to update old code. Unfortunately, I find it a bit hard to find the corresponding parameters for several inputs. Just one example: below is the old, working distplot code:
c = np.random.normal(5, 2, 100)
sns.distplot(c, hist=True, kde=True, color='g',
             kde_kws={'color': 'b', 'lw': 2, 'label': 'Kde'},
             hist_kws={'color': 'purple', 'alpha': 0.8,
                       'histtype': 'bar', 'edgecolor': 'k'})
Now I want to produce the same result with displot, but I don't know how to set alpha for the histogram, or the rest of the hist_kws options. Here is how I started:
sns.displot(data=c, kind='hist', kde=True, facecolor='purple',
            edgecolor='k', color='b', alpha=1, line_kws={'lw': 2})
I have been looking for better documentation, but I haven't had any luck so far.

How to filter by tag in Jaeger

When trying to filter by tag, there is a small popup:
I have been looking around for logfmt, but all I can find is the key=value format.
My questions are:
Is there a way for something more sophisticated? (starts_with, not equal, contains, etc)
I am trying to filter by URL using http.url="http://example.com?bla=bla&foo=bar". I am pretty sure the value exists, because I am copy/pasting it from my trace, yet I get no results. Do I need to escape characters or do something else for this to work?
I did some research on logfmt as well. Based on the documentation of the original implementation and on the Python implementation of the parser (and its tests), I would say that it doesn't support anything more sophisticated (like starts_with, not equal, or contains), because the output of the parser is a simple dictionary (with no regex involved in the values).
As for the second question, using the same mentioned Python parser, I was able to double-check that your filter looks fine:
from logfmt import parse_line
parse_line('http.url="http://example.com?bla=bla&foo=bar"')
Output:
{'http.url': 'http://example.com?bla=bla&foo=bar'}
This makes me suspect an issue on the Jaeger side, but this is as far as I could get.

Is it acceptable to use `to` to create a `Pair`?

to is an infix function in the standard library. It can be used to create Pairs concisely:
0 to "hero"
in comparison with:
Pair(0, "hero")
Typically, it is used to initialize Maps concisely:
mapOf(0 to "hero", 1 to "one", 2 to "two")
However, there are other situations in which one needs to create a Pair. For instance:
"to be or not" to "be"
(0..10).map { it to it * it }
Is it acceptable, stylistically, to (ab)use to in this manner?
Just because a language feature is provided does not mean it is always the better choice. A Pair can be used instead of to and vice versa. The real issue is whether your code remains simple: would a reader need to read the preceding story to understand the current one? Your last map example gives no hint of what it's doing. Imagine someone reading { it to it * it }; they would most likely be confused. I would say this is an abuse.
The to infix offers nice syntactic sugar; IMHO it should be used in conjunction with a well-named variable that tells the reader what this something-to-something is. For example:
val heroPair = Ironman to Spiderman //including a 'pair' in the variable name tells the story what 'to' is doing.
Or you could use scoping functions
(Ironman to Spiderman).let { heroPair -> }
I don't think there's an authoritative answer to this.  The only examples in the Kotlin docs are for creating simple constant maps with mapOf(), but there's no hint that to shouldn't be used elsewhere.
So it'll come down to a matter of personal taste…
For me, I'd be happy to use it anywhere it represents a mapping of some kind, so in a map{…} expression would seem clear to me, just as much as in a mapOf(…) list.  Though (as mentioned elsewhere) it's not often used in complex expressions, so I might use parentheses to keep the precedence clear, and/or simplify the expression so they're not needed.
Where it doesn't indicate a mapping, I'd be much more hesitant to use it.  For example, if you have a method that returns two values, it'd probably be clearer to use an explicit Pair.  (Though in that case, it'd be clearer still to define a simple data class for the return value.)
You asked for personal perspective so here is mine.
I find this syntax a huge win for simple code, especially when reading it. Code full of parentheses causes mental strain; imagine having to review/read a thousand lines of code a day ;(

SDP solver output (SDPA, CSDP ...)

I want to compute a table of SDP solutions. I created a bash script that calls an SDP solver (SDPA or CSDP) on different data sets:
problem1.dat-s
problem2.dat-s
...
Because I want to create a table of numbers, I don't want the whole output (iterations etc.). Is there a way to suppress these messages? Or, even better, a way to create one solution-set file from all the data sets?
Thanks, dalvo
It's been a while since this question was asked; maybe you have found an answer yourself by now. If not, try calling
csdp problem1.dat-s problem1.sol > NUL
csdp problem2.dat-s problem2.sol > NUL
...
This way your solutions get written to a solution file. With CSDP you'll have one vector and two matrices. By reading these files, you can easily build any other set of solutions. The information written to stdout is useless if you're only looking for the solution, since it contains just error values, messages, and timing measures, so redirecting stdout to NUL (or /dev/null on Unix, since you mention a bash script) avoids that output.
I don't know exactly how this would work with SDPA, but judging from the man pages, it should behave the same.
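The calls above can be wrapped in a small script that also concatenates the per-problem solution files into one file, which addresses the "one solution-set file" part of the question. This is a sketch: the file names follow the question, and csdp is assumed to be on PATH.

```shell
#!/bin/sh
# Solve each problem, discarding solver chatter, then collect all
# solution files into a single output file.
for f in problem*.dat-s; do
    [ -e "$f" ] || continue                  # no matching input files: skip
    csdp "$f" "${f%.dat-s}.sol" > /dev/null  # keep only the .sol output
done
cat problem*.sol > all-solutions.txt 2>/dev/null || true
```

Each .sol file still contains the vector and two matrices mentioned above, so all-solutions.txt is a raw concatenation; any further tabulation would be done by a follow-up script that parses those blocks.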