Maximum values along axis of Numpy ndarray? - numpy

I'm afraid I can't describe the problem well, so I drew a sketch of it. Anyway, what I need is to find the max values along the 0th axis of a numpy ndarray, i.e. array.shape is (5,5,3), and their corresponding "layer numbers", and then use the "layer numbers" to create a new 2D array with a shape of (1,5,3). I hope this description is clear. Thanks a lot.

If you check the documentation of np.max, you'll see it takes an axis argument:
a.max(axis=0)
But that won't help you yet. However, there's a function argmax that gives you the indices of the maxima along a given axis:
a.argmax(axis=...)
So, let's find your first (5,5) array: it's a[...,0]. You can find the positions of the maxima per row (or per column) with a[...,0].argmax(axis=1) (or axis=0), and use those positions to look up the values in the other layers.
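A minimal sketch of how this could look for the (5,5,3) array from the question. The random test array and the names max_vals, layers and recovered are assumptions made for illustration, and np.take_along_axis needs NumPy 1.15 or newer.

```python
import numpy as np

# Illustrative data: 5 "layers" along axis 0, each of shape (5, 3).
rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=(5, 5, 3))

# Maximum value over the 0th axis, one per (row, column) position.
max_vals = a.max(axis=0)    # shape (5, 3)

# Index of the layer (position along axis 0) that holds each maximum.
layers = a.argmax(axis=0)   # shape (5, 3)

# Use the layer numbers to pull the maxima back out of the array.
# The index array needs a leading axis so it lines up with axis 0 of a.
recovered = np.take_along_axis(a, layers[np.newaxis, ...], axis=0)  # shape (1, 5, 3)
```

Here layers[np.newaxis, ...] already has the (1,5,3) shape asked for in the question, and recovered[0] holds the same values as max_vals.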

Related

How to fill the different elements of a matrix in python

I want to have a 2D Matrix and then fill the elements of this matrix with different values. I know that I need to create a matrix first with the following definition:
Matrix = np.zeros(10,10)
Now my question is how I can fill the elements of this matrix with values, for example set the element at [4][7] to 5. Thanks
Be careful, because the right syntax for a 10x10 matrix filled with zeros is Matrix = np.zeros((10,10)). Then you can simply write, on a separate line, Matrix[4][7] = 5. I advise you to read a tutorial or an introductory book on Python.
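Put together, the two steps might look like the sketch below. The lowercase name matrix and the single-bracket index matrix[4, 7] are choices made here for illustration; for a NumPy array, matrix[4][7] = 5 sets the same element.

```python
import numpy as np

# The shape is passed as one tuple, hence the double parentheses.
matrix = np.zeros((10, 10))

# Set the element in row 4, column 7 (both zero-based) to 5.
matrix[4, 7] = 5
```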

Finding the number of times a line will intersect with other lines on the plot

Using
plt.plot(x[i:i+2], y[i:i+2], 'ro-')
to create some line segments:
If one plots x = 0.6, is there a matplotlib built in method of finding the number of times it will intersect with lines that have already been plotted on the graph?
For a given segment where you know x[2*i] and x[2*i+1], you have an intersection if your given x falls in between. The best way to check is to compute (x-x[2*i])*(x-x[2*i+1]). If it is less than zero, you have an intersection. If it is equal to zero, one of the end points is on your x=0.6 line. If it is greater than zero, the ends of the segment are on the same side of the line, so there is no intersection.
To program this, assuming that x is a numpy array:
prod=(0.6-x[::2])*(0.6-x[1::2])
And the number of intersections is len(numpy.where(prod<=0)[0]) (use prod<0 instead if a segment that only touches the line with an end point should not count).
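As a small sketch of the sign test described above: the helper name count_vertical_crossings and the sample coordinates are made up for illustration, and the code assumes, like the answer, that the segments are stored as consecutive pairs (x[0], x[1]), (x[2], x[3]), and so on.

```python
import numpy as np

def count_vertical_crossings(x, x0):
    """Count segments whose x-extent contains the vertical line x = x0.

    Segment i is assumed to run from x[2*i] to x[2*i + 1].
    Segments that only touch the line with an end point are counted too.
    """
    x = np.asarray(x, dtype=float)
    starts, ends = x[::2], x[1::2]
    # Negative product: end points on opposite sides of the line.
    # Zero product: an end point lies exactly on the line.
    prod = (x0 - starts) * (x0 - ends)
    return int(np.count_nonzero(prod <= 0))

# Four segments: one crosses x = 0.6, one ends on it, two miss it.
x = np.array([0.0, 1.0, 0.7, 0.9, 0.2, 0.6, 0.1, 0.3])
n_crossings = count_vertical_crossings(x, 0.6)
```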

Interpreting the Y values of a normal distribution

I've written this code to generate a normal distribution of a set of values 1,2,3 :
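import pandas as pd
import random
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.subplots below
df = pd.DataFrame({'col1':[1,2,3]})
print(df)
fig, ax = plt.subplots(1,1)
df.plot(kind='hist', normed=True, ax=ax)  # newer Matplotlib versions use density=True instead of normed=True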
Returns :
The X values are the range of possible values but how are the Y values interpreted ?
Reading http://www.stat.yale.edu/Courses/1997-98/101/normal.htm the Y value is calculated using :
A normal distribution has a bell-shaped density curve described by its
mean μ and standard deviation σ. The density curve is symmetrical,
centered about its mean, with its spread determined by its standard
deviation. The height of a normal density curve at a given point x is
given by
y = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²))
What is the meaning of this formula ?
I think you are confusing two concepts here. A histogram just plots how many times a certain value appears. So for your list of [1,2,3], the value 1 appears once, and the same goes for 2 and 3. If you had set normed=False, you would get the plot you have now, but with bars of height 1.0.
However, when you set normed=True, you turn on normalization. Note that this does not have anything to do with a normal distribution. Have a look at the documentation for hist, which you can find here: http://matplotlib.org/api/pyplot_api.html?highlight=hist#matplotlib.pyplot.hist
There you can see what the normed option does:
If True, the first element of the return tuple will be the counts normalized to form a probability density, i.e., n/(len(x)*dbin), i.e., the integral of the histogram will sum to 1. If stacked is also True, the sum of the histograms is normalized to 1.
So it gives you the formula right there. In your case, you have three points, i.e. len(x)=3. If you look at your plot, you can see that your bins have a width of 0.2, so dbin=0.2. Each of the values 1, 2, and 3 appears only once, so n=1 for each of them. Thus the height of your bars should be 1/(3*0.2) = 1.67, which is exactly what you see in your histogram.
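The arithmetic can be checked without plotting. The sketch below assumes np.histogram with density=True as a stand-in for the normalization option discussed above (it applies the same n/(len(x)*dbin) scaling) and uses 10 bins, which is the default bin count for the plot in the question.

```python
import numpy as np

values = np.array([1, 2, 3])

# 10 equal-width bins over [1, 3] give a bin width of 0.2.
density, edges = np.histogram(values, bins=10, density=True)
bin_width = edges[1] - edges[0]

# Height of each occupied bar: n / (len(x) * dbin) with n = 1.
expected_height = 1 / (len(values) * bin_width)

# The bar heights times the bin widths add up to 1 (the "area" of the histogram).
area = np.sum(density * np.diff(edges))
```

With these inputs, the three occupied bins all have height expected_height (about 1.67) and the remaining bins are empty.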
Now for the normal distribution: that is just a specific probability density function, defined by the formula you gave. It is useful in many fields as it relates to uncertainties. You'll see it a lot in statistics, for example. The Wikipedia article on it has lots of info.
If you want to generate a list of values that conform to a normal distribution, I would suggest reading the documentation of numpy.random.normal, which will do this for you: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.normal.html
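To make the two ideas concrete, here is a rough sketch that evaluates the normal density formula directly and draws random samples. The helper normal_pdf is a hypothetical name written for this example, and the sampling uses NumPy's Generator interface (np.random.default_rng), whose normal method takes the same loc and scale parameters as numpy.random.normal.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve with mean mu and std sigma at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# The curve peaks at the mean, with height 1 / (sigma * sqrt(2 * pi)).
peak = normal_pdf(0.0)

# Draw values that follow a normal distribution with mean 0 and std 1.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)
```

A histogram of samples normalized to unit area should roughly follow the curve given by normal_pdf.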

What exactly do the whiskers in pandas' boxplots specify?

In python-pandas boxplots with default settings, the red bar is the mean (correction: the median), and the box marks the 25th and 75th percentiles, but what exactly do the whiskers mean in this case? Where is the documentation to figure out the exact definition (couldn't find it)?
Example code:
df.boxplot()
Example result:
Pandas just wraps the boxplot function from matplotlib. The matplotlib docs have the definition of the whiskers in detail:
whis : float, sequence, or string (default = 1.5)
As a float, determines the reach of the whiskers to the beyond the
first and third quartiles. In other words, where IQR is the
interquartile range (Q3-Q1), the upper whisker will extend to last
datum less than Q3 + whis*IQR). Similarly, the lower whisker will
extend to the first datum greater than Q1 - whis*IQR. Beyond the
whiskers, data are considered outliers and are plotted as individual
points.
Matplotlib (and Pandas) also gives you a lot of options to change this default definition of the whiskers:
Set this to an unreasonably high value to force the whiskers to show
the min and max values. Alternatively, set this to an ascending
sequence of percentile (e.g., [5, 95]) to set the whiskers at specific
percentiles of the data. Finally, whis can be the string 'range' to
force the whiskers to the min and max of the data.
Below is a graphic that illustrates this, from a stats.stackexchange answer. Note that k=1.5 if you don't supply the whis keyword in Pandas.
From Amelio Vazquez-Reina's answer in Boxplots in matplotlib: Markers and outliers:
The outliers (the + markers in the boxplot) are simply points outside of the wide [(Q1-1.5 IQR), (Q3+1.5 IQR)] margin below.
FYI: Confused by location of fences in box-whisker plots
You mention in your question that the red line is the mean - it is actually the median.
From the matplotlib link mentioned by Chang She above:
The box extends from the lower to upper quartile values of the data,
with a line at the median. The whiskers extend from the box to show
the range of the data. Flier points are those past the end of the
whiskers.
I didn't experiment, but there is a 'meanline' option which might put the line at the mean.
These are specified in the matplotlib documentation. The whiskers reach to the most extreme data points that lie within some multiple (1.5 by default) of the interquartile range from the edges of the box.
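The whisker rule quoted above can be worked through by hand. This is only a sketch of the default 1.5*IQR rule: the function name whisker_ends and the sample data are invented for illustration, and the quartiles come from np.percentile with its default interpolation, which may differ slightly from other quartile conventions.

```python
import numpy as np

def whisker_ends(data, whis=1.5):
    """Return (lower whisker, upper whisker, outliers) for the whis*IQR rule."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    lower = data[data >= low_fence].min()
    upper = data[data <= high_fence].max()
    # Everything beyond the fences is drawn as an individual point.
    outliers = data[(data < low_fence) | (data > high_fence)]
    return lower, upper, outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
lower, upper, outliers = whisker_ends(data)
```

For this data the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5: the whiskers end at 1 and 9, and 30 is the only outlier.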

How to change text of y-axes on a matplotlib generated picture

The page is
"http://matplotlib.sourceforge.net/examples/pylab_examples/histogram_demo_extended.html"
Let's look at the y-axis, the numbers there do not make any sense, could we change it to something else that is meaningful?
Except for the cumulative distribution plot and the last one, the y-axes show normalized histogram values, because the normed=1 keyword is set (i.e., the area underneath the histogram equals 1, as in the definition of a probability density function (PDF)).
You can use yticks(), see this example.