I'm having a problem with matplotlib. I'm using the Linux distro on Chrome OS. When I try to create a graph, it doesn't show me anything, just this:
[screenshot: empty plot window titled "Graph"]
This is the script:
import matplotlib.pyplot as plt
x = [10,20,30,40,50,60]
bin = [10]
plt.hist(x,bin)
plt.show()
My IDE is Vim
This happened because you supplied the wrong argument to plt.hist. According to the documentation:
bins : int or sequence or str, default: :rc:`hist.bins`
If *bins* is an integer, it defines the number of equal-width bins
in the range.
If *bins* is a sequence, it defines the bin edges, including the
left edge of the first bin and the right edge of the last bin;
in this case, bins may be unequally spaced. All but the last
(righthand-most) bin is half-open. In other words, if *bins* is::
[1, 2, 3, 4]
then the first bin is ``[1, 2)`` (including 1, but excluding 2) and
the second ``[2, 3)``. The last bin, however, is ``[3, 4]``, which
*includes* 4.
If *bins* is a string, it is one of the binning strategies
supported by `numpy.histogram_bin_edges`: 'auto', 'fd', 'doane',
'scott', 'stone', 'rice', 'sturges', or 'sqrt'.
So by supplying [10], you've given only the start of a bin, not its end, so nothing is generated. You can fix this with plt.hist(x, 10) or by passing the full list of bin edges, e.g. plt.hist(x, [10, 20, 30, ...]).
In addition, try to avoid shadowing built-in functions like bin. It won't affect you now, but it might haunt you later.
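For example, a minimal corrected version of your script (renaming bin to bin_edges to avoid the shadowing; the exact edges are my assumption, pick whatever suits your data):

import matplotlib.pyplot as plt

x = [10, 20, 30, 40, 50, 60]
bin_edges = [10, 20, 30, 40, 50, 60, 70]  # explicit edges: six bins of width 10
plt.hist(x, bin_edges)
plt.show()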
My goal is to create a stratigraphic column (colored stacked rectangles) using matplotlib like the example below.
Data is in this format:
depth = [1,2,3,4,5,6,7,8,9,10] #depth (feet) below ground surface
lithotype = [4,4,4,5,5,5,6,6,6,2] #lithology type. 4 = clay, 6 = sand, 2 = silt
I tried matplotlib.patches.Rectangle but it's cumbersome. Wondering if someone has another suggestion.
IMHO, using Rectangle is neither difficult nor cumbersome.
from numpy import ones
from matplotlib.pyplot import show, subplots
from matplotlib.cm import get_cmap
from matplotlib.patches import Rectangle as r
# a simplification is to use, for the lithology types, a qualitative colormap
# here I use Paired, but other qualitative colormaps are displayed in
# https://matplotlib.org/stable/tutorials/colors/colormaps.html#qualitative
qcm = get_cmap('Paired')
# the data, augmented with type descriptions
# note that depths start from zero
depth = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # depth (feet) below ground surface
lithotype = [4, 4, 4, 5, 5, 5, 6, 1, 6, 2] # lithology type.
types = {1:'swiss cheese', 2:'silt', 4:'clay', 5:'silty sand', 6:'sand'}
# prepare the figure
fig, ax = subplots(figsize = (4, 8))
w = 2 # a conventional width, used to size the x-axis and the rectangles
ax.set(xlim=(0, w), xticks=[]) # size the x-axis, no x ticks
ax.set_ylim(ymin=0, ymax=depth[-1])
ax.invert_yaxis()
fig.suptitle('Soil Behaviour Type')
fig.subplots_adjust(right=0.5)
# plot a series of dots that will eventually be covered by the Rectangles,
# so that we can draw a legend
for lt in set(lithotype):
    ax.scatter(lt, depth[1], color=qcm(lt), label=types[lt], zorder=0)
fig.legend(loc='center right')
ax.plot((1,1), (0,depth[-1]), lw=0)
# do the rectangles
for d0, d1, lt in zip(depth, depth[1:], lithotype):
    ax.add_patch(
        r((0, d0),    # coordinates of the upper left corner (y-axis is inverted)
          w, d1-d0,   # conventional width on x, thickness of the layer
          facecolor=qcm(lt), edgecolor='k'))
# That's all, folks!
show()
As you can see, placing the rectangles is not complicated; what is indeed cumbersome is properly preparing the Figure and the Axes.
I know that I omitted part of the qualifying details from my solution, but I hope these omissions won't stop you from profiting from my answer.
I made a package called striplog for handling this sort of data and making these kinds of plots.
The tool can read CSV, LAS, and other formats directly (though the expected format is rather particular), but we can also construct a Striplog object manually. First let's set up the basic data:
depth = [1,2,3,4,5,6,7,8,9,10]
lithotype = [4,4,4,5,5,5,6,6,6,2]
KEY = {2: 'silt', 4: 'clay', 5: 'mud', 6: 'sand'}
Now you need to know that a Striplog is composed of Interval objects, each of which can have one or more Component elements:
from striplog import Striplog, Component, Interval
intervals = []
for top, base, lith in zip(depth, depth[1:], lithotype):
    comp = Component({'lithology': KEY[lith]})
    iv = Interval(top, base, components=[comp])
    intervals.append(iv)
s = Striplog(intervals).merge_neighbours() # Merge like with like.
This results in Striplog(3 Intervals, start=1.0, stop=10.0). Now we'd like to make a plot using an appropriate Legend object.
from striplog import Legend
legend_csv = u"""colour, width, component lithology
#F7E9A6, 3, Sand
#A68374, 2.5, Silt
#99994A, 2, Mud
#666666, 1, Clay"""
legend = Legend.from_csv(text=legend_csv)
s.plot(legend=legend, aspect=2, label='lithology')
Which gives:
[image: the resulting strip log plot]
Admittedly the plotting is a little limited, but it's just matplotlib so you can always add more code. To be honest, if I were to build this tool today, I think I'd probably leave the plotting out entirely; it's often easier for the user to do their own thing.
Why go to all this trouble? Fair question. striplog lets you merge zones, make thickness or lithology histograms, make queries ("show me sandstone beds thicker than 2 m"), make 'flags', export LAS or CSV, and even do Markov chain sequence analysis. But even if it's not what you're looking for, maybe you can recycle some of the plotting code! Good luck.
Given that I know the number of axes, can I specify it in the type hint npt.NDArray (from import numpy.typing as npt)?
I.e., if I know it is a 3D array, how can I write something like npt.NDArray[3, np.float64]?
On Python 3.9 and 3.10 the following does the job for me:
from typing import Tuple, Literal
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
arr: np.ndarray[Tuple[Literal[2], Literal[3]], np.dtype[np.int_]] = np.array(data)
It is a bit cumbersome, but you might follow numpy issue #16544 for future development on easier specification.
In particular, for now you must declare the full shape and can't only declare the rank of the array.
In the future something like ndarray[Shape[:, :, :], dtype] should be available.
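In the meantime, note that npt.NDArray itself parametrizes only the dtype, not the shape; a minimal sketch of what is supported today (the function is just a made-up example):

import numpy as np
import numpy.typing as npt

def normalize(v: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    # the hint constrains only the dtype; rank and shape are unchecked
    return v / np.linalg.norm(v)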
The Problem
Good evening.
I am learning about the Central Limit Theorem. As practice, I ran simulations in an attempt to find the mean of a fair die (I know, a toy problem).
I took 4000 samples, and in each sample I rolled a die 50 times (code at the bottom). For each of these 4000 samples I computed the mean. Then I plotted these 4000 sample means in a histogram (with bin size 0.03) using matplotlib.
Here is the result:
Question
Why aren't the sample means normally distributed given that the conditions for CLT (sample size >= 30) were respected?
Specifically, why does the histogram look like two normal distributions superimposed on top of each other? More intriguingly, why does the "outer" distribution look "discrete" with empty spaces occurring at regular intervals?
It almost seems like the result is off in a systematic way.
All help is greatly appreciated. I am very lost.
Supplementary Code
The code I used to generate the 4000 sample means.
"""
Take multiple samples of dice rolls. For
each sample, compute the sample mean.
With the sample means, plot a histogram.
By the Central Limit Theorem, the sample
means should be normally distributed.
"""
import random

sample_means = []
num_samples = 4000
for i in range(num_samples):
    # Large enough for CLT to hold
    num_rolls = 50
    sample = []
    for j in range(num_rolls):
        observation = random.randint(1, 6)
        sample.append(observation)
    sample_mean = sum(sample) / len(sample)
    sample_means.append(sample_mean)
When num_rolls equals 50, each possible mean is a fraction with denominator 50, i.e. a multiple of 1/50 = 0.02. So, in reality, you are looking at a discrete distribution.
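To make that discreteness concrete, the possible values of a sample mean are easy to enumerate (a small sketch):

# every sample mean is (sum of 50 die rolls) / 50, so the possible values
# are k/50 for integers k from 50*1 to 50*6
possible_means = [k / 50 for k in range(50, 301)]
print(possible_means[:4])  # [1.0, 1.02, 1.04, 1.06]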
To create a histogram of a discrete distribution, the bin boundaries are best placed nicely in-between the values. Using a step size of 0.03, some bin boundaries will coincide with the values, putting the double of values into one bin compared to its neighbor. Moreover, due to subtle floating point rounding problems, the result can become unpredictable when values and boundaries coincide.
Here is some code to illustrate what is going on:
from matplotlib import pyplot as plt
import numpy as np
import random
sample_means = []
num_samples = 4000
for i in range(num_samples):
    num_rolls = 50
    sample = []
    for j in range(num_rolls):
        observation = random.randint(1, 6)
        sample.append(observation)
    sample_mean = sum(sample) / len(sample)
    sample_means.append(sample_mean)
fig, axs = plt.subplots(2, 2, figsize=(14, 8))
random_y = np.random.rand(len(sample_means))
for (ax0, ax1), step in zip(axs, [0.03, 0.02]):
    bins = np.arange(3.01, 4, step)
    ax0.hist(sample_means, bins=bins)
    ax0.set_title(f'step={step}')
    ax0.vlines(bins, 0, ax0.get_ylim()[1], ls=':', color='r')  # show the bin boundaries in red
    ax1.scatter(sample_means, random_y, s=1)  # show the sample means with a random y
    ax1.vlines(bins, 0, 1, ls=':', color='r')  # show the bin boundaries in red
    ax1.set_xticks(np.arange(3, 4, 0.02))
    ax1.set_xlim(3.0, 3.3)  # zoom in to a region to better see the bins
    ax1.set_title('bin boundaries between values' if step == 0.02 else 'chaotic bin boundaries')
plt.show()
PS: Note that the code would run much, much faster if it worked entirely with numpy arrays instead of Python lists.
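For example, a numpy-only version of the simulation could look like this (a sketch using numpy's Generator API; the variable names are mine):

import numpy as np

rng = np.random.default_rng()
rolls = rng.integers(1, 7, size=(4000, 50))  # 4000 samples of 50 die rolls each
sample_means = rolls.mean(axis=1)            # one mean per sample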
Is it possible to trim zero 'records' of a structured numpy array without copying it, i.e. to free the allocated memory for the 'unused' zero entries at the beginning or the end? Actually, I am only interested in trimming zeros at the end.
There is a built-in function numpy.trim_zeros() for 1-D arrays. Its documented return value:
Returns:
trimmed : 1-D array or sequence
The result of trimming the input. The input data type is preserved.
However, I can't tell from this whether it avoids creating a copy and merely frees memory; I am not proficient enough to deduce its behaviour from the source code.
More specifically, I have the following code:
import numpy
edges = numpy.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
# fill the first two records with sensible data:
edges[0]['i'] = 0
edges[0]['j'] = 1
edges[0]['length'] = 2.0
edges[1]['i'] = 1
edges[1]['j'] = 2
edges[1]['length'] = 2.0
# list memory address and size
edges.__array_interface__
edges = numpy.trim_zeros(edges) # does not work for structured array
edges.__array_interface__
UPDATE
My question is somewhat 'twofold':
1) Does the built-in function simply free memory, or does it copy the array?
Answer: it creates a slice (= view); [ipython console] import numpy; numpy.trim_zeros?? (see also Resize NumPy array to smaller size without copy and View onto a numpy array?)
2) What would be a solution to get similar functionality for structured arrays?
Answer:
begin = (edges != numpy.zeros(1, edges.dtype)).argmax()
end = len(edges) - (edges != numpy.zeros(1, edges.dtype))[::-1].argmax()
# 1) create a slice (view) without copying; no memory is freed
goodedges = edges[begin:end]
# 2) or copy and free memory (temporarily both arrays exist)
goodedges = edges[begin:end].copy()
del edges
IMHO, there are two problems.
First, the trim_zeros function doesn't recognize zeros in a composite dtype.
You can locate them with begin = (edges != zeros(1, edges.dtype)).argmax()
and end = len(edges) - (edges != zeros(1, edges.dtype))[::-1].argmax(). Then goodedges = edges[begin:end] is the interesting data.
Second, the trim_zeros function doesn't free memory:
Returns
-------
trimmed : 1-D array or sequence
    The result of trimming the input. The input data type is preserved.
So I think you must do it manually: goodedges = edges[begin:end].copy(); del edges.
To expand on my comment, let's try trim_zeros on a simple integer array:
In [252]: arr = np.zeros(10,int)
In [253]: arr[3:8]=np.ones(5)
In [254]: arr
Out[254]: array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])
In [255]: arr1=np.trim_zeros(arr)
In [256]: arr1
Out[256]: array([1, 1, 1, 1, 1])
Now compare the __array_interface__ dictionaries:
In [257]: arr.__array_interface__
Out[257]:
{'descr': [('', '<i4')],
'shape': (10,),
'version': 3,
'strides': None,
'data': (150760432, False),
'typestr': '<i4'}
In [258]: arr1.__array_interface__
Out[258]:
{'descr': [('', '<i4')],
'shape': (5,),
'version': 3,
'strides': None,
'data': (150760444, False),
'typestr': '<i4'}
shape reflects the change we want. But look at the data pointer, ...432, and ...444. arr1 just points to 12 bytes (3 ints) further along the same buffer.
If I delete arr or reassign it (even arr=arr1), arr1 continues to point to this data buffer. numpy keeps some sort of reference count, and recycles a data buffer only when all references are gone.
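A quick way to observe this relationship (a small sketch): a view keeps a reference to the original array through its .base attribute:

import numpy as np

arr = np.zeros(10, int)
arr[3:8] = 1
arr1 = np.trim_zeros(arr)
print(arr1.base is arr)  # True: arr1 is a view into arr's buffer
del arr                  # the buffer survives because arr1 still references it
print(arr1)              # array([1, 1, 1, 1, 1])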
The code for trim_zeros is (fetched in ipython with '??')
File: /usr/lib/python3/dist-packages/numpy/lib/function_base.py
def trim_zeros(filt, trim='fb'):
    first = 0
    trim = trim.upper()
    if 'F' in trim:
        for i in filt:
            if i != 0.: break
            else: first = first + 1
    last = len(filt)
    if 'B' in trim:
        for i in filt[::-1]:
            if i != 0.: break
            else: last = last - 1
    return filt[first:last]
The work is in the last line, which clearly returns a slice, i.e. a view. Most of the code handles the two trim options (F and B). Notice that it uses iteration to find the first and last non-zeros. That is fine for arrays with just a few extra 0s at the beginning or end, but it isn't the 'vectorized' kind of operation that SO questions often seek.
Before this question I didn't even know that trim_zeros existed, but I'm not at all surprised by its code and action.
On a side issue, here's a more compact way of creating your edges array.
In [259]: edges =np.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
In [260]: edges[:2]=[(0,1,2.0),(1,2,2.0)]
To remove all the zero elements you could just use:
edges[edges!=numpy.zeros(1,edges.dtype)]
This is a copy. It does remove 'embedded' zeros as well, but that might not be an issue if the only zeros are those left at the end after filling in the earlier slots.
You may not need this trimming at all if you collect the edges data in a list, and build the array at the end:
edges1 = np.array([(0,1,2.0),(1,2,2.0)], dtype=edges.dtype)
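For instance, a sketch of that collect-then-build pattern (the hard-coded appends stand in for whatever loop produces your edges):

records = []
records.append((0, 1, 2.0))  # append records as they are produced
records.append((1, 2, 2.0))
edges1 = np.array(records, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])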
I'm running an IPython Notebook:
$ ipython notebook --pylab inline
Is it possible to scale plots or images which are inline?
E.g. I have
pylab.xlabel("Label X")
pylab.ylabel("Label Y")
pylab.scatter(range(2,15,2), [2, 3, 5, 7, 11, 13, 17], c="r")
and I want to have it bigger.
Sure, I can try to manually change parameters, e.g.
pylab.figure(figsize=(12, 8))
pylab.xlabel("Label X", fontsize = 20)
pylab.ylabel("Label Y", fontsize = 20)
pylab.scatter(range(2,15,2), [2, 3, 5, 7, 11, 13, 17], c="r", s=100)
but it's neither convenient nor exact.
In Python v2.7.4 running IPython v0.13 with matplotlib v1.2.0 (32-bit, on Windows 8), I get a "handle" in the lower-right corner for manually resizing the inline plot (keeping aspect ratio and resolution), at least when the figure_format in use is 'png'. I'm not sure about the other formats, but this behavior appears to be absent when 'svg' is used.
You can change the default figure_format by uncommenting the line starting with
# c.InlineBackend.figure_format
in the config-file ipython_notebook_config.py in your profile-folder for IPython, and set this parameter to whatever format you want to use when running the notebook, e.g. 'png'.
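For example, after uncommenting, the line would read:

c.InlineBackend.figure_format = 'png'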
If you want to change the default size of all inline plots, you can change the parameter c.InlineBackend.rc in the same config-file. If you e.g. want to set the figsize to (12, 8), you simply uncomment the relevant line in the file, making it say
c.InlineBackend.rc = {'figure.figsize': (12, 8)}
This parameter can also change the default fontsize, dpi, etc.
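For example (a sketch; the extra keys are standard matplotlib rc parameters and are my additions):

c.InlineBackend.rc = {'figure.figsize': (12, 8),
                      'font.size': 14,
                      'figure.dpi': 100}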