I have a list of numpy arrays, each potentially having a different number of elements, such as:
[array([55]),
array([54]),
array([], dtype=float64),
array([48, 55]),]
I would like to plot this, where each array has an abscissa (x value) assigned, such as [1,2,3,4] so that the plot should show the following points: [[1,55], [2, 54], [4, 48], [4, 55]].
Is there a way I can do that with matplotlib? Or how can I transform the data with numpy or pandas first so that it can be plotted?
What you want to do is chain the original arrays and generate a new array with the "abscissas". There are many ways to concatenate them; one of the most efficient is itertools.chain.
import itertools
from numpy import array
x = [array([55]), array([54]), array([]), array([48, 55])]
ys = list(itertools.chain(*x))
# this will be [55, 54, 48, 55]
# generate abscissas
xs = list(itertools.chain(*[[i+1]*len(x1) for i, x1 in enumerate(x)]))
Now you can plot easily with matplotlib:
import matplotlib.pyplot as plt
plt.plot(xs, ys, 'o')  # 'o' draws the points only, without connecting lines
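If you prefer to stay in numpy, the same flattening can be sketched with np.concatenate and np.repeat. This is a swapped-in alternative to the itertools approach, assuming the abscissas are [1, 2, 3, 4] as in the question:

```python
import numpy as np

# Ragged input from the question: one array of y values per abscissa.
arrays = [np.array([55]), np.array([54]), np.array([], dtype=float), np.array([48, 55])]
abscissas = np.array([1, 2, 3, 4])

# Flatten all y values into a single 1-D array; empty arrays contribute nothing.
ys = np.concatenate(arrays)
# Repeat each abscissa once per element of its array so that xs lines up with ys.
xs = np.repeat(abscissas, [len(a) for a in arrays])
# Optional: the (x, y) pairs as an N x 2 array.
points = np.column_stack([xs, ys])
```

The result can then be plotted with plt.plot(xs, ys, 'o') or plt.scatter(xs, ys).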
If you want to have different markers for different groups of data (the colours are automatically cycled by matplotlib):
import numpy as np
import matplotlib.pyplot as plt
markers = ['o', #'circle',
'v', #'triangle_down',
'^', #'triangle_up',
'<', #'triangle_left',
'>', #'triangle_right',
'1', #'tri_down',
'2', #'tri_up',
'3', #'tri_left',
'4', #'tri_right',
'8', #'octagon',
's', #'square',
'p', #'pentagon',
'h', #'hexagon1',
'H', #'hexagon2',
'D', #'diamond',
'd', #'thin_diamond'
]
n_markers = len(markers)
a = [10.*np.random.random(int(np.random.random()*10)) for i in range(n_markers)]
fig = plt.figure()
ax = fig.add_subplot(111)
for i, data in enumerate(a):
    xs = data.shape[0]*[i,] # makes the abscissas list
    marker = markers[i % n_markers] # picks a valid marker
    ax.plot(xs, data, marker, label='data %d, %s'%(i, marker))
ax.set_xlim(-1, 1.4*len(a))
ax.set_ylim(0, 10)
ax.legend(loc=None)
fig.tight_layout()
Note that the y-axis limits are hard-coded; change them accordingly. The 1.4*len(a) is meant to leave room on the right side of the graph for the legend.
In the example run there was no point at x=0 (it would have been dark blue circles) because the randomly chosen size of that data set was zero; you can easily add +1 to the abscissas if you don't want to use x=0.
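A variant of the same idea, sketched under the assumption that you start from the question's list: itertools.cycle hands out markers instead of the i % n_markers indexing, and np.full builds the abscissas. The marker list here is shortened for illustration:

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

groups = [np.array([55]), np.array([54]), np.array([]), np.array([48, 55])]
marker_cycle = itertools.cycle(["o", "v", "^", "s"])

fig, ax = plt.subplots()
# zip stops at the end of groups, so the infinite cycle is safe here.
for x, (data, marker) in enumerate(zip(groups, marker_cycle), start=1):
    # one abscissa per data point; an empty group produces an empty line
    ax.plot(np.full(len(data), x), data, linestyle="none", marker=marker,
            label=f"data {x}")
ax.legend()
```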
Using pandas to create a numpy array with NaNs inserted wherever an array is empty or shorter than the longest array in the list:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
arr_list = [np.array([55]),
np.array([54]),
np.array([], dtype='float64'),
np.array([48, 55]),]
df = pd.DataFrame(arr_list)
list_len = len(df)
repeats = len(list(df))
vals = df.values.flatten()
xax = np.repeat(np.arange(list_len) + 1, repeats)
df_plot = pd.DataFrame({'xax': xax, 'vals': vals})
plt.scatter(df_plot.xax, df_plot.vals);
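For reference, the NaN padding that pd.DataFrame performs can be sketched with numpy alone, assuming the same arr_list; the NaN entries can then be dropped explicitly rather than left for matplotlib to skip:

```python
import numpy as np

arr_list = [np.array([55]), np.array([54]), np.array([], dtype="float64"), np.array([48, 55])]

# Pad every array with NaN up to the length of the longest one.
width = max(len(a) for a in arr_list)
padded = np.full((len(arr_list), width), np.nan)
for row, a in enumerate(arr_list):
    padded[row, :len(a)] = a

# One abscissa per cell, row by row, matching the flattened values.
xax = np.repeat(np.arange(len(arr_list)) + 1, width)
vals = padded.ravel()

# Drop the padding cells so only real data points remain.
keep = ~np.isnan(vals)
xs, ys = xax[keep], vals[keep]
```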
With x being your list:
for i, arr in enumerate(x):
    plt.plot(np.repeat(i, len(arr)), arr, '.')
plt.show()
Alessandro Mariani's answer based on itertools made me think of another way to generate an array containing the data I needed. In some cases it may be more compact. It is also based on itertools.chain:
import itertools
from numpy import array
y = [array([55]), array([54]), array([]), array([48, 55])]
x = array([1,2,3,4])
d = array(list(itertools.chain(*[itertools.product([t], n) for t, n in zip(x,y)])))
d is now the following array:
array([[ 1, 55],
[ 2, 54],
[ 4, 48],
[ 4, 55]])
I have a problem figuring out how to have Seaborn show the right values in a logarithmic barplot. Ideally, each of my values should be 1. My data series (5, 2, 1, 0.5, 0.2) has a set of values that deviate from unity, and I want to visualize these in a logarithmic barplot. However, when plotting this as a standard log barplot, it shows the following:
But the values below 1 are shown as bars rising from -infinity to their value, whereas the real values ought to look like this:
Strangely enough, I was unable to find a Seaborn, Pandas or Matplotlib attribute to "snap" the bars to a different horizontal axis, or an "align" or ymin/ymax option. I suspect I can't find it because I don't know the right terms to search for. Some semi-solutions I found did not match what I was looking for, or had neither a baseline at 1 nor a log y-axis. Here is an attempt that uses some janky Matplotlib lines:
If someone knows the right terms or a solution, thank you in advance.
Here are the Jupyter cells I used:
{1}
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'X': ['A','B','C','D','E'], 'Y': [5,2,1,0.5,0.2]}
df = pd.DataFrame(data)
{2}
%matplotlib widget
g = sns.catplot(data=df, kind="bar", y = "Y", x = "X", log = True)
{3}
%matplotlib widget
plt.vlines(x=data['X'], ymin=1, ymax=data['Y'])
You could let the bars start at 1 instead of at 0. You'll need to use sns.barplot directly.
The example code subtracts 1 from all y-values and sets the bar bottom at 1.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
import seaborn as sns
import pandas as pd
import numpy as np
data = {'X': ['A', 'B', 'C', 'D', 'E'], 'Y': [5, 2, 1, 0.5, 0.2]}
df = pd.DataFrame(data)
ax = sns.barplot(y=df["Y"] - 1, x=df["X"], bottom=1, log=True, palette='flare_r')
ax.axhline(y=1, c='k')
# change the y-ticks, as the default shows too few in this case
ax.set_yticks(np.append(np.arange(.2, .8, .1), np.arange(1, 7, 1)), minor=False)
ax.set_yticks(np.arange(.3, 6, .1), minor=True)
ax.yaxis.set_major_formatter(lambda x, pos: f'{x:.0f}' if x >= 1 else f'{x:.1f}')
ax.yaxis.set_minor_formatter(NullFormatter())
ax.bar_label(ax.containers[0], labels=df["Y"])
sns.despine()
plt.show()
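The same bottom=1 trick can be sketched in plain matplotlib, without seaborn, assuming the labels and values from the question. Each bar gets bottom 1 and height (value - 1), so it spans from 1 to its value; values below 1 get a negative height and hang down from the baseline:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

labels = ["A", "B", "C", "D", "E"]
values = np.array([5, 2, 1, 0.5, 0.2])

fig, ax = plt.subplots()
# bar from 1 to 1 + (v - 1) = v; negative heights extend downwards
bars = ax.bar(labels, values - 1, bottom=1)
ax.set_yscale("log")
ax.axhline(1, color="k")  # the baseline the bars grow from
```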
PS: With these specific values, the plot would also work without a log scale:
This is my second attempt at the same question, and I really hope that someone can help me...
Even though some really nice people tried to help me, there is a lot I couldn't figure out despite their help.
From the beginning:
I created a dataframe. This dataframe is huge and gives information about travellers in a city. The dataframe looks like this (only the head is shown):
In origin and destination you have the IDs of the city locations; move is how many people travelled from origin to destination. Longitude and latitude give the exact location of each point, and the linestring is the combination of the two points.
I created the linestring with this code:
from shapely.geometry import LineString
erg2['Linestring'] = erg2.apply(lambda x: LineString([(x['latitude_origin'], x['longitude_origin']), (x['latitude_destination'], x['longitude_destination'])]), axis = 1)
Now my question is how to plot the routes over a map. Even though I tried all the examples from the geopandas documentation etc., I can't figure it out.
I can't show you what I have already plotted because it doesn't make sense, and I guess it would be smarter to start plotting from the beginning.
You can see that the column move contains some zeros. This means that no one travelled this route, so I don't need to plot it.
I have to plot the lines using the information on where the traveller started (origin) and where they went (destination).
I also need to style the lines differently depending on the number of movements.
I tried this plotting code:
fig = px.line_mapbox(erg2, lat="latitude_origin", lon="longitude_origin", color="move",
hover_name= gdf["origin"] + " - " + gdf["destination"],
center =dict(lon=13.41053,lat=52.52437), zoom=3, height=600
)
fig.update_layout(mapbox_style="stamen-terrain", mapbox_zoom=4, mapbox_center_lat = 52.52437,
margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
Maybe someone has an idea?
I also tried this code:
import requests, io, json
import geopandas as gpd
import shapely.geometry
import pandas as pd
import numpy as np
import itertools
import plotly.express as px
# get some public addressess - hospitals. data that has GPS lat / lon
dfhos = pd.read_csv(io.StringIO(requests.get("http://media.nhschoices.nhs.uk/data/foi/Hospital.csv").text),
sep="¬",engine="python",).loc[:, ["OrganisationName", "Latitude", "Longitude"]]
a = np.arange(len(dfhos))
np.random.shuffle(a)
# establish N links between hospitals
N = 10
df = (
pd.DataFrame({0:a[0:N], 1:a[25:25+N]}).merge(dfhos,left_on=0,right_index=True)
.merge(dfhos,left_on=1, right_index=True, suffixes=("_origin", "_destination"))
)
# build a geopandas data frame that has LineString between two hospitals
gdf = gpd.GeoDataFrame(
data=df,
geometry=df.apply(
lambda r: shapely.geometry.LineString(
[(r["Longitude_origin"], r["Latitude_origin"]),
(r["Longitude_destination"], r["Latitude_destination"]) ]), axis=1)
)
# sample code https://plotly.com/python/lines-on-mapbox/#lines-on-mapbox-maps-from-geopandas
lats = []
lons = []
names = []
for feature, name in zip(gdf.geometry, gdf["OrganisationName_origin"] + " - " + gdf["OrganisationName_destination"]):
    if isinstance(feature, shapely.geometry.linestring.LineString):
        linestrings = [feature]
    elif isinstance(feature, shapely.geometry.multilinestring.MultiLineString):
        linestrings = feature.geoms
    else:
        continue
    for linestring in linestrings:
        x, y = linestring.xy
        lats = np.append(lats, y)
        lons = np.append(lons, x)
        names = np.append(names, [name]*len(y))
        lats = np.append(lats, None)
        lons = np.append(lons, None)
        names = np.append(names, None)
fig = px.line_mapbox(lat=lats, lon=lons, hover_name=names)
fig.update_layout(mapbox_style="stamen-terrain",
mapbox_zoom=4,
mapbox_center_lon=gdf.total_bounds[[0,2]].mean(),
mapbox_center_lat=gdf.total_bounds[[1,3]].mean(),
margin={"r":0,"t":0,"l":0,"b":0}
)
This looks like the perfect code, but I can't really adapt it to my data.
I am very new to coding, so please be a bit patient ;))
Thanks a lot in advance.
All the best
I previously answered this question: How to plot visualize a Linestring over a map with Python? I suggested that you update that question, and I still recommend that you do.
IMHO, line strings are not the way to go. Plotly does not use line strings, so encoding the data as line strings only to decode them back into numpy arrays adds extra complexity. Check out the examples in the official documentation https://plotly.com/python/lines-on-mapbox/; they make it clear that geopandas is just a source that has to be converted into numpy arrays.
data
It appears your sample data should be a single DataFrame; there is no need for geopandas or line strings.
Almost all of your sample data is unusable, as nearly every row where origin and destination differ has a move of zero, which you note should be excluded.
import pandas as pd
import numpy as np
import plotly.express as px
df = pd.DataFrame({"origin": [88, 88, 88, 88, 88, 87],
"destination": [88, 89, 110, 111, 112, 83],
"move": [20, 0, 5, 0, 0, 10],
"longitude_origin": [13.481016, 13.481016, 13.481016, 13.481016, 13.481016, 13.479667],
"latitude_origin": [52.457055, 52.457055, 52.457055, 52.457055, 52.457055, 52.4796],
"longitude_destination": [13.481016, 13.504075, 13.613772, 13.586891, 13.559341, 13.481016],
"latitude_destination": [52.457055, 52.443923, 52.533194, 52.523562, 52.507418, 52.457055]})
solution
I have further refined the line_array() function from the simplified solution I previously provided, so that it can also be used to encode the hover and color parameters.
# lines in plotly are delimited by None
def line_array(data, cols=None, empty_val=None):
    if isinstance(data, pd.DataFrame):
        vals = data.loc[:, cols].values
    elif isinstance(data, pd.Series):
        a = data.values
        # duplicate each value so it covers both the start and the end point of its line
        vals = np.pad(a.reshape(a.shape[0], -1), [(0, 0), (0, 1)], mode="edge")
    # append an empty_val column, then flatten row by row: start, end, gap, start, end, gap, ...
    return np.pad(vals, [(0, 0), (0, 1)], constant_values=empty_val).reshape(-1)
# only draw lines where move > 0 and destination is different to origin
df = df.loc[df["move"].gt(0) & (df["origin"]!=df["destination"])]
lons = line_array(df, ["longitude_origin", "longitude_destination"])
lats = line_array(df, ["latitude_origin", "latitude_destination"])
fig = px.line_mapbox(
lat=lats,
lon=lons,
hover_name=line_array(
df.loc[:, ["origin", "destination"]].astype(str).apply(" - ".join, axis=1)
),
hover_data={
"move": line_array(df, ["move", "move"], empty_val=-99),
"origin": line_array(df, ["origin", "origin"], empty_val=-99),
},
color=line_array(df, ["origin", "origin"], empty_val=-99),
).update_traces(visible=False, selector={"name": "-99"})
fig.update_layout(
mapbox={
"style": "stamen-terrain",
"zoom": 9.5,
"center": {"lat": lats[0], "lon": lons[0]},
},
margin={"r": 0, "t": 0, "l": 0, "b": 0},
)
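To see the start, end, gap layout that line_array() builds, here is a minimal numpy-only sketch using the longitudes of the two sample rows that survive the move > 0 filter (88 to 110 and 87 to 83). Plotly starts a new segment after each None:

```python
import numpy as np

# Longitudes of the two sample routes kept by the filter.
lon_origin = np.array([13.481016, 13.479667])
lon_destination = np.array([13.613772, 13.481016])

# One row per route: [start, end, None]; flattening row by row gives
# start1, end1, None, start2, end2, None, ...
pairs = np.column_stack([lon_origin, lon_destination]).astype(object)
gaps = np.full((len(pairs), 1), None, dtype=object)
lons = np.hstack([pairs, gaps]).ravel()
```

The same construction applied to the latitude columns gives the matching lats array for px.line_mapbox(lat=lats, lon=lons).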
Here's an example:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
import numpy as np
dfboth = {
'I': [1,2,3,4,5,6],
'S': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
'DVAR': [800, 300, 820, 330, 910, 350],
'CVAR': [1001, 612, 990, 639, 600, 130]}
dfboth = pd.DataFrame(dfboth)
dfboth = dfboth.assign(DVARCHANGE=dfboth['DVAR'].diff(2))
dfboth = dfboth.assign(CVARCHANGE=dfboth['CVAR'].diff(2))
plt.rcParams["figure.figsize"] = (24, 9) # (w, h)
plt.subplot(2,2,1)
plt.plot('I','DVAR', data=dfboth[dfboth.S=="X"])
plt.plot('I','DVARCHANGE', data=dfboth[dfboth.S=="X"])
plt.title("X-D")
plt.legend()
plt.subplot(2,2,2)
plt.plot('I','DVAR', data=dfboth[dfboth.S=="Y"])
plt.plot('I','DVARCHANGE', data=dfboth[dfboth.S=="Y"])
plt.title("Y-D")
plt.legend()
plt.subplot(2,2,3)
plt.plot('I','CVAR', data=dfboth[dfboth.S=="X"])
plt.plot('I','CVARCHANGE', data=dfboth[dfboth.S=="X"])
plt.title("X-C")
plt.legend()
plt.subplot(2,2,4)
plt.plot('I','CVAR', data=dfboth[dfboth.S=="Y"])
plt.plot('I','CVARCHANGE', data=dfboth[dfboth.S=="Y"])
plt.title("Y-C")
plt.legend()
I have a series of data points (a time series), I=1,2,3 ... and they each pertain to a certain 'S', in this example, X and Y. For each reading, we have two variables DVAR and CVAR. I am trying to make this graph
I compare for S X and Y, DVAR and it's change from the previous reading, and CVAR and it's change in previous reading.
You can also see annoying repetition. But I actually have 12 S's not just X and Y. And I have more variables.
I believe there's a much better way of doing this than I have written using either stacked indexes or some kind of pivot table. But I've not been able to figure it out!
You can use a for-loop:
plot_titles = ["X-D", "Y-D", "X-C", "Y-C"]
y1 = ['DVAR', 'DVAR', 'CVAR', 'CVAR']
y2 = [y + 'CHANGE' for y in y1]
data1 = ["X", "Y", "X", "Y"]
for i in range(4):
    plt.subplot(2, 2, i+1)
    plt.plot('I', y1[i], data = dfboth[dfboth.S == data1[i]])
    plt.plot('I', y2[i], data = dfboth[dfboth.S == data1[i]])
    plt.title(plot_titles[i])
    plt.legend()
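Since the question mentions 12 S values and more variables, the loop can also be driven by the data itself. A sketch, assuming rows are sorted by I within each S: groupby("S").diff() computes the change from the previous reading of the same S (for the alternating X/Y sample this equals diff(2)), and plt.subplots builds one row per variable and one column per S:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

dfboth = pd.DataFrame({
    "I": [1, 2, 3, 4, 5, 6],
    "S": ["X", "Y", "X", "Y", "X", "Y"],
    "DVAR": [800, 300, 820, 330, 910, 350],
    "CVAR": [1001, 612, 990, 639, 600, 130],
})

variables = ["DVAR", "CVAR"]
# change from the previous reading of the same S (assumes rows sorted by I)
for var in variables:
    dfboth[var + "CHANGE"] = dfboth.groupby("S")[var].diff()

groups = sorted(dfboth["S"].unique())
fig, axes = plt.subplots(len(variables), len(groups), squeeze=False,
                         figsize=(6 * len(groups), 4 * len(variables)))
for row, var in enumerate(variables):
    for col, s in enumerate(groups):
        sub = dfboth[dfboth["S"] == s]
        ax = axes[row, col]
        ax.plot(sub["I"], sub[var], label=var)
        ax.plot(sub["I"], sub[var + "CHANGE"], label=var + "CHANGE")
        ax.set_title(f"{s}-{var[0]}")  # same pattern as the question, e.g. "X-D"
        ax.legend()
fig.tight_layout()
```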
I have two different data sets. I want to plot histograms of both while keeping the bins the same: the width and range of each bin should be identical.
Data1 = np.array([1,2,3,3,5,6,7,8])
Data2 = np.array([1,2,3,4,6,7,8,8])
n,bins,patches = plt.hist(Data1,bins=20)
plt.ylabel("no of states")
plt.xlabel("bins")
plt.savefig("./DOS")
You can look at the documentation for matplotlib.pyplot.hist and you will see that the bins argument can be an integer (defining the number of bins) or a sequence (defining the edges of the bins themselves).
Therefore, you need to manually define the bins you want to use and pass these to plt.hist:
import matplotlib.pyplot as plt
import numpy as np
bin_edges = [0, 2, 4, 6, 8]
data = np.random.rand(50) * 8
plt.hist(data, bins=bin_edges)
You can pass the bins returned from your first histogram plot as an argument to the second histogram to make sure both have the same bin sizes.
Complete answer:
import numpy as np
import matplotlib.pyplot as plt
Data1 = np.array([1, 2, 3, 3, 5, 6, 7, 8])
Data2 = np.array([1, 2, 3, 4, 6, 7, 8, 8])
n, bins, patches = plt.hist(Data1, bins=20, label='Data 1')
plt.hist(Data2, bins=bins, label='Data 2')
plt.ylabel("no of states")
plt.xlabel("bins")
plt.legend()
plt.show()
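One caveat: bins taken from the first histogram only cover Data1's range, and values outside the outermost edges are not counted. A sketch that computes shared edges over both data sets with np.histogram_bin_edges (a swapped-in numpy helper, not part of the answer above):

```python
import numpy as np

Data1 = np.array([1, 2, 3, 3, 5, 6, 7, 8])
Data2 = np.array([1, 2, 3, 4, 6, 7, 8, 8])

# 20 equal-width bins spanning the combined range of both data sets.
edges = np.histogram_bin_edges(np.concatenate([Data1, Data2]), bins=20)
counts1, _ = np.histogram(Data1, bins=edges)
counts2, _ = np.histogram(Data2, bins=edges)
```

The same edges can be passed to both calls, e.g. plt.hist(Data1, bins=edges) and plt.hist(Data2, bins=edges).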
I am using matplotlib to create 2d line-plots. For the purposes of publication, I would like to have those plots in black and white (not grayscale), and I am struggling to find a non-intrusive solution for that.
Gnuplot automatically alters the dash patterns of different lines; is something similar possible with matplotlib?
Below I provide functions to convert a colored line to a black line with a unique style. My quick test showed that the colors repeat after 7 lines. If this is not the case (and I made a mistake), then a minor adjustment to the "constant" COLORMAP in the provided routine is needed.
Here's the routine and example:
import matplotlib.pyplot as plt
import numpy as np
def setAxLinesBW(ax):
    """
    Take each Line2D in the axes, ax, and convert the line style to be
    suitable for black and white viewing.
    """
    MARKERSIZE = 3
    COLORMAP = {
        'b': {'marker': None, 'dash': (None,None)},
        'g': {'marker': None, 'dash': [5,5]},
        'r': {'marker': None, 'dash': [5,3,1,3]},
        'c': {'marker': None, 'dash': [1,3]},
        'm': {'marker': None, 'dash': [5,2,5,2,5,10]},
        'y': {'marker': None, 'dash': [5,3,1,2,1,10]},
        'k': {'marker': 'o', 'dash': (None,None)} #[1,2,1,10]}
        }
    lines_to_adjust = ax.get_lines()
    try:
        lines_to_adjust += ax.get_legend().get_lines()
    except AttributeError:
        pass
    for line in lines_to_adjust:
        origColor = line.get_color()
        line.set_color('black')
        line.set_dashes(COLORMAP[origColor]['dash'])
        line.set_marker(COLORMAP[origColor]['marker'])
        line.set_markersize(MARKERSIZE)
def setFigLinesBW(fig):
    """
    Take each axes in the figure, and for each line in the axes, make the
    line viewable in black and white.
    """
    for ax in fig.get_axes():
        setAxLinesBW(ax)
xval = np.arange(100)*.01
fig = plt.figure()
ax = fig.add_subplot(211)
ax.plot(xval,np.cos(2*np.pi*xval))
ax.plot(xval,np.cos(3*np.pi*xval))
ax.plot(xval,np.cos(4*np.pi*xval))
ax.plot(xval,np.cos(5*np.pi*xval))
ax.plot(xval,np.cos(6*np.pi*xval))
ax.plot(xval,np.cos(7*np.pi*xval))
ax.plot(xval,np.cos(8*np.pi*xval))
ax = fig.add_subplot(212)
ax.plot(xval,np.cos(2*np.pi*xval))
ax.plot(xval,np.cos(3*np.pi*xval))
ax.plot(xval,np.cos(4*np.pi*xval))
ax.plot(xval,np.cos(5*np.pi*xval))
ax.plot(xval,np.cos(6*np.pi*xval))
ax.plot(xval,np.cos(7*np.pi*xval))
ax.plot(xval,np.cos(8*np.pi*xval))
fig.savefig("colorDemo.png")
setFigLinesBW(fig)
fig.savefig("bwDemo.png")
This provides the following two plots:
First in color:
Then in black and white:
You can adjust how each color is converted to a style. If you only want to play with the dash style (-. vs. -- vs. whatever pattern you want), set the corresponding COLORMAP 'marker' value to None and adjust the 'dash' pattern, or vice versa.
For example, the last color in the dictionary is 'k' (for black); originally I had only a dash pattern [1,2,1,10], corresponding to one point on, two off, one on and ten off, which is a dot-dot-space pattern. Then I commented that out, setting the dash to (None,None), a very formal way of saying solid line, and added the marker 'o' for circle.
I also set a 'constant' MARKERSIZE, which will set the size of each marker, because I found the default size to be a little large.
This obviously does not handle the case where your lines already have a dash or marker pattern, but you can use these routines as a starting point to build a more sophisticated converter. For example, if your original plot had a red solid line and a red dotted line, both would turn into black dash-dot lines with these routines. Keep this in mind when you use them.
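With matplotlib 2.0 and later, the default color cycle uses hex codes such as '#1f77b4' instead of 'b', 'g', ..., so looking up line.get_color() in a COLORMAP keyed on single letters raises a KeyError. A sketch of a variant (not the routine above) that assigns black-and-white styles by line order instead; the style list is an illustrative assumption:

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# (linestyle, marker) pairs handed out in line order; "" means no marker.
BW_STYLES = [("-", ""), ("--", ""), ("-.", ""), (":", ""),
             ("-", "o"), ("--", "s"), (":", "^")]


def set_ax_lines_bw(ax, markersize=3):
    """Make every line in ax black, choosing its style by position, not by colour."""
    for line, (linestyle, marker) in zip(ax.get_lines(), itertools.cycle(BW_STYLES)):
        line.set_color("black")
        line.set_linestyle(linestyle)
        line.set_marker(marker)
        line.set_markersize(markersize)


xval = np.arange(100) * 0.01
fig, ax = plt.subplots()
for k in range(2, 9):
    ax.plot(xval, np.cos(k * np.pi * xval), label=f"k={k}")
set_ax_lines_bw(ax)
ax.legend()  # created after restyling, so the legend samples show the new styles
```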
TL;DR
import matplotlib.pyplot as plt
from cycler import cycler
monochrome = (cycler('color', ['k']) * cycler('marker', ['', '.']) *
              cycler('linestyle', ['-', '--', ':', '-.']))
plt.rc('axes', prop_cycle=monochrome)
...
Extended answer
Newer matplotlib releases (1.5 and later) introduced a new rcParam, axes.prop_cycle:
In [1]: import matplotlib.pyplot as plt
In [2]: plt.rcParams['axes.prop_cycle']
Out[2]: cycler('color', ['b', 'g', 'r', 'c', 'm', 'y', 'k'])
For the predefined styles, available through plt.style.use(...) or with plt.style.context(...):, prop_cycle is equivalent to the traditional and now deprecated axes.color_cycle:
In [3]: plt.rcParams['axes.color_cycle']
/.../__init__.py:892: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
warnings.warn(self.msg_depr % (key, alt_key))
Out[3]: ['b', 'g', 'r', 'c', 'm', 'y', 'k']
But the cycler object offers many more possibilities; in particular, a complex cycler can be composed from simpler ones that refer to different properties, using + and *, which mean zipping and Cartesian product respectively.
Here we import the cycler helper function, define three simple cyclers that refer to different properties, and finally compose them using the Cartesian product:
In [4]: from cycler import cycler
In [5]: color_c = cycler('color', ['k'])
In [6]: style_c = cycler('linestyle', ['-', '--', ':', '-.'])
In [7]: markr_c = cycler('marker', ['', '.', 'o'])
In [8]: c_cms = color_c * markr_c * style_c
In [9]: c_csm = color_c * style_c * markr_c
Here we have two different(?) complex cyclers, and yes, they are different because this operation is non-commutative; have a look:
In [10]: for d in c_csm: print('\t'.join(d[k] for k in d))
- k
- . k
- o k
-- k
-- . k
-- o k
: k
: . k
: o k
-. k
-. . k
-. o k
In [11]: for d in c_cms: print('\t'.join(d[k] for k in d))
- k
-- k
: k
-. k
- . k
-- . k
: . k
-. . k
- o k
-- o k
: o k
-. o k
In the product, the last elementary cycler is the one that changes fastest, and so on; this is important if we want a certain order in the styling of lines.
How do you use the composition of cyclers? By means of plt.rc, or an equivalent way of modifying matplotlib's rcParams, e.g.,
In [12]: %matplotlib
Using matplotlib backend: Qt4Agg
In [13]: import numpy as np
In [14]: x = np.linspace(0, 8, 101)
In [15]: y = np.cos(np.arange(7)+x[:,None])
In [16]: plt.rc('axes', prop_cycle=c_cms)
In [17]: plt.plot(x, y);
In [18]: plt.grid();
Of course this is just an example, and the OP can mix and match different properties to achieve the most pleasing visual output.
PS: I forgot to mention that this approach automatically takes care of the line samples in the legend box.
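If you would rather not change the global rcParams, the same composed cycler can be applied to a single Axes with ax.set_prop_cycle. A sketch reusing the TL;DR monochrome cycler (written with cycler's keyword form); it also confirms that the last factor of the product changes fastest:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

monochrome = (cycler(color=["k"]) * cycler(marker=["", "."])
              * cycler(linestyle=["-", "--", ":", "-."]))

# 1 * 2 * 4 = 8 styles; linestyle (the last factor) changes fastest.
styles = list(monochrome)

fig, ax = plt.subplots()
# Apply the cycle to this Axes only, instead of globally through plt.rc.
ax.set_prop_cycle(monochrome)
x = np.linspace(0, 8, 101)
for k in range(5):
    ax.plot(x, np.cos(k + x), label=f"k={k}")
ax.legend()
```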
I relied heavily on Yann's code, but today I read an answer to Can i cycle through line styles in matplotlib, so now I make my B&W plots this way:
import matplotlib.pyplot as plt
from itertools import cycle
lines = ["k-","k--","k-.","k:"]
linecycler = cycle(lines)
plt.figure()
for i in range(4):
    x = range(i,i+10)
    plt.plot(range(10),x,next(linecycler))
plt.show()
Things like plot(x,y,'k-.') will produce a black ('k') dash-dotted ('-.') line. Is that not what you are looking for?