I am having trouble with the code below:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
from pylab import *
import sys
s = (('408b2e00', '24.21'), ('408b2e0c', '22.51'), ('4089e04a', '23.44'), ('4089e04d', '24.10'))
temp = [x[1] for x in s]
print(temp)
figure(figsize=(15, 8))
pts = [(886.38864047695108, 349.78744809964849), (1271.1506973277974, 187.65500904929195), (1237.272277227723, 860.38363675077176), (910.58751197700428, 816.82566805067597)]
x = [p[0] for p in pts] # Extract the values from pts
y = [p[1] for p in pts]
t = temp
result = zip(x,y,t)
img = mpimg.imread('floor.png')
imgplot = plt.imshow(img, cmap=cm.hot)
scatter(x, y, marker='h', c=t, s=150, vmin=-20, vmax=40)
print(t)
# Add cmap
Given the temperatures in s, I am trying to set the colormap range so that I can use temperatures between -10 and 30 instead of having to use values between 0 and 1. I have set the vmin and vmax values, but it still gives me the error below:
ValueError: to_rgba: Invalid rgba arg "23.44" to_rgb: Invalid rgb arg "23.44" gray (string) must be in range 0-1
I have used earlier code to simplify the problem, and that worked. The example below works and shows what I am (hopefully) trying to do:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
from pylab import *
figure(figsize=(15, 8))
# use ginput to select markers for the sensors
markers = [(269, 792, -5), (1661, 800, 20), (1017, 457, 30)]
x,y,t = zip(*markers)
img = mpimg.imread('floor.png')
imgplot = plt.imshow(img, cmap=cm.hot)
scatter(x, y, marker='h', c=t, s=150, vmin=-10, vmax=30)
Any ideas why only the second solution works? I am working with dynamic values, i.e., inputs from MySQL and user-selected points, so the first solution would be much easier to get working later on (the rest of that code is in this question: Full program code).
Any help would be great. Thanks!
You are passing in strings instead of floats; change this line to:
temp = [float(x[1]) for x in s]
Matplotlib tries to be clever about guessing what you mean and lets you specify a gray level as a string holding a float in [0, 1]. That is what it is trying to do with your string values, and it complains because they are not in that range.
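A minimal sketch of both points, assuming a recent Matplotlib and the non-interactive Agg backend (so it runs without a display). `matplotlib.colors.to_rgba` accepts a numeric string in [0, 1] as a gray level and rejects one outside that range; once the temperatures are floats, `c=` is mapped through the colormap between `vmin` and `vmax`. The point coordinates are rounded from the question only to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

s = (('408b2e00', '24.21'), ('408b2e0c', '22.51'),
     ('4089e04a', '23.44'), ('4089e04d', '24.10'))

# A numeric string in [0, 1] is read as a gray level ...
gray = to_rgba('0.5')

# ... so a string such as '23.44' is rejected as a color.
try:
    to_rgba('23.44')
    string_rejected = False
except ValueError:
    string_rejected = True

# Converting to float makes c= numeric, so it goes through the colormap instead.
temp = [float(x[1]) for x in s]
pts = [(886.4, 349.8), (1271.2, 187.7), (1237.3, 860.4), (910.6, 816.8)]  # rounded from the question
xs = [p[0] for p in pts]
ys = [p[1] for p in pts]

fig, ax = plt.subplots(figsize=(6, 4))
sc = ax.scatter(xs, ys, marker='h', c=temp, s=150, cmap='hot', vmin=-10, vmax=30)
fig.colorbar(sc, ax=ax, label='temperature')
```

With floats, the colorbar spans -10 to 30 and each hexagon's color reflects its temperature within that range.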
I have the following dataset, ratings in stars for two fictitious places:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'id':['A','A','A','A','A','A','A','B','B','B','B','B','B'],
Since the rating is categorical (not continuous data), I convert it to a category:
df['rating_cat'] = pd.Categorical(df['rating'])
What I want is to create one bar plot per fictitious place ('A' or 'B') showing the count of each rating. This is the intended plot:
I guess a for loop over each value in id could work, but I have some trouble deciding the size:
fig, ax = plt.subplots(1,2,figsize=(6,6))
axs = ax.flatten()
cats = df['rating_cat'].cat.categories.tolist()
ids_uniques = df.id.unique()
for i in range(len(ids_uniques)):
    ax[i].bar(df[df['id']==ids_uniques[i]], df['rating'].size())
But it raises an error: TypeError: 'int' object is not callable
Perhaps what I am doing is overcomplicated; could you please guide me with this code?
The pure matplotlib way:
from math import ceil
# Prepare the data for plotting
df_plot = df.groupby(["id", "rating"]).size()
unique_ids = df_plot.index.get_level_values("id").unique()
# Calculate the grid spec. This will be a n x 2 grid
# to fit one chart by id
ncols = 2
nrows = ceil(len(unique_ids) / ncols)
fig = plt.figure(figsize=(6,6))
for i, id_ in enumerate(unique_ids):
    # In a figure grid spanning nrows x ncols, plot into the
    # axes at position i + 1
    ax = fig.add_subplot(nrows, ncols, i+1)
    df_plot.xs(id_).plot(ax=ax, kind="bar")
You can simplify things a lot with Seaborn:
import seaborn as sns
sns.catplot(data=df, x="rating", col="id", col_wrap=2, kind="count")
If you're ok with installing a new library, seaborn has a very helpful countplot. Seaborn uses matplotlib under the hood and makes certain plots easier.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'id':['A','A','A','A','A','A','A','B','B','B','B','B','B'],
sns.countplot(data=df, x='rating', hue='id')
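For the pure-matplotlib route, a sketch under stated assumptions: the `rating` column is truncated in the question, so the ratings below are hypothetical. The `TypeError` in the question comes from `df['rating'].size`, which is an integer attribute rather than a method. `pd.crosstab` builds the per-place counts in one step and fills missing combinations with zero, so every subplot shows the same rating categories.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ratings: the question's data set is truncated.
df = pd.DataFrame({'id': ['A', 'A', 'A', 'B', 'B'],
                   'rating': [5, 4, 5, 3, 5]})

# Series.size is an int attribute, so df['rating'].size() tries to call an int.
n_rows = df['rating'].size

# Rows are places, columns are ratings; missing combinations are filled with 0.
counts = pd.crosstab(df['id'], df['rating'])

fig, axes = plt.subplots(1, len(counts.index), figsize=(6, 3),
                         sharey=True, squeeze=False)
for ax, place in zip(axes.flat, counts.index):
    row = counts.loc[place]
    # String labels keep the bars evenly spaced as categories
    ax.bar([str(r) for r in row.index], row.values)
    ax.set_title(place)
    ax.set_xlabel('rating')
axes[0, 0].set_ylabel('count')
fig.tight_layout()
```

`sharey=True` keeps the count axis comparable across places; `squeeze=False` keeps `axes` two-dimensional even when there is only one place.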
I am trying to perform a linear interpolation in Python from a graph that has coordinate values, say (x1, y1) and (x2, y2). With my values I get a straight line in the graph, as in this figure:
My aim is that at 10^6 (the x-axis value) I should get the value of the parameter on the y-axis, but at present I am getting an extrapolated value that is not on the line.
Required output:
I tried the code below:
import matplotlib.pyplot as plt
import math
import numpy as np
x = np.array([1, 10000000])
y = np.array([0.65, 0.25])
BK = np.asarray(np.interp(0.7,x,y))
plt.plot(1000000,BK, marker="o",markersize=10)
plt.plot([1000000,1000000,0],[0,BK,BK], "b--", linewidth=1)
plt.xlim(1, 100000000)
plt.ylim(0, 1)
Note that the line drawn in the chart is completely unrelated to the data because it is a line in the chart, not in data coordinates. An interpolation of that line hence has zero meaning!
If you still want to interpolate along that line, you first need to transform to log space:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 10000000])
y = np.array([0.65, 0.25])
xinp = 1e6
BK = np.asarray(np.interp(np.log(xinp), np.log(x), y))
plt.plot(xinp, BK, marker="o",markersize=10)
plt.plot([1000000,1000000,0],[0,BK,BK], "b--", linewidth=1)
plt.xlim(1, 100000000)
plt.ylim(0, 1)
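A short numerical check of why the log transform matters: interpolating in log10(x) places 10^6 at 6/7 of the way between the end points, giving 0.65 + (0.25 - 0.65)·6/7 ≈ 0.307, whereas plain `np.interp` in linear x gives about 0.61. The helper name `loginterp` is hypothetical. The base of the logarithm cancels out, so `np.log` (as in the answer) and `np.log10` give the same result.

```python
import numpy as np

def loginterp(xq, xp, fp):
    """Hypothetical helper: linear interpolation in log10(x), i.e. along a
    line that looks straight on a logarithmic x-axis."""
    return np.interp(np.log10(xq), np.log10(xp), fp)

x = np.array([1, 10_000_000])
y = np.array([0.65, 0.25])

bk_log = loginterp(1e6, x, y)      # on the straight line of the log-scaled chart
bk_linear = np.interp(1e6, x, y)   # interpolation in linear x, about 0.61
```

Plotting `bk_log` at x = 1e6 on a logarithmic x-axis (`plt.xscale('log')`) should put the marker on the drawn line.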
Does anyone have an idea how to change the x-axis scale and ticks to display a percentile distribution like the graph below? This image is from MATLAB, but I want to generate it with Python (via Matplotlib or Seaborn).
Following the pointer from @paulh, I'm a lot closer now. This code
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import probscale
import seaborn as sns
clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'}
sns.set(style='ticks', context='notebook', palette="muted", rc=clear_bkgd)
fig, ax = plt.subplots(figsize=(8, 4))
ax.set_xscale('prob')  # probability scale registered by importing probscale
x = [30, 60, 80, 90, 95, 97, 98, 98.5, 98.9, 99.1, 99.2, 99.3, 99.4]
y = np.arange(0, 12.1, 1)
ax.set_xlim(40, 99.5)
ax.plot(x, y)
generates the following plot (notice the redistributed x-axis):
which I find much more useful than the standard scale:
I contacted the author of the original graph and they gave me some pointers. It is actually a log-scale graph with the x-axis reversed, plotting values of 100 - val, and with the x-axis ticks labeled manually. The code below recreates the original image with the same sample data as the other graphs here.
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'}
sns.set(style='ticks', context='notebook', palette="muted", rc=clear_bkgd)
x = [30, 60, 80, 90, 95, 97, 98, 98.5, 98.9, 99.1, 99.2, 99.3, 99.4]
y = np.arange(0, 12.1, 1)
# Number of intervals to display.
# Later calculations add 2 to this number to pad it to align with the reversed axis
num_intervals = 3
x_values = 1.0 - 1.0/10**np.arange(0,num_intervals+2)
# Start with hard-coded lengths for 0,90,99
# Rest of array generated to display correct number of decimal places as precision increases
lengths = [1,2,2] + [int(v)+1 for v in list(np.arange(3,num_intervals+2))]
# Build the label string by trimming on the calculated lengths and appending %
labels = [str(100*v)[0:l] + "%" for v,l in zip(x_values, lengths)]
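fig, ax = plt.subplots(figsize=(8, 4))
# Plot the distance from 100% so the high percentiles become small values
ax.plot([100.0 - v for v in x], y)
# Log-scale x-axis, reversed so that higher percentiles sit further to the right
ax.set_xscale('log')
ax.invert_xaxis()
# Place tick i at 100 / 10**i (100, 10, 1, 0.1, ...) and label it with the matching percentile
ax.set_xticks(100.0 / 10.0**np.arange(0, num_intervals+2))
ax.set_xticklabels(labels)
ax.grid(True, linewidth=0.5, zorder=5)
ax.grid(True, which='minor', linewidth=0.5, linestyle=':')
plt.savefig("test.png", dpi=300, format='png')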
This is the resulting graph:
These types of graphs are popular in the low-latency community for plotting latency distributions. When dealing with latencies, most of the interesting information tends to be in the higher percentiles, so a logarithmic view tends to work better. I first saw these graphs used in https://github.com/giltene/jHiccup and https://github.com/HdrHistogram/.
The cited graph was generated by the following MATLAB code:
n = ceil(log10(length(values)));
p = 1 - 1./10.^(0:0.01:n);
percentiles = prctile(values, p * 100);
semilogx(1./(1-p), percentiles);
The x-axis was labelled with the code below:
labels = cell(n+1, 1);
for i = 1:n+1
    labels{i} = getPercentileLabel(i-1);
end
set(gca, 'XTick', 10.^(0:n));
set(gca, 'XTickLabel', labels);
% {'0%' '90%' '99%' '99.9%' '99.99%' '99.999%' '99.9999%'}
function label = getPercentileLabel(i)
    switch i
        case 0
            label = '0%';
        case 1
            label = '90%';
        case 2
            label = '99%';
        otherwise
            label = '99.';
            for k = 1:i-2
                label = [label '9'];
            end
            label = [label '%'];
    end
end
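For readers who want the MATLAB approach in Python directly, a rough port is sketched below. It assumes hypothetical exponentially distributed sample data in place of the recorded latencies, uses `np.percentile` in place of `prctile` (their interpolation rules differ slightly), and `percentile_label` is a hypothetical helper mirroring `getPercentileLabel`. Plotting against 1/(1 - p) on a log axis is what turns 90%, 99% and 99.9% into evenly spaced ticks at 10, 100 and 1000.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def percentile_label(i):
    """Hypothetical port of getPercentileLabel: 0 -> '0%', 1 -> '90%', 2 -> '99%', 3 -> '99.9%', ..."""
    if i == 0:
        return '0%'
    if i == 1:
        return '90%'
    return '99' + ('.' + '9' * (i - 2) if i > 2 else '') + '%'

# Hypothetical sample data standing in for the recorded latencies.
values = np.random.default_rng(0).exponential(scale=1.0, size=10_000)

n = int(np.ceil(np.log10(len(values))))                # MATLAB: ceil(log10(length(values)))
p = 1 - 1.0 / 10.0 ** np.linspace(0, n, 100 * n + 1)   # MATLAB: 1 - 1./10.^(0:0.01:n)
percentiles = np.percentile(values, p * 100)           # MATLAB: prctile(values, p * 100)

fig, ax = plt.subplots(figsize=(8, 4))
ax.semilogx(1.0 / (1 - p), percentiles)                # MATLAB: semilogx(1./(1-p), percentiles)
ax.set_xticks(10.0 ** np.arange(n + 1))                # MATLAB: 10.^(0:n)
ax.set_xticklabels([percentile_label(i) for i in range(n + 1)])
ax.set_ylabel('value')
```

`np.linspace` is used instead of a float-step `np.arange` so the last exponent is exactly `n`.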
The following Python code uses Pandas to read a CSV file that contains a list of recorded latency values (in seconds), records those values (as microseconds) in an HdrHistogram, and saves the HdrHistogram to an hgrm file, which is then read back and plotted with Seaborn as a latency distribution graph.
import pandas as pd
from hdrh.histogram import HdrHistogram
from hdrh.dump import dump
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import sys
import argparse
# Parse the command line arguments.
parser = argparse.ArgumentParser()
args = parser.parse_args()
csv_file = args.csv_file
hgrm_file = args.hgrm_file
png_file = args.png_file
# Read the csv file into a Pandas data frame and generate an hgrm file.
csv_df = pd.read_csv(csv_file, index_col=False)
MAX_LATENCY_USECS = 24 * 60 * 60 * USECS_PER_SEC # 24 hours
# MAX_LATENCY_USECS = int(csv_df['response-time'].max()) * USECS_PER_SEC # 1 hour
for latency_sec in csv_df['response-time'].tolist():
# histogram.record_corrected_value(latency_sec*USECS_PER_SEC, 10)
histogram.output_percentile_distribution(open(hgrm_file, 'wb'), USECS_PER_SEC, TICKS_PER_HALF_DISTANCE)
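# Read the generated hgrm file into a Pandas data frame.
# Column 3 of the hgrm output is 1/(1-Percentile), so the 'Percentile' column below holds
# that transformed value; the x-axis ticks 1, 10, 100, ... therefore correspond to 0%, 90%, 99%, ...
hgrm_df = pd.read_csv(hgrm_file, comment='#', skip_blank_lines=True, sep=r"\s+", engine='python', header=0, names=['Latency', 'Percentile'], usecols=[0, 3])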
# Plot the latency distribution using Seaborn and save it as a png file.
fig, ax = plt.subplots(1,1,figsize=(20,15))
fig.suptitle('Latency Results')
sns.lineplot(x='Percentile', y='Latency', data=hgrm_df, ax=ax)
ax.set_title('Latency Distribution')
ax.set_xlabel('Percentile (%)')
ax.set_ylabel('Latency (seconds)')
ax.set_xticks([1, 10, 100, 1000, 10000, 100000, 1000000, 10000000])
ax.set_xticklabels(['0', '90', '99', '99.9', '99.99', '99.999', '99.9999', '99.99999'])
AFAIK Mayavi does not come with any perceptually uniform colormaps. I tried naively to just pass it one of Matplotlib's colormaps but it failed:
from mayavi import mlab
import multiprocessing
import matplotlib.pyplot as plt
plasma = plt.get_cmap('plasma')
mlab.pipeline.volume(..., colormap=plasma)
TraitError: Cannot set the undefined 'colormap' attribute of a 'VolumeFactory' object.
Edit: I found a guide to converting Matplotlib colormaps to Mayavi colormaps. Unfortunately, it doesn't work because I am trying to render a volume with a perceptually uniform colormap.
from matplotlib.cm import get_cmap
import numpy as np
from mayavi import mlab
values = np.linspace(0., 1., 256)
lut_dict = {}
lut_dict['plasma'] = get_cmap('plasma')(values.copy())
x, y, z = np.ogrid[-10:10:20j, -10:10:20j, -10:10:20j]
s = np.sin(x*y*z)/(x*y*z)
mlab.pipeline.volume(mlab.pipeline.scalar_field(s), vmin=0, vmax=0.8, colormap=lut_dict['plasma']) # still getting the same error
Instead of setting it as the colormap argument, if you set it as the ColorTransferFunction of the volume, it works as expected.
import numpy as np
from mayavi import mlab
from tvtk.util import ctf
from matplotlib.pyplot import cm
values = np.linspace(0., 1., 256)
x, y, z = np.ogrid[-10:10:20j, -10:10:20j, -10:10:20j]
s = np.sin(x*y*z)/(x*y*z)
volume = mlab.pipeline.volume(mlab.pipeline.scalar_field(s), vmin=0, vmax=0.8)
# save the existing colormap
c = ctf.save_ctfs(volume._volume_property)
# change it with the colors of the new colormap
# in this case 'plasma'
c['rgb'] = cm.get_cmap('plasma')(values.copy())
# load the color transfer function to the volume
ctf.load_ctfs(c, volume._volume_property)
# signal for update
volume.update_ctf = True
While the previous answer by like444 helped me partially with a similar problem, it leads to an incorrect translation between colormaps. This is because Matplotlib and tvtk store color information in slightly different formats: Matplotlib uses RGBA, while ColorTransferFunction uses VRGB, where V is the value in the shown data that this part of the colormap is assigned to. So by doing a 1-to-1 copy, green becomes red, blue becomes green and alpha becomes blue. The following code snippet fixes that:
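import numpy as np
import matplotlib.pyplot as plt
from tvtk.util import ctf

def cmap_to_ctf(cmap_name):
    values = list(np.linspace(0, 1, 256))
    # RGBA rows from Matplotlib, shape (256, 4)
    cmap = plt.get_cmap(cmap_name)(values)
    transfer_function = ctf.ColorTransferFunction()
    for i, v in enumerate(values):
        # add_rgb_point takes the data value first, then r, g, b (alpha is dropped)
        transfer_function.add_rgb_point(v, cmap[i, 0], cmap[i, 1], cmap[i, 2])
    return transfer_function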
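To see the layout difference without Mayavi installed, the sketch below (hypothetical helper `cmap_to_vrgb`, assuming only NumPy and Matplotlib) builds the (value, r, g, b) rows that `add_rgb_point` expects from a Matplotlib colormap's RGBA output. Copying the RGBA array verbatim would instead treat red as the data value, green as red, and so on.

```python
import numpy as np
import matplotlib.pyplot as plt

def cmap_to_vrgb(cmap_name, n=256):
    """Hypothetical helper: rows of (value, r, g, b) from a Matplotlib colormap."""
    values = np.linspace(0.0, 1.0, n)
    rgba = plt.get_cmap(cmap_name)(values)   # shape (n, 4): r, g, b, alpha
    # Put the data value first and drop alpha, matching add_rgb_point(v, r, g, b).
    return np.column_stack([values, rgba[:, :3]])

vrgb = cmap_to_vrgb('plasma')
```

Each row could then be fed to a transfer function with `transfer_function.add_rgb_point(v, r, g, b)`, which is what `cmap_to_ctf` does element by element.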
I want to create a scatter plot with matplotlib where the data points have scalar data attached to them and are assigned a color depending on how large their attached value is relative to the other points in the set, i.e., something akin to a heatmap. However, I'm looking for a "discrete" heatmap, i.e., nothing should be plotted where there were no points in the original data set and, in particular, no interpolation (in space) should be performed.
Can this be done?
You can use scatter and pass the attached values as the c parameter:
import numpy as np
import pylab as pl
x = np.random.uniform(-1, 1, 1000)
y = np.random.uniform(-1, 1, 1000)
z = np.sqrt(x*x+y*y)
pl.scatter(x, y, c=z)
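A sketch of what `scatter` does with `c`, assuming the default linear normalization: the values are scaled to [0, 1] using their own minimum and maximum, then looked up in the colormap, so each marker's color reflects where its value sits relative to the rest of the set. Only the given points are drawn; nothing is interpolated between them. The seeded generator and the explicit `cmap='viridis'` are choices made here for reproducibility.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
y = rng.uniform(-1, 1, 1000)
z = np.sqrt(x * x + y * y)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=z, cmap='viridis')   # one marker per data point, no spatial interpolation
fig.colorbar(sc, ax=ax, label='z')

# With no vmin/vmax, the color limits default to min(z) and max(z), so the
# marker colors equal the colormap applied to the min-max normalized values.
expected = plt.get_cmap('viridis')(Normalize(z.min(), z.max())(z))
```

Passing `vmin`/`vmax` fixes the color limits instead, which is useful when several plots must share one scale.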
Solving this in Altair.
import numpy as np
import pandas as pd
import altair as alt
x = np.random.uniform(-1, 1, 1000)
y = np.random.uniform(-1, 1, 1000)
z = np.sqrt(x*x+y*y)
df = pd.DataFrame({'x':x,'y':y, 'z':z})
alt.Chart(df).mark_circle().encode(x='x', y='y', color='z')