In matplotlib, is there a method to fix or arrange the order of x-values of a mixed type with a character and digits? - matplotlib

There are several Q/As about x-values in matplotlib, and they show that when the x values are int or float, matplotlib plots the figure in the right order of x. For example, when the values are strings, the plot shows the x values in the order of
1 15 17 2 21 7 etc
but when they are converted to int, the order becomes
1 2 7 15 17 21 etc
in human order.
If the x values mix characters and digits, such as
NN8 NN10 NN15 NN20 NN22 etc
the plot will show in the order of
NN10 NN15 NN20 NN22 NN8 etc
Is there a way to fix the order of the x values in human order, or to keep the existing order of the x list, without removing 'NN' from the x-values?
In more detail, the x-values are directory names. Using grep and sort inside a Linux shell function, the results are displayed in the terminal as follows, and can be saved to a text file.
joonho#login:~/NDataNpowN$ get_TEFrmse NN 2 | sort -n -t N -k 3
NN7 0.3311
NN8 0.3221
NN9 0.2457
NN10 0.2462
NN12 0.2607
NN14 0.2635
Without sort, the Linux shell also displays them in machine order, such as
NN10 0.2462
NN12 0.2607
NN14 0.2635
NN7 0.3311
NN8 0.3221
NN9 0.2457
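The "human order" described above is often called natural sorting. As a minimal sketch, assuming labels made of text and digit runs such as NN8 or NN10 (the helper name natural_key is hypothetical, not part of matplotlib), the labels can be sorted in plain Python before they are handed to the plotting call:

```python
import re

def natural_key(label):
    """Split a label into text and integer chunks so 'NN8' sorts before 'NN10'."""
    # re.split with a capture group keeps the digit runs: 'NN10' -> ['NN', '10', '']
    parts = re.split(r"(\d+)", label)
    # Digit runs are compared as integers, everything else as text
    return [int(p) if p.isdigit() else p for p in parts]

labels = ["NN10", "NN12", "NN14", "NN7", "NN8", "NN9"]
ordered = sorted(labels, key=natural_key)
print(ordered)  # ['NN7', 'NN8', 'NN9', 'NN10', 'NN12', 'NN14']
```

Matplotlib places string x-values in the order in which it first sees them, so plotting the labels in this sorted order keeps the 'NN' prefix and the human order.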

As I said, pandas would make this task easier than dealing with base Python lists and such:
import matplotlib.pyplot as plt
import pandas as pd
#imports the text file, assuming that your data are separated by whitespace, as in your example above
df = pd.read_csv("test.txt", sep=r"\s+", names=["X", "Y"])
#extracting the number in a separate column, assuming you do not have terms like NN1B3X5
df["N"] = df.X.str.replace(r"\D", "", regex=True).astype(int)
#this step is only necessary, if your file is not pre-sorted by Linux
df = df.sort_values(by="N")
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
#categorical plotting
df.plot(x="X", y="Y", ax=ax1)
ax1.set_title("Evenly spaced")
#numerical plotting
df.plot(x="N", y="Y", ax=ax2)
ax2.set_xticks(df.N)
ax2.set_xticklabels(df.X)
ax2.set_title("Numerical spacing")
plt.show()
Sample output:
Since you asked if there is a non-pandas solution: of course there is. Pandas just makes some things more convenient. In this case, I would fall back on numpy. Numpy is a matplotlib dependency, so, in contrast to pandas, it is already installed if you use matplotlib:
import matplotlib.pyplot as plt
import numpy as np
import re
#read file as strings
arr = np.genfromtxt("test.txt", dtype="U15")
#strip the non-digit characters to get the numeric part
Xnums = np.asarray([re.sub(r"\D", "", i) for i in arr[:, 0]], dtype=int)
#sort array
arr = arr[np.argsort(Xnums)]
#extract x-values as strings...
Xstr = arr[:, 0]
#...and y-values as float
Yvals = arr[:, 1].astype(float)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
#categorical plotting
ax1.plot(Xstr, Yvals)
ax1.set_title("Evenly spaced")
#numerical plotting
ax2.plot(np.sort(Xnums), Yvals)
ax2.set_xticks(np.sort(Xnums))
ax2.set_xticklabels(Xstr)
ax2.set_title("Numerical spacing")
plt.show()
Sample output:

Related

pandas subplot, split into rows [duplicate]

I have a few Pandas DataFrames sharing the same value scale, but having different columns and indices. When invoking df.plot(), I get separate plot images. What I really want is to have them all in the same plot as subplots, but I'm unfortunately failing to come up with a solution and would highly appreciate some help.
You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2x2):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2)
df1.plot(ax=axes[0,0])
df2.plot(ax=axes[0,1])
...
Here axes is an array which holds the different subplot axes, and you can access one just by indexing axes.
If you want a shared x-axis, then you can provide sharex=True to plt.subplots.
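A minimal sketch of the shared x-axis case, assuming two hypothetical sample frames standing in for df1 and df2 (the Agg backend line is only there so the sketch runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for df1 and df2
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])
df2 = pd.DataFrame(rng.random((10, 2)), columns=["C", "D"])

# Two stacked subplots that share one x-axis
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
df1.plot(ax=axes[0])
df2.plot(ax=axes[1])
```

With sharex=True, only the bottom subplot shows x tick labels, and zooming or panning one subplot moves the other along x.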
You can see examples in the documentation demonstrating joris's answer. Also from the documentation, you could set subplots=True and layout=(rows, cols) within the pandas plot function:
df.plot(subplots=True, layout=(1,2))
You could also use fig.add_subplot() which takes subplot grid parameters such as 221, 222, 223, 224, etc. as described in the post here. Nice examples of plot on pandas data frame, including subplots, can be seen in this ipython notebook.
You can plot multiple subplots of multiple pandas data frames using matplotlib with the simple trick of making a list of all the data frames, then using a for loop to plot the subplots.
Working code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
#define number of rows and columns for subplots
nrow=3
ncol=2
# make a list of all dataframes
df_list = [df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count=0
for r in range(nrow):
    for c in range(ncol):
        df_list[count].plot(ax=axes[r,c])
        count+=1
Using this code you can plot subplots in any configuration. You need to define the number of rows nrow and the number of columns ncol. You also need to make a list, df_list, of the data frames you want to plot.
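The same loop can probably be written without the manual counter by flattening the axes array and pairing it with the list of frames. A sketch, assuming hypothetical random frames and the same nrow and ncol names as above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample frames, six in total
rng = np.random.default_rng(0)
df_list = [pd.DataFrame(rng.random((10, 2)) * 100, columns=["A", "B"])
           for _ in range(6)]

nrow, ncol = 3, 2
# squeeze=False always returns a 2D array, even for a 1x1 grid
fig, axes = plt.subplots(nrow, ncol, squeeze=False)

# axes.flat walks the grid row by row; zip stops at the shorter sequence
for ax, frame in zip(axes.flat, df_list):
    frame.plot(ax=ax)
```

If there are fewer frames than grid cells, the remaining axes simply stay empty.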
You can use the familiar Matplotlib style calling a figure and subplot, but you simply need to specify the current axis using plt.gca(). An example:
plt.figure(1)
plt.subplot(2,2,1)
df.A.plot() #no need to specify for first axis
plt.subplot(2,2,2)
df.B.plot(ax=plt.gca())
plt.subplot(2,2,3)
df.C.plot(ax=plt.gca())
etc...
You can use this:
fig = plt.figure()
ax = fig.add_subplot(221)
plt.plot(x,y)
ax = fig.add_subplot(222)
plt.plot(x,z)
...
plt.show()
You may not need to use Pandas at all. Here's a matplotlib plot of cat frequencies:
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x**2)
f, axes = plt.subplots(2, 1)
for c, i in enumerate(axes):
    axes[c].plot(x, y)
    axes[c].set_title('cats')
plt.tight_layout()
Option 1: Create subplots from a dictionary of dataframes with long (tidy) data
Assumptions:
There is a dictionary of multiple dataframes of tidy data that are either:
Created by reading in from files
Created by separating a single dataframe into multiple dataframes
The categories, cat, may be overlapping, but all dataframes don't necessarily contain all values of cat
hue='cat'
This example uses a dict of dataframes, but a list of dataframes would be similar.
If the dataframes are wide, use pandas.DataFrame.melt to convert them to long form.
Because dataframes are being iterated through, there's no guarantee that colors will be mapped the same for each plot
A custom color map needs to be created from the unique 'cat' values for all the dataframes
Since the colors will be the same, place one legend to the side of the plots, instead of a legend in every plot
Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1, seaborn 0.11.2
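For the wide-to-long conversion mentioned in the assumptions, a small sketch with a hypothetical wide frame (the column names x, A and B are made up for illustration):

```python
import pandas as pd

# Hypothetical wide frame: one column per category
wide = pd.DataFrame({"x": [0.1, 0.2, 0.3],
                     "A": [1.0, 2.0, 3.0],
                     "B": [4.0, 5.0, 6.0]})

# id_vars are kept as-is; the remaining column names become values of 'cat'
long = wide.melt(id_vars="x", var_name="cat", value_name="y")
print(long)
```

The result has one row per (x, cat) pair, which is the tidy shape that hue='cat' expects.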
Imports and Test Data
import pandas as pd
import numpy as np # used for random data
import matplotlib.pyplot as plt
from matplotlib.patches import Patch # for custom legend - square patches
from matplotlib.lines import Line2D # for custom legend - round markers
import seaborn as sns
import math # math.ceil determines the correct number of subplot rows
# synthetic data
df_dict = dict()
for i in range(1, 7):
    np.random.seed(i) # for repeatable sample data
    data_length = 100
    data = {'cat': np.random.choice(['A', 'B', 'C'], size=data_length),
            'x': np.random.rand(data_length), 'y': np.random.rand(data_length)}
    df_dict[i] = pd.DataFrame(data)
# display(df_dict[1].head())
cat x y
0 B 0.944595 0.606329
1 A 0.586555 0.568851
2 A 0.903402 0.317362
3 B 0.137475 0.988616
4 B 0.139276 0.579745
# display(df_dict[6].tail())
cat x y
95 B 0.881222 0.263168
96 A 0.193668 0.636758
97 A 0.824001 0.638832
98 C 0.323998 0.505060
99 C 0.693124 0.737582
Create color mappings and plot
# create color mapping based on all unique values of cat
unique_cat = {cat for v in df_dict.values() for cat in v.cat.unique()} # get unique cats
colors = sns.color_palette('tab10', n_colors=len(unique_cat)) # get a number of colors
cmap = dict(zip(unique_cat, colors)) # zip values to colors
col_nums = 3 # how many plots per row
row_nums = math.ceil(len(df_dict) / col_nums) # how many rows of plots
# create the figure and axes
fig, axes = plt.subplots(row_nums, col_nums, figsize=(9, 6), sharex=True, sharey=True)
# convert to 1D array for easy iteration
axes = axes.flat
# iterate through dictionary and plot
for ax, (k, v) in zip(axes, df_dict.items()):
    sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap, ax=ax)
    sns.despine(top=True, right=True)
    ax.legend_.remove() # remove the individual plot legends
    ax.set_title(f'dataset = {k}', fontsize=11)
fig.tight_layout()
# create legend from cmap
# patches = [Patch(color=v, label=k) for k, v in cmap.items()] # square patches
patches = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v, label=k, markersize=8) for k, v in cmap.items()] # round markers
# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(title='cat', handles=patches, bbox_to_anchor=(1.06, 1.2), loc='center left', borderaxespad=0, frameon=False)
plt.show()
Option 2: Create subplots from a single dataframe with multiple separate datasets
The dataframes must be in a long form with the same column names.
This option uses pd.concat to combine multiple dataframes into a single dataframe, and .assign to add a new column.
See Import multiple csv files into pandas and concatenate into one DataFrame for creating a single dataframes from a list of files.
This option is easier because it doesn't require manually mapping colors to 'cat'
Combine DataFrames
# using df_dict, with dataframes as values, from the top
# combine all the dataframes in df_dict to a single dataframe with an identifier column
df = pd.concat((v.assign(dataset=k) for k, v in df_dict.items()), ignore_index=True)
# display(df.head())
cat x y dataset
0 B 0.944595 0.606329 1
1 A 0.586555 0.568851 1
2 A 0.903402 0.317362 1
3 B 0.137475 0.988616 1
4 B 0.139276 0.579745 1
# display(df.tail())
cat x y dataset
595 B 0.881222 0.263168 6
596 A 0.193668 0.636758 6
597 A 0.824001 0.638832 6
598 C 0.323998 0.505060 6
599 C 0.693124 0.737582 6
Plot a FacetGrid with seaborn.relplot
sns.relplot(kind='scatter', data=df, x='x', y='y', hue='cat', col='dataset', col_wrap=3, height=3)
Both options create the same result; however, it's less complicated to combine all the dataframes and plot a figure-level plot with sns.relplot.
Building on @joris's response above, if you have already established a reference to the subplot, you can use the reference as well. For example,
ax1 = plt.subplot2grid((50,100), (0, 0), colspan=20, rowspan=10)
...
df.plot.barh(ax=ax1, stacked=True)
Here is a working pandas subplot example, where modes is the list of column names of the dataframe.
dpi=200
figure_size=(20, 10)
fig, ax = plt.subplots(len(modes), 1, sharex="all", sharey="all", dpi=dpi)
for i in range(len(modes)):
    ax[i] = pivot_df.loc[:, modes[i]].plot.bar(figsize=(figure_size[0], figure_size[1]*len(modes)),
                                               ax=ax[i], title=modes[i], color=my_colors[i])
    ax[i].legend()
fig.suptitle(name)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2,2)
df = pd.DataFrame({'A':np.random.randint(1,100,10),
                   'B': np.random.randint(100,1000,10),
                   'C':np.random.randint(100,200,10)})
# plot the same dataframe on each of the four axes
for ax in axes.flatten():
    df.plot(ax=ax)

Combining Pandas Subplots into a Single Figure

I'm having trouble understanding Pandas subplots - and how to create axes so that all subplots are shown (not over-written by subsequent plot).
For each "Site", I want to make a time-series plot of all columns in the dataframe.
The "Sites" here are 'shark' and 'unicorn', both with 2 variables. The output should be 4 plotted lines: the time-indexed plots for Var1 and Var2 at each site.
Make Time-Indexed Data with NaNs:
# requires: import numpy as np, import pandas as pd
df = pd.DataFrame({
    # some ways to create random data
    'Var1':np.random.randn(100),
    'Var2':np.random.randn(100),
    'Site':np.random.choice( ['unicorn','shark'], 100),
    # a date range and set of random dates
    'Date':pd.date_range('1/1/2011', periods=100, freq='D'),
    # 'f':np.random.choice( pd.date_range('1/1/2011', periods=365,
    # freq='D'), 100, replace=False)
    })
df.set_index('Date', inplace=True)
df['Var2']=df.Var2.cumsum()
df.loc['2011-01-31' :'2011-04-01', 'Var1']=np.nan
Make a figure with a sub-plot for each site:
fig, ax = plt.subplots(len(df.Site.unique()), 1)
counter=0
for site in df.Site.unique():
    print(site)
    sitedat=df[df.Site==site]
    sitedat.plot(subplots=True, ax=ax[counter], sharex=True)
    ax[0].title=site #Set title of the plot to the name of the site
    counter=counter+1
plt.show()
However, this is not working as written. The second sub-plot ends up overwriting the first. In my actual use case, I have 14 variable number of sites in each dataframe, as well as a variable number of 'Var1, 2, ...'. Thus, I need a solution that does not require creating each axis (ax0, ax1, ...) by hand.
As a bonus, I would love a title of each 'site' above that set of plots.
The current code over-writes the first 'Site' plot with the second. What am I missing with the axes here?!
When you are using DataFrame.plot(..., subplots=True) you need to provide the correct number of axes that will be used for each column (and with the right geometry, if using layout=). In your example, you have 2 columns, so plot() needs two axes, but you are only passing one in ax=, therefore pandas has no choice but to delete all the axes and create the appropriate number of axes itself.
Therefore, you need to pass an array of axes of length corresponding to the number of columns you have in your dataframe.
# the grouper function is from itertools' cookbook
from itertools import zip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
fig, axs = plt.subplots(len(df.Site.unique())*(len(df.columns)-1),1, sharex=True)
for (site,sitedat),axList in zip(df.groupby('Site'),grouper(axs,len(df.columns)-1)):
    sitedat.plot(subplots=True, ax=axList)
    axList[0].set_title(site)
plt.tight_layout()
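The rule about matching the number of axes to the number of columns can be seen in isolation in a smaller sketch, assuming a hypothetical frame with just two columns and no Site column:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical two-column, time-indexed frame
rng = np.random.default_rng(0)
df = pd.DataFrame({"Var1": rng.standard_normal(30),
                   "Var2": rng.standard_normal(30).cumsum()},
                  index=pd.date_range("2011-01-01", periods=30, freq="D"))

# Two columns, so subplots=True needs exactly two axes
fig, axs = plt.subplots(len(df.columns), 1, sharex=True)
returned = df.plot(subplots=True, ax=axs)
```

Each column lands on its own axes and the figure keeps the two axes that were created up front, instead of pandas clearing the figure and building its own.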

Multiple plots in a single window

I need to draw many such rows (for a0 .. a128) in a single window. I've searched FacetGrid, PairGrid and all around, but couldn't find a way. Only regplot has a similar argument, ax, but it doesn't plot histograms. My data is 128 real-valued features with a label column [0, 1]. I need the graphs to be shown from my Python code as a separate application on Linux.
Also, is there a way to scale this histogram to show relative values on Y such that the right curve is not skewed?
g = sns.FacetGrid(df, col="Result")
g.map(plt.hist, "a0", bins=20)
plt.show()
Just a simple example using matplotlib. The code is not optimized (ugly, but simple plot-indexing):
import numpy as np
import matplotlib.pyplot as plt
N = 5
data = np.random.normal(size=(N*N, 1000))
f, axarr = plt.subplots(N, N) # maybe you want sharex=True, sharey=True
pi = [0,0]
for i in range(data.shape[0]):
    if pi[1] == N:
        pi[0] += 1 # next row
        pi[1] = 0 # first column again
    axarr[pi[0], pi[1]].hist(data[i], density=True) # density=True normalizes each
                                                    # histogram (formerly normed=True)
    pi[1] += 1
plt.show()
Output:
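On the question of relative values on the y-axis, there are two common readings, sketched here on hypothetical normally distributed data: density=True scales the bars so that the total area is 1, while per-sample weights of 1/n make the bar heights sum to 1 (the fraction of samples in each bin).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for one feature column
rng = np.random.default_rng(0)
values = rng.normal(size=1000)

fig, (ax_density, ax_fraction) = plt.subplots(1, 2)

# Area under the bars is 1
dens, edges, _ = ax_density.hist(values, bins=20, density=True)
ax_density.set_title("density=True")

# Bar heights are fractions of all samples and sum to 1
weights = np.ones_like(values) / len(values)
frac, _, _ = ax_fraction.hist(values, bins=20, weights=weights)
ax_fraction.set_title("fraction per bin")
```

Either form makes groups of different sizes comparable, because the bar heights no longer depend on how many rows each label has.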

Seaborn scatterplot matrix - adding extra points with custom styles

I'm doing a k-means clustering of activities on some open source projects on GitHub and am trying to plot the results together with the cluster centroids using Seaborn Scatterplot Matrix.
I can successfully plot the results of the clustering analysis (example tsv output below)
user_id issue_comments issues_created pull_request_review_comments pull_requests category
1 0.14936519790888722 2.0100502512562812 0.0 0.60790273556231 Group 0
1882 0.11202389843166542 0.5025125628140703 0.0 0.0 Group 1
2 2.315160567587752 20.603015075376884 0.13297872340425532 1.21580547112462 Group 2
1789 36.8185212845407 82.91457286432161 75.66489361702128 74.46808510638297 Group 3
The problem I'm having is that I'd like to be able to also plot the centroids of the clusters on the matrix plot. Currently my plotting script looks like this:
import seaborn as sns
import pandas as pd
from pylab import savefig
sns.set()
# Use the first column (user_id) as the index
# so that it is not plotted as a variable
data = pd.read_csv('summary_clusters.tsv', sep='\t', index_col=0)
grid = sns.pairplot(data, hue="category", diag_kind="kde")
savefig('normalised_clusters.png', dpi = 150)
This produces the expected output:
I'd like to be able to mark on each of these plots the centroids of the clusters. I can think of two ways to do this:
Create a new 'CENTROID' category and just plot this together with the other points.
Manually add extra points to the plots after calling sns.pairplot(data, hue="category", diag_kind="kde").
If (1) is the solution then I'd like to be able to customise the marker (perhaps a star?) to make it more prominent.
If (2) I'm all ears. I'm pretty new to Seaborn and Matplotlib so any assistance would be very welcome :-)
pairplot isn't going to be all that well suited to this sort of thing, but it's possible to make it work with a few tricks. Here's what I would do.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
sns.set_color_codes()
# Make some random iid data
cov = np.eye(3)
ds = np.vstack([np.random.multivariate_normal([0, 0, 0], cov, 50),
np.random.multivariate_normal([1, 1, 1], cov, 50)])
ds = pd.DataFrame(ds, columns=["x", "y", "z"])
# Fit the k means model and label the observations
km = KMeans(2).fit(ds)
ds["label"] = km.labels_.astype(str)
Now comes the non-obvious part: you need to create a dataframe with the centroid locations and then combine it with the dataframe of observations while identifying the centroids as appropriate using the label column:
centroids = pd.DataFrame(km.cluster_centers_, columns=["x", "y", "z"])
centroids["label"] = ["0 centroid", "1 centroid"]
full_ds = pd.concat([ds, centroids], ignore_index=True)
Then you just need to use PairGrid, which is a bit more flexible than pairplot and will allow you to map other plot attributes by the hue variable along with the color (at the expense of not being able to draw histograms on the diagonals):
g = sns.PairGrid(full_ds, hue="label",
hue_order=["0", "1", "0 centroid", "1 centroid"],
palette=["b", "r", "b", "r"],
hue_kws={"s": [20, 20, 500, 500],
"marker": ["o", "o", "*", "*"]})
g.map(plt.scatter, linewidth=1, edgecolor="w")
g.add_legend()
An alternate solution would be to plot the observations as normal then change the data attributes on the PairGrid object and add a new layer. I'd call this a hack, but in some ways it's more straightforward.
# Plot the data
g = sns.pairplot(ds, hue="label", vars=["x", "y", "z"], palette=["b", "r"])
# Change the PairGrid dataset and add a new layer
centroids = pd.DataFrame(km.cluster_centers_, columns=["x", "y", "z"])
g.data = centroids
g.hue_vals = [0, 1]
g.map_offdiag(plt.scatter, s=500, marker="*")
I know I'm a bit late to the party, but here is a generalized version of mwaskom's code to work with n clusters. Might save someone a few minutes
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
def cluster_scatter_matrix(data_norm, cluster_number):
    sns.set_color_codes()
    # remember the feature columns before the "label" column is added
    features = list(data_norm.columns)
    km = KMeans(cluster_number).fit(data_norm)
    data_norm["label"] = km.labels_.astype(str)
    centroids = pd.DataFrame(km.cluster_centers_, columns=features)
    centroids["label"] = [str(n)+" centroid" for n in range(cluster_number)]
    full_ds = pd.concat([data_norm, centroids], ignore_index=True)
    g = sns.PairGrid(full_ds, hue="label",
                     hue_order=[str(n) for n in range(cluster_number)]+[str(n)+" centroid" for n in range(cluster_number)],
                     #palette=["b", "r", "b", "r"],
                     hue_kws={"s": [ 20 for n in range(cluster_number)]+[500 for n in range(cluster_number)],
                              "marker": [ 'o' for n in range(cluster_number)]+['*' for n in range(cluster_number)]}
                     )
    g.map(plt.scatter, linewidth=1, edgecolor="w")
    g.add_legend()
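If only labeled observations are available, as in the TSV from the question, and no fitted KMeans object, the centroid rows can probably be rebuilt with pandas alone, since a converged k-means centroid is the mean of the points assigned to it. A sketch on a tiny hypothetical frame:

```python
import pandas as pd

# Hypothetical labeled observations (two clusters, two features)
ds = pd.DataFrame({"x": [0.0, 1.0, 4.0, 6.0],
                   "y": [0.0, 2.0, 5.0, 5.0],
                   "label": ["0", "0", "1", "1"]})

# Per-cluster mean of every feature, one row per label
centroids = ds.groupby("label").mean().reset_index()
# Mark the rows as centroids so they get their own hue level
centroids["label"] = centroids["label"] + " centroid"

# Observations first, centroid rows appended at the end
full_ds = pd.concat([ds, centroids], ignore_index=True)
print(full_ds)
```

The combined frame has the same shape as full_ds in the answers above, so it can be passed to PairGrid with the same hue_order and hue_kws.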

Annotate data points while plotting from Pandas DataFrame

I would like to annotate the data points with their values next to the points on the plot. The examples I found only deal with x and y as vectors. However, I would like to do this for a pandas DataFrame that contains multiple columns.
ax = plt.figure().add_subplot(1, 1, 1)
df.plot(ax = ax)
plt.show()
What is the best way to annotate all the points for a multi-column DataFrame?
Here's a (very) slightly slicker version of Dan Allan's answer:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)},
index=list(string.ascii_lowercase[:10]))
Which gives:
x y
a 0.541974 0.042185
b 0.036188 0.775425
c 0.950099 0.888305
d 0.739367 0.638368
e 0.739910 0.596037
f 0.974529 0.111819
g 0.640637 0.161805
h 0.554600 0.172221
i 0.718941 0.192932
j 0.447242 0.172469
And then:
fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)
for k, v in df.iterrows():
    ax.annotate(k, v)
Finally, if you're in interactive mode you might need to refresh the plot:
fig.canvas.draw()
Which produces:
Or, since that looks incredibly ugly, you can beautify things a bit pretty easily:
cmap = plt.get_cmap('Spectral')
df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0,
        c=range(len(df)), colormap=cmap)
for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10,-5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')
Which looks a lot nicer:
Do you want to use one of the other columns as the text of the annotation? This is something I did recently.
Starting with some example data
In [1]: df
Out[1]:
x y val
0 -1.015235 0.840049 a
1 -0.427016 0.880745 b
2 0.744470 -0.401485 c
3 1.334952 -0.708141 d
4 0.127634 -1.335107 e
Plot the points. I plot y against x, in this example.
ax = df.set_index('x')['y'].plot(style='o')
Write a function that loops over x, y, and the value to annotate beside the point.
def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))
label_point(df.x, df.y, df.val, ax)
plt.draw()
Let's assume your df has multiple columns, three of which are x, y, and lbl. To annotate your (x,y) scatter plot with lbl, simply:
ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);
I found the previous answers quite helpful, especially LondonRob's example that improved the layout a bit.
The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.
Here was an alternative that avoids the loop using .apply(), and includes the nicer-looking annotations (I thought the color scale was a bit overkill and couldn't get the colorbar to go away):
ax = df.plot('x', 'y', kind='scatter', s=50 )
def annotate_df(row):
    ax.annotate(row.name, row.values,
                xytext=(10,-5),
                textcoords='offset points',
                size=18,
                color='darkslategrey')
_ = df.apply(annotate_df, axis=1)
Edit Notes
I edited my code example recently. Originally it used the same:
fig, ax = plt.subplots()
as the other posts to expose the axes, however this is unnecessary and makes the:
import matplotlib.pyplot as plt
line also unnecessary.
Also note:
If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
Depending on the points, you may have to play with the xytext values to get better placements.
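For the multi-column line plot in the original question, one possible sketch annotates every point of every column with its value. It assumes a hypothetical numeric frame and uses a seeded generator so the points land in the same place on every run:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Seeded random data, so the figure is reproducible
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((5, 2)), columns=["a", "b"])

fig, ax = plt.subplots()
df.plot(ax=ax, marker="o")

# One annotation per (index, value) pair in each column
for col in df.columns:
    for x, y in df[col].items():
        ax.annotate(f"{y:.2f}", (x, y),
                    xytext=(5, 5), textcoords="offset points")
```

The xytext offset is in points, so the labels keep the same distance from their markers when the axes are rescaled.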