I'm trying to plot the data of my DataFrame in a grouped chart and I want the columns to preserve the order I gave them. The data looks as follows (it's not all there, but it's organized in the same way):
dataframe
When I plot it I get the following Graph:
graph
So the months were sorted even though I specified not to sort in the chart. I used the following code:
chart2 = alt.Chart(melted).mark_bar().encode(
    column=alt.Column('variable', sort=None),
    x=alt.X('room', sort=None),
    y=alt.Y('value'),
    color='room',
    tooltip=['room', 'value']
)
Does anyone know how I could fix that?
You've already used sort=None, which is the correct way to make scales in a non-faceted chart reflect the input order.
The missing piece is that faceted charts share scales by default (See Scale and Guide Resolution), so each facet is being forced to share an order.
If you make the x scale resolution independent, then each facet should retain the input order:
chart2 = alt.Chart(melted).mark_bar().encode(
    column=alt.Column('variable', sort=None),
    x=alt.X('room', sort=None),
    y=alt.Y('value'),
    color='room',
    tooltip=['room', 'value']
).resolve_scale(x='independent')
I am currently in the process of moving from R and ggplot2 to seaborn for a lot of work because R was struggling with the size of data I was using. I am currently working on a heatmap that is fairly simplistic and I have been able to render the general heatmap without too many issues, but I am not sure how to adjust the ordering of my categoricals for the heatmap.
In this case my data has this header:
Sample Position Depth Order
Sample is the "y-axis" categorical and Position is the "x-axis" categorical. Depth is the value of the cell. Order is a meta-value calculated elsewhere, but I want to use Order as my ordering value for the y-axis, while retaining Sample as the label. Is there a way to do this?
sns.heatmap needs its input in a rectangular (matrix) format, so although you have an Order column for ordering Sample, it's not clear whether each Sample has a single, unique Order value.
Below I use a simple example: convert 'Sample' to an ordered categorical whose categories are sorted by the mean value of 'Order'. This is like changing the factor levels in R. Also make sure there are no NaN values, otherwise the heatmap might complain:
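import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'Sample': np.repeat(['A', 'B', 'C'], 4),
                   'Position': [1, 2, 3, 4] * 3,
                   'Depth': np.random.normal(0, 1, 12),
                   'Order': np.repeat([2, 1, 3], 4)})
# Sample labels sorted by their mean Order value
y_order = df.groupby('Sample')['Order'].agg('mean').sort_values().index
# An ordered categorical makes pivot() lay out the rows in that order
df['Sample'] = pd.Categorical(df['Sample'], ordered=True, categories=y_order)
sns.heatmap(df.pivot(index='Sample', columns='Position', values='Depth'))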
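As an alternative sketch (not part of the answer above), the same row order can be obtained by pivoting first and then reindexing the rows, which avoids converting the column to a categorical. It assumes each Sample has a single Order value; the fixed Depth values are made up so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# Made-up data in the same layout as the question: Sample, Position, Depth, Order
df = pd.DataFrame({
    "Sample": np.repeat(["A", "B", "C"], 4),
    "Position": [1, 2, 3, 4] * 3,
    "Depth": np.arange(12, dtype=float),
    "Order": np.repeat([2, 1, 3], 4),
})

# Sample labels sorted by their Order value (mean guards against repeats)
y_order = df.groupby("Sample")["Order"].mean().sort_values().index

# Build the Sample x Position matrix, then put the rows in the desired order
matrix = df.pivot(index="Sample", columns="Position", values="Depth").reindex(y_order)
row_labels = list(matrix.index)
```

The resulting `matrix` keeps Sample as the row label, so it can be handed to sns.heatmap in place of the pivoted frame above.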
When I was plotting two series of data against each other, the X axis was inverted unexpectedly. I know this question sounds pretty similar to this other one: x-axis inverted unexpectedly by pandas.plot(...), and it actually is, but I want to know if this behavior can be disabled, not worked around. Let me explain.
I have a very simple DF that consists of a datetime index and two columns; one has humidity measurements and the other daily weights. Both of them are in descending order, because when my sample loses water it also loses weight and humidity. So my DF looks something like this, with the data in descending order:
But when I plot using X = 'Peso' (weight) and Y = 'Humedad' (humidity), my X axis goes in ascending order instead of descending order.
My plotting code:
plt.figure(figsize=(12,9))
plt.scatter(data['Peso'],data['Humedad'])
plt.xlabel('Peso (kg)',fontsize=14)
plt.ylabel("Raw Counts",fontsize=14)
plt.xticks(rotation=90,fontsize=10)
plt.grid()
Resulting in this kind of plot, where X axis is inverted
So, I could do two simple types of workaround:
plt.scatter(sorted(data['Peso']),data['Humedad'])
or
plt.scatter(data['Peso'][::-1],data['Humedad'])
Both of them give the same result: they plot my data as I wanted, BUT my xticks are still inverted:
So I created a list of my weight values in order to pass it in, as follows:
semin=data['Peso']
semin=semin.tolist()
And then adding it to my plt.xticks like this
plt.xticks(semin,rotation=90,fontsize=10)
It "kind of" worked, but some of the xticks overlap, as you can see in the image below:
I know I can solve this with [Locs] and general xticks information, but I really wanted to know whether it's possible to just ask Pandas to follow the natural descending order of the data, or anything similar, and avoid all of this xticks stuff.
I've checked this too: https://github.com/pandas-dev/pandas/issues/10118
and I tried by doing the set_index suggestion:
plt.figure(figsize=(12,9))
data.set_index('Peso').Humedad.plot()
plt.xlabel('Peso (kg)',fontsize=14)
plt.ylabel("Raw Counts",fontsize=14)
plt.xticks(rotation=90,fontsize=10)
plt.grid()
And it went almost perfect, except that I needed it in scatter...
So I tried some stuff to "scatter it"
1. Putting the marker type:
data.set_index('Peso').Humedad.plot(marker='o')
Got a marker + line graph:
2. Changing .plot for .scatter to the plot:
data.set_index('Peso').Humedad.scatter()
Got this error:
AttributeError: 'Series' object has no attribute 'scatter'
3. Using both
data.set_index('Peso').Humedad.plot.scatter()
Got this one:
AttributeError: 'SeriesPlotMethods' object has no attribute 'scatter'
4. Making this giant question. Please help.
And that's all, sorry if I'm missing something or if my post is too long. I'm open to suggestions, corrections or anything you're willing to tell me.
Thanks!
Oh, I just saw that the linked question actually does exactly what you need. I will leave this answer here, but please refer to the linked question instead.
Changing the ticks is not a solution! The resulting plot can easily end up completely wrong.
Instead, search for "invert x axis" or similar and you will find that you can invert the axis via
ax = df.plot(...)
ax.invert_xaxis()
This is not a workaround. It is the solution. (How much easier can it get?)
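Since the question asks for a scatter plot specifically, here is a minimal sketch of the same idea with a scatter. The numbers are made up to stand in for the asker's 'Peso' and 'Humedad' columns, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's data: both columns decrease over time
data = pd.DataFrame({
    "Peso": [12.0, 11.4, 10.9, 10.1, 9.6],
    "Humedad": [880, 850, 790, 720, 690],
})

fig, ax = plt.subplots(figsize=(12, 9))
ax.scatter(data["Peso"], data["Humedad"])
ax.invert_xaxis()  # largest weight on the left; the ticks follow automatically
ax.set_xlabel("Peso (kg)", fontsize=14)
ax.set_ylabel("Raw Counts", fontsize=14)
ax.grid()

x_is_inverted = ax.xaxis_inverted()
left, right = ax.get_xlim()
```

Because the axis itself is flipped, the points and the tick labels stay consistent, which is not the case when the data or the tick list is reordered by hand.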
In a context of a line chart displaying time data in regular intervals where multiple series might overlap what would be the optimal way to:
A) hint the user that the chart has overlapping series?
B) give the user the capability to visualize all those series? Like spanning the series somehow?
For overlapping series in a line chart, I would keep the traditional line chart but put a label at the end of the graph with a color legend. The legend and label will help the user get information quickly.
Another version of a line chart for overlapping series can be a line area chart.
If you are not stuck on only line charts, I would suggest a bar chart. Below are some examples that you can use.
Example 1:
Example 2:
Example 3:
There are a couple of ways to indicate that there are overlapping series on a chart. You can increase the marker radius of one of them. The number of legend elements also tells you how many series there are. Finally, you can distribute the series across different yAxis instances, each with different top and height properties. Also, in styled mode, when you hover over a legend item, the opacity of the other series changes.
API Reference:
http://api.highcharts.com/highcharts/plotOptions.line.marker.radius
Examples:
http://jsfiddle.net/whsgpdyw/ - changing marker radius
http://jsfiddle.net/fuq6j4sg/ - each series on a different yAxis
I'm trying to display 2D data with axis labels using both contour and pcolormesh. As has been noted on the matplotlib user list, these functions obey different conventions: pcolormesh expects the x and y values to specify the corners of the individual pixels, while contour expects the centers of the pixels.
What is the best way to make these behave consistently?
One option I've considered is to make a "centers-to-edges" function, assuming evenly spaced data:
def centers_to_edges(arr):
    dx = arr[1]-arr[0]
    newarr = np.linspace(arr.min()-dx/2,arr.max()+dx/2,arr.size+1)
    return newarr
Another option is to use imshow with the extent keyword set.
The first approach doesn't play nicely with 2D axes (e.g., as created by meshgrid or indices), and the second discards the axis numbers entirely.
Is your data on a regular mesh? If it isn't, you can use griddata() to obtain one. If your data is too big, sub-sampling or regularization is always possible. With very large data, the output image will usually be small compared with it, and you can exploit this.
If you use imshow() with extent and interpolation='nearest', you will see that the data is cell-centered and that extent gives the edges (corners) of the cells. contour, on the other hand, assumes that the data is cell-centered and that X, Y are the cell centers. So you need to take care with the input domain for contour. A trivial example:
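import numpy as np
from matplotlib.pyplot import imshow, contour

x = np.arange(-10,10,1)
X,Y = np.meshgrid(x,x)
P = X**2+Y**2
# extent gives the outer cell edges; contour gets the cell centers (shifted by half a cell)
imshow(P,extent=[-10,10,-10,10],interpolation='nearest',origin='lower')
contour(X+0.5,Y+0.5,P,20,colors='k')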
In my tests pcolormesh() was a very slow routine, so I always try to avoid it. Using griddata and imshow() has always been a good choice for me.
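For the pcolormesh/contour pairing the question asks about, a minimal sketch is to keep the coordinate arrays as cell centers, pass them unchanged to contour, and convert them to edges only for pcolormesh. It reuses the asker's centers_to_edges idea, so it assumes evenly spaced, ascending 1D coordinates; the grid below is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def centers_to_edges(arr):
    """Return cell edges for evenly spaced cell centers (one more edge than centers)."""
    arr = np.asarray(arr, dtype=float)
    dx = arr[1] - arr[0]
    return np.linspace(arr.min() - dx / 2, arr.max() + dx / 2, arr.size + 1)

x = np.arange(-10, 10, 1.0)  # cell centers along x (20 values)
y = np.arange(-5, 5, 1.0)    # cell centers along y (10 values)
X, Y = np.meshgrid(x, y)
P = X**2 + Y**2              # shape (10, 20)

xe, ye = centers_to_edges(x), centers_to_edges(y)

fig, ax = plt.subplots()
ax.pcolormesh(xe, ye, P)             # pcolormesh takes the cell edges
ax.contour(X, Y, P, 20, colors="k")  # contour takes the cell centers
```

Both layers then share one coordinate system and the axis labels show the real x and y values, at the cost of handling the 1D coordinates separately from any 2D meshgrid arrays.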
The page is
"http://matplotlib.sourceforge.net/examples/pylab_examples/histogram_demo_extended.html"
Let's look at the y-axis, the numbers there do not make any sense, could we change it to something else that is meaningful?
Except for the cumulative distribution plot and the last one, the y-axes show normalized histogram values, produced with the normed=1 keyword (i.e., the area underneath the histogram equals 1, as in the definition of a probability density function (PDF)).
You can use yticks(), see this example.
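To see why those y-axis numbers are meaningful, here is a small NumPy-only sketch with made-up normally distributed samples. It uses np.histogram with density=True, which applies the same normalization as described above, and checks that the bar heights times the bin widths sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=200, scale=25, size=10_000)  # made-up data

# Normalized histogram: heights are probability densities, not counts
density, edges = np.histogram(samples, bins=50, density=True)
widths = np.diff(edges)
area = float(np.sum(density * widths))

# Raw counts over the same bins, for comparison
counts, _ = np.histogram(samples, bins=edges)
same_shape = np.allclose(density, counts / (counts.sum() * widths))
```

If counts are easier to read than densities, plotting the unnormalized histogram is an alternative to relabeling the ticks with yticks().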