I'd like a 95% confidence interval line above and below my data line - as opposed to vertical bars at each point.
Is there a way that I can do this in gnuplot without plotting another line? Or do I need to plot another line and then label it appropriately?
You can use the filledcurves style to fill the region of 95% confidence. Consider the example data file data.dat with the content:
# x y ylow yhigh
1 3 2.6 3.5
2 5 4 6
3 4 3.2 4.3
4 3.5 3.3 3.7
and plot this with the script
set style fill transparent solid 0.2 noborder
plot 'data.dat' using 1:3:4 with filledcurves title '95% confidence', \
'' using 1:2 with lp lt 1 pt 7 ps 1.5 lw 3 title 'mean value'
to get a shaded 95% confidence band drawn behind the mean line.
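If the low and high columns are not available yet, they can be computed from repeated measurements. The following is a minimal sketch, assuming a normal approximation (z = 1.96) and made-up sample values; for very small samples a t-distribution quantile would be more appropriate. The function names are hypothetical.

```python
import math
import statistics


def confidence_band(samples, z=1.96):
    """Return (mean, low, high) using a normal-approximation interval."""
    mean = statistics.mean(samples)
    # Standard error of the mean; stdev() uses the n-1 (sample) denominator.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - z * sem, mean + z * sem


def band_rows(measurements):
    """Build 'x y ylow yhigh' lines in the layout the script above reads."""
    rows = ["# x y ylow yhigh"]
    for x in sorted(measurements):
        mean, low, high = confidence_band(measurements[x])
        rows.append(f"{x} {mean:.4f} {low:.4f} {high:.4f}")
    return rows


# Hypothetical repeated measurements for each x value.
measurements = {1: [2.8, 3.0, 3.2], 2: [4.5, 5.0, 5.5], 3: [3.9, 4.0, 4.1]}
rows = band_rows(measurements)
print("\n".join(rows))
```

Redirecting the printed output to data.dat gives a file that the `using 1:3:4 with filledcurves` plot can read directly.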
To plot the data as lines with the mean as the data point and the standard deviation as error bars, save the code below as example.gnuplot:
set terminal pdf size 6, 4.5 enhanced font "Times-New-Roman,20"
set output 'out.pdf'
red = "#CC0000"; green = "#4C9900"; blue = "#6A5ACD"; skyblue = "#87CEEB"; violet = "#FF00FF"; brown = "#D2691E";
set xrange [0:*] nowriteback;
set yrange [0.0: 10.0]
set title "Line graph with confidence interval"
set xlabel "X Axis"
set ylabel "Y Axis"
plot "data.dat" using 1:2:3 title "l1" with yerrorlines lw 3 lc rgb red,\
     '' using 1:4:5 title "l2" with yerrorlines lw 3 lc rgb brown
Create a new file called "data.dat" and give some sample values, such as
# X Y Stddev y1 std
1 2 0.5 3 0.25
2 4 0.2 5 0.3
3 3 0.3 4 0.35
4 5 0.1 6 0.3
5 6 0.2 7 0.25
Run the script using the command gnuplot example.gnuplot
I want to create a scatter plot that derives its x values from one dataframe and its y values from another dataframe, each having multiple columns.
x_df :
red blue
0 1 2
1 2 3
2 3 4
y_df:
red blue
0 1 2
1 2 3
2 3 4
I would like a scatter plot with two traces, red and blue, where the x values come from x_df and the y values come from y_df.
At some layer you need to do data integration. IMHO this is better done at the data layer, i.e. in pandas.
I have modified your sample data so that the two traces do not overlap.
I used join(), assuming that the index of the data frames is the join key.
The dataframe could have been structured further; instead I generated multiple traces using plotly express, modifying them as required to ensure that colors and legends are created.
I have not considered axis labels.
import io

import pandas as pd
import plotly.express as px

# Raw strings avoid an invalid-escape warning for the regex separator.
x_df = pd.read_csv(io.StringIO(""" red blue
0 1 2
1 2 3
2 3 4"""), sep=r"\s+")
y_df = pd.read_csv(io.StringIO(""" red blue
0 1.1 2.2
1 2.1 3.2
2 3.1 4.2"""), sep=r"\s+")
# Join on the index; the suffixes keep the x and y columns apart.
df = x_df.join(y_df, lsuffix="_x", rsuffix="_y")
fig = px.scatter(df, x="red_x", y="red_y").update_traces(
    marker={"color": "red"}, name="red", showlegend=True
).add_traces(
    px.scatter(df, x="blue_x", y="blue_y")
    .update_traces(marker={"color": "blue"}, name="blue", showlegend=True)
    .data
)
fig.show()
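An alternative way to do the data integration is to reshape both frames into one long-form frame with a column naming the trace. This is only a sketch under the assumption that x_df and y_df share the same column names and index; the name `long_df` is made up for illustration.

```python
import pandas as pd

x_df = pd.DataFrame({"red": [1, 2, 3], "blue": [2, 3, 4]})
y_df = pd.DataFrame({"red": [1.1, 2.1, 3.1], "blue": [2.2, 3.2, 4.2]})

# One block of rows per column name: x from x_df, y from y_df, plus a label.
long_df = pd.concat(
    [
        pd.DataFrame({"x": x_df[name], "y": y_df[name], "color": name})
        for name in x_df.columns
    ],
    ignore_index=True,
)
print(long_df)
```

A frame in this shape can be passed to `px.scatter(long_df, x="x", y="y", color="color")`, which creates one trace and one legend entry per value of the `color` column; the marker colors themselves would still need to be mapped explicitly.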
I have a list of t-shirt orders with the corresponding sizes, and I would like to plot a pie chart for each design showing the percentage each size accounts for, which size sells the most, and so on.
Design Total
0 Boba L 9
1 Boba M 4
2 Boba S 2
3 Boba XL 5
4 Burger L 6
5 Burger M 2
6 Burger S 3
7 Burger XL 1
8 Donut L 5
9 Donut M 9
10 Donut S 2
11 Donut XL 5
It is not completely clear what you are asking, but here is my interpretation:
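import matplotlib.pyplot as plt

# Split "Boba L" into a design name and a size on the last space.
df[['Design', 'Size']] = df['Design'].str.rsplit(n=1, expand=True)
fig, ax = plt.subplots(1, 3, figsize=(10,8))
ax = iter(ax)
# One pie chart per design, sliced by size.
for t, g in df.groupby('Design'):
    g.set_index('Size')['Total'].plot.pie(ax=next(ax), autopct='%.2f', title=f'{t}')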
Maybe you want:
df = pd.read_clipboard() #create data from above text no modification
dfplot = df.loc[df.groupby(df['Design'].str.rsplit(n=1).str[0])['Total'].idxmax(), :]
ax = dfplot.set_index('Design')['Total'].plot.pie(autopct='%.2f')
ax.set_ylabel('');
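Let's do groupby.plot.pie:
# Keep the returned Series of axes (one per design) so it can be formatted below.
s = (df.Design.str.split(expand=True)
       .assign(Total=df['Total'])
       .groupby(0)
       .plot.pie(x=1, y='Total', autopct='%.1f%%')
    )
# format the plots (Series.iteritems() was removed in pandas 2.0; items() is equivalent)
for design, ax in s.items():
    ax.set_title(design)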
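The percentages that the pie charts display can also be computed directly, which makes it easy to check which size sells the most for each design. This is a rough sketch using a subset of the data from the question; the `Share` column name is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Design": ["Boba L", "Boba M", "Boba S", "Boba XL",
               "Burger L", "Burger M", "Burger S", "Burger XL"],
    "Total": [9, 4, 2, 5, 6, 2, 3, 1],
})

# Split "Boba L" into a design name and a size on the last space.
df[["Design", "Size"]] = df["Design"].str.rsplit(n=1, expand=True)

# Percentage of each design's orders that every size accounts for.
design_totals = df.groupby("Design")["Total"].transform("sum")
df["Share"] = 100 * df["Total"] / design_totals

# Row with the largest total inside each design, i.e. the best-selling size.
best = df.loc[df.groupby("Design")["Total"].idxmax(), ["Design", "Size", "Share"]]
print(best.to_string(index=False))
```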
I have a dataset as shown below, each sample has x and y values and the corresponding result
Sr. X Y Result
1 2 12 Positive
2 4 3 positive
....
Visualization
Grid size is 12 * 8
How can I calculate, for each sample, the distance to the nearest red point (the positive ones)?
Red = Positive,
Blue = Negative
Sr. X Y Result Nearest-distance-red
1 2 23 Positive ?
2 4 3 Negative ?
....
It's a lot easier to answer when there is sample data, so make sure to include that next time.
I generated random data:
import numpy as np
import pandas as pd
import sklearn
x = np.linspace(1,50)
y = np.linspace(1,50)
GRID = np.meshgrid(x,y)
grid_colors = 1* ( np.random.random(GRID[0].size) > .8 )
sample_data = pd.DataFrame( {'X': GRID[0].flatten(), 'Y':GRID[1].flatten(), 'grid_color' : grid_colors})
sample_data.plot.scatter(x="X",y='Y', c='grid_color', colormap='bwr', figsize=(10,10))
BallTree (or KDTree) can create a tree to query with
from sklearn.neighbors import BallTree
red_points = sample_data[sample_data.grid_color == 1]
blue_points = sample_data[sample_data.grid_color != 1]
tree = BallTree(red_points[['X','Y']], leaf_size=15, metric='minkowski')
and use it with
distance, index = tree.query(sample_data[['X','Y']], k=1)
Now add it to the DataFrame (for k=1 the query returns arrays of shape (n, 1), so take the first column):
sample_data['nearest_point_distance'] = distance[:, 0]
sample_data['nearest_point_X'] = red_points.X.values[index[:, 0]]
sample_data['nearest_point_Y'] = red_points.Y.values[index[:, 0]]
which gives
X Y grid_color nearest_point_distance nearest_point_X \
0 1.0 1.0 0 2.0 3.0
1 2.0 1.0 0 1.0 3.0
2 3.0 1.0 1 0.0 3.0
3 4.0 1.0 0 1.0 3.0
4 5.0 1.0 1 0.0 5.0
nearest_point_Y
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
Modification so that red points do not find themselves:
Query the nearest k=2 neighbours instead of k=1:
distance, index = tree.query(sample_data[['X','Y']], k=2)
Then, with the help of numpy indexing, make the red points use the second neighbour found instead of the first (grid_color is 1 for red and 0 for blue, so it doubles as the column index):
sample_size = GRID[0].size
sample_data['nearest_point_distance'] = distance[np.arange(sample_size),sample_data.grid_color]
sample_data['nearest_point_X'] = red_points.X.values[index[np.arange(sample_size),sample_data.grid_color]]
sample_data['nearest_point_Y'] = red_points.Y.values[index[np.arange(sample_size),sample_data.grid_color]]
The output has the same format as before, but because the data is random it will not match the picture made earlier.
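cKDTree from scipy can calculate that distance for you. Something along these lines should work (query expects an (n, 2) array of points and returns a (distances, indices) pair, so keep only the distances):
from scipy.spatial import cKDTree
df['Distance_To_Red'] = cKDTree(coordinates_of_red_points).query(df[['x', 'y']].to_numpy(), k=1)[0]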
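For a grid as small as the 12 * 8 one in the question, a plain brute-force pass is also feasible and is handy for cross-checking the tree-based results. The sketch below uses only the standard library; the function name, the `exclude_self` flag and the four sample rows are assumptions made for illustration, not part of the original data.

```python
import math


def nearest_red_distance(samples, exclude_self=True):
    """samples: list of (x, y, result) tuples; returns one distance per sample."""
    reds = [(x, y) for x, y, result in samples if result.lower() == "positive"]
    distances = []
    for x, y, result in samples:
        candidates = [math.hypot(x - rx, y - ry) for rx, ry in reds]
        if exclude_self and result.lower() == "positive":
            # Drop one zero distance: the red point matched against itself.
            candidates.remove(0.0)
        # NaN when there is no (other) red point to measure against.
        distances.append(min(candidates) if candidates else math.nan)
    return distances


# Hypothetical rows in the question's layout: (X, Y, Result).
samples = [(2, 12, "Positive"), (4, 3, "positive"),
           (5, 3, "Negative"), (2, 9, "Negative")]
result = nearest_red_distance(samples)
print(result)
```

This compares every sample with every red point, so the cost grows with the product of the two counts; the tree-based approaches above scale better for large data sets.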
I am trying to make a bar chart in pandas, with two data series coming from a groupby:
data.groupby(['popup','UID']).size().groupby(level=0).value_counts().unstack().transpose().plot(kind='bar', layout=(2,2))
The x axis is not continuous, and only shows values that are in the dataset. In this example, it jumps from 11 to 13.
How can I make it continuous?
**EDIT 2:**
I tried JohnE's data-centric approach, and it works. It creates a new index with no missing values:
temp = data.groupby(['popup','UID']).size().groupby(level=0).value_counts().unstack().transpose()
temp.reindex(np.arange(temp.index.min(), temp.index.max() + 1)).plot(kind='bar', layout=(2,2))
However, I assume there should be a better approach with histogram instead of bar plot. The best I could do with histograms is:
data.groupby(['popup','UID']).size().groupby(level=0).plot(kind='hist', bins=30, alpha=0.5, layout=(2,2), legend=True)
But I didn't find any option in the hist plot to get the same rendering as the bar plot, without the bars overlapping.
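The reindexing step can be illustrated on a small Series. This is a sketch with made-up counts and a made-up index that skips the value 3; `fill_value=0` is an optional choice here, used so that the missing category shows up as an empty bar rather than NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical counts whose index skips the value 3.
counts = pd.Series([46, 21, 5], index=[1, 2, 4], name="popup 0")

# np.arange excludes its stop value, so add 1 to keep the largest index.
full_index = np.arange(counts.index.min(), counts.index.max() + 1)
filled = counts.reindex(full_index, fill_value=0)
print(filled)
```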
**EDIT:** Here is some information to answer the comments.
Data sample:
INSEE C1 popup C3 date \
0 75101.0 0.0 0 NaN 2017-05-17T13:20:16Z
0 75101.0 0.0 0 NaN 2017-05-17T14:23:51Z
1 31557.0 0.0 1 NaN 2017-05-17T14:58:27Z
UID
0 ba4bd353-f14d-4bc5-95ba-6a1f5134cc84
0 ba4bd353-f14d-4bc5-95ba-6a1f5134cc84
1 bafe9715-3a07-4d9b-b85c-0bbf658a9115
First groupby result (sample):
data.groupby(['popup','UID']).size().head(3)
popup UID
0 016d3e7e-1901-4f84-be0e-117988ec57a8 6
01c15455-29cc-4d1e-8743-638fd0f51602 6
03fc9eb0-c5fb-4205-91f0-4b74f78a8b96 3
dtype: int64
Second groupby result (sample):
data.groupby(['popup','UID']).size().groupby(level=0).value_counts().head(3)
popup
0 1 46
3 23
4 22
dtype: int64
After unstack and transpose:
data.groupby(['popup','UID']).size().groupby(level=0).value_counts().unstack().transpose().head(3)
popup 0 1
1 46.0 38.0
2 21.0 35.0
3 23.0 22.0
One solution is the histogram plot from matplotlib.axes.Axes.hist. A histogram suits this purpose better than a bar plot because we can choose the number of bins.
import matplotlib.pyplot as plt

# Separate groups by 'popup' and count number of records for each 'UID'
popup_values = data['popup'].unique()
count_by_popup = [data[data['popup'] == popup_value].groupby(['UID']).size() for popup_value in popup_values]
# Create histogram: passing a list of series draws the groups side by side
fig, ax = plt.subplots()
ax.hist(count_by_popup, 20, histtype='bar', label=[str(x) for x in popup_values])
ax.legend()
plt.show()
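Because the values being binned are integer counts, a fixed number of bins can merge neighbouring counts or leave uneven gaps. One possible refinement, shown here as a sketch with made-up counts and a hypothetical helper name, is to place the bin edges at half-integers so that each integer gets its own bin.

```python
import numpy as np


def integer_bin_edges(values):
    """Edges at half-integers so every integer value falls in its own bin."""
    low, high = int(np.min(values)), int(np.max(values))
    return np.arange(low - 0.5, high + 1.5, 1.0)


# Hypothetical number of records per UID.
counts = np.array([1, 1, 2, 3, 3, 3, 6])
edges = integer_bin_edges(counts)
hist, _ = np.histogram(counts, bins=edges)
print(edges)
print(hist)
```

The `edges` array can be passed to `ax.hist` in place of the bin count, since its bins argument also accepts a sequence of edges.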
I am looking for code that positions the color bar automatically. Here is the graph:
I used the set_parea.gs script to arrange the plots in columns and the color.gs script to color them. The color bar script is xcbar.gs. Here are the commands:
c
set_parea 1 3 1 1 -margin 0.8
color 0 12 1.2 -kind red->orange->yellow->dodgerblue->blue
d var1
set_parea 1 3 1 2 -margin 0.8
color 0 12 1.2 -kind red->orange->yellow->dodgerblue->blue
d var2
set_parea 1 3 1 3 -margin 0.8
color -12 12 2.4 -kind blue->white->red
d var1-var2
I would like the color bar for the differences to sit just below the difference map, and the red->orange->yellow->dodgerblue->blue color bar to sit just below the orange maps.
You can adjust the position of the color bar in the xcbar command itself.
The following documentation may be helpful:
http://kodama.fubuki.info/wiki/wiki.cgi/GrADS/script/xcbar.gs?lang=en