How to create a box plot with SD from table of My Drive? - matplotlib

I am trying to get a basic box-bars plot from a dataset
I am a newbie.
latency = np.array(table1.Latency)
treatment = np.array(table1.group)
I tried matplotlib but else not know what to do.

Related

How to group a Box plot by the column names of a data frame in Seaborn? [duplicate]

This question already has answers here:
Boxplot of Multiple Columns of a Pandas Dataframe on the Same Figure (seaborn)
(4 answers)
Closed 24 days ago.
I'm a beginner trying to learn data science and this is my first time using the seaborn and matplotlib libraries. I have this practice dataset and data frame :
that I want to turn into a box plot and I want the x-axis to have all of the column names and the y-axis to range from 0 - 700 but, I'm not sure what to do.
I tried using : random_variable = sms.boxplot(data = df, x = ?, y = 'TAX')
which does give me the y-axis that is close to what I am looking for but, I don't know what the x-axis should be set equal too.
I thought may I could use the keys of the dataframe but, all I got was this mess that doesn't work:
random_variable = sms.boxplot(x = df.keys(), y = df['TAX'])
I want it to look like this but, I'm really lost on how to do this:
I apologize if this is an easy fix but, I would appreciate any help.
If you just want to display your data like that go with
import seaborn as sns
sns.boxplot(data=df)

How to plot several coordinates from pymongo plotly and mapbox

I'm working in a project in which i need to plot several coordinates from a mongodb query, i can plot a single result and works awesome, but i need to figure out how to plot several coordinates from the query, the query contains the info needed but i don;t know how to iterate correctly in the result file i think that's my problem , i'm running on circles now so i need some help .
Code that works for a single result :
countE2 = mydb.results.find({"d.latitude":{"$gt":0},"d.longitude":{"$gt":0}},{"d.latitude":1, "d.longitude": 1,"a":1,"u.Server":1}).limit(10).sort('t', pymongo.DESCENDING)
for resultado in countE2:
lat = resultado["d"]["latitude"]
lon = resultado["d"]["longitude"]
ip = resultado["a"]
server = resultado["u"]["Server"]
us_cities = [lat,lon,ip,server]
y = us_cities
print(y)
fig = px.scatter_mapbox(y,lat=[y[0]], lon=[y[1]],hover_name=[y[2]],hover_data=[[y[3]],[y[3]]],color_discrete_sequence=["green"], zoom=3, height=1000)
print(fig)
fig.update_layout(mapbox_style="dark", mapbox_accesstoken=token)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
Please consider that the query returns several results ,in this case countE.
Thanks

Visualizing Data, Tracking Specific SD Values

BLUF: I want to track a specific Std Dev, e.g. 1.0 to 1.25, by color coding it and making a separate KDF or other probability density graph.
What I want to do with this is be able to pick out other Std Dev ranges and get back new graphs that I can turn around and use to predict outcomes in that specific Std Dev.
Data: https://www.dropbox.com/s/y78pynq9onyw9iu/Data.csv?dl=0
What I have so far is normalized data that looks like a shotgun blast:
Code used to produce it:
data = pd.read_csv("Data.csv")
sns.jointplot(data.x,data.y, space=0.2, size=10, ratio=2, kind="reg");
What I want to achieve here looks like what I have marked up below:
I kind of know how to do this in RStudio using RidgePlot-type functions, but I'm at a loss here in Python, even while using Seaborn. Any/All help appreciated!
The following code might point you in the right directly, you can tweak the appearance of the plot as you please from there.
tips = sns.load_dataset("tips")
g = sns.jointplot(x="total_bill", y="tip", data=tips)
top_lim = 4
bottom_lim = 2
temp = tips.loc[(tips.tip>=bottom_lim)&(tips.tip<top_lim)]
g.ax_joint.axhline(top_lim, c='k', lw=2)
g.ax_joint.axhline(bottom_lim, c='k', lw=2)
# we have to create a secondary y-axis to the joint-plot, otherwise the
# kde might be very small compared to the scale of the original y-axis
ax_joint_2 = g.ax_joint.twinx()
sns.kdeplot(temp.total_bill, shade=True, color='red', ax=ax_joint_2, legend=False)
ax_joint_2.spines['right'].set_visible(False)
ax_joint_2.spines['top'].set_visible(False)
ax_joint_2.yaxis.set_visible(False)

Convert date/time index of external dataset so that pandas would plot clearly

When you already have time series data set but use internal dtype to index with date/time, you seem to be able to plot the index cleanly as here.
But when I already have data files with columns of date&time in its own format, such as [2009-01-01T00:00], is there a way to have this converted into the object that the plot can read? Currently my plot looks like the following.
Code:
dir = sorted(glob.glob("bsrn_txt_0100/*.txt"))
gen_raw = (pd.read_csv(file, sep='\t', encoding = "utf-8") for file in dir)
gen = pd.concat(gen_raw, ignore_index=True)
gen.drop(gen.columns[[1,2]], axis=1, inplace=True)
#gen['Date/Time'] = gen['Date/Time'][11:] -> cause error, didnt work
filter = gen[gen['Date/Time'].str.endswith('00') | gen['Date/Time'].str.endswith('30')]
filter['rad_tot'] = filter['Direct radiation [W/m**2]'] + filter['Diffuse radiation [W/m**2]']
lis = np.arange(35040) #used the number of rows, checked by printing. THis is for 2009-2010.
plt.xticks(lis, filter['Date/Time'])
plt.plot(lis, filter['rad_tot'], '.')
plt.title('test of generation 2009')
plt.xlabel('Date/Time')
plt.ylabel('radiation total [W/m**2]')
plt.show()
My other approach in mind was to use plotly. Yet again, its main purpose seems to feed in data on the internet. It would be best if I am familiar with all the modules and try for myself, but I am learning as I go to use pandas and matplotlib.
So I would like to ask whether there are anyone who experienced similar issues as I.
I think you need set labels to not visible by loop:
ax = df.plot(...)
spacing = 10
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
if label not in visible:
label.set_visible(False)

Tensorboard histograms to matplotlib

I would like to "dump" the tensorboard histograms and plot them via matplotlib. I would have more scientific paper appealing plots.
I managed to hack the way through the Summary file using the tf.train.summary_iterator and dump the histogram that I wanted to dump( tensorflow.core.framework.summary_pb2.HistogramProto object).
By doing that and implementing what the java-script code does with the data (https://github.com/tensorflow/tensorboard/blob/c2fe054231fe77f3a5b05dbc519f713d2e738d1c/tensorboard/plugins/histogram/tf_histogram_dashboard/histogramCore.ts#L104), I managed to get something similar (same trends) with the tensorboard plots, but not the exact same plot.
Can I have some light on this?
Thanks
In order to plot a tensorboard histogram with matplotlib I am doing the following:
event_acc = EventAccumulator(path, size_guidance={
'histograms': STEP_COUNT,
})
event_acc.Reload()
tags = event_acc.Tags()
result = {}
for hist in tags['histograms']:
histograms = event_acc.Histograms(hist)
result[hist] = np.array([np.repeat(np.array(h.histogram_value.bucket_limit), np.array(h.histogram_value.bucket).astype(np.int)) for h in histograms])
return result
h.histogram_value.bucket_limit gives me the value and h.histogram_value.bucket the count of this value. So when i repeat the values accordingly (np.repeat(...)), I get a huge array of expected size. This array can now be plotted with the default matplotlib logic.
The best solution is loading all events and reconstructing all the histogram (as the answer of #khuesmann) but not using EventAccumulator but EventFileLoader. This will give you a histogram per wall time and step as the ones Tensorboard plots. It can be extended to return a list of actions by timestep and wall time.
Don't forget to check which tag will you use.
from tensorboard.backend.event_processing.event_file_loader import EventFileLoader
# Just in case, PATH_OF_FILE is the path of the file, not the folder
loader = EventFileLoader(PATH_Of_FILE)
# Where to store values
wtimes,steps,actions = [],[],[]
for event in loader.Load():
wtime = event.wall_time
step = event.step
if len(event.summary.value) > 0:
summary = event.summary.value[0]
if summary.tag == HISTOGRAM_TAG:
wtimes += [wtime]*int(summary.histo.num)
steps += [step] *int(summary.histo.num)
for num,val in zip(summary.histo.bucket,summary.histo.bucket_limit):
actions += [val] *int(num)
bear in mind that tensorflow approximates the actions and treats the actions as continuous variables, so even if you have discrete actions (e.g. 0,1,3) you will end up actions as 0.2,0.4,0.9,1.4 ... in that case round the values will do it.
A good solution is the one from #khuesmann, but this only allows you to retrieve the accumulated histogram, not the histogram per step -- which is the one actually being showed in tensorboard.
If you want the distribution and so far, what I have understood is that Tensorboard usually compresses the histogram to decrease the memory used to store the data -- imagine storing a 2D histogram over 4 million steps, the memory can increase fast quickly. These compress histograms are accessible by doing this:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
n2n = EventAccumulator(PATH)
n2n.Reload()
# Check the tags under histograms and choose the one you want
n2n.Tags()
# This will give you the list used by tensorboard
# of the compress histograms by timestep and wall time
n2n.CompressedHistograms(HISTOGRAM_TAG)
The only problem is that it compresses the histogram to five percentiles (in Basic points they are 0, 668, 1587, 3085, 5000, 6915, 8413, 9332, 10000) which corresponds to (-Inf, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, Inf) in standard deviations. Check the code here.
I haven't read much, but it wouldn't be hard to reconstruct the temporal histograms that tensorboard shows. If I find a way to do it, I will post it here.
The simplest way is to parse the events with tbparse and plot the histograms with seaborn kde_ridgeplot.
This tutorial generates the stacked distribution plot with around 30 lines of Python code:
Tensorboard preview:
Parse by tbparse & plotted by seaborn:
You can open an issue if you encountered any question during parsing. (I'm the author of tbparse)