jupyter notebook prints output even when suppressed - matplotlib

Why is it printing the bins from the histogram?
Shouldn't the semicolon suppress it?
In [1]
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all";
In [2]
%matplotlib inline
data = {'first': np.random.rand(100),
        'second': np.random.rand(100)}
fig, axes = plt.subplots(2)
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20);

You've set InteractiveShell.ast_node_interactivity = "all", so every node is run interactively and output is shown for all of them rather than only for the last one.
A trailing ; works only for the last top-level expression. axes[idx].hist(data[k], bins=20); is not a top-level expression because it is nested in the for loop; the last top-level node is the for, which is a statement.
Simply add a final no-op statement and end it with ;
%matplotlib inline
data = {'first': np.random.rand(100),
        'second': np.random.rand(100)};
fig, axes = plt.subplots(2);
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20)
pass; # or None; 0; "foo"; ...
And you won't have any outputs.
Use codetransformer %%ast magic to quickly see the ast of an expression.
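If codetransformer is not available, the standard-library ast module gives a similar view. The sketch below parses the loop from the question and shows that the only top-level node is the For statement, with the hist call nested inside it as an Expr. The names cell_source, top_level and nested are illustrative only.

```python
import ast

# The loop from the question, as it would appear in a notebook cell.
cell_source = """
for idx, k in enumerate(data):
    axes[idx].hist(data[k], bins=20);
"""

tree = ast.parse(cell_source)

# Node types at the top level of the cell.
top_level = [type(node).__name__ for node in tree.body]
# Node types nested inside the body of the for loop.
nested = [type(node).__name__ for node in tree.body[0].body]

print(top_level)  # ['For']
print(nested)     # ['Expr']
```

Parsing does not execute anything, so data and axes do not need to exist for this check.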

If you read the documentation, you will see exactly what hist returns: a three-item tuple, described below. You can display the documentation in the notebook by appending ? to the function name (for example, plt.hist?). It looks like your InteractiveShell setting is making the return value display. Normally, yes, a semicolon would suppress the output, although inside a loop it would be unnecessary.
Returns
n : array or list of arrays
    The values of the histogram bins. See normed and weights
    for a description of the possible semantics. If input x is an
    array, then this is an array of length nbins. If input is a
    sequence arrays [data1, data2,..], then this is a list of
    arrays with the values of the histograms for each of the arrays
    in the same order.
bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin). Always a single array even when multiple data
    sets are passed in.
patches : list or list of lists
    Silent list of individual patches used to create the histogram
    or list of such list if multiple input datasets.
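The lengths described in the docstring can be checked without drawing anything. This sketch uses numpy.histogram as a stand-in (it is not plt.hist itself: it returns only the counts and the bin edges, with no patches) on random data like the question's. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, chosen only for repeatability
values = rng.random(100)
nbins = 20

# np.histogram returns (counts, bin_edges); plt.hist returns these plus patches.
counts, edges = np.histogram(values, bins=nbins)

print(len(counts))   # 20: one count per bin
print(len(edges))    # 21: nbins left edges plus the right edge of the last bin
print(counts.sum())  # 100: every sample falls in exactly one bin
```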

Related

How to overlay hatches on shapefile with condition?

I've been trying to plot hatches (a pattern like "//") on the polygons of a shapefile, based on a condition. The condition is that every polygon whose "Sig" value is greater than or equal to 0.05 should get a hatch pattern. Unfortunately the resulting map doesn't meet my requirements.
So I first plot the "AMOTL" variable and then want to plot the hatches (variable Sig) on top of it (where the values are greater than or equal to 0.05). I have used the following code:
import contextily as ctx
import geopandas as gpd
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker
from matplotlib.patches import Ellipse, Polygon
data = gpd.read_file("mapsignif.shp")
Sig = data.loc[data["Sig"].ge(0.05)]
data.loc[data["AMOTL"].eq(0), "AMOTL"] = np.nan
ax = data.plot(
    figsize=(12, 10),
    column="AMOTL",
    legend=True,
    cmap="bwr",
    vmin=-1,
    vmax=1,
    missing_kwds={"color": "white"},
)
Sig.plot(
    ax=ax,
    hatch='//'
)
map = Basemap(
    llcrnrlon=-50,
    llcrnrlat=30,
    urcrnrlon=50.0,
    urcrnrlat=85.0,
    resolution="i",
    lat_0=39.5,
    lon_0=1,
)
map.fillcontinents(color="lightgreen")
map.drawcoastlines()
map.drawparallels(np.arange(10,90,20),labels=[1,1,1,1])
map.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])
Now the problem is that my original image (on which I want to plot the hatches) is different from the image resulting from the above code:
Original Image -
Resultant image from above code:
I basically want to plot hatches on that first image. This topic is similar to correlation plots where you have places with hatches (if the p-value is greater than 0.05). The first image plots the correlation variable and some of them are significant (defined by Sig). So I want to plot the Sig variable on top of the AMOTL. I've tried variations of the code, but still can't get through.
Would be grateful for some assistance... Here's my file - https://drive.google.com/file/d/10LPNjBtQMdQMw6XmXdJEg6Uq4icx_LD6/view?usp=sharing
I’d bet this is the culprit:
data.loc[data["Sig"].ge(0.05), "Sig"].plot(
    column="Sig", hatch='//'
)
In this line, you’re selecting only the 'Sig' column, eliminating all spatial data in the 'geometry' column and returning a pandas.Series instead of a geopandas.GeoDataFrame. In order to plot a data column using the geometries column for your shapes you must maintain at least both of those columns in the object you call .plot on.
So instead, don’t select the column:
data.loc[data["Sig"].ge(0.05)].plot(
    column="Sig", hatch='//'
)
You are already telling geopandas to plot the "Sig" column by using the column argument to .plot - no need to limit the actual data too.
Also, when overlaying a plot on an existing axis, be sure to pass in the axis object:
data.loc[data["Sig"].ge(0.05)].plot(
    column="Sig", hatch='//', ax=ax
)
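The difference between the two selections can be seen with plain pandas, which geopandas builds on. In this sketch the table and its geometry column of placeholder strings are made up for illustration; a real GeoDataFrame would hold actual geometries there.

```python
import pandas as pd

# Hypothetical stand-in for the shapefile's attribute table.
table = pd.DataFrame({
    "Sig": [0.01, 0.20, 0.07],
    "geometry": ["poly_a", "poly_b", "poly_c"],  # placeholders for real polygons
})
mask = table["Sig"].ge(0.05)

only_sig = table.loc[mask, "Sig"]  # selects one column: a Series, geometry is gone
with_geom = table.loc[mask]        # selects rows only: a DataFrame, geometry kept

print(type(only_sig).__name__)  # Series
print(list(with_geom.columns))  # ['Sig', 'geometry']
```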

Plotting a 3-dimensional numpy array

I have a 3D numpy array with the shape (128,128,384). Let's call this array "S". It contains only binary values, either 0 or 1.
Now I want a 3D plot of this array: I have a grid of indices (x,y,z), and for every entry of S that equals 1 I should get a point plotted at the corresponding indices in the 3D grid. For example, if the entry S[120,50,36] is 1, I should get a dot at that point in the grid.
So far I have tried many methods, but the only one I got working is extremely slow and hence useless in my case: iterating over the entire array and using a scatter plot. Here is a snippet of my code:
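import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# S is the (128, 128, 384) binary array described above
for i in range(0,128):
    for j in range(0,128):
        for k in range(0,384):
            if S[i,j,k]==1:
                ax.scatter(i,j,k,zdir='z', marker='o')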
Please suggest to me any method which would be faster than this.
Also, please note that I am not trying to plot the entries in my array. The entries in my array are only a condition that tells me if I should plot corresponding to certain indices.
Thank you very much
You can use numpy.where.
In your example, remove the for loops and just use:
i, j, k = np.where(S==1)
ax.scatter(i,j,k,zdir='z', marker='o')
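To see what np.where returns here, a small sketch on a toy array (the 4x4x4 shape and the two set entries are arbitrary) shows that it yields one index array per axis, which is the form scatter accepts.

```python
import numpy as np

S = np.zeros((4, 4, 4), dtype=int)
S[1, 2, 3] = 1
S[3, 0, 2] = 1

# With a single condition, np.where returns a tuple of index arrays, one per axis.
i, j, k = np.where(S == 1)

print(i.tolist())  # [1, 3]
print(j.tolist())  # [2, 0]
print(k.tolist())  # [3, 2]
```

Each position across the three arrays identifies one entry equal to 1, so a single scatter call draws all points at once.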

Construct NumPy matrix row by row

I'm trying to construct a 2D NumPy array from values in an extant 2D NumPy array using an iterative process. Using ordinary python lists the process I'm describing would look like so:
coords = #data from file contained in a 2D list
d = #integer
edges = []
for i in range(d+1):
    for j in range(i+1, d+1):
        edge = coords[j] - coords[i]
        edges.append(edge)
However, the NumPy array imposes restrictions that do not permit the process shown above. Below I try to do the same thing using NumPy arrays, and it should immediately be clear where the problems are:
coords = np.genfromtxt('Energies.txt', dtype=float, skip_header=1)
d = #integer
#how to initialize?
for i in range(d+1):
    for j in range(i+1, d+1):
        edge = coords[j] - coords[i]
        #how to append?
Because .append does not exist for NumPy arrays I need to rely on concatenate or stack instead. But these functions are designed to join existing arrays, and I don't have anything to concatenate or stack until after the first iteration of my loop. So I suppose I need to change my data flow, but I'm unsure how to go about this.
Any help would be greatly appreciated. Thanks in advance.
The function you are looking for is numpy.meshgrid [1]; it does this by default.
[1] https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.meshgrid.html
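Independently of the suggestion above, two other approaches are worth sketching. One keeps the original loop, collects the rows in a Python list, and converts once at the end with np.array. The other uses np.triu_indices to generate every (i, j) pair with i < j and subtracts in one vectorized step. The coords array here is made-up data standing in for the contents of Energies.txt, and taking d from the number of rows is an assumption.

```python
import numpy as np

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])  # made-up data
d = coords.shape[0] - 1  # assumption: d + 1 equals the number of rows

# Approach 1: append to a list inside the loop, convert once at the end.
rows = []
for i in range(d + 1):
    for j in range(i + 1, d + 1):
        rows.append(coords[j] - coords[i])
edges_loop = np.array(rows)

# Approach 2: index pairs of the strict upper triangle, in the same order as the loop.
i_idx, j_idx = np.triu_indices(d + 1, k=1)
edges_vec = coords[j_idx] - coords[i_idx]

print(edges_loop.shape)                       # (6, 2)
print(np.array_equal(edges_loop, edges_vec))  # True
```

Both avoid growing a NumPy array inside the loop, which would copy the whole array on every concatenate call.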

Using pd.cut to create bins for a graph, but bin values are not coming out as expected

Here is the code I'm running:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y =titanic.groupby([titanic.fare//1,'sex']).survived.mean().reset_index() #grouping by 'fare' rounded to an integer and 'sex' and then getting the survivability
x =pd.cut(y.fare, (0,17,35,70,300,515)) #I'm not sure if my format is correct but this is how I cut up the fare values
y['Fare_bins']= x # adding the newly created bins to a new column "Fare_bins' in original dataframe.
#graphing with seaborn
sns.set(style="whitegrid")
g = sns.factorplot(x='Fare_bins', y='survived', col='sex', kind='bar', data=y,
                   size=4, aspect=2.5, palette="muted")
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
The problem I'm having is that the fare bins are showing up as (0,17].
The left side is a round bracket and the right side is a square bracket.
If possible I would like to have something like this:
(0-17) or [0-17]
Next, there seems to be a gap between the bars. I was expecting them to be adjoined. Two graphs are shown, so I don't expect all of the bars to be adjoined, but the first 5 bars (first graph) should be connected to each other, and likewise the last 5 bars (second graph).
How can I go about fixing these two issues?
It seems I can add labels.
Just by adding labels to the "cut" method parameters, I can display the Fare_values as I want.
x =pd.cut(y.fare, (0,17,35,70,300,515), labels = ('(0-17)', '(17-35)', '(35-70)', '(70-300)','(300-515)') )
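The labels can also be generated from the bin edges instead of being typed by hand. This is a minimal sketch; the fares Series is made-up data with one value per bin, not the titanic dataset.

```python
import pandas as pd

bins = (0, 17, 35, 70, 300, 515)
# Build one "(low-high)" label per pair of neighbouring edges.
labels = [f"({lo}-{hi})" for lo, hi in zip(bins[:-1], bins[1:])]

fares = pd.Series([5, 20, 50, 100, 400])  # made-up fares, one per bin
binned = pd.cut(fares, bins, labels=labels)

print(labels)        # ['(0-17)', '(17-35)', '(35-70)', '(70-300)', '(300-515)']
print(list(binned))  # the same five labels, in order
```

Generating the labels this way keeps them in sync if the bin edges change later.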
As for the brackets shown around the fare groups, according to the documentation:
right : bool, optional
    Indicates whether the bins include the rightmost edge or not. If right == True (the default), then the bins [1,2,3,4] indicate (1,2], (2,3], (3,4].
Still not sure if it's possible to join the bars though.

How to set the nodes color with a given graph

I am using networkx and matplotlib.
Now I want to set the color of the nodes. I read the graph from a text file:
G=nx.read_edgelist("Edge.txt")
nx.draw(G)
plt.show()
Here is an example Edge file:
0 1
0 2
3 4
Here is what I tried, and it failed:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.read_edgelist("Edge.txt")
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G,pos,node_list=[0,1,2],node_color='B')
nx.draw_networkx_nodes(G,pos,node_list=[3,4],node_color='R')
plt.show()
The result is a lot of blue nodes without edges.
So if I want to set NodeListA=[0,1,2] to blue and NodeListB=[3,4] to red,
how can I do that?
Draw the nodes explicitly by calling the top-level function draw_networkx_nodes,
passing your node list as the value of the nodelist parameter (note the spelling: nodelist, not node_list) and a value for the node_color parameter, like so:
nx.draw_networkx_nodes(G, pos, nodelist=NodeListA, node_color="#5072A7")
The argument pos is just a Python dictionary whose keys are the nodes of the graph and whose values are (x, y) positions; an easy way to supply pos is to pass your graph object to spring_layout, which returns such a dictionary.
pos = nx.spring_layout(G)
Alternatively, you can pass in the dictionary directly, e.g.,
pos = {
    0: (2, 2),
    1: (3, 5),
    2: (1, 2),
    3: (5, 5),
    4: (7, 4)
}
The likely cause of the code in the OP failing is the call to read_edgelist; in particular, the file passed in is probably incorrectly formatted.
Here's how to check this and also how to fix it:
G = nx.path_graph(5)
df = "/path/to/my/graphinit.edgelist"
nx.write_edgelist(G, df) # save a properly formatted edgelist file
G = nx.read_edgelist(df) # read that file back in
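A further point worth checking, offered as a sketch rather than a confirmed diagnosis: read_edgelist creates string node labels unless nodetype is given, so integer lists such as [0, 1, 2] would not match any node. The example below uses nx.parse_edgelist on the question's three lines in place of a file, and the helper name draw_colored is hypothetical.

```python
import networkx as nx

lines = ["0 1", "0 2", "3 4"]  # contents of the example Edge file

G_str = nx.parse_edgelist(lines)                # default: labels stay strings
G_int = nx.parse_edgelist(lines, nodetype=int)  # labels converted to int

print(sorted(G_str.nodes))  # ['0', '1', '2', '3', '4']
print(sorted(G_int.nodes))  # [0, 1, 2, 3, 4]


def draw_colored(G, groups):
    """Draw G with one color per node group; groups maps a color to a node list."""
    import matplotlib.pyplot as plt  # imported here so the checks above need only networkx

    pos = nx.spring_layout(G)
    for color, nodes in groups.items():
        nx.draw_networkx_nodes(G, pos, nodelist=nodes, node_color=color)
    nx.draw_networkx_edges(G, pos)  # draw_networkx_nodes alone draws no edges
    plt.show()


# Usage: draw_colored(G_int, {"blue": [0, 1, 2], "red": [3, 4]})
```

The same nodetype=int argument can be passed to read_edgelist when reading from a file.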