Binary treatment based on a continuous variable (Stata) - data-visualization

I want to create a scatter plot showing my treatment assignment on the y-axis and the margin of winning on the x-axis.
I created a binary treatment variable in which a margin above 0 indicates that a Republican candidate won the local election:
gen republican_win = (margin>0)
Here is a data example:
* Example generated by -dataex-. For more info, type help dataex
clear
input double margin float republican_win
-.356066316366196 0
-.54347825050354 0
-.204092293977737 0
-.449720650911331 1
-.201149433851242 1
-.505899667739868 0
-.206885248422623 1
end
To generate a scatter plot, I ran this. While the code ran well, I was wondering if it would be possible to display a continuous distribution of the margin of Republican wins and losses?
scatter margin republican_win

You can store the predicted probabilities from a logit model in a variable and plot them together with your scatter plot.
I would also swap the axes, putting margin on the x-axis, so that the fitted logistic curve is visible.
logit republican_win margin
predict win_hat
twoway scatter win_hat republican_win margin, ///
connect(l i) msymbol(i O) sort ylabel(0 1)
There are not enough data points in your data example to show a nice fitted curve, but I'm sure it will look better on your whole dataset.

Related

Matplotlib: Add contour plot to base of surface plot python

So I've produced a 3-D graph in Python using trisurf:
ax.plot_trisurf(x,y,z)
and I end up with the following:
3d plot
So now I want to plot contours on the base of this same plot. When I tried using ax.contour(x,y,z), I got an error saying my z should be 2-dimensional; however, my data comes from three 1-D arrays.
How can I go about plotting contours on the base of my graph?
OK, so I managed to find the answer after a bit of searching. This worked:
ax.tricontourf(angle_x,angle_y,nlist,zdir='-z', offset = -0.859, cmap='coolwarm')
It's important to make the offset slightly lower than the lowest z point (or the lowest point along whichever direction you want the projection) so that you can actually see the contour plot!
Here's the result:
updated plot with contour
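For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The scattered points and the Gaussian-shaped z values are made up for illustration, and I use the documented zdir="z" rather than the '-z' from the snippet above. The offset is computed from the data instead of being hard-coded.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up scattered data: three 1-D arrays, as in the question.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = rng.uniform(-1.0, 1.0, 200)
z = np.exp(-(x**2 + y**2))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_trisurf(x, y, z, cmap="coolwarm")

# Put the filled contours a little below the lowest point of the surface.
offset = z.min() - 0.1 * (z.max() - z.min())
ax.tricontourf(x, y, z, zdir="z", offset=offset, cmap="coolwarm")

# Extend the z-axis so the projected contours are inside the view.
ax.set_zlim(offset, z.max())

fig.canvas.draw()  # renders once; use plt.show() instead to view it interactively
```

The 10% margin below z.min() is an arbitrary choice; anything that keeps the contour plane clear of the surface should do.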

How to apply matplotlib quiver autoscale to two vector fields?

I am plotting two vector fields on top of each other and I want to use the auto-scale feature to set the arrow size such that the two fields are at the same scale automatically. (Part of this notebook.)
If I plot them one after the other, they are drawn at different scales. In this case the black arrows are artificially inflated compared to green.
plt.quiver(*XY, *np.real(UV))
plt.quiver(*XY, *np.imag(UV), color='g')
If I use this solution the first plot sets the scale for the second plot. But this fails to take the scale of the second field into account. If the first field has a small magnitude compared to the second, then it looks terrible.
Q = plt.quiver(*XY, *np.real(UV))
Q._init()
plt.quiver(*XY, *np.imag(UV), scale=Q.scale, color='g')
I want to set the auto-scale based on both fields, not just one or the other. Ideas?
You need to pass the same scale argument to both plt.quiver calls.
If you don't provide a scale, then a visually pleasing scale is derived automatically. So you could in principle extract the autoscaling code, use it to get the automatic scales for both quiver plots, and then use, for instance, the average of the two values.
Another, easier, way is to first invisibly plot both quiver plots using the do-nothing backend 'template', retrieve the automatically calculated scales and use the average of them in both real plotting calls:
import numpy as np
import matplotlib.pyplot as plt
# `source` and `velocity` are defined elsewhere in the asker's notebook.
def plot_flow(x,y,q,XY,G=source,args=(),size=(7,7),ymax=None):
    "Plot the geometry and induced velocity field"
    # Loop through segments, superimposing the velocity
    def uv(i): return q[i]*velocity(*XY, x[i], y[i], x[i+1], y[i+1], G, args)
    UV = sum(uv(i) for i in range(len(x)-1))
    def get_scale(XY, UV):
        """Get autoscale value by plotting to do-nothing backend."""
        backend = plt.matplotlib.get_backend()
        plt.matplotlib.use('template')
        Q = plt.quiver(*XY, *UV, scale=None)
        plt.matplotlib.use(backend)
        Q._init()
        return Q.scale
    # Get autoscales
    scale_real = get_scale(XY, np.real(UV))
    scale_imag = get_scale(XY, np.imag(UV)) if np.iscomplexobj(UV) else scale_real
    scale = (scale_real + scale_imag)/2
    # Create plot
    plt.figure(figsize=size)
    ax=plt.axes(); ax.set_aspect('equal', adjustable='box')
    # Plot vectors and segments
    plt.quiver(*XY, *np.real(UV), scale=scale)
    if np.iscomplexobj(UV):
        plt.quiver(*XY, *np.imag(UV), scale=scale, color='g')
    plt.plot(x,y,c='b')
    plt.ylim(None,ymax)
In the example, we get a scale of 7.7 as the average of 12.2 and 3.3.
Normalizing the data before plotting can help to get similar arrow sizes in both fields:
scale = 1
UV_real = np.real(UV) / np.linalg.norm(UV)
UV_imag = np.imag(UV) / np.linalg.norm(UV)
Q1 = plt.quiver(*XY, *UV_real, scale=scale)
Q2 = plt.quiver(*XY, *UV_imag, scale=scale, color='g')
Tested for multiple magnitude ratios between real and imaginary parts.
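If you would rather not rely on the private Q._init() call at all, a different technique (not the autoscale itself) is to compute one explicit scale from both fields and pass it to both calls. The sketch below assumes a regular grid and uses a made-up complex field as a stand-in for UV. With scale_units="xy", an arrow's length in data units is its magnitude divided by scale, so the longest arrow of either field ends up one grid cell long.

```python
import matplotlib.pyplot as plt
import numpy as np

# Regular grid and a made-up complex field standing in for UV.
xs = np.linspace(-2.0, 2.0, 9)
X, Y = np.meshgrid(xs, xs)
U = X + 0.2j * Y
V = -Y + 0.2j * X

# One scale for both fields, based on the largest magnitude in either.
spacing = xs[1] - xs[0]
largest = max(np.hypot(U.real, V.real).max(),
              np.hypot(U.imag, V.imag).max())
scale = largest / spacing

fig, ax = plt.subplots()
Q_real = ax.quiver(X, Y, U.real, V.real,
                   angles="xy", scale_units="xy", scale=scale)
Q_imag = ax.quiver(X, Y, U.imag, V.imag,
                   angles="xy", scale_units="xy", scale=scale, color="g")
ax.set_aspect("equal")

fig.canvas.draw()  # renders once; use plt.show() instead to view it interactively
```

The trade-off is that you choose the arrow length yourself rather than letting matplotlib pick a visually pleasing one, but the two fields are guaranteed to be directly comparable.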

How to fill a line in 2D image along a given radius with the data in a given line image?

I want to fill a 2D image along its polar radius. The data are stored in an image in which each row or column corresponds to the radius in the target image. How can I fill the target image efficiently, for example with iradius or a similar function? I would prefer to avoid a pixel-by-pixel operation.
Are you looking for something like this?
number maxR = 100
image rValues := realimage("I(r)",4,maxR)
rValues = 10 + trunc(100*random())
image plot :=realimage("Ring",4,2*maxR,2*maxR)
rValues.ShowImage()
plot.ShowImage()
plot = rValues.warp(iradius,0)
You might also want to check out the relevant example code in the F1 help documentation of GMS itself.
Explaining warp a bit:
plot = rValues.warp(iradius,0)
Assigns values to plot based on a value-lookup in rValues.
For each pixel in plot a coordinate position in rValues is computed, and the value is simply looked up. If the computed coordinate is non-integer, bilinear interpolation between the 4 closest points is used.
In the example, the two 'formulas' for the coordinate calculation are simply x' = iradius and y' = 0, where iradius is an expression computed from the coordinate in plot, for convenience.
You can feed any expression into the parameters for warp( ) and the command is closely related to just using the square bracket notation of addressing values. In fact, the only difference is that warp performs the bilinear interpolation of values instead of truncating the coordinates to integer values.
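For readers more at home in NumPy, the lookup described above can be sketched roughly as follows. This is only an analogy, not what GMS does internally. fill_by_radius is a hypothetical helper name, I assume the radius is measured from the array centre (size/2), and np.interp clamps radii beyond the end of the profile to the last value, which may differ from how warp treats out-of-range coordinates.

```python
import numpy as np

def fill_by_radius(profile, size):
    """Return a size x size image whose pixels hold profile(r), r = distance from centre."""
    rows, cols = np.indices((size, size))
    center = size / 2.0  # assumed centre convention
    radius = np.hypot(cols - center, rows - center)
    # Linear interpolation along the 1-D profile (the 1-D case of bilinear).
    values = np.interp(radius.ravel(), np.arange(profile.size), profile)
    return values.reshape(size, size)

# Example: a 100-sample radial profile painted into a 200 x 200 image.
profile = 2.0 * np.arange(100)
image = fill_by_radius(profile, 200)
```

Because the radius is computed for the whole coordinate grid at once, there is no explicit loop over pixels.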

Using matplotlib to plot a matrix with the third variable as source for a color map

Say you have a matrix described by three arrays:
x = a 1-D array of length N.
y = a 1-D array of length M.
And z is a set of "somewhat random" values from -0.3 to 0.3 in a NxM shape. I need to create a plot in which the x values are in the x-axis, y values are in the y-axis and using z as the source to indicate the intensity of each pixel with a color map.
So far, I have tried using
plt.contourf(x,y,z)
and the resulting plot is very nice for me (attached at the end of this paragraph), but a smoothing is automatically applied to the plot! I need to be able to distinguish the pixels and I cannot find a way to do it.
contourf result
I have also studied the possibility of using
ax.matshow(z)
in order to successfully see the pixels... but then I am struggling to customize the x and y axes, since only the index of each pixel is shown (see below).
matshow result
Would you please give me some ideas? Thank you.
Without more information on your x,y data it's hard to know, but I would guess you are looking for pcolormesh.
plt.pcolormesh(x,y,z)
This would take the x and y data as input and hence shows the z data at the appropriate coordinates.
You can use imshow with the keyword interpolation='nearest'.
plt.imshow(z, interpolation='nearest')
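Here is a small sketch putting both suggestions side by side with made-up data. It assumes x and y are evenly spaced 1-D arrays of pixel centres and that z has shape (len(y), len(x)). If your z is stored as N x M, you would pass z.T. The shading="nearest" option needs matplotlib 3.3 or later. For imshow, the extent is what brings back the real x and y values on the axes.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: N = 6 x-coordinates, M = 4 y-coordinates.
x = np.linspace(0.0, 5.0, 6)
y = np.linspace(10.0, 13.0, 4)
rng = np.random.default_rng(1)
z = rng.uniform(-0.3, 0.3, size=(y.size, x.size))  # shape (M, N)

fig, (ax_mesh, ax_img) = plt.subplots(1, 2, figsize=(9, 4))

# pcolormesh: one flat-coloured cell per value, centred on (x, y).
mesh = ax_mesh.pcolormesh(x, y, z, shading="nearest")
fig.colorbar(mesh, ax=ax_mesh)

# imshow: extent gives the outer edges of the image in data coordinates.
dx = x[1] - x[0]
dy = y[1] - y[0]
extent = (x[0] - dx / 2, x[-1] + dx / 2, y[0] - dy / 2, y[-1] + dy / 2)
img = ax_img.imshow(z, interpolation="nearest", origin="lower",
                    extent=extent, aspect="auto")
fig.colorbar(img, ax=ax_img)

fig.canvas.draw()  # renders once; use plt.show() instead to view it interactively
```

If x or y is not evenly spaced, only the pcolormesh version places the cells correctly; imshow always draws equal-sized pixels.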

Faster way to perform point-wise interpolation of numpy array?

I have a 3D datacube, with two spatial dimensions and the third being a multi-band spectrum at each point of the 2D image.
H[x, y, bands]
Given a wavelength (or band number), I would like to extract the 2D image corresponding to that wavelength. This would be simply an array slice like H[:,:,bnd]. Similarly, given a spatial location (i,j) the spectrum at that location is H[i,j].
I would also like to 'smooth' the image spectrally, to counter low-light noise in the spectra. That is, for band bnd, I choose a window of size wind and fit an n-degree polynomial to the spectrum in that window. With polyfit and polyval I can find the fitted spectral value at that point for band bnd.
Now, if I want the whole image of bnd from the fitted value, then I have to perform this windowed-fitting at each (i,j) of the image. I also want the 2nd-derivative image of bnd, that is, the value of the 2nd-derivative of the fitted spectrum at each point.
Running over the points, I could polyfit-polyval-polyder each of the x*y spectra. While this works, this is a point-wise operation. Is there some pytho-numponic way to do this faster?
If you do least-squares polynomial fitting to points (x+dx[i],y[i]) for a fixed set of dx and then evaluate the resulting polynomial at x, the result is a (fixed) linear combination of the y[i]. The same is true for the derivatives of the polynomial. So you just need a linear combination of the slices. Look up "Savitzky-Golay filters".
EDITED to add a brief example of how S-G filters work. I haven't checked any of the details and you should therefore not rely on it to be correct.
So, suppose you take a filter of width 5 and degree 2. That is, for each band (ignoring, for the moment, ones at the start and end) we'll take that one and the two on either side, fit a quadratic curve, and look at its value in the middle.
So, if f(x) ~= ax^2+bx+c and f(-2),f(-1),f(0),f(1),f(2) = p,q,r,s,t then we want 4a-2b+c ~= p, a-b+c ~= q, etc. Least-squares fitting means minimizing (4a-2b+c-p)^2 + (a-b+c-q)^2 + (c-r)^2 + (a+b+c-s)^2 + (4a+2b+c-t)^2, which means (taking partial derivatives w.r.t. a,b,c):
4(4a-2b+c-p)+(a-b+c-q)+(a+b+c-s)+4(4a+2b+c-t)=0
-2(4a-2b+c-p)-(a-b+c-q)+(a+b+c-s)+2(4a+2b+c-t)=0
(4a-2b+c-p)+(a-b+c-q)+(c-r)+(a+b+c-s)+(4a+2b+c-t)=0
or, simplifying,
34a+10c = 4p+q+s+4t
10b = -2p-q+s+2t
10a+5c = p+q+r+s+t
so a,b,c = (2p-q-2r-s+2t)/14, (2(t-p)+(s-q))/10, (p+q+r+s+t)/5-(2p-q-2r-s+2t)/7 = (-3p+12q+17r+12s-3t)/35.
And of course c is the value of the fitted polynomial at 0, and therefore is the smoothed value we want. So for each spatial position, we have a vector of input spectral data, from which we compute the smoothed spectral data by multiplying by a matrix whose rows (apart from the first and last couple) look like [0 ... 0 -3/35 12/35 17/35 12/35 -3/35 0 ... 0], with the central 17/35 on the main diagonal of the matrix.
So you could do a matrix multiplication for each spatial position; but since it's the same matrix everywhere you can do it with a single call to tensordot. So if S contains the matrix I just described (er, wait, no, the transpose of the matrix I just described) and A is your 3-dimensional data cube, your spectrally-smoothed data cube would be numpy.tensordot(A,S,axes=1). (The axes=1 matters: the default, axes=2, would try to contract the last two axes of A against both axes of S.)
This would be a good point at which to repeat my warning: I haven't checked any of the details in the few paragraphs above, which are just meant to give an indication of how it all works and why you can do the whole thing in a single linear-algebra operation.
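To make the "fixed linear combination of slices" idea concrete, here is a minimal NumPy sketch. savgol_weights and filter_bands are hypothetical helper names, the window is assumed to be centred with equally spaced bands, and the first and last few bands are simply dropped rather than handled specially. The weights come from the pseudo-inverse of the Vandermonde matrix, which is the least-squares fit written as a matrix. For width 5 and degree 2 this gives [-3, 12, 17, 12, -3]/35 for the smoothed value and [2, -1, -2, -1, 2]/7 for the second derivative.

```python
import numpy as np

def savgol_weights(half_width=2, degree=2):
    """Return (smoothing, second-derivative) weights for a centred window."""
    offsets = np.arange(-half_width, half_width + 1)
    # Columns are 1, x, x^2, ... evaluated at the window offsets.
    vander = np.vander(offsets, degree + 1, increasing=True)
    # Row k of the pseudo-inverse maps window samples to polynomial coefficient k.
    coeffs = np.linalg.pinv(vander)
    smooth = coeffs[0]        # fitted value at 0 is the constant term
    second = 2.0 * coeffs[2]  # f''(0) = 2 * (x^2 coefficient); needs degree >= 2
    return smooth, second

def filter_bands(cube, weights):
    """Apply the window weights along the last axis, interior bands only."""
    half = len(weights) // 2
    n_out = cube.shape[-1] - 2 * half
    out = np.zeros(cube.shape[:-1] + (n_out,))
    for k, w in enumerate(weights):
        # Weighted sum of shifted band slices: no loop over pixels.
        out += w * cube[..., k:k + n_out]
    return out

# Example: a cube whose spectrum is an exact quadratic at every pixel.
bands = np.arange(10.0)
H = np.broadcast_to(3.0 * bands**2 - bands + 1.0, (2, 3, 10))
smooth_w, second_w = savgol_weights()
H_smooth = filter_bands(H, smooth_w)   # equals H[..., 2:-2] for a quadratic
H_second = filter_bands(H, second_w)   # equals 6 everywhere for this spectrum
```

In practice, scipy.signal.savgol_filter(H, window_length, polyorder, deriv=2, axis=-1) does the same job and also takes care of the edge bands.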