seaborn plot violin of the arrays - pandas

I have two arrays:
l1 = [2,1,5,6,1,2,4,5,6,1]
l2 = [7,8,4,1,3,4,8]
I want to plot a seaborn violinplot with different color per list (a separate violin for l1 and l2).
Is there a way to do so without creating a dataframe and pd.melt from them?

You can give the parameters to sns.violinplot() without provding a dataframe, as follows:
import seaborn as sns
l1 = [2, 1, 5, 6, 1, 2, 4]
l2 = [7, 8, 4, 1, 3, 4, 8]
flag = [0, 1, 1, 1, 0, 0, 1]
sns.violinplot(y=l1 + l2, x=["l1"]*len(l1) + ["l2"]*len(l2),
hue=flag + flag, palette=['crimson', 'cornflowerblue'])
To only use the l1 vs l2 information:
import seaborn as sns
l1 = [2, 1, 5, 6, 1, 2, 4]
l2 = [7, 8, 4, 1, 3, 4, 8]
sns.violinplot(y=l1 + l2, x=["l1"] * len(l1) + ["l2"] * len(l2), palette=['tomato', 'cornflowerblue'])

Related

Create triangular mesh from vertex coordinates

Given a set of 2d data points with coordinates x and y (left picture), is there an easy way to construct a triangular mesh on top of it (right picture)? i.e. return a list of tuples that indicates which vertices are connected. The solution is not unique, but any reasonable mesh would suffice.
You can use scipy.spatial.Delaunay. Here is an example from the
import numpy as np
points = np.array([[-1,1],[-1.3, .6],[0,0],[.2,.8],[1,.85],[-.1,-.4],[.4,-.15],[.6,-.6],[.9,-.2]])
from scipy.spatial import Delaunay
tri = Delaunay(points)
import matplotlib.pyplot as plt
plt.triplot(points[:,0], points[:,1], tri.simplices)
plt.plot(points[:,0], points[:,1], 'o')
plt.show()
Here is the result on an input similar to yours:
The triangles are stored in the simplices attribute of the Delaunay object which reference the coordinates stored in the points attribute:
>>> tri.points
array([[-1. , 1. ],
[-1.3 , 0.6 ],
[ 0. , 0. ],
[ 0.2 , 0.8 ],
[ 1. , 0.85],
[-0.1 , -0.4 ],
[ 0.4 , -0.15],
[ 0.6 , -0.6 ],
[ 0.9 , -0.2 ]])
>>> tri.simplices
array([[5, 2, 1],
[0, 3, 4],
[2, 0, 1],
[3, 0, 2],
[8, 6, 7],
[6, 5, 7],
[5, 6, 2],
[6, 3, 2],
[3, 6, 4],
[6, 8, 4]], dtype=int32)
If you are looking for which vertices are connected, there is an attribute containing that info also:
>>> tri.vertex_neighbor_vertices
(array([ 0, 4, 7, 12, 16, 20, 24, 30, 33, 36], dtype=int32), array([3, 4, 2, 1, 5, 2, 0, 5, 1, 0, 3, 6, 0, 4, 2, 6, 0, 3, 6, 8, 2, 1,
6, 7, 8, 7, 5, 2, 3, 4, 8, 6, 5, 6, 7, 4], dtype=int32))
You can try scipy.spatial.Delaunay. From that link:
points = np.array([[0, 0], [0, 1.1], [1, 0], [1, 1]])
from scipy.spatial import Delaunay
tri = Delaunay(points)
plt.triplot(points[:,0], points[:,1], tri.simplices)
plt.plot(points[:,0], points[:,1], 'o')
plt.show()
Output:
I think Delanuay gives something closer to a convex hull. In OP's picture A is not connected to C, it is connected to B which is connected to C which gives a different shape.
One solution could be running Delanuay first then removing triangles whose angles exceed a certain degree, eg 90, or 100. A prelim code could look like
from scipy.spatial import Delaunay
points = [[101, 357], [198, 327], [316, 334], [ 58, 299], [162, 258], [217, 240], [310, 236], [153, 207], [257, 163]]
points = np.array(points)
tri = Delaunay(points,furthest_site=False)
newsimp = []
for t in tri.simplices:
A,B,C = points[t[0]],points[t[1]],points[t[2]]
e1 = B-A; e2 = C-A
num = np.dot(e1, e2)
denom = np.linalg.norm(e1) * np.linalg.norm(e2)
d1 = np.rad2deg(np.arccos(num/denom))
e1 = C-B; e2 = A-B
num = np.dot(e1, e2)
denom = np.linalg.norm(e1) * np.linalg.norm(e2)
d2 = np.rad2deg(np.arccos(num/denom))
d3 = 180-d1-d2
degs = np.array([d1,d2,d3])
if np.any(degs > 110): continue
newsimp.append(t)
plt.triplot(points[:,0], points[:,1], newsimp)
which gives the shape seen above. For more complicated shapes removing large sides could be necessary too,
for t in tri.simplices:
...
n1 = np.linalg.norm(e1); n2 = np.linalg.norm(e2)
...
res.append([n1,n2,d1,d2,d3])
res = np.array(res)
m = res[:,[0,1]].mean()*res[:,[0,1]].std()
mask = np.any(res[:,[2,3,4]] > 110) & (res[:,0] < m) & (res[:,1] < m )
plt.triplot(points[:,0], points[:,1], tri.simplices[mask])

How to make a simple Vandermonde matrix with numpy?

My question is how to make a vandermonde matrix. This is the definition:
In linear algebra, a Vandermonde matrix, named after Alexandre-Théophile Vandermonde, is a matrix with the terms of a geometric progression in each row, i.e., an m × n matrix
I would like to make a 4*4 version of this.
So farI have defined values but only for one row as follows
a=2
n=4
for a in range(n):
for i in range(n):
v.append(a**i)
v = np.array(v)
print(v)
I dont know how to scale this. Please help!
Given a starting column a of length m you can create a Vandermonde matrix v with n columns a**0 to a**(n-1)like so:
import numpy as np
m = 4
n = 4
a = range(1, m+1)
v = np.array([a]*n).T**range(n)
print(v)
#[[ 1 1 1 1]
# [ 1 2 4 8]
# [ 1 3 9 27]
# [ 1 4 16 64]]
As proposed by michael szczesny you could use numpy.vander.
But this will not be according to the definition on Wikipedia.
x = np.array([1, 2, 3, 5])
N = 4
np.vander(x, N)
#array([[ 1, 1, 1, 1],
# [ 8, 4, 2, 1],
# [ 27, 9, 3, 1],
# [125, 25, 5, 1]])
So, you'd have to use numpy.fliplr aswell:
x = np.array([1, 2, 3, 5])
N = 4
np.fliplr(np.vander(x, N))
#array([[ 1, 1, 1, 1],
# [ 1, 2, 4, 8],
# [ 1, 3, 9, 27],
# [ 1, 5, 25, 125]])
This could also be achieved without numpy using nested list comprehensions:
x = [1, 2, 3, 5]
N = 4
[[xi**i for i in range(N)] for xi in x]
# [[1, 1, 1, 1],
# [1, 2, 4, 8],
# [1, 3, 9, 27],
# [1, 5, 25, 125]]
# Vandermonde Matrix
def Vandermonde_Matrix(D, k):
'''
D = {(x_i,y_i): 0<=i<=n}
----------------
k degree
'''
n = len(D)
V = np.zeros(shape=(n, k))
for i in range(n):
V[i] = np.power(np.array(D[i][0]), np.arange(k))
return V

Elegantly generate result array in numpy

I have my X and Y numpy arrays:
X = np.array([0,1,2,3])
Y = np.array([0,1,2,3])
And my function which maps x,y values to Z points:
def z(x,y):
return x+y
I wish to produce the obvious thing required for a 3D plot: the 2-dimensional numpy array for the corresponding Z-values. I believe it should look like:
Z = np.array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
I can do this in several lines, but I'm looking for the briefest most elegant piece of code.
For a function that is array aware it is more economical to use an open grid:
>>> import numpy as np
>>>
>>> X = np.array([0,1,2,3])
>>> Y = np.array([0,1,2,3])
>>>
>>> def z(x,y):
... return x+y
...
>>> XX, YY = np.ix_(X, Y)
>>> XX, YY
(array([[0],
[1],
[2],
[3]]), array([[0, 1, 2, 3]]))
>>> z(XX, YY)
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
If your grid axes are ranges you can directly create the grid using np.ogrid
>>> XX, YY = np.ogrid[:4, :4]
>>> XX, YY
(array([[0],
[1],
[2],
[3]]), array([[0, 1, 2, 3]]))
If the function is not array aware you can make it so using np.vectorize:
>>> def f(x, y):
... if x > y:
... return x
... else:
... return -x
...
>>> np.vectorize(f)(*np.ogrid[-3:4, -3:4])
array([[ 3, 3, 3, 3, 3, 3, 3],
[-2, 2, 2, 2, 2, 2, 2],
[-1, -1, 1, 1, 1, 1, 1],
[ 0, 0, 0, 0, 0, 0, 0],
[ 1, 1, 1, 1, -1, -1, -1],
[ 2, 2, 2, 2, 2, -2, -2],
[ 3, 3, 3, 3, 3, 3, -3]])
One very short way to achieve what you want is to produce a meshgrid from your coordinates:
X,Y = np.meshgrid(x,y)
z = X+Y
or more general:
z = f(X,Y)
or even in one line:
z = f(*np.meshgrid(x,y))
EDIT:
If your function also may return a constant, you have to somehow infer the dimensions that the result should have. If you want to continue using meshgrids one very simple way would be re-write your function in this way:
def f(x,y):
return x*0+y*0+a
where a would be your constant. numpy would then take care of the dimensions for you. This is of course a bit weird looking, so instead you could write
def f(x,y):
return np.full(x.shape, a)
If you really want to go with functions that work both on scalars and arrays, it's probably best to go with np.vectorize as in #PaulPanzer's answer.

tensorflow expand counts into ranges

We have a Tensor of unknown length N, containing some int32 values.
How can we generate another Tensor that will contain N ranges concatenated together, each one between 0 and the int32 value from the original tensor ?
For example, if we have [4, 4, 5, 3, 1], the output Tensor should look like [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0].
Thank you for any advice.
You can make this work with a tensor as input by using a tf.RaggedTensor which can contain dimensions of non-uniform length.
# Or any other N length tensor
tf_counts = tf.convert_to_tensor([4, 4, 5, 3, 1])
tf.print(tf_counts)
# [4 4 5 3 1]
# Create a ragged tensor, each row is a sequence of length tf_counts[i]
tf_ragged = tf.ragged.range(tf_counts)
tf.print(tf_ragged)
# <tf.RaggedTensor [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2], [0]]>
# Read values
tf.print(tf_ragged.flat_values, summarize=-1)
# [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0]
For this 2-dimensional case the ragged tensor tf_ragged is a “matrix“ of rows with varying length:
[[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2],
[0]]
Check tf.ragged.range for more options on how to create the sequences on each row: starts for inclusive lower limits, limits for exclusive upper limit, deltas for increment. Each may vary for each sequence.
Also mind that the dtype of the tf_counts tensor will propagate to the final values.
If you want to have everything as a tensorflow object, then use tf.range() along with tf.concat().
In [88]: vals = [4, 4, 5, 3, 1]
In [89]: tf_range = [tf.range(0, limit=item, dtype=tf.int32) for item in vals]
# concat all `tf_range` objects into a single tensor
In [90]: concatenated_tensor = tf.concat(tf_range, 0)
In [91]: concatenated_tensor.eval()
Out[91]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
There're other approaches to do this as well. Here, I assume that you want a constant tensor but you can construct any tensor once you have the full range list.
First, we construct the full range list using a list comprehension, make a flat list out of it, and then construct a tensor.
In [78]: from itertools import chain
In [79]: vals = [4, 4, 5, 3, 1]
In [80]: range_list = list(chain(*[range(item) for item in vals]))
In [81]: range_list
Out[81]: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
In [82]: const_tensor = tf.constant(range_list, dtype=tf.int32)
In [83]: const_tensor.eval()
Out[83]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
On the other hand, we can also use tf.range() but then it returns an array when you evaluate it. So, you'd have to construct the list from the arrays and then make a flat list out of it and finally construct the tensor as in the following example.
list_of_arr = [tf.range(0, limit=item, dtype=tf.int32).eval() for item in vals]
range_list = list(chain(*[arr.tolist() for arr in list_of_arr]))
# output
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
const_tensor = tf.constant(range_list, dtype=tf.int32)
const_tensor.eval()
#output tensor as numpy array
array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)

Swap a subset of multi-values in numpy

Given a starting numpy array that looks like:
B = np.array( [1, 1, 1, 0, 2, 2, 1, 3, 3, 0, 4, 4, 4, 4] )
What it the most efficient way to swap one set of values for another when there are duplicates? For example, let
s1 = [1,2,4]
s2 = [4,1,2]
An inefficient swapping method would iterate through s1 and s2 as so:
B2 = B.copy()
for x,y in zip(s1,s2):
B2[B==x] = y
Giving as output
B2 -> [4, 4, 4, 0, 1, 1, 4, 3, 3, 0, 2, 2, 2, 2]
Is there a way to do this essentially in-place without the zip loop?
>>> B = np.array( [1, 1, 1, 0, 2, 2, 1, 3, 3, 0, 4, 4, 4, 4] )
>>> s1 = [1,2,4]
>>> s2 = [4,1,2]
>>> B2 = B.copy()
>>> c, d = np.where(B == np.array(s1)[:,np.newaxis])
>>> B2[d] = np.repeat(s2,np.bincount(c))
>>> B2
array([4, 4, 4, 0, 1, 1, 4, 3, 3, 0, 2, 2, 2, 2])
If you have only integers that are between 0 and n (if not its no problem to generalize to any integer range unless its very sparse), the most efficient way is the use of take/fancy indexing:
swap = np.arange(B.max() + 1) # all values in B
swap[s1] = s2 # replace the values you want to be replaced
B2 = swap.take(B) # or swap[B]
This is seems almost twice as fast for the small B given here, but with larger B it gets even more speedup repeating B to a length of about 100000 gives 8x already. This also avoids the == operation for every s1 element, so will scale much better as s1/s2 get large.
EDIT: you could also use np.put (also in the other answer) for some speedup for swap[s1] = s2. For these 1D problems take/put are simply faster.