This question follows from Efficiently plot set of {coordinate+value}s to (numpy array) bitmap
A solution for plotting from x, y, color lists to a bitmap is given:
import numpy as np
bitmap = np.zeros((10, 10, 3))
s_x = (0,1,2) ## tuple
s_y = (1,2,3) ## tuple
pixel_val = np.array([[0,0,1],[1,0,0],[0,1,0]]) ## np
bitmap[s_x, s_y] = pixel_val
But how do I handle the case where some (x, y) pairs lie outside the bitmap?
Efficiency is paramount.
If I could map offscreen coords to the first row/col of the bitmap ((-42, 7) -> (0, 7), (15, -6) -> (15, 0)), I could simply black out the first row and column with bitmap[:,0,:] = 0; bitmap[0,:,:] = 0.
Is this doable?
Is there a smarter way?

Are you expecting offscreen coords? If so, don't worry; if not, I was just wondering whether you were using a non-traditional coordinate system, where zero may be at the center of the image for whatever reason.
Anyway, since you can use numpy arrays to store the coordinates, mapping outliers to the first row/col is pretty straightforward: s_x[s_x < 0] = 0 handles the negative ones (coordinates past the far edge need the same treatment). However, I believe the most efficient way is to use a boolean mask to find the indices of the pixels that are inside the bitmap, so that only those are assigned:
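import numpy as np
bitmap = np.zeros((15, 16, 3))  # 15 rows (y) by 16 columns (x)
## generate data
s_x = np.arange(-3, 22)  # 25 x coordinates, some offscreen
s_y = np.arange(-4, 21)  # 25 y coordinates, some offscreen
pixel_val = np.random.rand(25, 3)
## generate is done
# True for the points that fall inside the bitmap
use = np.logical_and(np.logical_and(s_x >= 0, s_x < bitmap.shape[1]), np.logical_and(s_y >= 0, s_y < bitmap.shape[0]))
# Assign only the in-bounds points (rows are indexed by y, columns by x)
bitmap[s_y[use], s_x[use]] = pixel_val[use]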
[ 8 3 21 9 -2 -3 5 14 -1 18 13 16 0 11 7 1 2 12 15 6 19 10 4 17 20]
[ 8 14 1 9 2 4 7 15 3 -3 19 16 6 -1 0 17 5 13 -2 20 -4 11 10 12 18]
I ran a test that had to assign 3145728 points (four times the size of the bitmap you gave in your other question), around half of which were outside the image. On average the masking approach took around 140 ms, whereas remapping the outliers and then setting the first row/col to zero took around 200 ms for the same task.
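For comparison, the remap-and-black-out idea from the question can be written next to the masking approach. This is a minimal sketch; the function names plot_masked and plot_remap are hypothetical, and it assumes rows are indexed by y and columns by x, as in the code above.

```python
import numpy as np

def plot_masked(shape, s_x, s_y, pixel_val):
    """Assign only the points that fall inside the bitmap."""
    bitmap = np.zeros(shape)
    h, w = shape[:2]
    use = (s_x >= 0) & (s_x < w) & (s_y >= 0) & (s_y < h)
    bitmap[s_y[use], s_x[use]] = pixel_val[use]
    return bitmap

def plot_remap(shape, s_x, s_y, pixel_val):
    """Send offscreen coordinates to row/column 0, then black those out."""
    bitmap = np.zeros(shape)
    h, w = shape[:2]
    x = np.where((s_x >= 0) & (s_x < w), s_x, 0)
    y = np.where((s_y >= 0) & (s_y < h), s_y, 0)
    bitmap[y, x] = pixel_val
    bitmap[:, 0, :] = 0  # first column
    bitmap[0, :, :] = 0  # first row
    return bitmap

s_x = np.arange(-3, 22)
s_y = np.arange(-4, 21)
pixel_val = np.random.default_rng(0).random((25, 3))
masked = plot_masked((15, 16, 3), s_x, s_y, pixel_val)
remapped = plot_remap((15, 16, 3), s_x, s_y, pixel_val)
```

Note that the remap version also wipes any legitimate pixels that happen to lie in row 0 or column 0, which the masked version keeps.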


Why does Perl 6 try to evaluate an infinite list only in one of two similar situations?

Suppose I define a lazy, infinite array using a triangular reduction at the REPL, with a single element pasted onto the front:
> my @s = 0, |[\+] (1, 2 ... *)
I can print out the first few elements:
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)
I'd like to move the zero element inside the reduction like so:
> my @s = [\+] (0, |(1, 2 ... *))
However, in response to this, the REPL hangs, presumably by trying to evaluate the infinite list.
If I do it in separate steps, it works:
> my @s = 0, |(1, 2 ... *)
> ([\+] @s)[^10]
(0 1 3 6 10 15 21 28 36 45)
Why doesn't the way that doesn't work...work?
Short answer:
It is probably a bug.
Long answer:
(1, 2 ... *) produces a lazy sequence because it is obviously infinite, but somehow that does not cause the resulting sequence to be marked as lazy.
Putting a sequence into an array @s causes it to be eagerly evaluated unless it is marked as lazy.
Quick fix:
Put lazy in front of the list:
> my @s = [\+] lazy 0, |(1, 2 ... *)
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)
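The same eager-versus-lazy distinction shows up in other languages. As a rough Python analogy only (this is not Perl 6 semantics), itertools.accumulate plays the role of the triangular reduction [\+]: materializing it with list() on an infinite input would never return, while islice pulls just the first ten values.

```python
from itertools import accumulate, chain, count, islice

# 0 followed by 1, 2, 3, ... (infinite, like 0, |(1, 2 ... *))
seq = chain([0], count(1))

# Running sums, like [\+]; accumulate is lazy, so nothing is computed yet.
running = accumulate(seq)

# list(running) would hang; take only what we need instead.
first_ten = list(islice(running, 10))
print(first_ten)  # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
```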

Array Numpy Side Effect

I found a strange effect when permuting an array with numpy:
import numpy as np
def permute(yy, kmax):
    kk = np.random.uniform(1, kmax)
    nn = int(np.floor(len(yy)/kk))
    yy3 = np.zeros_like(yy)
    for ii in range(0, nn):
        ax = kk*ii - kk*nn
        aux = yy[ax]
        aux2 = yy[kk*ii]
        yy3[ax] = aux
        yy3[kk*ii] = aux2
    return yy3
yy= np.random.normal(0,1,50000)
yy1= permute(yy,2)
( np.var(yy)- np.var(yy1) )
( np.mean(yy)- np.mean(yy1) )
The result is not zero!
Do you think this comes from reference assignment in the array?
I ran your function with np.arange(10) and got
1752:~/mypy$ python stack35004877.py
[0 1 2 3 4 5 6 7 8 9] # yy
[0 1 2 3 4 5 6 7 8 9] # yy1
I repeated it with the large random array and got the same 0s for the statistics.
Note that your code did not permute the input.
Maybe it will be clearer if I clean it up:
def permute(yy, kmax=5):
    kk = np.random.randint(1, kmax)  # int rather than float
    nn = int(np.floor(len(yy)/kk))
    yy3 = yy.copy()
    for ii in range(0, nn):
        ind1 = kk*ii
        ind2 = ind1 - kk*nn
        yy3[ind2] = yy[ind2]
        yy3[ind1] = yy[ind1]
    return yy3
You aren't moving anything. With kmax=2, np.random.randint(1, 2) always returns 1, so the loop just copies everything from yy to yy3, something you already did outside the loop. With kmax=5 you don't copy everything in the loop, but the initial copy hides that.
With random.uniform(), kk is a float, so the indexes are floats too. That's not desirable, but apparently not a problem here (newer NumPy versions reject float indices with an IndexError).
But even if I switch the indices:
yy3[ind2] = yy[ind1]
yy3[ind1] = yy[ind2]
I don't permute anything, because ind2 is a negative value that maps onto the same element as ind1 (yy[-1] is the last item of yy):
[(0, -10), (1, -9), (2, -8),... (9, -1)]
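A quick check of that aliasing on a small array (a sketch using np.arange(10) as test data, and assuming kk == 1, so that ind2 is ind1 - len(yy)): each pair names the same slot, so "swapping" the two entries changes nothing.

```python
import numpy as np

yy = np.arange(10)
n = len(yy)
pairs = [(i, i - n) for i in range(n)]  # (ind1, ind2) when kk == 1
yy3 = yy.copy()
for ind1, ind2 in pairs:
    # Swap the two entries; both indices refer to the same element.
    yy3[ind2], yy3[ind1] = yy[ind1], yy[ind2]
print(pairs[:3], pairs[-1])
print(yy3)  # [0 1 2 3 4 5 6 7 8 9]
```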
I could work out the details, but I think you should do that yourself, with a small test case. And skip that initial copy; it just hides errors in the iteration. Print the details, not just summary statistics from large random arrays.
And in the long run you don't want to use an iteration like this. You want to do the permutation with one indexing call. But first get this version working correctly.
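One way to do the permutation with a single indexing call is to index with a random permutation of the positions. This is a minimal sketch using NumPy's Generator API (np.random.default_rng, NumPy 1.17 and later), not a repair of the loop above; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(12345)
yy = rng.normal(0, 1, 50000)

perm = rng.permutation(len(yy))  # shuffled copy of 0 .. len(yy)-1
yy1 = yy[perm]                   # one fancy-indexing call permutes the data

# Same values in a different order, so mean and variance agree up to rounding.
print(np.var(yy) - np.var(yy1), np.mean(yy) - np.mean(yy1))
```

The printed differences may be tiny nonzero numbers rather than exact zeros, because summing the same values in a different order can round differently; np.isclose is a safer comparison than ==.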

Most efficient way to shift MultiIndex time series

I have a DataFrame that consists of many stacked time series. The index is (poolId, month) where both are integers, the "month" being the number of months since 2000. What's the best way to calculate one-month lagged versions of multiple variables?
Right now, I do something like:
cols_to_shift = ["bal", ...5 more columns...]
df_shift = df[cols_to_shift].groupby(level=0).transform(lambda x: x.shift(-1))
For my data, this took me a full 60 s to run. (I have 48k different pools and a total of 718k rows.)
I'm converting this from R code and the equivalent data.table call:
dt.shift <- dt[, list(bal=myshift(bal), ...), by=list(poolId)]
only takes 9 s to run. (Here "myshift" is something like "function(x) c(x[-1], NA)".)
Is there a way I can get the pandas version back in line speed-wise? I tested this on 0.8.1.
Edit: Here's an example of generating a close-enough data set, so you can get some idea of what I mean:
import numpy as np
import pandas as pd
ids = np.arange(48000)
lens = np.maximum(np.round(15+9.5*np.random.randn(48000)), 1.0).astype(int)
id_vec = np.repeat(ids, lens)
lens_shift = np.concatenate(([0], lens[:-1]))
mon_vec = np.arange(lens.sum()) - np.repeat(np.cumsum(lens_shift), lens)
n = len(mon_vec)
# DataFrame.from_items no longer exists in current pandas; a dict keeps the same column order
df = pd.DataFrame({'pool': id_vec, 'month': mon_vec, **{c: np.random.rand(n) for c in 'abcde'}})
df = df.set_index(['pool', 'month'])
%time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))
That took 64 s when I tried it. This data has every series starting at month 0; really, they should all end at month np.max(lens), with ragged start dates, but good enough.
Edit 2: Here's some comparison R code. This takes 0.8 s. Factor of 80, not good.
library(data.table)
ids <- 1:48000
lens <- as.integer(pmax(1, round(rnorm(ids, mean=15, sd=9.5))))
id.vec <- rep(ids, times=lens)
lens.shift <- c(0, lens[-length(lens)])
mon.vec <- (1:sum(lens)) - rep(cumsum(lens.shift), times=lens)
n <- length(id.vec)
dt <- data.table(pool=id.vec, month=mon.vec, a=rnorm(n), b=rnorm(n), c=rnorm(n), d=rnorm(n), e=rnorm(n))
setkey(dt, pool, month)
myshift <- function(x) c(x[-1], NA)
system.time(dt.shift <- dt[, list(month=month, a=myshift(a), b=myshift(b), c=myshift(c), d=myshift(d), e=myshift(e)), by=pool])
I would suggest you reshape the data and do a single shift versus the groupby approach:
result = df.unstack(0).shift(1).stack()
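This switches the order of the levels, so you'd want to swap and re-sort them:
result = result.swaplevel(0, 1).sortlevel(0)
(In current pandas, sortlevel has been removed in favour of sort_index(level=0), and .ix, used below, in favour of .loc.)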
You can verify it's been lagged by one period (you want shift(1) instead of shift(-1)):
In [17]: result.ix[1]
a b c d e
1 0.752511 0.600825 0.328796 0.852869 0.306379
2 0.251120 0.871167 0.977606 0.509303 0.809407
3 0.198327 0.587066 0.778885 0.565666 0.172045
4 0.298184 0.853896 0.164485 0.169562 0.923817
5 0.703668 0.852304 0.030534 0.415467 0.663602
6 0.851866 0.629567 0.918303 0.205008 0.970033
7 0.758121 0.066677 0.433014 0.005454 0.338596
8 0.561382 0.968078 0.586736 0.817569 0.842106
9 0.246986 0.829720 0.522371 0.854840 0.887886
10 0.709550 0.591733 0.919168 0.568988 0.849380
11 0.997787 0.084709 0.664845 0.808106 0.872628
12 0.008661 0.449826 0.841896 0.307360 0.092581
13 0.727409 0.791167 0.518371 0.691875 0.095718
14 0.928342 0.247725 0.754204 0.468484 0.663773
15 0.934902 0.692837 0.367644 0.061359 0.381885
16 0.828492 0.026166 0.050765 0.524551 0.296122
17 0.589907 0.775721 0.061765 0.033213 0.793401
18 0.532189 0.678184 0.747391 0.199283 0.349949
In [18]: df.ix[1]
a b c d e
0 0.752511 0.600825 0.328796 0.852869 0.306379
1 0.251120 0.871167 0.977606 0.509303 0.809407
2 0.198327 0.587066 0.778885 0.565666 0.172045
3 0.298184 0.853896 0.164485 0.169562 0.923817
4 0.703668 0.852304 0.030534 0.415467 0.663602
5 0.851866 0.629567 0.918303 0.205008 0.970033
6 0.758121 0.066677 0.433014 0.005454 0.338596
7 0.561382 0.968078 0.586736 0.817569 0.842106
8 0.246986 0.829720 0.522371 0.854840 0.887886
9 0.709550 0.591733 0.919168 0.568988 0.849380
10 0.997787 0.084709 0.664845 0.808106 0.872628
11 0.008661 0.449826 0.841896 0.307360 0.092581
12 0.727409 0.791167 0.518371 0.691875 0.095718
13 0.928342 0.247725 0.754204 0.468484 0.663773
14 0.934902 0.692837 0.367644 0.061359 0.381885
15 0.828492 0.026166 0.050765 0.524551 0.296122
16 0.589907 0.775721 0.061765 0.033213 0.793401
17 0.532189 0.678184 0.747391 0.199283 0.349949
Perf isn't too bad with this method (it might be a touch slower in 0.9.0):
In [19]: %time result = df.unstack(0).shift(1).stack()
CPU times: user 1.46 s, sys: 0.24 s, total: 1.70 s
Wall time: 1.71 s
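In more recent pandas versions, groupby objects also have a built-in shift method, which avoids calling a Python lambda once per group. A minimal sketch on a tiny hand-made frame (the data are illustrative only); like the reshape above, it assumes each pool's months are sorted and contiguous, because shift moves by rows, not by calendar months.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1)], names=["pool", "month"]
)
df = pd.DataFrame({"a": [10.0, 11.0, 12.0, 20.0, 21.0]}, index=idx)

# Lag by one month within each pool (each pool's first month becomes NaN).
lagged = df.groupby(level="pool").shift(1)

# Lead by one month, as in the question's shift(-1).
lead = df.groupby(level="pool").shift(-1)
print(lagged)
print(lead)
```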

Hash function to iterate through a matrix

Given an NxN matrix and a (row,column) position, what is a method to select a different position in a random (or pseudo-random) order, trying to avoid collisions as much as possible?
For example: consider a 5x5 matrix and start from (1,2)
0 0 0 0 0
0 0 X 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
I'm looking for a method like
(x,y) hash (x,y);
to jump to a different position in the matrix, avoiding collisions as much as possible
(Don't worry about how to return two values; it doesn't matter, just think of it as returning an array.)
Of course, I can simply use
row = rand()%N;
column = rand()%N;
but it's not very good at avoiding collisions.
I thought I could apply a simple hash method twice, once for the row and once for the column, and use the results as the new coordinates, but I'm not sure this is a good solution.
Any ideas?
Can you determine the order of the walk before you start iterating? If your matrices are large, this approach isn't space-efficient, but it is straightforward and collision-free. I would do something like:
Generate an array of all of the coordinates. Remove the starting position from the list.
Shuffle the list (there's sample code for a Fisher-Yates shuffle here)
Use the shuffled list for your walk order.
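The three steps above can be sketched in Python as follows. This is a minimal version that stores every coordinate, so it needs O(N^2) memory; shuffled_walk is a hypothetical name, and Python's own random.shuffle would do the same job as the hand-written loop.

```python
import random

def shuffled_walk(n, start, rng=random):
    """All cells of an n x n matrix except start, in shuffled order."""
    # 1. Generate all coordinates, leaving out the starting position.
    cells = [(r, c) for r in range(n) for c in range(n) if (r, c) != start]
    # 2. Fisher-Yates: swap each position with a random one at or before it.
    for i in range(len(cells) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        cells[i], cells[j] = cells[j], cells[i]
    # 3. The shuffled list is the walk order.
    return cells

walk = shuffled_walk(5, (1, 2), random.Random(42))
print(walk[:5])
```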
Edit 2 & 3: A modular approach: Given s array elements, choose a prime p of form 2+3*n, p>s. For i=1 to p, use cell (i*i*i)%p whenever that value is in the range 0...s-1. (For row-length r, cell #c subscripts are c%r, c/r.)
Effectively, this method uses H(i) = (i*i*i) mod p as a hash function. The reference shows that as i ranges from 1 to p, H(i) takes on each of the values from 0 to p-1 exactly once.
For example, with s=25 and p=29 or 47, this uses cells in following order:
p=29: 1 8 6 9 13 24 19 4 14 17 22 18 11 7 12 3 15 10 5 16 20 23 2 21 0
p=47: 1 8 17 14 24 13 15 18 7 4 10 2 6 21 3 22 9 12 11 23 5 19 16 20 0
according to bc code like
s=25;p=29;for(i=1;i<=p;++i){t=(i^3)%p; if(t<s){print " ",t}}
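The same walk in Python, as a sketch that mirrors the bc loop (cubic_walk and cell_to_row_col are hypothetical names). It assumes p is prime, p % 3 == 2 and p >= s, so that cubing mod p permutes 0..p-1.

```python
def cubic_walk(s, p):
    """Cell numbers 0..s-1 in the order H(i) = i**3 mod p for i = 1..p.

    Values of H(i) that are >= s fall outside the matrix and are skipped.
    """
    for i in range(1, p + 1):
        t = pow(i, 3, p)
        if t < s:
            yield t

def cell_to_row_col(c, r):
    """Row and column of cell number c in a matrix with row length r."""
    return c // r, c % r

order = list(cubic_walk(25, 29))
print(order)  # [1, 8, 6, 9, 13, 24, 19, 4, 14, 17, 22, 18, 11, 7, 12, 3, 15, 10, 5, 16, 20, 23, 2, 21, 0]
print([cell_to_row_col(c, 5) for c in order[:3]])
```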
The text above shows the suggestion I made in Edit 2 of my answer. The text below shows my first answer.
Edit 0: (This is the suggestion to which Seamus's comment applied): A simple method to go through a vector in a "random appearing" way is to repeatedly add d (d>1) to an index. This will access all elements if d and s are coprime (where s=vector length). Note, my example below is in terms of a vector; you could do the same thing independently on the other axis of your matrix, with a different delta for it, except a problem mentioned below would occur. Note, "coprime" means that gcd(d,s)=1. If s is variable, you'd need gcd() code.
Example: Say s is 10. gcd(s,x) is 1 for x in {1,3,7,9} and is not 1 for x in {2,4,5,6,8,10}. Suppose we choose d=7, and start with i=0. i will take on values 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, which modulo 10 is 0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0.
Edit 1 & 3: Unfortunately this will have a problem in the two-axis case; for example, if you use d=7 for x axis, and e=3 for y-axis, while the first 21 hits will be distinct, it will then continue repeating the same 21 hits. To address this, treat the whole matrix as a vector, use d with gcd(d,s)=1, and convert cell numbers to subscripts as above.
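A sketch of the whole-matrix version of the stride method: treat the matrix as one vector of s cells, add d modulo s, and convert each cell number back to (row, column). stride_walk is a hypothetical name; the coprimality check is done with math.gcd.

```python
from math import gcd

def stride_walk(s, d, start=0):
    """Visit cells 0..s-1 by repeatedly adding d modulo s."""
    # Raised on the first iteration, since this is a generator.
    if gcd(d, s) != 1:
        raise ValueError("d and s must be coprime")
    i = start
    for _ in range(s):
        yield i
        i = (i + d) % s

print(list(stride_walk(10, 7)))  # [0, 7, 4, 1, 8, 5, 2, 9, 6, 3]

# A 5x5 matrix as one vector of s = 25 cells with row length 5.
cells = [divmod(c, 5) for c in stride_walk(25, 7)]  # (row, column) pairs
```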
If you just want to iterate through the matrix, what is wrong with row++; if (row == N) {row = 0; column++}?
If you iterate through the row and the column independently, and each cycles back to the beginning after N steps, then the (row, column) pair will iterate through only N of the N^2 cells of the matrix.
If you want to iterate through all of the cells of the matrix in pseudo-random order, you could look at questions here on random permutations.
This is a companion answer to address a question about my previous answer: How to find an appropriate prime p >= s (where s = the number of matrix elements) to use in the hash function H(i) = (i*i*i) mod p.
We need to find a prime of form 3n+2, where n is any odd integer such that 3*n+2 >= s. Note that n odd gives 3n+2 = 3(2k+1)+2 = 6k+5 where k need not be odd. In the example code below, p = 5+6*(s/6); initializes p to be a number of form 6k+5, and p += 6; maintains p in this form.
The code below shows that half-a-dozen lines of code are enough for the calculation. Timings are shown after the code, which is reasonably fast: 12 us at s=half a million, 200 us at s=half a billion, where us denotes microseconds.
// timing how long to find primes of form 2+3*n by division
// jiw 20 Sep 2011
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
double ttime(double base) {
  struct timeval tod;
  gettimeofday(&tod, NULL);
  return tod.tv_sec + tod.tv_usec/1e6 - base;
}
int main(int argc, char *argv[]) {
  int d, s, p, par=0;
  double t0=ttime(0);
  ++par; s=5000; if (argc > par) s = atoi(argv[par]);
  p = 5+6*(s/6);  /* first number of form 6k+5 that is >= s */
  while (1) {
    /* trial division by odd d; p is never even or a multiple of 3 */
    for (d=3; d*d<p; d+=2)
      if (p%d==0) break;
    if (d*d >= p) break;  /* no divisor found: p is prime */
    p += 6;
  }
  printf ("p = %d after %.6f seconds\n", p, ttime(t0));
  return 0;
}
Timing results on 2.5GHz Athlon 5200+:
qili ~/px > for i in 0 00 000 0000 00000 000000; do ./divide-timing 500$i; done
p = 5003 after 0.000008 seconds
p = 50021 after 0.000010 seconds
p = 500009 after 0.000012 seconds
p = 5000081 after 0.000031 seconds
p = 50000021 after 0.000072 seconds
p = 500000003 after 0.000200 seconds
qili ~/px > factor 5003 50021 500009 5000081 50000021 500000003
5003: 5003
50021: 50021
500009: 500009
5000081: 5000081
50000021: 50000021
500000003: 500000003
Update 1: Of course, timing is not deterministic (i.e., it can vary substantially depending on the value of s, other processes on the machine, etc.); for example:
qili ~/px > time for i in 000 004 010 058 070 094 100 118 184; do ./divide-timing 500000$i; done
p = 500000003 after 0.000201 seconds
p = 500000009 after 0.000201 seconds
p = 500000057 after 0.000235 seconds
p = 500000069 after 0.000394 seconds
p = 500000093 after 0.000200 seconds
p = 500000099 after 0.000201 seconds
p = 500000117 after 0.000201 seconds
p = 500000183 after 0.000211 seconds
p = 500000201 after 0.000223 seconds
real 0m0.011s
user 0m0.002s
sys 0m0.004s
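For experimenting without compiling, here is a Python port of the same trial-division search. It is a sketch: the function name is hypothetical, and it tests d * d <= p rather than d * d < p, which makes no difference here because a number of the form 6k+5 is never a perfect square.

```python
def find_prime_5_mod_6(s):
    """Smallest prime p >= s with p % 6 == 5 (i.e. p = 3n + 2 with n odd)."""
    p = 5 + 6 * (s // 6)  # first candidate of the form 6k + 5 that is >= s
    while True:
        d = 3
        while d * d <= p and p % d != 0:
            d += 2
        if d * d > p:  # no odd divisor up to sqrt(p): p is prime
            return p
        p += 6

for s in (5000, 50000, 500000):
    print(s, find_prime_5_mod_6(s))
```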
Consider using a double hash function to get a better distribution inside the matrix. But given that you cannot avoid collisions, I suggest using an array of sentinels to mark the positions you visit; this way you are sure to visit each cell only once.
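A minimal sketch of the sentinel idea, assuming a uniformly random proposal at each step (a double hash could replace rng.randrange). Already-marked cells are simply skipped, so each cell is visited exactly once; the cost is that proposals get rejected more and more often as the matrix fills up. sentinel_walk is a hypothetical name.

```python
import random

def sentinel_walk(n, rng):
    """Visit every cell of an n x n matrix once, in random order."""
    visited = [[False] * n for _ in range(n)]  # the sentinel array
    order = []
    while len(order) < n * n:
        r, c = rng.randrange(n), rng.randrange(n)
        if not visited[r][c]:  # skip cells we've already been to
            visited[r][c] = True
            order.append((r, c))
    return order

walk = sentinel_walk(5, random.Random(7))
```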

A programming challenge with Mathematica

I am interfacing an external program with Mathematica and creating an input file for it. It's about converting geometry data from Mathematica-generated graphics into a predefined format. Here is an example geometry.
Figure 1
The geometry can be described in many ways in Mathematica. One laborious way is the following.
This generates the required 3D geometry in GraphicsComplex format of MMA.
This geometry is described as the following input file for my external program.
# x y z [m]
1. -1. 0.
0. -1. 0.5
0. -1. -0.5
1. -0.3333 0.
0. -0.3333 0.5
0. -0.3333 -0.5
1. 0.3333 0.
0. 0.3333 0.5
0. 0.3333 -0.5
1. 1. 0.
0. 1. 0.5
0. 1. -0.5
10. -1. 0.
10. -0.3333 0.
10. 0.3333 0.
10. 1. -0.
# type node_id1 node_id2 node_id3 node_id4 elem_id1 elem_id2 elem_id3 elem_id4
1 1 4 5 2 4 2 10 0
1 2 5 6 3 1 5 3 10
1 3 6 4 1 2 6 10 0
1 4 7 8 5 7 5 1 0
1 5 8 9 6 4 8 6 2
1 6 9 7 4 5 9 3 0
1 7 10 11 8 8 4 11 0
1 8 11 12 9 7 9 5 11
1 9 12 10 7 8 6 11 0
2 1 2 3 1 2 3
2 10 12 11 9 8 7
10 4 1 13 14 1 3
10 7 4 14 15 4 6
10 10 7 15 16 7 9
# end of input file
Now the description I have from the documentation of this external program is pretty short. I am quoting it here.
First keyword NODES states total number of
nodes. After this line there should be no comment or empty lines. Next lines consist of
three values x, y and z node coordinates and number of lines must be the same as number
of nodes.
Next keyword is PANEL and states how many panels we have. After that we have lines
defining each panel. First integer defines panel type
ID 1 – quadrilateral panel - is defined by four nodes and four neighboring panels.
Neighboring panels are panels that share same sides (pair of nodes) and is needed for
velocity and pressure calculation (methods 1 and 2). Missing neighbors (for example for
panels near the trailing edge) are filled with value 0 (see Figure 1).
ID 2 – triangular panel – is defined by three nodes and three neighboring panels.
ID 10 – wake panel – is quadrilateral panel defined with four nodes and with two
(neighboring) panels which are located on the trailing edge (panels to which wake panel is
applying Kutta condition).
Panel types 1 and 2 must be defined before type 10 in input file.
Important to notice is the surface normal; order of nodes defining panels should be
counter clockwise. By the right-hand rule if fingers are bended to follow numbering,
thumb will show normal vector that should point “outwards” geometry.
We are given a 3D CAD model in a file called One.obj, and it imports fine into MMA.
cd = Import["One.obj"]
The output is a MMA Graphics3D object
Now I can easily access the geometry data as MMA reads it internally.
{ver1, pol1} = cd[[1]][[2]] /. GraphicsComplex -> List;
MyPol = pol1 // First // First;
Graphics3D[GraphicsComplex[ver1,MyPol],Axes-> True]
How can we use the vertices and polygon information contained in ver1 and pol1 and write them to a text file as described in the input file example above? In this case we will only have ID 2 type (triangular) panels.
Using the Mathematica triangulation, how can we find the surface area of this 3D object? Is there any built-in function that can compute surface area in MMA?
There's no need to create the wake panel or ID 10 type elements right now. An input file with only triangular elements will be fine.
Sorry for such a long post, but it's a puzzle that I have been trying to solve for a long time. I hope some of you experts have the right insight to crack it.
Q1 and Q2 are easy enough that you could drop the "challenge" labels in your question. Q3 could use some clarification.
edges = cd[[1, 2, 1]];
polygons = cd[[1, 2, 2, 1, 1, 1]];
Update Q1
The main problem is to find the neighbor of each polygon. The following does this:
(* Split every triangle in 3 edges, with nodes in each edge sorted *)
triangleEdges = (Sort /@ Subsets[#, {2}]) & /@ polygons;
(* Generate a list of edges *)
singleEdges = Union[Flatten[triangleEdges, 1]];
(* Define a function which, given an edge (node number list), returns the bordering *)
(* triangle numbers. It's done by working through each of the triangles' edges *)
edgesNeighbors[_] = {};
MapIndexed[(
    edgesNeighbors[#1[[1]]] = Flatten[{edgesNeighbors[#1[[1]]], #2[[1]]}];
    edgesNeighbors[#1[[2]]] = Flatten[{edgesNeighbors[#1[[2]]], #2[[1]]}];
    edgesNeighbors[#1[[3]]] = Flatten[{edgesNeighbors[#1[[3]]], #2[[1]]}];
    ) &, triangleEdges
];
(* Build a triangle relation table. Each '1' indicates a triangle relation *)
relations = ConstantArray[0, {triangleEdges // Length, triangleEdges // Length}];
Scan[
  (n = edgesNeighbors[##];
    If[Length[n] == 2,
     {n1, n2} = n;
     relations[[n1, n2]] = 1; relations[[n2, n1]] = 1];
    ) &, singleEdges
];
(* Build a neighborhood list *)
triangleNeigbours =
Table[Flatten[Position[relations[[i]], 1]], {i,triangleEdges // Length}];
(* Test: Which triangles border on triangle number 1? *)
triangleNeigbours[[1]]
(* ==> {32, 61, 83} *)
(* Check this *)
polygons[[{1, 32, 61, 83}]]
(* ==> {{1, 2, 3}, {3, 2, 52}, {1, 3, 50}, {19, 2, 1}} *)
(* Indeed, they all share an edge with #1 *)
You can use the low level output functions described here to output these. I'll leave the details to you (that's my challenge to you).
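For comparison outside Mathematica, the same edge-to-triangle bookkeeping can be sketched in Python, producing lines in the ID 2 format from the question (type, three node ids, three neighbour ids, 0 for a missing neighbour). This is illustrative only: the function names are hypothetical, it assumes 1-based node ids and that each edge is shared by at most two triangles, and the neighbour order (across edges n1-n2, n2-n3, n3-n1) is inferred from the question's example file rather than from the program's documentation.

```python
from collections import defaultdict

def triangle_neighbours(triangles):
    """For each triangle (n1, n2, n3), the 1-based panel ids across the
    edges n1-n2, n2-n3 and n3-n1, using 0 where there is no neighbour."""
    edge_owners = defaultdict(list)  # undirected edge -> triangles using it
    for t, tri in enumerate(triangles, start=1):
        for k in range(3):
            edge_owners[frozenset((tri[k], tri[(k + 1) % 3]))].append(t)
    result = []
    for t, tri in enumerate(triangles, start=1):
        nbrs = []
        for k in range(3):
            owners = edge_owners[frozenset((tri[k], tri[(k + 1) % 3]))]
            others = [o for o in owners if o != t]
            nbrs.append(others[0] if others else 0)
        result.append(nbrs)
    return result

def panel_lines(triangles):
    """Format each triangle as a type-2 panel line."""
    return [
        " ".join(str(v) for v in (2, *tri, *nbrs))
        for tri, nbrs in zip(triangles, triangle_neighbours(triangles))
    ]

lines = panel_lines([(1, 2, 3), (3, 2, 4)])
for line in lines:
    print(line)
```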
The area of the wing is the summed area of the individual polygons. The individual areas can be calculated as follows:
polygonArea[pts_List] :=
 Module[{dtpts = Append[pts, pts[[1]]]},
  If[Length[pts] < 3,
   0,
   1/2 Sum[Det[{dtpts[[i]], dtpts[[i + 1]]}], {i, 1, Length[dtpts] - 1}]
   ]
  ]
based on this Mathworld page.
The area is signed BTW, so you may want to use Abs.
The above area function is only usable for general polygons in 2D. For the area of a triangle in 3D the following can be used:
polygonArea[pts_List?(Length[#] == 3 &)] :=
Norm[Cross[pts[[2]] - pts[[1]], pts[[3]] - pts[[1]]]]/2
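Summing that triangle formula over all faces gives the surface area. A NumPy sketch of the same computation, assuming the vertices as an (N, 3) array and the triangles as 1-based node-id triples as in pol1 (surface_area is a hypothetical name). No Abs is needed here, since the norm of the cross product is never negative.

```python
import numpy as np

def surface_area(vertices, triangles):
    """Sum of triangle areas; triangles hold 1-based indices into vertices."""
    v = np.asarray(vertices, dtype=float)
    t = np.asarray(triangles, dtype=int) - 1  # to 0-based indices
    a, b, c = v[t[:, 0]], v[t[:, 1]], v[t[:, 2]]
    # |(b - a) x (c - a)| / 2 for every triangle at once
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1).sum()

square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(surface_area(square, [(1, 2, 3), (1, 3, 4)]))  # 1.0
```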