quick pandas groupby calculations with cumprod

This question is linked to Speedup of pandas groupby. It is about speeding up a groupby cumprod calculation. The DataFrame is 2D and has a MultiIndex consisting of 3 integers.
The HDF5 file for the dataframe can be found here: http://filebin.ca/2Csy0E2QuF2w/phi.h5
The actual calculation that I'm performing is similar to this:
>>> phi = pd.read_hdf('phi.h5', 'phi')
>>> %timeit phi.groupby(level='atomic_number').cumprod()
100 loops, best of 3: 5.45 ms per loop
The other possible speedup: I do this calculation about 100 times using the same index structure but with different numbers, so I wonder whether the index information can somehow be cached and reused.
Any help will be appreciated.

Numba appears to work pretty well here. In fact, these results seem almost too good to be true, with the numba function below being about 4,000x faster than the original method and 5x faster than a plain cumprod without a groupby. Hopefully these are correct; let me know if there is an error.
import numpy as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame({'x': np.repeat(range(200), 4), 'y': np.random.randn(800)})
df = df.sort_values('x')  # df.sort('x') in older pandas
df['cp_groupby'] = df.groupby('x').cumprod()
from numba import jit

@jit
def group_cumprod(x, y):
    z = np.ones(len(x))
    z[0] = y[0]
    for i in range(1, len(x)):
        if x[i] == x[i-1]:
            z[i] = y[i] * z[i-1]   # same group: extend the running product
        else:
            z[i] = y[i]            # new group: restart the product
    return z
df['cp_numba'] = group_cumprod(df.x.values,df.y.values)
df['dif'] = df.cp_groupby - df.cp_numba
Test that both ways give the same answer:
all(df.cp_groupby==df.cp_numba)
Out[1447]: True
Timings:
%timeit df.groupby('x').cumprod()
10 loops, best of 3: 102 ms per loop
%timeit df['y'].cumprod()
10000 loops, best of 3: 133 µs per loop
%timeit group_cumprod(df.x.values,df.y.values)
10000 loops, best of 3: 24.4 µs per loop
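Since the question mentions repeating the calculation about 100 times with the same index structure, note that the JIT compilation cost is paid only on the first call; every later call with new values reuses the compiled code. A minimal sketch of that reuse (my addition, assuming the grouping column stays fixed):
x = df.x.values
for _ in range(100):
    y = np.random.randn(len(x))   # new numbers, same grouping
    cp = group_cumprod(x, y)      # compiled on the first call, fast thereafter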

A pure numpy solution, assuming the data is sorted by the index, though with no handling of NaN. It is wrapped here as the np_cumprod function used in the timings below:
def np_cumprod(phi):
    # iterates the first index level, using each level value as that
    # group's row count (as appears to be the case for this dataset)
    res = np.empty_like(phi.values)
    l = 0
    for i in phi.index.levels[0]:
        phi.values[l:l+i, :].cumprod(axis=0, out=res[l:l+i])
        l += i
    return res
This is about 40 times faster on the multiindex data from the question. One caveat: it relies on how pandas stores the data in its backing array, so it may stop working when pandas internals change.
>>> phi = pd.read_hdf('phi.h5', 'phi')
>>> %timeit phi.groupby(level='atomic_number').cumprod()
100 loops, best of 3: 4.33 ms per loop
>>> %timeit np_cumprod(phi)
10000 loops, best of 3: 111 µs per loop
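To address the caching idea from the question, the group sizes can be computed once from the index and then reused across the ~100 calculations with different numbers. A sketch of mine (not from the original answer), with the same sortedness assumption:
sizes = phi.groupby(level='atomic_number').size().to_numpy()  # .values on older pandas

def np_cumprod_cached(values, sizes):
    # values: the 2D data for one calculation; sizes: precomputed group lengths
    res = np.empty_like(values)
    start = 0
    for n in sizes:
        values[start:start+n].cumprod(axis=0, out=res[start:start+n])
        start += n
    return res

res = np_cumprod_cached(phi.values, sizes)  # repeat with new values, same sizes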

If you want a fast but not very pretty workaround, you could do something like the following. Here's some sample data and your default approach.
df = pd.DataFrame({'x': np.repeat(range(200), 4), 'y': np.random.randn(800)})
df = df.sort_values('x')  # df.sort('x') in older pandas
df['cp_group'] = df.groupby('x').cumprod()
And here's the workaround. It looks rather long (it is), but each individual step is simple and fast. (The timings are at the bottom.) The key is simply to avoid groupby altogether by replacing it with shift and friends -- which also means you need to make sure your data is sorted by the groupby column.
df['cp_nogroup'] = df.y.cumprod()                                 # running product over the whole column
df['last'] = np.where(df.x == df.x.shift(-1), 0, df.y.cumprod())  # keep the product only on each group's last row
df['last'] = np.where(df['last'] == 0., np.nan, df['last'])       # blank out every other row
df['last'] = df['last'].shift().ffill().fillna(1)                 # carry each group's closing product into the next group
df['cp_fast'] = df['cp_nogroup'] / df['last']                     # divide it back out to restart the product per group
df['dif'] = df.cp_group - df.cp_fast
Here's what it looks like. 'cp_group' is your default and 'cp_fast' is the above workaround. If you look at the 'dif' column you'll see that several of these are off by very small amounts. This is just a precision issue and not anything to worry about.
x y cp_group cp_nogroup last cp_fast dif
0 0 1.364826 1.364826 1.364826 1.000000 1.364826 0.000000e+00
1 0 0.410126 0.559751 0.559751 1.000000 0.559751 0.000000e+00
2 0 0.894037 0.500438 0.500438 1.000000 0.500438 0.000000e+00
3 0 0.092296 0.046189 0.046189 1.000000 0.046189 0.000000e+00
4 1 1.262172 1.262172 0.058298 0.046189 1.262172 0.000000e+00
5 1 0.832328 1.050541 0.048523 0.046189 1.050541 2.220446e-16
6 1 -0.337245 -0.354289 -0.016364 0.046189 -0.354289 -5.551115e-17
7 1 0.758163 -0.268609 -0.012407 0.046189 -0.268609 -5.551115e-17
8 2 -1.025820 -1.025820 0.012727 -0.012407 -1.025820 0.000000e+00
9 2 1.175903 -1.206265 0.014966 -0.012407 -1.206265 0.000000e+00
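Because of those tiny float differences, an exact equality check can fail here; a tolerance-based comparison (my addition, not part of the original answer) is the safer test:
np.allclose(df.cp_group, df.cp_fast)   # True up to floating-point tolerance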
Timings
Default method:
In [86]: %timeit df.groupby('x').cumprod()
10 loops, best of 3: 100 ms per loop
Standard cumprod but without the groupby. This should be a good approximation of the maximum possible speed you could achieve.
In [87]: %timeit df.cumprod()
1000 loops, best of 3: 536 µs per loop
And here's the workaround:
In [88]: %%timeit
...: df['cp_nogroup'] = df.y.cumprod()
...: df['last'] = np.where( df.x == df.x.shift(-1), 0, df.y.cumprod() )
...: df['last'] = np.where( df['last'] == 0., np.nan, df['last'] )
...: df['last'] = df['last'].shift().ffill().fillna(1)
...: df['cp_fast'] = df['cp_nogroup'] / df['last']
...: df['dif'] = df.cp_group - df.cp_fast
100 loops, best of 3: 2.3 ms per loop
So the workaround is about 40x faster for this sample dataframe but the speedup will depend on the dataframe (in particular on the number of groups).
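If you need this in more than one place, the steps can be wrapped up; a sketch of mine (hypothetical helper, same assumption that the frame is sorted by the group column):
def fast_group_cumprod(df, group_col='x', val_col='y'):
    # cumprod over the whole column, then divide out the product
    # carried in from previous groups
    cp = df[val_col].cumprod()
    last = np.where(df[group_col] == df[group_col].shift(-1), np.nan, cp)
    last = pd.Series(last, index=df.index).shift().ffill().fillna(1)
    return cp / last

df['cp_fast'] = fast_group_cumprod(df)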

Related

fastest way to use numpy.interp on a 2-D array

I have the following problem. I am trying to find the fastest way to use the interpolation method of numpy on a 2-D array of x-coordinates.
import numpy as np
xp = [0.0, 0.25, 0.5, 0.75, 1.0]
np.random.seed(100)
x = np.random.rand(10)
fp = np.random.rand(10, 5)
So basically, xp would be the x-coordinates of the data points, x would be an array containing the x-coordinates of the values I want to interpolate, and fp would be a 2-D array containing y-coordinates of the datapoints.
xp
[0.0, 0.25, 0.5, 0.75, 1.0]
x
array([ 0.54340494, 0.27836939, 0.42451759, 0.84477613, 0.00471886,
0.12156912, 0.67074908, 0.82585276, 0.13670659, 0.57509333])
fp
array([[ 0.89132195, 0.20920212, 0.18532822, 0.10837689, 0.21969749],
[ 0.97862378, 0.81168315, 0.17194101, 0.81622475, 0.27407375],
[ 0.43170418, 0.94002982, 0.81764938, 0.33611195, 0.17541045],
[ 0.37283205, 0.00568851, 0.25242635, 0.79566251, 0.01525497],
[ 0.59884338, 0.60380454, 0.10514769, 0.38194344, 0.03647606],
[ 0.89041156, 0.98092086, 0.05994199, 0.89054594, 0.5769015 ],
[ 0.74247969, 0.63018394, 0.58184219, 0.02043913, 0.21002658],
[ 0.54468488, 0.76911517, 0.25069523, 0.28589569, 0.85239509],
[ 0.97500649, 0.88485329, 0.35950784, 0.59885895, 0.35479561],
[ 0.34019022, 0.17808099, 0.23769421, 0.04486228, 0.50543143]])
The desired outcome should look like this:
array([ 0.17196795, 0.73908678, 0.85459966, 0.49980648, 0.59893702,
0.9344241 , 0.19840596, 0.45777785, 0.92570835, 0.17977264])
Again, I'm looking for the fastest way to do this, because it is a simplified version of my problem, which has a length of about 1 million rather than 10.
Thanks
So basically you want output equivalent to
np.array([np.interp(x[i], xp, fp[i]) for i in range(x.size)])
But that for loop is going to make that pretty slow for large x.size.
This should work:
def multiInterp(x, xp, fp):
    xp = np.asarray(xp)
    # for each x, find the xp interval it falls in, then blend linearly
    i, j = np.nonzero(np.diff(xp[None, :] < x[:, None]))
    d = (x - xp[j]) / np.diff(xp)[j]
    return fp[i, j] + np.diff(fp)[i, j] * d
EDIT: This works even better and can handle bigger arrays:
def multiInterp2(x, xp, fp):
    xp = np.asarray(xp)
    i = np.arange(x.size)
    j = np.searchsorted(xp, x) - 1   # left edge of each x's interval in xp
    d = (x - xp[j]) / (xp[j + 1] - xp[j])
    return (1 - d) * fp[i, j] + fp[i, j + 1] * d
Testing:
multiInterp2(x, xp, fp)
Out:
array([ 0.17196795, 0.73908678, 0.85459966, 0.49980648, 0.59893702,
0.9344241 , 0.19840596, 0.45777785, 0.92570835, 0.17977264])
Timing tests with original data:
%timeit multiInterp2(x, xp, fp)
The slowest run took 6.87 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 25.5 µs per loop
%timeit np.concatenate([compiled_interp(x[[i]], xp, fp[i]) for i in range(fp.shape[0])])
The slowest run took 4.03 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.3 µs per loop
It seems to be faster even for a small size of x.
Let's try something much, much bigger:
n = 10000
m = 10000
xp = np.linspace(0, 1, n)
x = np.random.rand(m)
fp = np.random.rand(m, n)
%timeit b() # kazemakase's function b from the other answer
10 loops, best of 3: 38.4 ms per loop
%timeit multiInterp2(x, xp, fp)
100 loops, best of 3: 2.4 ms per loop
The advantages scale a lot better even than the compiled version of np.interp.
np.interp is basically a wrapper around the compiled numpy.core.multiarray.interp. We can shave off a bit of performance by using it directly:
from numpy.core.multiarray import interp as compiled_interp
def a(x=x, xp=xp, fp=fp):
    return np.array([np.interp(x[i], xp, fp[i]) for i in range(fp.shape[0])])

def b(x=x, xp=xp, fp=fp):
    return np.concatenate([compiled_interp(x[[i]], xp, fp[i]) for i in range(fp.shape[0])])

def multiInterp(x=x, xp=xp, fp=fp):
    xp = np.asarray(xp)
    i, j = np.nonzero(np.diff(xp[None, :] < x[:, None]))
    d = (x - xp[j]) / np.diff(xp)[j]
    return fp[i, j] + np.diff(fp)[i, j] * d
Timing tests show that for the example arrays this is on par with Daniel Forsman's nice solution:
%timeit a()
10000 loops, best of 3: 44.7 µs per loop
%timeit b()
10000 loops, best of 3: 32 µs per loop
%timeit multiInterp()
10000 loops, best of 3: 33.3 µs per loop
update
For somewhat larger arrays multiInterp owns the floor:
n = 100
m = 1000
xp = np.linspace(0, 1, n)
x = np.random.rand(m)
fp = np.random.rand(m, n)
%timeit a()
100 loops, best of 3: 4.14 ms per loop
%timeit b()
100 loops, best of 3: 2.97 ms per loop
%timeit multiInterp()
1000 loops, best of 3: 1.42 ms per loop
But for even larger ones it falls behind:
n = 1000
m = 10000
%timeit a()
10 loops, best of 3: 43.3 ms per loop
%timeit b()
10 loops, best of 3: 32.9 ms per loop
%timeit multiInterp()
10 loops, best of 3: 132 ms per loop
Finally, for very big arrays (I'm on 32 bit) temporary arrays become a problem:
n = 10000
m = 10000
%timeit a()
10 loops, best of 3: 46.2 ms per loop
%timeit b()
10 loops, best of 3: 32.1 ms per loop
%timeit multiInterp()
# MemoryError
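One way around the MemoryError (a sketch of mine, not from the answer above) is to process the rows in blocks, so the large boolean temporary that multiInterp builds stays bounded:
def multiInterp_chunked(x, xp, fp, rows_per_chunk=1000):
    # hypothetical helper: same result as multiInterp, smaller temporaries
    out = np.empty(x.size)
    for s in range(0, x.size, rows_per_chunk):
        sl = slice(s, min(s + rows_per_chunk, x.size))
        out[sl] = multiInterp(x[sl], xp, fp[sl])
    return out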

Nicer way to do nested dot products in numpy?

I find this happening to me a lot: I want to compute a matrix product of the sort (X^T X)^{-1} X X^T, or something along those lines. I end up doing something like
X = np.array([[1,2],[3,4]])
a = np.dot(np.transpose(X), X)
b = np.dot(np.linalg.inv(a), X)
answer = np.dot(b, np.transpose(X))
Is there a better way to do this without resorting to the np.matrix type? Is there a way to do transpose without typing np.transpose?
Let's explore the options a bit
inv = np.linalg.inv

def array1(X):
    a = np.dot(X.T, X)
    b = np.dot(inv(a), X)
    return np.dot(b, X.T)
Basically your code, but using the dot method and the .T shorthand for transpose.
Testing with your X:
In [12]: array1(X)
Out[12]:
array([[-13.5, -32.5],
[ 10. , 24. ]])
What's the matrix equivalent?
In [17]: M=np.matrix(X)
In [18]: (M.T*M).I*M*M.T
Out[18]:
matrix([[-13.5, -32.5],
[ 10. , 24. ]])
The matrix version is more compact, but is it clearer? It's not faster.
In [22]: timeit array1(X)
10000 loops, best of 3: 48.7 µs per loop
In [23]: timeit (M.T*M).I*M*M.T
10000 loops, best of 3: 95.4 µs per loop
First stab at an einsum equivalent:
In [32]: np.einsum('ij,jk,lk',inv(np.einsum('ji,jk',X,X)),X,X)
Out[32]:
array([[-13.5, -32.5],
[ 10. , 24. ]])
In [33]: timeit np.einsum('ij,jk,lk',inv(np.einsum('ji,jk',X,X)),X,X)
10000 loops, best of 3: 55.1 µs per loop
Basically the same timing as the dot version.
The matrix version shows me that I can simplify the array version to:
inv(X.T.dot(X)).dot(X.dot(X.T))
(same timing)
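As an aside (my addition, not part of the original answer): on Python 3.5+ with a recent NumPy, the @ matrix-multiplication operator reads even more cleanly, and np.linalg.multi_dot can choose an efficient evaluation order for the chain:
inv(X.T @ X) @ X @ X.T                        # same result as array1(X)
np.linalg.multi_dot([inv(X.T @ X), X, X.T])   # numpy picks the multiplication order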

split data frame based on integer index

In pandas, how do I split a Series/DataFrame into two Series/DataFrames, with the odd rows in one and the even rows in the other? Right now I am using
rng = range(0, n, 2)
odd_rows = df.iloc[rng]
This is pretty slow.
Use slice:
In [11]: s = pd.Series([1,2,3,4])
In [12]: s.iloc[::2] # even
Out[12]:
0 1
2 3
dtype: int64
In [13]: s.iloc[1::2] # odd
Out[13]:
1 2
3 4
dtype: int64
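The same slicing works directly on a DataFrame (added for completeness):
even_rows = df.iloc[::2]
odd_rows = df.iloc[1::2]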
Here are some comparisons:
In [100]: df = DataFrame(randn(100000,10))
A simple method (though I suspect range makes this slow) that will work regardless of the index (e.g. it does not have to be a numeric index):
In [96]: %timeit df.iloc[range(0,len(df),2)]
10 loops, best of 3: 21.2 ms per loop
The following require an Int64Index that is range based (which is easy to get, just reset_index()).
In [107]: %timeit df.iloc[(df.index % 2).astype(bool)]
100 loops, best of 3: 5.67 ms per loop
In [108]: %timeit df.loc[(df.index % 2).astype(bool)]
100 loops, best of 3: 5.48 ms per loop
make sure to give it index positions
In [98]: %timeit df.take(df.index % 2)
100 loops, best of 3: 3.06 ms per loop
Same as above, but with no conversions on negative indices:
In [99]: %timeit df.take(df.index % 2,convert=False)
100 loops, best of 3: 2.44 ms per loop
The winner is @AndyHayden's solution; note this only works on a single dtype:
In [118]: %timeit DataFrame(df.values[::2],index=df.index[::2])
10000 loops, best of 3: 63.5 us per loop

Vectorized way of calculating row-wise dot product two matrices with Scipy

I want to calculate the row-wise dot product of two matrices of the same dimension as fast as possible. This is the way I am doing it:
import numpy as np
a = np.array([[1,2,3], [3,4,5]])
b = np.array([[1,2,3], [1,2,3]])
result = np.array([])
for row1, row2 in zip(a, b):
    result = np.append(result, np.dot(row1, row2))
print(result)
and of course the output is:
[ 14.  26.]
Straightforward way to do that is:
import numpy as np
a=np.array([[1,2,3],[3,4,5]])
b=np.array([[1,2,3],[1,2,3]])
np.sum(a*b, axis=1)
which avoids the python loop and is faster in cases like:
def npsumdot(x, y):
    return np.sum(x*y, axis=1)

def loopdot(x, y):
    result = np.empty((x.shape[0]))
    for i in range(x.shape[0]):
        result[i] = np.dot(x[i], y[i])
    return result
timeit npsumdot(np.random.rand(500000,50),np.random.rand(500000,50))
# 1 loops, best of 3: 861 ms per loop
timeit loopdot(np.random.rand(500000,50),np.random.rand(500000,50))
# 1 loops, best of 3: 1.58 s per loop
Check out numpy.einsum for another method:
In [52]: a
Out[52]:
array([[1, 2, 3],
[3, 4, 5]])
In [53]: b
Out[53]:
array([[1, 2, 3],
[1, 2, 3]])
In [54]: einsum('ij,ij->i', a, b)
Out[54]: array([14, 26])
Looks like einsum is a bit faster than inner1d:
In [94]: %timeit inner1d(a,b)
1000000 loops, best of 3: 1.8 us per loop
In [95]: %timeit einsum('ij,ij->i', a, b)
1000000 loops, best of 3: 1.6 us per loop
In [96]: a = random.randn(10, 100)
In [97]: b = random.randn(10, 100)
In [98]: %timeit inner1d(a,b)
100000 loops, best of 3: 2.89 us per loop
In [99]: %timeit einsum('ij,ij->i', a, b)
100000 loops, best of 3: 2.03 us per loop
Note: NumPy is constantly evolving and improving; the relative performance of the functions shown above has probably changed over the years. If performance is important to you, run your own tests with the version of NumPy that you will be using.
Played around with this and found inner1d the fastest. That function however is internal, so a more robust approach is to use
numpy.einsum("ij,ij->i", a, b)
Even better is to align your memory such that the summation happens in the first dimension, e.g.,
a = numpy.random.rand(3, n)
b = numpy.random.rand(3, n)
numpy.einsum("ij,ij->j", a, b)
For 10 ** 3 <= n <= 10 ** 6, this is the fastest method, and up to twice as fast as its untransposed equivalent. The maximum occurs when the level-2 cache is maxed out, at about 2 * 10 ** 4.
Note also that the transposed summation is much faster than its untransposed equivalent.
The plot was created with perfplot (a small project of mine)
import numpy
from numpy.core.umath_tests import inner1d
import perfplot
def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    aT = numpy.ascontiguousarray(a.T)
    bT = numpy.ascontiguousarray(b.T)
    return (a, b), (aT, bT)
b = perfplot.bench(
    setup=setup,
    n_range=[2 ** k for k in range(1, 25)],
    kernels=[
        lambda data: numpy.sum(data[0][0] * data[0][1], axis=1),
        lambda data: numpy.einsum("ij, ij->i", data[0][0], data[0][1]),
        lambda data: numpy.sum(data[1][0] * data[1][1], axis=0),
        lambda data: numpy.einsum("ij, ij->j", data[1][0], data[1][1]),
        lambda data: inner1d(data[0][0], data[0][1]),
    ],
    labels=["sum", "einsum", "sum.T", "einsum.T", "inner1d"],
    xlabel="len(a), len(b)",
)
b.save("out1.png")
b.save("out2.png", relative_to=3)
You'll do better avoiding the append, but I can't think of a way to avoid the python loop. A custom Ufunc perhaps? I don't think numpy.vectorize will help you here.
import numpy as np
a = np.array([[1,2,3],[3,4,5]])
b = np.array([[1,2,3],[1,2,3]])
result = np.empty((2,))
for i in range(2):
    result[i] = np.dot(a[i], b[i])
print(result)
EDIT
Based on this answer, it looks like inner1d might work if the vectors in your real-world problem are 1D.
from numpy.core.umath_tests import inner1d
inner1d(a,b) # array([14, 26])
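Note (my addition): numpy.core.umath_tests is a private module that NumPy later deprecated, so the import may fail on recent versions; in that case einsum is a drop-in replacement:
def inner1d(a, b):
    # row-wise dot product without the private module
    return np.einsum('ij,ij->i', a, b)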
I came across this answer and re-verified the results with Numpy 1.14.3 running in Python 3.5. For the most part the answers above hold true on my system, although I found that for very large matrices (see example below), all but one of the methods are so close to one another that the performance difference is meaningless.
For smaller matrices, I found that einsum was the fastest by a considerable margin, up to a factor of two in some cases.
My large matrix example:
import numpy as np
from numpy.core.umath_tests import inner1d
a = np.random.randn(100, 1000000) # 800 MB each
b = np.random.randn(100, 1000000) # pretty big.
def loop_dot(a, b):
    result = np.empty((a.shape[0],))   # one dot product per row pair
    for i, (row1, row2) in enumerate(zip(a, b)):
        result[i] = np.dot(row1, row2)
    return result
%timeit inner1d(a, b)
# 128 ms ± 523 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('ij,ij->i', a, b)
# 121 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.sum(a*b, axis=1)
# 411 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit loop_dot(a, b) # note the function call took negligible time
# 123 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
So einsum is still the fastest on very large matrices, but by a tiny amount. It appears to be a statistically significant (tiny) amount though!

Select a multiple-key cross section from a DataFrame

I have a DataFrame "df" with (time,ticker) Multiindex and bid/ask/etc data columns:
tod last bid ask volume
time ticker
2013-02-01 SPY 1600 149.70 150.14 150.17 1300
SLV 1600 30.44 30.38 30.43 3892
GLD 1600 161.20 161.19 161.21 3860
I would like to select a second-level (level=1) cross section using multiple keys. Right now, I can do it using one key, i.e.
df.xs('SPY', level=1)
which gives me a timeseries of SPY. What is the best way to select a multi-key cross section, i.e. a combined cross-section of both SPY and GLD, something like:
df.xs(['SPY', 'GLD'], level=1)
?
There are better ways of doing this with more recent versions of Pandas (see Multi-indexing using slicers in the changelog for version 0.14):
df.loc[(slice(None), ['SPY', 'GLD']), :]
This can be made more readable with the use of pd.IndexSlice:
df.loc[pd.IndexSlice[:, ['SPY', 'GLD']], :]
With the convention idx = pd.IndexSlice, this becomes
df.loc[idx[:, ['SPY', 'GLD']], :]
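A minimal runnable sketch of the slicer approach (my addition); note that label-based slicing on a MultiIndex needs a lexsorted index, otherwise pandas may raise an UnsortedIndexError:
idx = pd.IndexSlice
df = df.sort_index()                 # slicers require a sorted MultiIndex
df.loc[idx[:, ['SPY', 'GLD']], :]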
I couldn't find a more direct way other than using select:
>>> df
last tod
A SPY 1 1600
SLV 2 1600
GLD 3 1600
>>> df.select(lambda x: x[1] in ['SPY','GLD'])
last tod
A SPY 1 1600
GLD 3 1600
For what it is worth, I did the following:
foo = pd.DataFrame(np.random.rand(12, 3),
                   index=pd.MultiIndex.from_product([['A','B','C','D'], ['Green','Red','Blue']],
                                                    names=['Letter','Color']),
                   columns=['X','Y','Z']).sort_index()
foo.reset_index()\
.loc[foo.reset_index().Color.isin({'Green','Red'})]\
.set_index(foo.index.names)
This approach is similar to select, but avoids iterating over all rows with a lambda.
However, I compared this to the Panel approach, and it appears the Panel solution is faster (2.91 ms for index/loc vs 1.18 ms for to_panel/to_frame):
foo.to_panel()[:,:,['Green','Red']].to_frame()
Times:
In [56]:
%%timeit
foo.reset_index().loc[foo.reset_index().Color.isin({'Green','Red'})].set_index(foo.index.names)
100 loops, best of 3: 2.91 ms per loop
In [57]:
%%timeit
foo2 = foo.reset_index()
foo2.loc[foo2.Color.eq('Green') | foo2.Color.eq('Red')].set_index(foo.index.names)
100 loops, best of 3: 2.85 ms per loop
In [58]:
%%timeit
foo2 = foo.reset_index()
foo2.loc[foo2.Color.ne('Blue')].set_index(foo.index.names)
100 loops, best of 3: 2.37 ms per loop
In [54]:
%%timeit
foo.to_panel()[:,:,['Green','Red']].to_frame()
1000 loops, best of 3: 1.18 ms per loop
UPDATE
After revisiting this topic (again), I observed the following:
In [100]:
%%timeit
foo2 = pd.DataFrame({k: foo.loc[k] for k in foo.index if k[1] in ['Green','Red']}).transpose()
foo2.index.names = foo.index.names
foo2.columns.names = foo.columns.names
100 loops, best of 3: 1.97 ms per loop
In [101]:
%%timeit
foo2 = pd.DataFrame.from_dict({k: foo.loc[k] for k in foo.index if k[1] in ['Green','Red']}, orient='index')
foo2.index.names = foo.index.names
foo2.columns.names = foo.columns.names
100 loops, best of 3: 1.82 ms per loop
If you don't care about preserving the original order and naming of the levels, you can use:
%%timeit
pd.concat({key: foo.xs(key, axis=0, level=1) for key in ['Green','Red']}, axis=0)
1000 loops, best of 3: 1.31 ms per loop
And if you are just selecting on the first level:
%%timeit
pd.concat({key: foo.loc[key] for key in ['A','B']}, axis=0, names=foo.index.names)
1000 loops, best of 3: 1.12 ms per loop
versus:
%%timeit
foo.to_panel()[:,['A','B'],:].to_frame()
1000 loops, best of 3: 1.16 ms per loop
Another Update
If you sort the index of the example foo, many of the times above improve (times have been updated to reflect a pre-sorted index). However, when the index is sorted, you can use the solution described by user674155:
%%timeit
foo.loc[(slice(None), ['Blue','Red']),:]
1000 loops, best of 3: 582 µs per loop
This is the most efficient and intuitive in my opinion (the user doesn't need to understand panels and how they are created from frames).
Note: even if the index has not yet been sorted, sorting the index of foo on the fly is comparable in performance to the to_panel option.
Convert to a panel, then indexing is direct
In [20]: df = pd.DataFrame(dict(time=pd.Timestamp('20130102'),
                                A=np.random.rand(3),
                                ticker=['SPY','SLV','GLD'])).set_index(['time','ticker'])
In [21]: df
Out[21]:
A
time ticker
2013-01-02 SPY 0.347209
SLV 0.034832
GLD 0.280951
In [22]: p = df.to_panel()
In [23]: p
Out[23]:
<class 'pandas.core.panel.Panel'>
Dimensions: 1 (items) x 1 (major_axis) x 3 (minor_axis)
Items axis: A to A
Major_axis axis: 2013-01-02 00:00:00 to 2013-01-02 00:00:00
Minor_axis axis: GLD to SPY
In [24]: p.ix[:,:,['SPY','GLD']]
Out[24]:
<class 'pandas.core.panel.Panel'>
Dimensions: 1 (items) x 1 (major_axis) x 2 (minor_axis)
Items axis: A to A
Major_axis axis: 2013-01-02 00:00:00 to 2013-01-02 00:00:00
Minor_axis axis: SPY to GLD
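Note (my addition): pd.Panel was removed in pandas 0.25, so the conversion above no longer runs on modern pandas. A version-proof way to take the same cross-section is to filter on the index level directly:
df[df.index.get_level_values('ticker').isin(['SPY', 'GLD'])]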