How to use the np.where function together with the index of each element of the array? - numpy

cashflow = [0] + [10] * 7
# [0, 10, 10, 10, 10, 10, 10, 10]
growth_cashflow = []
for index in range(len(cashflow)):
    growth_cashflow.append(1.05**index * cashflow[index])
or, as a list comprehension over the payments pmt = [10] * 7 (the cash flows without the leading 0):
growth_cashflow = [1.05**index * pmt[index] for index in range(len(pmt))]
the result is:
[10.0, 10.5, 11.025, 11.576250000000002, 12.155062500000003, 12.762815625000004, 13.400956406250003]
But is it possible to get the same result with np.where?
cf = np.array(cashflow)
s = np.where(cf >= 0, 1.05**cf.index*cf, cf)
ERROR => AttributeError: 'numpy.ndarray' object has no attribute 'index'
Is it possible to get the index of each item and use it in the above multiplication?
If not, is there another way to do this in numpy without using a for loop?

import numpy as np

cf = np.array([10, 10, 10, 10, 10, 10, 10])
s = cf * 1.05**np.arange(len(cf))
print(s)
This should give you the output you are looking for. If you really want to get specific indices, you may want to use np.nonzero or np.argwhere.
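If you want to keep the np.where condition from your attempt, the missing piece is an explicit index array from np.arange; a minimal sketch along the lines of the question's own variables:
import numpy as np

cashflow = [0] + [10] * 7
cf = np.array(cashflow)
idx = np.arange(cf.size)  # per-element index, replacing the nonexistent cf.index

# grow the non-negative entries by 1.05**index, keep the rest unchanged
s = np.where(cf >= 0, 1.05**idx * cf, cf)
print(s)  # [ 0.  10.5  11.025 ...]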

Related

numpy append in a for loop with different sizes

I have a for loop where i changes by 2, and I want to save a value into a numpy array at an index that changes by 1 on each iteration.
n = 8  # steps
# random sequence
rand_seq = np.zeros(n - 1)
for i in range(0, (n - 1) * 2, 2):
    curr_state = i + 3
I want to get curr_state outside the loop, stored in the rand_seq array (seven values).
Can you help me with that?
Thanks a lot.
A much simpler version (if I understand the question correctly) would be:
np.arange(3, 15+1, 2)
where 3 = start, 15 = stop, 2 = step size.
In general, when using numpy, try to avoid appending elements in a for loop, as this is inefficient. I would suggest checking out the documentation of np.arange(), np.array() and np.zeros(); in my experience, these solve 90% of array-creation issues. If the loop has to stay, see the sketch below.
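If curr_state really is computed step by step and the loop cannot be replaced, a sketch that fills rand_seq at an index advancing by 1 while i advances by 2, using enumerate (the i + 3 here stands in for the real per-step computation):
import numpy as np

n = 8  # steps
rand_seq = np.zeros(n - 1)

# j counts iterations (0, 1, 2, ...) while i steps by 2 (0, 2, 4, ...)
for j, i in enumerate(range(0, (n - 1) * 2, 2)):
    curr_state = i + 3
    rand_seq[j] = curr_state

print(rand_seq)  # [ 3.  5.  7.  9. 11. 13. 15.]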
A straightforward list iteration:
In [313]: alist = []
     ...: for i in range(0,(8-1)*2,2):
     ...:     alist.append(i+3)
     ...:
In [314]: alist
Out[314]: [3, 5, 7, 9, 11, 13, 15]
or recast as a list comprehension:
In [315]: [i+3 for i in range(0,(8-1)*2,2)]
Out[315]: [3, 5, 7, 9, 11, 13, 15]
Or if you make an array with the same range parameters:
In [316]: arr = np.arange(0,(8-1)*2,2)
In [317]: arr
Out[317]: array([ 0, 2, 4, 6, 8, 10, 12])
you can add the 3 with one simple expression:
In [318]: arr + 3
Out[318]: array([ 3, 5, 7, 9, 11, 13, 15])
With lists, iteration and comprehensions are great. With numpy you should try to make an array, such as with arange, and modify that with whole-array methods (not with iterations).

find the index of the nearest array element greater than a given value

I have a sorted array.
x = [1, 10, 12, 16, 19, 20, 21, ....]
For any given number y between x[0] and x[-1], I want to find the index of the nearest element greater than y. For example, if y = 0 it returns 0, and if y = 18 it returns 4.
Is there a function available?
Without any external library, you can use bisect from the standard library:
import bisect

i = bisect.bisect_right(x, y)
i will be the index of the element you want.
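A runnable version, using the x and the two test values from the question:
import bisect

x = [1, 10, 12, 16, 19, 20, 21]
for y in [0, 18]:
    # index of the first element strictly greater than y
    print(bisect.bisect_right(x, y))
# 0
# 4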
Given the sorted nature, we can use np.searchsorted -
idx = np.searchsorted(x,y,'right')
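The same check with the searchsorted one-liner (x must be sorted ascending, as it is here):
import numpy as np

x = np.array([1, 10, 12, 16, 19, 20, 21])
for y in [0, 18]:
    print(np.searchsorted(x, y, 'right'))
# 0
# 4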
You can use numpy.argmin on the absolute value of the difference:
import numpy as np

x = np.array([1, 10, 12, 16, 19, 20, 21])

def find_closest(x, y):
    return (np.abs(x - y)).argmin()

for y in [0, 18]:
    print(find_closest(x, y))
0
4
Note that this finds the element closest to y in either direction; it happens to agree with the other answers here, but the element it returns is not guaranteed to be greater than y.

Finding those elements in an array which are "close"

I have a 1-dimensional sorted array and would like to find all pairs of elements whose difference is no larger than 5.
A naive approach would be to make N^2 comparisons, doing something like
diffs = np.tile(x, (x.size, 1)) - x[:, np.newaxis]
D = np.logical_and(diffs > 0, diffs < 5)
indices = np.argwhere(D)
Note here that the output of my example is indices of x. If I wanted the values of x which satisfy the criteria, I could do x[indices].
This works for smaller arrays, but not arrays of the size with which I work.
An idea I had was to find where there are gaps larger than 5 between consecutive elements. I would split the array into pieces at those gaps, and compare all the elements within each piece.
Is this a more efficient way of finding elements which satisfy my criteria? How could I go about writing this?
Here is a small example:
x = np.array([ 9, 12,
21,
36, 39, 44, 46, 47,
58,
64, 65,])
the result should look like
array([[ 0, 1],
[ 3, 4],
[ 5, 6],
[ 5, 7],
[ 6, 7],
[ 9, 10]], dtype=int64)
Here is a solution that iterates over offsets while shrinking the set of candidates until there are none left:
import numpy as np

def f_pp(A, maxgap):
    d0 = np.diff(A)
    d = d0.copy()
    IDX = []
    k = 1
    idx, = np.where(d <= maxgap)
    vidx = idx[d[idx] > 0]
    while vidx.size:
        IDX.append(vidx[:, None] + (0, k))
        # drop the last candidate if the next offset would run past the end
        if idx[-1] + k + 1 == A.size:
            idx = idx[:-1]
        d[idx] = d[idx] + d0[idx + k]
        k += 1
        idx = idx[d[idx] <= maxgap]
        vidx = idx[d[idx] > 0]
    return np.concatenate(IDX, axis=0)
data = np.cumsum(np.random.exponential(size=10000)).repeat(
    np.random.randint(1, 20, (10000,)))
pairs = f_pp(data, 1)
# pairs = set(map(tuple, pairs))

from timeit import timeit
kwds = dict(globals=globals(), number=100)
print(data.size, 'points', pairs.shape[0], 'close pairs')
print('pp', timeit("f_pp(data, 1)", **kwds)*10, 'ms')
Sample run:
99963 points 1020651 close pairs
pp 43.00256529124454 ms
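As a quick sanity check on the x from the question (maxgap=4 reproduces the naive code's strict diffs < 5, since the values are integers; the rows come out grouped by offset rather than in np.argwhere order):
x = np.array([9, 12, 21, 36, 39, 44, 46, 47, 58, 64, 65])
print(f_pp(x, 4))
# [[ 0  1]
#  [ 3  4]
#  [ 5  6]
#  [ 6  7]
#  [ 9 10]
#  [ 5  7]]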
Your idea of slicing the array is a very efficient approach. Since your data are sorted, you can just calculate the differences and split at the gaps (the +1 shifts each split point past the gap):
d = np.diff(x)
ind = np.where(d > 5)[0] + 1
pieces = np.split(x, ind)
Here pieces is a list that you can then loop over, running your own code on every element.
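For instance, on the x from the question this splits as follows (a sketch; each piece is then small enough for the N^2 comparison):
import numpy as np

x = np.array([9, 12, 21, 36, 39, 44, 46, 47, 58, 64, 65])
pieces = np.split(x, np.where(np.diff(x) > 5)[0] + 1)
print(pieces)
# [array([ 9, 12]), array([21]), array([36, 39, 44, 46, 47]),
#  array([58]), array([64, 65])]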
The best algorithm is highly dependent on the nature of your data, which I'm unaware of. For example, another possibility is to write a nested loop:
pairs = []
for i in range(x.size):
    j = i + 1
    while j < x.size and x[j] - x[i] <= 5:
        pairs.append([i, j])
        j += 1
(The bounds check j < x.size has to come first to avoid an IndexError on the last element.) If you want it to be more clever, you can edit the outer loop in a way that jumps when j hits a gap; one reading of that idea is sketched below.
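A possible sketch of that jump, using np.searchsorted to find the end of each run in one step instead of walking j (this is my interpretation, not the answerer's code; it assumes strictly increasing values and uses the strict < 5 of the question's naive code):
import numpy as np

x = np.array([9, 12, 21, 36, 39, 44, 46, 47, 58, 64, 65])
pairs = []
for i in range(x.size):
    # first index whose value is >= x[i] + 5, i.e. one past the run of close elements
    end = np.searchsorted(x, x[i] + 5, side='left')
    pairs.extend([i, j] for j in range(i + 1, end))
print(np.array(pairs))
# [[ 0  1]
#  [ 3  4]
#  [ 5  6]
#  [ 5  7]
#  [ 6  7]
#  [ 9 10]]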

Group numpy into multiple sub-arrays using an array of values

I have an array of points along a line:
a = np.array([18, 56, 32, 75, 55, 55])
I have another array that holds the IDs I want to use to group the information in a (the two arrays will always have equal lengths). Neither array a nor array b is sorted.
b = np.array([0, 2, 3, 2, 2, 2])
I want to group a into multiple sub-arrays such that the following would be possible:
c[0] -> array([18])
c[2] -> array([56, 75, 55, 55])
c[3] -> array([32])
Although the above example is simple, I will be dealing with millions of points, so efficient methods are preferred. It is also essential that any sub-array of points can be accessed in this fashion later in the program by automated methods.
Here's one approach -
def groupby(a, b):
    # Get argsort indices, to be used to sort a and b in the next steps
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]

    # Get the group limit indices (start, stop of groups)
    cut_idx = np.flatnonzero(np.r_[True, b_sorted[1:] != b_sorted[:-1], True])

    # Split input array with those start, stop ones
    out = [a_sorted[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]
    return out
A simpler but less efficient approach would be to use np.split to replace the last few lines and get the output, like so -
out = np.split(a_sorted, np.flatnonzero(b_sorted[1:] != b_sorted[:-1]) + 1)
Sample run -
In [38]: a
Out[38]: array([18, 56, 32, 75, 55, 55])
In [39]: b
Out[39]: array([0, 2, 3, 2, 2, 2])
In [40]: groupby(a, b)
Out[40]: [array([18]), array([56, 75, 55, 55]), array([32])]
To get sub-arrays covering the entire range of IDs in b -
def groupby_perID(a, b):
    # Get argsort indices, to be used to sort a and b in the next steps
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]

    # Get the group limit indices (start, stop of groups)
    cut_idx = np.flatnonzero(np.r_[True, b_sorted[1:] != b_sorted[:-1], True])

    # Create cut indices for all unique IDs in b
    n = b_sorted[-1] + 2
    cut_idxe = np.full(n, cut_idx[-1], dtype=int)
    insert_idx = b_sorted[cut_idx[:-1]]
    cut_idxe[insert_idx] = cut_idx[:-1]
    cut_idxe = np.minimum.accumulate(cut_idxe[::-1])[::-1]

    # Split input array with those start, stop ones
    out = [a_sorted[i:j] for i, j in zip(cut_idxe[:-1], cut_idxe[1:])]
    return out
Sample run -
In [241]: a
Out[241]: array([18, 56, 32, 75, 55, 55])
In [242]: b
Out[242]: array([0, 2, 3, 2, 2, 2])
In [243]: groupby_perID(a, b)
Out[243]: [array([18]), array([], dtype=int64),
array([56, 75, 55, 55]), array([32])]
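Since the question wants lookup by ID (c[0], c[2], c[3]), one small follow-up sketch is to zip the groups from groupby above with the sorted unique IDs into a dict (groupby sorts by b, so the groups come out in the same order as np.unique(b)):
import numpy as np

a = np.array([18, 56, 32, 75, 55, 55])
b = np.array([0, 2, 3, 2, 2, 2])

c = dict(zip(np.unique(b), groupby(a, b)))  # ID -> sub-array
print(c[2])  # [56 75 55 55]
print(c[3])  # [32]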

Python pandas json 2D array

Relatively new to pandas. I have a JSON file and a Python file:
{"dataset":{
"id": 123,
"data": [["2015-10-16",1,2,3,4,5,6],
["2015-10-15",7,8,9,10,11,12],
["2015-10-14",13,14,15,16,17]]
}}
&
import pandas
x = pandas.read_json('sample.json')
y = x.dataset.data
print x.dataset
Printing x.dataset and y works fine, but when I go to access a sub-element of y, it returns a 'buffer' type. What's going on? How can I access the data inside the array? Attempting y[0][1] returns an out-of-bounds error, and iterating through returns a strange series of 'nul' characters. And yet it appears to be able to return the first portion of the data when printing x.dataset...
The data attribute of a pandas Series points to the memory buffer of all the data contained in that series:
>>> df = pandas.read_json('sample.json')
>>> type(df.dataset)
pandas.core.series.Series
>>> type(df.dataset.data)
memoryview
If you have a column/row named "data", you have to access it by its string name, e.g.:
>>> type(df.dataset['data'])
list
Because of surprises like this, it's usually considered best practice to access columns through indexing rather than through attribute access. If you do this, you will get your desired result:
>>> df['dataset']['data']
[['2015-10-16', 1, 2, 3, 4, 5, 6],
['2015-10-15', 7, 8, 9, 10, 11, 12],
['2015-10-14', 13, 14, 15, 16, 17]]
>>> arr = df['dataset']['data']
>>> arr[0][0]
'2015-10-16'
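If the next step is to work with that 2D list as a table, a possible follow-up sketch (pandas pads the short third row with NaN; indexing is positional since the JSON provides no column names):
import pandas as pd

df = pd.read_json('sample.json')
rows = df['dataset']['data']

# the third row is one value short, so pandas fills the gap with NaN
table = pd.DataFrame(rows)
print(table.iloc[0, 0])  # 2015-10-16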