How to modify/update an array in numpy without changing the original

Is there a Numpy operation that does this?
a = np.array([1,2,3])
b = np.some_update_method(a, 0, 99) # b is array([99, 2, 3]), a is unchanged.
In J this is called "amend", but I don't know what it might be called in Numpy (if it exists).

You can make a copy of the original array and then modify it in place:
b = a.copy()
b[0] = 99
b
# [99 2 3]
a
# [1 2 3]
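If you'd rather have a single non-mutating expression, np.where can serve as a rough stand-in for J's amend (a sketch; numpy has no dedicated amend function):

```python
import numpy as np

a = np.array([1, 2, 3])

# Build a brand-new array: take 99 where the index matches, a elsewhere.
b = np.where(np.arange(a.size) == 0, 99, a)
print(b)   # [99  2  3]
print(a)   # [1 2 3] -- unchanged
```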

Import copy and use deepcopy.
import copy
a = np.array([1,2,3])
b = copy.deepcopy(a)
#modify b
For a plain numeric array, copy.copy (or a.copy()) is already enough: the data buffer is duplicated, so modifying the copy leaves the original intact. copy.deepcopy only behaves differently for object-dtype arrays, where it also duplicates the Python objects the array refers to, at the cost of extra memory and time.
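A sketch illustrating the difference (the object-dtype array here is only an example): for numeric arrays a shallow copy is independent; deepcopy matters only when the elements are themselves Python objects.

```python
import copy
import numpy as np

# Numeric array: a shallow copy duplicates the data buffer.
a = np.array([1, 2, 3])
b = copy.copy(a)            # equivalent to a.copy() here
b[0] = 99
print(a)                    # [1 2 3] -- unchanged

# Object-dtype array: elements are Python objects (lists here).
o = np.empty(2, dtype=object)
o[0] = [1, 2]
o[1] = [3, 4]
shallow = copy.copy(o)      # shares the inner lists
deep = copy.deepcopy(o)     # duplicates them as well
print(shallow[0] is o[0])   # True
print(deep[0] is o[0])      # False
```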

Related

Merge masked selection of array with original array

I'm facing a problem with an assignment at the moment.
I have an array containing 400 2D points, i.e. an array of shape 400 x 2.
Then I have a mask that selects m points (rows) that I want to compute some changes on.
As per the assignment, I'm supposed to store the points I want to change in an array of shape m x 2.
Then I do my changes on this resulting array. But after the changes I want to insert the new computed values back into my original array at the original indices, and I have no clue how to do that.
So I basically have:
orig (400 x 2)
mask (400 x 1) (boolean mask selecting the rows to edit)
change (m x 2) (just the changes I want to add)
changed (m x 2) (the original values plus the change, with a factor applied)
How do I transform my change or changed arrays with the mask so that I can add/insert the changes into my original array?
Look at this example with 4 rows.
The principle is that the same mask that extracts rows from orig can also be used to write the sub-array back to its original place.
import numpy as np
x = np.array([[1,2],[3,4],[5,6],[7,8]])
print(x)
mask_ix = np.array([True,False, True, False])
masked = x[mask_ix,:]
masked = masked * 10 # the change
print(masked)
x[mask_ix] = masked # return to the original x in the mask_ix mask
print(x)
x = [[1 2]
 [3 4]
 [5 6]
 [7 8]]
masked = [[10 20]
 [50 60]]
x = [[10 20]
 [ 3  4]
 [50 60]
 [ 7  8]]
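If, as in the first question above, you also want to leave orig untouched, the same mask trick works on a copy (a sketch using the 4-row example):

```python
import numpy as np

orig = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
mask = np.array([True, False, True, False])

changed = orig[mask] * 10   # the m x 2 block of edited rows
result = orig.copy()        # copy first, so orig stays intact
result[mask] = changed      # rows land back at their original indices
print(result)
print(orig)                 # unchanged
```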

How to build a numpy matrix one row at a time?

I'm trying to build a matrix one row at a time.
import numpy as np
f = np.matrix([])
f = np.vstack([ f, np.matrix([1]) ])
This is the error message.
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 1
As you can see, np.matrix([]) is NOT like an empty list: it has shape (1, 0), so nothing can be stacked onto it. I'm going to have to do this some other way. But what? I'd rather not do an ugly workaround kludge.
You have to give the initial matrix some dimensions. Either fill it with zeros or use np.empty() (note that the first row then holds uninitialized values, so drop it when you're done):
f = np.empty(shape = [1,1])
f = np.vstack([f,np.matrix([1])])
Alternatively, you can use np.hstack for the first row, then vstack iteratively:
arr = np.array([])
arr = np.hstack((arr, np.array([1,1,1])))
arr = np.vstack((arr, np.array([2,2,2])))
Now you can convert into a matrix.
mat = np.asmatrix(arr)
Good grief. It appears there is no way to do what I want. Kludgetown it is. I'll build an array with a bogus first entry, then when I'm done make a copy without the bogosity.
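A common alternative to the bogus-first-row kludge (a sketch; the loop contents are illustrative): accumulate the rows in a plain Python list and stack them once at the end.

```python
import numpy as np

rows = []
for i in range(3):
    rows.append(np.array([i, i + 1]))   # build each row inside the loop

mat = np.vstack(rows)                   # one stacking step at the end
print(mat)
print(mat.shape)                        # (3, 2)
```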

How to obtain a matrix by adding one more new row vector within an iteration?

I am generating arrays (technically they are row vectors) with a for-loop. a, b, c ... are the outputs.
Can I add the new array to the old ones together to form a matrix?
import numpy as np
# just for example:
a = np.asarray([2,5,8,10])
b = np.asarray([1,2,3,4])
c = np.asarray([2,3,4,5])
... ... ... ... ...
I have tried ab = np.stack((a,b)), and this works. But my idea is to keep adding a new row to the old matrix in each loop iteration, and with abc = np.stack((ab,c)) I get the error ValueError: all input arrays must have the same shape.
Can anyone tell me how I could add another vector to an already existing matrix? I couldn't find a perfect answer in this forum.
np.stack won't work here: it can only stack arrays with the same shape.
One possible solution is to use np.concatenate((original, to_append), axis = 0) each time. Check the docs.
You can also try using np.append.
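A sketch of the concatenate route: keep the growing matrix 2-D and reshape each new row to (1, n) before appending.

```python
import numpy as np

a = np.asarray([2, 5, 8, 10])
b = np.asarray([1, 2, 3, 4])
c = np.asarray([2, 3, 4, 5])

mat = np.stack((a, b))                                 # shape (2, 4)
mat = np.concatenate((mat, c.reshape(1, -1)), axis=0)  # shape (3, 4)
print(mat.shape)
```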
Thanks to everybody's ideas: the best solution to this problem is to append each array (or list) to a Python list during the iteration and convert that list to a matrix with np.asarray at the end.
a = np.asarray([2,5,8,10]) # or a = [2,5,8,10]
b = np.asarray([1,2,3,4]) # b = [1,2,3,4]
c = np.asarray([2,3,4,5]) # c = [2,3,4,5]
... ...
l1 = []
l1.append(a)
l1.append(b)
l1.append(c)
... ...
l1 doesn't have to start empty; however, any elements l1 already contains should have the same type and shape as a, b, c.
For example, the difference between l1 = [1,1,1,1] and l1 = [[1,1,1,1]] is huge in this case: the first would mix four scalars with row arrays, while the second already holds one row.
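Completing the accepted approach, the conversion happens once after the loop (a sketch):

```python
import numpy as np

a = np.asarray([2, 5, 8, 10])
b = np.asarray([1, 2, 3, 4])
c = np.asarray([2, 3, 4, 5])

l1 = []
for row in (a, b, c):   # stands in for the real generating loop
    l1.append(row)

matrix = np.asarray(l1)  # single conversion at the end, shape (3, 4)
print(matrix)
```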

When does pandas do pass-by-reference vs. pass-by-value when passing a dataframe to a function?

def dropdf_copy(df):
    df = df.drop('y', axis=1)

def dropdf_inplace(df):
    df.drop('y', axis=1, inplace=True)

def changecell(df):
    df['y'][0] = 99
x = pd.DataFrame({'x': [1,2],'y': [20,31]})
x
Out[204]:
   x   y
0  1  20
1  2  31
dropdf_copy(x)
x
Out[206]:
   x   y
0  1  20
1  2  31
changecell(x)
x
Out[208]:
   x   y
0  1  99
1  2  31
In the above example dropdf_copy() doesn't modify the original dataframe x, while changecell() modifies x. I know that if I make a minor change to changecell() it won't change x:
def changecell(df):
    df = df.copy()
    df['y'][0] = 99
I don't think it's very elegant to include df = df.copy() in every function I write.
Questions
1) Under what circumstances does pandas change the original dataframe, and when does it not? Can someone give me a clear, generalizable rule? I know it may have something to do with mutability vs. immutability, but it's not clearly explained on Stack Overflow.
2) Does numpy behave similarly, or is it different? What about other Python objects?
PS: I have done research on Stack Overflow but couldn't find a clear, generalizable rule for this problem.
Python passes the object reference by value (call by sharing): the function receives a reference to the same object, so in-place mutations are visible to the caller. Only if an explicit copy is made inside the function, either by rebinding the name through assignment or by calling copy(), does the original object stay unchanged.
Example with explicit copy:
# 1. Assignment
def dropdf_copy1(df):
    df = df.drop('y', axis=1)

# 2. copy()
def dropdf_copy2(df):
    df = df.copy()
    df.drop('y', axis=1, inplace=True)
If no explicit copy is made, the original object passed in is changed:
def dropdf_inplace(df):
    df.drop('y', axis=1, inplace=True)
This has nothing to do with pandas. It's a matter of local variables versus mutation of shared mutable values: in dropdf_copy, the assignment makes df a local variable.
The same with lists:
def global_(l):
    l[0] = 1

def local_(l):
    l = l + [0]
In the second function, it would be the same if you wrote:
def local_(l):
    l2 = l + [0]
so you don't affect l.
Here is a Python Tutor example which shows what happens.
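A minimal pandas sketch of the same rule (using .loc instead of chained indexing, which newer pandas versions warn about and copy-on-write mode ignores):

```python
import pandas as pd

def drop_copy(df):
    df = df.drop('y', axis=1)   # new DataFrame; rebinds the local name only

def change_cell(df):
    df.loc[0, 'y'] = 99         # writes into the caller's DataFrame

x = pd.DataFrame({'x': [1, 2], 'y': [20, 31]})
drop_copy(x)
print('y' in x.columns)         # True -- x still has the column
change_cell(x)
print(x.loc[0, 'y'])            # 99 -- x was mutated in place
```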

numpy array assigning values

Technically, what is the difference between
import numpy as np
a = np.random.random((100,3))
b = np.empty(100)
# what is the difference between
b = a[:,0]
# and
b[:] = a[:,0]
The reason I am asking is that I read b with a Fortran-compiled function, and the slicing of b makes all the difference. This has to do with the row-major (C) versus column-major (Fortran) storage conventions; numpy's default is the C one.
The main difference is that
b = a[:,0]
creates a view onto a's data, whereas
b[:] = a[:,0]
makes a copy of the data.
The former uses the same memory layout as a, whereas the latter preserves the memory layout of the original b. In particular this means that in the latter case all the data gets compacted into consecutive memory locations:
In [29]: b = np.empty(100)
In [30]: b = a[:,0]
In [31]: b.strides
Out[31]: (24,)
In [32]: b = np.empty(100)
In [33]: b[:] = a[:,0]
In [34]: b.strides
Out[34]: (8,)
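A sketch that makes both effects concrete: the view aliases a's memory and is strided, while the element-wise copy owns a compact, contiguous buffer; np.ascontiguousarray is the usual one-step way to guarantee a compact layout before handing data to compiled (e.g. Fortran-side) code.

```python
import numpy as np

a = np.random.random((100, 3))

# View: b aliases a's memory, so writing through b is visible in a,
# and its elements are 24 bytes apart (every third float64).
b = a[:, 0]
b[0] = -1.0
print(a[0, 0])                   # -1.0
print(b.flags['C_CONTIGUOUS'])   # False

# Copy: b2 owns its own compact buffer; a is unaffected by writes to it.
b2 = np.empty(100)
b2[:] = a[:, 0]
b2[0] = 7.0
print(a[0, 0])                   # still -1.0
print(b2.flags['C_CONTIGUOUS'])  # True

# One-step way to force a compact layout:
compact = np.ascontiguousarray(a[:, 0])
```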