How can I add elements together inside the array in numpy? - numpy

here's the code:
print(array)
here's part of the outcomes:
array([[1.09080648e-07, 1.27947783e-07, 1.35521106e-07, 2.36965352e-03,
1.76941751e-07, 6.02428392e-03, 1.93768765e-07],
[1.17183374e-03, 1.54375957e-03, 4.94265019e-04, 1.72861062e-07,
7.56083752e-04, 5.68696862e-03, 3.03002388e-04],...)
If I want to add up the elements in each row of the array, what should I do?
I can't directly use .sum() because it returns the total over the whole array...
Can I use a double for loop?
What should I do next?
It seems that I am very close to the answer, but this is kind of urgent...
THANKS IN ADVANCE!

If you have an array with shape (N, M):
use array.sum(axis=0) to sum all values in the same column, obtaining an array with shape (M,);
use array.sum(axis=1) to sum all values in the same row, obtaining an array with shape (N,).
See the NumPy documentation for other details:
https://numpy.org/doc/stable/reference/generated/numpy.sum.html
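A minimal sketch of the two axis options above, using a small made-up 2x3 array (the values are illustrative, not the asker's data):

```python
import numpy as np

# Hypothetical 2x3 array standing in for the asker's data
array = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

row_sums = array.sum(axis=1)  # one value per row, shape (2,)
col_sums = array.sum(axis=0)  # one value per column, shape (3,)
total = array.sum()           # no axis: a single grand total

print(row_sums, col_sums, total)
```

No explicit for loop is needed; the axis argument tells NumPy which dimension to collapse.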

Related

Error when filtering pandas dataframe by column value

I am having a problem with filtering a pandas dataframe. I am trying to filter a dataframe based on column values being equal to a specific list but I am getting a length error.
I tried every possible way of filtering a dataframe but got nowhere. Any help would be appreciated, thanks in advance.
Here is my code :
for ind in df_hourly.index:
    timeslot = df_hourly['date_parsed'][ind][0:4] # List value to filter
    filtered_df = df.loc[df['timeslot'] == timeslot]
Error : ValueError: ('Lengths must match to compare', (5696,), (4,))
Above Image : df , Below Image : df_hourly
In the above image, the dataframe I want to filter is shown. Specifically, I want to filter according to the "timeslot" column.
The image below shows the dataframe that contains the values I want to filter by. I specifically want to filter by the "date_parsed" column. In the first line of my code, I iterate through every row in this dataframe and assign the first 4 elements of the list value in df_hourly["date_parsed"] to a variable. Later in the code, I try to filter the above dataframe by that variable.
When you compare a column with == against a list, pandas tries to compare value by value: does the first item equal the first item, the second equal the second, and so on. This is why you receive this error: pandas expects both sides to have the same length.
If you want to check whether each value is contained in a list, you can use .isin (see the documentation):
df.loc[df['timeslot'].isin(timeslot)]
Depending on what timeslot is exactly, you might need to use timeslot.values or something similar (it is hard to say exactly without an example of your dataframe).
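As a rough illustration of the difference between == and .isin, here is a sketch on made-up data. The column values and the wanted list are assumptions for the example, since the real dataframes were only shown as images:

```python
import pandas as pd

# Hypothetical stand-in for df; the real "timeslot" values are unknown
df = pd.DataFrame({"timeslot": [1, 2, 3, 4, 5],
                   "value": [10, 20, 30, 40, 50]})
wanted = [1, 3]  # hypothetical list of values to keep

# == against a list of a different length is an element-wise comparison
# and raises ValueError
length_error = None
try:
    df["timeslot"] == wanted
except ValueError as exc:
    length_error = exc

# .isin tests membership row by row, so the lengths do not need to match
filtered_df = df.loc[df["timeslot"].isin(wanted)]
print(filtered_df)
```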

Finding smallest dtype to safely cast an array to

Let's say I want to find the smallest data type I can safely cast this array to, to save it as efficiently as possible. (The expected output is int8.)
arr = np.array([-101,125,6], dtype=np.int64)
The most logical solution seems something like
np.min_scalar_type(arr) # dtype('int64')
but that function doesn't work as expected for arrays. It just returns their original data type.
The next thing I tried is this:
np.promote_types(np.min_scalar_type(arr.min()), np.min_scalar_type(arr.max())) # dtype('int16')
but that still doesn't output the smallest possible data type.
What's a good way to achieve this?
Here's a working solution I wrote. It will only work for integers.
import numpy as np

def smallest_dtype(arr):
    arr_min = arr.min()
    arr_max = arr.max()
    # Candidates are ordered from smallest to largest; return the first that fits
    for dtype_str in ["u1", "i1", "u2", "i2", "u4", "i4", "u8", "i8"]:
        if (arr_min >= np.iinfo(np.dtype(dtype_str)).min) and (arr_max <= np.iinfo(np.dtype(dtype_str)).max):
            return np.dtype(dtype_str)
This is close to your initial idea:
np.result_type(np.min_scalar_type(arr.min()), arr.max())
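It will take the signed int8 from arr.min() if arr.max() fits inside of it. Note that this relies on NumPy's value-based casting of scalars, so the result may differ on NumPy 2.0 and later, where that behavior changed (NEP 50).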
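For reference, a sketch of how the range-check approach could be used to actually downcast an array. The function name smallest_int_dtype and the ValueError fallback are illustrative additions, not part of either answer:

```python
import numpy as np

def smallest_int_dtype(arr):
    """Return the first candidate integer dtype whose range covers arr."""
    lo, hi = int(arr.min()), int(arr.max())
    # Candidates ordered by size; unsigned is tried before signed at each width
    for code in ["u1", "i1", "u2", "i2", "u4", "i4", "u8", "i8"]:
        info = np.iinfo(np.dtype(code))
        if info.min <= lo and hi <= info.max:
            return np.dtype(code)
    raise ValueError("no integer dtype can hold this range")

arr = np.array([-101, 125, 6], dtype=np.int64)
small = arr.astype(smallest_int_dtype(arr))
print(small.dtype, small)
```

Converting the min and max to Python ints before comparing avoids any dependence on NumPy's scalar promotion rules.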

Numpy - how do I erase elements of an array if it is found in an other array

TLDR: I have 2 arrays indices = numpy.arange(9) and another that contains some of the numbers in indices (maybe none at all, maybe it'll contain [2,4,7]). The output I'd like for this example is [0,1,3,5,6,8]. What method can be used to achieve this?
Edit: I found a method which works somewhat: casting both arrays to sets and then taking the difference of the two does give the correct result, but as a set, even if I pass this result to numpy.array(). I'll update this if I find a solution for that.
Edit2: Casting the result of the subtraction to a list, then passing that to numpy.array(), resolved my issue.
I guess I posted this question a little prematurely, given that I found the solution for it myself, but maybe this'll be useful to somebody in future!
You can make use of boolean masking:
indices[~numpy.isin(indices,[2,4,7])]
Explanation:
numpy.isin() checks, for each element of indices, whether it appears in the second array. The ~ operator inverts that boolean result, and the inverted mask is then used to index indices, keeping only the elements that were not found.
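A short runnable sketch of the masking approach, alongside numpy.setdiff1d as an alternative. Using setdiff1d here is an addition to the answer above; it returns the sorted unique values of the first array that are not in the second:

```python
import numpy as np

indices = np.arange(9)
to_remove = np.array([2, 4, 7])

# Boolean mask: True where the element is NOT in to_remove
kept = indices[~np.isin(indices, to_remove)]

# Alternative: set difference, returned as a sorted array of unique values
kept_alt = np.setdiff1d(indices, to_remove)

print(kept, kept_alt)
```

The mask version preserves the original order and any duplicates in indices, while setdiff1d sorts and deduplicates.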

Pandas dataframe: create new column using conditional and slicing string

I have this piece of code that creates a new dataframe column, using first a conditional, and then slicing some string, with a fixed slicing index (0, 5):
df.loc[df['operation'] == 'dividend', ['order_adj']] = df['comment'].str.slice(0, 5)
But instead of a fixed slicing index, I need to use str.find() at the end of this code to get a dynamic slice index on df['comment'], based on its characters.
As I'm creating the new column by broadcasting, I couldn't find the correct syntax to use str.find('some_string') inside str.slice(). Thanks.
Option using split:
df['comment'].str.split("some_string").str[0]
Or an option using regex (move the capture group depending on whether you want the match to include or exclude some_string):
df['comment'].str.extract("(.*?)some_string")
df['comment'].str.extract("(.*?some_string)")
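A sketch of how the split and regex options might combine with the conditional assignment from the question. The sample rows and the " - " marker are made up for illustration; only the column names come from the question:

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question
df = pd.DataFrame({
    "operation": ["dividend", "buy", "dividend"],
    "comment": ["ABC123 - paid", "XYZ - bought", "DEF45 - paid"],
})
marker = " - "  # hypothetical stand-in for 'some_string'

# Keep everything before the first marker, only on the dividend rows
mask = df["operation"] == "dividend"
df.loc[mask, "order_adj"] = df.loc[mask, "comment"].str.split(marker).str[0]

# Regex variant: non-greedy capture of the text before the marker
prefix = df["comment"].str.extract(r"(.*?) - ", expand=False)
print(df)
```

Rows that do not match the condition are left as missing values in order_adj.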

How to assign a value to a specific mgrid entry in matplotlib?

I am new to matplotlib and scipy. I want to create a two-dimensional mgrid in matplotlib and assign values that I have generated to individual cells in this two-dimensional array. How can I do it? I am looking for an assignment function such as a[i,j] = k but I can't find one. Any clues?
Thanks in advance.
Ranga
OK. I think I found the answer. What I wanted to do was better done with a numpy.array. So the way to do this (for me) was:
t = []
zeroRow = []
for j in range(cols):
    zeroRow.append(0)
for i in range(rows):
    t.append(zeroRow)
spectrogramData = np.array(t,float)
Later I read the values from a file where the row and column are stored and assign to the spectrogramData
spectrogramData[row][column] = valueRead
My confusion was not knowing how to access the wrapped array. It is accessed like any two dimensional array.
Thanks for responding!
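For comparison, a shorter sketch of the same idea using numpy.zeros, which builds the zero-filled array directly. The sizes and the assigned value are made up for the example:

```python
import numpy as np

rows, cols = 3, 4  # hypothetical dimensions

# Zero-filled 2D float array, equivalent to the nested-list construction
spectrogram_data = np.zeros((rows, cols), dtype=float)

# Assign a single cell with the a[i, j] = k form asked about
row, column, value_read = 1, 2, 0.5  # hypothetical values read from a file
spectrogram_data[row, column] = value_read

print(spectrogram_data)
```

The a[i, j] form indexes the cell in one step, whereas a[i][j] first selects row i and then indexes into it; both work for assignment on a plain 2D array.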