Percentile of every column and row in a dataframe - pandas

I have a csv that looks like the image below. I want to calculate the percentiles (10th, 50th, 90th) of each row starting from B2 to X2 and add the result as a new column. Essentially, I want to find the 10th percentile of the average (std, cv, sp_tim, ...) values over the entire period of record available.
I have created the following line of code to read it into Python as a dataframe so far.
da = pd.read_csv('Project/11433300_annual_flow_matrix.csv', index_col=0, parse_dates=True)

If I have understood your question correctly, the code below might be helpful. I have used some dummy data and applied a similar kind of treatment to it:
import numpy as np
import pandas as pd

aq = [1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 10, 11]
aw = [91, 25, 13, 53, 95, 94, 75, 35, 57, 88, 111, 12]
df = pd.DataFrame({'aq': aq, 'aw': aw})

n = df.shape[0]
p = 0.1  # for the 10th percentile
position = int(np.ceil(n * p))
# each column must be sorted first, otherwise a positional lookup is not a percentile
df.apply(lambda col: col.sort_values(ignore_index=True)).iloc[position]
Kindly have a look and let me know if this works for you.
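For the row-wise goal in the question, a simpler route (my own sketch, not part of the answer above; the column names are made up to mirror the question) is pandas' built-in quantile with axis=1, which gives one percentile value per row:

```python
import pandas as pd

# hypothetical data: each row holds one period's statistics across columns
da = pd.DataFrame({'std': [1.0, 2.0, 3.0],
                   'cv': [4.0, 5.0, 6.0],
                   'sp_tim': [7.0, 8.0, 9.0]})

# 10th, 50th and 90th percentile of each row, added as new columns
for p in (0.1, 0.5, 0.9):
    da[f'p{round(p * 100)}'] = da[['std', 'cv', 'sp_tim']].quantile(p, axis=1)

print(da)
```

Note that the explicit column subset keeps the newly added percentile columns out of later quantile calls.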

Numpy subarrays and relative indexing

I have been searching for a standard method to create a subarray using relative indexes. Take the following array into consideration:
>>> m = np.arange(25).reshape([5, 5])
>>> m
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
I want to access the 3x3 matrix at a specific array position, for example [2,2]:
>>> x, y = 2, 2
>>> m[slice(x-1, x+2), slice(y-1, y+2)]
array([[ 6,  7,  8],
       [11, 12, 13],
       [16, 17, 18]])
For the above, something like m.subarray(pos=[2,2], shape=[3,3]).
I want to sample an ndarray of n dimensions at a specific position which might change. I did not want to use a loop, as it might be inefficient. The SciPy functions correlate and convolve do this very efficiently, but for all positions; I am interested in sampling only one.
The best answer would also handle the issues at the edges; in my case I would like, for example, to have wrap mode:
(a b c d | a b c d | a b c d)
--------------------EDITED-----------------------------
Based on the answer from @Carlos Horn, I could create the following function.
from math import ceil, floor

import numpy as np

def cell_neighbours(array, index, shape):
    pads = [(floor(dim / 2), ceil(dim / 2)) for dim in shape]
    array = np.pad(array, pads, "wrap")
    view = np.lib.stride_tricks.sliding_window_view
    return view(array, shape)[tuple(index)]
The last concern might be about speed; from the docs: "For many applications using a sliding window view can be convenient, but potentially very slow. Often specialized solutions exist."
From here, it may be easier to get to a faster solution.
You could build a view of 3x3 matrices into the array as follows:
import numpy as np
m = np.arange(25).reshape(5,5)
m3x3view = np.lib.stride_tricks.sliding_window_view(m, (3,3))
Note that it will shift your indexing by half the window size, meaning
x_view = x - 3//2
y_view = y - 3//2
print(m3x3view[x_view,y_view]) # gives your result
In case a copy operation is fine, you could use:
mpad = np.pad(m, 1, mode="wrap")
mpad3x3view = np.lib.stride_tricks.sliding_window_view(mpad, (3,3))
print(mpad3x3view[x % 5,y % 5])
to use arbitrary x, y integer values.
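To make the wrap-mode variant concrete, here is this answer's two snippets combined into one runnable piece (my own consolidation, not from the thread):

```python
import numpy as np

m = np.arange(25).reshape(5, 5)

# pad by 1 in wrap mode, then expose every 3x3 window as a view
mpad = np.pad(m, 1, mode="wrap")
windows = np.lib.stride_tricks.sliding_window_view(mpad, (3, 3))

x, y = 2, 2
print(windows[x % 5, y % 5])  # the 3x3 neighbourhood centred on m[2, 2]
print(windows[0, 0])          # corner neighbourhood, wrapped around the edges
```

Because windows is a view, indexing it copies nothing; the wrap behaviour at the edges comes entirely from np.pad.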

Generate & List all Lucky numbers from 1 to n using Numpy

A lucky number is found by listing all numbers up to n.
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
And then remove every second number so we get: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Now the next number after 1 here is 3 so now remove every third number:
1,3,7,9,13,15,19,21,25,27,31
Now the next number after 3 is 7, so now remove every seventh number:
1,3,7,9,13,15,21,25,27,31
And the next number after 7 in our list is 9, so now remove every ninth number, and so on.
The remaining numbers are lucky numbers: 1,3,7,9,13,15,21,25,31
Hello, I am a relatively new Python programmer who is trying to figure this out.
I did not even come close to solving this, and I want them up to 100 billion, so advice on the best way to go about this is welcome. Here is my best try to get this done in NumPy:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
b = a[::2] #using skip of 2 on our array
c = b[::3] #using skip of 3 on our array
d = c[::7] #using skip of 7 on our array
e = d[::9] #using skip of 9 on our array
print(e)
It returns only 1, so this requires more advanced programming to find the lucky numbers.
I need some clever programming to automatically find the next skip, since I can't input millions of skips like I have done here with the skips of 2, 3, 7 & 9.
IIUC, one way is a while loop with a checker (the walrus operator := requires Python 3.8+):
def find_lucky(n):
    arr = list(range(1, n + 1))
    done = set()
    ind = 1
    while len(arr) >= (i := arr[ind]):
        if i in done:
            ind += 1
        else:
            del arr[i-1::i]
            done.add(i)
    return arr
Output:
find_lucky(32)
# [1, 3, 7, 9, 13, 15, 21, 25, 31]

Compute moving average on a dynamic rolling window

OK, I need some help here! I have the following dataframe.
df2 = {'Value': [123, 126, 120, 121, 123, 126, 120, 121, 123, 126],
       'Look-back': [2, 3, 4, 5, 3, 6, 2, 4, 2, 1]}
df2 = pd.DataFrame(df2)
df2
I'd like to add a third column that shows the simple moving average of the 'Value' column with the rolling look-back period of the 'Look-back' column. My thought was to do this.
df2['Average'] = df2['Value'].rolling(df2['Look-back']).mean()
Of course this doesn't work because the rolling() function needs an integer key value and I'm supplying a series.
How do I get what I'm after here?
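One way to sketch this (my own suggestion, not an answer from the thread) is to give up on rolling() and build the per-row windows explicitly, averaging the last Look-back values ending at each row:

```python
import pandas as pd

df2 = pd.DataFrame({'Value': [123, 126, 120, 121, 123, 126, 120, 121, 123, 126],
                    'Look-back': [2, 3, 4, 5, 3, 6, 2, 4, 2, 1]})

# for each row i, average the window of Look-back values ending at row i
df2['Average'] = [
    df2['Value'].iloc[max(0, i - w + 1): i + 1].mean()
    for i, w in enumerate(df2['Look-back'])
]

print(df2)
```

This is a plain Python loop, so for very long frames a BaseIndexer subclass passed to rolling() would be the more scalable route.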

Pandas columns by given value in last row

Below is my dataframe "df", made of 34 columns (pairs of stocks) and 530 rows (their respective cumulative returns); 'Date' is the index.
Now, my target is to consider the last row (Date = 3 February 2021). I want to plot ONLY those columns (stock pairs) that have a positive return on the last date.
I started with:
n = list()
for i in range(len(df.columns)):
    if df.iloc[-1, i] > 0:
        n.append(i)
Output: [3, 11, 12, 22, 23, 25, 27, 28, 30]
Now, the final step is to create a subset dataframe of 'df' containing only the columns belonging to the numbers in this list. This is where I have problems. Do you have any idea? Thanks
Does this solve your problem?
n = []
for i, col in enumerate(df.columns):
    if df.iloc[-1, i] > 0:
        n.append(col)
df[n]
Here you are ;)
sample df:
              a    b    c
date
2017-04-01  0.5 -0.7 -0.6
2017-04-02  1.0  1.0  1.3

df1.loc[df1.index.astype(str) == '2017-04-02', df1.ge(1.2).any()]

              c
date
2017-04-02  1.3

The logic will be the same for your case also.
If I understand correctly, you want columns with IDs [3, 11, 12, 22, 23, 25, 27, 28, 30], am I right?
You should use DataFrame.iloc:
column_ids = [3, 11, 12, 22, 23, 25, 27, 28, 30]
df_subset = df.iloc[:, column_ids].copy()
The ":" on the left side of df.iloc means "all rows". I suggest using the copy method in case you want to perform additional operations on df_subset without the risk of affecting the original df or raising warnings.
If instead of a list of column IDs, you have a list of column names, you should just replace .iloc with .loc.
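A more direct alternative (my own note, not from the thread) skips building the index list entirely by boolean-masking the columns on the last row; the small frame below is a made-up stand-in for the 34-column one:

```python
import pandas as pd

# hypothetical stand-in for the 34-column cumulative-returns frame
df = pd.DataFrame({'A': [0.1, -0.2], 'B': [-0.3, 0.4], 'C': [0.5, -0.6]})

# keep only the columns whose last-row value is positive
positive = df.loc[:, df.iloc[-1] > 0]
print(positive.columns.tolist())
```

The resulting subset can then be passed straight to .plot().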

matplotlib plot data with nans

I'm surprised how few posts relate to this problem. Anyway, here it is:
I have csv data files containing X values in the first column and several columns of Y values thereafter. But for a given X value, not all Y series have a corresponding value. Here is an example:
0, 16, 96, 99
10, 88, 45, 85
20, 85, 61, 10
30, 30, --, 45
40, 82, 28, 82
50, 23, 9, 61
60, 40, 77, 0
70, 26, 21, --
80, --, 58, 99
90, 1, 14, 30
When this csv data is loaded with numpy.genfromtxt, the '--' strings are taken as nan, which is good. But when plotting, the plots are interrupted with blanks where there is a nan. Is there an option to make pyplot.plot() ignore both the nan and the corresponding X value?
Not sure if matplotlib has such functionality built in, but you could home-brew it by doing the following:
idx = ~numpy.isnan(Y)
pyplot.plot(X[idx], Y[idx])
Look at this post.
As proposed in my answer there, I'd recommend using np.isfinite instead of np.isnan. There might be other reasons for your plot to have discontinuities, e.g., inf.
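Putting the masking idea into a runnable form (my own sketch; the data mimics one X/Y column pair from the csv, with nan and inf standing in for the '--' entries):

```python
import numpy as np

# hypothetical column pair with one nan and one inf
X = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
Y = np.array([16.0, np.nan, 61.0, np.inf, 28.0])

# keep only the points where Y is finite (drops both nan and inf)
idx = np.isfinite(Y)
X_clean, Y_clean = X[idx], Y[idx]
print(X_clean, Y_clean)
```

Passing X_clean and Y_clean to pyplot.plot() then draws one continuous line through the remaining points instead of leaving gaps.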