Consider the following sequence:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
which produces:
A B C D
0 56 83 99 46
1 40 70 22 51
2 70 9 78 33
3 65 72 79 87
4 0 6 22 73
.. .. .. .. ..
95 35 76 62 97
96 86 85 50 65
97 15 79 82 62
98 21 20 19 32
99 21 0 51 89
I can reverse the sequence with the following command:
df.iloc[::-1]
That gives me the following result:
A B C D
99 21 0 51 89
98 21 20 19 32
97 15 79 82 62
96 86 85 50 65
95 35 76 62 97
.. .. .. .. ..
4 0 6 22 73
3 65 72 79 87
2 70 9 78 33
1 40 70 22 51
0 56 83 99 46
How would I rewrite the code if I wanted to reverse the sequence every nth row, e.g. every 4th row?
IIUC, you want to reverse by chunk (3, 2, 1, 0, 7, 6, 5, 4…):
One option is to use groupby with a custom group:
N = 4
group = df.index//N
# if the index is not a default RangeIndex, build the groups by position instead:
# import numpy as np
# group = np.arange(len(df))//N
df.groupby(group).apply(lambda d: d.iloc[::-1]).droplevel(0)
output:
A B C D
3 45 33 73 77
2 91 34 19 68
1 12 25 55 19
0 65 48 17 4
7 99 99 95 9
.. .. .. .. ..
92 89 68 48 67
99 99 28 52 87
98 47 49 21 8
97 80 18 92 5
96 49 12 24 40
[100 rows x 4 columns]
A very fast method, based only on indexing, is to use numpy to generate the row positions reversed by chunk (the reshape requires len(df) to be a multiple of N):
import numpy as np
N = 4
idx = np.arange(len(df)).reshape(-1, N)[:, ::-1].ravel()
# array([ 3, 2, 1, 0, 7, 6, 5, 4, 11, ...])
# slice using iloc
df.iloc[idx]
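If the number of rows is not a multiple of N, the reshape above raises an error. A minimal sketch of one way around that, assuming a shorter final chunk should simply be reversed on its own (the helper name reverse_in_chunks is made up for this illustration):

```python
import numpy as np
import pandas as pd

def reverse_in_chunks(df, n):
    """Return df with rows reversed inside each consecutive block of n rows."""
    pos = np.arange(len(df))
    # Start position of the block each row belongs to.
    start = (pos // n) * n
    # End position (exclusive) of that block, clipped for a short final block.
    stop = np.minimum(start + n, len(df))
    # Mirror each position within its own block.
    idx = start + (stop - 1 - pos)
    return df.iloc[idx]

df = pd.DataFrame({"A": range(10)})
out = reverse_in_chunks(df, 4)
print(out.index.tolist())  # [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
```

Like the reshape version, this works purely on positions, so it does not depend on the index labels.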
Using pandas, how do I return dataframe filtered by value of 2 in 'GEN' column, value 20 in 'AGE' column and exclude columns with name 'GEN' and 'BP'? Thanks in advance:)
AGE GEN BMI BP S1 S2 S3 S4 S5 S6 Y
59 2 32.1 101 157 93.2 38 4 4.8598 87 151
48 1 21.6 87 183 103.2 70 3 3.8918 69 75
72 2 30.5 93 156 93.6 41 4 4.6728 85 141
24 1 25.3 84 198 131.4 40 5 4.8903 89 206
50 1 23 101 192 125.4 52 4 4.2905 80 135
23 1 22.6 89 139 64.8 61 2 4.1897 68 97
20 2 22 90 160 99.6 50 3 3.9512 82 138
66 2 26.2 114 255 185 56 4.5 4.2485 92 63
60 2 32.1 83 179 119.4 42 4 4.4773 94 110
20 1 30 85 180 93.4 43 4 5.3845 88 310
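You can do this:
cols = df.columns[~df.columns.isin(['GEN','BP'])]
out = df.loc[(df['GEN'] == 2) & (df['AGE'] == 20), cols]
Or, with query (column names must not be quoted inside the expression, otherwise they are treated as string literals):
out = df.query("GEN == 2 and AGE == 20")[cols]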
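For a quick check, here is a small self-contained sketch. It uses only four rows and five columns taken from the question's table, and it swaps in drop(columns=...) to exclude the two columns, which is an alternative to building cols first rather than the method shown above:

```python
import pandas as pd

# Subset of the question's data (rows 1, 2, 7 and 10; fewer columns).
df = pd.DataFrame({
    "AGE": [59, 48, 20, 20],
    "GEN": [2, 1, 2, 1],
    "BMI": [32.1, 21.6, 22.0, 30.0],
    "BP": [101, 87, 90, 85],
    "Y": [151, 75, 138, 310],
})

# Filter the rows first, then drop the unwanted columns.
out = df.query("GEN == 2 and AGE == 20").drop(columns=["GEN", "BP"])
print(out)
```

Only the row with AGE 20 and GEN 2 survives, with the AGE, BMI and Y columns.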
I'm getting an error when I try to create a new calculated column by dividing two numbers by the total sum (1+1/3+2+1+1).
The error is:
ValueError: cannot join with no overlapping index names
This is currently what I have:
lvl0 a b
lvl1 bar foo bar foo
lvl2 fake real fake real fake real fake real
A0 B0 C0 D0 2 3 0 1 6 7 4 5
D1 10 11 8 9 14 15 12 13
C1 D0 18 19 16 17 22 23 20 21
D1 26 27 24 25 30 31 28 29
C2 D0 34 35 32 33 38 39 36 37
D1 42 43 40 41 46 47 44 45
C3 D0 50 51 48 49 54 55 52 53
D1 58 59 56 57 62 63 60 61
first = dfmi['a', 'foo', 'fake'] + dfmi['a', 'bar', 'fake']
dfmi = dfmi.loc[:, ('a', 'Fake %', 'Fake Calculation')] = first.div(dfmi.sum(axis=1, level=0), level=0)
You should be able to copy and paste to test
import numpy as np
import pandas as pd

# creating MultiIndex
def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]
miindex = pd.MultiIndex.from_product([mklbl('A', 4),
mklbl('B', 2),
mklbl('C', 4),
mklbl('D', 2)])
micolumns = pd.MultiIndex.from_tuples([('a', 'foo', 'fake'),
('a', 'foo', 'real'),
('a', 'bar', 'fake'),
('a', 'bar', 'real'),
('b', 'foo', 'fake'),
('b', 'foo', 'real'),
('b', 'bar', 'fake'),
('b', 'bar', 'real'),
],
names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(len(miindex) * len(micolumns))
.reshape((len(miindex), len(micolumns))),
index=miindex,
columns=micolumns).sort_index().sort_index(axis=1)
# My Code
first = dfmi['a', 'foo', 'fake'] + dfmi['a', 'bar', 'fake']
dfmi = dfmi.loc[:, ('a', 'Fake %', 'Fake Calculation')] = first.div(dfmi.sum(axis=1, level=0), level=0)
This is what I want it to look like:
lvl0 a b
lvl1 bar foo bar foo
lvl2 fake real fake real CALCULATE fake real fake real
A0 B0 C0 D0 2 3 0 1 33% 6 7 4 5
D1 10 11 8 9 47% 14 15 12 13
C1 D0 18 19 16 17 ETC 22 23 20 21
D1 26 27 24 25 ETC 30 31 28 29
C2 D0 34 35 32 33 ETC 38 39 36 37
D1 42 43 40 41 ETC 46 47 44 45
C3 D0 50 51 48 49 ETC 54 55 52 53
D1 58 59 56 57 ETC 62 63 60 61
The calculate column is just (fake+fake)/(fake+real+fake+real) in the rows.
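You can try the following. Note that sum(level=...) was deprecated in pandas 1.3 and removed in 2.0, so this only runs on older versions:
dfmi.sum(level=[0,2],axis=1).loc[:,pd.IndexSlice[:,'fake']].div(dfmi.sum(level=0,axis=1),level=0,axis=0)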
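On pandas 2.x, where sum(level=...) is no longer available, the same ratio can be computed with groupby on the transposed frame. This is only a sketch under assumptions: it uses a two-row stand-in for dfmi with the question's column layout, and the names fake_total, total and ratio are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Two-row stand-in for dfmi with the same column layout as the question.
columns = pd.MultiIndex.from_product(
    [["a", "b"], ["foo", "bar"], ["fake", "real"]],
    names=["lvl0", "lvl1", "lvl2"],
)
dfmi = pd.DataFrame(np.arange(16).reshape(2, 8), columns=columns).sort_index(axis=1)

# Sum of the 'fake' columns within each lvl0 group ('a' and 'b').
fake_total = dfmi.xs("fake", axis=1, level="lvl2").T.groupby(level="lvl0").sum().T
# Sum of all columns within each lvl0 group.
total = dfmi.T.groupby(level="lvl0").sum().T

# (fake + fake) / (fake + real + fake + real), per row and per lvl0 group.
ratio = fake_total / total

# Store the 'a' ratio as a new column under the labels from the question.
dfmi[("a", "Fake %", "Fake Calculation")] = ratio["a"]
print(ratio)
```

For the first row the 'a' values are 2, 3, 0, 1, so the ratio is 2/6, matching the 33% in the desired output.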
I have a data frame, df, which looks like this:
index New Old Map Limit count
1 93 35 54 > 18 1
2 163 93 116 > 18 1
3 134 78 96 > 18 1
4 117 81 93 > 18 1
5 194 108 136 > 18 1
6 125 57 79 <= 18 1
7 66 39 48 > 18 1
8 120 83 95 > 18 1
9 150 98 115 > 18 1
10 149 99 115 > 18 1
11 148 85 106 > 18 1
12 92 55 67 <= 18 1
13 64 24 37 > 18 1
14 84 53 63 > 18 1
15 99 70 79 > 18 1
I need to produce a data frame that looks like this:
Limit <=18 >18
total mean total mean
New xx1 yy1 aa1 bb1
Old xx2 yy2 aa2 bb2
MAP xx3 yy3 aa3 bb3
I tried this without success:
df.groupby('Limit')['New', 'Old', 'MAP'].[sum(), mean()].T
How can I achieve this in pandas?
You can use groupby with agg, then transpose with T and unstack:
print (df[['New', 'Old', 'Map', 'Limit']].groupby('Limit').agg(['sum', 'mean']).T.unstack())
Limit <= 18 > 18
sum mean sum mean
New 217.0 108.5 1581.0 121.615385
Old 112.0 56.0 946.0 72.769231
Map 146.0 73.0 1153.0 88.692308
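Edit, following a comment: selecting the columns after the groupby looks nicer:
print (df.groupby('Limit')[['New', 'Old', 'Map']].agg(['sum', 'mean']).T.unstack())
And if you need the column to be named total (passing a dict of new names to agg no longer works in current pandas, so rename the level afterwards):
print (df.groupby('Limit')[['New', 'Old', 'Map']]
         .agg(['sum', 'mean'])
         .rename(columns={'sum': 'total'}, level=1)
         .T
         .unstack())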
Limit <= 18 > 18
total mean total mean
New 217.0 108.5 1581.0 121.615385
Old 112.0 56.0 946.0 72.769231
Map 146.0 73.0 1153.0 88.692308
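To try this without the full frame, here is a small self-contained sketch. It uses four rows from the question (rows 1, 2, 6 and 12), so the "> 18" numbers differ from the output above, and the variable name summary is made up for the illustration:

```python
import pandas as pd

# Four rows from the question: two per Limit group.
df = pd.DataFrame({
    "New": [93, 163, 125, 92],
    "Old": [35, 93, 57, 55],
    "Map": [54, 116, 79, 67],
    "Limit": ["> 18", "> 18", "<= 18", "<= 18"],
})

summary = (
    df.groupby("Limit")[["New", "Old", "Map"]]
      .agg(["sum", "mean"])                        # columns: (variable, statistic)
      .rename(columns={"sum": "total"}, level=1)   # relabel the statistic level
      .T                                           # rows: (variable, statistic)
      .unstack()                                   # statistic moves under each Limit
)
print(summary)
```

Individual cells can then be read with a tuple on the columns, for example summary.loc["New", ("<= 18", "total")].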
I have two text boxes. The first contains text exactly like the sample shown below.
I need to remove the first 7 characters of each row then show the edited text in the second box.
The first number is different every time so I can't use this
RawText.Text = Replace(RawText.Text, "1757792", " ")
TextFilter.Text = RawText.Text
because the number changes every row.
Is there a way to have a button remove ALL instances of ANY text 7 characters long?
1757792 02 08 09 10 15 21 22 29 34 40 44 46 47 48 53 56 58 68 69 71
1757793 01 07 16 20 22 25 30 36 38 39 42 48 49 51 58 66 70 72 79 80
1757794 01 02 07 09 10 18 29 32 35 36 48 53 54 56 62 65 68 69 71 73
1757795 01 02 06 09 12 18 23 27 30 35 43 52 57 59 60 61 62 73 74 76
1757796 01 11 13 14 18 19 22 31 34 41 45 46 54 57 61 70 71 72 79 80
1757797 01 08 10 18 19 21 32 41 43 44 45 54 61 62 64 66 68 73 74 80
1757798 02 03 06 09 10 23 27 28 33 36 38 41 49 53 60 61 64 73 74 80
1757799 02 12 16 34 36 44 51 52 55 57 58 59 64 71 73 75 76 78 79 80
1757800 05 11 13 17 18 19 23 24 27 31 34 38 39 45 48 61 67 73 79 80
1757801 17 23 29 31 35 38 43 45 48 51 56 57 60 64 65 66 67 73 77 78
1757802 05 06 11 14 17 20 21 27 28 29 33 41 45 49 58 66 67 73 79 80
1757803 06 07 10 11 12 19 20 21 25 30 33 35 38 42 46 51 65 66 75 80
1757804 06 14 16 19 20 23 32 42 43 44 48 52 62 67 68 69 71 72 74 78
If you really want to remove the first 7 characters of each line, you can use String.Substring:
Dim txt2Lines = From l In RawText.Lines
Let index = Math.Min(l.Length, 7)
Select l.Substring(index)
txt2.Lines = txt2Lines.ToArray()
This also handles lines that are shorter than seven characters.
Note that it doesn't remove the leading space, since that is not part of the first seven characters. To drop it as well, you could use l.Substring(index).TrimStart().
Another approach is to search for the first space and remove everything before it:
Dim txt2Lines = From l In RawText.Lines
Let index = Math.Max(l.IndexOf(" "), 0)
Select l.Substring(index)
txt2.Lines = txt2Lines.ToArray()
String.IndexOf returns -1 if the substring wasn't found, which is why I've used Math.Max(l.IndexOf(" "), 0). In that case the full line is kept.
You could use String.Split to split the text at vbCrLf (the line break), then use String.Substring to select the part of each line starting at index 8 (zero-based, so this also skips the space after the number), and there you are.
And as GSerg pointed out, if you would like to remove all 7-digit occurrences, try this:
Dim ResultString As String
Try
    ResultString = Regex.Replace(SubjectString, "\d{7}", "", RegexOptions.Singleline)
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
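Note that "\d{7}" matches seven digits anywhere in the text. For comparison, here is the same idea sketched in Python rather than VB.NET, anchored to the start of each line so that only a leading 7-digit number (and the space after it) is removed. The sample rows are shortened from the question, and the names raw_text and filtered are made up for the illustration:

```python
import re

# Shortened rows from the question, plus one line without a leading number.
raw_text = (
    "1757792 02 08 09 10\n"
    "1757793 01 07 16 20\n"
    "short\n"
)

# Per line: drop a leading run of exactly seven digits and one optional space.
filtered = re.sub(r"^\d{7}(?!\d) ?", "", raw_text, flags=re.MULTILINE)
print(filtered)
```

Lines that do not start with a 7-digit number are left unchanged.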
I am trying to format the output of the AWK's printf() function. More precisely, I am trying to print a matrix with very long rows and I would like to wrap them and continue on the next line. What I am trying to do is best illustrated using Fortran. Consider the following Fortran statement:
write(*,'(10I5)')(i,i=1,100)
The output would be the integers in the range 1:100 printed in rows of 10 elements.
Is it possible to do the same in AWK? I could do it by tracking the index and printing a "\n" at the end of each row. The question is whether it can be done as elegantly as in Fortran.
As suggested in the comments I would like to explain my Fortran code, given as an example above.
(i,i=1,100) ! => is a do loop going from 1 to 100
write(*,'(10I5)') ! => is a formatted write statement
10I5 says print 10 integers and for each integer allocate 5 character slot
The trick is that once the 10 x 5 character slots given by the format are used up, output continues on the next line. So one doesn't need the trailing "\n".
This may help:
[akshay#localhost tmp]$ cat test.for
implicit none
integer i
write(*,'(10I5)')(i,i=1,100)
end
[akshay#localhost tmp]$ gfortran test.for
[akshay#localhost tmp]$ ./a.out
    1    2    3    4    5    6    7    8    9   10
   11   12   13   14   15   16   17   18   19   20
   21   22   23   24   25   26   27   28   29   30
   31   32   33   34   35   36   37   38   39   40
   41   42   43   44   45   46   47   48   49   50
   51   52   53   54   55   56   57   58   59   60
   61   62   63   64   65   66   67   68   69   70
   71   72   73   74   75   76   77   78   79   80
   81   82   83   84   85   86   87   88   89   90
   91   92   93   94   95   96   97   98   99  100
[akshay#localhost tmp]$ awk 'BEGIN{for(i=1;i<=100;i++)printf("%5d%s",i,i%10?"":"\n")}'
    1    2    3    4    5    6    7    8    9   10
   11   12   13   14   15   16   17   18   19   20
   21   22   23   24   25   26   27   28   29   30
   31   32   33   34   35   36   37   38   39   40
   41   42   43   44   45   46   47   48   49   50
   51   52   53   54   55   56   57   58   59   60
   61   62   63   64   65   66   67   68   69   70
   71   72   73   74   75   76   77   78   79   80
   81   82   83   84   85   86   87   88   89   90
   91   92   93   94   95   96   97   98   99  100
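The awk one-liner only prints a newline after every 10th value, so a final row that is not full would end without one. As a comparison, here is a sketch of the same wrapping in Python (not awk), which builds each row as a whole and therefore handles a short last row; the helper name format_rows and its parameters are made up for the illustration:

```python
def format_rows(values, per_row=10, width=5):
    """Format integers in fixed-width cells, per_row cells to a line."""
    cells = [f"{v:{width}d}" for v in values]
    # Join consecutive groups of per_row cells into one line each.
    return ["".join(cells[i:i + per_row]) for i in range(0, len(cells), per_row)]

lines = format_rows(range(1, 101))
print("\n".join(lines))
```

This mirrors the Fortran format '(10I5)': per_row plays the role of the repeat count 10 and width the field width 5.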