I have a column of scores going in descending order. I want to create a column of difficulty level with scale 1-10 going up every 37 rows for diffculty 1-7 and then 36 rows for 8-10. i have created a small example below where the difficulty goes down in 3 row intervals and the final difficulty '4' and '5' is 2 rows
In:
score
0 11
1 10
2 9
3 8
4 8
5 6
6 5
7 4
8 4
9 3
10 2
11 1
12 1
Out:
score difficulty
0 11 1
1 10 1
2 9 1
3 8 2
4 8 2
5 6 2
6 5 3
7 4 3
8 4 3
9 3 4
10 2 4
11 1 5
12 1 5
If I understand your problem correctly, you could do something like:
import pandas as pd
from random import randint
count = (37*7) + (36*3)
difficulty = [int(i/37) + 1 for i in range(37*7)] + [int(i/36) + 8 for i in range(36*3)]
df = pd.DataFrame({'score': [randint(0, 10) for i in range(count)]})
df['difficulty'] = difficulty
I have a dataframe:
df = 0 1 2 3 4
1 1 3 2 5
4 1 5 7 8
7 1 2 3 9
I want to enforce monotonically per row, to get:
df = 0 1 2 3 4
1 1 3 3 5
4 4 5 7 8
7 7 7 7 9
What is the best way to do so?
Try cummax
out = df.cummax(1)
Out[80]:
0 1 2 3 4
0 1 1 3 3 5
1 4 4 5 7 8
2 7 7 7 7 9
How to calculate the mean value of all the columns with 'count' column.I have created a dataframe with random generated values in the below code.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10,10)*100/10).astype(int)
df
output:
A B C D E F G H I J
0 4 3 2 8 5 0 9 9 0 5
1 1 5 8 0 5 9 8 3 9 1
2 9 5 1 1 3 2 6 3 8 3
3 4 0 8 1 7 3 4 2 8 8
4 9 4 8 2 7 9 7 8 9 7
5 1 0 7 3 8 6 1 7 2 0
6 3 6 8 9 6 6 5 0 8 4
7 8 9 9 5 3 9 0 7 5 5
8 5 5 8 7 8 4 3 0 9 9
9 2 4 2 3 0 5 2 0 3 0
I found mean value for a single column like this.How to find the mean for multiple columns with respect to count in pandas.
df['count'] = 1
print(df)
df.groupby('count').agg({'A':'mean'})
A B C D E F G H I J count
0 4 3 2 8 5 0 9 9 0 5 1
1 1 5 8 0 5 9 8 3 9 1 1
2 9 5 1 1 3 2 6 3 8 3 1
3 4 0 8 1 7 3 4 2 8 8 1
4 9 4 8 2 7 9 7 8 9 7 1
5 1 0 7 3 8 6 1 7 2 0 1
6 3 6 8 9 6 6 5 0 8 4 1
7 8 9 9 5 3 9 0 7 5 5 1
8 5 5 8 7 8 4 3 0 9 9 1
9 2 4 2 3 0 5 2 0 3 0 1
A
count
1 4.6
If need mean of all columns per groups by column count use:
df.groupby('count').mean()
If need mean by all rows (like grouping if same values in count) use:
df.mean().to_frame().T
I want to remove the duplicate row value from a specific column - in this case the column name is "number".
Before:
number qty status
0 10 2 go
1 10 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 5 6 go
7 5 3 nogo
8 1 10 yes
9 1 10 go
10 5 2 nah
After:
number qty status
0 10 2 go
5 nogo
1 4 6 yes
2 3 1 no
3 2 7 go
4 5 2 nah
6 go
3 nogo
5 1 10 yes
10 go
6 5 2 nah
It is possible replace values to empty string or NaNs by mask with duplicated by new Series a created by comparing column with shifted column with cumsum:
a = df['number'].ne(df['number'].shift()).cumsum()
#for replace ''
df['number'] = df['number'].mask(a.duplicated(), '')
#for replace NaNs
#df['number'] = df['number'].mask(a.duplicated())
print (df)
number qty status
0 10 2 go
1 5 nogo
2 4 6 yes
3 3 1 no
4 2 7 go
5 5 2 nah
6 6 go
7 3 nogo
8 1 10 yes
9 10 go
10 5 2 nah
Detail:
a = df['number'].ne(df['number'].shift()).cumsum()
print (a)
0 1
1 1
2 2
3 3
4 4
5 5
6 5
7 5
8 6
9 6
10 7
Name: number, dtype: int32
Joel mentioned counting the number of set bits in a byte as a programming question in his Guerrilla Guide to Interviewing, and talked of a way to take advantage of patterns that occur in the lookup table. I wrote an article about it awhile back after I found the pattern.
To summarize:
Number of bits set in a byte in 16x16
0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
4 5 5 6 5 6 6 7 5 6 6 7 6 7 7 8
The first row and column are exactly the same, and each position in the grid can be calculated by adding the first values in that position's row and column. Because of this, you only need a lookup table with 16 entries for an 8-bit number, and can just use the first 16 numbers. Then, if you wanted to count the set bits in the number 243, for example, you'd just do:
a = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]
x = 243 / 16 => 15 # (int)
y = 243 % 16 => 3
a[x] + a[y] => 6
# Are there six bits set in the number 243?
243 = 11110011 # yep
The next pattern I noticed after that was that each time you double the size of the NxN grid, each quadrant could be calculated by adding 0, 1, 1, and 2 to each quadrant, respectively, like so:
# Make a 4x4 grid on the paper, and fill in the upper left quadrant with the values of the 2x2 grid.
# For each quadrant, add the value from that same quadrant in the 2x2 grid to the array.
# Upper left quad add 0 to each number from 2x2
0 1 * *
1 2 * *
* * * *
* * * *
# Upper right quad add 1 to each number from 2×2
0 1 1 2
1 2 2 3
* * * *
* * * *
# Lower left quad add 1 to each number from 2×2
0 1 1 2
1 2 2 3
1 2 * *
2 3 * *
# Lower right quad add 2 to each number from 2×2
0 1 1 2
1 2 2 3
1 2 2 3
2 3 3 4
Repeat this process two more times, and you'll get the 16x16 grid from above, so I figured there must be some sort of quadtree algorithm that would allow you to start from the grid:
0 1
1 2
and given a number N, generate the lookup table on the fly and figure out the number of bits. So my question/challenge is, can you figure out an algorithm to do just that?
This is a silly question! In the first example where you've computed the number of bits set using a 16-entry table instead of 256 isn't anything magical! All you've done is count the number of bits set in the first four bits of the byte (first nibble) and then in the second nibble, adding the two together. x/16 is the first nibble, x%16 is the second nibble.
If you repeat the process, now you have a lookup table for two bits and you just do it four times, once for each pair. In the extreme, you can just add all the bits together one-by-one and you get the obvious answer.
The whole point of a lookup table is to avoid the addition.
Based on Robert's code here, it can even be done without the division or modulus, replacing them with one shift and one AND, like so:
a = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]
x = 243 >> 4 # 15 (same as dividing by 16)
y = 243 & 0x0f # 3 ( same as modding by 16)
result = a[x] + a[y] # 6 bits set
Or in C:
const unsigned char oneBits[] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
unsigned char CountOnes(unsigned char x)
{
unsigned char results;
results = oneBits[x&0x0f];
results += oneBits[x>>4];
return results
}
For any size integer, you could just loop through the bytes and do a quick lookup, like so:
def bits(n)
a = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]
a[n >> 4] + a[n & 0x0f]
end
def setBits(n)
total = 0
while(n > 0)
total += bits(n&0xff)
n >>= 8
end
total
end
setBits(6432132132165432132132165436265465465653213213265465) # 78 bits set
I'm satisfied with this answer. I knew something more complex and quadtree-esque wouldn't be efficient, I just thought it was a decent thought experiment.
Excuse the late post, but I just found the challenge. My $.02 (brute force)
Private Sub Button1_Click(ByVal sender As System.Object, _
ByVal e As System.EventArgs) Handles Button1.Click
For x As Integer = 0 To 255
Debug.WriteLine(bitsOn2(CByte(x)) & " " & Convert.ToString(x, 2).PadLeft(8, "0"c))
Next
End Sub
Private Function bitsOn(ByVal aByte As Byte) As Integer
Dim aBit As Byte = 1
For z As Integer = 0 To 7
If (aByte >> z And aBit) = aBit Then bitsOn += 1
Next
End Function
Dim aDict As New Dictionary(Of Integer, Integer)
Private Function bitsOn2(ByVal aByte As Byte) As Integer
If aDict.Count = 0 Then 'init dictionary
For x As Integer = 0 To 255
aDict.Add(x, bitsOn(CByte(x)))
Next
End If
Return aDict(aByte)
End Function