Patterns of set bits in a byte - puzzle

Joel mentioned counting the number of set bits in a byte as a programming question in his Guerrilla Guide to Interviewing, and talked of a way to take advantage of patterns that occur in the lookup table. I wrote an article about it a while back after I found the pattern.
To summarize:
Number of bits set in each byte value, laid out as a 16x16 grid:
0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
3 4 4 5 4 5 5 6 4 5 5 6 5 6 6 7
4 5 5 6 5 6 6 7 5 6 6 7 6 7 7 8
The first row and column are exactly the same, and each position in the grid can be calculated by adding the first values in that position's row and column. Because of this, you only need a lookup table with 16 entries for an 8-bit number, and can just use the first 16 numbers. Then, if you wanted to count the set bits in the number 243, for example, you'd just do:
a = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]
x = 243 / 16 => 15 # (int)
y = 243 % 16 => 3
a[x] + a[y] => 6
# Are there six bits set in the number 243?
243 = 11110011 # yep
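The row-plus-column rule above can be checked exhaustively. Here is a minimal Python sketch; the names `NIBBLE_BITS` and `count_bits_in_byte` are made up for illustration and are not from the original article:

```python
# 16-entry table: number of set bits in each 4-bit value (0..15)
NIBBLE_BITS = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]

def count_bits_in_byte(value):
    """Count set bits in an 8-bit value using two lookups in the 16-entry table."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    row = value // 16  # high nibble: row of the 16x16 grid
    col = value % 16   # low nibble: column of the 16x16 grid
    return NIBBLE_BITS[row] + NIBBLE_BITS[col]

# Cross-check against a direct count of '1' digits for every byte value
assert all(count_bits_in_byte(v) == bin(v).count("1") for v in range(256))
print(count_bits_in_byte(243))  # 6
```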
The next pattern I noticed after that was that each time you double the size of the NxN grid, the four quadrants of the new grid can be calculated by adding 0, 1, 1, and 2, respectively, to the values of the previous grid, like so:
# Make a 4x4 grid on the paper, and fill in the upper left quadrant with the values of the 2x2 grid.
# For each quadrant, add the value from that same quadrant in the 2x2 grid to the array.
# Upper left quad add 0 to each number from 2x2
0 1 * *
1 2 * *
* * * *
* * * *
# Upper right quad add 1 to each number from 2×2
0 1 1 2
1 2 2 3
* * * *
* * * *
# Lower left quad add 1 to each number from 2×2
0 1 1 2
1 2 2 3
1 2 * *
2 3 * *
# Lower right quad add 2 to each number from 2×2
0 1 1 2
1 2 2 3
1 2 2 3
2 3 3 4
Repeat this process two more times, and you'll get the 16x16 grid from above, so I figured there must be some sort of quadtree algorithm that would allow you to start from the grid:
0 1
1 2
and given a number N, generate the lookup table on the fly and figure out the number of bits. So my question/challenge is, can you figure out an algorithm to do just that?
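One possible reading of the challenge, sketched in Python: grow the grid by repeated quadrant doubling, then index it with the high and low halves of the number. The helper names `double_grid` and `build_grid` are hypothetical, and this is only an illustration of the pattern, not an efficient bit counter.

```python
def double_grid(grid):
    """Build a 2n x 2n grid from an n x n grid using the quadrant rule."""
    n = len(grid)
    bigger = [[0] * (2 * n) for _ in range(2 * n)]
    for r in range(2 * n):
        for c in range(2 * n):
            # (r // n) + (c // n) is 0, 1, 1 or 2 depending on the quadrant
            bigger[r][c] = grid[r % n][c % n] + (r // n) + (c // n)
    return bigger

def build_grid(size):
    """Start from the 2x2 grid and double until the grid is size x size."""
    grid = [[0, 1], [1, 2]]
    while len(grid) < size:
        grid = double_grid(grid)
    return grid

grid16 = build_grid(16)
# Every byte value v sits at row v // 16, column v % 16
assert all(grid16[v // 16][v % 16] == bin(v).count("1") for v in range(256))
print(grid16[243 // 16][243 % 16])  # 6
```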

This is a silly question! There isn't anything magical about the first example, where you've computed the number of bits set using a 16-entry table instead of a 256-entry one. All you've done is count the number of bits set in the first four bits of the byte (first nibble) and then in the second nibble, and add the two together. x/16 is the first nibble, x%16 is the second nibble.
If you repeat the process, you end up with a lookup table for two bits, and you just do the lookup four times, once for each pair. In the extreme, you can just add all the bits together one by one and you get the obvious answer.
The whole point of a lookup table is to avoid the addition.
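To make the trade-off concrete, here is a small Python sketch of the two extremes described above: a 4-entry table consulted four times, and a plain bit-by-bit sum. The function names are illustrative only.

```python
PAIR_BITS = [0, 1, 1, 2]  # set bits in each 2-bit value

def count_bits_pairs(value):
    """Four lookups in a 4-entry table, one per pair of bits."""
    total = 0
    for shift in (0, 2, 4, 6):
        total += PAIR_BITS[(value >> shift) & 0b11]
    return total

def count_bits_one_by_one(value):
    """No table at all: add the eight bits together."""
    total = 0
    for shift in range(8):
        total += (value >> shift) & 1
    return total

print(count_bits_pairs(243), count_bits_one_by_one(243))  # 6 6
```

The smaller the table, the more additions are needed, which is the point being made: a full 256-entry table needs none.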

Based on Robert's code here, it can even be done without the division or modulus, replacing them with one shift and one AND, like so:
a = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]
x = 243 >> 4 # 15 (same as dividing by 16)
y = 243 & 0x0f # 3 (same as modding by 16)
result = a[x] + a[y] # 6 bits set
Or in C:
const unsigned char oneBits[] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
unsigned char CountOnes(unsigned char x)
{
    unsigned char results;
    results = oneBits[x&0x0f];   /* low nibble */
    results += oneBits[x>>4];    /* high nibble */
    return results;
}
For any size integer, you could just loop through the bytes and do a quick lookup, like so:
def bits(n)
  a = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]
  a[n >> 4] + a[n & 0x0f]
end
def setBits(n)
  total = 0
  while(n > 0)
    total += bits(n&0xff)
    n >>= 8
  end
  total
end
setBits(6432132132165432132132165436265465465653213213265465) # 78 bits set
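For comparison, the same byte-at-a-time loop can be sketched in Python, which also has arbitrary-size integers. The names `byte_bits` and `set_bits` are hypothetical, and the sketch assumes a non-negative input.

```python
NIBBLE_BITS = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]

def byte_bits(b):
    """Set bits in one byte: high-nibble lookup plus low-nibble lookup."""
    return NIBBLE_BITS[b >> 4] + NIBBLE_BITS[b & 0x0F]

def set_bits(n):
    """Set bits in a non-negative integer of any size, one byte at a time."""
    if n < 0:
        raise ValueError("n must be non-negative")
    total = 0
    while n > 0:
        total += byte_bits(n & 0xFF)  # count the lowest byte
        n >>= 8                       # then drop it
    return total

# Cross-check against Python's own binary representation
big = 2**100 - 1
assert set_bits(big) == bin(big).count("1")
print(set_bits(243))  # 6
```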
I'm satisfied with this answer. I knew something more complex and quadtree-esque wouldn't be efficient; I just thought it was a decent thought experiment.

Excuse the late post, but I just found the challenge. My $.02 (brute force)
Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
    For x As Integer = 0 To 255
        Debug.WriteLine(bitsOn2(CByte(x)) & " " & Convert.ToString(x, 2).PadLeft(8, "0"c))
    Next
End Sub

Private Function bitsOn(ByVal aByte As Byte) As Integer
    Dim aBit As Byte = 1
    For z As Integer = 0 To 7
        If (aByte >> z And aBit) = aBit Then bitsOn += 1
    Next
End Function

Dim aDict As New Dictionary(Of Integer, Integer)

Private Function bitsOn2(ByVal aByte As Byte) As Integer
    If aDict.Count = 0 Then 'init dictionary
        For x As Integer = 0 To 255
            aDict.Add(x, bitsOn(CByte(x)))
        Next
    End If
    Return aDict(aByte)
End Function

Related

How can I create a column of numbers that ascends after a certain amount of rows?

I have a column of scores in descending order. I want to create a difficulty column on a scale of 1-10 that goes up every 37 rows for difficulty 1-7 and then every 36 rows for 8-10. I have created a small example below where the difficulty goes up every 3 rows, and the final difficulties 4 and 5 each cover 2 rows.
In:
score
0 11
1 10
2 9
3 8
4 8
5 6
6 5
7 4
8 4
9 3
10 2
11 1
12 1
Out:
score difficulty
0 11 1
1 10 1
2 9 1
3 8 2
4 8 2
5 6 2
6 5 3
7 4 3
8 4 3
9 3 4
10 2 4
11 1 5
12 1 5
If I understand your problem correctly, you could do something like:
import pandas as pd
from random import randint
count = (37*7) + (36*3)
difficulty = [int(i/37) + 1 for i in range(37*7)] + [int(i/36) + 8 for i in range(36*3)]
df = pd.DataFrame({'score': [randint(0, 10) for i in range(count)]})
df['difficulty'] = difficulty
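The same idea can be written as a small helper that takes the run length of each difficulty level, which makes the 13-row example from the question easy to check. This is a plain-Python sketch; `difficulty_column` is a made-up name, and the resulting list can be assigned to a DataFrame column the same way as above.

```python
def difficulty_column(group_sizes):
    """Return a list where the k-th run (1-based) has group_sizes[k-1] entries."""
    levels = []
    for level, size in enumerate(group_sizes, start=1):
        levels.extend([level] * size)
    return levels

# Small example from the question: three runs of 3 rows, then two runs of 2 rows
small = difficulty_column([3, 3, 3, 2, 2])
print(small)  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

# Full case: 37 rows each for levels 1-7, 36 rows each for levels 8-10
full = difficulty_column([37] * 7 + [36] * 3)
print(len(full))  # 367
```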

If a column value does not have a certain number of occurrences in a dataframe, how to duplicate rows at random until that count is met?

Say that this is what my dataframe looks like
A B
0 1 5
1 4 2
2 3 5
3 3 3
4 3 2
5 2 0
6 4 5
7 2 3
8 4 1
9 5 1
I want every unique value in column B to occur at least 3 times. So none of the rows with a B value of 5 are duplicated. The row with a B value of 0 is duplicated twice. And each of the remaining values has one of its two rows duplicated at random.
Here is an example desired output
A B
0 1 5
1 4 2
2 3 5
3 3 3
4 3 2
5 2 0
6 4 5
7 2 3
8 4 1
9 5 1
10 4 2
11 2 3
12 2 0
13 2 0
14 4 1
Edit:
The row chosen to be duplicated should be selected at random
To pick rows at random, I would use groupby apply with sample on each group. The x in the lambda is each group of B, so I use repeats - x.shape[0] to find the number of rows that need to be created. A group of B may already have more than 3 rows, so I use np.clip to force negative values to 0. Sampling 0 rows is the same as ignoring the group. Finally, reset_index and append the result back to df.
import numpy as np
import pandas as pd

repeats = 3
df1 = (df.groupby('B').apply(lambda x: x.sample(n=np.clip(repeats-x.shape[0], 0, np.inf)
                                                  .astype(int), replace=True))
         .reset_index(drop=True))
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_final = pd.concat([df, df1]).reset_index(drop=True)
Out[43]:
A B
0 1 5
1 4 2
2 3 5
3 3 3
4 3 2
5 2 0
6 4 5
7 2 3
8 4 1
9 5 1
10 2 0
11 2 0
12 5 1
13 4 2
14 2 3
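As an alternative to groupby apply, the same padding can be sketched with an explicit loop over the groups, which some readers may find easier to follow. The function `pad_groups` and its `seed` parameter are hypothetical names introduced for this illustration.

```python
import pandas as pd

def pad_groups(df, column, min_count, seed=None):
    """Append randomly chosen duplicate rows until every value of
    `column` occurs at least `min_count` times."""
    extras = []
    for _, group in df.groupby(column):
        missing = min_count - len(group)
        if missing > 0:
            # Sample with replacement so a single-row group can be repeated
            extras.append(group.sample(n=missing, replace=True, random_state=seed))
    return pd.concat([df] + extras, ignore_index=True)

df = pd.DataFrame({"A": [1, 4, 3, 3, 3, 2, 4, 2, 4, 5],
                   "B": [5, 2, 5, 3, 2, 0, 5, 3, 1, 1]})
padded = pad_groups(df, "B", 3, seed=0)
print(padded["B"].value_counts().min())  # 3
```

The original rows stay in place at the top; only the appended rows depend on the random draw.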

Pandas running sum

I have a pandas dataframe and it is something like this:
x y
1 0
2 1
3 2
4 0 <<<< Reset
5 1
6 2
7 3
8 0 <<<< Reset
9 1
10 2
The x values could be anything, they are not meaningful for this question. The y values increment, and reset and increment again. I need a third column (z) which is a number that represents the groups, so it increments when the y values are reset.
I cannot guarantee that the reset will be to zero, only a value that is less than the previous one, should indicate a reset.
x y z
1 0 0
2 1 0
3 2 0
4 0 1 <<<< Incremented by 1
5 1 1
6 2 1
7 3 1
8 0 2 <<<< Incremented by 1
9 1 2
10 2 2
To produce z, I understand what needs to be done; I'm just not familiar with the syntax. My solution would be to first assign z as a sparse column of 0s and 1s, where everything is zero except that a 1 is given when y[ix] < y[ix-1], indicating that the y counter has been reset. Then a cumulative running sum should be performed on the z column, meaning that: z[ix] = sum(z[0],z[1],...,z[ix])
I'd appreciate some help with the syntax of assigning column z, if someone has a moment.
Based on your logic:
#general case
df['z'] = df['y'].diff().lt(0).cumsum()
# or equivalently
# df['z'] = df['y'].lt(df['y'].shift()).cumsum()
Output:
x y z
0 1 0 0
1 2 1 0
2 3 2 0
3 4 0 1
4 5 1 1
5 6 2 1
6 7 3 1
7 8 0 2
8 9 1 2
9 10 2 2
Using ne(1) (this assumes y always goes up by exactly 1 within a group):
df.y.diff().ne(1).cumsum().sub(1)
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 2
8 2
9 2
Name: y, dtype: int32
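A short sketch of the general case, including a reset that does not go back to zero, which is the situation the question says it cannot rule out. The wrapper name `group_ids` is made up for illustration.

```python
import pandas as pd

def group_ids(y):
    """Increment the group id whenever a value is lower than the one before it."""
    resets = y.diff().lt(0)  # True where y dropped; the first row is NaN, so False
    return resets.cumsum()

y = pd.Series([0, 1, 2, 0, 1, 2, 3, 0, 1, 2])
print(group_ids(y).tolist())   # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]

# Resets to non-zero values and steps larger than 1
y2 = pd.Series([5, 7, 9, 6, 8, 2, 3])
print(group_ids(y2).tolist())  # [0, 0, 0, 1, 1, 2, 2]
```

For `y2` the `diff().ne(1)` variant would start a new group at every step that is not exactly 1, so it is only suitable when y counts up in steps of one.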

pandas: bin data into specific number of bins of specific size

I would like to bin a dataframe by the values in a single column into bins of a specific size and number.
Here is an example df:
df= pd.DataFrame(np.random.randint(0,10000,size=(10000, 4)), columns=list('ABCD'))
Say I want to bin by column D, I will first sort the data:
df.sort_values('D')
I would now like to bin so that, if the bin size is 50 and the number of bins is 100, the first 50 values go into bin 1, the next 50 into bin 2, and so on. Any values left over once all the bins are full should go into the final bin. Is there any way of doing this?
EDIT:
Here is a sample input:
x = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
And here is the expected output:
A B C D bin
0 6 8 6 5 3
1 5 4 9 1 1
2 5 1 7 4 3
3 6 3 3 3 2
4 2 5 9 3 2
5 2 5 1 3 2
6 0 1 1 0 1
7 3 9 5 8 3
8 2 4 0 1 1
9 6 4 5 6 3
As an extra aside, is it also possible to bin any equal values in the same bin? So for example, say I have bin 1 which contains values, 0,1,1 and then bin 2 contains 1,1,2. Is there any way of putting those two 1 values in bin 2 into bin 1? This will create very uneven bin sizes but this is not an issue.
It seems you need to floor-divide np.arange and then assign the result to a new column:
idx = df['D'].sort_values().index
df['b'] = pd.Series(np.arange(len(df)) // 3 + 1, index = idx)
print (df)
A B C D bin b
0 6 8 6 5 3 3
1 5 4 9 1 1 1
2 5 1 7 4 3 3
3 6 3 3 3 2 2
4 2 5 9 3 2 2
5 2 5 1 3 2 2
6 0 1 1 0 1 1
7 3 9 5 8 3 4
8 2 4 0 1 1 1
9 6 4 5 6 3 3
Detail:
print (np.arange(len(df)) // 3 + 1)
[1 1 1 2 2 2 3 3 3 4]
EDIT:
I created another question about the problem with the last values here:
N = 3

# one possible solution, thanks divakar
def replace_irregular_groupings(a, N):
    n = len(a)
    m = N * (n // N)   # number of elements that fill complete groups
    if m != n:
        a[m:] = a[m - 1]   # fold the leftover elements into the last full group
    return a

idx = df['D'].sort_values().index
arr = replace_irregular_groupings(np.arange(len(df)) // N + 1, N)
df['b'] = pd.Series(arr, index = idx)
print (df)
A B C D bin b
0 6 8 6 5 3 3
1 5 4 9 1 1 1
2 5 1 7 4 3 3
3 6 3 3 3 2 2
4 2 5 9 3 2 2
5 2 5 1 3 2 2
6 0 1 1 0 1 1
7 3 9 5 8 3 3
8 2 4 0 1 1 1
9 6 4 5 6 3 3
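When the number of bins is fixed as well as the bin size, as in the original question, one option is to cap the bin label so that every leftover row lands in the final bin. A minimal sketch, assuming a unique index; `assign_bins` and its parameters are hypothetical names:

```python
import numpy as np
import pandas as pd

def assign_bins(df, column, bin_size, n_bins):
    """Label rows 1..n_bins by sorted position in `column`;
    rows beyond the last full bin are put into bin n_bins."""
    # Stable sort so ties keep their original relative order
    order = df[column].sort_values(kind="mergesort").index
    labels = np.minimum(np.arange(len(df)) // bin_size, n_bins - 1) + 1
    # Map the labels from sorted order back to the original row order
    return pd.Series(labels, index=order).reindex(df.index)

df = pd.DataFrame({"D": [5, 1, 4, 3, 3, 3, 0, 8, 1, 6]})
df["bin"] = assign_bins(df, "D", bin_size=3, n_bins=3)
print(df["bin"].tolist())  # [3, 1, 3, 2, 2, 2, 1, 3, 1, 3]
```

This reproduces the expected `bin` column from the question's sample data. It does not address the aside about keeping equal values in the same bin.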

Fill Down until Last Empty Row or Next Filled Cell

I know how to code in order to fill down a column, but I have a few conditions that I can't find out how to implement.
I want to fill down until the last row (that contains any value at all) or the next cell within the column that contains information.
The data looks like this
a 1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
b 1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
c 1 2 3 4 5 6 7 8 9 10
d 1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
e 1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
So as you can see, the code needs to recognize how to stop at b (and not copy over it) when copying down the column. Also the code needs to stop at the last row with values in it when dragging down e.
I've been trying to figure it out to no avail, please help!!!
Previous code:
Yes I do have some code, but it is slow and I would like to figure out something more efficient.
Sub CopyDown()
    Sheets("RAW").Range("A1").Select
    For i = 1 To 100
        ActiveCell.Copy
        ActiveCell.Offset(1, 0).Select
        If ActiveCell.Value = vbNullString Then
            ActiveCell.Paste
        End If
    Next i
End Sub
This one is simple, if your example dataset is used (filling down the existing values to the blanks in Column A.)
Sub MacroFillAreas()
    For Each area In Columns("A:A").SpecialCells(xlCellTypeBlanks)
        If area.Cells.Row <= ActiveSheet.UsedRange.Rows.Count Then
            area.Cells = Range(area.Address).Offset(-1, 0).Value
        End If
    Next area
End Sub
(Code modified)
Without code it's hard to say, but assuming you are doing a loop, all you need to do is check whether the cell is empty:
Sub filldown()
    ' Declare each variable's type explicitly; "Dim X, Y As Long" would make X a Variant
    Dim X As Long, Y As Long
    Dim MaxX As Long, MaxY As Long
    MaxX = ActiveSheet.UsedRange.Rows.Count
    MaxY = ActiveSheet.UsedRange.Columns.Count
    For X = 1 To MaxX
        For Y = 1 To MaxY
            If IsEmpty(ActiveSheet.Cells(X, Y)) = True Then
                ''Do something
            End If
        Next
    Next
End Sub
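Independently of Excel, the fill-down rule itself (carry the last non-empty value forward until the next filled cell or the end of the data) can be sketched in a few lines of Python. This only illustrates the logic, with the made-up name `fill_down`; it is not a replacement for the macros above.

```python
def fill_down(column):
    """Replace each empty cell with the most recent non-empty value above it."""
    filled = []
    last_seen = None  # stays None until the first non-empty cell is found
    for cell in column:
        if cell not in (None, ""):
            last_seen = cell  # a filled cell is kept and becomes the new fill value
        filled.append(last_seen)
    return filled

# Column A from the question: a, b, c, d, e with blanks in between
labels = ["a", "", "", "", "", "b", "", "c", "d", "", "", "e", "", "", "", ""]
print(fill_down(labels))
```

Because the loop runs only over the rows that exist, it naturally stops at the last row of data, and existing values such as b are never overwritten.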