Create 20 unique bingo cards - sql

I'm trying to create 20 unique cards with numbers, but I struggle a bit.. So basically I need to create 20 unique matrices 3x3 having numbers 1-10 in first column, numbers 11-20 in the second column and 21-30 in the third column.. Any ideas? I'd prefer to have it done in r, especially as I don't know Visual Basic. In excel I know how to generate the cards, but not sure how to ensure they are unique..
It seems to be quite precise and straightforward to me. Anyway, i needed to create 20 matrices that would look like :
[,1] [,2] [,3]
[1,] 5 17 23
[2,] 8 18 22
[3,] 3 16 24
Each of the matrices should be unique and each of the columns should consist of three unique numbers ( the 1st column - numbers 1-10, the 2nd column 11-20, the 3rd column - 21-30).
Generating random numbers is easy, though how to make sure that generated cards are unique?Please have a look at the post that i voted for as an answer - as it gives you thorough explanation how to achieve it.

(N.B. : I misread "rows" instead of "columns", so the following code and explanation will deal with matrices with random numbers 1-10 on 1st row, 11-20 on 2nd row etc., instead of columns, but it's exactly the same just transposed)
This code should guarantee uniqueness and good randomness :
library(gtools)
# helper function
getKthPermWithRep <- function(k,n,r){
k <- k - 1
if(n^r< k){
stop('k is greater than possibile permutations')
}
v <- rep.int(0,r)
index <- length(v)
while ( k != 0 )
{
remainder<- k %% n
k <- k %/% n
v[index] <- remainder
index <- index - 1
}
return(v+1)
}
# get all possible permutations of 10 elements taken 3 at a time
# (singlerowperms = 720)
allperms <- permutations(10,3)
singlerowperms <- nrow(allperms)
# get 20 random and unique bingo cards
cards <- lapply(sample.int(singlerowperms^3,20),FUN=function(k){
perm2use <- getKthPermWithRep(k,singlerowperms,3)
m <- allperms[perm2use,]
m[2,] <- m[2,] + 10
m[3,] <- m[3,] + 20
return(m)
# if you want transpose the result just do:
# return(t(m))
})
Explanation
(disclaimer tl;dr)
To guarantee both randomness and uniqueness, one safe approach is generating all the possibile bingo cards and then choose randomly among them without replacements.
To generate all the possible cards, we should :
generate all the possibilities for each row of 3 elements
get the cartesian product of them
Step (1) can be easily obtained using function permutations of package gtools (see the object allPerms in the code). Note that we just need the permutations for the first row (i.e. 3 elements taken from 1-10) since the permutations of the other rows can be easily obtained from the first by adding 10 and 20 respectively.
Step (2) is also easy to get in R, but let's first consider how many possibilities will be generated. Step (1) returned 720 cases for each row, so, in the end we will have 720*720*720 = 720^3 = 373248000 possible bingo cards!
Generate all of them is not practical since the occupied memory would be huge, thus we need to find a way to get 20 random elements in this big range of possibilities without actually keeping them in memory.
The solution comes from the function getKthPermWithRep, which, given an index k, it returns the k-th permutation with repetition of r elements taken from 1:n (note that in this case permutation with repetition corresponds to the cartesian product).
e.g.
# all permutations with repetition of 2 elements in 1:3 are
permutations(n = 3, r = 2,repeats.allowed = TRUE)
# [,1] [,2]
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 2 1
# [5,] 2 2
# [6,] 2 3
# [7,] 3 1
# [8,] 3 2
# [9,] 3 3
# using the getKthPermWithRep you can get directly the k-th permutation you want :
getKthPermWithRep(k=4,n=3,r=2)
# [1] 2 1
getKthPermWithRep(k=8,n=3,r=2)
# [1] 3 2
Hence now we just choose 20 random indexes in the range 1:720^3 (using sample.int function), then for each of them we get the corresponding permutation of 3 numbers taken from 1:720 using function getKthPermWithRep.
Finally these triplets of numbers, can be converted to actual card rows by using them as indexes to subset allPerms and get our final matrix (after, of course, adding +10 and +20 to the 2nd and 3rd row).
Bonus
Explanation of getKthPermWithRep
If you look at the example above (permutations with repetition of 2 elements in 1:3), and subtract 1 to all number of the results you get this :
> permutations(n = 3, r = 2,repeats.allowed = T) - 1
[,1] [,2]
[1,] 0 0
[2,] 0 1
[3,] 0 2
[4,] 1 0
[5,] 1 1
[6,] 1 2
[7,] 2 0
[8,] 2 1
[9,] 2 2
If you consider each number of each row as a number digit, you can notice that those rows (00, 01, 02...) are all the numbers from 0 to 8, represented in base 3 (yes, 3 as n). So, when you ask the k-th permutation with repetition of r elements in 1:n, you are also asking to translate k-1 into base n and return the digits increased by 1.
Therefore, given the algorithm to change any number from base 10 to base n :
changeBase <- function(num,base){
v <- NULL
while ( num != 0 )
{
remainder = num %% base # assume K > 1
num = num %/% base # integer division
v <- c(remainder,v)
}
if(is.null(v)){
return(0)
}
return(v)
}
you can easily obtain getKthPermWithRep function.

One 3x3 matrix with the desired value range can be generated with the following code:
mat <- matrix(c(sample(1:10,3), sample(11:20,3), sample(21:30, 3)), nrow=3)
Furthermore, you can use a for loop to generate a list of 20 unique matrices as follows:
for (i in 1:20) {
mat[[i]] <- list(matrix(c(sample(1:10,3), sample(11:20,3), sample(21:30,3)), nrow=3))
print(mat[[i]])
}

Well OK I may fall on my face here but I propose a checksum (using Excel).
This is a unique signature for each bingo card which will remain invariate if the order of numbers within any column is changed without changing the actual numbers. The formula is
=SUM(10^MOD(A2:A4,10)+2*10^MOD(B2:B4,10)+4*10^MOD(C2:C4,10))
where the bingo numbers for the first card are in A2:C4.
The idea is to generate a 10-digit number for each column, then multiply each by a constant and add them to get the signature.
So here I have generated two random bingo cards using a standard formula from here plus two which are deliberately made to be just permutations of each other.
Then I check if any of the signatures are duplicates using the formula
=MAX(COUNTIF(D5:D20,D5:D20))
which shouldn't given an answer more than 1.
In the unlikely event that there were duplicates, then you would just press F9 and generate some new cards.
All formulae are array formulae and must be entered with CtrlShiftEnter

Here is an inelegant way to do this. Generate all possible combinations and then sample without replacement. These are permutations, combinations: order does matter in bingo
library(dplyr)
library(tidyr)
library(magrittr)
generate_samples = function(n) {
first = data_frame(first = (n-9):n)
first %>%
merge(first %>% rename(second = first)) %>%
merge(first %>% rename(third = first)) %>%
sample_n(20)
}
suffix = function(df, suffix)
df %>%
setNames(names(.) %>%
paste0(suffix))
generate_samples(10) %>% suffix(10) %>%
bind_cols(generate_samples(20) %>% suffix(20)) %>%
bind_cols(generate_samples(30) %>% suffix(30)) %>%
rowwise %>%
do(matrix = t(.) %>% matrix(3)) %>%
use_series(matrix)

Related

Sorting/Optimization problem with rows in a pandas dataframe [duplicate]

So if I was given a sorted list/array i.e. [1,6,8,15,40], the size of the array, and the requested number..
How would you find the minimum number of values required from that list to sum to the requested number?
For example given the array [1,6,8,15,40], I requested the number 23, it would take 2 values from the list (8 and 15) to equal 23. The function would then return 2 (# of values). Furthermore, there are an unlimited number of 1s in the array (so you the function will always return a value)
Any help is appreciated
The NP-complete subset-sum problem trivially reduces to your problem: given a set S of integers and a target value s, we construct set S' having values (n+1) xk for each xk in S and set the target equal to (n+1) s. If there's a subset of the original set S summing to s, then there will be a subset of size at most n in the new set summing to (n+1) s, and such a set cannot involve extra 1s. If there is no such subset, then the subset produced as an answer must contain at least n+1 elements since it needs enough 1s to get to a multiple of n+1.
So, the problem will not admit any polynomial-time solution without a revolution in computing. With that disclaimer out of the way, you can consider some pseudopolynomial-time solutions to the problem which work well in practice if the maximum size of the set is small.
Here's a Python algorithm that will do this:
import functools
S = [1, 6, 8, 15, 40] # must contain only positive integers
#functools.lru_cache(maxsize=None) # memoizing decorator
def min_subset(k, s):
# returns the minimum size of a subset of S[:k] summing to s, including any extra 1s needed to get there
best = s # use all ones
for i, j in enumerate(S[:k]):
if j <= s:
sz = min_subset(i, s-j)+1
if sz < best: best = sz
return best
print min_subset(len(S), 23) # prints 2
This is tractable even for fairly large lists (I tested a random list of n=50 elements), provided their values are bounded. With S = [random.randint(1, 500) for _ in xrange(50)], min_subset(len(S), 8489) takes less than 10 seconds to run.
There may be a simpler solution, but if your lists are sufficiently short, you can just try every set of values, i.e.:
1 --> Not 23
6 --> Not 23
...
1 + 6 = 7 --> Not 23
1 + 8 = 9 --> Not 23
...
1 + 40 = 41 --> Not 23
6 + 8 = 14 --> Not 23
...
8 + 15 = 23 --> Oh look, it's 23, and we added 2 values
If you know your list is sorted, you can skip some tests, since if 6 + 20 > 23, then there's no need to test 6 + 40.

Merge certain rows in a DataFrame based on startswith

I have a DataFrame, in which I want to merge certain rows to a single one. It has the following structure (values repeat)
Index Value
1 date:xxxx
2 user:xxxx
3 time:xxxx
4 description:xxx1
5 xxx2
6 xxx3
7 billed:xxxx
...
Now the problem is, that the columns 5 & 6 still belong to the description and were separated just wrong (whole string separated by ","). I want to merge the "description" row (4) with the values afterwards (5,6). In my DF, there can be 1-5 additional entries which have to be merged with the description row, but the structure allows me to work with startswith, because no matter how many rows have to be merged, the end point is always the row which starts with "billed". Due to me being very new to python, I haven´t got any code written for this problem yet.
My thought is the following (if it is even possible):
Look for a row which starts with "description" → Merge all the rows afterwards till reaching the row which starts with "billed", then stop (obviosly we keep the "billed" row) → Do the same to each row starting with "description"
New DF should look like:
Index Value
1 date:xxxx
2 user:xxxx
3 time:xxxx
4 description:xxx1, xxx2, xxx3
5 billed:xxxx
...
df = pd.DataFrame.from_dict({'Value': ('date:xxxx', 'user:xxxx', 'time:xxxx', 'description:xxx', 'xxx2', 'xxx3', 'billed:xxxx')})
records = []
description = description_val = None
for rec in df.to_dict('records'): # type: dict
# if previous description and record startswith previous description value
if description and rec['Value'].startswith(description_val):
description['Value'] += ', ' + rec['Value'] # add record Value into previous description
continue
# record with new description...
if rec['Value'].startswith('description:'):
description = rec
_, description_val = rec['Value'].split(':')
elif rec['Value'].startswith('billed:'):
# billed record - remove description value
description = description_val = None
records.append(rec)
print(pd.DataFrame(records))
# Value
# 0 date:xxxx
# 1 user:xxxx
# 2 time:xxxx
# 3 description:xxx, xxx2, xxx3
# 4 billed:xxxx

The King's March

You’re given a chess board with dimension n x n. There’s a king at the bottom right square of the board marked with s. The king needs to reach the top left square marked with e. The rest of the squares are labeled either with an integer p (marking a point) or with x marking an obstacle. Note that the king can move up, left and up-left (diagonal) only. Find the maximum points the king can collect and the number of such paths the king can take in order to do so.
Input Format
The first line of input consists of an integer t. This is the number of test cases. Each test case contains a number n which denotes the size of board. This is followed by n lines each containing n space separated tokens.
Output Format
For each case, print in a separate line the maximum points that can be collected and the number of paths available in order to ensure maximum, both values separated by a space. If e is unreachable from s, print 0 0.
Sample Input
3
3
e 2 3
2 x 2
1 2 s
3
e 1 2
1 x 1
2 1 s
3
e 1 1
x x x
1 1 s
Sample Output
7 1
4 2
0 0
Constraints
1 <= t <= 100
2 <= n <= 200
1 <= p <= 9
I think this problem could be solved using dynamic-programing. We could use dp[i,j] to calculate the best number of points you can obtain by going from the right bottom corner to the i,j position. We can calculate dp[i,j], for a valid i,j, based on dp[i+1,j], dp[i,j+1] and dp[i+1,j+1] if this are valid positions(not out of the matrix or marked as x) and adding them the points obtained in the i,j cell. You should start computing from the bottom right corner to the left top, row by row and beginning from the last column.
For the number of ways you can add a new matrix ways and use it to store the number of ways.
This is an example code to show the idea:
dp[i,j] = dp[i+1,j+1] + board[i,j]
ways[i,j] = ways[i+1,j+1]
if dp[i,j] < dp[i+1,j] + board[i,j]:
dp[i,j] = dp[i+1,j] + board[i,j]
ways[i,j] = ways[i+1,j]
elif dp[i,j] == dp[i+1,j] + board[i,j]:
ways[i,j] += ways[i+1,j]
# check for i,j+1
This assuming all positions are valid.
The final result is stored in dp[0,0] and ways[0,0].
Brief Overview:
This problem can be solved through recursive method call, starting from nn till it reaches 00 which is the king's destination.
For the detailed explanation and the solution for this problem,check it out here -> https://www.callstacker.com/detail/algorithm-1

Get coherent subsets from pandas series

I'm rather new to pandas and recently run into a problem. I have a pandas DataFrame that I need to process. I need to extract parts of the DataFrame where specific conditions are met. However, i want these parts to be coherent blocks, not one big set.
Example:
Consider the following pandas DataFrame
col1 col2
0 3 11
1 7 15
2 9 1
3 11 2
4 13 2
5 16 16
6 19 17
7 23 13
8 27 4
9 32 3
I want to extract the subframes where the values of col2 >= 10, resulting maybe in a list of DataFrames in the form of (in this case):
col1 col2
0 3 11
1 7 15
col1 col2
5 16 16
6 19 17
7 23 13
Ultimately, I need to do further analysis on the values in col1 within the resulting parts. However, the start and end of each of these blocks is important to me, so simply creating a subset using pandas.DataFrame.loc isn't going to work for me, i think.
What I have tried:
Right now I have a workaround that gets the subset using pandas.DataFrame.loc and then extracts the start and end index of each coherent block afterwards, by iterating through the subset and check, whether there is a jump in the indices. However, it feels rather clumsy and I feel that I'm missing a basic pandas function here, that would make my code more efficient and clean.
This is code representing my current workaround as adapted to the above example
# here the blocks will be collected for further computations
blocks = []
# get all the items where col2 >10 using 'loc[]'
subset = df.loc[df['col2']>10]
block_start = 0
block_end = None
#loop through all items in subset
for i in range(1, len(subset)):
# if the difference between the current index and the last is greater than 1 ...
if subset.index[i]-subset.index[i-1] > 1:
# ... this is the current blocks end
next_block_start = i
# extract the according block and add it to the list of all blocks
block = subset[block_start:next_block_start]
blocks.append(block)
#the next_block_start index is now the new block's starting index
block_start = next_block_start
#close and add last block
blocks.append(subset[block_start:])
Edit: I was by mistake previously referring to 'pandas.DataFrame.where' instead of 'pandas.DataFrame.loc'. I seem to be a bit confused by my recent research.
You can split you problem into parts. At first you check the condition:
df['mask'] = (df['col2']>10)
We use this to see where a new subset starts:
df['new'] = df['mask'].gt(df['mask'].shift(fill_value=False))
Now you can combine these informations into a group number. The cumsum will generate a step function which we set to zero (via the mask column) if this is not a group we are interested in.
df['grp'] = (df.new + 0).cumsum() * df['mask']
EDIT
You don't have to do the group calculation in your df:
s = (df['col2']>10)
s = (s.gt(s.shift(fill_value=False)) + 0).cumsum() * s
After that you can split this into a dict of separate DataFrames
grp = {}
for i in np.unique(s)[1:]:
grp[i] = df.loc[s == i, ['col1', 'col2']]

Comparing vectors

I am new to R and am trying to find a better solution for accomplishing this fairly simple task efficiently.
I have a data.frame M with 100,000 lines (and many columns, out of which 2 columns are relevant to this problem, I'll call it M1, M2). I have another data.frame where column V1 with about 10,000 elements is essential to this task. My task is this:
For each of the element in V1, find where does it occur in M2 and pull out the corresponding M1. I am able to do this using for-loop and it is terribly slow! I am used to Matlab and Perl and this is taking for EVER in R! Surely there's a better way. I would appreciate any valuable suggestions in accomplishing this task...
for (x in c(1:length(V$V1)) {
start[x] = M$M1[M$M2 == V$V1[x]]
}
There is only 1 element that will match, and so I can use the logical statement to directly get the element in start vector. How can I vectorize this?
Thank you!
Here is another solution using the same example by #aix.
M[match(V$V1, M$M2),]
To benchmark performance, we can use the R package rbenchmark.
library(rbenchmark)
f_ramnath = function() M[match(V$V1, M$M2),]
f_aix = function() merge(V, M, by.x='V1', by.y='M2', sort=F)
f_chase = function() M[M$M2 %in% V$V1,] # modified to return full data frame
benchmark(f_ramnath(), f_aix(), f_chase(), replications = 10000)
test replications elapsed relative
2 f_aix() 10000 12.907 7.068456
3 f_chase() 10000 2.010 1.100767
1 f_ramnath() 10000 1.826 1.000000
Another option is to use the %in% operator:
> set.seed(1)
> M <- data.frame(M1 = sample(1:20, 15, FALSE), M2 = sample(1:20, 15, FALSE))
> V <- data.frame(V1 = sample(1:20, 10, FALSE))
> M$M1[M$M2 %in% V$V1]
[1] 6 8 11 9 19 1 3 5
Sounds like you're looking for merge:
> M <- data.frame(M1=c(1,2,3,4,10,3,15), M2=c(15,6,7,8,-1,12,5))
> V <- data.frame(V1=c(-1,12,5,7))
> merge(V, M, by.x='V1', by.y='M2', sort=F)
V1 M1
1 -1 10
2 12 3
3 5 15
4 7 3
If V$V1 might contain values not present in M$M2, you may want to specify all.x=T. This will fill in the missing values with NAs instead of omitting them from the result.