Checking if the tictactoe game has ended and returning the winner - numpy

I'm trying to code a function which checks whether the tic-tac-toe game has ended and, if it has, returns the winner. Can you please help me code this part?
input: s, a 3x3 integer matrix depicting the current state of the game.
If an element of the matrix is 0, that square is empty; if it is 1, the square is taken by the 'X' player; if it is -1, the square is taken by the 'O' player.
output:
e: the result, either None or an integer scalar with value 0, 1 or -1.
if e = None, the game has not ended yet.
if e = 0, the game ended with a draw.
if e = 1, X player won the game.
if e = -1, O player won the game.
I tried some user-defined functions to work this out but couldn't do it properly. Can you please help me code this part?
def check_game(s):
    e = None
    def sum_all_straights(S):
        r3 = np.arange(3)
        return np.concatenate([s.sum(0), s.sum(1), s[[r3, 2-r3], r3].sum(1)])
    def winner(S):
        ss = sum_all_straights(S)
        ssa = np.absolute(ss)
        win, = np.where(ssa == 3)
        if win.size:
            e = ss[win[0]] // 3
        sas = sum_all_straights(np.absolute(S))
        if (sas > ssa).all():
            e = 0
    return e

Here is one approach: We can check for a winner by summing the fields of each straight line. If at least one of the sums is 3 or -3 then player X or O has won.
If there is no winner, we check for a draw. The game is drawn if no player can win any more, i.e. if every straight line contains at least one X and one O. We can detect that by comparing two quantities: take the absolute value of the board first and then sum each line, call that "SAS"; this counts the chips on each line regardless of who they belong to. Then sum each line first and take the absolute value afterwards, call that "SSA". If both players have chips on a line there is some cancellation in the sum, so SSA is smaller than SAS. If that holds for every straight line the game is drawn. (The fourth board in the demo below illustrates this; a numeric check follows the demo.)
Implementation:
import numpy as np

def sum_all_straights(S):
    r3 = np.arange(3)
    # S.sum(0) gives the three vertical (column) sums, S.sum(1) the three
    # horizontal (row) sums, and S[[r3, 2-r3], r3] picks out the two
    # diagonals using advanced indexing.
    return np.concatenate([S.sum(0), S.sum(1), S[[r3, 2-r3], r3].sum(1)])
def winner(S):
    ss = sum_all_straights(S)
    ssa = np.absolute(ss)
    win, = np.where(ssa == 3)
    if win.size:
        return ss[win[0]] // 3
    sas = sum_all_straights(np.absolute(S))
    if (sas > ssa).all():
        return 0
examples = b"""
XO. OOO ... XXO
OXO XX. .X. OXX
XXO OXX .O. XOO
"""
examples = np.array(examples.strip().split()).reshape(3, 4, 1).view("S1").transpose(1, 0, 2)
examples = 0 + (examples == b'X') - (examples == b'O')

for example in examples:
    print(example, winner(example), "\n")
Demo:
[[ 1 -1 0]
[-1 1 -1]
[ 1 1 -1]] None
[[-1 -1 -1]
[ 1 1 0]
[-1 1 1]] -1
[[ 0 0 0]
[ 0 1 0]
[ 0 -1 0]] None
[[ 1 1 -1]
[-1 1 1]
[ 1 -1 -1]] 0
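As a small check of the draw rule, using the functions above on the drawn board from the last demo example:

board = np.array([[ 1,  1, -1],
                  [-1,  1,  1],
                  [ 1, -1, -1]])
print(sum_all_straights(board))               # SS:  [ 1  1 -1  1  1 -1  1  1]
print(np.absolute(sum_all_straights(board)))  # SSA: [1 1 1 1 1 1 1 1]
print(sum_all_straights(np.absolute(board)))  # SAS: [3 3 3 3 3 3 3 3]
# SAS > SSA on every straight line, so no line can still be won and winner(board) returns 0.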

Related

average on dataframe segments

I have a DataFrame that drops back to zero after each cycle of operation (the cycles have random lengths). I want to calculate the average (or perform other operations) for each segment between the zeros: for example, the average of [0.762, 0.766] alone, and of [0.66, 1.37, 2.11, 2.29] alone, and so forth to the end of the DataFrame.
So I worked with this data:
random_value
0 0
1 0
2 1
3 2
4 3
5 0
6 4
7 4
8 0
9 1
There is probably a way better solution, but here is what I came up with:
def avg_function(df):
    avg_list = []
    value_list = list(df["random_value"])
    temp_list = []
    for i in range(len(value_list)):
        if value_list[i] == 0:
            if temp_list:
                avg_list.append(sum(temp_list) / len(temp_list))
                temp_list = []
        else:
            temp_list.append(value_list[i])
    if temp_list:  # for the last values
        avg_list.append(sum(temp_list) / len(temp_list))
    return avg_list
test_list = avg_function(df=df)
test_list
[Out] : [2.0, 4.0, 1.0]
Edit: since it was requested in the comments, here is a way to add the means to the dataframe. I don't know if there is a way to do that with pandas (and there might be!), but I came up with this:
def add_mean(df, mean_list):
    temp_mean_list = []
    list_index = 0  # will be the index into mean_list
    df["random_value_shifted"] = df["random_value"].shift(1).fillna(0)
    random_value = list(df["random_value"])
    random_value_shifted = list(df["random_value_shifted"])
    for i in range(df.shape[0]):
        if random_value[i] == 0 and random_value_shifted[i] == 0:
            temp_mean_list.append(0)
        elif random_value[i] == 0 and random_value_shifted[i] != 0:
            temp_mean_list.append(0)
            list_index += 1
        else:
            temp_mean_list.append(mean_list[list_index])
    df = df.drop(["random_value_shifted"], axis=1)
    df["mean"] = temp_mean_list
    return df
df = add_mean(df=df, mean_list=test_list)
Which gave me :
df
[Out] :
random_value mean
0 0 0
1 0 0
2 1 2
3 2 2
4 3 2
5 0 0
6 4 4
7 4 4
8 0 0
9 1 1
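For reference, a more pandas-idiomatic sketch of the same idea (my own variant, assuming the same random_value column as above): build a segment id from the zero rows with cumsum, then take a per-segment mean with transform:

import pandas as pd

df = pd.DataFrame({"random_value": [0, 0, 1, 2, 3, 0, 4, 4, 0, 1]})

is_zero = df["random_value"].eq(0)
group_id = is_zero.cumsum()               # a new segment starts at every zero row
means = (df.loc[~is_zero, "random_value"]
           .groupby(group_id[~is_zero])
           .transform("mean"))
df["mean"] = means.reindex(df.index, fill_value=0)

This reproduces the mean column shown above (as floats) without any explicit Python loop.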

Build a decision Column by ANDing multiple columns in pandas

I have a pandas data frame which is shown below:
>>> x = [[1,2,3,4,5],[1,2,4,4,3],[2,4,5,6,7]]
>>> columns = ['a','b','c','d','e']
>>> df = pd.DataFrame(data = x, columns = columns)
>>> df
a b c d e
0 1 2 3 4 5
1 1 2 4 4 3
2 2 4 5 6 7
I have an array of objects (conditions) as shown below:
[
    {
        'header': 'a',
        'condition': '==',
        'values': [1]
    },
    {
        'header': 'b',
        'condition': '==',
        'values': [2]
    },
    ...
]
and an assignHeader which is:
assignHeader = 'decision'
now I want to do an operation which builds up all the conditions from the conditions array by looping through it, for example something like this:
pConditions = []
for eachCondition in conditions:
    header = eachCondition['header']
    values = eachCondition['values']
    if eachCondition['condition'] == "==":
        pConditions.append(df[header].isin(values))
    else:
        pConditions.append(~df[header].isin(values))

df[assignHeader] = and(pConditions)
I was thinking of using the all operator in pandas but could not work out the right syntax. The conditions list can grow large and is dynamic, which is why I want this nested approach to build the equality checks. Does anyone know a way to do this?
Final Output:
conditions = [df['a'] == 1, df['b'] == 2]
>>> df['decision'] = (df['a']==1) & (df['b']==2)
>>> df
a b c d e decision
0 1 2 3 4 5 True
1 1 2 4 4 3 True
2 2 4 5 6 7 False
Here the conditions array is variable, and I want a function which takes df, newheadername and conditions as input and returns the output shown below:
>>> df
a b c d e decision
0 1 2 3 4 5 True
1 1 2 4 4 3 True
2 2 4 5 6 7 False
where newheadername = 'decision'
I was able to solve the problem using the code shown below. I am not sure if this is a fast way of getting things done, but I would love to hear your input if there is anything specific to point out.
def andMerging(conditions, mergeHeader, df):
    if len(conditions) != 0:
        df[mergeHeader] = pd.concat(conditions, axis=1).all(axis=1)
    return df
where conditions is a list of pd.Series with boolean values.
The conditions are built as shown below:
def prepareForConditionMerging(conditionsArray, df):
    conditions = []
    for prop in conditionsArray:
        condition = prop['condition']
        values = prop['values']
        header = prop['header']
        if type(values) == str:
            values = [values]
        if condition == "==":
            conditions.append(df[header].isin(values))
        else:
            conditions.append(~df[header].isin(values))
        # More conditions such as greater-than or less-than can be added here.
    return conditions
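If you prefer to avoid the intermediate concat, the same AND can be folded directly over the list of boolean Series; a small sketch using the example conditions from the question:

import numpy as np
from functools import reduce

conditions = [df['a'] == 1, df['b'] == 2]

# Fold the boolean Series together with &
df['decision'] = reduce(lambda lhs, rhs: lhs & rhs, conditions)
# or let NumPy do the reduction in one call
df['decision'] = np.logical_and.reduce(conditions)

Both give the same decision column as pd.concat(conditions, axis=1).all(axis=1).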

Splitting a coordinate string into X and Y columns with a pandas data frame

So I created a pandas data frame showing the coordinates for an event and the number of times those coordinates appear; the coordinates are stored as a string like this:
Coordinates Occurrences x
0 (76.0, -8.0) 1 0
1 (-41.0, -24.0) 1 1
2 (69.0, -1.0) 1 2
3 (37.0, 30.0) 1 3
4 (-60.0, 1.0) 1 4
.. ... ... ..
63 (-45.0, -11.0) 1 63
64 (80.0, -1.0) 1 64
65 (84.0, 24.0) 1 65
66 (76.0, 7.0) 1 66
67 (-81.0, -5.0) 1 67
I want to create a new data frame that shows the x and y coordinates individually along with their occurrences, like this:
x Occurrences y Occurrences
76 ... -8 ...
-41 ... -24 ...
69 ... -1 ...
37 ... 30 ...
-60 ... 1 ...
I have tried to split the string, but I don't think I am doing it correctly, and I don't know how to add the result to the table regardless; I think I'd have to do something like a for loop later in my code. I scraped the data from an API; here is the code that sets up the data frame shown above.
for key in contents['liveData']['plays']['allPlays']:
    if key['result']['event'] == "Shot":
        scoordinates = (key['coordinates']['x'], key['coordinates']['y'])
        if scoordinates not in shots:
            shots[scoordinates] = 1
        else:
            shots[scoordinates] += 1
    if key['result']['event'] == "Goal":
        gcoordinates = (key['coordinates']['x'], key['coordinates']['y'])
        if gcoordinates not in goals:
            goals[gcoordinates] = 1
        else:
            goals[gcoordinates] += 1

# create the data frames using pandas
gdf = pd.DataFrame(list(goals.items()), columns=['Coordinates', 'Occurences'])
print(gdf)
sdf = pd.DataFrame(list(shots.items()), columns=['Coordinates', 'Occurences'])
print()
Try this:
import re

df[['x', 'y']] = df.Coordinates.apply(
    lambda c: pd.Series(dict(zip(['x', 'y'], re.findall(r'-?[0-9]+\.[0-9]+', c.strip())))))
Using the built-in string methods should be performant:
df[["x", "y"]] = df["Coordinates"].str.strip("()").str.split(",", expand=True).astype(float)
(This also converts x and y to float values, which was not requested but is probably desired.)
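One more note: in the question the Coordinates column is built from tuple keys (list(goals.items())), so if it actually holds (x, y) tuples rather than strings, no parsing is needed at all; a sketch using the gdf frame from the question:

# Expand the (x, y) tuples into two numeric columns
gdf[["x", "y"]] = pd.DataFrame(gdf["Coordinates"].tolist(), index=gdf.index)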

How to find word frequency per country list in pandas?

Let's say I have a .CSV which has three columns: tidytext, location, vader_senti
I was already able to get the number of *positive, neutral and negative texts (rather than words)* per country using the following code:
data_vis = pd.read_csv(r"csviamcrpreprocessed.csv", usecols=fields)
def print_sentiment_scores(text):
    vadersenti = analyser.polarity_scores(str(text))
    return pd.Series([vadersenti['pos'], vadersenti['neg'], vadersenti['neu'], vadersenti['compound']])
data_vis[['vadersenti_pos', 'vadersenti_neg', 'vadersenti_neu', 'vadersenti_compound']] = data_vis['tidytext'].apply(print_sentiment_scores)
data_vis['vader_senti'] = 'neutral'
data_vis.loc[data_vis['vadersenti_compound'] > 0.3 , 'vader_senti'] = 'positive'
data_vis.loc[data_vis['vadersenti_compound'] < 0.23 , 'vader_senti'] = 'negative'
data_vis['vader_possentiment'] = 0
data_vis.loc[data_vis['vadersenti_compound'] > 0.3 , 'vader_possentiment'] = 1
data_vis['vader_negsentiment'] = 0
data_vis.loc[data_vis['vadersenti_compound'] <0.23 , 'vader_negsentiment'] = 1
data_vis['vader_neusentiment'] = 0
data_vis.loc[(data_vis['vadersenti_compound'] <=0.3) & (data_vis['vadersenti_compound'] >=0.23) , 'vader_neusentiment'] = 1
sentimentbylocation = data_vis.groupby(["Location"])['vader_senti'].value_counts()
sentimentbylocation
sentimentbylocation gives me the following results:
Location vader_senti
Afghanistan negative 151
positive 25
neutral 2
Albania negative 6
positive 1
Algeria negative 116
positive 13
neutral 4
To get the most common positive words, I used this code:
def process_text(text):
    tokens = []
    for line in text:
        toks = tokenizer.tokenize(line)
        toks = [t.lower() for t in toks if t.lower() not in stopwords_list]
        tokens.extend(toks)
    return tokens

tokenizer = TweetTokenizer()
punct = list(string.punctuation)
stopwords_list = stopwords.words('english') + punct + ['rt', 'via', '...', '…', '’', '—', '—:', '‚', 'â']
pos_lines = list(data_vis[data_vis.vader_senti == 'positive'].tidytext)
pos_tokens = process_text(pos_lines)
pos_freq = nltk.FreqDist(pos_tokens)
pos_freq.most_common()
Running this will give me the most common words and the number of times they appeared, such as
[('good', 1212),
 ('amazing', 123),
 ...]
However, what I want to see is how many of these positive words appeared in a country.
For example:
I have a sample CSV here: https://drive.google.com/file/d/112k-6VLB3UyljFFUbeo7KhulcrMedR-l/view?usp=sharing
Create a column for each most_common word, then group by Location and use agg to sum each count:
words = [i[0] for i in pos_freq.most_common()]

# lower-case all text in tidytext
data_vis.tidytext = data_vis.tidytext.str.lower()

for i in words:
    data_vis[i] = data_vis.tidytext.str.count(i)

funs = {i: 'sum' for i in words}
grouped = data_vis.groupby('Location').agg(funs)
Based on the example from the CSV and using most_common as ['good', 'amazing'] the result would be:
grouped
# good amazing
# Location
# Australia 0 1
# Belgium 6 4
# Japan 2 1
# Thailand 2 0
# United States 1 0
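One caveat: str.count treats its argument as a regular expression and counts substring matches, so "good" would also be counted inside "goodness". A variant restricting the count to whole words (a sketch, using the same words list and data_vis frame as above):

import re

for w in words:
    # \b...\b limits matches to whole words; re.escape guards regex metacharacters
    data_vis[w] = data_vis.tidytext.str.count(r"\b{}\b".format(re.escape(w)))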

Sample without replacement

How to sample without replacement in TensorFlow? Like numpy.random.choice(n, size=k, replace=False) for some very large integer n (e.g. 100k-100M), and smaller k (e.g. 100-10k).
Also, I want it to be efficient and on the GPU, so other solutions like this with tf.py_func are not really an option for me. Anything which would use tf.range(n) or so is also not an option because n could be very large.
This is one way:
n = ...
sample_size = ...
idx = tf.random_shuffle(tf.range(n))[:sample_size]
EDIT:
I had posted the answer above but then read the last line of your post. I don't think there is a good way to do it if you absolutely cannot produce a tensor of size O(n) (numpy.random.choice with replace=False is also implemented as a slice of a permutation). You could resort to a tf.while_loop until you have unique indices:
n = ...
sample_size = ...
idx = tf.random_uniform([sample_size], maxval=n, dtype=tf.int64)
idx = tf.while_loop(
    # keep re-drawing while the sample still contains duplicate indices
    lambda idx: tf.size(idx) > tf.size(tf.unique(idx)[0]),
    lambda idx: tf.random_uniform([sample_size], maxval=n, dtype=tf.int64),
    [idx])
EDIT 2:
About the average number of iterations in the previous method. If we call n the number of possible values and k the length of the desired vector (with k ≤ n), the probability that an iteration is successful is:
p = product((n - (i - 1)) / n for i in 1 .. k)
Since each iteration can be considered a Bernoulli trial, the average number of trials until the first success is 1 / p (proof here). Here is a function that calculates the average number of trials in Python for some k and n values:
def avg_iter(k, n):
    if k > n or n <= 0 or k < 0:
        raise ValueError()
    avg_it = 1.0
    for p in (float(n) / (n - i) for i in range(k)):
        avg_it *= p
    return avg_it
And here are some results:
+-------+------+----------+
| n | k | Avg iter |
+-------+------+----------+
| 10 | 5 | 3.3 |
| 100 | 10 | 1.6 |
| 1000 | 10 | 1.1 |
| 1000 | 100 | 167.8 |
| 10000 | 10 | 1.0 |
| 10000 | 100 | 1.6 |
| 10000 | 1000 | 2.9e+22 |
+-------+------+----------+
You can see it varies wildly depending on the parameters.
It is possible, though, to construct a vector in a fixed number of steps, although the only algorithm I can think of is O(k²). In pure Python it goes like this:
import random

def sample_wo_replacement(n, k):
    sample = [0] * k
    # draw each value from a successively smaller range
    for i in range(k):
        sample[i] = random.randint(0, n - 1 - i)
    # then, from last to first, shift each value past the earlier draws
    # so that the final values are all distinct
    for i, v in reversed(list(enumerate(sample))):
        for p in reversed(sample[:i]):
            if v >= p:
                v += 1
        sample[i] = v
    return sample

random.seed(100)
print(sample_wo_replacement(10, 5))
# [2, 8, 9, 7, 1]
print(sample_wo_replacement(10, 10))
# [6, 5, 8, 4, 0, 9, 1, 2, 7, 3]
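To see why the backward pass produces distinct values, here is a small hand trace (my own illustration, with n = 3 and k = 2):

# raw draws: sample = [1, 1]   (sample[0] drawn from 0..2, sample[1] from 0..1)
# i = 1: v = 1; the earlier raw draw is 1 and 1 >= 1, so v becomes 2 -> sample = [1, 2]
# i = 0: there are no earlier draws, so sample[0] stays 1
# Every combination of raw draws maps to a different ordered pair of distinct values,
# so the result is a uniform sample without replacement.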
This is a possible way to do it in TensorFlow (not sure if the best one):
import tensorflow as tf

def sample_wo_replacement_tf(n, k):
    # First loop
    sample = tf.constant([], dtype=tf.int64)
    i = 0
    sample, _ = tf.while_loop(
        lambda sample, i: i < k,
        # This is ugly but I did not want to define more functions
        lambda sample, i: (tf.concat([sample,
                                      tf.random_uniform([1], maxval=tf.cast(n - tf.shape(sample)[0], tf.int64), dtype=tf.int64)],
                                     axis=0),
                           i + 1),
        [sample, i], shape_invariants=[tf.TensorShape((None,)), tf.TensorShape(())])
    # Second loop
    def inner_loop(sample, i):
        sample_size = tf.shape(sample)[0]
        v = sample[i]
        j = i - 1
        v, _ = tf.while_loop(
            lambda v, j: j >= 0,
            lambda v, j: (tf.cond(v >= sample[j], lambda: v + 1, lambda: v), j - 1),
            [v, j])
        return (tf.where(tf.equal(tf.range(sample_size), i), tf.tile([v], (sample_size,)), sample), i - 1)
    i = tf.shape(sample)[0] - 1
    sample, _ = tf.while_loop(lambda sample, i: i >= 0, inner_loop, [sample, i])
    return sample
And an example:
with tf.Graph().as_default(), tf.Session() as sess:
    tf.set_random_seed(100)
    sample = sample_wo_replacement_tf(10, 5)
    for i in range(10):
        print(sess.run(sample))
# [3 0 6 8 4]
# [5 4 8 9 3]
# [1 4 0 6 8]
# [8 9 5 6 7]
# [7 5 0 2 4]
# [8 4 5 3 7]
# [0 5 7 4 3]
# [2 0 3 8 6]
# [3 4 8 5 1]
# [5 7 0 2 9]
This is quite intensive on tf.while_loops, though, which are well known not to be particularly fast in TensorFlow, so I don't know how fast you can really get with this method without some kind of benchmarking.
EDIT 4:
One last possible method. You can divide the range of possible values (0 to n) in "chunks" of size c and pick a random amount of numbers from each chunk, then shuffle everything. The amount of memory that you use is limited by c, and you don't need nested loops. If n is divisible by c, then you should get about a perfect random distribution, otherwise values in the last "short" chunk would receive some extra probability (this may be negligible depending on the case). Here is a NumPy implementation. It is somewhat long to account for different corner cases and pitfalls, but if c ≥ k and n mod c = 0 several parts get simplified.
import numpy as np

def sample_chunked(n, k, chunk=None):
    chunk = chunk or n
    last_chunk = chunk
    parts = n // chunk
    # Distribute k among chunks
    max_p = min(float(chunk) / k, 1.0)
    max_p_last = max_p
    if n % chunk != 0:
        parts += 1
        last_chunk = n % chunk
        max_p_last = min(float(last_chunk) / k, 1.0)
    p = np.full(parts, 2)
    # Iterate until a valid distribution is found
    while not np.isclose(np.sum(p), 1) or np.any(p > max_p) or p[-1] > max_p_last:
        p = np.random.uniform(size=parts)
        p /= np.sum(p)
    dist = (k * p).astype(np.int64)
    sample_size = np.sum(dist)
    # Account for rounding errors
    while sample_size < k:
        i = np.random.randint(len(dist))
        while (dist[i] >= chunk) or (i == parts - 1 and dist[i] >= last_chunk):
            i = np.random.randint(len(dist))
        dist[i] += 1
        sample_size += 1
    while sample_size > k:
        i = np.random.randint(len(dist))
        while dist[i] == 0:
            i = np.random.randint(len(dist))
        dist[i] -= 1
        sample_size -= 1
    assert sample_size == k
    # Generate sample parts
    sample_parts = []
    for i, v in enumerate(np.nditer(dist)):
        if v <= 0:
            continue
        c = chunk if i < parts - 1 else last_chunk
        base = chunk * i
        sample_parts.append(base + np.random.choice(c, v, replace=False))
    sample = np.concatenate(sample_parts, axis=0)
    np.random.shuffle(sample)
    return sample

np.random.seed(100)
print(sample_chunked(15, 5, 4))
# [ 8  9 12 13  3]
A quick benchmark of sample_chunked(100000000, 100000, 100000) takes about 3.1 seconds on my computer, while I haven't been able to run the previous algorithm (the sample_wo_replacement function above) to completion with the same parameters. It should be possible to implement it in TensorFlow, maybe using tf.TensorArray, although it would require significant effort to get it exactly right.
Use the Gumbel-max trick here: https://github.com/tensorflow/tensorflow/issues/9260
z = -tf.log(-tf.log(tf.random_uniform(tf.shape(logits), 0, 1)))
_, indices = tf.nn.top_k(logits + z, K)
indices are what you want. This trick is so easy!
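For completeness, a minimal runnable sketch of the trick above (assuming the TF 1.x API and uniform weights; n and K are placeholder sizes):

import tensorflow as tf

n, K = 1000, 10
logits = tf.zeros([n])  # uniform log-weights; any per-item log-weights work too
z = -tf.log(-tf.log(tf.random_uniform(tf.shape(logits), 0, 1)))  # Gumbel noise
_, indices = tf.nn.top_k(logits + z, K)  # K distinct indices, sampled without replacement

with tf.Session() as sess:
    print(sess.run(indices))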
The following works fairly fast on the GPU, and I did not encounter memory issues when using n ~ 100M and k ~ 10k (on an NVIDIA GeForce GTX 1080 Ti):
def random_choice_without_replacement(n, k):
    """Equivalent to numpy.random.choice(n, size=k, replace=False)."""
    return tf.math.top_k(tf.random.uniform(shape=[n]), k, sorted=False).indices
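A usage sketch (assuming TF 2.x with eager execution; the sizes are just examples):

idx = random_choice_without_replacement(1_000_000, 10_000)
# all returned indices are distinct, so the number of unique values equals k
print(idx.shape, int(tf.size(tf.unique(idx)[0])))  # (10000,) 10000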