Is there a better way to write this code in Python? (transforming a UCI move into a bitboard, chess)

I want to transform a UCI move into a bitboard.
For example, a2a3 -> 32768, 8388608.
I need to assign [7, 6, ..., 0] to [a, b, ..., h] so that for each letter I have an assigned number n with which to calculate 2^n,
which I can then left-shift by 8 times the value in uci[1] or uci[3], depending on whether it is the start or end field.
This is my approach, but it doesn't look very nice and is redundant.
import numpy as np

def ucitoBit(uci):
    if uci[0] == 'a':
        mask1 = 2 ** 7
    if uci[0] == 'b':
        mask1 = 2 ** 6
    if uci[0] == 'c':
        mask1 = 2 ** 5
    if uci[0] == 'd':
        mask1 = 2 ** 4
    if uci[0] == 'e':
        mask1 = 2 ** 3
    if uci[0] == 'f':
        mask1 = 2 ** 2
    if uci[0] == 'g':
        mask1 = 2 ** 1
    if uci[0] == 'h':
        mask1 = 2 ** 0
    mask1 = mask1 << 8 * (int(uci[1]) - 1)
    if uci[2] == 'a':
        mask2 = 2 ** 7
    if uci[2] == 'b':
        mask2 = 2 ** 6
    if uci[2] == 'c':
        mask2 = 2 ** 5
    if uci[2] == 'd':
        mask2 = 2 ** 4
    if uci[2] == 'e':
        mask2 = 2 ** 3
    if uci[2] == 'f':
        mask2 = 2 ** 2
    if uci[2] == 'g':
        mask2 = 2 ** 1
    if uci[2] == 'h':
        mask2 = 2 ** 0
    mask2 = mask2 << 8 * (int(uci[3]) - 1)
    bitstring = [np.uint64(mask1), np.uint64(mask2)]
    return bitstring

How about defining two lists containing the row and column indices and using them like this:
rows = ["1", "2", "3", "4", "5", "6", "7", "8"]
cols = ["a", "b", "c", "d", "e", "f", "g", "h"]
def parse_move(move):
    from_col, from_row, to_col, to_row = list(move)
    from_sq = 2**((7 - cols.index(from_col)) + 8*rows.index(from_row))
    to_sq = 2**((7 - cols.index(to_col)) + 8*rows.index(to_row))
    return [from_sq, to_sq]
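If you would rather drop the lookup lists entirely, here is a minimal sketch of the same mapping done with character arithmetic (the names to_bitboards and square are my own; numpy is assumed to be imported as np, as in the question):
def to_bitboards(uci):
    def square(file, rank):
        # file 'a'..'h' -> 7..0, rank '1'..'8' -> 0..7, so a2 -> bit 15 -> 32768
        return 1 << (7 - (ord(file) - ord('a')) + 8 * (int(rank) - 1))
    return [np.uint64(square(uci[0], uci[1])), np.uint64(square(uci[2], uci[3]))]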


Combine 2 different sized arrays element-wise based on index pairing array

Say we have two arrays of unique values:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # any values are possible,
b = np.array([0, 11, 12, 13, 14, 15, 16, 17, 18, 19]) # sorted values are for demonstration
, where a[0] corresponds to b[0], a[1] to b[1], a[2] to b[2], etc.
Then, due to some circumstances, we randomly lost some of this data and received noise elements in both a and b. Now the 'useful data' in a and b is 'eroded' like this:
a = np.array([0, 1, 313, 2, 3, 4, 5, 934, 6, 8, 9, 730, 241, 521])
b = np.array([112, 514, 11, 13, 16, 955, 17, 18, 112])
The noise elements have a negligible probability of coinciding with any of the 'useful data'. So, by searching for them, we can find the values that are left and define the 'index pairing array':
cor_tab = np.array([[1,2], [4,3], [8,4], [9,7]])
which, when applied, yields the pairs of 'useful data' that are left:
np.column_stack((a[cor_tab[:,0]], b[cor_tab[:,1]]))
array([[1, 11],
       [3, 13],
       [6, 16],
       [8, 18]])
The question: given the 'eroded' a and b, how can they be combined into a numpy array such that:
values indexed in cor_tab are paired in the same column/row,
lost values are treated as -1,
noise is treated as 'don't care', and
the array looks like this:
[[ -1 112],
[ 0 514],
[ 1 11],
[313 -1],
[ 2 -1],
[ 3 13],
[ 4 -1],
[ 5 -1],
[934 -1],
[ 6 16],
[ -1 955],
[ -1 17],
[ 8 18],
[ 9 -1],
[730 -1],
[241 -1],
[521 112]]
, where the 'useful data' pairs end up at row indices 2, 5, 9, and 12?
Initially I solved this in a dubious way:
import numpy as np

def combine(aa, bb, t):
    c0 = np.empty((0), int)
    c1 = np.empty((0), int)
    # add -1 & 'noise' at the left side:
    if t[0][0] > t[0][1]:
        c0 = np.append(c0, aa[: t[0][0]])
        c1 = np.append(c1, [np.append([-1] * (t[0][0] - t[0][1]), bb[: t[0][1]])])
    else:
        c0 = np.append(c0, [np.append([-1] * (t[0][1] - t[0][0]), aa[: t[0][0]])])
        c1 = np.append(c1, bb[: t[0][1]])
    ind_compenstr = t[0][0] - t[0][1]  # 'index compensator'
    for i, ii in enumerate(t):
        x = ii[0] - ii[1] - ind_compenstr
        # add -1 & 'noise' in the middle:
        if x > 0:
            c0 = np.append(c0, [aa[ii[0]-x:ii[0]]])
            c1 = np.append(c1, [[-1] * x])
        elif x == 0:
            c0 = np.append(c0, [aa[ii[0]-x:ii[0]]])
            c1 = np.append(c1, [bb[ii[1]-x:ii[1]]])
        else:
            x = abs(x)
            c0 = np.append(c0, [[-1] * x])
            c1 = np.append(c1, [bb[ii[1]-x:ii[1]]])
        # add useful elements:
        c0 = np.append(c0, aa[ii[0]])
        c1 = np.append(c1, bb[ii[1]])
        ind_compenstr += x
    # add -1 & 'noise' at the right side:
    l0 = len(aa) - t[-1][0]
    l1 = len(bb) - t[-1][1]
    if l0 > l1:
        c0 = np.append(c0, aa[t[-1][0] + 1:])
        c1 = np.append(c1, [np.append(bb[t[-1][1] + 1:], [-1] * (l0 - l1))])
    else:
        c0 = np.append(c0, [np.append(aa[t[-1][0] + 1:], [-1] * (l1 - l0))])
        c1 = np.append(c1, bb[t[-1][1] + 1:])
    return np.array([c0, c1])
But below I suggest another solution.
It is difficult to understand exactly what the question wants, but IIUC, we first need to find the column size of the expected array, which contains the combined uncommon values of the two arrays (np.union1d), and then create an array of that size filled with -1 (np.full). Then, using np.searchsorted, we get the indices of the values of one array within the other array. Values that are not contained in the other array can be found with np.in1d in invert mode. So we can achieve the goal by indexing as follows:
union_ = np.union1d(a, b)
# [0 1 2 3 4 5 6 7 8 9]
res = np.full((2, union_.size), -1)
# [[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
# [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]]
arange_row_ids = np.arange(union_.size)
# [0 1 2 3 4 5 6 7 8 9]
col_inds = np.searchsorted(a, b)[np.in1d(b, a, invert=True)]
# np.searchsorted(a, b) ---> [1 3 6 7 7]
# np.in1d(b, a, invert=True) ---> [False False False True False]
# [7]
res[0, np.delete(arange_row_ids, col_inds + np.arange(col_inds.size))] = a
# np.delete(arange_row_ids, col_inds + np.arange(col_inds.size)) ---> [0 1 2 3 4 5 6 8 9]
# [[ 0 1 2 3 4 5 6 -1 8 9]
# [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]]
col_inds = np.searchsorted(b, a)[np.in1d(a, b, invert=True)]
# np.searchsorted(b, a) ---> [0 0 1 1 2 2 2 4 5]
# np.in1d(a, b, invert=True) ---> [ True False True False True True False False True]
# [0 1 2 2 5]
res[1, np.delete(arange_row_ids, col_inds + np.arange(col_inds.size))] = b
# np.delete(arange_row_ids, col_inds + np.arange(col_inds.size)) ---> [1 3 6 7 8]
# [[ 0 1 2 3 4 5 6 -1 8 9]
# [-1 1 -1 3 -1 -1 6 7 8 -1]]
The question is not clear enough to be sure this is the expected answer, but I think it is a helpful starting point for further modifications as needed.
Here's a partially vectorized solution:
import numpy as np

# this function is from Divakar's answer at
# https://stackoverflow.com/questions/38619143/convert-python-sequence-to-numpy-array-filling-missing-values
# used here as a helper function:
def boolean_indexing(v):
    lens = np.array([len(item) for item in v])
    mask = lens[:,None] > np.arange(lens.max())[::-1]
    out = np.full(mask.shape, -1, dtype=int)
    out[mask] = np.concatenate(v)
    return out
# 2 arrays with eroded useful data and the index pairing array:
a = np.array([0, 1, 313, 2, 3, 4, 5, 934, 6, 8, 9, 730, 241, 521])
b = np.array([112, 514, 11, 13, 16, 955, 17, 18, 112])
cor_tab = np.array([[1,2], [4,3], [8,4], [9,7]])
# split each array at the corresponding indices in cor_tab:
aa = np.split(a, cor_tab[:,0]+1)
bb = np.split(b, cor_tab[:,1]+1)
# initialize 2 flat empty arrays:
aaa = np.empty((0), int)
bbb = np.empty((0), int)
# loop over the split arrays:
for i, j in zip(aa, bb):
    c = boolean_indexing([i, j])
    aaa = np.append(aaa, c[0])
    bbb = np.append(bbb, c[1])
ccc = np.array([aaa, bbb]).T
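As a quick sanity check (a minimal sketch continuing from the code above; expected is simply the 17x2 array given in the question):
expected = np.array([[ -1, 112], [  0, 514], [  1,  11], [313,  -1], [  2,  -1],
                     [  3,  13], [  4,  -1], [  5,  -1], [934,  -1], [  6,  16],
                     [ -1, 955], [ -1,  17], [  8,  18], [  9,  -1], [730,  -1],
                     [241,  -1], [521, 112]])
print(np.array_equal(ccc, expected))  # True for the sample a, b and cor_tab above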
For other kinds of data, here is another example. Let's take two arrays of letters:
a = np.array(['y', 'w', 'a', 'e', 'i', 'o', 'u', 'y', 'w', 'a', 'e', 'i', 'o', 'u'])
b = np.array(['t', 'h', 'b', 't', 'c', 'n', 's', 'j', 'p', 'z', 'n', 'h', 't', 's', 'm', 'p'])
and the index pairing array:
cor_tab = np.array([[2,0], [3,2], [4,3], [5,5], [6,6], [9,10], [11,12], [13,13]])
np.column_stack((a[cor_tab[:,0]], b[cor_tab[:,1]]))
array([['a', 't'],  # useful data
       ['e', 'b'],
       ['i', 't'],
       ['o', 'n'],
       ['u', 's'],
       ['a', 'n'],
       ['i', 't'],
       ['u', 's']], dtype='<U1')
The only correction required is dtype='<U1' in boolean_indexing(). Result is:
[['y' '-'],
['w' '-'],
['a' 't'],
['-' 'h'],
['e' 'b'],
['i' 't'],
['-' 'c'],
['o' 'n'],
['u' 's'],
['-' 'j'],
['y' 'p'],
['w' 'z'],
['a' 'n'],
['e' 'h'],
['i' 't'],
['o' '-'],
['u' 's'],
['-' 'm'],
['-' 'p']]
It works for floats as well if you change the dtype in boolean_indexing() to float.
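To avoid editing boolean_indexing() for every data type, one option (a sketch; the fill_value and dtype parameters are my own addition) is to parameterize it:
def boolean_indexing(v, fill_value=-1, dtype=int):
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())[::-1]
    out = np.full(mask.shape, fill_value, dtype=dtype)
    out[mask] = np.concatenate(v)
    return out

# e.g. boolean_indexing([i, j], fill_value='-', dtype='<U1') for the letter arrays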

Why am I getting an empty dataframe?

Here is my initial dataframe:
df.head()
Unnamed: 0 Unnamed: 0.1 Unnamed: 0.1.1 Unnamed: 0.1.1.1 Unnamed: 0.1.1.1.1 date time game score home_odds draw_odds away_odds country league
0 0 0 0.0 0.0 0.0 NaN 22:00 Bahia - Vitoria 0:2 1.82 3.36 4.13  Brazil Copa do Nordeste 2020
1 1 1 1.0 1.0 1.0 NaN 20:00 ABC - Ceara 0:0 3.15 3.09 2.15  Brazil Copa do Nordeste 2020
2 2 2 2.0 2.0 2.0 NaN 20:00 Botafogo PB - Nautico 2:1 2.45 3.07 2.81  Brazil Copa do Nordeste 2020
3 3 3 3.0 3.0 3.0 NaN 20:00 Fortaleza - Santa Cruz 3:0 1.43 4.16 6.56  Brazil Copa do Nordeste 2020
4 4 4 4.0 4.0 4.0 07 Feb 2020 00:00 Sport Recife - Imperatriz 2:2 1.36 4.31 7.66  Brazil Copa do Nordeste 2020
I am getting an empty dataframe when I run this code:
import pandas as pd
def harmonize_game(df: pd.DataFrame) -> pd.DataFrame:
    df["game"] = df["game"].astype(str).str.replace(r"(\(\w+\))", "", regex=True)
    df["game"] = df["game"].astype(str).str.replace(r"(\s\d+\S\d+)$", "", regex=True)
    df["league"] = (
        df["league"].astype(str).str.replace(r"(\s\d+\S\d+)$", "", regex=True)
    )
    df[["home_team", "away_team"]] = df["game"].str.split(" - ", expand=True, n=1)
    df[["home_score", "away_score"]] = df["score"].str.split(":", expand=True)
    print("Data Harmonised")
    return df
def numerical_scores(df: pd.DataFrame) -> pd.DataFrame:
    df["away_score"] = (
        df["away_score"].astype(str).str.replace(r"[a-zA-Z\s\D]", "", regex=True)
    )
    df["home_score"] = (
        df["home_score"].astype(str).str.replace(r"[a-zA-Z\s\D]", "", regex=True)
    )
    df = df[df.home_score != "."]
    df = df[df.home_score != ".."]
    df = df[df.home_score != "."]
    df = df[df.home_odds != "-"]
    df = df[df.draw_odds != "-"]
    df = df[df.away_odds != "-"]
    m = (
        df[["home_odds", "draw_odds", "away_odds"]]
        .astype(str)
        .agg(lambda x: x.str.count("/"), 1)
        .ne(0)
        .all(1)
    )
    n = df[["home_score"]].agg(lambda x: x.str.count("-"), 1).ne(0).all(1)
    o = df[["away_score"]].agg(lambda x: x.str.count("-"), 1).ne(0).all(1)
    df = df[~m]
    df = df[~n]
    df = df[~o]
    df = df[df.home_score != ""]
    df = df[df.away_score != ""]
    df = df.dropna()
    print("Numerical data harmonised and cleaned")
    return df
def coerce_columns(df: pd.DataFrame) -> pd.DataFrame:
    df = df.loc[
        :,
        df.columns.intersection(
            [
                "datetime",
                "country",
                "league",
                "home_team",
                "away_team",
                "home_odds",
                "draw_odds",
                "away_odds",
                "home_score",
                "away_score",
            ]
        ),
    ]
    colt = {
        "country": str,
        "league": str,
        "home_team": str,
        "away_team": str,
        "home_odds": float,
        "draw_odds": float,
        "away_odds": float,
        "home_score": int,
        "away_score": int,
    }
    df = df.astype(colt)
    print("Data types recognized")
    return df
def strip_strings(df: pd.DataFrame) -> pd.DataFrame:
    return df.applymap(lambda x: x.strip() if isinstance(x, str) else x)

def clean_odds(df: pd.DataFrame) -> pd.DataFrame:
    df = df[df["home_odds"] <= 100]
    df = df[df["draw_odds"] <= 100]
    df = df[df["away_odds"] <= 100]
    df = df.drop_duplicates(
        [
            "datetime",
            "home_score",
            "away_score",
            "country",
            "league",
            "home_team",
            "away_team",
        ],
        keep="last",
    )
    df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
    print("Dataframe Cleaned")
    return df
def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = harmonize_game(df)
    df = numerical_scores(df)
    df = coerce_columns(df)
    df = strip_strings(df)
    df = clean_odds(df)
    print("All steps applied")
    return df

def test() -> None:
    df = pd.read_csv()
    clean(df)

if __name__ == "__main__":
    test()
    cleaned = pd.DataFrame(test())
    cleaned.to_csv()
How do I save the output of
if __name__ == '__main__':
    test()
to csv?
As suggested in the comments, you could try something like this:
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:  # fix type hint
    # df = harmonize_game(df)
    # df = numerical_scores(df)
    # df = coerce_columns(df)
    # df = strip_strings(df)
    # df = clean_odds(df)
    print("All steps applied")
    return df  # add a return statement

def test() -> pd.DataFrame:
    df = pd.DataFrame(
        {
            "Unnamed:": {0: 0, 1: 1},
            "0": {0: 0, 1: 1},
            "Unnamed:.1": {0: 0.0, 1: 1.0},
            "0.1": {0: 0.0, 1: 1.0},
            "Unnamed:.2": {0: 0.0, 1: 1.0},
            "0.1.1": {0: "nan", 1: "nan"},
            "Unnamed:.3": {0: "22:00", 1: "20:00"},
            "0.1.1.1": {0: "Bahia", 1: "ABC"},
            "Unnamed:.4": {0: "-", 1: "-"},
            "0.1.1.1.1": {0: "Vitoria", 1: "Ceara"},
            "date": {0: "0:2", 1: "0:0"},
            "time": {0: 1.82, 1: 3.15},
            "game": {0: 3.36, 1: 3.09},
            "score": {0: 4.13, 1: 2.15},
            "home_odds": {0: "Brazil", 1: "Brazil"},
            "draw_odds": {0: "Copa", 1: "Copa"},
            "away_odds": {0: "do", 1: "do"},
            "country": {0: "Nordeste", 1: "Nordeste"},
            "league": {0: 2020, 1: 2020},
        }
    )  # pd.read_csv("file.csv") replaced here just to test that everything works
    return clean(df)

if __name__ == "__main__":
    # fix the indentation and remove unnecessary statement
    cleaned = test()
    cleaned.to_csv("new_file.csv")
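One extra point worth noting: the chain of "Unnamed: 0", "Unnamed: 0.1", ... columns in the question's df.head() is the usual sign of repeated to_csv/read_csv round trips that keep writing the index out as a new column. A small sketch of how to avoid that (the file name is just an example):
cleaned.to_csv("new_file.csv", index=False)  # don't write the index as an extra column
df = pd.read_csv("new_file.csv")             # no "Unnamed: 0" column on re-read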

Calculate the difference between all rows and a specific row in the dataframe

This is a similar question to this thread.
Let's consider df as:
df = pd.DataFrame([["a", 2, 3], ["b", 5, 6], ["c", 8, 9],["a", 0, 0], ["a", 8, 7], ["c", 2, 1]], columns = ["A", "B", "C"])
How can you calculate, for column "B", the difference between each row and the row at the lowest index of its group (the lowest index for EACH group in column "A"), and put it in column "D"? I want to calculate the mean square displacement for my data, so in each group I need the difference between the values in a column and the value of the first row that appears in that group.
I tried:
df['D'] = df.groupby(["A"])['B'].sub(df.groupby(['A'])["B"].iloc[0])
Group = df.groupby(["A"])
However, using .sub on the groupby raises the following error:
AttributeError: 'SeriesGroupBy' object has no attribute 'sub'
The desired result would be like this:
A B C D
0 a 2 3 0 *lowest index in group "a"
1 b 5 6 0 *lowest index in group "b"
2 c 8 9 0 *lowest index in group "c"
3 a 0 0 -2
4 a 8 7 6
5 c 2 1 -6
I guess this answer could be enough of a hint for you:
import pandas as pd

df = pd.DataFrame([["a", 2, 3], ["b", 5, 6], ["c", 8, 9], ["a", 0, 0], ["a", 8, 7], ["c", 2, 1]], columns=["A", "B", "C"])
print("df:")
print(df)
print()
groupA = df.groupby(['A'])
print("groupA:")
print(groupA.groups)
print()
print("lowest indices for each group from columnA:")
lowest_indices = dict()
for k, v in groupA.groups.items():
    lowest_indices[k] = v[0]
print(lowest_indices)
print()
columnB = df['B']
print("columnB:")
print(columnB)
print()
df['D'] = df['B']
for i in range(len(df)):
    group_at_i = df['A'].iloc[i]
    lowest_index_of_that = lowest_indices[group_at_i]
    b_element_at_that_index = df['B'].iloc[lowest_index_of_that]
    the_difference = df['B'].iloc[i] - b_element_at_that_index
    df.loc[i, 'D'] = the_difference
print("df:")
print(df)
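For completeness, a shorter and more idiomatic sketch of the same idea (continuing from the df above) uses groupby(...).transform('first') to broadcast each group's first "B" value, i.e. the value at the lowest index in that group, back onto every row:
df['D'] = df['B'] - df.groupby('A')['B'].transform('first')
print(df)
#    A  B  C  D
# 0  a  2  3  0
# 1  b  5  6  0
# 2  c  8  9  0
# 3  a  0  0 -2
# 4  a  8  7  6
# 5  c  2  1 -6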

Group Pandas dataframe Age column by Age groups [duplicate]

I have a data frame column with numeric values:
df['percentage'].head()
46.5
44.2
100.0
42.12
I want to see the column as bin counts:
bins = [0, 1, 5, 10, 25, 50, 100]
How can I get the result as bins with their value counts?
[0, 1] bin amount
[1, 5] etc
[5, 10] etc
...
You can use pandas.cut:
bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = pd.cut(df['percentage'], bins)
print (df)
percentage binned
0 46.50 (25, 50]
1 44.20 (25, 50]
2 100.00 (50, 100]
3 42.12 (25, 50]
bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1,2,3,4,5,6]
df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
print (df)
percentage binned
0 46.50 5
1 44.20 5
2 100.00 6
3 42.12 5
Or numpy.searchsorted:
bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = np.searchsorted(bins, df['percentage'].values)
print (df)
percentage binned
0 46.50 5
1 44.20 5
2 100.00 6
3 42.12 5
...and then value_counts or groupby and aggregate size:
s = pd.cut(df['percentage'], bins=bins).value_counts()
print (s)
(25, 50] 3
(50, 100] 1
(10, 25] 0
(5, 10] 0
(1, 5] 0
(0, 1] 0
Name: percentage, dtype: int64
s = df.groupby(pd.cut(df['percentage'], bins=bins)).size()
print (s)
percentage
(0, 1] 0
(1, 5] 0
(5, 10] 0
(10, 25] 0
(25, 50] 3
(50, 100] 1
dtype: int64
By default, cut returns a Categorical.
Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data; operations on a Categorical behave the same way.
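If you only want to see the bins that actually occur, one option (a minimal sketch) is to drop the unused categories before counting, or to pass observed=True when grouping:
s = pd.cut(df['percentage'], bins=bins).cat.remove_unused_categories().value_counts()
s = df.groupby(pd.cut(df['percentage'], bins=bins), observed=True).size()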
Using the Numba module for a speed-up.
On big datasets (more than 500k), pd.cut can be quite slow for binning data.
I wrote my own function in Numba with just-in-time compilation, which is roughly six times faster:
from numba import njit

@njit
def cut(arr):
    bins = np.empty(arr.shape[0])
    for idx, x in enumerate(arr):
        if (x >= 0) & (x < 1):
            bins[idx] = 1
        elif (x >= 1) & (x < 5):
            bins[idx] = 2
        elif (x >= 5) & (x < 10):
            bins[idx] = 3
        elif (x >= 10) & (x < 25):
            bins[idx] = 4
        elif (x >= 25) & (x < 50):
            bins[idx] = 5
        elif (x >= 50) & (x < 100):
            bins[idx] = 6
        else:
            bins[idx] = 7
    return bins
cut(df['percentage'].to_numpy())
# array([5., 5., 7., 5.])
Optional: you can also map it to bins as strings:
a = cut(df['percentage'].to_numpy())
conversion_dict = {1: 'bin1',
                   2: 'bin2',
                   3: 'bin3',
                   4: 'bin4',
                   5: 'bin5',
                   6: 'bin6',
                   7: 'bin7'}
bins = list(map(conversion_dict.get, a))
# ['bin5', 'bin5', 'bin7', 'bin5']
Speed comparison:
# Create a dataframe of 8 million rows for testing
dfbig = pd.concat([df]*2000000, ignore_index=True)
dfbig.shape
# (8000000, 1)
%%timeit
cut(dfbig['percentage'].to_numpy())
# 38 ms ± 616 µs per loop (mean ± standard deviation of 7 runs, 10 loops each)
%%timeit
bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1,2,3,4,5,6]
pd.cut(dfbig['percentage'], bins=bins, labels=labels)
# 215 ms ± 9.76 ms per loop (mean ± standard deviation of 7 runs, 10 loops each)
We could also use np.select:
bins = [0, 1, 5, 10, 25, 50, 100]
df['groups'] = np.select([df['percentage'].between(i, j, inclusive='right')
                          for i, j in zip(bins, bins[1:])],
                         [1, 2, 3, 4, 5, 6])
Output:
percentage groups
0 46.50 5
1 44.20 5
2 100.00 6
3 42.12 5
Convenient and fast version using Numpy
np.digitize is a convenient and fast option:
import pandas as pd
import numpy as np
df = pd.DataFrame({'x': [1,2,3,4,5]})
df['y'] = np.digitize(df['x'], bins=[3,5])
print(df)
returns
x y
0 1 0
1 2 0
2 3 1
3 4 1
4 5 2

How to merge columns using mask

I am trying to merge two columns (phone1 and phone2).
Here is my fake data:
import pandas as pd
employee = {'EmployeeID': [0, 1, 2, 3, 4, 5, 6, 7],
            'LastName': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
            'Name': ['w', 'x', 'y', 'z', None, None, None, None],
            'phone1': [1, 1, 2, 2, 4, 5, 6, 6],
            'phone2': [None, None, 3, 3, None, None, 7, 7],
            'level_15': [0, 1, 0, 1, 0, 0, 0, 1]}
df2 = pd.DataFrame(employee)
and I want the 'phone' column to be
'phone' : [1, 2, 3, 4, 5, 7, 9, 10]
At the beginning of my code, I split the names based on '/', and the code below creates a column with 0s and 1s, which I used as a mask for other tasks throughout my code.
df2 = (df2.set_index(cols)['name'].str.split('/',expand=True).stack().reset_index(name='Name'))
m = df2['level_15'].eq(0)
print (m)
#remove column level_15
df2 = df2.drop(['level_15'], axis=1)
# add last name for selecting the first letter by condition, replace NaNs by forward fill
df2['last_name'] = df2['name'].str[:2].where(m).ffill()
df2['name'] = df2['name'].mask(m, df2['name'].str[2:])
I feel like there is a way to merge phone1 and phone2 using the 0s and 1s, but I can't figure it out. Thank you.
First, start by filling in NaNs:
df2['phone2'] = df2.phone2.fillna(df2.phone1)
# Alternatively, based on your latest update
# df2['phone2'] = df2.phone2.mask(df2.phone2.eq(0)).fillna(df2.phone1)
You can just use np.where to merge columns on odd/even indices:
import numpy as np

df2['phone'] = np.where(np.arange(len(df2)) % 2 == 0, df2.phone1, df2.phone2)
df2 = df2.drop(['phone1', 'phone2'], axis=1)
df2
EmployeeID LastName Name phone
0 0 a w 1
1 1 b x 2
2 2 c y 3
3 3 d z 4
4 4 e None 5
5 5 f None 6
6 6 g None 7
7 7 h None 8
Or, with Series.where/mask:
df2['phone'] = df2.pop('phone1').where(
    np.arange(len(df2)) % 2 == 0, df2.pop('phone2')
)
Or,
df2['phone'] = df2.pop('phone1').mask(
    np.arange(len(df2)) % 2 != 0, df2.pop('phone2')
)
df2
EmployeeID LastName Name phone
0 0 a w 1
1 1 b x 2
2 2 c y 3
3 3 d z 4
4 4 e None 5
5 5 f None 6
6 6 g None 7
7 7 h None 8
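Since the fake data already carries a level_15 column of 0s and 1s, here is a minimal sketch of using that mask directly instead of the row parity (whether this gives the exact numbering you want depends on what level_15 encodes in your real data):
m = df2['level_15'].eq(0)
# take phone1 where the mask is 0, otherwise phone2 (falling back to phone1 where phone2 is NaN)
df2['phone'] = df2['phone1'].where(m, df2['phone2'].fillna(df2['phone1']))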