Fix the length of some columns using Pandas - pandas

I am trying to add some columns to a pandas dataFrame, but I cannot set the character length of the columns.
I want to add the new fields as a string with a value of null and a length of two characters as the length of the field.
Any idea is welcome.
import pandas as pd
df[["Assess", "Operator","x", "y","z", "g"]]=None

If need fix length of columns in new DataFrame use:
from itertools import product
import string
#length of one character
letters = string.ascii_letters
#print(len(letters)) #52
#if need length of two characters
#print(len(letters)) #2704
#letters = [''.join(x) for x in product(letters,letters)]
df = pd.DataFrame({'col1':[4,5], 'col':[8,2]})
#threshold
N = 5
#get new columns names by difference with original columns length
#min is used if possible negative number after subraction, then is set 0
cols = list(letters[:max(0, N- len(df.columns))])
#added new columns filled by None
#filter by threshold (if possible more columns in original like `N`)
df = df.assign(**dict.fromkeys(cols, None)).iloc[:, :N]
print (df)
col1 col a b c
0 4 8 None None None
1 5 2 None None None
Test if more columns like N threshold:
df = pd.DataFrame({'col1':[4,5], 'col2':[8,2],'col3':[4,5],
'col4':[8,2], 'col5':[7,3],'col6':[9,0], 'col7':[5,1]})
print (df)
col1 col2 col3 col4 col5 col6 col7
0 4 8 4 8 7 9 5
1 5 2 5 2 3 0 1
N = 5
cols = list(letters[:max(0, N - len(df.columns))])
df = df.assign(**dict.fromkeys(cols, None)).iloc[:, :N]
print (df)
col1 col2 col3 col4 col5
0 4 8 4 8 7
1 5 2 5 2 3

Related

pandas creating new columns for each value in categorical columns

I have a pandas dataframe with some numeric and some categoric columns. I want to create a new column for each value of every categorical column and give that column a value of 1 in every row where that value is true and 0 in every row where that value is false. So the df is something like this -
col1 col2 col3
A P 1
B P 3
A Q 7
expected result is something like this:
col1 col2 col3 A B P Q
A P 1 1 0 1 0
B P 3 0 1 1 0
A Q 7 1 0 0 1
Is this possible? can someone please help me?
Use df.select_dtypes, pd.get_dummies with pd.concat:
# First select all columns which have object dtypes
In [826]: categorical_cols = df.select_dtypes('object').columns
# Create one-hot encoding for the above cols and concat with df
In [817]: out = pd.concat([df, pd.get_dummies(df[categorical_cols])], 1)
In [818]: out
Out[818]:
col1 col2 col3 col1_A col1_B col2_P col2_Q
0 A P 1 1 0 1 0
1 B P 3 0 1 1 0
2 A Q 7 1 0 0 1

Pandas new column that is sum of last N columns

Using Python 3.7 & Pandas, how can I create a new column that is the sum of the last N columns?
There are several questions with this title (example here), but they all seem to be referring to rolling thru last N rows which is not what I am after
col1 = [0,1,1,0,0,0,1,1,1]
col2 = [1,5,9,2,4,2,5,6,1]
col3 = [25,14,2,15,18,98,65,4,77]
col4 = [1,1,1,1,1,1,1,1,1]
df = pd.DataFrame(list(zip(col1, col2, col3, col4)), columns =['col1', 'col2', 'col3', 'col4'])
Desired Result
Let us try
c = df.columns
df['last_2'] = df.loc[:,c[-2:]].sum(1)
#df['last_3'] = df.loc[:,c[-3:]].sum(1)
0 26
1 15
2 3
3 16
4 19
5 99
6 66
7 5
8 78
dtype: int64

Pandas pivot table or groupby absolute maximum of column

I have a dataframe df as:
Col1 Col2
A -5
A 3
B -2
B 15
I need to get the following:
Col1 Col2
A -5
B 15
Where the decision was made for each group in Col1 by selecting the absolute maximum from Col2. I am not sure how to proceed with this.
Use DataFrameGroupBy.idxmax with pass absolute values for indices and then select by DataFrame.loc:
df = df.loc[df['Col2'].abs().groupby(df['Col1']).idxmax()]
#alternative with reassign column
df = df.loc[df.assign(Col2 = df['Col2'].abs()).groupby('Col1')['Col2'].idxmax()]
print (df)
Col1 Col2
0 A -5
3 B 15

Python Pandas: LabelEncoding fitting unknown variables

Hi I have a dataframe full of strings and I want to encode these strings and store their corresponding codes.
I want to produce these codes on one column and fit onto another column.
When I fit these codes on some other column that has a string that I haven't seen on my training column I want to create another unique value for that.
I have tried LabelEncoding function but it gives error on the previously unseen strings.
For example a have dataframe:
col1 col2
a a
b b
c e
d f
After training LabelEncoding on first column I get something like this:
col1 col2
1 a
2 b
3 e
4 f
After fitting on the created codes onthe second column I want to have something like this:
col1 col2
1 1
2 2
3 5
4 6
What is the easiest way to do this. Thank you.
Created df dataframe by copying sample from OP's post as follows.
df=pd.read_clipboard()
Its value will be as follows when we print it:
col1 col2
0 a a
1 b b
2 c e
3 d f
Could you please try following. I have given here only 1st 6 alphabets you could mention all in case you have them in your actual Input_file.
dict1 = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6}
df.applymap(lambda s: dict1.get(s) if s in dict1 else s)
Output will be as follows.
col1 col2
0 1 1
1 2 2
2 3 5
3 4 6
You could encoding yourself using pd.factorize:
v, k = pd.factorize(sorted(df.stack().unique()))
m = dict(zip(k.tolist(), (v+1).tolist()))
df.replace(m)
Output:
col1 col2
0 1 1
1 2 2
2 3 5
3 4 6
I think the real trick is to stack col1 and col2 then encoding the values of both list as one.
le = LabelEncoder()
le.fit(df.stack())

Map column names if data is same in two dataframes

I have two pandas dataframes
df1 = A B C
1 2 3
2 3 4
3 4 5
df2 = X Y Z
1 2 3
2 3 4
3 4 5
I need to map based on data If data is same then map column namesenter code here
Output = col1 col2
A X
B Y
C Z
I cannot find any built-in function to support this, hence simply loop over all columns:
pairs = []
for col1 in df1.columns:
for col2 in df2.columns:
if df1[col1].equals(df2[col2]):
pairs.append((col1, col2))
output = pandas.DataFrame(pairs, columns=['col1', 'col2'])