Inputting first and last name to output a value in Pandas Dataframe - pandas

I am trying to create an input function that returns a value for the corresponding first and last name.
For this example I'd like to be able to enter "Emily" and "Bell" and get back "attempts: 2".
Here's my code so far:
import pandas as pd
import numpy as np
data = {
'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'lastname': ['Thompson','Wu', 'Downs','Hunter','Bell','Cisneros', 'Becker', 'Sims', 'Gallegos', 'Horne'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no',
'yes', 'yes', 'no', 'no', 'yes']
}
data
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
df
fname = input()
lname = input()
print(f"{fname} {lname}'s number of attempts: {???}")
I thought there would be specific documentation for this, but I can't find any in the pandas DataFrame documentation. I assume it's pretty simple, but I can't find it.

fname = input()
lname = input()
# use loc with a boolean mask to select the matching row and the attempts column in one step
print(f"{fname} {lname}'s number of attempts:{df.loc[df['name'].eq(fname) & df['lastname'].eq(lname), 'attempts'].squeeze()}")
Emily
Bell
Emily Bell's number of attempts:2
Alternatively, to avoid mismatches due to case, lowercase both the input and the columns before comparing:
fname = input().lower()
lname = input().lower()
print(f"{fname} {lname}'s number of attempts:{df.loc[(df['name'].str.lower() == fname) & (df['lastname'].str.lower() == lname), 'attempts'].squeeze()}")
emily
BELL
emily bell's number of attempts:2

Try this:
df[(df['name'] == fname) & (df['lastname'] == lname)]['attempts'].squeeze()
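Note that squeeze() only returns a scalar when exactly one row matches; with no match it returns an empty Series. A minimal sketch of a helper that makes the no-match case explicit follows. The function name get_attempts and the trimmed three-row frame are assumptions made for illustration, not part of the original code:

```python
import pandas as pd

# Trimmed-down stand-in for the question's DataFrame (assumption: same column names)
df = pd.DataFrame({
    'name': ['Anastasia', 'Dima', 'Emily'],
    'lastname': ['Thompson', 'Wu', 'Bell'],
    'attempts': [1, 3, 2],
})


def get_attempts(frame, fname, lname):
    """Return the attempts of the first matching row, or None if nobody matches.

    Hypothetical helper: the comparison ignores case and surrounding whitespace.
    """
    mask = (
        (frame['name'].str.lower() == fname.strip().lower())
        & (frame['lastname'].str.lower() == lname.strip().lower())
    )
    matches = frame.loc[mask, 'attempts']
    if matches.empty:
        return None
    return matches.iloc[0]


print(get_attempts(df, 'emily', 'BELL'))
print(get_attempts(df, 'Nobody', 'Here'))
```

In the original script, the two arguments would simply be the values read with input().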

Related

Pandas: how to read CSV specific columns which doesn't contain header

usecols = [*range(1, 5), *range(7, 9), *range(11, 13)]
df = pd.read_csv('/content/drive/MyDrive/weather.csv', header=None, usecols=usecols, names=['d', 'm', 'y', 'time', 'temp1', 'outtemp', 'temp2', 'air_pressure', 'humidity'])
I'm trying this, but I always get:
ValueError: Usecols do not match columns, columns expected but not found: [1, 2, 3, 4, 7, 8, 11, 12]
My data set looks like:
The error comes from a mismatch between the number of columns selected by usecols and the number of column names passed in names.
usecols:
[1, 2, 3, 4, 7, 8, 11, 12] - 8 columns
names:
'd', 'm', 'y', 'time', 'temp1', 'outtemp', 'temp2', 'air_pressure', 'humidity' - 9 columns
Change the code so that the last range in usecols ends at 14 rather than 13, which adds column 13 and brings the count to 9:
Code:
usecols = [*range(1, 5), *range(7, 9), *range(11, 14)]
df = pd.read_csv('/content/drive/MyDrive/weather.csv', header=None, usecols=usecols, names=['d', 'm', 'y', 'time', 'temp1', 'outtemp', 'temp2', 'air_pressure', 'humidity'])
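To try the fix without the original file, here is a small self-contained sketch. The in-memory CSV is a made-up stand-in for weather.csv (an assumption: two rows, 14 unnamed columns), chosen so that each value reveals which file column it came from:

```python
import io

import pandas as pd

# Hypothetical stand-in for weather.csv: row r, column c holds the value r * 100 + c
raw = "\n".join(
    ",".join(str(row * 100 + col) for col in range(14)) for row in range(2)
)

usecols = [*range(1, 5), *range(7, 9), *range(11, 14)]  # 9 column positions
names = ['d', 'm', 'y', 'time', 'temp1', 'outtemp',
         'temp2', 'air_pressure', 'humidity']  # 9 names

# Check the lengths up front so a mismatch is reported with a clear message
assert len(usecols) == len(names), "usecols and names must have the same length"

df = pd.read_csv(io.StringIO(raw), header=None, usecols=usecols, names=names)
print(df)
```

With equal lengths, the names are assigned to the selected columns in ascending position order, so 'd' holds file column 1 and 'humidity' holds file column 13.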

Python pandas data frame issue. How can I conditionally insert rows and get the subtotal of the numeric column? Already solved it but any improvement?

Whenever the value of Function changes, I want an empty row inserted below the last row of that group, holding the subtotal of Amount for the group.
How can I improve the code? Is there a better or shorter way?
import numpy as np
import pandas as pd
insertROW_df = pd.read_clipboard()
# True where the next row has the same Function; False marks the last row of each group
insertROW_df['match'] = insertROW_df['Function'].eq(insertROW_df['Function'].shift(-1))
insertROW_df['insert_row_below?'] = np.where(insertROW_df['match'], 'No', 'Yes')
index_changed_row = insertROW_df.index[~insertROW_df['match']]
# One all-NaN row per group boundary, sharing the index label of the row it follows
line = pd.DataFrame({"Function": np.nan, "Budget": np.nan, "Amount":np.nan,"match":np.nan,"insert_row_below?":np.nan }, index=index_changed_row)
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
df = pd.concat([insertROW_df, line])
# A stable sort keeps each blank row directly below the original row with the same label
df = df.sort_index(kind='mergesort').reset_index(drop=True)
df['Total'] = df.groupby(['Function'])['Amount'].transform('sum')
# fillna(method="ffill") is deprecated; ffill() copies the group total into the blank row
df['Total'] = df['Total'].ffill()
df['Amount'] = df['Amount'].fillna(df['Total'])
print(df)
print(df[['Function', 'Budget', 'Amount']])
df.to_excel(r'Z:\Claiming FY17, 18, 19, 20, 21 & 22\November 2021\Pythonic approach\InsertBlankRows.xlsx', index=False, header=True)
DATAFRAME = pd.DataFrame({'Function':['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'CCC', 'CCC'],
'Budget': [550,550,550,680,550,550,550,860,860,],
'Amount': [14850,8640,2150,3210,5540,6660,2210,5555,5595,]
})
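One possibly shorter route, sketched here as a suggestion rather than a definitive answer, is to loop over the groups and append a one-row subtotal frame after each. It assumes that all rows of a given Function value are contiguous, as in the sample data, and it leaves out the helper columns match and insert_row_below?:

```python
import pandas as pd

df = pd.DataFrame({
    'Function': ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'CCC', 'CCC'],
    'Budget': [550, 550, 550, 680, 550, 550, 550, 860, 860],
    'Amount': [14850, 8640, 2150, 3210, 5540, 6660, 2210, 5555, 5595],
})

pieces = []
# sort=False keeps the groups in their order of first appearance
for _, group in df.groupby('Function', sort=False):
    # The subtotal row only defines Amount; concat fills the other columns with NaN
    subtotal = pd.DataFrame({'Amount': [group['Amount'].sum()]})
    pieces.extend([group, subtotal])

result = pd.concat(pieces, ignore_index=True)
print(result)
```

The subtotal rows end up at positions 4, 8 and 11, with NaN in Function and Budget. Because of those NaN values, Budget becomes a float column in the result.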

How to shift entire group of multiple columns

I have a dataframe like below:
import pandas as pd
import numpy as np
np.random.seed(22)
df = pd.DataFrame.from_dict({'a': np.random.rand(200), 'b': np.random.rand(200), 'x': np.tile(np.concatenate([np.repeat('F', 5), np.repeat('G', 5)]), 20)})
df.index = pd.MultiIndex.from_product([[1, 2], list(range(0, 10)), [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]])
df.index.names = ['g_id', 'm_id', 'object_id']
I'd like to shift the values for the entire groups defined by ['g_id', 'm_id'] so that for example:
the value of ['a', 'b'] columns in index (1, 1, 1) of the new data frame would be the value from index (1, 0, 1) of the original dataframe, i.e. [0.208461, 0.980866]
the value of ['a', 'b'] columns in index (2, 4, 3) of the new data frame would be the value from index (2, 3, 3) of the original data frame, i.e. [0.651138, 0.559126].
The operation is similar to the one covered in this topic. However, I need to do this with multiple columns and I had no luck trying to generalise the provided solution.
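A possible sketch of such a group-wise shift, offered under stated assumptions rather than as a definitive solution: rows are ordered by m_id within each g_id, and every (g_id, m_id) group lists its rows in the same order. Each row then receives the values of the row at the same position in the previous m_id group. The names pos, g_id and shifted are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

np.random.seed(22)
df = pd.DataFrame.from_dict({
    'a': np.random.rand(200),
    'b': np.random.rand(200),
    'x': np.tile(np.concatenate([np.repeat('F', 5), np.repeat('G', 5)]), 20),
})
df.index = pd.MultiIndex.from_product(
    [[1, 2], list(range(0, 10)), [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]]
)
df.index.names = ['g_id', 'm_id', 'object_id']

# Position of each row inside its (g_id, m_id) group: 0..9 here
pos = df.groupby(level=['g_id', 'm_id']).cumcount().to_numpy()
g_id = df.index.get_level_values('g_id')

# Rows sharing g_id and in-group position form one sequence over m_id;
# shifting that sequence by one moves whole (g_id, m_id) groups down by one m_id
shifted = df.groupby([g_id, pos])[['a', 'b']].shift(1)
print(shifted.head(12))
```

The first m_id group of every g_id has no predecessor, so its rows become NaN in the result. The result keeps the original index and row order.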

Multi-column label-encoding: Print mappings

The following code can be used to transform strings into categorical labels:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame([['A','B','C','D','E','F','G','I','K','H'],
['A','E','H','F','G','I','K','','',''],
['A','C','I','F','H','G','','','','']],
columns=['A1', 'A2', 'A3','A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10'])
pd.DataFrame(columns=df.columns, data=LabelEncoder().fit_transform(df.values.flatten()).reshape(df.shape))
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
0 1 2 3 4 5 6 7 9 10 8
1 1 5 8 6 7 9 10 0 0 0
2 1 3 9 6 8 7 0 0 0 0
Question:
How can I query the mappings (it appears they are sorted alphabetically)?
I.e. a list like:
A: 1
B: 2
C: 3
...
I: 9
K: 10
Thank you!
Yes, it's possible if you define the LabelEncoder separately and query its classes_ attribute later.
import numpy as np
le = LabelEncoder()
data = le.fit_transform(df.values.flatten())
# skip the first class (the empty string, encoded as 0) and map the rest to 1..n
dict(zip(le.classes_[1:], np.arange(1, len(le.classes_))))
{'A': 1,
'B': 2,
'C': 3,
'D': 4,
'E': 5,
'F': 6,
'G': 7,
'H': 8,
'I': 9,
'K': 10}
The classes_ attribute stores the unique classes in sorted order, and each class is encoded as its position in that array.
le.classes_
array(['', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K'], dtype=object)
So the first element (the empty string) is encoded as 0, 'A' as 1, and so on.
To reverse the encoding, use le.inverse_transform.
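As a small illustration of both directions, here is a sketch on a made-up five-element array (the sample values are an assumption, not the question's data). It builds the full mapping from classes_ and then decodes the codes again:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

values = np.array(['A', 'B', '', 'K', 'A'])

le = LabelEncoder()
codes = le.fit_transform(values)

# Each class is encoded as its position in the sorted classes_ array
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Unlike the dictionary above, this mapping keeps the empty string, which is encoded as 0.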
LabelEncoder also has a transform method, which you can zip with the original values:
le=LabelEncoder()
le.fit(df.values.flatten())
dict(zip(df.values.flatten(),le.transform(df.values.flatten()) ))
{'': 0,
'A': 1,
'B': 2,
'C': 3,
'D': 4,
'E': 5,
'F': 6,
'G': 7,
'H': 8,
'I': 9,
'K': 10}

Convert pandas to dictionary defining the columns used for the key values

I have a pandas dataframe, 'test_df', and my aim is to convert it to a dictionary:
id Name Gender Age
0 1 'Peter' 'M' 32
1 2 'Lara' 'F' 45
Therefore I run this:
test_dict = test_df.set_index('id').T.to_dict()
The output is this:
{1: {'Name': 'Peter', 'Gender': 'M', 'Age': 32}, 2: {'Name': 'Lara', 'Gender': 'F', 'Age': 45}}
Now, I want to use only the 'Name' and 'Gender' columns as the values for the dictionary's keys. I'm trying to modify the above script into something like this:
test_dict = test_df.set_index('id')['Name']['Gender'].T.to_dict()
with no success.
Any suggestions, please?
You were very close. Select the subset of columns with a list, [['Name','Gender']]:
test_dict = test_df.set_index('id')[['Name','Gender']].T.to_dict()
print (test_dict)
{1: {'Name': 'Peter', 'Gender': 'M'}, 2: {'Name': 'Lara', 'Gender': 'F'}}
Also, T is not necessary; use the parameter orient='index' instead:
test_dict = test_df.set_index('id')[['Name','Gender']].to_dict(orient='index')
print (test_dict)
{1: {'Name': 'Peter', 'Gender': 'M'}, 2: {'Name': 'Lara', 'Gender': 'F'}}
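For a version that runs on its own, here is a sketch that rebuilds test_df from the table in the question (the constructor call is an assumption about how the frame was created) and checks that both variants give the same dictionary:

```python
import pandas as pd

# Reconstruction of the question's table
test_df = pd.DataFrame({
    'id': [1, 2],
    'Name': ['Peter', 'Lara'],
    'Gender': ['M', 'F'],
    'Age': [32, 45],
})

# Double brackets select a DataFrame with just these two columns
subset = test_df.set_index('id')[['Name', 'Gender']]

via_transpose = subset.T.to_dict()
via_orient = subset.to_dict(orient='index')

print(via_orient)
print(via_transpose == via_orient)
```

The single-bracket chain ['Name']['Gender'] fails because ['Name'] already returns a Series indexed by id, so the second lookup searches for an id labelled 'Gender'.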