How to get a cell value of a pandas DataFrame [duplicate] - pandas

Let's say we have a pandas dataframe:
   name  age  sal
0  Alex   20  100
1  Jane   15  200
2  John   25  300
3   Lsd   23  392
4  Mari   21  380
Now suppose a few rows have been deleted and we don't know which index labels were removed. For example, we drop row index 1 using df.drop([1]), and the data frame comes down to this:
   name  age  sal
0  Alex   20  100
2  John   25  300
3   Lsd   23  392
4  Mari   21  380
I would like to get the value from row index 3 and column "age". It should return 23. How do I do that?
df.iloc[3, df.columns.get_loc('age')] does not work because it returns 21. I guess iloc uses the positional row index rather than the label?

Use .loc to get rows by label and .iloc to get rows by position:
>>> df.loc[3, 'age']
23
>>> df.iloc[2, df.columns.get_loc('age')]
23
More about Indexing and selecting data

import pandas as pd

dataset = {'name': ['Alex', 'Jane', 'John', 'Lsd', 'Mari'],
           'age': [20, 15, 25, 23, 21],
           'sal': [100, 200, 300, 392, 380]}
df = pd.DataFrame(dataset)
df.drop([1], inplace=True)
df.loc[3, 'age']  # returns 23; df.loc[3, ['age']] would return a one-element Series

Try this one, using .loc with [label, column name]:
value = df.loc[1, "column_name"]
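As a side note, for a single scalar lookup pandas also has .at, a faster label-based accessor; a minimal sketch against the frame from the question:
value = df.at[3, "age"]   # 23; .at is label-based like .loc, but only for single values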

Related

Exploding nested lists using Pandas Series keeps failing

I haven't used pandas explode before. I got the gist of pd.DataFrame.explode, but for value lists where selected columns hold nested lists I heard that pd.Series.explode is useful. However, I keep getting: KeyError: "None of ['city'] are in the columns". Yet 'city' is defined in the keys:
keys = ["city", "temp"]
values = [["chicago","london","berlin"], [[32,30,28],[39,40,25],[33,34,35]]]
df = pd.DataFrame({"keys":keys,"values":values})
df2 = df.set_index(['city']).apply(pd.Series.explode).reset_index()
desired output is:
city / temp
chicago / 32
chicago / 30
chicago / 28
etc.
I would appreciate an expert weighing in as to why this throws an error, and a fix, thank you.
The problem comes from how you define df:
df = pd.DataFrame({"keys":keys,"values":values})
This actually gives you the following dataframe:
keys values
0 city [chicago, london, berlin]
1 temp [[32, 30, 28], [39, 40, 25], [33, 34, 35]]
You probably meant:
df = pd.DataFrame(dict(zip(keys, values)))
Which gives you:
city temp
0 chicago [32, 30, 28]
1 london [39, 40, 25]
2 berlin [33, 34, 35]
You can then use explode:
print(df.explode('temp'))
Output:
city temp
0 chicago 32
0 chicago 30
0 chicago 28
1 london 39
1 london 40
1 london 25
2 berlin 33
2 berlin 34
2 berlin 35
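If you specifically want the pd.Series.explode route from the question, the corrected DataFrame works with that pattern too; a small sketch, equivalent to the explode call above:
# set 'city' as the index, explode the list column, then restore the index
out = df.set_index('city')['temp'].explode().reset_index()
print(out)
#       city temp
# 0  chicago   32
# 1  chicago   30
# 2  chicago   28
# ... (9 rows in total)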

Pandas Decile Rank

I just used the pandas qcut function to create a decile ranking, but how do I look at the bounds of each ranking? Basically, how do I know which numbers fall in the range of rank 1 or 2 or 3, etc.?
I hope the following Python code with two short examples can help you. For the second example I used the isin method.
import pandas as pd

data = {'Name': ['Mike', 'Anton', 'Simon', 'Amy',
                 'Claudia', 'Peter', 'David', 'Tom'],
        'Score': [42, 63, 75, 97, 61, 30, 80, 13]}
df = pd.DataFrame(data, columns=['Name', 'Score'])
df['decile_rank'] = pd.qcut(df['Score'], 10, labels=False)
print(df)
Output:
Name Score decile_rank
0 Mike 42 2
1 Anton 63 5
2 Simon 75 7
3 Amy 97 9
4 Claudia 61 4
5 Peter 30 1
6 David 80 8
7 Tom 13 0
rank_1 = df[df['decile_rank']==1]
print(rank_1)
Output:
Name Score decile_rank
5 Peter 30 1
rank_1_and_2 = df[df['decile_rank'].isin([1,2])]
print(rank_1_and_2)
Output:
Name Score decile_rank
0 Mike 42 2
5 Peter 30 1
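To get at the actual bounds of each decile, which is what the question asks about, pd.qcut can also return the bin edges when you pass retbins=True; a short sketch with the same data:
# retbins=True also returns the numeric edges of the ten bins
df['decile_rank'], bins = pd.qcut(df['Score'], 10, labels=False, retbins=True)
print(bins)  # 11 edges; decile k covers roughly (bins[k], bins[k + 1]]
Alternatively, calling pd.qcut(df['Score'], 10) without labels=False keeps the interval itself for each row, which also shows the bounds directly.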

Create column with values only for some multiindex in pandas

I have a dataframe like this:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(50, size=(4, 4)),
                  index=[['a', 'a', 'b', 'b'], [800, 900, 800, 900]],
                  columns=['X', 'Y', 'r_value', 'z_value'])
df.index.names = ["dat", "recor"]
X Y r_value z_value
dat recor
a 800 14 28 12 18
900 47 34 59 49
b 800 33 18 24 33
900 18 25 44 19
...
I want to apply a function to create a new column based on r_value that has values only where recor == 900, so in the end I would like something like:
X Y r_value z_value BB
dat recor
a 800 14 28 12 18 NaN
900 47 34 59 49 0
b 800 33 18 24 33 NaN
900 18 25 44 19 2
...
I have computed the values like this:
x = df.loc[pd.IndexSlice[:,900], "r_value"]
conditions = [x >=70, np.logical_and(x >= 40, x < 70), \
np.logical_and(x >= 10, x < 40), x <10]
choices = [0, 1, 2, 3]
BB = np.select(conditions, choices)
So now I need to append BB as a column, filling the rows corresponding to recor==800 with NaN. How can I do it? I have tried a couple of ideas (not shown here) without result. Thanks.
Try
df.loc[df.index.get_level_values('recor')==900, 'BB'] = BB
The part df.index.get_level_values('recor')==900 creates a boolean array with True where the index level "recor" equals 900.
Indexing with a column that does not already exist, i.e. "BB", creates that new column.
The rest of the column is automatically filled with NaN.
I can't test it since you didn't include a minimal reproducible example.
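Putting the question's setup and the assignment above together into a runnable sketch (the seed is only added so the random numbers are reproducible):
import numpy as np
import pandas as pd

np.random.seed(0)  # only so the example is reproducible
df = pd.DataFrame(np.random.randint(50, size=(4, 4)),
                  index=[['a', 'a', 'b', 'b'], [800, 900, 800, 900]],
                  columns=['X', 'Y', 'r_value', 'z_value'])
df.index.names = ['dat', 'recor']

# BB is computed only for the recor == 900 rows, as in the question
x = df.loc[pd.IndexSlice[:, 900], 'r_value']
conditions = [x >= 70, np.logical_and(x >= 40, x < 70),
              np.logical_and(x >= 10, x < 40), x < 10]
BB = np.select(conditions, [0, 1, 2, 3])

# boolean mask on the 'recor' level; the recor == 800 rows are left as NaN
df.loc[df.index.get_level_values('recor') == 900, 'BB'] = BB
print(df)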

How do I make a Dataframe of columns and unique values stacked?

I have a large data frame that I would like to develop a summation table from. In other words, column one would be the columns of the first data frame, column two would be each unique value of each column, and columns three onward would be a summation of different variables I choose. Like the below:
Variable Level Summed_Column
Here is some sample code:
data = {"name": ['bob', 'john', 'mary', 'timmy']
, "age": [32, 32, 29, 28]
, "location": ['philly', 'philly', 'philly', 'ny']
, "amt": [100, 2000, 300, 40]}
df = pd.DataFrame(data)
df.head()
So the output in the above example would be as follows:
Variable  Level   Summed_Column
name      bob     100
name      john    2000
name      mary    300
name      timmy   40
age       32      2100
age       29      300
age       28      40
location  philly  2400
location  ny      40
I'm not even sure where to start. The actual dataframe has 32 columns, of which 4 will be summed and 28 put into the Variable and Level format.
You don't need a loop and concatenation for this; you can do it in one go by combining melt with groupby and the agg method:
final = (df.melt(value_vars=['name', 'age', 'location'], id_vars='amt')
           .groupby(['variable', 'value']).agg({'amt': 'sum'})
           .reset_index())
Which yields:
print(final)
variable value amt
0 age 28 40
1 age 29 300
2 age 32 2100
3 location ny 40
4 location philly 2400
5 name bob 100
6 name john 2000
7 name mary 300
8 name timmy 40
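If you also want the exact column names from the desired output (Variable, Level, Summed_Column), a rename at the end is enough:
final = final.rename(columns={'variable': 'Variable',
                              'value': 'Level',
                              'amt': 'Summed_Column'})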
OK @Datanovice, I figured out how to do this using a for loop with pd.melt.
id = ['name', 'age', 'location']
final = pd.DataFrame(columns=['variable', 'value', 'amt'])
for i in id:
    table = df.groupby(i).agg({'amt': 'sum'}).reset_index()
    table2 = pd.melt(table, value_vars=i, id_vars=['amt'])
    final = pd.concat([final, table2])
print(final)

How to multiply iteratively down a column?

I am having a tough time with this one - not sure why...maybe it's the late hour.
I have a dataframe in pandas as follows:
1 10
2 11
3 20
4 5
5 10
I would like to calculate, for each row, the running product of that row and every row above it. For example, at row 3, I would like to calculate 10*11*20, or 2,200.
How do I do this?
Use cumprod.
Example:
import pandas as pd

df = pd.DataFrame({'A': [10, 11, 20, 5, 10]}, index=range(1, 6))
df['cprod'] = df['A'].cumprod()
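For reference, printing df afterwards gives:
    A   cprod
1  10      10
2  11     110
3  20    2200
4   5   11000
5  10  110000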
Note, since your example is just a single column, a cumulative product can be done succinctly with a Series:
import pandas as pd
s = pd.Series([10, 11, 20, 5, 10])
s
# Output
0 10
1 11
2 20
3 5
4 10
dtype: int64
s.cumprod()
# Output
0 10
1 110
2 2200
3 11000
4 110000
dtype: int64
Kudos to @bananafish for locating the built-in cumprod method.