How to select multiple ranges of rows in a pandas DataFrame [duplicate] - pandas

Given a pandas DataFrame with an index from 0 to 30, I would like to select the rows within several index ranges: [0:5], [10:15] and [20:25].
How can I do that?

Say you have a random pandas DataFrame with 30 rows and 4 columns as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,30,size=(30, 4)), columns=list('ABCD'))
You can then use np.r_ to index into ranges of rows [0:5], [10:15] and [20:25] as follows:
df.loc[np.r_[0:5, 10:15, 20:25], :]
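As a minimal sketch of the same selection, assuming the default integer index (labels 0 through 29): np.r_ concatenates the slices into one flat integer array, and each slice is half-open, so every range contributes 5 rows. The names `rows`, `subset` and `subset_by_position` are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 30, size=(30, 4)), columns=list("ABCD"))

# np.r_ turns the slices into one integer array: 0..4, 10..14, 20..24
rows = np.r_[0:5, 10:15, 20:25]

# .loc matches by label, which works here because the index is 0..29
subset = df.loc[rows, :]

# .iloc matches by position, so it also works when the index is not 0..n-1
subset_by_position = df.iloc[rows]

print(subset.shape)  # (15, 4)
```

If the index has been reset, filtered, or replaced with non-integer labels, the positional .iloc form is probably the safer choice.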

Related

How Should I skip Rows in Pandas DataFrame, that are above the Index Columns name? [duplicate]

I want all the column names to sit on a single line. How should I do it?
I tried many things, including renaming the columns, but couldn't get it to work.
You need to flatten your multiindex column header.
df.columns = df.columns.map('_'.join)
Or using f-string with list comprehension:
df.columns = [f'{i}_{j}' if j else f'{i}' for i, j in df.columns]
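A small sketch of the list-comprehension variant on a made-up two-level header (the column names here are hypothetical, not taken from the question). It also shows why the `if j` check can matter: with an empty second level, '_'.join would produce a trailing underscore such as 'id_'.

```python
import pandas as pd

# Hypothetical two-level header; the last column has an empty second level
columns = pd.MultiIndex.from_tuples(
    [("price", "min"), ("price", "max"), ("id", "")]
)
df = pd.DataFrame([[1, 2, 10], [3, 4, 11]], columns=columns)

# Join both levels, skipping the separator when the second level is empty
df.columns = [f"{i}_{j}" if j else f"{i}" for i, j in df.columns]

print(list(df.columns))  # ['price_min', 'price_max', 'id']
```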

How can I replace value in cells in dataframe? [duplicate]

Some cells hold the value '...', and I want to replace them with NaN. My DataFrame is called energy.
import pandas as pd
import numpy as np
file_name_energy = "data/Energy Indicators.xls"
energy = pd.read_excel(file_name_energy)
energy.replace('...', np.nan)
I tried to use replace(), but it doesn't work and doesn't raise any error.
energy.head(10)
You need to reassign your dataframe or use inplace=True:
energy = energy.replace('...', np.nan)  # or energy.replace('...', np.nan, inplace=True)
But since you're reading an Excel file, why not use the na_values parameter of pandas.read_excel?
Try this:
energy = pd.read_excel(file_name_energy, na_values=["..."])
From the documentation:
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
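To illustrate why the original call seemed to do nothing, here is a sketch on a small in-memory frame standing in for the Excel sheet (the column names and values are made up). replace() returns a new DataFrame by default, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from the Excel file
energy = pd.DataFrame({"Country": ["A", "B"], "Supply": ["...", 321]})

# The result is discarded here, so energy itself is left unchanged
energy.replace("...", np.nan)
still_placeholder = energy["Supply"].iloc[0]

# Assigning the result back keeps the replacement
energy = energy.replace("...", np.nan)
missing_count = int(energy["Supply"].isna().sum())

print(still_placeholder)
print(missing_count)
```

When the file is read with na_values=["..."], the placeholder should already arrive as NaN and no replace() step is needed.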

Seaborn boxplot for classification with pandas wide to long [duplicate]

I have data on which I would like to train an ML classifier. The data is in wide format. I'd like to draw a boxplot with seaborn: sns.boxplot(x='variable', y='value', hue='target', data=df_train). How do I reshape the data so that I can pass it to sns.boxplot?
Sample data
import pandas as pd
from sklearn import datasets
X, y = datasets.make_classification(n_samples=100, n_features=5, random_state=1)
df_train = pd.DataFrame(X)
df_train['y']=y
pd.melt (or the equivalent DataFrame.melt method) is what you want to use.
import seaborn as sns
dfg_train = df_train.melt(id_vars='y')
sns.boxplot(x='variable', y='value', hue='y', data=dfg_train)
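The reshaping step on its own can be sketched without scikit-learn or seaborn. The random data below is only a stand-in for the make_classification output, assuming the same layout of 5 feature columns plus a target column y.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Stand-in for the wide training data: 5 feature columns plus the target y
df_train = pd.DataFrame(rng.normal(size=(100, 5)))
df_train["y"] = rng.integers(0, 2, size=100)

# Wide to long: one row per (sample, feature) pair, with y kept as an identifier
dfg_train = df_train.melt(id_vars="y")

print(dfg_train.columns.tolist())  # ['y', 'variable', 'value']
print(dfg_train.shape)  # (500, 3)
```

The long frame has the 'variable' and 'value' columns that the sns.boxplot call expects, with 'y' available for hue.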

Count the number of strings in a 2-D pandas series [duplicate]

I am trying to count the characters in a pandas column whose cells hold lists of different lengths.
df = pd.DataFrame({ 'A' : [['a','b'],['a','c','f'],['a'], ['b','f']]})
I want to count how many times each character occurs.
Any ideas?
You can use explode() and value_counts().
import pandas as pd
df = pd.DataFrame({ 'A' : [['a','b'],['a','c','f'],['a'], ['b','f']]})
df = df.explode("A")
print(df.value_counts())
Expected output:
A
a 3
b 2
f 2
c 1
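A variant of the same idea that works on the column directly, shown as a sketch. Series.explode gives one row per list element, and Series.value_counts then counts the elements; the name `counts` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [["a", "b"], ["a", "c", "f"], ["a"], ["b", "f"]]})

# One row per list element, then count how often each element occurs
counts = df["A"].explode().value_counts()

print(counts["a"])  # 3
print(counts.to_dict())
```

Working on the Series leaves df itself untouched, which may be preferable if the original list column is still needed later.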

How to convert ndarray to pandas DataFrame [duplicate]

I have ndarray data with the shape (231, 31). Now I want to convert this ndarray to a pandas DataFrame with 31 columns. I am using this code:
for i in range(1, 32):
    dataset = pd.DataFrame({'Column{}'.format(i): data[:, i-1]})
But this code only creates the last column: the result has 231 rows and just 1 column, whereas I need 31 columns. Is there any way to fix this problem, and why does it happen?
Each loop iteration creates a new DataFrame and overwrites the previous one, which is why only the last column remains.
Create the DataFrame directly from the array instead, with pd.DataFrame(data).
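A short sketch of that fix which also keeps the Column1 to Column31 names from the question. The array contents are placeholder values, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Placeholder array with the shape from the question
data = np.arange(231 * 31).reshape(231, 31)

# Build all 31 columns in one call, naming them Column1 ... Column31
column_names = [f"Column{i}" for i in range(1, 32)]
dataset = pd.DataFrame(data, columns=column_names)

print(dataset.shape)  # (231, 31)
```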