Convert column names to string in pandas dataframe

Can somebody explain how this works?
df.columns = list(map(str, df.columns))

Your code is not the best way to convert column names to string; use this instead:
df.columns = df.columns.astype(str)
Your code:
df.columns = list(map(str, df.columns))
is equivalent to:
df.columns = [str(col) for col in df.columns]
map: applies the function str to each item of the iterable df.columns; but map returns an iterator, so you need to call list explicitly to materialize the result.
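For illustration, a minimal sketch (with a made-up DataFrame whose column labels are integers) showing what astype(str) does to the columns:

```python
import pandas as pd

# Hypothetical DataFrame with integer column labels
df = pd.DataFrame([[1, 2], [3, 4]], columns=[0, 1])

# Convert the column Index to strings in one vectorized call
df.columns = df.columns.astype(str)
print(list(df.columns))  # ['0', '1']
```

The list(map(str, ...)) version produces the same labels, but astype(str) keeps everything inside pandas as an Index operation.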

Related

How to resolve: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame" in Pandas

I am getting this error:
first argument must be an iterable of pandas objects, you passed an object of type "DataFrame".
My code:
for f in glob.glob("C:/Users/panksain/Desktop/aovaNALYSIS/CX AOV/Report*.csv"):
    data = pd.concat(pd.read_csv(f, header=None, names=("Metric Period", "")), axis=0, ignore_index=True)
concat takes a list of DataFrames to concatenate. You can first build the list and then do the concat at the end:
dfs = []
for f in glob.glob("C:/Users/panksain/Desktop/aov aNALYSIS/CX AOV/Report*.csv"):
    dfs.append(pd.read_csv(f, header=None, names=("Metric Period", "")))
data = pd.concat(dfs, axis=0, ignore_index=True)
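As a side note, pd.concat accepts any iterable of DataFrames, so the loop can be collapsed into a generator expression. A small self-contained sketch, using StringIO objects as stand-ins for the CSV files on disk (the file contents here are made up):

```python
from io import StringIO
import pandas as pd

# Stand-ins for the Report*.csv files (hypothetical contents)
files = [StringIO("a,1\nb,2\n"), StringIO("c,3\n")]

# pd.concat accepts any iterable of DataFrames, so a generator works too
data = pd.concat(
    (pd.read_csv(f, header=None) for f in files),
    axis=0,
    ignore_index=True,
)
print(data)
```

With real files, you would pass the glob results in place of the StringIO list.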

Pandas get row if column is a substring of string

I can do the following if I want to extract rows whose column "A" contains the substring "hello".
df[df['A'].str.contains("hello")]
How can I select rows whose column is the substring for another word? e.g.
df["hello".contains(df['A'].str)]
Here's an example dataframe
df = pd.DataFrame.from_dict({"A":["hel"]})
df["hello".contains(df['A'].str)]
IIUC, you could apply str.find:
import pandas as pd
df = pd.DataFrame(['hell', 'world', 'hello'], columns=['A'])
res = df[df['A'].apply("hello".find).ne(-1)]
print(res)
Output
A
0 hell
2 hello
As an alternative, use __contains__:
res = df[df['A'].apply("hello".__contains__)]
print(res)
Output
A
0 hell
2 hello
Or simply:
res = df[df['A'].apply(lambda x: x in "hello")]
print(res)
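Equivalently, since the check is plain Python, the boolean mask can be built with a list comprehension instead of apply. A small sketch using the same example frame:

```python
import pandas as pd

df = pd.DataFrame(['hell', 'world', 'hello'], columns=['A'])

# True where the cell value is a substring of "hello"
mask = [x in "hello" for x in df['A']]
res = df[mask]
print(res)
```

This gives the same two matching rows ('hell' and 'hello') as the apply-based versions above.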

Pandas -change id to string using map and lambda

with this dictionary:
teams_dict = {'Tottenham':262, 'Liverpool': 263, 'Leeds': 264}
And having a column 'team_id' in df1 and a column 'team_name' in another df2, how do I change id to name in df1, using map() and lambda in pandas?
I've tried:
df1['team_name'] = df2['team_name'].map(lambda x: teams_dict[x])
But it does not work.
Try using apply instead of map:
df1['team_name'] = df2['team_name'].apply(lambda x: teams_dict[x])
It's worth mentioning that assigning a Series from df2 into df1 aligns on the index, so this only works as expected if df1 and df2 have matching indexes (e.g. same size and same order).
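If the actual goal is to turn the ids in df1 into names, note that teams_dict maps name → id, so you would first invert it and then map df1's own id column directly. A sketch with made-up data for df1:

```python
import pandas as pd

teams_dict = {'Tottenham': 262, 'Liverpool': 263, 'Leeds': 264}
df1 = pd.DataFrame({'team_id': [263, 262, 264]})  # hypothetical data

# Invert name -> id into id -> name, then map the ids
id_to_name = {v: k for k, v in teams_dict.items()}
df1['team_name'] = df1['team_id'].map(id_to_name)
print(df1)
```

Because this maps df1's own column, it is independent of df2's index and ordering.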

Quantile across rows and down columns using selected columns only [duplicate]

I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I'm searching for 'spike' in column names like 'spike-2', 'hey spike', 'spiked-in' (the 'spike' part is always continuous).
I want the column name to be returned as a string or a variable, so I access the column later with df['name'] or df[name] as normal. I've tried to find ways to do this, to no avail. Any tips?
Just iterate over DataFrame.columns; here is an example that leaves you with a list of the column names that match:
import pandas as pd
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)
spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)
Output:
['hey spke', 'no', 'spike-2', 'spiked-in']
['spike-2', 'spiked-in']
Explanation:
df.columns returns an Index of the column names
[col for col in df.columns if 'spike' in col] iterates over df.columns with the variable col and adds col to the resulting list if it contains 'spike'. This syntax is a list comprehension.
If you only want the resulting data set with the columns that match you can do this:
df2 = df.filter(regex='spike')
print(df2)
Output:
spike-2 spiked-in
0 1 7
1 2 8
2 3 9
This answer uses the DataFrame.filter method to do this without list comprehension:
import pandas as pd
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6]}
df = pd.DataFrame(data)
print(df.filter(like='spike').columns)
Will output just 'spike-2'. You can also use regex, as some people suggested in comments above:
print(df.filter(regex='spike|spke').columns)
Will output both columns: ['spike-2', 'hey spke']
You can also use df.columns[df.columns.str.contains(pat = 'spike')]
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)
colNames = df.columns[df.columns.str.contains(pat = 'spike')]
print(colNames)
This will output the column names: 'spike-2', 'spiked-in'
More about pandas.Series.str.contains.
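Series.str.contains also takes case and regex flags; for instance, a case-insensitive match (a small sketch with made-up column names):

```python
import pandas as pd

df = pd.DataFrame({'SPIKE-2': [1], 'hey spke': [2]})

# case=False makes the match case-insensitive
cols = df.columns[df.columns.str.contains('spike', case=False)]
print(list(cols))  # ['SPIKE-2']
```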
# select columns containing 'spike'
df.filter(like='spike', axis=1)
You can also select by name or regular expression; refer to pandas.DataFrame.filter.
df.loc[:,df.columns.str.contains("spike")]
Another solution that returns a subset of the df with the desired columns:
df[df.columns[df.columns.str.contains("spike|spke")]]
You can also use this code:
spike_cols =[x for x in df.columns[df.columns.str.contains('spike')]]
Getting name and subsetting based on Start, Contains, and Ends:
# from: https://stackoverflow.com/questions/21285380/find-column-whose-name-contains-a-specific-string
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
# from: https://cmdlinetips.com/2019/04/how-to-select-columns-using-prefix-suffix-of-column-names-in-pandas/
# from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html
import pandas as pd
data = {'spike_starts': [1,2,3], 'ends_spike_starts': [4,5,6], 'ends_spike': [7,8,9], 'not': [10,11,12]}
df = pd.DataFrame(data)
print("\n")
print("----------------------------------------")
colNames_contains = df.columns[df.columns.str.contains(pat = 'spike')].tolist()
print("Contains")
print(colNames_contains)
print("\n")
print("----------------------------------------")
colNames_starts = df.columns[df.columns.str.contains(pat = '^spike')].tolist()
print("Starts")
print(colNames_starts)
print("\n")
print("----------------------------------------")
colNames_ends = df.columns[df.columns.str.contains(pat = 'spike$')].tolist()
print("Ends")
print(colNames_ends)
print("\n")
print("----------------------------------------")
df_subset_start = df.filter(regex='^spike',axis=1)
print("Starts")
print(df_subset_start)
print("\n")
print("----------------------------------------")
df_subset_contains = df.filter(regex='spike',axis=1)
print("Contains")
print(df_subset_contains)
print("\n")
print("----------------------------------------")
df_subset_ends = df.filter(regex='spike$',axis=1)
print("Ends")
print(df_subset_ends)

Pandas rename columns with wildcard

My df looks like this:
Datum Zeit Temperatur[°C] Luftdruck Windgeschwindigkeit[m/s] Windrichtung[Grad] Relative Luftfeuchtigkeit[%] Globalstrahlung[W/m²]
Now I want to rename the columns like this:
wetterdaten.rename(columns={'Temperatur%': 'Temperatur', 'Luftdruck[hPa]': 'Luftdruck'}, inplace=True)
Where % is a wildcard.
But of course it will not work like this.
The beginning of the column name is always the same in the log data,
but the ending changes over time.
You can filter the columns and fetch the name:
wetterdaten.rename(columns={wetterdaten.filter(regex='Temperatur.*').columns[0]: 'Temperatur',
wetterdaten.filter(regex='Luftdruck.*').columns[0]: 'Luftdruck'},
inplace=True)
You can use replace with a dict; for a wildcard use .*, and to anchor at the start of the string use ^:
d = {'^Temperatur.*': 'Temperatur', 'Luftdruck[hPa]': 'Luftdruck'}
df.columns = df.columns.to_series().replace(d, regex=True)
Sample:
cols = ['Datum', 'Zeit', 'Temperatur[°C]', 'Luftdruck' , 'Windgeschwindigkeit[m/s]',
'Windrichtung[Grad]', 'Relative Luftfeuchtigkeit[%]', ' Globalstrahlung[W/m²]']
df = pd.DataFrame(columns=cols)
print (df)
Empty DataFrame
Columns: [Datum, Zeit, Temperatur[°C], Luftdruck, Windgeschwindigkeit[m/s],
Windrichtung[Grad], Relative Luftfeuchtigkeit[%], Globalstrahlung[W/m²]]
Index: []
d = {'^Temperatur.*': 'Temperatur', 'Luftdruck.*': 'Luftdruck'}
df.columns = df.columns.to_series().replace(d, regex=True)
print (df)
Empty DataFrame
Columns: [Datum, Zeit, Temperatur, Luftdruck, Windgeschwindigkeit[m/s],
Windrichtung[Grad], Relative Luftfeuchtigkeit[%], Globalstrahlung[W/m²]]
Index: []
You may prepare a function for renaming your columns:
def rename_columns(old_name):
    if old_name == 'Temperatur':
        new_name = old_name + whichever_you_wants  # may be another function call
    elif old_name == 'Luftdruck':
        new_name = 'Luftdruck[hPa]'
    else:
        new_name = old_name
    return new_name
and then use the .rename() method with that function as a parameter:
wetterdaten.rename(columns=rename_columns, inplace=True)
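A related variant: build the rename mapping with a dict comprehension over the prefixes you care about, so any suffix after the prefix is stripped. A sketch using a few of the column names from the question:

```python
import pandas as pd

# Hypothetical frame with the changing-suffix columns
df = pd.DataFrame(columns=['Datum', 'Temperatur[°C]', 'Luftdruck[hPa]'])

# Map every column that starts with a known prefix to just that prefix
prefixes = ('Temperatur', 'Luftdruck')
mapping = {c: p for c in df.columns for p in prefixes if c.startswith(p)}
df = df.rename(columns=mapping)
print(list(df.columns))  # ['Datum', 'Temperatur', 'Luftdruck']
```

Columns that match no prefix (here 'Datum') are left untouched, since rename ignores labels not in the mapping.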
Also, you can try the code below, which renames the column '#Item Code' to 'Item Name' unconditionally:
Code:
df.rename(columns={'#Item Code': 'Item Name'}, inplace=True)