Dataframe: renaming multiple columns with the same name - pandas

I have a dataframe with several columns whose names are almost the same, differing only by a number at the end (Hora1, Hora2, ..., Hora12).
I would like to change all column names to GAx, where x is a different number (GA01.0, GA01.1, ...).

There are several ways to do this. One is to replace the common prefix in every column name:
df.columns = [col.replace('Hora', 'GA01.') for col in df.columns]

You can rename the columns by assigning a list of new names, one for each existing column:
columns = ['GA01.0','GA01.1']
df.columns = columns

You can try:
import re
df.columns = [re.sub('Hora', 'GA01.', x) for x in df.columns]
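Note that a plain prefix replacement keeps the original numbers, so Hora1 becomes GA01.1, while the question lists GA01.0 first. A minimal sketch of one way to shift the numbering, assuming the columns really are named Hora1 to Hora12 and that Hora1 should map to GA01.0 (the helper name rename_hora and the sample frame are made up for illustration):

```python
import re

import pandas as pd

# Hypothetical sample frame with columns Hora1 .. Hora12.
df = pd.DataFrame([[0] * 12], columns=[f"Hora{i}" for i in range(1, 13)])


def rename_hora(col):
    """Map 'HoraN' to 'GA01.(N-1)'; leave any other column name untouched."""
    match = re.fullmatch(r"Hora(\d+)", col)
    if match is None:
        return col
    return f"GA01.{int(match.group(1)) - 1}"


# rename() accepts a function that is applied to every column label.
df = df.rename(columns=rename_hora)
new_columns = list(df.columns)
print(new_columns)
```

Using fullmatch means columns that only happen to contain "Hora" somewhere in their name are left alone.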

Related

reordering columns in vaex?

My question is how to reorder columns in vaex. For example, I want the 5th column in position 1 and the first column in position 5, and so on. I know we can use the reindex method in pandas; is there a way to mimic that in vaex? Thanks for your help.
I think you can do this by selecting the columns in the order that you want.
import vaex
df = vaex.from_arrays(a=[1,2,3], b=[4,5,6])
display(df) # first a, then b
df = df[["b","a"]]
display(df) # first b, then a
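To swap two positions as in the question (5th column first, 1st column fifth), you could build the reordered list of names first and then select with it. A small sketch in plain Python: swapped_order is a hypothetical helper, and the vaex line is left as a comment and assumes df.get_column_names() returns the names in their current order.

```python
def swapped_order(names, i, j):
    """Return a copy of names with the entries at positions i and j swapped."""
    order = list(names)
    order[i], order[j] = order[j], order[i]
    return order


names = ["a", "b", "c", "d", "e"]
order = swapped_order(names, 0, 4)  # positions are zero-based
print(order)

# With a vaex DataFrame this would be used roughly as:
# df = df[swapped_order(df.get_column_names(), 0, 4)]
```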

Multiple column selection on a Julia DataFrame

Imagine I have the following DataFrame :
10 rows x 26 columns named A to Z
What I would like to do is to make a multiple subset of the columns by their name (not the index). For instance, assume that I want columns A to D and P to Z in a new DataFrame named df2.
I tried something like this but it doesn't seem to work :
df2=df[:,[:A,:D ; :P,:Z]]
syntax: unexpected semicolon in array expression
top-level scope at Slicing.jl:1
Any idea of the way to do it ?
Thanks for any help
df2 = select(df, Between(:A,:D), Between(:P,:Z))
or
df2 = df[:, Cols(Between(:A,:D), Between(:P,:Z))]
if you are sure your columns are only from :A to :Z you can also write:
df2 = select(df, Not(Between(:E, :O)))
or
df2 = df[:, Not(Between(:E, :O))]
Finally, you can easily find the index of a column using the columnindex function, e.g.:
columnindex(df, :A)
and then use column numbers, if that is what you prefer.
In Julia you can also build ranges of Chars, so when your columns are named with single letters, yet another option is:
df[:, Symbol.(vcat('A':'D', 'P':'Z'))]

how to name columns?

I have a pandas Data Frame where some of the id's are repeated a few times. I've written this code:
df = df["id"].value_counts()
and got this output
What should I do to get something like in the following image?
Thanks
As Quang Hoang answered, value_counts sets the column you count as the index. Therefore, to get the ids and the counts as columns, you need to do two things:
Turn the counts into a column: to_frame(name='B')
Reset the index so the ids become another column, then rename it to the desired name: .reset_index().rename(columns={'index': 'A'})
So in one line it'll be:
df = df["id"].value_counts().to_frame(name='B').reset_index().rename(columns={'index': 'A'})
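One caveat: the name that reset_index() gives the id column depends on the pandas version (in pandas 2.x it is 'id' rather than 'index' here), so the rename may silently do nothing. A sketch that avoids relying on the default name, using made-up sample data:

```python
import pandas as pd

# Hypothetical data: id 7 appears three times, 3 twice, 9 once.
df = pd.DataFrame({"id": [7, 7, 7, 3, 3, 9]})

counts = (
    df["id"]
    .value_counts()          # Series: ids in the index, counts as values
    .rename_axis("A")        # name the index explicitly
    .reset_index(name="B")   # index -> column "A", counts -> column "B"
)
print(counts)
```

Naming the index and the value column explicitly means the result has columns A and B regardless of what value_counts calls them by default.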
Another possible way is:
col = ["A", "B"]
df.columns = col

How to concat 3 dataframes with each into sequential columns

I'm trying to understand how to concat three individual dataframes (i.e. df1, df2, df3) into a new dataframe, say df4, so that each individual dataframe gets its own column(s), in left-to-right order.
I've tried using concat with axis = 1 to do this, but it appears not possible to automate this with a single action.
Table1_updated = pd.DataFrame(columns=['3P','2PG-3Io','3Io'])
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io])
Note that with the exception of get_table1_2P_max_3Io, which has two columns, all other dataframes have one column
For example,
get_table1_3P =
get_table1_2P_max_3Io =
get_table1_3Io =
Ultimately, I would like to see the following:
I believe you need to concat first and then change the order using a list of column names:
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io], axis=1)
Table1_updated = Table1_updated[['3P','2PG-3Io','3Io']]
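A minimal runnable sketch of the same idea with made-up frames (the column values and the extra '2PG' column are assumptions, since the original tables are not shown). Keep in mind that concat with axis=1 aligns rows on the index, so the frames should share the same index:

```python
import pandas as pd

# Hypothetical stand-ins for the three tables in the question.
left = pd.DataFrame({"3P": [10, 20]})
middle = pd.DataFrame({"2PG-3Io": [1, 2], "2PG": [3, 4]})  # two columns
right = pd.DataFrame({"3Io": [5, 6]})

# axis=1 places the frames side by side instead of stacking them.
combined = pd.concat([left, middle, right], axis=1)

# Select the columns in the desired left-to-right order.
ordered = combined[["3P", "2PG-3Io", "3Io"]]
print(ordered)
```

If the frames carry different indexes, calling reset_index(drop=True) on each before the concat avoids rows being misaligned and padded with NaN.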

Pandas get list of columns if columns name contains

I have written this code to get a list of the column names in a dataframe that contain 'a', 'b', 'c' or 'd'.
I then want to trim the first 3 characters of the column name for these columns.
However, it's showing an error. Is there something wrong with the code?
ind_cols= [x for x in df if df.columns[df.columns.str.contains('|'.join(['a','b','c','d']))]]
df[ind_cols].columns=df[ind_cols].columns.str[3:]
Use a list comprehension with if-else:
L = df.columns[df.columns.str.contains('|'.join(['a','b','c','d']))]
df.columns = [x[3:] if x in L else x for x in df.columns]
Another solution uses numpy.where with a boolean mask:
import numpy as np
m = df.columns.str.contains('|'.join(['a','b','c','d']))
df.columns = np.where(m, df.columns.str[3:], df.columns)
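To see what the mask approach does, here is a small self-contained sketch with invented column names; only names containing one of the letters lose their first three characters:

```python
import numpy as np
import pandas as pd

# Hypothetical column names: the first two contain 'a' or 'b', the others do not.
df = pd.DataFrame(columns=["xx_alpha", "yy_beta", "zzz", "kk_mmm"])

# True where the name contains any of the letters (the pattern is a regex).
mask = df.columns.str.contains("|".join(["a", "b", "c", "d"]))

# Trim the first 3 characters only where the mask is True.
df.columns = np.where(mask, df.columns.str[3:], df.columns)
new_names = list(df.columns)
print(new_names)
```

Since str.contains matches anywhere in the name, a letter inside the first three characters also triggers the trim.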