How to rename pandas dataframe columns with another dataframe?

I really don't understand what I'm doing. I have two data frames: one has a list of column labels and the other has a bunch of data. I want to label the columns in my data with my column labels.
My Code:
airportLabels = pd.read_csv('airportsLabels.csv', header= None)
airportData = pd.read_table('airports.dat', sep=",", header = None)
df = DataFrame(airportData, columns = airportLabels)
When I do this, all the data turns into "NaN" and only one column remains. I am really confused.

I think you need to add the parameter nrows to read_csv if you only need the column names, and remove header=None, because the first row of the CSV contains the column names. Then use the names parameter in read_table with the columns of the DataFrame airportLabels:
import pandas as pd
import io
temp=u"""col1,col2,col3
1,5,4
7,8,5"""
#after testing replace io.StringIO(temp) to filename
airportLabels = pd.read_csv(io.StringIO(temp), nrows=0)
print(airportLabels)
Empty DataFrame
Columns: [col1, col2, col3]
Index: []
temp=u"""
a,d,f
e,r,t"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_table(io.StringIO(temp), sep=",", header = None, names=airportLabels.columns)
print(df)
col1 col2 col3
0 a d f
1 e r t
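Putting the two reads together on inline sample data (the strings below are stand-ins for airportsLabels.csv and airports.dat), a minimal sketch might look like:

```python
import io
import pandas as pd

# Hypothetical stand-ins for airportsLabels.csv and airports.dat
labels_csv = u"col1,col2,col3"
data_csv = u"a,d,f\ne,r,t"

# Read only the header row to obtain the labels
labels = pd.read_csv(io.StringIO(labels_csv), nrows=0)

# Reuse those labels as column names when reading the data
df = pd.read_csv(io.StringIO(data_csv), header=None, names=labels.columns)
print(df)
```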

Related

allowing python to import csv with duplicate column names

I have a data frame that looks like this (screenshot not reproduced here); there are 109 columns in total.
When I import the data using read_csv, it adds ".1", ".2" to duplicate column names.
Is there any way to get around it?
I have tried this:
df = pd.read_csv(r'C:\Users\agns1\Downloads\treatment1.csv',encoding = "ISO-8859-1",
sep='|', header=None)
df = df.rename(columns=df.iloc[0], copy=False).iloc[1:].reset_index(drop=True)
but it changed the data frame and wasn't helpful. This is what it did to my data (Python vs. Excel screenshots not reproduced here).
Remove header=None, because it prevents the first row of the file from being used as df.columns, and then remove the . followed by digits from the column names:
df = pd.read_csv(r'C:\Users\agns1\Downloads\treatment1.csv',encoding="ISO-8859-1", sep=',')
df.columns = df.columns.str.replace(r'\.\d+$', '', regex=True)
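To sketch what that replacement does, here is a small self-contained example with made-up column names; the `.1`/`.2` suffixes mimic what read_csv generates for duplicates:

```python
import pandas as pd

# Hypothetical column names with the '.1', '.2' suffixes read_csv adds to duplicates
df = pd.DataFrame([[1, 2, 3, 4]], columns=['id', 'value', 'value.1', 'value.2'])

# Strip a literal dot followed by digits at the end of each name
df.columns = df.columns.str.replace(r'\.\d+$', '', regex=True)
print(df.columns.tolist())  # ['id', 'value', 'value', 'value']
```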

Adding additional rows for missing combinations in a pandas dataframe

I have a column D whose values are the names of other columns [ColA, ColB, ColC]. I want to add additional rows for the missing combinations. My dataframe looks like this:
Original Data
import pandas as pd
data={'colA':[0,0,0],'ColB':[0,0,0] ,'ColC':[0,0,0],'ColD':['ColA','ColA','ColB'],'Target':[1,1,1]}
df=pd.DataFrame(data)
print(df)
I need resulting df as:
data={'colA':[0,0,0,0,0,0,0,0,0],'ColB':[0,0,0,0,0,0,0,0,0] ,'ColC':[0,0,0,0,0,0,0,0,0],'ColD':['ColA','ColB','ColC','ColA','ColB','ColC','ColB','ColA','ColC'],'Target':[1,0,0,1,0,0,1,0,0]}
df=pd.DataFrame(data)
print(df)
Resulting Data needed
Given that the contents of ColA, ColB, and ColC are irrelevant and you just want to repeat the values in ColD and Target, this becomes a plain dict comprehension; nothing to do with pandas:
data={'colA':[0,0,0],'ColB':[0,0,0] ,'ColC':[0,0,0],'ColD':['ColA','ColA','ColB'],'Target':[1,1,1]}
df=pd.DataFrame(data)
pd.DataFrame({k: v * 3 if k not in ["Target", "ColD"]
              else [1, 0, 0] * 3 if k == "Target"
              else ["ColA", "ColB", "ColC"] * 3
              for k, v in data.items()})
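If the comprehension is hard to read, an explicit loop over a hypothetical candidate list builds the same expansion (the row order within each group may differ from the example output above):

```python
import pandas as pd

data = {'colA': [0, 0, 0], 'ColB': [0, 0, 0], 'ColC': [0, 0, 0],
        'ColD': ['ColA', 'ColA', 'ColB'], 'Target': [1, 1, 1]}

rows = []
for original in data['ColD']:                   # one original row per ColD entry
    for candidate in ['ColA', 'ColB', 'ColC']:  # expand to every combination
        rows.append({'colA': 0, 'ColB': 0, 'ColC': 0,
                     'ColD': candidate,
                     'Target': int(candidate == original)})

df = pd.DataFrame(rows)
print(df)
```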

how to put the first value in one column and the remaining values into another column?

ROCO2_CLEF_00001.jpg,C3277934,C0002978
ROCO2_CLEF_00002.jpg,C3265939,C0002942,C2357569
I want to make a pandas data frame from a CSV file.
I want to put the first entry of each row (the filename) into a column named "filenames", and the remaining entries into another column named "class". How do I do that?
In case your file doesn't have a fixed number of commas per row, you could do the following:
import pandas as pd
csv_path = 'test_csv.csv'
raw_data = open(csv_path).readlines()
# clean rows
raw_data = [x.strip().replace("'", "") for x in raw_data]
print(raw_data)
# split each row into the first field and the rest
raw_data = [ [x.split(",")[0], ','.join(x.split(",")[1:])] for x in raw_data]
print(raw_data)
# build the pandas Dataframe
column_names = ["filenames", "class"]
temp_df = pd.DataFrame(data=raw_data, columns=column_names)
print(temp_df)
filenames class
0 ROCO2_CLEF_00001.jpg C3277934,C0002978
1 ROCO2_CLEF_00002.jpg C3265939,C0002942,C2357569
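If you prefer to stay in pandas, the same first-comma split can be sketched with Series.str.split and its n parameter (the raw string below stands in for the file contents):

```python
import pandas as pd

# Hypothetical file contents with a variable number of commas per row
raw = """ROCO2_CLEF_00001.jpg,C3277934,C0002978
ROCO2_CLEF_00002.jpg,C3265939,C0002942,C2357569"""

# Split each line on the first comma only (n=1), keeping the rest intact
lines = pd.Series(raw.splitlines())
df = lines.str.split(',', n=1, expand=True)
df.columns = ['filenames', 'class']
print(df)
```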

Streamlit - Applying value_counts / groupby to column selected on run time

I am trying to apply the value_counts method to a DataFrame based on the columns selected dynamically in the Streamlit app.
This is what I am trying to do:
if st.checkbox("Select Columns To Show"):
    all_columns = df.columns.tolist()
    selected_columns = st.multiselect("Select", all_columns)
    new_df = df[selected_columns]
    st.dataframe(new_df)
The above lets me select columns and displays data for the selected columns. I am trying to see how I could apply the value_counts/groupby methods on this output in the Streamlit app.
If I try to do the below
st.table(new_df.value_counts())
I get the below error
AttributeError: 'DataFrame' object has no attribute 'value_counts'
I believe the issue lies in passing a list of columns to a dataframe. When you pass a single column in [] to a dataframe, you get back a pandas.Series object (which has the value_counts method). But when you pass a list of columns, you get back a pandas.DataFrame, which doesn't have a value_counts method in pandas versions before 1.1.
Can you try st.table(new_df[col_name].value_counts())
I think the error is because value_counts() is applicable to a Series and not a DataFrame.
You can try converting the .value_counts() output to a dataframe.
If you want to apply on one single column
def value_counts_df(df, col):
    """
    Returns pd.value_counts() as a DataFrame

    Parameters
    ----------
    df : Pandas Dataframe
        Dataframe on which to run value_counts(); must have column `col`.
    col : str
        Name of column in `df` for which to generate counts

    Returns
    -------
    Pandas Dataframe
        Returned dataframe will have a single column named "count" which
        contains the value_counts() for each unique value of df[col].
        The index name of this dataframe is `col`.

    Example
    -------
    >>> value_counts_df(pd.DataFrame({'a': [1, 1, 2, 2, 2]}), 'a')
       count
    a
    2      3
    1      2
    """
    df = pd.DataFrame(df[col].value_counts())
    df.index.name = col
    df.columns = ['count']
    return df
val_count_single = value_counts_df(new_df, selected_col)
If you want to apply for all object columns in the dataframe
def valueCountDF(df, object_cols):
    c = df[object_cols].apply(lambda x: x.value_counts(dropna=False)).T.stack().astype(int)
    p = (df[object_cols].apply(lambda x: x.value_counts(normalize=True,
                                                        dropna=False)).T.stack() * 100).round(2)
    cp = pd.concat([c, p], axis=1, keys=["Count", "Percentage %"])
    return cp
val_count_df_cols = valueCountDF(df, selected_columns)
And finally, you can use st.table or st.dataframe to show the dataframe in your Streamlit app.
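As a side note: pandas 1.1 added DataFrame.value_counts, which counts unique rows over the selected columns, so on a recent pandas the original new_df.value_counts() call can work as-is. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'city': ['NY', 'NY', 'LA'],
                   'kind': ['a', 'a', 'b']})

# On pandas >= 1.1, value_counts also works on a DataFrame:
# it counts each unique row of the selected columns
counts = df[['city', 'kind']].value_counts()
print(counts)
```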

How do I swap two (or more) columns between two different data tables in pandas?

New here, and I am new to programming.
So.. as the title says, I am trying to swap two full columns between two different files (the columns have the same name but different data). I started with this:
import numpy as np
import pandas as pd
from pandas import DataFrame
df = pd.read_csv('table1.csv', col_name= 'COL1')
df1 = pd.read_csv('table2.csv', col_name = 'COL1')
df1.COL1 = df.COL1
But now I am stuck.. how do I select a whole column, and how can I write the new combined table to a new file (i.e. table3)?
You could perform the swap by copying one column into a temporary one and deleting it afterwards, as follows:
df1['temp'] = df1['COL1']
df1['COL1'] = df['COL1']
df['COL1'] = df1['temp']
del df1['temp']
and then write the result via to_csv to a third CSV:
df1.to_csv('table3.csv')
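Alternatively, the temporary column can be avoided with tuple assignment; this sketch uses small made-up frames in place of table1.csv and table2.csv:

```python
import pandas as pd

# Hypothetical stand-ins for table1.csv and table2.csv
df = pd.DataFrame({'COL1': [1, 2, 3]})
df1 = pd.DataFrame({'COL1': [4, 5, 6]})

# Tuple assignment swaps in one step; .copy() makes sure each side
# gets its own data rather than a view of the other frame
df['COL1'], df1['COL1'] = df1['COL1'].copy(), df['COL1'].copy()

print(df['COL1'].tolist())   # [4, 5, 6]
print(df1['COL1'].tolist())  # [1, 2, 3]
```

As before, df1.to_csv('table3.csv', index=False) would write the swapped frame out; index=False keeps the row index out of the file.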