I am trying to append dataframe subsets to a separate dataframe.
So far I have tried append to an empty dataframe, but it returns an empty dataframe.
search_ID = ['36962G3P7', 'B3V3W13', 'XS1903485800', 'EXLS']
Search_Results = pd.DataFrame()
for i in search_ID:
    current_table = df.loc[df.isin([i]).any(axis=1)]
    Search_Results.append(current_table)
# Returns an empty dataframe
When I print every iteration it shows that it is creating new dataframes for every list item.
Search_ID = ['36962G3P7', 'B3V3W13', 'XS1903485800', 'EXLS']
Search_Results = pd.DataFrame()
for i in Search_ID:
    current_table = df.loc[df.isin([i]).any(axis=1)]
    print(current_table)
# Returns 4 printed dataframes
When I append outside the loop, the table does append to the empty dataframe.
current_table = df.loc[df.isin(['36962G3P7']).any(axis=1)]
Search_Results.append(current_table)
#Returns a filled dataframe
DataFrame.append does not operate in place; it returns a new DataFrame, so you need to reassign on each iteration.
Search_Results = Search_Results.append(current_table)
Depending on how many times you append, this can be very slow (and DataFrame.append has been removed entirely in pandas 2.0), so you might instead consider:
Search_ID = ['36962G3P7', 'B3V3W13', 'XS1903485800', 'EXLS']
Search_Results = pd.concat([
    df.loc[df.isin([i]).any(axis=1)]
    for i in Search_ID
], axis=0)
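As a minimal runnable sketch of this pattern, with a made-up toy df standing in for the original data:

```python
import pandas as pd

# Toy stand-in for the original df (hypothetical data)
df = pd.DataFrame({
    'id': ['36962G3P7', 'B3V3W13', 'ZZZ', 'EXLS'],
    'value': [1, 2, 3, 4],
})

Search_ID = ['36962G3P7', 'B3V3W13', 'XS1903485800', 'EXLS']

# Build each filtered subset, then concatenate once at the end;
# IDs with no match simply contribute an empty frame
Search_Results = pd.concat([
    df.loc[df.isin([i]).any(axis=1)]
    for i in Search_ID
], axis=0)

print(Search_Results)
```

Note that 'XS1903485800' matches nothing here and is silently skipped, which is usually the behavior you want for a search.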
for row in range(1, len(df)):
    try:
        df_out, orthogroup, len_group = HOG_get_group_stats(df.loc[row, "HOG"])
        temp_df = pd.DataFrame()
        for id in range(len(df_out)):
            print(" ")
            temp_df = pd.concat([df, pd.DataFrame(df_out.iloc[id, :]).T], axis=1)
            temp_df["HOG"] = orthogroup
            temp_df["len_group"] = len_group
            print(temp_df)
    except:
        print(row, "no")
Here I have a script that does the following:
Iterate over df, apply the HOG_get_group_stats function to the HOG column, and get 3 variables as outputs. (Basically, the function creates some stats as a data frame called df_out, and extracts some information as two more columns called orthogroup and len_group.)
Create an empty data frame called temp_df
Transpose the df_out data frame into a single row, then concatenate it with the original df as columns.
Add the orthogroup and len_group columns to the end of temp_df
Problem:
It prints out the data; however, when I try to view temp_df as a data frame it shows only a single row (probably the last one), which means my concatenation of several data frames doesn't work.
Questions:
How can I iterate and then append a data frame as columns?
Is there an easier way to iterate over a data frame? (e.g. iterrows)
Is there a better way to transpose rows to columns in a data frame? ( e.g. pivot, melt)
Any help would be appreciated!!
You can find the sample files to df, df_out,temp_df and expected output_sample table here :
Sample_files
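The overwrite described above happens because temp_df is reassigned on every iteration. One way around it is to collect the per-iteration frames in a list and concatenate once after the loop. A minimal sketch of that pattern, using a made-up fake_group_stats in place of the real HOG_get_group_stats:

```python
import pandas as pd

def fake_group_stats(hog):
    # Hypothetical stand-in for HOG_get_group_stats: returns a one-row
    # stats frame plus two extra values, just to exercise the pattern.
    df_out = pd.DataFrame({'stat_a': [len(hog)], 'stat_b': [hog.count('G')]})
    return df_out, hog, len(hog)

df = pd.DataFrame({'HOG': ['HOG1', 'HOG22', 'HOG333']})

pieces = []  # accumulate one frame per row instead of overwriting temp_df
for row in range(len(df)):
    df_out, orthogroup, len_group = fake_group_stats(df.loc[row, 'HOG'])
    df_out = df_out.copy()
    df_out['HOG'] = orthogroup
    df_out['len_group'] = len_group
    pieces.append(df_out)

# One concat at the end keeps every iteration's result
temp_df = pd.concat(pieces, ignore_index=True)
print(temp_df)
```

The single concat at the end is also much faster than growing a DataFrame inside the loop.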
I would like to create a ((25520*43),3) pandas Dataframe in a for loop.
I created the dataframe like:
lst=['Region', 'GeneID', 'DistanceValue']
df=pd.DataFrame(index=lst).T
And now I want to fill 'Region', 43 times with 25520 values. Also GeneID and DistanceValue.
This is my for loop for that:
for i in range(43):
    df.DistanceValue = np.sort(distance[i,:])
    df.Region = np.ones(25520) * i
    args = np.argsort(distance[i,:])
    df.GeneID = ids[int(args[i])]
But then my df is just (25520, 3), so only the last of the 43 iterations is filled in.
How can I concat all iterations 1 to 43 into my df?
I can't reproduce your example, but there are a couple of corrections you can make:
lst=['Region', 'GeneID', 'DistanceValue']
df=pd.DataFrame(index=lst).T
region = []
for i in range(43):
    region.append(np.ones(25520))
flat_list = [item for sublist in region for item in sublist]
df.Region = flat_list
First create a new list outside the loop, then append values to it inside the loop.
The flat_list consolidates all 43 lists into one, and then you can map it to the DataFrame. It is always easier to fill DataFrame values outside of a loop.
Similarly you can update all 3 columns.
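Applying that idea to all three columns at once might look like the sketch below, with small stand-in sizes instead of 43 and 25520, and made-up placeholder arrays for distance and ids:

```python
import numpy as np
import pandas as pd

n_regions, n_genes = 3, 5                  # small stand-ins for 43 and 25520
distance = np.random.rand(n_regions, n_genes)  # placeholder distance matrix
ids = np.array(['g0', 'g1', 'g2', 'g3', 'g4']) # placeholder gene IDs

region, gene_id, dist_value = [], [], []
for i in range(n_regions):
    order = np.argsort(distance[i, :])
    region.append(np.full(n_genes, i))     # region label repeated per gene
    gene_id.append(ids[order])             # gene IDs in sorted-distance order
    dist_value.append(distance[i, order])  # sorted distances

# Assemble the DataFrame once, outside the loop
df = pd.DataFrame({
    'Region': np.concatenate(region),
    'GeneID': np.concatenate(gene_id),
    'DistanceValue': np.concatenate(dist_value),
})
print(df.shape)
```

Each list holds one array per iteration, and np.concatenate flattens them into the full-length columns in one step.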
I have two populated DataFrames, df1 and df2. I also have an empty Dataframe (test):
df1 = pd.read_excel(xlpath1, sheetname='Sheet1')
df2 = pd.read_excel(xlpath2, sheetname='Sheet1')
test = pd.DataFrame()
I'd like to iterate through the rows of df1 and add those rows to the empty test Dataframe. When I try the following, I don't get any sort of error, but nothing is added to the test DataFrame:
for i, j in df1.iterrows():
    test.append(j)
Any ideas? Do I need to add the proper columns to the test DataFrame first? My total end-goal is to iterate through multiple DataFrames and add only unique items to the empty DataFrame (ex, adding items that appear in one of the many DataFrames).
If you are trying to append dataframe df1 to the empty dataframe test, you can use the concat function of pandas.
test = pd.concat([df1, test], axis = 0)
axis=0 appends the two dataframes row-wise.
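For the stated end-goal of keeping only the unique rows across several DataFrames, one option is a drop_duplicates after the concat; a sketch with made-up toy frames:

```python
import pandas as pd

# Toy stand-ins for the real df1 and df2 (hypothetical data)
df1 = pd.DataFrame({'item': ['a', 'b', 'c']})
df2 = pd.DataFrame({'item': ['b', 'c', 'd']})

# Stack the frames row-wise, then keep only the unique rows
test = pd.concat([df1, df2], axis=0).drop_duplicates().reset_index(drop=True)
print(test)
```

drop_duplicates keeps the first occurrence of each row, so the result preserves the original order of first appearance.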
I have a Dataframe (called df) that has a list of tickets worked for a given date. I have a script that runs each day where this df gets generated, and I would like to have a new master dataframe (let's say df_master) that appends values from df. So anytime I view df_master I should be able to see all the tickets worked across multiple days. I would also like a new column in df_master that shows the date when each row was inserted.
Given below is what df looks like:
1001
1002
1003
1004
I tried to perform concat but it threw an error
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "Series"
Update
df_ticket = tickets['ticket']
df_master = df_ticket
df_master['Date'] = pd.Timestamp('now').normalize()
L = [df_master,tickets]
master_df = pd.concat(L)
master_df.to_csv('file.csv', mode='a', header=False, index=False)
I think you need to pass a sequence to concat; typically a list is used:
objs : a sequence or mapping of Series, DataFrame, or Panel objects
If a dict is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless they are all None in which case a ValueError will be raised
L = [s1,s2]
df = pd.concat(L)
And it seems you passed only a Series, so the error was raised:
df = pd.concat(s)
To insert the Date column you can set pd.Timestamp('now').normalize(); for the master df I suggest creating one file and appending each day's DataFrame to it:
df_ticket = tickets[['ticket']]
df_ticket['Date'] = pd.Timestamp('now').normalize()
df_ticket.to_csv('file.csv', mode='a', header=False, index=False)
df_master = pd.read_csv('file.csv', header=None)
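A runnable sketch of that append-to-one-file pattern, simulating two daily runs with made-up ticket numbers and fixed dates in a temporary file:

```python
import os
import tempfile
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'file.csv')

# Simulate two daily runs, each appending its tickets to the master CSV
for day, day_tickets in enumerate([[1001, 1002], [1003, 1004]]):
    df_ticket = pd.DataFrame({'ticket': day_tickets})
    df_ticket['Date'] = (pd.Timestamp('2024-01-01') + pd.Timedelta(days=day)).date()
    df_ticket.to_csv(path, mode='a', header=False, index=False)

# Reading the file back gives the accumulated master DataFrame
df_master = pd.read_csv(path, header=None, names=['ticket', 'Date'])
print(df_master)
```

Because the file is opened in append mode on every run, df_master grows by one day's rows each time the script executes.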
I have two dataframes, one contains screen names/display names and another contains individuals, and I am trying to create a third dataframe that contains all the data from each dataframe in a new row for each time a last name appears in the screen name/display name. Functionally this will create a list of possible matching names. My current code, which works perfectly but very slowly, looks like this:
# Original Social Media Screen Names
# cols = 'userid','screen_name','real_name'
usernames = pd.read_csv('social_media_accounts.csv')
# List Of Individuals To Match To Accounts
# cols = 'first_name','last_name'
individuals = pd.read_csv('individuals_list.csv')
userid, screen_name, real_name, last_name, first_name = [],[],[],[],[]
for index1, row1 in individuals.iterrows():
    for index2, row2 in usernames.iterrows():
        if (row2['Screen_Name'].lower().find(row1['Last_Name'].lower()) != -1) | (row2['Real_Name'].lower().find(row1['Last_Name'].lower()) != -1):
            userid.append(row2['UserID'])
            screen_name.append(row2['Screen_Name'])
            real_name.append(row2['Real_Name'])
            last_name.append(row1['Last_Name'])
            first_name.append(row1['First_Name'])
cols = ['UserID', 'Screen_Name', 'Real_Name', 'Last_Name', 'First_Name']
index = range(0, len(userid))
match_list = pd.DataFrame(index=index, columns=cols)
match_list = match_list.fillna('')
match_list['UserID'] = userid
match_list['Screen_Name'] = screen_name
match_list['Real_Name'] = real_name
match_list['Last_Name'] = last_name
match_list['First_Name'] = first_name
Because I need the whole row from each column, the list comprehension methods I have tried do not seem to work.
What you want is to iterate through a dataframe faster. Doing that with a list comprehension means taking data out of a pandas dataframe, handling it with plain Python operations, then putting it back into a pandas dataframe. The fastest way (currently, with small data) is to keep the work inside pandas itself.
The next thing you want to do is work with 2 dataframes at once. pandas has a tool for that: merge (a join).
result = pd.merge(usernames, individuals, on=['Screen_Name', 'Last_Name'])
After the merge you can do your filtering.
Here is the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html
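Note that an exact-key merge won't catch the substring matches the question describes (a last name appearing inside a screen name). One alternative, sketched here with made-up data, is a cross join followed by boolean filtering (merge with how='cross' requires pandas 1.2 or later):

```python
import pandas as pd

# Made-up stand-ins for the two CSV-loaded frames
usernames = pd.DataFrame({
    'UserID': [1, 2, 3],
    'Screen_Name': ['jsmith88', 'cooldude', 'doe_jane'],
    'Real_Name': ['John Smith', 'Alex Park', 'Jane Doe'],
})
individuals = pd.DataFrame({
    'First_Name': ['John', 'Jane'],
    'Last_Name': ['Smith', 'Doe'],
})

# Cross join: every username paired with every individual
pairs = usernames.merge(individuals, how='cross')

# Keep pairs where the last name appears in either name field
mask = [
    ln.lower() in sn.lower() or ln.lower() in rn.lower()
    for sn, rn, ln in zip(pairs['Screen_Name'], pairs['Real_Name'], pairs['Last_Name'])
]
match_list = pairs[mask].reset_index(drop=True)
print(match_list)
```

This moves the pairing work into pandas and leaves only the substring test in Python, which is typically much faster than nested iterrows loops.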