Merging multiple dataframes within a list into one dataframe - pandas

I have several dataframes, all with the same columns, within one list, and I would like to combine them into one dataframe.
For instance, take these three dataframes:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]]),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[11, 22, 33], [44, 55, 66], [77, 88, 99]]),
                   columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
within one list:
dfList = [df1, df2, df3]
I know I can use the following, which gives me exactly what I'm looking for:
df_merge = pd.concat([dfList[0],dfList[1],dfList[2]])
However, in my actual data I have hundreds of dataframes within a list, so I'm trying to find a way to loop through and concat:
dfList_all = pd.DataFrame()
for i in range(len(dfList)):
    dfList_all = pd.concat(dfList[i])
I tried the above, but it gives me the following error:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
Any ideas would be wonderful. Thanks
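A minimal sketch of the fix (assuming all frames really do share the same columns): the error arises because `pd.concat` expects an iterable of pandas objects, not a single DataFrame, so the loop isn't needed at all. Pass the list itself:

```python
import numpy as np
import pandas as pd

# a couple of small frames standing in for the real list
dfList = [
    pd.DataFrame(np.array([[10, 20, 30], [40, 50, 60]]), columns=['a', 'b', 'c']),
    pd.DataFrame(np.array([[11, 22, 33], [44, 55, 66]]), columns=['a', 'b', 'c']),
]

# pd.concat already accepts the whole list in one call -- no loop needed
df_merge = pd.concat(dfList, ignore_index=True)
```

`ignore_index=True` renumbers the rows 0..n-1; drop it if you want to keep each frame's original index.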

Related

Merge similar columns and add extracted values to dict

Given this input:
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [6, np.nan, 16, np.nan], 'C2': [17, np.nan, 1, np.nan],
                   'D1': [8, np.nan, np.nan, 6], 'D2': [15, np.nan, np.nan, 12]},
                  index=[1, 1, 2, 2])
I'd like to combine columns beginning with the same letter (the Cs and Ds), as well as rows with the same index (1 and 2), and extract the non-null values into the simplest representation without duplicates, which I think is something like:
{1: {'C': [6.0, 17.0], 'D': [8.0, 15.0]}, 2: {'C': [16.0, 1.0], 'D': [6.0, 12.0]}}
Using stack or groupby gets me part of the way there, but I feel like there is a more efficient way to do it.
You can rename the columns to their first letters with a lambda, reshape with DataFrame.stack, aggregate lists per (index, letter) group, and then build the nested dictionary in a dict comprehension:
s = df.rename(columns=lambda x: x[0]).stack().groupby(level=[0,1]).agg(list)
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print(d)
{1: {'C': [6.0, 17.0], 'D': [8.0, 15.0]}, 2: {'C': [16.0, 1.0], 'D': [6.0, 12.0]}}
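The nested dictionary can also be built without the xs calls, as a sketch: group the aggregated Series by its outer index level and drop that level per group. (The explicit dropna is added here because stack's default NaN handling is changing in newer pandas.)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [6, np.nan, 16, np.nan], 'C2': [17, np.nan, 1, np.nan],
                   'D1': [8, np.nan, np.nan, 6], 'D2': [15, np.nan, np.nan, 12]},
                  index=[1, 1, 2, 2])

# same reshape as before: first-letter columns, stack, aggregate lists
s = (df.rename(columns=lambda x: x[0])
       .stack()
       .dropna()
       .groupby(level=[0, 1])
       .agg(list))

# group by the outer level and turn each sub-Series into a plain dict
d = {k: v.droplevel(0).to_dict() for k, v in s.groupby(level=0)}
```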

Pandas Convert Data Type of List inside a Dictionary

I have the following data structure:
import pandas as pd

names = {'A': [20, 5, 20],
         'B': [18, 7, 13],
         'C': [19, 6, 18]}
I was able to convert the data type for A, B, C from object to string as follows:
df = df.astype({'A': 'string', 'B': 'string', 'C': 'string'}, errors='raise')
How can I convert the data types in the list to float64?
You can convert the dictionary to a dataframe and then change the dataframe to float.
import pandas as pd

names = {'A': [20, 5],
         'B': [18, 7],
         'C': [19, 6]}
df = pd.DataFrame(names)
df = df.astype('float64')
If you don't want to use a dataframe, you can do it with a dict comprehension:
names = {k: [float(i) for i in v] for k, v in names.items()}

How to return a list into a dataframe based on matching index of other column

I have two data frames: one made up of a column of numpy array lists, and the other with two columns, o1 and o2. I am trying to match the elements in the first dataframe (df) against the index of df2 to pull the corresponding o1 and o2 values. I was wondering if I could get some input. Please note the string 'A1' in column o1 is repeated twice in df2, and as you can see in my desired output dataframe, the duplicates are removed in column o1.
import numpy as np
import pandas as pd

# ragged lists, so the array must be dtype=object
array_1 = np.array([[0, 2, 3], [3, 4, 6], [1, 2, 3, 6]], dtype=object)
# dataframe 1
df = pd.DataFrame({'A': array_1})
# dataframe 2
df2 = pd.DataFrame({'o1': ['A1', 'B1', 'A1', 'C1', 'D1', 'E1', 'F1'],
                    'o2': [15, 17, 18, 19, 20, 7, 8]})
# desired output
df_output = pd.DataFrame({'A': array_1,
                          'o1': [['A1', 'C1'], ['C1', 'D1', 'F1'], ['B1', 'A1', 'C1', 'F1']],
                          'o2': [[15, 18, 19], [19, 20, 8], [17, 18, 19, 8]]})
# note: index 0 of df refers to rows 0 and 2 of df2, which share the value 'A1';
# the output keeps only one 'A1', removing the duplicate.
I believe you can explode df['A'], use the exploded values to index into df2, and then join the aggregated result back to df:
s = df['A'].explode()
df_output = df.join(df2.loc[s].groupby(s.index).agg(lambda x: list(set(x))))
Output:
              A                o1                o2
0     [0, 2, 3]          [C1, A1]      [18, 19, 15]
1     [3, 4, 6]      [F1, D1, C1]       [8, 19, 20]
2  [1, 2, 3, 6]  [F1, B1, C1, A1]   [8, 17, 18, 19]
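Note that `set` drops duplicates but scrambles order. If the first-seen order in the desired output matters, a small variation (a sketch of the same approach) deduplicates with `dict.fromkeys`, which preserves insertion order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[0, 2, 3], [3, 4, 6], [1, 2, 3, 6]]})
df2 = pd.DataFrame({'o1': ['A1', 'B1', 'A1', 'C1', 'D1', 'E1', 'F1'],
                    'o2': [15, 17, 18, 19, 20, 7, 8]})

# one row per element, index repeated per original row
s = df['A'].explode()

# dict.fromkeys removes duplicates while keeping first-seen order
out = df.join(df2.loc[s].groupby(s.index).agg(lambda x: list(dict.fromkeys(x))))
```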

Create new pandas dataframe with fixed distance using interpolate

I have a dataframe of the following form.
import pandas as pd

df = pd.DataFrame({'X': [0, 3, 6, 7, 8, 11],
                   'Y1': [8, 5, 4, 3, 2, 1.5],
                   'Y2': [1, 2, 4, 5, 5, 5]})
I would like to create a new dataframe where I use interpolate so that 'X' steps in fixed increments [0, 2, 4, 6, 8, 10].
To find the new 'Y' values I would need to fit f(x) = Y1 and evaluate it at each step in X. But since I have many Y columns, I think there must be a more clever way to do this.
The solution I found was the following:
step_size = 0.25
no_steps = int(np.floor(max(b['X']) / step_size))
for i in range(0, no_steps + 1):
    b = b.append({'X': 0.25 * i, 'StepNo': 10, 'PointNo': 23 + i}, ignore_index=True)
b = b.sort_values(['X'])
b = b.set_index(['X'])
c = b.interpolate('index')
c = c.reset_index()
c = c.sort_values(['PointNo'])
So first I define step size. Then I calculate number of steps. Then I append the steps into the dataframe. Sort the dataframe and reindex so I can use interpolate using 'index' as values.
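The manual append loop can be avoided. As a sketch (one way among several, and it sidesteps the now-removed DataFrame.append): union the fixed grid into the index, interpolate every Y column by index in one pass, then select just the grid points:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [0, 3, 6, 7, 8, 11],
                   'Y1': [8, 5, 4, 3, 2, 1.5],
                   'Y2': [1, 2, 4, 5, 5, 5]})

grid = pd.Index(np.arange(0, 12, 2), name='X')  # fixed steps [0, 2, 4, 6, 8, 10]

base = df.set_index('X')
# insert the grid points, interpolate all Y columns at once, keep only the grid
out = (base.reindex(base.index.union(grid))
           .interpolate(method='index')
           .loc[grid]
           .reset_index())
```

`method='index'` makes the interpolation linear in the X values themselves rather than in row position, which is what a fixed-distance grid needs.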

Trying to create a Seaborn heatmap from a Pandas Dataframe

This is my first time trying this. I actually have a dict of lists I am generating in a program, but since this is my first attempt, I am using a dummy dict just for testing.
I am following this:
python Making heatmap from DataFrame
but I am failing with the following:
Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 20, in <module>
    sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)
  File "C:\Users\Mark\AppData\Roaming\Python\Python36\site-packages\seaborn\matrix.py", line 517, in heatmap
    yticklabels, mask)
  File "C:\Users\Mark\AppData\Roaming\Python\Python36\site-packages\seaborn\matrix.py", line 168, in __init__
    cmap, center, robust)
  File "C:\Users\Mark\AppData\Roaming\Python\Python36\site-packages\seaborn\matrix.py", line 205, in _determine_cmap_params
    calc_data = plot_data.data[~np.isnan(plot_data.data)]
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
My code:
import pandas as pd
import seaborn as sns

Index = ['key1', 'key2', 'key3', 'key4', 'key5']
Cols = ['A', 'B', 'C', 'D']

testdict = {
    "key1": [1, 2, 3, 4],
    "key2": [5, 6, 7, 8],
    "key3": [9, 10, 11, 12],
    "key4": [13, 14, 15, 16],
    "key5": [17, 18, 19, 20]
}

df = pd.DataFrame(testdict, index=Index, columns=Cols)
df = df.transpose()
sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)
You need to switch your column and index labels. pd.DataFrame(testdict, ...) uses the dict keys as the column names, so asking for columns ['A', 'B', 'C', 'D'] produces an all-NaN object frame, and np.isnan then fails inside seaborn:
Cols = ['key1', 'key2', 'key3', 'key4', 'key5']
Index = ['A', 'B', 'C', 'D']
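Putting the fix together, a corrected sketch of the question's code (the seaborn call is left commented so the snippet runs headless; uncomment it to draw the plot):

```python
import pandas as pd

testdict = {
    "key1": [1, 2, 3, 4],
    "key2": [5, 6, 7, 8],
    "key3": [9, 10, 11, 12],
    "key4": [13, 14, 15, 16],
    "key5": [17, 18, 19, 20]
}

# dict keys become the columns, so the column labels must be the keys
Cols = ['key1', 'key2', 'key3', 'key4', 'key5']
Index = ['A', 'B', 'C', 'D']

df = pd.DataFrame(testdict, index=Index, columns=Cols)
df = df.transpose()

# df is now fully numeric, so the original heatmap call works:
# import seaborn as sns
# sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)
```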