Trying to create a Seaborn heatmap from a Pandas Dataframe - pandas

This is my first time trying this. I actually have a dict of lists that I am generating in a program, but for now I am using a dummy dict just for testing.
I am following this:
python Making heatmap from DataFrame
but I am failing with the following:
Traceback (most recent call last):
File "C:/Users/Mark/PycharmProjects/main/main.py", line 20, in <module>
sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)
File "C:\Users\Mark\AppData\Roaming\Python\Python36\site-packages\seaborn\matrix.py", line 517, in heatmap
yticklabels, mask)
File "C:\Users\Mark\AppData\Roaming\Python\Python36\site-packages\seaborn\matrix.py", line 168, in __init__
cmap, center, robust)
File "C:\Users\Mark\AppData\Roaming\Python\Python36\site-packages\seaborn\matrix.py", line 205, in _determine_cmap_params
calc_data = plot_data.data[~np.isnan(plot_data.data)]
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
My code:
import pandas as pd
import seaborn as sns
Index = ['key1', 'key2', 'key3', 'key4', 'key5']
Cols = ['A', 'B', 'C', 'D']
testdict = {
"key1": [1, 2, 3, 4],
"key2": [5, 6, 7, 8],
"key3": [9, 10, 11, 12],
"key4": [13, 14, 15, 16],
"key5": [17, 18, 19, 20]
}
df = pd.DataFrame(testdict, index=Index, columns=Cols)
df = df.transpose()
sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)

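You need to switch your column and index labels. The dict keys become the columns, so columns has to list the keys and index has to label the four values in each list. With the labels the wrong way round, none of the requested columns exist in the dict, so the frame is filled with NaN as object dtype, which np.isnan cannot handle: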
Cols = ['key1', 'key2', 'key3', 'key4', 'key5']
Index = ['A', 'B', 'C', 'D']
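For reference, here is a minimal sketch of the corrected construction, using the dummy dict from the question. The lowercase names `cols` and `index` are only illustrative, and the heatmap call is left as a comment so the snippet needs nothing beyond pandas.

```python
import pandas as pd

testdict = {
    "key1": [1, 2, 3, 4],
    "key2": [5, 6, 7, 8],
    "key3": [9, 10, 11, 12],
    "key4": [13, 14, 15, 16],
    "key5": [17, 18, 19, 20],
}

# The dict keys are the columns; each list supplies one value per row A..D.
cols = ['key1', 'key2', 'key3', 'key4', 'key5']
index = ['A', 'B', 'C', 'D']
df = pd.DataFrame(testdict, index=index, columns=cols)

# Transpose so that the keys become the heatmap rows.
df = df.transpose()
print(df)

# The frame is now numeric, so it can be passed to seaborn as before:
# sns.heatmap(df, cmap='RdYlGn_r', linewidths=0.5, annot=True)
```

After the transpose the frame has one row per key and one column per letter, all with an integer dtype.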

Related

Merging many multiple dataframes within a list into one dataframe

I have several dataframes, all with the same columns, within one list, and I would like to combine them into one dataframe.
For instance, I have these three dataframes here:
df1 = pd.DataFrame(np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]]),
columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[11, 22, 33], [44, 55, 66], [77, 88, 99]]),
columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
within one list:
dfList = [df1,df2,df3]
I know i can use the following which provides me with exactly what I'm looking for:
df_merge = pd.concat([dfList[0],dfList[1],dfList[2]])
However, in my actual data I have hundreds of dataframes within a list, so I'm trying to find a way to loop through and concat:
dfList_all = pd.DataFrame()
for i in range(len(dfList)):
    dfList_all = pd.concat(dfList[i])
I tried the loop above, but it gives me the following error:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
Any ideas would be wonderful. Thanks
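The error message points at the cause: pd.concat expects an iterable of pandas objects, while the loop hands it a single DataFrame on each pass. A minimal sketch, assuming all frames share the same columns as in the question, is to pass the list itself:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]]),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.array([[11, 22, 33], [44, 55, 66], [77, 88, 99]]),
                   columns=['a', 'b', 'c'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
dfList = [df1, df2, df3]

# pd.concat takes the whole list at once; no loop is needed.
df_merge = pd.concat(dfList)

# Optional: renumber the rows 0..n-1 instead of repeating 0, 1, 2.
df_merge_reset = pd.concat(dfList, ignore_index=True)
print(df_merge_reset)
```

This works for any list length, so it should carry over to a list with hundreds of frames.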

Error trying to solve a matrix using numpy

I'm trying to solve for x1, x2, x3 and x4 in this system, but I keep getting errors.
Matrix A contains the coefficients of x1, x2, x3 and x4 respectively, and matrix B contains the right-hand sides.
I wrote the following code, which in theory should work, but it keeps saying I provided 5 arguments or something like that:
import numpy as np
a = np.matrix([2, 5, 6, 4], [5, 10, 9, 5], [7, 17.5, 21, 14], [0, 0, 2, 5])
b = np.matrix([23.5, 34, 82.25, -13])
x = np.linalg.solve(a,b)
print(x)
I shouldn't have to do this, since you should show the full traceback with the error:
In [396]: a = np.matrix([2, 5, 6, 4], [5, 10, 9, 5], [7, 17.5, 21, 14], [0, 0,
...: 2, 5])
...: b = np.matrix([23.5, 34, 82.25, -13])
...:
...: x = np.linalg.solve(a,b)
Traceback (most recent call last):
File "<ipython-input-396-710e1fc00100>", line 1, in <module>
a = np.matrix([2, 5, 6, 4], [5, 10, 9, 5], [7, 17.5, 21, 14], [0, 0, 2, 5])
TypeError: __new__() takes from 2 to 4 positional arguments but 5 were given
Look at that error message! See the np.matrix? Now go to the np.matrix docs, and you'll see that you need to provide ONE list of lists. Any extra lists are interpreted as additional arguments.
Thus you should use the following (note the added [] - they are important):
In [397]: a = np.matrix([[2, 5, 6, 4], [5, 10, 9, 5], [7, 17.5, 21, 14], [0, 0,
...: 2, 5]])
...: b = np.matrix([23.5, 34, 82.25, -13])
...:
...: x = np.linalg.solve(a,b)
Traceback (most recent call last):
File "<ipython-input-397-b90e1785a311>", line 4, in <module>
x = np.linalg.solve(a,b)
File "<__array_function__ internals>", line 180, in solve
File "/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py", line 393, in solve
r = gufunc(a, b, signature=signature, extobj=extobj)
ValueError: solve: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 1 is different from 4)
In [398]: a.shape
Out[398]: (4, 4)
In [399]: b.shape
Out[399]: (1, 4)
Note the shape of b. solve doesn't like that mix of shapes. A (4,1) would probably work. But since we looked at the np.matrix docs, let's follow their recommendation and switch to np.array:
In [400]: a = np.array([[2, 5, 6, 4], [5, 10, 9, 5], [7, 17.5, 21, 14], [0, 0,
...: 2, 5]])
...: b = np.array([23.5, 34, 82.25, -13])
...:
...: x = np.linalg.solve(a,b)
Traceback (most recent call last):
File "<ipython-input-400-b1b6c06db25c>", line 4, in <module>
x = np.linalg.solve(a,b)
File "<__array_function__ internals>", line 180, in solve
File "/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py", line 393, in solve
r = gufunc(a, b, signature=signature, extobj=extobj)
File "/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py", line 88, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix
In [401]: a
Out[401]:
array([[ 2. , 5. , 6. , 4. ],
[ 5. , 10. , 9. , 5. ],
[ 7. , 17.5, 21. , 14. ],
[ 0. , 0. , 2. , 5. ]])
In [402]: np.linalg.det(a)
Out[402]: 0.0
I assume you know enough linear algebra to understand that problem, and undertake your own fix.
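As a hedged follow-up sketch (not part of the answer above): row 3 of a is 3.5 times row 1, and the same factor relates the matching entries of b (82.25 = 3.5 x 23.5), so the system is consistent but has infinitely many solutions. One option, assuming a minimum-norm solution is acceptable, is np.linalg.lstsq. The rcond value below is an arbitrary illustrative cutoff, not a recommendation.

```python
import numpy as np

a = np.array([[2, 5, 6, 4], [5, 10, 9, 5], [7, 17.5, 21, 14], [0, 0, 2, 5]])
b = np.array([23.5, 34, 82.25, -13])

# Row 3 is an exact multiple of row 1, which is why det(a) is 0.
print(np.allclose(a[2], 3.5 * a[0]))

# lstsq copes with rank-deficient matrices and returns the minimum-norm solution.
# rcond=1e-10 is an illustrative cutoff for treating singular values as zero.
x, residuals, rank, sv = np.linalg.lstsq(a, b, rcond=1e-10)
print(rank)
print(x)

# Check that the returned x really satisfies the original equations.
print(np.allclose(a @ x, b))
```

Whether one particular solution is good enough depends on the problem; if a unique answer is expected, the duplicated equation probably needs to be replaced with an independent one.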

Pandas Convert Data Type of List inside a Dictionary

I have the following data structure:
import pandas as pd
names = {'A': [20, 5, 20],
'B': [18, 7, 13],
'C': [19, 6, 18]}
I was able to convert the Data Type for A, B, C from an object to a string as follows:
df = df.astype({'Team-A': 'string', 'Team-B': 'string', 'Team-C': 'string'}, errors='raise')
How can I convert the data types in the list to float64?
You can convert the dictionary to a dataframe and then change the dataframe to float.
import pandas as pd
names = {'A': [20, 5],
'B': [18, 7],
'C': [19, 6]}
df=pd.DataFrame(names)
df = df.astype('float64')
If you don't want to use a DataFrame, you can convert the lists directly:
names={k:[float(i) for i in v] for k,v in names.items()}
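Both options can be checked side by side. The sketch below uses the three-element lists from the question; `names_float` is just an illustrative name for the converted dict.

```python
import pandas as pd

names = {'A': [20, 5, 20],
         'B': [18, 7, 13],
         'C': [19, 6, 18]}

# Option 1: go through a DataFrame; astype returns a new frame.
df = pd.DataFrame(names).astype('float64')
print(df.dtypes)

# Option 2: stay with plain Python and convert each list element.
names_float = {k: [float(i) for i in v] for k, v in names.items()}
print(names_float)
```

Note that astype does not modify the frame in place, so the result has to be assigned to a variable.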

How to return a list into a dataframe based on matching index of other column

I have two data frames: one with a single column of lists (built from a numpy array), and another with two columns. For each row of the first dataframe (df), I am trying to use its elements as index values into df2 and collect the matching values of columns o1 and o2. I would appreciate some input. Please note that the string 'A1' appears twice in column 'o1' of df2, and, as you can see in my desired output dataframe, duplicates are removed in column o1.
import numpy as np
import pandas as pd
array_1 = np.array([[0, 2, 3], [3, 4, 6], [1, 2, 3, 6]], dtype=object)
#dataframe 1
df = pd.DataFrame({ 'A': array_1})
#dataframe 2
df2 = pd.DataFrame({ 'o1': ['A1', 'B1', 'A1', 'C1', 'D1', 'E1', 'F1'], 'o2': [15, 17, 18, 19, 20, 7, 8]})
#desired output
df_output = pd.DataFrame({ 'A': array_1, 'o1': [['A1', 'C1'], ['C1', 'D1', 'F1'], ['B1','A1','C1','F1']],
'o2': [[15, 18, 19], [19, 20, 8], [17,18,19,8]] })
# Please note: row 0 of df has indices 0 and 2, which both map to 'A1' in df2; the output shows 'A1' only once because the duplicate is removed.
I believe you can explode df and use that to extract information from df2, then finally join back to df
s = df['A'].explode()
df_output= df.join(df2.loc[s].groupby(s.index).agg(lambda x: list(set(x))))
Output (set() does not preserve order, so the order within each list may differ):
A o1 o2
0 [0, 2, 3] [C1, A1] [18, 19, 15]
1 [3, 4, 6] [F1, D1, C1] [8, 19, 20]
2 [1, 2, 3, 6] [F1, B1, C1, A1] [8, 17, 18, 19]
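If the order shown in the desired output matters, an order-preserving variant is possible. The sketch below swaps explode/groupby for a plain list comprehension and uses dict.fromkeys to drop duplicates while keeping first occurrences. `pick_unique` is a hypothetical helper name, and the lookup is positional (.iloc), which matches .loc here only because df2 has the default 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({'A': [[0, 2, 3], [3, 4, 6], [1, 2, 3, 6]]})
df2 = pd.DataFrame({'o1': ['A1', 'B1', 'A1', 'C1', 'D1', 'E1', 'F1'],
                    'o2': [15, 17, 18, 19, 20, 7, 8]})


def pick_unique(column, positions):
    """Values of `column` at `positions`, first occurrences only, in order."""
    return list(dict.fromkeys(column.iloc[positions].tolist()))


# Build one list per row of df for each of the two lookup columns.
df_output = df.assign(
    o1=[pick_unique(df2['o1'], idx) for idx in df['A']],
    o2=[pick_unique(df2['o2'], idx) for idx in df['A']],
)
print(df_output)
```

This loops in Python over the rows of df, so it trades some speed for predictable ordering.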

How to vectorize an operation on a 1-dimensional array to produce a 2-dimensional matrix in numpy

I have a 1d array of values
i = np.arange(0,7,1)
and a function
# Returns a column matrix
def fn(i):
    return np.matrix([[i*2,i*3]]).T
fnv = np.vectorize(fn)
then writing
fnv(i)
gives me an error
File "<stdin>", line 1, in <module>
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1872, in __call__
return self._vectorize_call(func=func, args=vargs)
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1942, in _vectorize_call
copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.
The result I am looking for is a matrix with two rows and as many columns as in the input array. What is the best notation in numpy to achieve this?
For example i would equal
[1,2,3,4,5,6]
and the output would equal
[[2,4,6,8,10,12],
[3,6,9,12,15,18]]
EDIT
You should try to avoid using vectorize, because it gives the illusion of numpy efficiency, but inside it's all python loops.
If you really have to deal with user supplied functions that take ints and return a matrix of shape (2, 1) then there probably isn't much you can do. But that seems like a really weird use case. If you can replace that with a list of functions that take an int and return an int, and that use ufuncs when needed, i.e. np.sin instead of math.sin, you can do the following
def vectorize2(funcs):
    def fnv(arr):
        return np.vstack([f(arr) for f in funcs])
    return fnv
f2 = vectorize2((lambda x : 2 * x, lambda x : 3 * x))
>>> f2(np.arange(10))
array([[ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
[ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27]])
Just for your reference, I have timed this vectorization against your proposed one:
f = vectorize(fn)
>>> timeit.timeit('f(np.arange(10))', 'from __main__ import np, f', number=1000)
0.28073329263679625
>>> timeit.timeit('f2(np.arange(10))', 'from __main__ import np, f2', number=1000)
0.023139129945661807
>>> timeit.timeit('f(np.arange(10000))', 'from __main__ import np, f', number=10)
2.3620706288432984
>>> timeit.timeit('f2(np.arange(10000))', 'from __main__ import np, f2', number=10)
0.002757072593169596
So there is an order-of-magnitude difference in speed even for small arrays, which grows to a roughly x1000 speedup, available almost for free, for larger arrays.
ORIGINAL ANSWER
Don't use vectorize unless there is no way around it, it's slow. See the following examples
>>> a = np.array(range(7))
>>> a
array([0, 1, 2, 3, 4, 5, 6])
>>> np.vstack((a, a+1))
array([[0, 1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 6, 7]])
>>> np.vstack((a, a**2))
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 0, 1, 4, 9, 16, 25, 36]])
Whatever your function is, if it can be constructed with numpy's ufuncs, you can do something like np.vstack((a, f(a))) and get what you want
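Applied to the example from the question (doubling and tripling each value), that idea can be sketched as follows. The broadcasting variant is an equivalent alternative that I am adding for illustration; `factors` is a made-up name.

```python
import numpy as np

i = np.arange(1, 7)

# Stack the two element-wise results as rows of a 2-row array.
stacked = np.vstack((2 * i, 3 * i))

# Equivalent via broadcasting: a (2, 1) column of factors times a (6,) array.
factors = np.array([[2], [3]])
broadcast = factors * i

print(stacked)
```

Both forms avoid Python-level loops entirely, which is where the speed difference against np.vectorize comes from.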
A simple reimplementation of vectorize gives me what I want
def vectorize(fn):
    def do_it(array):
        # column_stack needs a sequence, not a generator
        return np.column_stack([fn(p) for p in array])
    return do_it
If this is not performant or there is a better way then let me know.