Recently I moved from MATLAB to Python.
In MATLAB it is very convenient to inspect all the data in the workspace.
But in IPython that is not the case. Besides using print() and saving to a text file, is there any plugin or tool that lets me inspect data the same way as MATLAB's "Variable Bar"?
Sorry, I didn't make it clear. When the array is large, print() (or vars()/locals(), as mentioned by Baruchel) truncates the array like this, even if there are non-zero values in it:
'region': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
I searched and found that setting NumPy's 'threshold' print option (to 'nan' in older versions; np.inf or sys.maxsize in current ones) makes print() output all the data.
I am looking for something that shows the indices and contents of an array without truncating. If there isn't anything, I'll settle for print() or np.savetxt(). It's just a little inconvenient.
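For reference, raising the print threshold can be done like this (a small sketch with a made-up array; current NumPy wants a number such as sys.maxsize rather than the old 'nan'):

```python
import sys
import numpy as np

# Hypothetical large array that print() would normally truncate with "...".
region = np.zeros((1000, 7), dtype=int)
region[500, 3] = 42

# Raise the summarization threshold so every element is printed.
np.set_printoptions(threshold=sys.maxsize)
print(region)
```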
Thanks for your time, Baruchel and Solo. I learned something new, but the magic command %who seems preferable to dir() for my purpose.
The Spyder Python IDE has a MATLAB-like interface, including a variable explorer.
You can use the dir() function.
A nice post about the functionality: How to print all variables values when debugging Python with pdb, without specifying each variable?
You can use the vars() or locals() functions but the output isn't really nice.
A = [[2, 2, 4, 2, 2, 2],
     [2, 6, 2, 2, 2, 2],
     [2, 2, 2, 2, 8, 2]]
I want matrix B to be equal to:
B = [[0, 0, 4, 0, 0, 0],
     [0, 6, 0, 0, 0, 0],
     [0, 0, 0, 0, 8, 0]]
So I want to find the maximum value of each row and replace other values with 0. Is there any way to do this without using for loops?
Thanks in advance for your comments.
Instead of looking at the argmax, you could take the max values for each row directly, then mask the elements which are lower and replace them with zeros:
In place this would look like:
>>> A[A < A.max(1, keepdims=True)] = 0
>>> A
array([[0, 0, 4, 0, 0, 0],
[0, 6, 0, 0, 0, 0],
[0, 0, 0, 0, 8, 0]])
An out-of-place alternative is to use np.where:
>>> np.where(A == A.max(1, keepdims=True), A, 0)
array([[0, 0, 4, 0, 0, 0],
[0, 6, 0, 0, 0, 0],
[0, 0, 0, 0, 8, 0]])
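Note that both snippets assume A is a NumPy array, not the nested list from the question. A self-contained version:

```python
import numpy as np

A = np.array([[2, 2, 4, 2, 2, 2],
              [2, 6, 2, 2, 2, 2],
              [2, 2, 2, 2, 8, 2]])

# Keep each row's maximum, zero out everything else.
# keepdims=True keeps the row-max as a column vector so it broadcasts row-wise.
B = np.where(A == A.max(axis=1, keepdims=True), A, 0)
print(B)
```

If a row contains its maximum more than once, every tied maximum is kept; use argmax instead if exactly one entry per row should survive.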
In François Chollet's Deep Learning with Python, the following function appears:
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
I understand what this function does. It is asked about in this question and in this question as well, and also mentioned here, here, here, here, here & here. Despite being so widespread, this vectorization is, according to Chollet's book, done "manually for maximum clarity." I am interested in whether there is a standard, non-"manual" way of doing it.
Is there a standard Keras / Tensorflow / Scikit-learn / Pandas / Numpy implementation of a function which behaves very similarly to the function above?
Solution with MultiLabelBinarizer
Assuming sequences is a list of integer sequences with maximum possible value up to dimension-1, we can use MultiLabelBinarizer from sklearn.preprocessing to replicate the behaviour of vectorize_sequences:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=range(dimension))
mlb.fit_transform(sequences)
Solution with Numpy broadcasting
Assuming sequences is a list of equal-length integer sequences (so np.array(sequences) is 2-D) with maximum possible value up to dimension-1:
(np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')
Worked out example
>>> sequences
[[4, 1, 0],
[4, 0, 3],
[3, 4, 2]]
>>> dimension = 10
>>> mlb = MultiLabelBinarizer(classes=range(dimension))
>>> mlb.fit_transform(sequences)
array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0, 0, 0]])
>>> (np.array(sequences)[:, :, None] == range(dimension)).any(1).view('i1')
array([[1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]])
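For completeness, here is a pure-NumPy sketch of my own (not from Chollet's book) that avoids the Python loop via fancy indexing and also handles sequences of different lengths:

```python
import numpy as np

def multi_hot(sequences, dimension):
    # One output row per sequence; sequences may have different lengths.
    rows = np.repeat(np.arange(len(sequences)), [len(s) for s in sequences])
    cols = np.concatenate([np.asarray(s) for s in sequences])
    out = np.zeros((len(sequences), dimension), dtype='i1')
    out[rows, cols] = 1  # set all (row, col) pairs in one vectorized assignment
    return out

print(multi_hot([[4, 1, 0], [4, 0, 3], [3, 4, 2]], 10))
```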
I have a matrix of size (456, 456). I would like to make it of size (460, 460) but adding a frame of two zeros all around it.
Here is an example with a smaller matrix: I would like to transform matrixsmall into matrixbig. What is the best way to do it? The original code operates on a lot of data, so an efficient solution would be great. Thank you in advance for your help!
import numpy as np
matrixsmall = np.array([[1,2],[2,1]])
matrixbig = np.array([[0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 0],
                      [0, 0, 1, 2, 0, 0],
                      [0, 0, 2, 1, 0, 0],
                      [0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 0]])
np.pad(matrixsmall, (2,2), "constant", constant_values=(0,0))
will do the trick
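As a quick self-contained check (on recent NumPy, constant zero-padding is the default, so np.pad(matrixsmall, 2) works too):

```python
import numpy as np

matrixsmall = np.array([[1, 2],
                        [2, 1]])

# Pad two zero rows/columns on every side of the matrix.
matrixbig = np.pad(matrixsmall, (2, 2), "constant", constant_values=(0, 0))
print(matrixbig.shape)  # (6, 6)
```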
I got an array from my teacher. He gave me an array like the one below, containing 0, 1 and None:
[[1, 1, 0, 0, None, 0, 1], [1, 0, 0, 0, None, 0, 1], [1, 1, None, 0, 1, 0, None], [1, 1, 1, 0, None, 0, 0], [1, 1, 0, None, 0, 0, 1]]
He asked me to replicate the array ten times, but each column must keep a similar distribution: each column's percentage distribution should differ by no more than 8% from that of the original array.
How should I achieve this?
I'm trying to do some feature engineering, so I am using sklearn's RFE (recursive feature elimination). But from the dataset returned by RFE, I have no idea which features were selected and which were eliminated. Is there a way to find out?
v = trainDF.loc[:,['A','B','C','D']].to_numpy()  # .as_matrix() was removed in pandas 1.0
t = trainDF.loc[:,['y']].values.ravel()
RFE(estimator=LogisticRegression(), n_features_to_select=3).fit_transform(v,t)
=>
array([[2, 0, 0],
[4, 0, 0],
[1, 0, 0],
...,
[2, 0, 0],
[1, 0, 0],
[3, 0, 0]])
You can use the fitted RFE object's attributes:
estimator = RFE(estimator=LogisticRegression(), n_features_to_select=3)
v_transform = estimator.fit_transform(v,t)
print(estimator.support_) # The mask of selected features.
print(estimator.ranking_) # The feature ranking
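A runnable sketch of mapping support_ back to column names (with synthetic stand-in data, since trainDF isn't shown in the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for trainDF's feature columns A-D and target y.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
cols = np.array(['A', 'B', 'C', 'D'])

selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=3)
selector.fit(X, y)

print(cols[selector.support_])   # names of the 3 selected features
print(cols[~selector.support_])  # name of the eliminated feature
```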