Configure PyCharm Debugger to Display Array/Tensor Shape?

When I'm debugging with PyCharm, I'd like the debugger to display the shapes of my NumPy arrays, JAX arrays, and PyTorch tensors. Instead, I see their values.
Is there a way to configure PyCharm's debugger so that the shapes of these multi-dimensional arrays are displayed?

I achieve this by hacking the member function torch.Tensor.__repr__, because the PyCharm debugger calls __repr__ to get the string representation of an object:
import torch
old_repr = torch.Tensor.__repr__  # keep the original so it can still be called
def tensor_info(tensor):
    # repr(tensor.shape) and repr(tensor.dtype) both start with 'torch.'; [6:] strips that prefix.
    return repr(tensor.shape)[6:] + ' ' + repr(tensor.dtype)[6:] + '#' + str(tensor.device) + '\n' + old_repr(tensor)
torch.Tensor.__repr__ = tensor_info
In the PyCharm debugger, you will see a representation like:
>>> print(torch.ones(3, 3))
Size([3, 3]) float32#cpu
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

The debugger just shows the objects in memory, so if you create an object representing the shape, you will see it in the debugger.
For example, for NumPy that object is numpy.array(...).shape.
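As a rough sketch of that idea, you could build a short string from the array's shape and dtype and keep it in a variable next to the array, or evaluate the helper in a watch expression. The names shape_summary and x_info are made up for illustration and are not PyCharm or NumPy API; the helper only assumes the object exposes .shape and .dtype:

```python
import numpy as np

def shape_summary(obj):
    """Return a short 'shape dtype' string for array-like objects (hypothetical helper)."""
    shape = getattr(obj, "shape", None)
    if shape is None:
        # Not array-like: fall back to the type name.
        return type(obj).__name__
    dtype = getattr(obj, "dtype", None)
    return f"{tuple(shape)} {dtype}"

x = np.zeros((3, 4), dtype=np.float32)
x_info = shape_summary(x)  # a plain string, shown as-is in the variables pane
print(x_info)  # (3, 4) float32
```

Because the helper relies on duck typing, it should also work for other array types that carry .shape and .dtype, though I have only reasoned about the NumPy case here.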

Related

How to do one hot encoding in tft using hardcoded values

I want to apply one hot encoding to my categorical features. I see how one can use tf.one_hot to do that, but one_hot accepts indices, so I'd need to map the tokens to indices. All of the examples I've found compute the vocab over the entire dataset. I don't want to do that, as I have a hard-coded dict of possible values. Something like:
CATEG = {
    'feature1': ['a', 'b', 'c'],
    'feature2': ['foo', 'bar']
}
I just need the preprocessing_fn to map the tokens to an index and then run it through tf.one_hot. How can I do that?
For example, tft.apply_vocabulary sounds like what I need but then I see that it takes a deferred_vocab_filename_tensor of type common_types.TemporaryAnalyzerOutputType? The description says:
The deferred vocab filename tensor as returned by tft.vocabulary, as long as the frequencies were not stored.
And I see that tft.vocabulary is again computing the vocab:
Computes The unique values taken by x, which can be a Tensor or CompositeTensor of any size. The unique values will be aggregated over all dimensions of x and all instances.
Why doesn't something simple like this exist?
The simplest option is probably to use tf.equal (the == operator on tensors) as follows:
import tensorflow as tf
CATEG = {
    'feature1': ['a', 'b', 'c'],
    'feature2': ['foo', 'bar']
}
tokens = tf.constant(CATEG['feature2'])
inputs = tf.constant(["foo", "foo", "bar", "none"])
# Broadcast tokens (shape [2, 1]) against inputs (shape [1, 4]): rows are tokens, columns are inputs.
onehot = tf.cast(tf.expand_dims(tokens, 1) == tf.expand_dims(inputs, 0), dtype=tf.float32)
print(onehot)
# [[1., 1., 0., 0.],
#  [0., 0., 1., 0.]]
Add batch dims if needed.
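If you would rather go through tf.one_hot as the question describes, a lookup table built from the hard-coded list is one possible route. This is only a sketch: make_one_hot_fn is a hypothetical helper name, and I have not checked how it behaves inside a tft preprocessing_fn. It uses tf.lookup.StaticHashTable to map tokens to indices, with -1 for unknown tokens, which tf.one_hot turns into an all-zero row:

```python
import tensorflow as tf

CATEG = {
    'feature1': ['a', 'b', 'c'],
    'feature2': ['foo', 'bar'],
}

def make_one_hot_fn(vocab):
    """Build a token -> one-hot function from a fixed vocabulary (hypothetical helper)."""
    keys = tf.constant(vocab)
    values = tf.range(len(vocab), dtype=tf.int64)
    # Unknown tokens map to -1, which tf.one_hot encodes as all zeros.
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys, values), default_value=-1)
    depth = len(vocab)

    def one_hot(tokens):
        return tf.one_hot(table.lookup(tokens), depth=depth, dtype=tf.float32)

    return one_hot

one_hot_feature2 = make_one_hot_fn(CATEG['feature2'])
result = one_hot_feature2(tf.constant(["foo", "foo", "bar", "none"]))
print(result)  # shape [4, 2]: one row per input, one column per token
```

Compared with the tf.equal version, the output here is laid out as [num_inputs, num_tokens], which is the more usual batch-first orientation.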

Astropy quantity in-place conversion

Is there a way to convert an astropy quantity to another set of units "in-place"? The to method always returns a copy so that's not so useful. Something like:
import astropy.units as u
data = [1, 2, 3]*u.g
data.convert_to('kg')
Both Pint and yt.units have in-place conversions:
from pint import UnitRegistry
u = UnitRegistry()
data = [1, 2, 3]*u.g
data.ito('kg')
and
from yt.units import g
data = [1, 2, 3]*g
data.convert_to_units('kg')
A cursory glance at the astropy docs and source code indicates that the answer is "no" but perhaps I'm missing something.
There are a few ways you can do it at the moment. Given your example:
>>> import astropy.units as u
>>> data = [1, 2, 3] * u.g
>>> data
<Quantity [1., 2., 3.] g>
You can do this:
>>> data.value * u.kg
<Quantity [1., 2., 3.] kg>
Or this:
>>> data * u.kg / data.unit
<Quantity [1., 2., 3.] kg>
Or this:
>>> data._unit = u.kg
>>> data
<Quantity [1., 2., 3.] kg>
None of these approaches copies the NumPy array, so they are OK performance-wise for many applications. Note, however, that as the outputs show, they only relabel the unit: the stored values are not rescaled, so 1 g becomes 1 kg rather than 0.001 kg.
I don't think there is a public method that changes the unit in place without reaching for the private data._unit attribute. This was discussed a bit (in the context of Column and Quantity objects), and I think the conclusion was that a set_unit method would be a useful addition, but it hasn't been implemented yet. So you could open an issue with that feature request.

Get specific elements from ndarray of ndarrays with shape (n,)

Given the ndarray:
# dtype=object is needed for ragged input in recent NumPy versions
A = np.array([np.array([1], dtype='f'),
              np.array([2, 3], dtype='f'),
              np.array([4, 5], dtype='f'),
              np.array([6], dtype='f'),
              np.array([7, 8, 9], dtype='f')], dtype=object)
which displays as:
A
array([array([ 1.], dtype=float32), array([ 2., 3.], dtype=float32),
       array([ 4., 5.], dtype=float32), array([ 6.], dtype=float32),
       array([ 7., 8., 9.], dtype=float32)], dtype=object)
I am trying to create a new array from the first elements of each "sub-array" of A. To show you what I mean, below is some code creating the array that I want using a loop. I would like to achieve the same thing but as efficiently as possible, since my array A is quite large (~50000 entries) and I need to do the operation many times.
B = np.zeros(len(A))
for i, val in enumerate(A):
    B[i] = val[0]
B
array([ 1., 2., 4., 6., 7.])
Here's an approach that concatenates all elements into a 1D array and then selects the first element of each sub-array by linear indexing. The implementation would look like this:
lens = np.array([len(item) for item in A])
out = np.concatenate(A)[np.append(0, lens[:-1].cumsum())]
The bottleneck would be the concatenation, but that might be offset if there is a huge number of sub-arrays with small lengths. So the efficiency depends on the format of the input array.
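Here is a self-contained version of that approach on the sample data, as a sketch. The np.fromiter variant at the end is my own addition rather than part of the answer, and I have not timed either one, so measure on your own data before choosing:

```python
import numpy as np

# Build the jagged object array element by element so that NumPy does not
# try to combine the rows into a regular 2-D array.
rows = [[1], [2, 3], [4, 5], [6], [7, 8, 9]]
A = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    A[i] = np.array(row, dtype='f')

# Concatenate-and-index approach from the answer above.
lens = np.array([len(item) for item in A])
starts = np.append(0, lens[:-1].cumsum())  # position of each row's first element
out = np.concatenate(A)[starts]

# Alternative (an assumption, not from the answer): pull the first elements
# through a generator without building the concatenated array.
out_iter = np.fromiter((row[0] for row in A), dtype=np.float32, count=len(A))

print(out)       # [1. 2. 4. 6. 7.]
print(out_iter)  # [1. 2. 4. 6. 7.]
```

For the sample data, starts works out to [0, 1, 3, 5, 6], which are the offsets of the first elements inside the flattened array.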
I suggest transforming your original jagged array of arrays into a single masked array:
B = np.ma.masked_all((len(A), max(map(len, A))))
for ii, row in enumerate(A):
    B[ii, :len(row)] = row
Now you have:
[[1.0 -- --]
 [2.0 3.0 --]
 [4.0 5.0 --]
 [6.0 -- --]
 [7.0 8.0 9.0]]
And you can get the first column this way:
B[:,0].data

Disable numpy fancy indexing and assignment?

This post identifies a "feature" that I would like to disable.
Current numpy behavior:
>>> a = arange(10)
>>> a[a>5] = arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3])
The reason it's a problem: say I wanted an array to have two different sets of values on either side of a breakpoint (e.g., for making a "broken power-law" or some other simple piecewise function). I might accidentally do something like this:
>>> x = empty(10)
>>> a = arange(10)
>>> x[a<=5] = 0 # this is fine
>>> x[a>5] = a**2 # this is not
# but what I really meant is this
>>> x[a>5] = a[a>5]**2
The first behavior, x[a>5] = a**2 yields something I would consider counterintuitive - the left side and right side shapes disagree and the right side is not scalar, but numpy lets me do this assignment. As pointed out on the other post, x[5:]=a**2 is not allowed.
So, my question: is there any way to make x[a>5] = a**2 raise an Exception instead of performing the assignment? I'm very worried that I have typos hiding in my code because I never before suspected this behavior.
I don't know of a way offhand to disable a core numpy feature. Instead of disabling the behavior you could try using np.select:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html
In [110]: x = np.empty(10)
In [111]: a = np.arange(10)
In [112]: x[a<=5] = 0
In [113]: x[a>5] = a**2
In [114]: x
Out[114]: array([ 0., 0., 0., 0., 0., 0., 0., 1., 4., 9.])
In [117]: condlist = [a<=5,a>5]
In [119]: choicelist=[0,a**2]
In [120]: x = np.select(condlist,choicelist)
In [121]: x
Out[121]: array([ 0, 0, 0, 0, 0, 0, 36, 49, 64, 81])
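If you specifically want an exception for the mismatched case, one option is to route masked assignments through a small checking helper. The sketch below uses a hypothetical function name, masked_assign, which is not part of NumPy; it accepts only a scalar or a 1-D array whose length equals the number of selected elements. As far as I know, recent NumPy releases raise a ValueError for this mismatch on their own, so the helper mainly matters on older versions or when you want the check to be explicit:

```python
import numpy as np

def masked_assign(target, mask, values):
    """Assign values to target[mask], insisting on a scalar or an exact-length 1-D array.

    Hypothetical helper for illustration, not part of NumPy.
    """
    values = np.asarray(values)
    n_selected = int(np.count_nonzero(mask))
    if values.ndim != 0 and values.shape != (n_selected,):
        raise ValueError(
            f"cannot assign values of shape {values.shape} "
            f"to {n_selected} selected elements"
        )
    target[mask] = values

a = np.arange(10)
x = np.empty(10)
masked_assign(x, a <= 5, 0)
masked_assign(x, a > 5, a[a > 5] ** 2)  # shapes agree, so this is accepted

try:
    masked_assign(x, a > 5, a ** 2)     # the typo from the question
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Building the whole array in one expression avoids masked assignment altogether.
y = np.where(a > 5, a ** 2, 0)
print(y)  # [ 0  0  0  0  0  0 36 49 64 81]
```

np.where plays the same role as np.select here when there are only two branches.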

How do I make set_printoptions(suppress=True) permanent?

In numpy there is a function that makes arrays print prettier.
set_printoptions(suppress = True)
In other words, instead of this:
array([[ 0.00000000e+00, -3.55271368e-16, 0.00000000e+00,
1.74443793e-16, 9.68149172e-17],
[ 5.08273978e-17, -4.42527959e-16, 1.57859836e-17,
1.35982590e-16, 5.59918137e-17],
[ 3.00000000e+00, 6.00000000e+00, 9.00000000e+00,
2.73835608e-16, 7.37061982e-17],
[ 2.00000000e+00, 4.00000000e+00, 6.00000000e+00,
4.50218574e-16, 2.87467529e-16],
[ 1.00000000e+00, 2.00000000e+00, 3.00000000e+00,
2.75582605e-16, 1.88929494e-16]])
You get this:
array([[ 0., -0., 0., 0., 0.],
[ 0., -0., 0., 0., 0.],
[ 3., 6., 9., 0., 0.],
[ 2., 4., 6., 0., 0.],
[ 1., 2., 3., 0., 0.]])
How do I make this setting permanent so it does this whenever I'm using IPython?
I added this to the main() function in ~/.ipython/ipy_user_conf.py:
from numpy import set_printoptions
set_printoptions(suppress = True)
and it seems to work.
In later IPython versions, run ipython profile create, then open ~\.ipython\profile_default\ipython_config.py and edit the following line to add the command:
c.InteractiveShellApp.exec_lines = [
    ...
    'import numpy as np',
    'np.set_printoptions(suppress=True)',
    ...
]
You can add those to your ipythonrc file (located in ~/.ipython on Unix). You'd need the lines:
import_mod numpy
execute numpy.set_printoptions(suppress = True)
You can also add it to a custom profile or use another configuration method:
http://ipython.scipy.org/doc/stable/html/config/customization.html
Well, one way to streamline it would be to create a wee module somewhere on your $PYTHONPATH, printopts say, containing:
import numpy
numpy.set_printoptions(suppress = True)
And then import that whenever you want to change the printing. You could also import numpy in your code as from printopts import numpy. That way you'd only need one import statement.
Cheers.
ADDENDUM: A solution for interactive use only is to set the $PYTHONSTARTUP environment variable to the path to the printopts.py file. The interpreter executes that file before anything else, when in interactive mode. Of course, then python will always load numpy, hurting start times.
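A sketch of what such a printopts.py could look like, assuming you also want the file to be harmless in environments where numpy is not installed (the try/except guard is my addition, not part of the suggestion above):

```python
# printopts.py: usable as a regular import or as the $PYTHONSTARTUP file.
try:
    import numpy
except ImportError:
    # numpy is not installed in this environment; skip the setup silently.
    numpy = None
else:
    numpy.set_printoptions(suppress=True)

# Quick check that the option took effect.
if numpy is not None:
    print(numpy.get_printoptions()['suppress'])
```

You can run the same numpy.get_printoptions() check in a fresh interactive session to confirm that the startup file was actually picked up.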
Having considered a little more, what I would do is create a module, say np.py containing
from numpy import *
set_printoptions(suppress=True)
Then always import np to get your modified version, for the reason in my comment below.
If you don't mind a hack, just add the set_printoptions() call to the numpy/__init__.py file, but you have to have write access to the numpy installation and remember to repeat the hack when you update python or numpy. I don't think this is a good idea.