Reading in single str column with loadtxt - numpy

I have the file called mda_bk-adds-gro.inp:
# -*- mode:python -*-
0.5, 0.5, 0.5, walp_fixed.gro
0.5, 0.5, 0.4, walp.gro
I think I'll read the numbers and the word separately. I've succeeded in parsing the numbers:
loadtxt('mda_bk-adds-gro.inp', comments='#', delimiter=',', usecols=(0,1,2))
But can't read in just words:
loadtxt('mda_bk-adds-gro.inp', comments='#', delimiter=',', dtype=[('fileName', '|S100')], usecols=(3))
it gives an error:
TypeError: 'int' object is not iterable
So my question is: how do I read the fourth column with loadtxt, given that the column is a string?

You get the TypeError because (3) is not a tuple, but just a parenthesized int-typed expression. Try usecols=(3,) instead.
See the comments on this issue for an explanation of why this is so.
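A minimal sketch of the fix, with the question's file inlined so it is self-contained (depending on the numpy version, loadtxt may keep the space after each comma, hence the strip):

```python
import io
import numpy as np

# In-memory copy of the question's mda_bk-adds-gro.inp
inp = io.StringIO(
    "# -*- mode:python -*-\n"
    "0.5, 0.5, 0.5, walp_fixed.gro\n"
    "0.5, 0.5, 0.4, walp.gro\n"
)

# usecols must be a sequence: (3,) is a one-element tuple, (3) is just an int
names = np.loadtxt(inp, comments='#', delimiter=',', dtype=str, usecols=(3,))
names = np.char.strip(names)  # drop stray whitespace around the delimiter
```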

Related

DataFrame.apply(func, raw=True) doesn't seem to take effect?

I am trying to hash together only a few columns of my dataframe df so I do
temp = df[['field1', 'field2']]
df["hash"] = temp.apply(lambda x: hash(x), raw=True, axis=1)
I set raw to True because the docs (I am using 0.22) say it will pass a numpy array instead of a Series, but even with raw=True I am getting a Series. Why?
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
results[i] = func(v)
File "/home/teto/mptcpanalyzer/mptcpanalyzer/data.py", line 190, in _hash_row
return hash(x)
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/generic.py", line 1045, in __hash__
' hashed'.format(self.__class__.__name__))
TypeError: ("'Series' objects are mutable, thus they cannot be hashed", 'occurred at index 1')
It's strange, as I can't reproduce your exact error (for me, raw=True indeed results in an np.ndarray being passed). In any case, neither a Series nor an np.ndarray is hashable. The following works, though:
temp.apply(lambda x: hash(tuple(x)), axis=1)
A tuple is hashable.
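Putting it together as a runnable sketch (the toy values are made up; the column names come from the question):

```python
import pandas as pd

# Toy frame standing in for the question's df
df = pd.DataFrame({'field1': [1, 2], 'field2': [3, 4]})
temp = df[['field1', 'field2']]  # double brackets select a sub-DataFrame

# hash() rejects both Series and ndarrays; converting each row to a
# tuple makes it hashable regardless of what raw= passes in
df['hash'] = temp.apply(lambda x: hash(tuple(x)), raw=True, axis=1)
```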

Position Randomisation by shuffle () in psychopy

I have 4 text stimuli which I want to randomise their locations.
I did this at the beginning of routine
Posi = ['[4.95,0]', '[-4.95,0]', '[0,4.95]', '[0,-4.95]']
shuffle(Posi)
Then, turning to the builder, I typed
$Posi[0], $Posi[1]
in the 'position' column, and so on for the 4 stimuli. I also set that to 'set every repeat'.
But I keep getting this
ValueError: could not convert string to float: [-4.95,0]
I don't understand how I should change the input, because there is no problem if I just plainly put [x,y] into position.
Thanks!
When you use those single quotes you are telling Python that you are creating a string, that is, a sequence of characters, not a number. Programming languages have types which say what kind of thing a value is: '0.44' is a string of characters, not a number.
>>> pos = [[0.2,0.0],[0.1,1]]
>>> pos[0]
[0.2, 0.0]
>>> pos[0][0]
0.2
>>> pos[0][0]+ 3.3
3.5
Produce a list of numerical coordinates, not strings
Like brittUWaterloo already stated, you are currently effectively creating a list of strings, not a list of lists (of coordinates), as you intended:
>>> pos = ['[4.95, 0]', '[-4.95, 0]', '[0, 4.95]', '[0, -4.95]']
>>> pos[0]
'[4.95, 0]'
>>> type(pos[0])
<class 'str'>
Note that I also changed the variable name and inserted spaces to produce more readable code that follows common coding style guidelines.
So, the first thing you need to do is remove the quotation marks to stop producing strings:
>>> pos = [[4.95, 0], [-4.95, 0], [0, 4.95], [0, -4.95]]
>>> pos[0]
[4.95, 0]
>>> type(pos[0])
<class 'list'>
Putting it to work in the Builder
Then, turning to the builder, I typed
$Posi[0], $Posi[1]
What you are trying to achieve here is, I believe, using the x, y coordinates of the very first element of the (shuffled) list of possible coordinates. I believe the current syntax is not fully correct; but let's have a closer look at what would potentially happen if it were:
>>> pos[0], pos[1]
([4.95, 0], [-4.95, 0])
This would produce two coordinate pairs (the first two of the shuffled list). That's not what you want: you want the x and y coordinates of the first pair only. To get the first coordinate pair, you would do (in "pure" Python):
>>> pos[0]
[4.95, 0]
Or, in the Builder, you would enter
$pos[0]
into the respective coordinates field.
Summary
So to sum this up, in your Code component you need to do:
pos = [[4.95, 0], [-4.95, 0], [0, 4.95], [0, -4.95]]
shuffle(pos)
And as coordinate of the Text components, you can then use
$pos[0]
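As a standalone sketch of the Code-component logic (in a Builder Code component, shuffle() is already available; here it is imported explicitly):

```python
from random import shuffle

# Numeric coordinate pairs, not strings
pos = [[4.95, 0], [-4.95, 0], [0, 4.95], [0, -4.95]]
shuffle(pos)

# pos[0] is now a random [x, y] pair; in the Builder, enter $pos[0]
# in the Position field with "set every repeat"
x, y = pos[0]
```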

matplotlib scatter plot: How to use the data= argument

The matplotlib documentation for scatter() states:
In addition to the above described arguments, this function can take a data keyword argument. If such a data argument is given, the following arguments are replaced by data[]:
All arguments with the following names: ‘s’, ‘color’, ‘y’, ‘c’, ‘linewidths’, ‘facecolor’, ‘facecolors’, ‘x’, ‘edgecolors’.
However, I cannot figure out how to get this to work.
The minimal example
import matplotlib.pyplot as plt
import numpy as np
data = np.random.random(size=(3, 2))
props = {'c': ['r', 'g', 'b'],
's': [50, 100, 20],
'edgecolor': ['b', 'g', 'r']}
plt.scatter(data[:, 0], data[:, 1], data=props)
plt.show()
produces a plot with the default colors and sizes instead of the supplied ones.
Has anyone used this functionality?
This seems to be an overlooked feature, added about two years ago. The release notes have a short example (https://matplotlib.org/users/prev_whats_new/whats_new_1.5.html#working-with-labeled-data-like-pandas-dataframes). Besides this question and a short blog post (https://tomaugspurger.github.io/modern-6-visualization.html), that's all I could find.
Basically, any dict-like object ("labeled data", as the docs call it) can be passed in the data argument, and plot parameters are then specified in terms of its keys. For example, you can create a structured array with fields a, b, and c:
coords = np.random.randn(250, 3).view(dtype=[('a', float), ('b', float), ('c', float)])
You would normally create a plot of a vs b using
pyplot.plot(coords['a'], coords['b'], 'x')
but using the data argument it can be done with
pyplot.plot('a', 'b', 'x', data=coords)
The label 'b' could be confused with a style string setting the line to blue, but the third argument clears up that ambiguity. It's not limited to x and y data either:
pyplot.scatter(x='a', y='b', c='c', data=coords)
will set the point color based on column 'c'.
It looks like this feature was added for pandas DataFrames, and handles them better than other objects. Additionally, it seems to be poorly documented and somewhat unstable (using x and y keyword arguments fails with the plot command but works fine with scatter, and the error messages are not helpful). That said, it gives a nice shorthand when the data you want to plot has labels.
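Applied to the question's minimal example, one way to make data= work is to put everything, including the coordinates, into the dict (a sketch; the key names x/y/color/size are arbitrary choices):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
xy = rng.random(size=(3, 2))

# String arguments are looked up as keys in `data`
d = {'x': xy[:, 0], 'y': xy[:, 1],
     'color': ['r', 'g', 'b'], 'size': [50, 100, 20]}
pc = plt.scatter('x', 'y', c='color', s='size', data=d)
```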
In reference to your example, I think the following does what you want:
plt.scatter(data[:, 0], data[:, 1], **props)
That bit in the docs is confusing to me, and looking at the source, scatter in axes/_axes.py seems to do nothing with this data argument; the remaining kwargs end up as arguments to a PathCollection. Maybe there is a bug there.
You could also set these parameters after the scatter call with the various setter methods on the returned PathCollection, e.g.:
pc = plt.scatter(data[:, 0], data[:, 1])
pc.set_sizes([500,100,200])

Cython: storing unicode in numpy array

I'm new to Cython, and I've been having a recurring problem involving unicode stored inside a numpy array.
Here's an example of the problem:
import numpy as np
cimport numpy as np
cpdef pass_array(np.ndarray[ndim=1,dtype=np.unicode] a):
pass
cpdef access_unicode_item(np.ndarray a):
cdef unicode item = a[0]
Example errors:
In [3]: unicode_array = np.array([u"array",u"of",u"unicode"],dtype=np.unicode)
In [4]: pass_array(unicode_array)
ValueError: Does not understand character buffer dtype format string ('w')
In [5]: access_unicode_item(unicode_array)
TypeError: Expected unicode, got numpy.unicode_
The problem seems to be that the values are not real unicode, but instead numpy.unicode_ . Is there a way to encode the values in the array as proper unicode (so that I can type individual items for use in cython code)?
In Py2.7
In [375]: arr=np.array([u"array",u"of",u"unicode"],dtype=np.unicode)
In [376]: arr
Out[376]:
array([u'array', u'of', u'unicode'],
dtype='<U7')
In [377]: arr.dtype
Out[377]: dtype('<U7')
In [378]: type(arr[0])
Out[378]: numpy.unicode_
In [379]: type(arr[0].item())
Out[379]: unicode
In general, x[0] returns an element of x as a numpy scalar subclass of the corresponding Python type. In this case np.unicode_ is a subclass of unicode.
In [384]: isinstance(arr[0],np.unicode_)
Out[384]: True
In [385]: isinstance(arr[0],unicode)
Out[385]: True
I think you'd encounter the same sort of issues between np.int32 and int. But I haven't worked enough with cython to be sure.
Where have you seen cython code that specifies a string (unicode or byte) dtype?
http://docs.cython.org/src/tutorial/numpy.html has expressions like
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
....
def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f):
The purpose of the [] part is to improve indexing efficiency.
What we need to do then is to type the contents of the ndarray objects. We do this with a special “buffer” syntax which must be told the datatype (first argument) and number of dimensions (“ndim” keyword-only argument, if not provided then one-dimensional is assumed).
I don't think np.unicode will help because it doesn't specify character length. The full string dtype has to include the number of characters, eg. <U7 in my example.
We need to find working examples which pass string arrays - either in the cython documentation or other SO cython questions.
For some operations, you could treat the unicode array as an array of int32.
In [397]: arr.nbytes
Out[397]: 84
3 strings × 7 chars/string × 4 bytes/char = 84 bytes
In [398]: arr.view(np.int32).reshape(-1,7)
Out[398]:
array([[ 97, 114, 114, 97, 121, 0, 0],
[111, 102, 0, 0, 0, 0, 0],
[117, 110, 105, 99, 111, 100, 101]])
Cython gives you the greatest speed improvement when you can bypass Python functions and methods. That would include bypassing much of the Python string and unicode functionality.
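For reference, the Python 3 analogue of the session above looks like this (np.str_ replaces numpy.unicode_, and .item() yields a plain str):

```python
import numpy as np

arr = np.array(["array", "of", "unicode"])  # dtype '<U7': fixed-width UCS-4
item = arr[0].item()                        # plain Python str, not np.str_

# Viewing the UCS-4 buffer as int32 exposes the raw code points,
# zero-padded to the common field width of 7 characters
codes = arr.view(np.int32).reshape(len(arr), -1)
```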

NumPy argmax and structured array error: expected a readable buffer object

I got the following error while using NumPy's argmax method. Could someone help me understand what happened?
import numpy as np
b = np.zeros(1, dtype={'names':['a','b'], 'formats': ['i4']*2})
b.argmax()
The error is
TypeError: expected a readable buffer object
While the following runs without a problem:
a = np.zeros(3)
a.argmax()
It seems the error is due to the structured array, but could anyone help explain the reason?
Your b is:
array([(0, 0)], dtype=[('a', '<i4'), ('b', '<i4')])
I get a different error message with argmax:
TypeError: Cannot cast array data from dtype([('a', '<i4'), ('b', '<i4')]) to dtype('V8') according to the rule 'safe'
But this works:
In [88]: b['a'].argmax()
Out[88]: 0
Generally you can't do math operations across the fields of a structured array. You can operate within each field (if it is numeric). Since the fields could be a mix of numbers, strings, and other objects, there's been no effort to handle special cases where such operations might make sense.
If you really must do operations across the fields, try a different view, e.g.:
In [94]: b.view('<i4').argmax()
Out[94]: 0
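A minimal sketch with distinguishable (made-up) values, showing argmax within one field and, via a view, across all fields:

```python
import numpy as np

# Same structured dtype as in the question
b = np.zeros(3, dtype=[('a', '<i4'), ('b', '<i4')])
b['a'] = [1, 5, 2]
b['b'] = [9, 0, 3]

# argmax works within a single (numeric) field
i = b['a'].argmax()

# To compare across fields, view the records as a plain i4 array;
# the fields are flattened in storage order: [1, 9, 5, 0, 2, 3]
flat = b.view('<i4')
j = flat.argmax()
```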