How can one assign an item contextualized Array to a positional? - raku

In Rakudo Perl 6, item or $ can be used to evaluate an expression in item context. See https://docs.perl6.org/routine/item
I am using a library that returns an item-contextualized Array. What is the proper way to remove the contextualization so it may be assigned to an @ variable?
For example:
my @a = $[<a b c>];
dd @a; # Outputs: Array @a = [["a", "b", "c"],]

Perl being Perl, there's more than one way to do it, such as
dd my @ = @$[<a b c>]; # via original array, equivalent to .list
dd my @ = $[<a b c>][]; # via original array, using zen slicing
dd my @ = |$[<a b c>]; # via intermediate Slip, equivalent to .Slip
dd my @ = $[<a b c>].flat; # via intermediate flattening Seq
The clearest solution is probably enforcing list context via @ or .list, and I'd avoid the .flat call because it has slightly different semantic connotations.
Just as a reminder, note that list assignment copies, but if you use one of the approaches that just pull the original array out of its scalar container, you could also use binding. In that case you wouldn't even need to manually decontainerize, as
dd my @ := $[<a b c>];
also gets you back your array as something list-y.

Flatten it:
my @a = $[<a b c>].flat;
dd @a; # Array @a = ["a", "b", "c"]

Related

Using CuPy/cuDF, remove elements that are not distant enough to their previous elements from a sorted list

The purpose of the code is similar to this post
I have a code that runs on CPUs:
import pandas as pd
def remove(s: pd.Series, thres: int):
    pivot = -float("inf")
    new_s = []
    for e in s:
        if (e - pivot) > thres:
            new_s.append(e)
            pivot = e
    return pd.Series(new_s)
# s is an ascending sequence
s = pd.Series([0,1,2,4,6,9])
remove(s, thres=3)
# Out:
# 0 0
# 1 4
# 2 9
# dtype: int64
The input is an ascending sequence with integer values.
This function removes every point s[i] whose distance to the most recently kept point is not greater than thres.
My problem is that CuPy/cuDF do not support loops, so I can't use GPUs to accelerate the code. I only have options like cumsum, diff, and mod that don't fit my needs.
Is there a function like scan in tensorflow?
The remove function can be reformulated in a form that is similar to prefix sum (scan):
For a sequence [a1, a2, a3], the output should be [a1, a1⨁a2, (a1⨁a2)⨁a3], and ⨁ is equal to
⨁ = lambda x, y: y if (y - x) > thres else x
Then set(output) is what I want.
Note that (a1⨁a2)⨁a3 != a1⨁(a2⨁a3); since the operator is not associative, parallel computation might not be feasible.
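As a quick CPU-side sanity check of this scan reformulation (not a GPU implementation), here is a minimal sketch using itertools.accumulate as the inclusive scan, with the example values from above:

from itertools import accumulate
import pandas as pd

thres = 3
s = pd.Series([0, 1, 2, 4, 6, 9])

# Inclusive scan with the non-associative operator:
# x is the last kept element so far, y is the next candidate.
op = lambda x, y: y if (y - x) > thres else x

scanned = list(accumulate(s, op))  # [0, 0, 0, 4, 4, 9]
kept = sorted(set(scanned))        # [0, 4, 9] -- matches remove(s, thres=3)
print(kept)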
Update
I found that there is already a function called Inclusive Scan; all I need is a Python wrapper.
Or is there any other way?

OrderedDict contain list of DataFrame

I don't understand why method 1 is fine but not the second ...
method 1
import pandas as pd
import collections
d = collections.OrderedDict([('key', []), ('key2', [])])
df = pd.DataFrame({'id': [1], 'test': ['ok']})
d['key'].append(df)
d
OrderedDict([('key', [ id test
0 1 ok]), ('key2', [])])
method 2
l = ['key', 'key2']
dl = collections.OrderedDict(zip(l, [[]]*len(l)))
dl
OrderedDict([('key', []), ('key2', [])])
dl['key'].append(df)
dl
OrderedDict([('key', [ id test
0 1 ok]), ('key2', [ id test
0 1 ok])])
dl == d True
The issue stems from creating the empty lists like so: [[]] * len(l). What this actually does is copy the reference to a single empty list multiple times, so you end up with a list of "empty lists" that all point to the same underlying object. Any in-place change you then make to that underlying list (such as append) is visible through every reference to it.
The same type issue comes about when assigning variables to one another:
a = []
b = a
# `a` and `b` both point to the same underlying object.
b.append(1) # inplace operation changes underlying object
print(a, b)
[1] [1]
To circumvent your issue, instead of using [[]] * len(l) you can use a generator expression or list comprehension to ensure a new empty list is created for each element of l:
collections.OrderedDict(zip(l, ([] for _ in l)))
Using the generator expression ([] for _ in l) creates a new empty list for every element of l instead of copying the reference to a single empty list. The easiest way to check this is to compare object ids with the id function. Here we'll compare your original method to the new one:
# The ids come out the same, indicating that both elements are references to the same underlying list
>>> [id(x) for x in [[]] * len(l)]
[2746221080960, 2746221080960]
# The ids come out different, indicating that they point to different underlying lists
>>> [id(x) for x in ([] for _ in l)]
[2746259049600, 2746259213760]
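Putting the fix together, a minimal end-to-end sketch (using the same l and df as above):

import collections
import pandas as pd

l = ['key', 'key2']
df = pd.DataFrame({'id': [1], 'test': ['ok']})

# each key gets its own, independent empty list
dl = collections.OrderedDict(zip(l, ([] for _ in l)))
dl['key'].append(df)

print(len(dl['key']))   # 1 -- only 'key' was appended to
print(len(dl['key2']))  # 0 -- 'key2' is unaffected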

How to obtain a matrix by adding one more new row vector within an iteration?

I am generating arrays (technically they are row vectors) with a for-loop. a, b, c ... are the outputs.
Can I add the new array to the old ones together to form a matrix?
import numpy as np
# just for example:
a = np.asarray([2,5,8,10])
b = np.asarray([1,2,3,4])
c = np.asarray([2,3,4,5])
... ... ... ... ...
I have tried ab = np.stack((a, b)), and that works. But my idea is to keep adding a new row to the existing matrix in each loop iteration, and with abc = np.stack((ab, c)) there is an error: ValueError: all input arrays must have the same shape.
Can anyone tell me how I could add another vector to an already existing matrix? I couldn't find a perfect answer in this forum.
np.stack won't work here: it can only stack arrays with the same shape.
One possible solution is to use np.concatenate((original, to_append), axis=0) each time, keeping in mind that to_append has to be 2-D (e.g. c[None, :]) so it matches the matrix. Check the docs.
You can also try using np.append.
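For illustration, a hedged sketch of that grow-by-concatenation approach with the a, b, c from the question (note the new row is reshaped to 2-D before concatenating):

import numpy as np

a = np.asarray([2, 5, 8, 10])
b = np.asarray([1, 2, 3, 4])
c = np.asarray([2, 3, 4, 5])

m = np.stack((a, b))                         # shape (2, 4)
m = np.concatenate((m, c[None, :]), axis=0)  # append c as a new row -> shape (3, 4)
print(m.shape)  # (3, 4)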
Thanks to the ideas from everybody, the best solution to this problem is to append each ndarray (or list) to a Python list during the iteration and convert that list to a matrix with np.asarray at the end.
a = np.asarray([2,5,8,10]) # or a = [2,5,8,10]
b = np.asarray([1,2,3,4]) # b = [1,2,3,4]
c = np.asarray([2,3,4,5]) # c = [2,3,4,5]
... ...
l1 = []
l1.append(a)
l1.append(b)
l1.append(c)
... ...
l1 doesn't have to be empty; however, the elements that l1 already contains should be the same type as a, b, c.
For example, the difference between l1 = [1,1,1,1] and l1 = [[1,1,1,1]] is huge in this case.
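A minimal sketch of that append-then-convert pattern inside a loop (the row built in each iteration is just a stand-in for whatever actually produces a, b, c):

import numpy as np

rows = []
for i in range(3):
    row = np.asarray([i, i + 1, i + 2, i + 3])  # stand-in for the per-iteration row vector
    rows.append(row)

matrix = np.asarray(rows)  # list of equal-length rows becomes a 2-D array
print(matrix.shape)        # (3, 4)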

Numpy index array of unknown dimensions?

I need to compare a bunch of numpy arrays with different dimensions, say:
a = np.array([1,2,3])
b = np.array([1,2,3],[4,5,6])
assert(a == b[0])
How can I do this if I do not know the shape of either a or b, beyond the fact that
len(shape(a)) == len(shape(b)) - 1
and I also do not know which dimension to skip from b? I'd like to use np.index_exp, but that does not seem to help me ...
def compare_arrays(a, b, skip_row):
    u = np.index_exp[ ... ]
    assert(a[:] == b[u])
Edit
Or to put it otherwise: I want to construct the slicing expression when I know the shape of the array and the dimension I want to skip. How do I dynamically create the np.index_exp if I know the number of dimensions and the positions where to put ":" and where to put "0"?
I was just looking at the code for apply_along_axis and apply_over_axis, studying how they construct indexing objects.
Lets make a 4d array:
In [355]: b=np.ones((2,3,4,3),int)
Make a list of slices (using list * replicate)
In [356]: ind=[slice(None)]*b.ndim
In [357]: b[ind].shape # same as b[:,:,:,:]
Out[357]: (2, 3, 4, 3)
In [358]: ind[2]=2 # replace one slice with index
In [359]: b[ind].shape # a slice, indexing on the third dim
Out[359]: (2, 3, 3)
Or with your example
In [361]: b = np.array([1,2,3],[4,5,6]) # missing []
...
TypeError: data type not understood
In [362]: b = np.array([[1,2,3],[4,5,6]])
In [366]: ind=[slice(None)]*b.ndim
In [367]: ind[0]=0
In [368]: a==b[ind]
Out[368]: array([ True, True, True], dtype=bool)
This indexing is basically the same as np.take, but the same idea can be extended to other cases.
I don't quite follow your questions about the use of :. Note that when building an indexing list I use slice(None). The interpreter translates all indexing : into slice objects: [start:stop:step] => slice(start, stop, step).
Usually you don't need to use a[:]==b[0]; a==b[0] is sufficient. With lists alist[:] makes a copy, with arrays it does nothing (unless used on the RHS, a[:]=...).
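A self-contained sketch of the same idea, wrapped in a hypothetical helper (take_index is not a NumPy function, just a name for this pattern); it converts the index list to a tuple, which newer NumPy versions require:

import numpy as np

def take_index(b, axis, pos):
    # slice(None) everywhere (i.e. ':'), except the fixed position on `axis`
    ind = [slice(None)] * b.ndim
    ind[axis] = pos
    return b[tuple(ind)]

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])
print(take_index(b, axis=0, pos=0))               # [1 2 3]
print((a == take_index(b, axis=0, pos=0)).all())  # True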

How to get pandas.read_csv not to perform any conversions?

For example, the values in '/tmp/test.csv' (namely, 01, 02, 03) are meant to represent strings that happen to match /^\d+$/, as opposed to integers:
In [10]: print open('/tmp/test.csv').read()
A,B,C
01,02,03
By default, pandas.read_csv converts these values to integers:
In [11]: import pandas
In [12]: pandas.read_csv('/tmp/test.csv')
Out[12]:
A B C
0 1 2 3
I want to tell pandas.read_csv to leave all these values alone. I.e., perform no conversions whatsoever. Furthermore, I want this "please do nothing" directive to be applied across-the-board, without my having to specify any column names or numbers.
I tried this, which achieved nothing:
In [13]: import csv
In [14]: pandas.read_csv('/tmp/test.csv', quoting=csv.QUOTE_ALL)
Out[14]:
A B C
0 1 2 3
The only thing that worked was to define a big ol' ConstantDict class, and use an instance of it that always returns the identity function (lambda x: x) as the value for the converters parameter, and thereby trick pandas.read_csv into doing nothing:
In [15]: %cpaste
class ConstantDict(dict):
    def __init__(self, value):
        self.__value = value
    def get(self, *args):
        return self.__value
--
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
::::::
In [16]: pandas.read_csv('/tmp/test.csv', converters=ConstantDict(lambda x: x))
Out[16]:
A B C
0 01 02 03
That's a lot of gymnastics to get such a simple "please do nothing" request across. (It would be even more gymnastics if I were to make ConstantDict bullet-proof.)
Isn't there a simpler way to achieve this?
df = pd.read_csv('temp.csv', dtype=str)
From the docs:
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32} (Unsupported with engine=’python’). Use str or object to preserve and not interpret dtype.
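A quick way to see the effect (using io.StringIO in place of the real file, purely to keep the demo self-contained):

import io
import pandas as pd

csv_text = "A,B,C\n01,02,03\n"

# dtype=str keeps every column as strings, so the leading zeros survive
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df)
#     A   B   C
# 0  01  02  03
print(df.dtypes)  # all columns are object (i.e. strings)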