How to remove duplicate items in a list (Raku) - raku

FAQ: In Raku, how do you remove duplicates from a list so that only unique values remain?
my $arr = [1, 2, 3, 2, 3, 1, 1, 0];
# desired output [1, 2, 3, 0]

Use the built-in unique
@arr.unique # (1 2 3 0)
Use a Hash (also known as a map or dictionary)
my %unique = map {$_ => 1}, @arr;
%unique.keys; # (0 1 2 3) do not rely on order
Use a Set: same method as before, but in one line and optimized by the dev team
set(@arr).keys
Links:
Answer on Rosetta Code
Hash solution on Think Perl 6
Same question for Perl and Python: always the same methods, a Hash or a Set
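For comparison, here is a minimal Python sketch of the same two approaches (a dictionary and a set). The variable names are illustrative, and it relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
arr = [1, 2, 3, 2, 3, 1, 1, 0]

# Dictionary approach: keys are unique, and dicts keep insertion order
# (Python 3.7+), so the first occurrence of each value stays in place.
unique_ordered = list(dict.fromkeys(arr))
print(unique_ordered)  # [1, 2, 3, 0]

# Set approach: also unique, but do not rely on the iteration order.
unique_set = set(arr)
print(sorted(unique_set))  # [0, 1, 2, 3]
```

The dictionary version is the one to pick when the original order matters, which mirrors the difference between unique and the Hash or Set solutions above.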

Related

Initialize a list in cells in specific indexes (the indexes are in a list)

I have a list of row indexes, and at each of them I need to initialize a list in a specific column. I tried this:
index = [0, 1, 2, 3, 4]
dataframe.at[index, 'column_x'] = [] * len(index)
which resulted in the error message:
pandas.errors.InvalidIndexError: Int64Index([0, 1, 2, 3, 4], dtype='int64')
I tried using loc and iloc instead of at, which also resulted in errors. I couldn't find relevant solutions.
Any suggestions will be welcomed.
Thanks!
You can create a Series of empty lists, blank out the target rows with mask, then use combine_first to fill those rows from the Series:
sr = pd.Series([[]] * len(df))
df['column_x'] = df['column_x'].mask(df.index.isin(index)).combine_first(sr)
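Two details of list multiplication are worth checking here, and plain Python is enough to see them. The sketch below is only an illustration with made-up names: it shows that [] * n is always an empty list (so the original attempt assigned nothing useful), and that [[]] * n repeats one shared inner list rather than creating n separate ones.

```python
n = 3

# Multiplying an empty list yields an empty list, whatever n is.
empty = [] * n
print(empty)  # []

# [[]] * n repeats one and the same inner list n times.
shared = [[]] * n
shared[0].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# A comprehension builds a separate list for each position.
independent = [[] for _ in range(n)]
independent[0].append("x")
print(independent)  # [['x'], [], []]
```

If the lists in the column will later be modified in place, building the Series from a comprehension such as [[] for _ in range(len(df))] is one way to avoid the sharing.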

What does the [1] do when using .where()?

I'm practicing on a Data Cleaning Kaggle exercise.
In the parsing dates example, I can't figure out what the [1] does at the end of the line that computes indices.
Thanks.
# Finding indices corresponding to rows in different date format
indices = np.where([date_lengths == 24])[1]
print('Indices with corrupted data:', indices)
earthquakes.loc[indices]
As described in the documentation, numpy.where called with a single argument is equivalent to calling np.asarray([date_lengths == 24]).nonzero().
numpy.nonzero returns a tuple with one array per dimension of the input, each holding the indices of the non-zero values along that dimension.
>>> np.nonzero([1,0,2,0])
(array([0, 2]),)
Indexing with [1] selects the second element of that tuple (i.e. the indices along the second dimension). Because the input was wrapped in […], it is a 2D array with a single row, so this is equivalent to doing:
np.where(date_lengths == 24)[0]
>>> np.nonzero([1,0,2,0])[0]
array([0, 2])
It is an artefact of the extra [] around the condition. For example:
a = np.arange(10)
Finding, for example, the indices where a > 3 can be done like this:
np.where(a > 3)
This outputs a tuple containing a single array:
(array([4, 5, 6, 7, 8, 9]),)
So the indices can be obtained as
indices = np.where(a > 3)[0]
In your case, the condition is between [], which is unnecessary, but still works.
np.where([a > 3])
returns a tuple whose first element is an array of zeros (the row indices) and whose second element is the array of indices you want:
(array([0, 0, 0, 0, 0, 0]), array([4, 5, 6, 7, 8, 9]))
so the indices are obtained as
indices = np.where([a > 3])[1]

Sort one list from another list in TensorFlow

I have two tf.Tensors A: [x0, x1, x2, x3, x4] and B: [2, 2, 1, 3, 2]. I would like to sort A using B.
Basically I would like to do the following, but using only TF operators:
list1, list2 = zip(*sorted(zip(list1, list2)))
I tried tf.sort() with tf.stack, but it seems to sort each dimension independently. I think I need to use tf.argsort, similarly to this answer, Sort array's rows by another array in Python, but the indexing fails because tensor indexing does not seem to be supported.
I think I found the solution:
list1 = [2, 2, 1, 3, 2]
list2 = [0, 1, 2, 3, 4]
ids = tf.argsort(list1)
out = tf.gather(list2, ids) # [2, 0, 1, 4, 3]
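To see what the argsort and gather steps compute, here is a plain-Python sketch that does not use TensorFlow. The helpers argsort and gather are illustrative stand-ins written for this example, not the TF functions, and the tie order shown assumes a stable sort.

```python
def argsort(values):
    # Indices that would sort `values`; sorted() is stable, so equal
    # values keep their original relative order.
    return sorted(range(len(values)), key=lambda idx: values[idx])


def gather(values, indices):
    # Pick values[i] for each i in indices, in that order.
    return [values[i] for i in indices]


list1 = [2, 2, 1, 3, 2]
list2 = [0, 1, 2, 3, 4]

ids = argsort(list1)
print(ids)                 # [2, 0, 1, 4, 3]
print(gather(list2, ids))  # [2, 0, 1, 4, 3]
print(gather(list1, ids))  # [1, 2, 2, 2, 3]
```

Applying the same index list to both inputs is what keeps the pairs aligned, which is the effect of the zip/sorted one-liner in the question.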

Python 3.x: How to access the element before the current element i in a for loop?

a = [1,2,3,4,5]
for i in a:
    list1.append(i)
    list1.append(i-2)  # `i-2` is not functioning, why?
For example, say I am now at the index of element 4.
i is not an index. It is the element of the list itself. When you say you are now at the index of element 4, you actually have the element 4, not its index, so you cannot treat it like an index.
Python's for loop is an iterator-based loop. It is used to step through the items of lists, strings, etc.
The code:
a = [1,2,3,4,5]
list1 = []
for i in a:
    print(i)
    list1.append(i)
    list1.append(i-2)
print(list1)
will produce the below output:
1
2
3
4
5
[1, -1, 2, 0, 3, 1, 4, 2, 5, 3]
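If the goal is to reach the previous element rather than subtract from the current value, one option is to iterate over indices as well. The following is a minimal sketch using enumerate and zip; the names idx, item and pairs are illustrative.

```python
a = [1, 2, 3, 4, 5]

# enumerate yields (index, element) pairs, so the index is available
# alongside the element.
for idx, item in enumerate(a):
    if idx > 0:  # the first element has no predecessor
        print(item, "comes after", a[idx - 1])

# Alternatively, pair each element with its predecessor using zip.
pairs = list(zip(a[1:], a))
print(pairs)  # [(2, 1), (3, 2), (4, 3), (5, 4)]
```

The idx > 0 guard matters because a[-1] is valid in Python and would silently return the last element instead of raising an error.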

Get indices for values of one array in another array

I have two 1D-arrays containing the same set of values, but in a different (random) order. I want to find the list of indices, which reorders one array according to the other one. For example, my 2 arrays are:
ref = numpy.array([5,3,1,2,3,4])
new = numpy.array([3,2,4,5,3,1])
and I want the list order for which new[order] == ref.
My current idea is:
def find(val):
    return numpy.argmin(numpy.absolute(ref-val))
order = sorted(range(new.size), key=lambda x:find(new[x]))
However, this only works as long as no values are repeated. In my example 3 appears twice, and I get new[order] = [5 3 3 1 2 4]. The second 3 is placed directly after the first one, because my function find() does not track which 3 I am currently looking for.
So I could add something to deal with this, but I have a feeling there might be a better solution out there. Maybe in some library (NumPy or SciPy)?
Edit about the duplicate: The linked solution assumes that the arrays are ordered or, for the "unordered" solution, returns duplicate indices. I need each index to appear only once in order. Which of the equal values comes first, however, is not important (nor can it be determined from the data provided).
What I get with sort_idx = A.argsort(); order = sort_idx[np.searchsorted(A,B,sorter = sort_idx)] is: [3, 0, 5, 1, 0, 2]. But what I am looking for is [3, 0, 5, 1, 4, 2].
Given ref and new, which are shuffled versions of each other, we can get unique indices that reorder new into ref by using the sorted version of both arrays and the invertibility of np.argsort.
Start with:
i = np.argsort(ref)
j = np.argsort(new)
Now ref[i] and new[j] both give the sorted version of the arrays, which is the same for both. You can invert the first sort by doing:
k = np.argsort(i)
Now ref is just new[j][k], or new[j[k]]. Since all the operations are shuffles using unique indices, the final index j[k] is unique as well. j[k] can be computed in one step with
order = np.argsort(new)[np.argsort(np.argsort(ref))]
From your original example:
>>> ref = np.array([5, 3, 1, 2, 3, 4])
>>> new = np.array([3, 2, 4, 5, 3, 1])
>>> order = np.argsort(new)[np.argsort(np.argsort(ref))]
>>> order
array([3, 0, 5, 1, 4, 2])
>>> new[order] # Should give ref
array([5, 3, 1, 2, 3, 4])
This is probably not any faster than the more general solutions to the similar question on SO, but it does guarantee unique indices as you requested. A further optimization would be to replace np.argsort(i) with something like the argsort_unique function in this answer. I would go one step further and just compute the inverse of the sort:
def inverse_argsort(a):
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv
order = np.argsort(new)[inverse_argsort(ref)]
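The same composition can be checked without NumPy. The sketch below is a plain-Python illustration, not the answer's code: argsort and inverse_permutation are hypothetical helpers written for this example, and the exact indices chosen for the repeated 3 assume a stable sort.

```python
def argsort(values):
    # Indices that would sort `values` (stable for equal values).
    return sorted(range(len(values)), key=lambda idx: values[idx])


def inverse_permutation(perm):
    # If perm[position] == target, then inv[target] == position.
    inv = [0] * len(perm)
    for position, target in enumerate(perm):
        inv[target] = position
    return inv


ref = [5, 3, 1, 2, 3, 4]
new = [3, 2, 4, 5, 3, 1]

j = argsort(new)                       # sorts new
k = inverse_permutation(argsort(ref))  # undoes the sort of ref
order = [j[pos] for pos in k]          # j[k]

print(order)                                   # [3, 0, 5, 1, 4, 2]
print([new[idx] for idx in order] == ref)      # True
print(sorted(order) == list(range(len(ref))))  # True: each index once
```

Building the inverse with a single pass, as inverse_permutation does, corresponds to the inv[fwd] = np.arange(fwd.size) assignment and avoids a second sort.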