Add only new (distinct) elements to an array in Ruby (Rails)?

I want to create an array in Rails that contains every value from two columns, each value appearing just once. For example, column "A" holds {1,5,7,1,7} and column "B" holds {3,2,3,1,4}.
If I just wanted an array of the distinct values of "A", I would write:
Model.uniq.pluck(:A)
and get {1,5,7}.
Is there a way in Rails to do the same thing with two columns, so that every value contained in either column appears exactly once in the result? (Here it would be {1,5,7,3,2,4}.)
Thanks for the help!

Yup, pass multiple column names to pluck:
Model.pluck(:A, :B)
#=> [[1, 3], [5, 2], [7, 3], [1, 1], [7, 4]]
But of course you want the values together and uniqued so:
Model.pluck(:A, :B).flatten.uniq
#=> [1, 3, 5, 2, 7, 4]
Note that Model.uniq.pluck(:A, :B).flatten won't do it on its own, since uniq only makes the rows distinct (i.e. the combinations of A and B), so you'd still have to uniq again after flattening. (In Rails 5+, use Model.distinct instead of Model.uniq.)

records = []
Model.all.each { |e| records << [e.A, e.B] }
uniq_records = records.flatten.uniq
Hope this helps. Thanks!


Sorting an array based on one column, then based on a second column

I would like to sort an array based on one column and then, for rows whose values in that column are equal, sort them based on a second column. For example, suppose that I have the array:
a = np.array([[0,1,1],[0,3,1],[1,7,2],[0,2,1]])
I can sort it by column 0 using:
sorted_array = a[np.argsort(a[:, 0])]
however, I want rows that have equal values in column [0] to be sorted by column [1], so my result would look like:
desired_result = np.array([[0,1,1],[0,2,1],[0,3,1],[1,7,2]])
What is the best way to achieve that? Thanks.
You can sort the rows as tuples, then convert back to a numpy array:
out = np.array(sorted(map(tuple,a)))
Output:
array([[0, 1, 1],
       [0, 2, 1],
       [0, 3, 1],
       [1, 7, 2]])
First sort the array by the secondary column, then sort by the primary column, making sure to use a stable sorting method for the second pass.
sorted_array = a[np.argsort(a[:, 1])]
sorted_array = sorted_array[np.argsort(sorted_array[:, 0], kind='stable')]
Or you can use np.lexsort, which treats its last key as the primary sort key:
sorted_array = a[np.lexsort((a[:,1], a[:, 0])), :]
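A self-contained check of both approaches on the example array from the question:

```python
import numpy as np

a = np.array([[0, 1, 1], [0, 3, 1], [1, 7, 2], [0, 2, 1]])

# Two-pass stable sort: secondary column first, then primary column.
by_col1 = a[np.argsort(a[:, 1])]
two_pass = by_col1[np.argsort(by_col1[:, 0], kind='stable')]

# Single call: lexsort gives its *last* key the highest priority.
lex = a[np.lexsort((a[:, 1], a[:, 0]))]

print(two_pass)
print(lex)
# Both print:
# [[0 1 1]
#  [0 2 1]
#  [0 3 1]
#  [1 7 2]]
```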

Numpy, how to retrieve sub-array of array (specific indices)?

I have an array:
>>> arr1 = np.array([[1,2,3], [4,5,6], [7,8,9]])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
I want to retrieve a list (or 1d-array) of elements of this array by giving a list of their indices, like so:
indices = [[0,0], [0,2], [2,0]]
print(arr1[indices])
# result
[1,6,7]
But it does not work. I have been looking for a solution for a while, but I only found ways to select per row and/or per column, not per specific index pairs.
Does anyone have an idea?
Cheers,
Aymeric
First make indices an array instead of a nested list:
indices = np.array([[0,0], [0,2], [2,0]])
Then index the first dimension of arr1 with the first column of indices, and the second dimension with the second column:
arr1[indices[:,0], indices[:,1]]
It gives array([1, 3, 7]), which is correct; the [1, 6, 7] in your example output is probably a typo, since arr1[0, 2] is 3.
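A runnable version of the above; as a variant, `tuple(indices.T)` packs both index arrays into a single expression:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
indices = np.array([[0, 0], [0, 2], [2, 0]])

# Pair row indices with column indices element-wise.
picked = arr1[indices[:, 0], indices[:, 1]]
print(picked)  # [1 3 7]

# Equivalent: transpose to (rows, cols) and unpack as a tuple.
picked2 = arr1[tuple(indices.T)]
```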

Find records having at least one element of a given array in an array column

I use PG's array type to store some integers in a Order table:
Order
  id: 1
  array_column: [1, 2, 3]

  id: 2
  array_column: [3, 4, 5]
And I'd like to have a query returning all Orders having at least one element of a given array (let's say [3]) in array_column.
So for [3], it should return both orders since they both have 3 in array_column. For [4, 5], it should only return the second order since the first one doesn't have any element in common, and for [9, 10, 49], it shouldn't return anything.
How can I achieve this with ActiveRecord? If it's not feasible, how can I do it with a plain SQL query?

Numpy Indexing Behavior

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In this example, say I have a 2D array A which is 100x10, and another array B, a 100-element 1D array of values between 0 and 9 (indices into A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:, B] and A[range(A.shape[0]), B] should return the same thing, but they don't. How is : different from range(A.shape[0]), which just creates the list of indices [0, 1, ..., A.shape[0] - 1]?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
If instead I use a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3],X[1,2],X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I gave it a column vector to index rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]),
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
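The same distinction on a small, self-contained scale (re-creating the X from above): a slice in the first axis keeps every row, while an explicit row array of the same length as the column array pairs up with it element-wise.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
cols = np.array([3, 2, 1])

# Slice + index array: all rows, the listed columns -> shape (3, 3).
print(X[:, cols].shape)  # (3, 3)

# Two index arrays of equal length: paired element-wise -> shape (3,).
print(X[np.arange(3), cols])  # [3 6 9]

# A column vector of row indices broadcasts against cols -> (3, 3) again.
print(X[np.arange(3)[:, None], cols].shape)  # (3, 3)
```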

Numpy remove rows with same column values

How do I remove rows from ndarray arrays which have the same nth column value?
For eg,
a = np.array([[1, 3, 4],
              [1, 3, 4],
              [1, 3, 5]])
And I want the rows to be unique by their third column; here, only the [1, 3, 5] row should be left.
numpy.unique does not do it: it checks for uniqueness across every column, and I can't specify the column by which to check uniqueness.
How can I do this efficiently for a thousand-plus rows?
Thank you.
You could try a combination of bincount, nonzero and in1d:
import numpy as np

a = np.array([[1, 3, 4],
              [1, 3, 4],
              [1, 3, 5]])

# A tuple containing the values which occur exactly once in column 3
# (note: bincount requires non-negative integers)
unique_in_column = (np.bincount(a[:, 2]) == 1).nonzero()
unique_index = np.in1d(a[:, 2], unique_in_column[0])
unique_a = a[unique_index]
This should do the trick. However, I'm not sure how this method scales with 1000+ rows.
I had done this finally:
repeatdict = {}
todel = []
for i, row in enumerate(kplist):
    if repeatdict.get(row[2], 0):
        todel.append(i)
    else:
        repeatdict[row[2]] = 1
kplist = np.delete(kplist, todel, axis=0)
Basically, I iterated over the list, storing the values of the third column; if the value of the current row is already in the repeatdict dict, that row is marked for deletion by storing its index in the todel list.
Then we get rid of the unwanted rows by calling np.delete with the list of all row indexes we want to delete.
Also, I'm not marking my own answer as accepted, because I know there's probably a better way to do this with just numpy magic.
I'll wait.
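For the record, one vectorized route (a sketch): np.unique with return_counts finds the third-column values that occur exactly once, and np.isin keeps the matching rows. Note that, like the bincount answer, this keeps only rows whose value is unrepeated; the loop above instead keeps the first occurrence of each value.

```python
import numpy as np

a = np.array([[1, 3, 4],
              [1, 3, 4],
              [1, 3, 5]])

# Values in column 2 and how often each occurs.
values, counts = np.unique(a[:, 2], return_counts=True)

# Keep rows whose third-column value occurs exactly once.
unique_a = a[np.isin(a[:, 2], values[counts == 1])]
print(unique_a)  # [[1 3 5]]
```

To keep the first occurrence of every value instead, np.unique(a[:, 2], return_index=True) gives the row indices to retain.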