SQLAlchemy dynamic where clause

I have an array of dictionaries, each of which maps a key to an array of values. The values of each dictionary are the conditions for an update where clause. Since the length of each array in the dictionaries is variable, I need to be able to create the where clause dynamically.
I'd like to do something like below.
sqlAlcUpdateList = []
indexHash = [{1: [1, 6, 11]}, {2: [7, 12]}, {3: [3, 8, 13, 74]}]
for entry in indexHash:
    for key, values in entry.items():
        # in_() matches any id in the list, so no explicit or_() is needed
        stmt = xtable.update().values(ykey=key).where(xtable.c.id.in_(values))
        sqlAlcUpdateList.append(stmt)
for sqlAlcCommand in sqlAlcUpdateList:
    conn.execute(sqlAlcCommand)
I know this could be split into multiple update commands but I would like to create one command.

I don't think there is any reason to prefer a single statement. You're assigning different values to different rows, so I think they are separate actions. But if someone can correct me, I'd like to know how to do it!
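For what it's worth, a single statement is possible in plain SQL by putting a CASE expression in the SET clause. Below is a minimal sketch of that idea using the standard-library sqlite3 module rather than SQLAlchemy, so that it runs on its own. The table layout, the flattened index_hash dictionary, and the build_single_update helper are assumptions made for illustration, not part of the original question. SQLAlchemy also has a case() construct that can express the same thing.

```python
import sqlite3

# Hypothetical mapping: new ykey value -> ids of the rows that should receive it.
index_hash = {1: [1, 6, 11], 2: [7, 12], 3: [3, 8, 13, 74]}


def build_single_update(mapping):
    """Return (sql, params) for one UPDATE that picks the value per row via CASE."""
    whens, params, all_ids = [], [], []
    for key, ids in mapping.items():
        placeholders = ", ".join("?" for _ in ids)
        whens.append(f"WHEN id IN ({placeholders}) THEN ?")
        params.extend(ids)   # parameters for the IN (...) list
        params.append(key)   # parameter for the THEN value
        all_ids.extend(ids)
    where = ", ".join("?" for _ in all_ids)
    sql = (
        f"UPDATE xtable SET ykey = CASE {' '.join(whens)} ELSE ykey END "
        f"WHERE id IN ({where})"
    )
    return sql, params + all_ids


# Throwaway in-memory table, just to exercise the statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xtable (id INTEGER PRIMARY KEY, ykey INTEGER)")
conn.executemany(
    "INSERT INTO xtable (id, ykey) VALUES (?, 0)", [(i,) for i in range(1, 80)]
)

sql, params = build_single_update(index_hash)
conn.execute(sql, params)

# Rows that were touched by the update, as {id: ykey}.
rows = dict(conn.execute("SELECT id, ykey FROM xtable WHERE ykey != 0"))
```

Whether this is actually better than several small UPDATE statements depends on the database and the number of ids, so treat it as an option rather than a recommendation.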

Rearranging numpy arrays

Unfortunately, I was not able to find a duplicate of my question, although I am sure this is a problem that has been solved before.
I have a numpy array with a certain set of indices, eg.
ind1 = np.array([1, 3, 5, 7])
With these indices, I can filter some values from another array. Let's call this other array rows. As an example, I can retrieve
rows[ind1] = [1, 10, 20, 15]
The order of rows[ind1] must not be changed in the following.
I have another index array, ind2
ind2 = np.array([4, 5, 6, 7])
I also have an array cols, which I can filter using ind2. I know that cols[ind2] has the same size as rows[ind1] and contains the same entries, but in a different order. An example:
cols[ind2] = [15, 20, 10, 1]
I would like to rearrange the order of cols[ind2], so that it corresponds to rows[ind1]. I am interested in the corresponding order of ind2.
In the example, the result should be
cols[ind2] = [1, 10, 20, 15]
ind2 = [7, 6, 5, 4]
Using numpy, I did not find a way to do this. Any ideas would be helpful. Thanks in advance.
There may be a better way, but you can do this using argsorts.
Let's call your "reordered ind2" ind3.
If you are sure that rows[ind1] and cols[ind2] will have the same length and all of the same elements, then the sorted versions of both will be the same, i.e. np.sort(rows[ind1]) == np.sort(cols[ind2]) element-wise.
If this is the case, and you don't run into any problems with repeated elements (unsure of your exact use case), then what you can do is find the indices to put cols[ind2] in order, and then from there, find the indices to put np.sort(cols[ind2]) into the order of rows[ind1].
So, if
p1 = np.argsort(rows[ind1])
and
p2 = np.argsort(cols[ind2])
and
p3 = np.argsort(p1)
Then
ind3 = ind2[p2][p3]
This works because the argsort of an argsort gives you the indices needed to reverse the first sort. p2 sorts cols[ind2] (that's the definition of argsort), and p3 unsorts the result back into the order of rows[ind1].
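Here is a small runnable check of the steps above. The contents of rows and cols are made up so that rows[ind1] and cols[ind2] match the values in the question; only the positions picked out by ind1 and ind2 matter.

```python
import numpy as np

ind1 = np.array([1, 3, 5, 7])
ind2 = np.array([4, 5, 6, 7])

# Hypothetical full arrays, chosen so that
# rows[ind1] == [1, 10, 20, 15] and cols[ind2] == [15, 20, 10, 1].
rows = np.array([0, 1, 0, 10, 0, 20, 0, 15])
cols = np.array([0, 0, 0, 0, 15, 20, 10, 1])

p1 = np.argsort(rows[ind1])  # indices that sort rows[ind1]
p2 = np.argsort(cols[ind2])  # indices that sort cols[ind2]
p3 = np.argsort(p1)          # inverse of p1: undoes the sort of rows[ind1]

ind3 = ind2[p2][p3]          # ind2 reordered to follow rows[ind1]
print(ind3)                  # [7 6 5 4]
print(cols[ind3])            # [ 1 10 20 15]
```

As noted above, this assumes both selections contain the same values; with repeated values the pairing between equal elements is arbitrary.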

Removing selected features from dataset

I am following this example: https://scikit-learn.org/dev/auto_examples/inspection/plot_permutation_importance_multicollinear.html
since I have a problem with highly correlated features in my model (different from the one shown in the example). In this step
selected_features = [v[0] for v in cluster_id_to_feature_ids.values()]
I can get information on the features that I will need to remove from my classifier. They are given as numbers ([0, 3, 5, 6, 8, 9, 10, 17]). How can I get names of these features?
Ok, there are two different elements to this problem I think.
First, you need to get a list of the column names. In the example code you linked, it looks like the list of feature names is stored like this:
data.feature_names
Once you have the feature names, you'd need a way to loop through them and grab only the ones you want. Something like this should work:
columns = ['a', 'b', 'c', 'd']
keep_index = [0, 3]
new_columns = [columns[i] for i in keep_index]
new_columns
['a', 'd']
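If the feature names are held in a NumPy array, the loop is not needed, because an array can be indexed directly with a list of positions. The sketch below uses made-up names (feature_0, feature_1, ...) as a stand-in for data.feature_names, and it also shows how to get the names of the features that were not selected, in case those are the ones you need.

```python
import numpy as np

# Hypothetical names; in the linked example they would come from data.feature_names.
feature_names = np.array([f"feature_{i}" for i in range(20)])
selected_features = [0, 3, 5, 6, 8, 9, 10, 17]

# Fancy indexing: pick the names at the selected positions.
selected_names = feature_names[selected_features]

# Boolean mask for everything that was not selected.
mask = np.ones(feature_names.size, dtype=bool)
mask[selected_features] = False
other_names = feature_names[mask]

print(selected_names)
print(other_names)
```

If the names are a plain Python list instead, wrap it with np.array(...) first or stick with the list comprehension.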

Get indices for values of one array in another array

I have two 1D-arrays containing the same set of values, but in a different (random) order. I want to find the list of indices, which reorders one array according to the other one. For example, my 2 arrays are:
ref = numpy.array([5,3,1,2,3,4])
new = numpy.array([3,2,4,5,3,1])
and I want the list order for which new[order] == ref.
My current idea is:
def find(val):
    return numpy.argmin(numpy.absolute(ref-val))
order = sorted(range(new.size), key=lambda x:find(new[x]))
However, this only works as long as no values are repeated. In my example 3 appears twice, and I get new[order] = [5 3 3 1 2 4]. The second 3 is placed directly after the first one, because my function find() does not track which 3 I am currently looking for.
So I could add something to deal with this, but I have a feeling there might be a better solution out there. Maybe in some library (NumPy or SciPy)?
Edit about the duplicate: The linked solution assumes that the arrays are ordered, or, for the "unordered" solution, returns duplicate indices. I need each index to appear only once in order. Which of the equal values comes first, however, is not important (nor can it be determined from the data provided).
What I get with sort_idx = A.argsort(); order = sort_idx[np.searchsorted(A,B,sorter = sort_idx)] is: [3, 0, 5, 1, 0, 2]. But what I am looking for is [3, 0, 5, 1, 4, 2].
Given ref, new which are shuffled versions of each other, we can get the unique indices that map ref to new using the sorted version of both arrays and the invertibility of np.argsort.
Start with:
i = np.argsort(ref)
j = np.argsort(new)
Now ref[i] and new[j] both give the sorted version of the arrays, which is the same for both. You can invert the first sort by doing:
k = np.argsort(i)
Now ref is just new[j][k], or new[j[k]]. Since all the operations are shuffles using unique indices, the final index j[k] is unique as well. j[k] can be computed in one step with
order = np.argsort(new)[np.argsort(np.argsort(ref))]
From your original example:
>>> ref = np.array([5, 3, 1, 2, 3, 4])
>>> new = np.array([3, 2, 4, 5, 3, 1])
>>> order = np.argsort(new)[np.argsort(np.argsort(ref))]
>>> order
array([3, 0, 5, 1, 4, 2])
>>> new[order] # Should give ref
array([5, 3, 1, 2, 3, 4])
This is probably not any faster than the more general solutions to the similar question on SO, but it does guarantee unique indices as you requested. A further optimization would be to replace np.argsort(i) with something like the argsort_unique function in this answer. I would go one step further and just compute the inverse of the sort:
def inverse_argsort(a):
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv
order = np.argsort(new)[inverse_argsort(ref)]
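As a quick sanity check, here is a self-contained sketch comparing the scatter-based inverse with the double argsort on the arrays from the question. The kind="stable" argument is an addition of mine, not part of the answer above; it only makes the tie-breaking between the two 3s deterministic.

```python
import numpy as np


def inverse_argsort(a):
    """Return the inverse of the permutation that sorts a."""
    fwd = np.argsort(a, kind="stable")
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)  # scatter: position fwd[n] gets rank n
    return inv


ref = np.array([5, 3, 1, 2, 3, 4])
new = np.array([3, 2, 4, 5, 3, 1])

sort_new = np.argsort(new, kind="stable")
order_scatter = sort_new[inverse_argsort(ref)]
order_double = sort_new[np.argsort(np.argsort(ref, kind="stable"))]

print(order_scatter)       # [3 0 5 1 4 2]
print(new[order_scatter])  # [5 3 1 2 3 4]
```

Both variants produce the same order; the scatter version just avoids the second sort.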

How to filter the values of a model column, that don't exist in the database?

Given an array of model ids:
ids = [1, 2, 3]
How can I get an array of the ids, which do not already exist in the database? I can do:
ids.reject { |id| Model.exists?(id: id) }
but I don't want to make a separate database query for every id. What is the way to get the non-existent ids in a single database query?
If you are using Rails 4 or above, you can use the following:
ids = [1, 2, 3]
existing_ids = Model.where(id: ids).ids
ids - existing_ids
If you are using Rails 3, you need
ids = [1, 2, 3]
existing_ids = Model.where(id: ids).pluck(:id)
ids - existing_ids
Read more about pluck and ids here and here respectively.
Thanks to Stefan for pointing out the ids method.
TL;DR : If ids is the array of ids you want to check and MyModel is the name of your model then the following is an array of the ids that do not exist : ids - MyModel.where(id: ids).pluck(:id)
MyModel.where(id: ids) will return all records for MyModel that have an id matching one of the values of your ids array. If you add .pluck(:id), it will return an array of only the ids for those records. So MyModel.where(id: ids).pluck(:id) is the array of record ids that match your ids array values.
Then you can use the minus operator to make the difference of the two arrays. array1 - array2 returns only the elements of array1 that are not in array2.
So putting everything together you get ids - MyModel.where(id: ids).pluck(:id).

Just add new (different) elements to the array in Ruby (Rails)?

I want to create an array in Rails that contains every value of two columns but each just one time. So, for example, there is in column "A" {1,5,7,1,7} and in column "B" {3,2,3,1,4}.
When I just wanted an array with all elements of "A", I would write:
Model.uniq.pluck(:A)
And I would get {1,5,7}.
Is there an option in Rails to make the same thing with two columns, so just getting all values one time that are contained in two columns? (Here it would be {1,5,7,3,2,4})
Thanks for help!
Yup, pass multiple column names to pluck:
Model.pluck(:A, :B)
#=> [[1, 3], [5, 2], [7, 3], [1, 1], [7, 4]]
But of course you want the values together and uniqued so:
Model.pluck(:A, :B).flatten.uniq
#=> [1, 3, 5, 2, 7, 4]
Doing Model.uniq.pluck(:A, :B).flatten won’t work since it will just get distinct rows (i.e. combinations of A & B), so you’d still have to uniq again after flattening.
records = Model.all.map { |e| [e.A, e.B] }
uniq_records = records.flatten.uniq
Hope this would help you.
Thanks