Matching the DataLoader API using SQL only

One of the examples for using DataLoader with Knex shows something like this:
user: new DataLoader(ids => db.table('users')
.whereIn('id', ids).select()
.then(rows => ids.map(id => rows.find(x => x.id === id)))),
The map there is so that the keys in the array of keys always match up with the objects in the array of results, e.g. if object with id 2 is missing:
array of keys: [1,2,3]
array of results: [object1, undefined, object3]
If you left the map out, you'd get an unbalanced input/output (e.g. when querying for missing ids):
array of keys: [1,2,3]
array of results: [object1, object3]
Is there any way to do the map bit with pure SQL?

It means that the database doesn't have a row with id 2, so whereIn can't return more than 2 rows. There is no way to do that in pure SQL with whereIn. It can be done with multiple subqueries, but that would be a bad solution.
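Since the reordering has to happen in application code, here is a minimal sketch of the same batch-function contract in Python with the standard sqlite3 module (the users table and load_users_by_ids are hypothetical). It indexes the fetched rows by id once, which avoids the per-key linear rows.find(...) scan in the Knex version; in JavaScript the equivalent would be building a Map from the rows first.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str]

def load_users_by_ids(conn: sqlite3.Connection, ids: Sequence[int]) -> List[Optional[Row]]:
    """Batch-load users so the result list lines up 1:1 with `ids`.

    Missing ids become None, which is what DataLoader expects from a batch function.
    """
    if not ids:
        return []
    placeholders = ", ".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", list(ids)
    ).fetchall()
    # Index the fetched rows by primary key once...
    by_id = {row[0]: row for row in rows}
    # ...then emit one entry per requested key, in the requested order.
    return [by_id.get(i) for i in ids]
```

If only ids 1 and 3 exist, asking for [1, 2, 3] gives the row for 1, None, then the row for 3, so the output always has the same length and order as the keys.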

Related

Select $id from ids field containing $id1,$id2,$id3

My model in Laravel has a linked_ids string field like this:
echo $model->linked_ids
1,2,3,4,5
I want to make a query that gets me all records with a given id in linked_ids.
Currently I have:
Model::where('linked_ids', 'LIKE', '%' . $model->id . '%');
but this selects more than I want (e.g. if $model->id is 3, it also selects 1,32,67).
How can I avoid this since I don't know what position the id will be nor will the ids be ordered? I would like to do this in eloquent but can also use something like DB::raw() to run sql queries.

This is a bad way to store your ids, but if you really can't change it, you could take advantage of LazyCollections and filter in PHP.
I'm sure there's a way to do it directly in MySQL (or whatever DBMS you're using), but this is what I have.
$id = 3;
Model::cursor()
    ->filter(function ($model) use ($id) {
        // exact match against each comma-separated id, not a substring match
        return in_array($id, explode(',', $model->linked_ids));
    })
    // then chain ONE of these methods:
    ->first();      // returns the first match or null
    // ->collect(); // returns an Illuminate\Support\Collection of the results after filtering
    // ->all();     // returns an array of models after filtering
    // ->toArray(); // returns an array and converts the models to arrays as well
    // ->toJson();  // returns a JSON string
Note that this still runs a SELECT * FROM table without any filtering (unless you chain some where() calls before cursor()), but it hydrates only one model at a time instead of loading them all into memory at once (which is usually the bottleneck for big queries in Laravel).
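As for doing it directly in SQL: MySQL has FIND_IN_SET(str, strlist), which matches whole comma-separated items, so something like whereRaw('FIND_IN_SET(?, linked_ids)', [$model->id]) should work there. A portable alternative is to wrap both the column and the needle in commas so LIKE can only match a whole item. Below is a sketch of that trick using SQLite from Python; the models table and find_linked are hypothetical, the sketch assumes there are no spaces around the commas, and note that in MySQL || is logical OR by default, so you would use CONCAT() there instead.

```python
import sqlite3
from typing import List

# Naive substring match: '%3%' also hits "1,32,67" and "13,30".
NAIVE_SQL = "SELECT id FROM models WHERE linked_ids LIKE '%' || ? || '%' ORDER BY id"

# Wrap the list and the needle in commas so only whole items match:
# ',1,32,67,' does not contain ',3,', but ',1,3,5,' does.
# Assumes linked_ids is stored without spaces around the commas.
EXACT_SQL = (
    "SELECT id FROM models "
    "WHERE ',' || linked_ids || ',' LIKE '%,' || ? || ',%' ORDER BY id"
)

def find_linked(conn: sqlite3.Connection, target_id: int, exact: bool = True) -> List[int]:
    """Return ids of rows whose linked_ids list contains target_id."""
    sql = EXACT_SQL if exact else NAIVE_SQL
    return [row[0] for row in conn.execute(sql, (str(target_id),))]
```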

How to specify which key/value pairs to exclude in spaCy's Doc.to_disk(path, exclude=['user_data'])?

My NLP pipeline has some doc extensions that store 3 items (a string for the file name and two dicts which map non-serializable objects). I'd like to exclude only the non-serializable key/value pairs in the user data, but keep the file name.
doc.to_disk(path, exclude=['user_data'])
works as expected, excluding all user data. There are apparently options to instead exclude either 'user_data_keys' or 'user_data_values' but I find no explanation of their usage, and furthermore I can't think of any good reason to store either all the keys without the values or all the values without the keys!
I would like to exclude both keys and values of only certain fields in the doc.user_data. If this is possible, how is it done?

You will need to specify which keys or values you want to exclude.
https://spacy.io/api/doc#serialization-fields
data = doc.to_bytes(exclude=["text", "tensor"])
doc.from_disk("./doc.bin", exclude=["user_data"])
Per this thread, you can try the following workaround:
def remove_unserializable_results(doc):
    # Clear all user data, then reset every custom extension to None
    doc.user_data = {}
    for x in dir(doc._):
        if x in ['get', 'set', 'has']:
            continue
        setattr(doc._, x, None)
    for token in doc:
        for x in dir(token._):
            if x in ['get', 'set', 'has']:
                continue
            setattr(token._, x, None)
    return doc

# spaCy v2 API; in v3, register the function with @Language.component and add it by name
nlp.add_pipe(remove_unserializable_results, last=True)
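If the goal is to drop only some extension values while keeping the file name, one option is to filter user_data yourself before saving. A sketch, assuming spaCy keeps custom extension values in doc.user_data under tuple keys of the form ("._.", attr_name, start, end) (worth checking against your spaCy version); the helper name without_extensions and the extension names in the usage comments are hypothetical.

```python
from typing import Any, Dict, Iterable

def without_extensions(user_data: Dict[Any, Any], names: Iterable[str]) -> Dict[Any, Any]:
    """Return a copy of user_data minus the custom-extension entries in `names`.

    Assumption: extension values live under tuple keys ("._.", name, start, end).
    Any other keys are kept untouched.
    """
    drop = set(names)
    return {
        key: value
        for key, value in user_data.items()
        if not (
            isinstance(key, tuple)
            and len(key) >= 2
            and key[0] == "._."
            and key[1] in drop
        )
    }

# Hypothetical usage: swap the filtered dict in only while serializing.
# saved = doc.user_data
# doc.user_data = without_extensions(saved, ["lookup_a", "lookup_b"])
# try:
#     doc.to_disk(path)
# finally:
#     doc.user_data = saved
```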

How to Convert for loop to NHibernate Futures for performance

NHibernate Version: 3.4.0.4000
I'm currently working on optimizing our code so that we can reduce the number of round trips to the database and am looking at a for loop that is one of the culprits. I'm having a hard time figuring out how to batch all of these iterations into a future that gets executed once when sent to SQL Server. Essentially each iteration of the loop causes 2 queries to hit the database!
foreach (var choice in lineItem.LineItemChoices)
{
choice.OptionVersion = _session.Query<OptionVersion>().Where(x => x.Option.Id == choice.OptionId).OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault();
choice.ChoiceVersion = _session.Query<ChoiceVersion>().OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).Where(x => x.Choice.Id == choice.ChoiceId).FirstOrDefault();
}

One option is to extract OptionId and ChoiceId from all the LineItemChoices into two lists in local memory. Then issue just two queries, one for options and one for choices, passing these lists in .Where(x => optionIds.Contains(x.Option.Id)). This corresponds to the SQL IN operator. This requires some postprocessing: you will get two result lists (transform them to a dictionary or lookup if you expect many results) that you need to process to populate the choice objects. This postprocessing is local and tends to be very cheap compared to database round trips. This option can be a bit tricky if the existing FirstOrDefault part is absolutely necessary. Do you expect there to be more than one result for a single optionId? If not, this code could instead have used SingleOrDefault, which could just be dropped when converting to IN queries.
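The dictionary postprocessing mentioned above is cheap. A sketch in Python (OptionVersion and latest_by_key are hypothetical stand-ins for the mapped entity and a helper) that reduces the rows returned by one IN query to a dictionary holding the highest version per id, which plays the role of OrderByDescending(...).FirstOrDefault() for each choice:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Optional, TypeVar

T = TypeVar("T")

@dataclass
class OptionVersion:
    # Hypothetical stand-in for the mapped entity: only the fields used below
    option_id: int
    version_number: int

def latest_by_key(
    rows: Iterable[T],
    key: Callable[[T], Hashable],
    version: Callable[[T], int],
) -> Dict[Hashable, T]:
    """Keep the highest-version row per key, in a single pass over the rows."""
    latest: Dict[Hashable, T] = {}
    for row in rows:
        k = key(row)
        if k not in latest or version(row) > version(latest[k]):
            latest[k] = row
    return latest

# Usage: one dictionary lookup per choice; a missing id gives None, like FirstOrDefault.
rows = [OptionVersion(1, 1), OptionVersion(1, 3), OptionVersion(2, 2)]
lookup = latest_by_key(rows, lambda r: r.option_id, lambda r: r.version_number)
newest_for_option_1: Optional[OptionVersion] = lookup.get(1)  # OptionVersion(1, 3)
```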
The other option is to use futures (https://nhibernate.info/doc/nhibernate-reference/performance.html#performance-future). For Linq it means to use ToFuture or ToFutureValue at the end, which also conflicts with FirstOrDefault I believe. The important thing is that you need to loop over all line item choices to initialize ALL queries BEFORE you access the value of any of them. So this is likely to also result in some postprocessing, where you would first store the future values in some list, and then in a second loop access the real value from each query to populate the line item choice.
If you do expect that the queries can yield more than one result (before applying FirstOrDefault), I think you can just use Take(1) instead, as that still returns an IQueryable to which you can apply the future method.
The first option is probably the most efficient, since it will just be two queries and allow the database engine to make just one pass over the tables.
Keep in mind the limit on the maximum number of parameters that can be given in an SQL query (SQL Server allows 2100 per command). If there can be thousands of line item choices, you may need to split them into batches and query for at most 2000 identifiers per round trip.
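Splitting the ids into batches is straightforward. A sketch in Python, using sqlite3 only so it runs standalone (the option_versions table and both helper names are hypothetical); with NHibernate, each chunk would feed the optionIds.Contains(...) filter in its own query.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

# SQL Server allows at most 2100 parameters per command, so 2000 ids leaves headroom.
MAX_IDS_PER_QUERY = 2000

def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch

def fetch_in_batches(
    conn: sqlite3.Connection, ids: Sequence[int], size: int = MAX_IDS_PER_QUERY
):
    """Run one IN (...) query per batch and concatenate the rows."""
    rows = []
    for batch in chunked(ids, size):
        placeholders = ", ".join("?" * len(batch))
        sql = (
            "SELECT option_id, version_number FROM option_versions "
            f"WHERE option_id IN ({placeholders})"
        )
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows
```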

Adding to Oskar's answer: NHibernate Futures were introduced in NHibernate 2.1. They are available through the Future method for collections and FutureValue for single values.
In your case, you could separate the IDs of the list in memory ...
var optionIds = lineItem.LineItemChoices.Select(x => x.OptionId);
var choiceIds = lineItem.LineItemChoices.Select(x => x.ChoiceId);
... and execute two queries using Future<T> to get two lists in a single round trip to the database.
var optionVersions = _session.Query<OptionVersion>()
.Where(x => optionIds.Contains(x.Option.Id))
.OrderByDescending(x => x.OptionVersionNumber)
.Future<OptionVersion>();
var choiceVersions = _session.Query<ChoiceVersion>()
.Where(x => choiceIds.Contains(x.Choice.Id))
.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
.Future<ChoiceVersion>();
With everything you need in memory, you can loop over the original collection and look up the data in memory to fill in each choice object.
foreach (var choice in lineItem.LineItemChoices)
{
choice.OptionVersion = optionVersions.OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault(x => x.Option.Id == choice.OptionId);
choice.ChoiceVersion = choiceVersions.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).FirstOrDefault(x => x.Choice.Id == choice.ChoiceId);
}

Rails ActiveRecord query to match all params

I need to have an ActiveRecord Postgres query that returns results which match all the parameters passed in through an array.
Some background: I have a User model, which has many Topics (through Specialties). I'm passing in the Topic ids as a string (Parameters: {"topics"=>"1,8,3"}) and then turning them into an array with .split(',') so I end up with topic_params = ["1","8","3"].
Now I'm trying to return all Users who have Topics that match/include all of those. After following the answer in this question, I managed to return Users who match ANY of the Topics with this:
@users = User.includes(:topics, :organization).where(:topics => {:id => topic_params})
But I need it to return results that match ALL. I'd also be open to better ways to accomplish this sort of task overall.

One way would be something like this:
User.joins(:topics).where(topics: { id: [1, 2, 3] }).group('users.id').having('count(distinct topics.id) = 3')
Obviously I don't have your exact schema, so you might have to tweak it a bit, but this is the basic setup.
The important part is that the count in the HAVING clause must match the number of distinct ids you're matching against.
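The GROUP BY / HAVING COUNT(DISTINCT ...) pattern is plain SQL, so it can be tried outside Rails. A sketch in Python with sqlite3, assuming a join table named specialties(user_id, topic_id) (hypothetical, modelled on the User, Specialty, Topic setup in the question). The ids are deduplicated first so that passing the same id twice doesn't make the HAVING count unreachable.

```python
import sqlite3
from typing import Iterable, List

def users_with_all_topics(conn: sqlite3.Connection, topic_ids: Iterable[str]) -> List[int]:
    """Return ids of users linked to every topic in topic_ids."""
    # Params like "1,8,3".split(",") arrive as strings; normalise and deduplicate.
    wanted = sorted({int(t) for t in topic_ids})
    if not wanted:
        return []  # design choice: no filter given means no matches
    placeholders = ", ".join("?" * len(wanted))
    sql = f"""
        SELECT s.user_id
        FROM specialties AS s
        WHERE s.topic_id IN ({placeholders})
        GROUP BY s.user_id
        HAVING COUNT(DISTINCT s.topic_id) = ?
        ORDER BY s.user_id
    """
    # The HAVING count must equal the number of distinct ids being matched.
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]
```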

Rails 3 selecting only values

In Rails 3, I would like to do the following:
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id")
This works, but I get the following from the DB:
[{"some_other_connection_id":254},{"some_other_connection_id":315}]
Now, those ids are the ones I need, but I am unable to make a query that only gives me the ids. I do not want to have to iterate over the results only to get those numbers out. Is there any way for me to do this with something like:
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id").values()
Or something of that nature?
I have been trying with the ".select_values()" found on GitHub, but it only returns "some_other_connection_id".
I am not an expert in Rails, so this info might also be helpful:
The "SomeModel" is a joining table for a many-to-many relation in one of my other models. So, actually, what I am trying to do is, from the array of IDs, get all the entries from the other side of the connection. Basically I have the source ids, and I want to get the data from the models with all the target ids. If there is a magic way of getting these without me having to do all the SQL myself (with some help from ActiveRecord), it would be really nice!
Thanks :)

Try the pluck method:
SomeModel.where(:some => condition).pluck("some_field")
It works like:
SomeModel.where(:some => condition).select("some_field").map(&:some_field)

SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id").map &:some_other_connection_id
This is essentially a shorthand for:
results = SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id")
results.map {|row| row.some_other_connection_id}
See Array#map for details on the map method.
Beware that there is no lazy loading here, as it iterates over the results, but that shouldn't be a problem unless you want to add more constructs to your query or retrieve some associated objects (which should not be the case, as you don't have the ids for loading the associated objects).
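The difference the question runs into is that a row-based result has to be unwrapped to get bare values. The same thing shows up with Python's sqlite3, where each row is a tuple; a small sketch (the some_models table and other_ids_for are hypothetical) that returns a flat list the way pluck does:

```python
import sqlite3
from typing import List, Sequence

def other_ids_for(conn: sqlite3.Connection, some_ids: Sequence[int]) -> List[int]:
    """Flat list of some_other_connection_id values, analogous to .pluck(...)."""
    if not some_ids:
        return []
    placeholders = ", ".join("?" * len(some_ids))
    sql = (
        "SELECT some_other_connection_id FROM some_models "
        f"WHERE some_connection_id IN ({placeholders})"
    )
    # Each row comes back as a 1-tuple such as (254,); unwrap it so the caller
    # gets [254, 315] instead of [(254,), (315,)].
    return [row[0] for row in conn.execute(sql, list(some_ids))]
```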
