My data is organized as follows:
There are 1k teachers and 10k pupils, and every pupil has ~100 homeworks.
I need to get all homeworks of the pupils related to a teacher, either via classes or by a direct link between them. All vertices and edges have some attributes; let's suppose all required indices are already built, or we can discuss them a bit later.
I can get all the required pupil ids with the following query, which is fast enough:
$query1 = "FOR v1 IN 1..1 INBOUND #teacherId teacher_pupil FILTER v1.deleted == false RETURN DISTINCT v1._id";
$query2 = "FOR v2 IN 2..2 INBOUND #teacherId OUTBOUND teacher_class, INBOUND pupil_class FILTER v2.deleted == false RETURN DISTINCT v2._id";
$queryUnion = "FOR x IN UNION_DISTINCT (($query1), ($query2)) RETURN x";
Then I wrote the following:
$query = "
LET pupilIds = ($queryUnion)
FOR pupilId IN pupilIds
LET homeworks = (
FOR homework IN 1..1 ANY pupilId pupil_homework
return [homework._id, pupilId]
)
RETURN homeworks";
I get my homeworks, and I can even filter them, but the query is too slow; I believe this is the wrong approach.
Question 1: How can I do this without loading the whole huge set of homeworks into memory at once (with LIMIT or similar), while still sorting and filtering the homeworks by vertex attributes quickly and efficiently? I'm sure that limiting the pupils, or the pupil-related homeworks, inside the query's or subquery's FOR leads to incorrect sorting/pagination.
I made another attempt with a pure graph AQL query:
$query1 = "FOR v1 IN 2..2 INBOUND #teacherId pupil_teacher, OUTBOUND pupil_homework RETURN v1._id";
$query2 = "FOR v2 IN 3..3 INBOUND #teacherId teacher_class, pupil_class, OUTBOUND pupil_homework RETURN v2._id";
$query = "FOR x IN UNION_DISTINCT (($query1), ($query2)) LIMIT 500, 500 RETURN x";
It isn't much faster, and I don't know how to filter the Teacher vertices by attributes.
Question 2: What is the best approach for building such AQL queries? How can I access the vertices of a graph while filtering every part of the path by attributes? Can I paginate the result to save memory and speed up the query? How can I speed it up at all?
Thank you!
Assuming a teacher and a pupil are related to each other only via classes (two outbound links) or directly (a single outbound link), you can do something like this:
FOR v IN 1..2 OUTBOUND "teacher_id" GRAPH "graph_name"
FILTER LIKE(v._id, "pupil_collection_name/%")
FOR homeworks IN 1 OUTBOUND v GRAPH "graph_name"
LIMIT lowerLimit,numberOfItems
RETURN homeworks
But if a teacher and a pupil can also be related through something other than a class, we have to filter the query on the edge we are traversing as well:
FOR v, e IN 1..2 OUTBOUND "teacher_id" GRAPH "graph_name"
FILTER LIKE(v._id, "pupil_collection_name/%") && (e.name == "ClassPupil" || e.name == "TeacherPupil")
FOR homeworks IN 1 OUTBOUND v GRAPH "graph_name"
LIMIT lowerLimit,numberOfItems
RETURN homeworks
Note that since the same teacher can be related to a pupil both directly and via a class, the homeworks may not be unique. Hence using RETURN DISTINCT homeworks is suggested. But if duplicates are not a problem, the query above should work.
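The order of operations matters for Question 1: reach pupils by both paths, drop the duplicates, sort the whole homework list, and only then cut a page. Below is a small in-memory Python model of that order, a sketch rather than ArangoDB code. The dictionaries (teacher_pupil, teacher_class, class_pupil, pupil_homework) and the "due" attribute are hypothetical stand-ins for the edge collections and a vertex attribute.

```python
# Hypothetical in-memory stand-ins for the edge collections (not ArangoDB API).
teacher_pupil = {"t1": ["p1", "p2"]}
teacher_class = {"t1": ["c1"]}
class_pupil = {"c1": ["p2", "p3"]}  # p2 is reachable directly AND via class c1
pupil_homework = {
    "p1": [{"id": "h1", "due": 3}],
    "p2": [{"id": "h2", "due": 1}, {"id": "h3", "due": 5}],
    "p3": [{"id": "h4", "due": 2}],
}

def homeworks_page(teacher, offset, count):
    # Depth 1: directly linked pupils; depth 2: pupils reached via a class.
    direct = set(teacher_pupil.get(teacher, []))
    via_class = {p for c in teacher_class.get(teacher, [])
                 for p in class_pupil.get(c, [])}
    pupils = direct | via_class  # set union plays the role of DISTINCT
    homeworks = [h for p in sorted(pupils) for h in pupil_homework.get(p, [])]
    # Sort the complete result first, then slice; limiting per pupil would
    # produce pages that are not globally ordered.
    homeworks.sort(key=lambda h: h["due"])
    return [h["id"] for h in homeworks[offset:offset + count]]

print(homeworks_page("t1", 0, 2))
print(homeworks_page("t1", 2, 2))
```

In AQL terms this roughly corresponds to placing SORT before LIMIT in the same scope as the traversal, so that each page is cut from one globally ordered stream.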
I have three models: Offer, Request and Assignment. Assignment connects a Request with an Offer. Now I want to do this:
select *
from offer as a
where places > (
select count(*)
from assignment
where offer_id = a.id and
to_date > '2014-07-07');
I am not quite sure how to achieve this with a Django QuerySet. Any tips?
Edit: The query above is just an example of what the query should look like in general. The Django models look like this:
from django.db import models
class Offer(models.Model):
    ...
    places = models.IntegerField()
    ...
class Request(models.Model):
    ...
class Assignment(models.Model):
    from_date = models.DateField()
    to_date = models.DateField()
    request = models.ForeignKey("Request", related_name="assignments", on_delete=models.CASCADE)
    offer = models.ForeignKey("Offer", related_name="assignments", on_delete=models.CASCADE)
People can create either an offer with a given number of places, or a request. The admin then connects a request with an offer for a given time span; this is saved as an assignment. The query above should give me a list of offers that still have places left. Therefore I want to count the number of valid assignments for each offer and compare it with the offer's number of places. This list is then used to find a possible offer for a given request in order to create a new assignment.
I hope this describes the problem better.
Unfortunately, correlated subqueries aren't directly supported by ORM operations. Using .extra(where=...) should be possible in this case.
To get nearly the same results without a subquery, something like the following should work (offers with no matching assignment at all are not returned by this version):
from django.db.models import Count, F
Offer.objects.filter(
    assignments__to_date__gt=thedate
).annotate(
    assignment_cnt=Count('assignments')
).filter(
    assignment_cnt__lt=F('places')
)
The exact query depends on the model definitions.
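To see where the subquery and the join/annotate approach differ, here is a runnable sqlite3 sketch with made-up offers and assignments (the table names offer and assignment are simplified stand-ins for the Django tables). The correlated subquery keeps an offer with zero valid assignments; the INNER JOIN plus GROUP BY shape, roughly what filter(...).annotate(Count(...)) generates, drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offer (id INTEGER PRIMARY KEY, places INTEGER);
CREATE TABLE assignment (id INTEGER PRIMARY KEY, offer_id INTEGER, to_date TEXT);
INSERT INTO offer VALUES (1, 2), (2, 1), (3, 1);
-- offer 1: one valid assignment of 2 places; offer 2: full; offer 3: no assignments
INSERT INTO assignment (offer_id, to_date) VALUES
  (1, '2014-08-01'), (1, '2014-01-01'), (2, '2014-09-01');
""")

THE_DATE = "2014-07-07"

# Correlated subquery: count(*) is 0 for offers without valid assignments.
correlated = conn.execute("""
SELECT id FROM offer AS a
WHERE places > (SELECT count(*) FROM assignment
                WHERE offer_id = a.id AND to_date > ?)
ORDER BY id
""", (THE_DATE,)).fetchall()

# INNER JOIN + GROUP BY + HAVING: offers with no matching row disappear.
joined = conn.execute("""
SELECT a.id FROM offer AS a
JOIN assignment AS s ON s.offer_id = a.id AND s.to_date > ?
GROUP BY a.id, a.places
HAVING count(s.id) < a.places
ORDER BY a.id
""", (THE_DATE,)).fetchall()

print(correlated)
print(joined)
```

Newer Django versions (1.11 onward) also provide Subquery and OuterRef expressions, which can express the correlated form directly.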
query = '''select *
from yourapp_offer as a
where places > (
select count(*)
from yourapp_assignment
where offer_id = a.id and
to_date > %s)'''
offers = Offer.objects.raw(query, ['2014-07-07'])
https://docs.djangoproject.com/en/1.6/topics/db/sql/
I have two models, Restaurant and Feature, connected via a has_and_belongs_to_many relationship. The gist of it is that restaurants have many features, like delivery, pizza, sandwiches, salad bar, vegetarian option… So when the user filters the restaurants and, let's say, checks pizza and delivery, I want to display all the restaurants that have both features: pizza, delivery and maybe some more, but each one HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]), I (of course) get the restaurants that have either feature (pizza, delivery, or both), which is not at all what I want.
My SQL/Rails knowledge is kind of limited since I'm new to this, but I asked a friend and now I have this huge SQL query that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better, even more Rails-like, way of doing this? How would you do it?
Oh, BTW, I'm using Rails 4 on Heroku, so it's a Postgres DB.
This is an example of a set-within-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause checks for the presence of one of the features, "pizza" or "delivery". If both features are present, you get the restaurant_id.
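A runnable sqlite3 sketch of the group by / having query, using the column names assumed in the answer (feature_id, feature_name) and made-up rows. Only the restaurant that has both features survives the two having conditions; extra features such as "salad bar" do not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (feature_id INTEGER PRIMARY KEY, feature_name TEXT);
CREATE TABLE features_restaurants (restaurant_id INTEGER, feature_id INTEGER);
INSERT INTO features VALUES (1, 'pizza'), (2, 'delivery'), (3, 'salad bar');
-- restaurant 10: pizza + delivery + salad bar; 20: pizza only; 30: delivery only
INSERT INTO features_restaurants VALUES
  (10, 1), (10, 2), (10, 3), (20, 1), (30, 2);
""")

# Each sum(case ...) counts the rows for one required feature within a group.
rows = conn.execute("""
SELECT fr.restaurant_id
FROM features_restaurants fr
JOIN features f ON fr.feature_id = f.feature_id
GROUP BY fr.restaurant_id
HAVING sum(CASE WHEN f.feature_name = 'pizza' THEN 1 ELSE 0 END) > 0
   AND sum(CASE WHEN f.feature_name = 'delivery' THEN 1 ELSE 0 END) > 0
""").fetchall()
print(rows)
```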
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on the restaurants table.
With this scheme, your queries boil down to:
select * from restaurants where restaurants.features @> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids @> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids @> ARRAY(
select id from features where name in ('pizza', 'delivery')
)
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
This is not a copy-and-paste solution, but if you follow these steps you should end up with a fast query.
Index the feature_name column (I'm assuming that the feature_id column is indexed on both tables).
Put each feature_name parameter in its own exists() subquery, correlated on the restaurant:
select r.id
from
restaurants r
where
exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id
where fr.restaurant_id = r.id and f.feature_name = 'pizza')
and
exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id
where fr.restaurant_id = r.id and f.feature_name = 'delivery')
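To try the EXISTS idea, here is a runnable sqlite3 sketch with made-up data; the features columns follow the answer's assumptions, and restaurants.id is taken from the question's schema. The important detail is that every EXISTS is correlated on the outer restaurant row, so "pizza" and "delivery" can be matched by different features_restaurants rows. The index on feature_name mirrors the first step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (id INTEGER PRIMARY KEY);
CREATE TABLE features (feature_id INTEGER PRIMARY KEY, feature_name TEXT);
CREATE TABLE features_restaurants (restaurant_id INTEGER, feature_id INTEGER);
CREATE INDEX idx_features_name ON features (feature_name);
INSERT INTO restaurants VALUES (10), (20), (30);
INSERT INTO features VALUES (1, 'pizza'), (2, 'delivery');
-- restaurant 10 has both features, 20 only pizza, 30 only delivery
INSERT INTO features_restaurants VALUES (10, 1), (10, 2), (20, 1), (30, 2);
""")

# One EXISTS per requested feature, each correlated on r.id.
EXISTS_CLAUSE = (
    "EXISTS (SELECT 1 FROM features_restaurants fr "
    "JOIN features f ON fr.feature_id = f.feature_id "
    "WHERE fr.restaurant_id = r.id AND f.feature_name = ?)"
)

wanted = ["pizza", "delivery"]
sql = ("SELECT r.id FROM restaurants r WHERE "
       + " AND ".join([EXISTS_CLAUSE] * len(wanted))
       + " ORDER BY r.id")
result = conn.execute(sql, wanted).fetchall()
print(result)
```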
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
# accepts an array of feature names
restaurants = Restaurant.all
features.each do |f|
feature_restaurants = Feature.find_by_name(f).restaurants
restaurants = feature_restaurants & restaurants
end
return restaurants
end
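The loop above starts from every restaurant and intersects with each feature's restaurant list in application memory. A Python analogue of that idea, under hypothetical data (restaurants_by_feature stands in for Feature.find_by_name(name).restaurants):

```python
from functools import reduce

# Hypothetical stand-in for Feature.find_by_name(name).restaurants
restaurants_by_feature = {
    "pizza": {"Luigi's", "Slice", "Roma"},
    "delivery": {"Slice", "Roma", "Wok"},
    "salad bar": {"Roma", "Greens"},
}
all_restaurants = {"Luigi's", "Slice", "Roma", "Wok", "Greens"}

def restaurants_with_all(features):
    # Start from all restaurants and keep only those present for every feature.
    return reduce(lambda acc, f: acc & restaurants_by_feature.get(f, set()),
                  features, set(all_restaurants))

print(sorted(restaurants_with_all(["pizza", "delivery"])))
```

Each feature costs one lookup, and all candidate restaurants are loaded before intersecting, so this trades SQL complexity for memory. Unlike find_by_name, which returns nil for an unknown name, the sketch treats an unknown feature as matching no restaurants.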
Since it's an AND condition (OR conditions get dicey with AREL), I reread your stated problem while ignoring the SQL. I think this is what you want:
# in Restaurant
has_and_belongs_to_many :features
# in Feature
has_and_belongs_to_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
where(pizza: true)
end
def self.delivery
where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!
names = params[:features] # e.g. ['pizza', 'delivery']
Restaurant
  .joins(:features)
  .where(features: {name: names})
  .group('restaurants.id')
  .having('count(distinct features.name) = ?', names.uniq.size)
This seems to work for me. I tried it with SQLite though.
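The same count-based idea, generalized to any number of requested features, as a runnable sqlite3 sketch with made-up rows. The IN list and the expected count are both derived from the parameter list, so nothing is hard-coded to 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE features_restaurants (restaurant_id INTEGER, feature_id INTEGER);
INSERT INTO restaurants VALUES (1, 'Roma'), (2, 'Slice'), (3, 'Wok');
INSERT INTO features VALUES (1, 'pizza'), (2, 'delivery'), (3, 'vegetarian');
INSERT INTO features_restaurants VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 2);
""")

def restaurants_with_features(names):
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
    SELECT restaurants.id FROM restaurants
    JOIN features_restaurants fr ON fr.restaurant_id = restaurants.id
    JOIN features ON features.id = fr.feature_id
    WHERE features.name IN ({placeholders})
    GROUP BY restaurants.id
    HAVING count(DISTINCT features.name) = ?
    ORDER BY restaurants.id
    """
    # Compare against the number of distinct requested names.
    return [row[0] for row in conn.execute(sql, [*names, len(set(names))])]

print(restaurants_with_features(["pizza", "delivery"]))
```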
Sorry for the naff title, but I'm not really sure how to explain this; I'm one of the new generation whose SQL skills have degraded thanks to the Active Record pattern!
Basically I have three tables in PostgreSQL:
Client (One Client has many maps)
- id
Maps (Map has one client and many layers)
- id
- client_id
Layer (Layer has one map)
- id
- map_id
I would like to write an SQL query that returns Client.id along with a count of how many maps that client has and the total number of layers the client has across all maps.
Is this possible with a single query? Speed isn't of concern as this is just for analytical purposes so will be run infrequently.
I'd use a pair of subqueries for this. Something like:
SELECT
id,
(
SELECT count(map.id)
FROM map
WHERE map.client_id = client.id
) AS n_maps,
(
SELECT count(layer.id)
FROM map INNER JOIN layer ON (layer.map_id = map.id)
WHERE map.client_id = client.id
) AS n_layers
FROM client;
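A runnable sqlite3 check of the two correlated subqueries, with made-up rows: client 1 has a map without layers and client 3 has no maps, both of which still show up with correct counts because each subquery runs per client row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (id INTEGER PRIMARY KEY);
CREATE TABLE map (id INTEGER PRIMARY KEY, client_id INTEGER);
CREATE TABLE layer (id INTEGER PRIMARY KEY, map_id INTEGER);
INSERT INTO client VALUES (1), (2), (3);
-- client 1: maps 10 (3 layers) and 11 (0 layers); client 2: map 20 (1 layer)
INSERT INTO map VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO layer VALUES (100, 10), (101, 10), (102, 10), (200, 20);
""")

rows = conn.execute("""
SELECT id,
       (SELECT count(map.id) FROM map WHERE map.client_id = client.id) AS n_maps,
       (SELECT count(layer.id) FROM map JOIN layer ON layer.map_id = map.id
        WHERE map.client_id = client.id) AS n_layers
FROM client
ORDER BY id
""").fetchall()
print(rows)
```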
I'd do it like this: a single SQL query inside a method in the Client model:
def self.statistics
Client.select("
clients.id AS client_id,
COUNT(DISTINCT(maps.id)) AS total_maps,
COUNT(layers.id) AS total_layers")
.joins(maps: :layers)
.group("clients.id")
end
In order for this to work, you need the associations declared between your models (Client has_many :maps, Map has_many :layers).
You can go deeper in the Active Record Query Interface guide.
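One caveat worth checking: joins(maps: :layers) produces INNER JOINs, so clients without maps, and maps without layers, drop out of the statistics. The sqlite3 sketch below (made-up data; the SQL is an approximation of what the Rails query generates, not its exact output) compares that shape with LEFT JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY);
CREATE TABLE maps (id INTEGER PRIMARY KEY, client_id INTEGER);
CREATE TABLE layers (id INTEGER PRIMARY KEY, map_id INTEGER);
INSERT INTO clients VALUES (1), (2), (3);
-- map 11 has no layers, client 3 has no maps
INSERT INTO maps VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO layers VALUES (100, 10), (101, 10), (102, 10), (200, 20);
""")

STATS = """
SELECT clients.id, COUNT(DISTINCT maps.id), COUNT(layers.id)
FROM clients
{join} maps ON maps.client_id = clients.id
{join} layers ON layers.map_id = maps.id
GROUP BY clients.id
ORDER BY clients.id
"""

inner = conn.execute(STATS.format(join="INNER JOIN")).fetchall()
outer = conn.execute(STATS.format(join="LEFT JOIN")).fetchall()
print(inner)  # map 11 and client 3 are missing
print(outer)
```

In Rails 4 one option is passing an SQL string with LEFT OUTER JOIN to joins; Rails 5 added left_outer_joins for this.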
I have a database table called Event, which in CakePHP has its relationships coded like so:
var $belongsTo = array('Sport');
var $hasOne = array('Result', 'Point', 'Writeup', 'Timetable', 'Photo');
Now I am doing a query and only want to pull out Sport, Point and Timetable,
which would result in retrieving Sports, Events, Points and Timetables.
The reason for not pulling everything is that the results have 17000+ rows.
Is there a way to only select those tables using:
$this->Event->find('all');
I have had a look at the API but can't see how it's done.
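You should set recursive to -1 in your app_model and only pull the things you require. Never use recursive of 2; Containable (http://book.cakephp.org/view/1323/Containable) is awesome.
Just $this->Event->find('all', array('contain' => array())); returns only the Event rows (the model needs the behavior attached, e.g. var $actsAs = array('Containable');). To pull in only the associations you need, list them: $this->Event->find('all', array('contain' => array('Sport', 'Point', 'Timetable')));
If you set recursive to -1 in app_model, the empty contain is not needed; it would just be find('all') like you have.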
I have a simple test object model in which there are schools, and a school has a collection of students.
I would like to retrieve a school and all its students who are above a certain age.
I carry out the following query, which obtains a given school and the children which are above a certain age:
public School GetSchoolAndStudentsWithDOBAbove(int schoolid, DateTime dob)
{
var school = this.Session.CreateCriteria(typeof(School))
.CreateAlias("Students", "students")
.Add(Expression.And(Expression.Eq("SchoolId", schoolid), Expression.Gt("students.DOB", dob)))
.UniqueResult<School>();
return school;
}
This all works fine and I can see the query going to the database and returning the expected number of rows.
However, when I carry out either of the following, it gives me the total number of students in the given school (regardless of the preceding request) by running another query:
foreach (Student st in s.Students)
{
Console.WriteLine(st.FirstName);
}
Assert.AreEqual(s.Students.Count, 3);
Can anyone explain why?
You made your query on the School class and you restricted your results on it, not on the mapped related objects.
Now there are many ways to do this.
You can make a static filter as IanL said; however, it's not really flexible.
You can just iterate the collection as mxmissile suggests, but that is ugly and slow (especially considering lazy loading).
I would suggest two different solutions:
In the first, you keep the query you have and fire a dynamic filter on the collection (keeping the collection lazy-loaded), which does a round-trip to the database:
var school = GetSchoolAndStudentsWithDOBAbove(5, dob);
IQuery qDob = nhSession.CreateFilter(school.Students, "where DOB > :dob").SetDateTime("dob", dob);
IList<Student> dobedSchoolStudents = qDob.List<Student>();
In the second solution just fetch both the school and the students in one shot:
IList<object[]> rows = nhSession.CreateQuery(
@"select ss, st from School ss, Student st
where ss.Id = st.School.Id and ss.Id = :schId and st.DOB > :dob")
.SetInt32("schId", 5).SetDateTime("dob", dob).List<object[]>();
Each row is an object[] in which row[0] is the School (ss) and row[1] is one matching Student (st).
And this can definitely be done using the criteria query you use now (using Projections)
Unfortunately s.Students will not contain your "queried" results. You will have to create a separate query for Students to reach your goal.
foreach(var st in s.Students.Where(x => x.DOB > dob))
Console.WriteLine(st.FirstName);
Warning: depending on your mapping, that will still make a second trip to the DB, and it will still retrieve all students.
I'm not sure but you could possibly use Projections to do all this in one query, but I am by no means an expert on that.
You do have the option of filtering data. If there is only a single instance of the query, mxmissile's option would be the better choice.
Nhibernate Filter Documentation
Filters do have their uses, but depending on the version you are using, there can be issues where filtered collections are not cached correctly.